An introduction to Hugging Face transformers for NLP

Ran Romano
May 5, 2022

If you have followed developments in machine learning (ML) and artificial intelligence (AI) over the last few years, you will already be familiar with natural language processing (NLP), thanks in large part to the rise of transformer models. And no, we’re not talking about the popular shape-shifting cartoon robots.

Rather, we are talking about attention-based language models that rose to prominence within the ML community in 2017. In June of that year, the research paper Attention Is All You Need was published by a team of researchers from Google and the University of Toronto. Within the next year, the Transformer architecture introduced in the paper had seen widespread adoption.

The proliferation of transformers

Transformers have been the dominant deep learning models in NLP for several years. Well-known examples include GPT-3 from OpenAI, Bidirectional Encoder Representations from Transformers (BERT) from Google, and XLNet from Carnegie Mellon University and Google.

As transformer models have gotten bigger, better, and much closer to generating text that can pass for human writing, their parameter counts and training datasets have grown too. The original Transformer was followed by the much bigger Transformer-XL, BERT grew from 110 million parameters in BERT-Base to 340 million in BERT-Large, and GPT grew from 1.5 billion parameters in GPT-2 to 175 billion in GPT-3. Among the very largest transformer models is the aptly named Megatron-Turing Natural Language Generation model (MT-NLG) from Microsoft and NVIDIA, at 530 billion parameters.

While transformers have been heavily used in NLP, it’s not their sole use case. Transformers have been adapted for use in protein and DNA sequencing (e.g., ProteinBERT), image and video processing with vision transformers, reinforcement learning problems, and many other applications.

Training a huge, state-of-the-art transformer model for NLP comes with a hefty price tag that can stretch into the tens of millions of dollars, along with huge energy consumption and environmental costs. This puts a major roadblock in the way of smaller businesses and start-ups that want to launch their own NLP projects, and it has led to the rise of companies like Hugging Face, which provides shared, pre-trained models. Online learning can then be used to keep training and updating these NLP models as the data changes.

What is Hugging Face?

In just a few short years, with more than 1,219 contributors, 25,800 users, 61,000 stars, and 14,700 forks on GitHub, Hugging Face’s transformers library has established itself as the go-to resource for all things NLP.

Hugging Face is a start-up, AI community, and self-described “home of machine learning” that started out as a chatbot messaging app.

Now focusing exclusively on transformers, the company provides open-source NLP technologies and thousands of pre-trained models to perform tasks on different modalities such as text, vision, and audio. It also provides courses and datasets and has a large community following. In 2019, it raised US$15 million in venture funding to build a definitive NLP library before raising a further US$40 million in a 2021 Series B funding round. 

Some of the benefits of using the Hugging Face transformers library include:

  • Easy-to-use, state-of-the-art models
  • High-performance natural language understanding and generation
  • High-performance computer vision and audio tasks
  • Much lower computing costs and smaller carbon footprint due to model sharing
  • Freedom to choose the right framework for every stage of a model’s lifetime
  • Models are easily customizable and adaptable to different use cases

The Hugging Face ecosystem

Hugging Face has been built around the concept of attention-based transformer models. At the core of its ecosystem, then, is its transformers library, which is supported by its datasets and tokenizers libraries.

Transformer models don’t understand text in its native form as a string of characters. Text must first be converted into numbers: the tokenizer splits it into tokens and maps each token to an integer ID, and these IDs are packed into vectors, matrices, and tensors. A tokenizer is therefore a core component of the Hugging Face transformers ecosystem and its pipelines.
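To make the text-to-numbers step concrete, here is a minimal sketch that uses a toy whitespace vocabulary. The helpers build_vocab, encode, and pad_batch are hypothetical names invented for this illustration and are not part of the Hugging Face API. Real Hugging Face tokenizers use subword schemes such as byte-pair encoding and return framework tensors, but the underlying idea of mapping text to rows of integer IDs is the same.

```python
# Toy illustration only: a whitespace tokenizer with a tiny vocabulary.
# build_vocab, encode, and pad_batch are hypothetical helpers, not part of
# the Hugging Face API.

PAD, UNK = "<pad>", "<unk>"


def build_vocab(texts):
    """Assign an integer ID to every distinct lowercase word."""
    vocab = {PAD: 0, UNK: 1}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn a string into a list of token IDs (unknown words map to UNK)."""
    return [vocab.get(word, vocab[UNK]) for word in text.lower().split()]


def pad_batch(sequences, pad_id=0):
    """Right-pad ID lists to equal length so they form a rectangular matrix."""
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (width - len(seq)) for seq in sequences]


corpus = ["I spent hours in the library", "the library was quiet"]
vocab = build_vocab(corpus)
batch = pad_batch([encode(text, vocab) for text in corpus])
for row in batch:
    print(row)
```

Each row of batch has the same length, which is what allows a batch of sentences to be stacked into a single tensor before it is handed to a model.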

Hugging Face also comes with the accelerate library. This integrates with existing Hugging Face training flows and generic PyTorch training scripts to enable distributed training on hardware accelerators such as GPUs and TPUs with minimal code changes. This means that the same training script can be used on a dedicated training run with multiple GPUs or on a laptop CPU.

Supporting all of Hugging Face’s libraries is the Hugging Face Hub, a platform where a dedicated community creates and shares resources. The Hub adds value to projects with tools for versioning and an API for hosted inference.

Hugging Face transformers in action

Now that we’ve covered what the Hugging Face ecosystem is, let’s look at Hugging Face transformers in action by generating some text using GPT-2. While GPT-2 has been succeeded by GPT-3, GPT-2 is still a powerful model that is well-suited to many applications, including this simple text generation demo.

First, we need to set up a virtual environment and install the transformers and tokenizers libraries. There are many virtual environment managers for Python, and you should use whichever one you are used to working with. In this example, we are going to use virtualenv.

# command line
virtualenv huggingface_demo --python=python3
source huggingface_demo/bin/activate
pip install torch
pip install git+

Hugging Face libraries make text generation easy. The demonstration that we are showcasing here is based on examples from Hugging Face on the TensorFlow Blog, the Hugging Face Blog, and the Hugging Face Models documentation.

# python
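import torch  # PyTorch backend used by the model
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained GPT-2 tokenizer and language model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 has no padding token, so reuse the end-of-sequence token for padding
model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

# Convert the prompt into a PyTorch tensor of token IDs
input_string = "Yesterday I spent several hours in the library, studying"
input_tokens = tokenizer.encode(input_string, return_tensors="pt")

# Greedy decoding: always pick the single most likely next token
output_greedy = model.generate(input_tokens, max_length=256)

# Convert the generated token IDs back into text, dropping special tokens
output_string = tokenizer.decode(output_greedy[0], skip_special_tokens=True)

print(f"Input sequence: {input_string}")
print(f"Output sequence: {output_string}")
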
# Output
Input sequence: Yesterday I spent several hours in the library, studying
Output sequence: Yesterday I spent several hours in the library, studying the books, and I was amazed at how much I had learned. I was amazed at how much I had learned. I was amazed at how much I had learned.

While the beginning of the above sequence seems fine, the model soon gets stuck in a loop and keeps repeating the same sentence until it reaches the max_length limit. We can add beam search and a constraint on repeated n-grams to make the output more sensible and, hopefully, solve the repetition problem.

Beam search keeps several candidate sequences (beams) alive at each step instead of committing to the single most likely next token, and it returns the candidate with the highest overall probability at the end. Replace the model.generate line above with:

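output_beam = model.generate(
    input_tokens,
    max_length=64,
    num_beams=32,            # number of candidate sequences kept at each step
    no_repeat_ngram_size=2,  # never generate the same two-token sequence twice
    early_stopping=True,     # stop once enough beams have finished
)
# Decode the beam search result instead of the greedy one
output_string = tokenizer.decode(output_beam[0], skip_special_tokens=True)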

The output becomes:

Output sequence: Yesterday I spent several hours in the library, studying the books, reading the papers, and listening to the music. It was a wonderful experience.

I have to say that I am very happy with the book. I think it is a very good book and I would recommend it to anyone who is interested in learning more about the world of science and technology. If you are looking for a book that will help you to understand what it means to be a scientist, then this book is for you.
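To see why keeping several candidates alive can beat greedy decoding, consider the minimal sketch below. It runs a simple beam search over a made-up table of next-token probabilities. The table PROBS and the function beam_search are hypothetical and exist only for this illustration. This is not how the transformers library implements generation internally, which also handles end-of-sequence tokens, length penalties, and batching.

```python
import math

# Made-up next-token probabilities standing in for a real language model.
PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.3, "bird": 0.3},
    "a": {"bird": 0.9, "fish": 0.1},
}


def beam_search(probs, start, steps, num_beams):
    """Extend the sequence `steps` times, keeping the `num_beams` best paths."""
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens so far)
    for _ in range(steps):
        candidates = []
        for score, tokens in beams:
            for token, prob in probs.get(tokens[-1], {}).items():
                candidates.append((score + math.log(prob), tokens + [token]))
        if not candidates:
            break  # no beam can be extended any further
        # Keep only the highest-scoring partial sequences
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][1]


# num_beams=1 is greedy decoding: it commits to "the" and ends at p = 0.24
print(beam_search(PROBS, "<s>", steps=2, num_beams=1))
# num_beams=2 keeps "a" alive and finds the better path with p = 0.36
print(beam_search(PROBS, "<s>", steps=2, num_beams=2))
```

With a single beam, the search locks in the locally best first token and misses the sequence that is more probable overall. Widening the beam trades extra computation for a better chance of finding it.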

In the model.generate call above, no_repeat_ngram_size=2 prevents any two-token sequence from appearing more than once. This is a hard constraint rather than a soft penalty, and it pushes the model toward text that is more along the lines of something a human might write.
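The blocking idea can be sketched on a toy model as well. The example below is an assumption-laden illustration rather than the library's implementation: NEXT_TOKEN_PROBS is a made-up probability table, and creates_repeat and generate are hypothetical helpers. It decodes greedily and skips any token that would complete an n-gram already present in the output.

```python
# Made-up next-token probabilities standing in for a real language model.
NEXT_TOKEN_PROBS = {
    "I": {"was": 0.9, "read": 0.1},
    "was": {"amazed": 0.8, "tired": 0.2},
    "amazed": {".": 0.7, "again": 0.3},
    "again": {".": 1.0},
    "tired": {".": 1.0},
    "read": {"books": 1.0},
    "books": {".": 1.0},
    ".": {"I": 0.6, "then": 0.4},
    "then": {"I": 1.0},
}


def creates_repeat(tokens, candidate, n):
    """True if appending candidate would repeat an n-gram already in tokens."""
    if n <= 0 or len(tokens) + 1 < n:
        return False
    new_ngram = tuple(tokens[len(tokens) - (n - 1):] + [candidate])
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in seen


def generate(start, max_length, no_repeat_ngram_size=0):
    """Greedy decoding that skips tokens which would repeat an n-gram."""
    tokens = [start]
    while len(tokens) < max_length:
        choices = NEXT_TOKEN_PROBS.get(tokens[-1], {})
        allowed = [
            token for token in choices
            if not creates_repeat(tokens, token, no_repeat_ngram_size)
        ]
        if not allowed:
            break  # every continuation is blocked in this tiny vocabulary
        tokens.append(max(allowed, key=choices.get))
    return tokens


# Without blocking, the same four tokens repeat until max_length is reached
print(" ".join(generate("I", max_length=12)))
# With 2-gram blocking, the decoder is forced onto new continuations
print(" ".join(generate("I", max_length=12, no_repeat_ngram_size=2)))
```

In this toy vocabulary the blocked run stops early because it runs out of unblocked tokens. A real model has tens of thousands of tokens to choose from, so it simply moves on to the next most likely one.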

Getting started with Hugging Face transformers

With more than 61,000 stars and 25,000 users, the Hugging Face ecosystem has gained huge traction in the NLP space during the last few years.

On top of its recent US$40 million Series B funding round, the start-up has also acquired Gradio, a platform that enables anyone to demo their ML models through a web-based interface. Hugging Face clearly has huge momentum.

Although Hugging Face transformers is a great way to get started with NLP and benefit from the collective brainpower of the almost 30,000-strong community, there is one thing that you need to take care of first—your data.

To get good results from Hugging Face transformers with your own data, its quality needs to match that of the neat examples. In reality, however, your datasets are unlikely to come nicely packaged up, clean, tidy, and ready to go from the moment you acquire them; you need to do this yourself.

As you will be acutely aware, data scientists and machine learning teams spend a lot of time and effort in tidying up their data and getting it in shape. According to Anaconda’s 2021 State of Data Science survey, respondents said that they spend a whopping 39% of their time on data preparation and data cleansing, which is more than the amount of time spent on model training, selection, and deployment combined. 

During this data preparation process, however, bumps in the road of manual processing such as human error can knock you off course and cause you to mess up your model entirely. While this isn’t something that is guaranteed to happen, the more time you spend processing your data, the higher the likelihood of it happening. Given that you must constantly process new datasets (and in some cases re-process older ones) you should think about establishing a pipeline that can automatically process your data for you.
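As a rough idea of what one automated step in such a pipeline might look like, the sketch below cleans a list of raw text records using only the Python standard library. The functions clean_text and clean_dataset and the sample records are hypothetical and chosen for illustration. A real pipeline would add validation, logging, and rules specific to your data.

```python
import html
import re
import unicodedata

TAG_PATTERN = re.compile(r"<[^>]+>")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_text(raw):
    """Decode HTML entities, strip tags, and normalize Unicode and spacing."""
    text = html.unescape(raw)
    text = TAG_PATTERN.sub(" ", text)
    text = unicodedata.normalize("NFKC", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


def clean_dataset(records):
    """Clean every record, dropping empty entries and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in records:
        text = clean_text(raw)
        key = text.casefold()
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw_records = [
    "  <p>Great   library!</p> ",
    "Great library!",
    "",
    "Quiet &amp; comfortable\u00a0reading rooms",
]
print(clean_dataset(raw_records))
```

Because the same function runs on every new batch of data, the cleaning rules are applied consistently instead of depending on manual, error-prone edits.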

With Qwak, it is possible to run your data cleaning and other MLOps processes in a dedicated pipeline. You can then train your Hugging Face model in the same process, deploy your model into production and see how the deployed model performs over time.

Qwak is a leading ML platform that enables teams to take their models and transform them into well-engineered products. Our cloud-based platform removes the friction from ML development and deployment while enabling fast iterations, limitless scaling, and customizable infrastructure. 

Want to learn more about the power of Qwak alongside Hugging Face transformers and how it could help you productionize and deploy better and more powerful ML models?

Get in touch for your free demo! 
