How to Use Hugging Face for Text Generation in NLP

Use the Hugging Face transformers library to load a pre-trained text generation model and its tokenizer, then either call the model's generate method on the tokenized prompt or let the text-generation pipeline handle tokenization and decoding for you. Either way, a few lines of code are enough to produce new text that continues your prompt.

Syntax

To generate text with Hugging Face, you typically:

  • Import the pipeline function, or the AutoModelForCausalLM and AutoTokenizer classes.
  • Load a pre-trained model and tokenizer (e.g., GPT-2).
  • Use the generate method or the pipeline for text generation.

This process converts input text into tokens, generates new tokens, and decodes them back to readable text.

python
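from transformers import pipeline

# Build a text-generation pipeline backed by GPT-2 (weights are downloaded on first use)
generator = pipeline('text-generation', model='gpt2')

# max_length counts the prompt tokens plus the newly generated tokens
output = generator('Today is a beautiful day,', max_length=30, num_return_sequences=1)
# The pipeline returns a list of dicts, one per generated sequence
print(output[0]['generated_text'])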
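
The pipeline hides the three steps described above: tokenize, generate, decode. A minimal sketch of the explicit version follows, assuming PyTorch is installed as the backend. The function name generate_continuation and its default values are illustrative choices, not part of the library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_continuation(prompt: str, model_name: str = "gpt2", max_new_tokens: int = 20) -> str:
    """Tokenize the prompt, generate new tokens, and decode the result (hypothetical helper)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Step 1: text -> token IDs and attention mask, as PyTorch tensors
    inputs = tokenizer(prompt, return_tensors="pt")

    # Step 2: generate; max_new_tokens limits only the newly generated part
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token, so reuse EOS
        )

    # Step 3: token IDs -> text; the decoded string includes the prompt
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_continuation('Today is a beautiful day,') returns the prompt followed by up to 20 generated tokens. This form is mainly useful when you need direct control over the tokenizer or the generation arguments.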

Example

This example shows how to generate text starting from a prompt using Hugging Face's pipeline with the GPT-2 model.

python
from transformers import pipeline

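def generate_text(prompt: str) -> str:
    # Load GPT-2 behind a text-generation pipeline (reloaded on every call; fine for a demo)
    generator = pipeline('text-generation', model='gpt2')
    # max_length=50 caps the prompt and the generated tokens combined
    results = generator(prompt, max_length=50, num_return_sequences=1)
    return results[0]['generated_text']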

if __name__ == '__main__':
    prompt_text = 'Once upon a time'
    generated = generate_text(prompt_text)
    print('Generated Text:')
    print(generated)
Output (illustrative; the generated text varies between runs)
Generated Text:
Once upon a time, the world was full of magic and wonder. People lived in harmony with nature, and every day brought new adventures and discoveries.

Common Pitfalls

Common mistakes when using Hugging Face for text generation include:

  • Not installing the transformers library or missing dependencies.
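  • Using a model that was not trained for left-to-right generation (e.g., an encoder-only model such as BERT, or a model fine-tuned for classification).
  • Setting max_length too low, which cuts off generated text early. max_length counts the prompt tokens as well as the generated ones; max_new_tokens limits only the generated part.
  • Ignoring the model's context limit on long inputs (GPT-2 handles at most 1024 tokens, prompt and output combined).
  • Not setting num_return_sequences when multiple outputs are desired. Values above 1 also need sampling (do_sample=True) or beam search, since greedy decoding yields a single sequence.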

Check the model's documentation to confirm that it supports text generation, and prefer the pipeline when you do not need low-level control.

python
from transformers import pipeline

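# Wrong: BERT is an encoder-only masked language model, not trained for left-to-right generation
# generator = pipeline('text-generation', model='bert-base-uncased')  # May load with a warning and produce incoherent text, or fail, depending on the library version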

# Right: Use a causal language model like GPT-2
generator = pipeline('text-generation', model='gpt2')

output = generator('Hello world', max_length=20)
print(output[0]['generated_text'])
Output (illustrative; the generated text varies between runs)
Hello world! How are you doing today? I hope everything is going well.
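
The token-limit pitfall comes down to simple arithmetic: prompt tokens plus generated tokens must fit in the model's context window. The sketch below works on a plain list of token IDs so it runs without downloading a model. The helper trim_prompt_ids is hypothetical, the default of 1024 is GPT-2's context size, and keeping the most recent tokens (rather than the first ones) is an assumption that suits continuation-style prompts.

```python
from typing import List


def trim_prompt_ids(prompt_ids: List[int], max_new_tokens: int, context_window: int = 1024) -> List[int]:
    """Keep only as many trailing prompt tokens as fit next to the requested output."""
    # Tokens left for the prompt once room for the output is reserved
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context window")
    # Keep the most recent tokens; a shorter prompt is returned unchanged
    return prompt_ids[-budget:]


long_prompt = list(range(2000))  # stand-in for 2000 token IDs
trimmed = trim_prompt_ids(long_prompt, max_new_tokens=50)
print(len(trimmed))  # 974, because 1024 - 50 = 974
```

With a real tokenizer, the list would come from tokenizer(prompt)['input_ids'], and the trimmed IDs would be passed on to generate.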

Quick Reference

Key tips for Hugging Face text generation:

  • Use pipeline('text-generation') for easy setup.
  • Choose models like gpt2, distilgpt2, or other causal language models.
  • Adjust max_length (prompt included) or max_new_tokens (generated part only) to control output length.
  • Use num_return_sequences to get multiple outputs.
  • Install with pip install transformers plus a deep learning backend such as PyTorch (pip install torch).
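
As a sketch of the num_return_sequences tip, the function below requests several different continuations of one prompt. It assumes that sampling is enabled explicitly with do_sample=True and that set_seed is used to make runs repeatable. The name sample_continuations and its defaults are illustrative.

```python
from typing import List

from transformers import pipeline, set_seed


def sample_continuations(prompt: str, n: int = 3, max_new_tokens: int = 20, seed: int = 0) -> List[str]:
    """Return n sampled continuations of the prompt (hypothetical helper)."""
    set_seed(seed)  # fix the random seed so sampling is repeatable
    generator = pipeline("text-generation", model="gpt2")
    results = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # limit only the generated part
        num_return_sequences=n,         # one result dict per sequence
        do_sample=True,                 # sampling is needed for distinct outputs
    )
    return [r["generated_text"] for r in results]
```

For example, sample_continuations('Hello world', n=3) returns a list of three strings, each beginning with the prompt.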

Key Takeaways

Use Hugging Face's transformers library and the text-generation pipeline for easy text generation.
Load a causal language model such as GPT-2; models trained for other objectives generate poor text or none at all.
Set max_length (prompt included) or max_new_tokens (generated part only) to control how long the generated text will be.
Avoid models not designed for generation, such as encoder-only or classification models.
Install transformers together with a backend such as PyTorch, and check the model documentation for usage details.