Key models overview (GPT, DALL-E, Stable Diffusion) in Prompt Engineering / GenAI - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is GPT and what is it mainly used for?
GPT (Generative Pre-trained Transformer) is a language model designed to understand and generate human-like text. It is mainly used for tasks like writing, answering questions, and chatting.
beginner
What does DALL-E do?
DALL-E is a model that creates images from text descriptions. You give it words, and it draws pictures that match those words.
intermediate
How does Stable Diffusion generate images?
Stable Diffusion creates images by starting with random noise and gradually removing that noise over many steps until the result matches the text description, like slowly focusing a blurry photo until it becomes clear.
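The idea of starting from noise and refining it step by step can be illustrated with a toy loop. This is only an analogy, not how Stable Diffusion is implemented: the names `toy_denoise` and `error` are made up for this sketch, and the loop cheats by knowing the target in advance, whereas a real diffusion model uses a trained network to predict what noise to remove, guided by the text prompt.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy analogy only: nudge random noise toward a known target."""
    rng = random.Random(seed)
    # Start from pure random noise, one value per "pixel".
    image = [rng.uniform(-1.0, 1.0) for _ in target]
    for _ in range(steps):
        # Remove a fraction of the remaining difference at each step.
        # A real diffusion model has no target to compare against; a
        # trained network estimates the noise to subtract instead.
        image = [x + 0.2 * (t - x) for x, t in zip(image, target)]
    return image


def error(image, target):
    """Largest per-pixel distance between the image and the target."""
    return max(abs(x - t) for x, t in zip(image, target))


target = [0.0, 0.5, 1.0, 0.5]
noisy = toy_denoise(target, steps=0)   # untouched random noise
clean = toy_denoise(target, steps=50)  # after many small refinements
print(error(noisy, target) > error(clean, target))
```

Each pass shrinks the remaining error by a fixed fraction, so the picture sharpens gradually rather than appearing in one jump.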
beginner
What is a common feature of GPT, DALL-E, and Stable Diffusion?
All three models generate new content from an input prompt: GPT generates text, while DALL-E and Stable Diffusion generate images.
beginner
Why are models like GPT, DALL-E, and Stable Diffusion called 'generative'?
They are called generative because they create new content (text or images) rather than just analyzing or classifying existing data.
What type of data does GPT primarily work with?
A. Text
B. Images
C. Audio
D. Video
Which model creates images from text descriptions?
A. Both B and D
B. DALL-E
C. GPT
D. Stable Diffusion
How does Stable Diffusion start the image creation process?
A. By writing text
B. By recording audio
C. By selecting existing images
D. By starting with random noise
What is the main purpose of generative AI models?
A. To delete data
B. To classify data
C. To generate new content
D. To compress files
Which model would you use to generate a story?
A. DALL-E
B. GPT
C. None of these
D. Stable Diffusion
Explain in simple terms how GPT, DALL-E, and Stable Diffusion differ in what they create.
Think about what each model produces as output.
Describe why generative models like GPT, DALL-E, and Stable Diffusion are important in AI.
Consider how these models help people make things.

      Practice

      1. Which model is mainly used to generate human-like text?
      easy
      A. GPT
      B. DALL-E
      C. Stable Diffusion
      D. None of the above

      Solution

      1. Step 1: Understand GPT's purpose

        GPT is designed to generate and understand human-like text.
      2. Step 2: Compare with other models

        DALL-E and Stable Diffusion create images, not text.
      3. Final Answer:

        GPT -> Option A
      4. Quick Check:

        Text generation = GPT [OK]
      Hint: Text output? Think GPT first. [OK]
      Common Mistakes:
      • Mistaking DALL-E for a text model
      • Thinking Stable Diffusion generates text
      • Choosing 'None of the above'
      2. Which of the following is the correct way to describe DALL-E's function?
      easy
      A. It generates text based on images.
      B. It compresses images for storage.
      C. It creates images from text descriptions.
      D. It translates text from one language to another.

      Solution

      1. Step 1: Identify DALL-E's main function

        DALL-E creates images from text prompts given by users.
      2. Step 2: Eliminate incorrect options

        It does not generate text, translate languages, or compress images.
      3. Final Answer:

        It creates images from text descriptions. -> Option C
      4. Quick Check:

        Text to image = DALL-E [OK]
      Hint: DALL-E = text to image creator. [OK]
      Common Mistakes:
      • Thinking DALL-E generates text
      • Confusing with translation models
      • Assuming it compresses images
      3. Given the following pseudocode, in which model stands for a loaded Stable Diffusion model, what type of output should you expect?
      model = 'Stable Diffusion'
      input_text = 'A sunny beach with palm trees'
      output = model.generate(input_text)
      medium
      A. A photo-realistic image of a sunny beach
      B. A summary of the text input
      C. A written story about a beach
      D. An error because Stable Diffusion cannot generate output

      Solution

      1. Step 1: Identify Stable Diffusion's output type

        Stable Diffusion generates images from text prompts.
      2. Step 2: Match input and output

        Input is a text description; output will be an image matching that description.
      3. Final Answer:

        A photo-realistic image of a sunny beach -> Option A
      4. Quick Check:

        Text input + Stable Diffusion = Image output [OK]
      Hint: Stable Diffusion turns words into pictures. [OK]
      Common Mistakes:
      • Expecting text output
      • Thinking it summarizes text
      • Assuming it causes an error
      4. You tried to use GPT to create an image by running this pseudocode, in which model stands for a GPT text model:
      model = 'GPT'
      input_text = 'A cat sitting on a sofa'
      output = model.generate_image(input_text)
      What is the main problem here?
      medium
      A. The input text is too short for GPT to understand.
      B. GPT cannot generate images; it only generates text.
      C. The method name should be generate_text, not generate_image.
      D. There is no problem; the code will work fine.

      Solution

      1. Step 1: Understand GPT's capabilities

        GPT is designed to generate text, not images.
      2. Step 2: Analyze the method call

        Calling generate_image on GPT is invalid because GPT lacks image generation ability.
      3. Final Answer:

        GPT cannot generate images; it only generates text. -> Option B
      4. Quick Check:

        GPT = text only, no images [OK]
      Hint: GPT does text, not images. [OK]
      Common Mistakes:
      • Thinking GPT can create images
      • Believing that only the method name is wrong
      • Ignoring model capability limits
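
The snippet in question 4 is best read as pseudocode: taken literally as Python, model is a plain string, so calling generate_image on it would raise an AttributeError before any model is involved. The capability mismatch that the question is really about can be sketched as a simple lookup. The names `OUTPUT_TYPE` and `check_request` are invented for this illustration and do not belong to any real library; the table only restates what this cheat sheet says each model produces.

```python
# Output type each model produces, as described in this cheat sheet.
OUTPUT_TYPE = {
    "GPT": "text",
    "DALL-E": "image",
    "Stable Diffusion": "image",
}


def check_request(model_name: str, wanted: str) -> None:
    """Raise ValueError if the model does not produce the wanted output."""
    produced = OUTPUT_TYPE[model_name]
    if produced != wanted:
        raise ValueError(f"{model_name} produces {produced}, not {wanted}")


try:
    # Hypothetical request mirroring question 4: an image from GPT.
    check_request("GPT", "image")
except ValueError as exc:
    message = str(exc)
    print(message)
```

Checking the requested output type before calling a model turns a confusing failure into a clear message about which model fits the task.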
      5. You want to build an app that lets users type a prompt to generate a story and then see an image illustrating it. Which combination of models should you use?
      hard
      A. Use GPT for image generation and DALL-E for text generation.
      B. Use DALL-E to generate the story and GPT to create the image.
      C. Use Stable Diffusion for both story and image generation.
      D. Use GPT to generate the story and Stable Diffusion to create the image.

      Solution

      1. Step 1: Identify model roles for text and image

        GPT is best for generating human-like text stories.
      2. Step 2: Identify model for image creation

        Stable Diffusion creates images from text descriptions, perfect for illustrating stories.
      3. Final Answer:

        Use GPT to generate the story and Stable Diffusion to create the image. -> Option D
      4. Quick Check:

        Text by GPT + Image by Stable Diffusion = App [OK]
      Hint: Text with GPT, images with Stable Diffusion. [OK]
      Common Mistakes:
      • Swapping roles of GPT and DALL-E
      • Using one model for both tasks
      • Confusing image and text generation roles
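
The two-model app from question 5 can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: `generate_story` and `generate_image` are hypothetical stand-ins that return placeholder values, not real calls to GPT or Stable Diffusion, and `Illustration` is an invented container used only to show what gets passed along.

```python
from dataclasses import dataclass


@dataclass
class Illustration:
    prompt: str  # the text the image model was asked to draw
    model: str   # which (stubbed) model produced it


def generate_story(prompt: str) -> str:
    # Hypothetical stand-in for a call to a text model such as GPT.
    return f"Once upon a time: {prompt}."


def generate_image(description: str) -> Illustration:
    # Hypothetical stand-in for a call to an image model such as
    # Stable Diffusion; a real call would return image data.
    return Illustration(prompt=description, model="image-model-stub")


def story_with_illustration(user_prompt: str):
    # Step 1: the text model turns the user's prompt into a story.
    story = generate_story(user_prompt)
    # Step 2: the story text becomes the prompt for the image model.
    image = generate_image(story)
    return story, image


story, image = story_with_illustration("a cat on a sofa")
print(story)
print(image.model)
```

The key design point is the hand-off: the output of the text model is reused as the input of the image model, so each model only does the job it is suited for.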