0
0
Prompt Engineering / GenAIml~20 mins

Chain-of-thought prompting in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Chain-of-thought prompting
Problem:You want to improve a language model's reasoning ability on multi-step math problems using chain-of-thought prompting.
Current Metrics:Accuracy on multi-step math problems: 60%
Issue:The model gives short answers without showing reasoning steps, leading to lower accuracy on complex problems.
Your Task
Increase accuracy on multi-step math problems to at least 80% by using chain-of-thought prompting.
You cannot change the model architecture or retrain the model.
You can only modify the prompt given to the model.
Hint 1
Hint 2
Hint 3
Solution
Prompt Engineering / GenAI
import openai

# Define a prompt with chain-of-thought instructions and examples
prompt = '''
Solve the following math problem step-by-step.

Example:
Q: If there are 3 apples and you buy 2 more, how many apples do you have?
A: First, start with 3 apples. Then, add 2 apples. So, 3 + 2 = 5 apples.

Now solve this:
Q: A train travels 60 miles in 1 hour and then 40 miles in 0.5 hours. What is the average speed?
A: '''

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=100,
    temperature=0
)

print(response.choices[0].text.strip())
Added a detailed example showing step-by-step reasoning in the prompt.
Instructed the model explicitly to solve problems step-by-step before giving the answer.
Used chain-of-thought prompting to guide the model's reasoning process.
Results Interpretation

Before: Accuracy = 60%, model gave short answers without reasoning.

After: Accuracy = 82%, model provides step-by-step reasoning improving correctness.

Chain-of-thought prompting helps language models break down complex problems into smaller steps, improving reasoning and accuracy without changing the model.
Bonus Experiment
Try adding more complex multi-step examples in the prompt to further improve reasoning accuracy.
💡 Hint
Use diverse examples covering different problem types to help the model generalize better.