Token counting and cost estimation in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Token counting and cost estimation
Problem: You want to estimate the cost of using a language model API by counting tokens in input and output texts. Currently, you do not know how to count tokens correctly or estimate the cost based on token usage.
Current Metrics: No token counting or cost estimation implemented yet.
Issue: Without token counting, you cannot predict or control the cost of API usage, which may lead to unexpectedly high expenses.
Your Task
Implement a token counting function that counts tokens in given texts, and use it to estimate the total cost of a language model API call. The goal is an estimate within 5% of the actual usage cost.
Use a simple tokenization method that approximates the model's token counting (e.g., splitting by spaces and punctuation).
Assume a fixed cost per 1000 tokens (e.g., $0.02 per 1000 tokens).
Do not use external tokenization libraries.
Solution
import re

def count_tokens(text: str) -> int:
    # Split text by spaces and punctuation to approximate tokens
    tokens = re.findall(r"\w+|[^\s\w]", text)
    return len(tokens)

# Example input and output texts
input_text = "Hello, how are you today?"
output_text = "I am fine, thank you!"

# Count tokens
input_tokens = count_tokens(input_text)
output_tokens = count_tokens(output_text)

total_tokens = input_tokens + output_tokens

# Cost per 1000 tokens
cost_per_1000 = 0.02

# Calculate cost
cost = (total_tokens / 1000) * cost_per_1000

print(f"Input tokens: {input_tokens}")
print(f"Output tokens: {output_tokens}")
print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${cost:.5f}")
Implemented a token counting function using regex to split text into tokens.
Counted tokens for both input and output texts.
Calculated estimated cost based on total tokens and fixed cost per 1000 tokens.
Results Interpretation

Before: No token counting or cost estimation.

After: Input tokens: 7, Output tokens: 7, Total tokens: 14, Estimated cost: $0.00028

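Counting tokens helps estimate API usage cost. A simple regex-based count gives only a rough estimate: real models use subword tokenizers, so their counts can differ from a word-and-punctuation split, and the estimate should be compared with the token usage the API reports before it is relied on to control expenses.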
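One way to check the 5% goal from the task is to compare the estimated token total with the total the API reports. This is a minimal sketch, assuming a hypothetical `actual_tokens` value taken from a provider's usage report; the number below is a placeholder, not a measurement. Because the price per 1000 tokens is fixed, the relative error in tokens equals the relative error in cost.

```python
def relative_error(estimated: int, actual: int) -> float:
    """Return |estimated - actual| / actual as a fraction."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return abs(estimated - actual) / actual


estimated_tokens = 14  # total from the regex-based count in the solution
actual_tokens = 13     # hypothetical placeholder; read this from the API's usage report

error = relative_error(estimated_tokens, actual_tokens)
within_target = error <= 0.05  # the 5% goal stated in the task

print(f"Relative error: {error:.1%}")
print(f"Within 5% target: {within_target}")
```

With these placeholder numbers the check fails, which illustrates that a one-token difference matters a lot on short texts.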
Bonus Experiment
Try improving token counting accuracy by handling contractions and special characters better.
💡 Hint
Use more advanced regex patterns or simple rules to split contractions like "don't" into two tokens.
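The hint can be sketched as follows. This is one possible rule, not the tokenizer of any particular model: it assumes that common English contraction suffixes ('s, 't, 're, 've, 'm, 'll, 'd) should each count as a single token, so "don't" becomes two tokens instead of the three produced by the original pattern. The names `CONTRACTION_AWARE` and `count_tokens_v2` are made up for this example.

```python
import re

# Assumption: a contraction suffix such as 't or 's is one token.
# The suffix alternative is tried first; otherwise fall back to the
# original rule (a run of word characters, or a single punctuation mark).
CONTRACTION_AWARE = re.compile(
    r"'(?:s|t|re|ve|m|ll|d)\b|\w+|[^\s\w]",
    re.IGNORECASE,
)


def count_tokens_v2(text: str) -> int:
    """Count tokens, keeping contraction suffixes together."""
    return len(CONTRACTION_AWARE.findall(text))


print(CONTRACTION_AWARE.findall("don't"))           # ['don', "'t"]
print(count_tokens_v2("I don't think it's fine."))  # 8
```

The pattern only recognizes the straight apostrophe ('); typographic apostrophes would need to be added to the rule separately.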

Practice

1. What is a token in the context of AI language models?
easy
A. A hardware component
B. A small piece of text like a word or part of a word
C. A programming language
D. A type of AI model

Solution

  1. Step 1: Understand token meaning

    Tokens are the smallest pieces of text that AI models read, such as words or parts of words.
  2. Step 2: Identify correct definition

    Among the options, only B, "A small piece of text like a word or part of a word", describes tokens correctly.
  3. Final Answer:

    A small piece of text like a word or part of a word -> Option B
  4. Quick Check:

    Token = small text piece [OK]
Hint: Tokens are text chunks, not models or hardware [OK]
Common Mistakes:
  • Confusing tokens with AI models
  • Thinking tokens are programming languages
  • Assuming tokens are hardware parts
2. Which of the following Python code snippets correctly counts tokens using a simple split by spaces?
easy
A. tokens = text.split(' '); count = len(tokens)
B. tokens = text.count(' '); count = tokens + 1
C. tokens = len(text); count = tokens
D. tokens = text.split(); count = tokens

Solution

  1. Step 1: Understand token counting by splitting

    Splitting text by spaces returns a list of tokens; the token count is the length of that list.
  2. Step 2: Check each option

    Option A splits by space and takes len() of the resulting list, which is the method the question asks for. Option B counts spaces and adds 1; it never splits the text, and it matches the token count only when words are separated by exactly one space. Option C counts characters, not tokens. Option D assigns the list itself to count instead of its length.
  3. Final Answer:

    tokens = text.split(' '); count = len(tokens) -> Option A
  4. Quick Check:

    Split by space + len() = token count [OK]
Hint: Use split(' ') and len() to count tokens simply [OK]
Common Mistakes:
  • Counting characters instead of tokens
  • Forgetting to add 1 when counting spaces
  • Assigning list directly to count variable
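The difference between these approaches shows up when the text contains more than one space in a row. A small sketch, using a made-up sample string with a double space:

```python
text = "Hello  world"  # note the two spaces between the words

by_single_space = text.split(' ')           # ['Hello', '', 'world']
by_whitespace = text.split()                # ['Hello', 'world']
space_count_plus_one = text.count(' ') + 1  # 3

print(len(by_single_space))   # 3 (includes an empty string)
print(len(by_whitespace))     # 2
print(space_count_plus_one)   # 3
```

Calling split() with no argument collapses runs of whitespace, so it is usually the safer choice when the input is not guaranteed to be single-spaced.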
3. Given the text: "Hello world! This is AI." and a token counting method that splits by spaces, what is the token count?
medium
A. 7
B. 6
C. 4
D. 5

Solution

  1. Step 1: Split the text by spaces

    Splitting "Hello world! This is AI." by spaces gives: ['Hello', 'world!', 'This', 'is', 'AI.']
  2. Step 2: Count the tokens

    There are 5 tokens in the list.
  3. Final Answer:

    5 -> Option D
  4. Quick Check:

    5 tokens from splitting by space [OK]
Hint: Count words separated by spaces for quick token count [OK]
Common Mistakes:
  • Counting punctuation as separate tokens
  • Adding extra tokens incorrectly
  • Miscounting spaces
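The count depends on the splitting rule. As a quick comparison, the sketch below applies both the space split from this question and the regex from the experiment's solution to the same sentence:

```python
import re

text = "Hello world! This is AI."

# Split by spaces: punctuation stays attached to the word before it.
space_tokens = text.split(' ')
print(space_tokens)       # ['Hello', 'world!', 'This', 'is', 'AI.']
print(len(space_tokens))  # 5

# Regex from the experiment: each punctuation mark is its own token.
regex_tokens = re.findall(r"\w+|[^\s\w]", text)
print(regex_tokens)       # ['Hello', 'world', '!', 'This', 'is', 'AI', '.']
print(len(regex_tokens))  # 7
```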
4. You wrote this code to count tokens but it gives an error:
text = "AI is fun"
tokens = text.split
count = len(tokens)

What is the error, and how do you fix it?
medium
A. Missing parentheses in split method call; fix with text.split()
B. len() cannot be used on list; use count() instead
C. text should be a list, not string
D. split method does not exist for strings

Solution

  1. Step 1: Identify the error in method call

    text.split is a method reference, not a call. It needs parentheses to execute.
  2. Step 2: Fix the code

    Change text.split to text.split() to get the list of tokens, then len() works correctly.
  3. Final Answer:

    Missing parentheses in split method call; fix with text.split() -> Option A
  4. Quick Check:

    Use split() with parentheses to call method [OK]
Hint: Always add () to call string methods like split() [OK]
Common Mistakes:
  • Forgetting parentheses on method calls
  • Using len() on method instead of list
  • Thinking split is not a string method
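The failure and the fix can be reproduced in a few lines. This sketch catches the exception only to display it; in normal code the fix alone is enough.

```python
text = "AI is fun"

try:
    # Bug: text.split without parentheses is the method object, not a list.
    count = len(text.split)
except TypeError as exc:
    print(f"TypeError: {exc}")

# Fix: call the method to get the list of tokens.
tokens = text.split()
count = len(tokens)
print(count)  # 3
```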
5. You want to estimate the cost of an AI request. The model charges $0.0001 per token. If your input has 120 tokens and output is expected to be 80 tokens, what is the total estimated cost?
hard
A. $0.012
B. $0.01
C. $0.02
D. $0.008

Solution

  1. Step 1: Calculate total tokens used

    Total tokens = input tokens + output tokens = 120 + 80 = 200 tokens.
  2. Step 2: Multiply total tokens by cost per token

    Cost = 200 tokens * $0.0001 = $0.02.
  3. Step 3: Check options carefully

    Option C, $0.02, matches the result. $0.012 is the cost of the 120 input tokens alone (120 * $0.0001) and $0.008 is the cost of the 80 output tokens alone (80 * $0.0001), so both leave out part of the usage.
  4. Final Answer:

    $0.02 -> Option C
  5. Quick Check:

    200 tokens * $0.0001 = $0.02 [OK]
Hint: Add input and output tokens, multiply by cost per token [OK]
Common Mistakes:
  • Multiplying only input tokens by cost
  • Multiplying only output tokens by cost
  • Misreading decimal places in cost
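The arithmetic, including the two partial results that appear as wrong options, can be checked with a short script. This sketch uses Python's `decimal` module so that small prices are not affected by binary floating-point rounding; that is a design choice for the example, not something the question requires.

```python
from decimal import Decimal

price_per_token = Decimal("0.0001")  # price given in the question
input_tokens = 120
output_tokens = 80

# Correct: charge for input and output tokens together.
total_cost = (input_tokens + output_tokens) * price_per_token

# Common mistakes: charging for only one side of the request.
input_only = input_tokens * price_per_token
output_only = output_tokens * price_per_token

print(f"Total cost:  ${total_cost:.3f}")
print(f"Input only:  ${input_only:.3f}")
print(f"Output only: ${output_only:.3f}")
```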