
Token counting and cost estimation in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Token counting and cost estimation
Problem: You want to estimate the cost of using a language model API by counting the tokens in the input and output texts, but you do not yet know how to count tokens or how to turn a token count into a cost.
Current Metrics: No token counting or cost estimation implemented yet.
Issue: Without token counting, you cannot predict or control the cost of API usage, which may lead to unexpectedly high expenses.
Your Task
Implement a function that counts the tokens in given texts, then use it to estimate the total cost of a language model API call. The goal is a cost estimate within 5% of actual usage.
Use a simple tokenization method that approximates the model's token counting (e.g., splitting by spaces and punctuation).
Assume a fixed cost per 1000 tokens (e.g., $0.02 per 1000 tokens).
Do not use external tokenization libraries.
Solution
import re

def count_tokens(text: str) -> int:
    # Approximate tokens: each run of word characters, or each single
    # punctuation mark, counts as one token; whitespace is ignored
    tokens = re.findall(r"\w+|[^\s\w]", text)
    return len(tokens)

# Example input and output texts
input_text = "Hello, how are you today?"
output_text = "I am fine, thank you!"

# Count tokens
input_tokens = count_tokens(input_text)
output_tokens = count_tokens(output_text)

total_tokens = input_tokens + output_tokens

# Cost per 1000 tokens
cost_per_1000 = 0.02

# Calculate cost
cost = (total_tokens / 1000) * cost_per_1000

print(f"Input tokens: {input_tokens}")
print(f"Output tokens: {output_tokens}")
print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${cost:.5f}")
Implemented a token counting function using regex to split text into tokens.
Counted tokens for both input and output texts.
Calculated estimated cost based on total tokens and fixed cost per 1000 tokens.
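The steps above can also be wrapped in a reusable helper. The following is a minimal sketch, not part of the original solution: the name `estimate_cost` and its parameters are hypothetical, and it assumes the same flat rate of $0.02 per 1000 tokens applied to input and output alike.

```python
import re
from typing import Iterable


def count_tokens(text: str) -> int:
    # Same approximation as the solution: word runs or single punctuation marks
    return len(re.findall(r"\w+|[^\s\w]", text))


def estimate_cost(texts: Iterable[str], cost_per_1000: float = 0.02) -> float:
    # Hypothetical helper: sum the approximate token counts of all texts,
    # then apply one flat rate per 1000 tokens (an assumption of this exercise)
    total_tokens = sum(count_tokens(t) for t in texts)
    return (total_tokens / 1000) * cost_per_1000


cost = estimate_cost(["Hello, how are you today?", "I am fine, thank you!"])
print(f"Estimated cost: ${cost:.5f}")
```

Passing a list of texts keeps the helper usable for a whole conversation rather than a single input and output pair.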
Results Interpretation

Before: No token counting or cost estimation.

After: Input tokens: 7, Output tokens: 7, Total tokens: 14, Estimated cost: $0.00028

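Counting tokens lets you estimate API usage cost before you incur it. A simple regex-based count is only an approximation, because the model's own tokenizer may split text differently; to confirm that the estimate meets the 5% target, compare it with the token counts the API reports, if it reports them.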
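To check an estimate against the 5% goal stated in the task, one option is to compute the relative error between the estimated token count and the count actually billed. The sketch below is illustrative only: `relative_error` and `within_tolerance` are hypothetical names, and the `actual` value is a made-up placeholder, not a measurement from any real API.

```python
def relative_error(estimated_tokens: int, actual_tokens: int) -> float:
    # Fraction by which the estimate deviates from the actual count
    if actual_tokens <= 0:
        raise ValueError("actual_tokens must be positive")
    return abs(estimated_tokens - actual_tokens) / actual_tokens


def within_tolerance(estimated_tokens: int, actual_tokens: int,
                     tolerance: float = 0.05) -> bool:
    # True when the estimate is within the given fraction (5% by default)
    return relative_error(estimated_tokens, actual_tokens) <= tolerance


estimated = 14  # total from the regex-based count in this experiment
actual = 13     # HYPOTHETICAL placeholder; replace with the count your API reports

print(f"Relative error: {relative_error(estimated, actual):.1%}")
print(f"Within 5%: {within_tolerance(estimated, actual)}")
```

On short texts a difference of a single token already exceeds 5%, so the comparison is more meaningful on longer, representative samples.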
Bonus Experiment
Try improving token counting accuracy by handling contractions and special characters better.
💡 Hint
Use more advanced regex patterns or simple rules to split contractions like "don't" into two tokens.
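One possible way to follow this hint is sketched below, under stated assumptions: the pattern and the name `tokenize_with_contractions` are hypothetical, the split point ("don" + "'t") is a simplification rather than how any particular model tokenizes, and only the straight apostrophe `'` is handled. The lookbehind `(?<=\w)` keeps an apostrophe attached to the letters after it only when a word character comes before it, so quotation marks are still counted separately.

```python
import re
from typing import List

# Hypothetical pattern, tried left to right at each position:
#   (?<=\w)'\w+  an apostrophe suffix such as 't or 're, only after a word character
#   \w+          a run of word characters
#   [^\s\w]      any single remaining punctuation mark
CONTRACTION_PATTERN = re.compile(r"(?<=\w)'\w+|\w+|[^\s\w]")


def tokenize_with_contractions(text: str) -> List[str]:
    # Return the approximate tokens, splitting contractions into two pieces
    return CONTRACTION_PATTERN.findall(text)


def count_tokens(text: str) -> int:
    return len(tokenize_with_contractions(text))


print(tokenize_with_contractions("don't"))
print(count_tokens("I don't know, you're right."))
```

With the original pattern, "don't" is split into three pieces ("don", "'", "t"); this variant yields two, which is what the hint asks for.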