Overview
While testing a model on the Model Details page in Minibase, you’ll see a setting called Max Tokens. This controls the maximum number of tokens the model is allowed to generate in its response.
A token is a piece of text—sometimes a whole word (“cat”), sometimes part of a word (“ing”), or even punctuation (“.”). Models generate responses token by token, so limiting the number of tokens directly limits how long a response can be.
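To see tokenization in action, here is a short sketch using OpenAI's open-source tiktoken library. It is purely illustrative: Minibase models may use a different tokenizer, so the exact splits can vary.

```python
# Illustration only: tiktoken is OpenAI's open-source tokenizer library;
# Minibase models may tokenize differently, so exact splits will vary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["cat", "running", "The cat is running."]:
    token_ids = enc.encode(text)                 # text -> list of token IDs
    pieces = [enc.decode([t]) for t in token_ids]  # decode each token back to text
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```

Notice that common words often map to a single token, while longer sentences split into several pieces, with punctuation frequently getting its own token.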
How to Use
Step 1: Find the Max Tokens setting
Go to the Model Details page.
In the test/playground area, you’ll see a field for Max Tokens.
Adjust this number to control response length during testing.
Step 2: Decide when to lower Max Tokens
Use a lower value (e.g., 50–100 tokens) when:
You want short, snappy responses.
You’re testing classification or yes/no answers (see the request sketch after this list).
You want to keep responses fast and resource-efficient.
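For example, a sentiment check only needs a few tokens of output. The sketch below is hypothetical: the endpoint URL, model name, and payload fields are assumptions for illustration, not the documented Minibase API, so check the Minibase docs for the real request shape.

```python
# Hypothetical sketch: the endpoint URL, model name, and payload fields are
# assumptions for illustration, not the documented Minibase API.
import requests

response = requests.post(
    "https://api.minibase.example/v1/generate",  # hypothetical endpoint
    json={
        "model": "my-sentiment-classifier",      # hypothetical model name
        "prompt": "Is this review positive or negative? 'Great product!'",
        "max_tokens": 50,  # tight budget: a one-word answer needs no more
    },
)
print(response.json())
```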
Step 3: Decide when to raise Max Tokens
Use a higher value (e.g., 300–500+ tokens) when:
You expect long explanations or multi-step reasoning.
You’re generating creative text, summaries, or translations of long passages.
You don’t want the model to get cut off mid-response.
Step 4: Balance length vs. control
If Max Tokens is too low, the model may stop in the middle of a sentence.
If Max Tokens is too high, the model may generate more than you need—longer responses, slower inference, and higher compute usage.
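The sketch below simulates that cutoff by truncating a complete answer's token list; during real generation the model simply stops once the cap is reached. As before, tiktoken stands in for the real tokenizer.

```python
# A minimal sketch of what a hard token cap does: we simulate it by truncating
# a finished answer's token list. tiktoken is illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

full_answer = (
    "Photosynthesis converts light energy into chemical energy, "
    "which plants store as glucose for later use."
)
tokens = enc.encode(full_answer)
print(f"Full answer: {len(tokens)} tokens")

max_tokens = 10  # deliberately too low for this answer
print(f"With max_tokens={max_tokens}: {enc.decode(tokens[:max_tokens])!r}")
# Likely output: a string that stops mid-sentence, the classic truncation symptom.
```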
Tips & Best Practices
Match the setting to your task: Keep Max Tokens tight for structured tasks and raise it for open-ended generation.
Iterate to find the sweet spot: Start with 200 tokens, then adjust up or down depending on your use case.
Don’t confuse it with Temperature: Temperature changes how random responses are, while Max Tokens changes how long they can be.
Think tokens, not words: One word is often 1–2 tokens. A Max Tokens of 200 usually means ~150 words, but this varies by language and punctuation (see the converter sketch below).
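If you want to work backwards from a target word count, a quick converter based on the ¾-word rule of thumb looks like this. The 0.75 ratio is an English-language approximation, not an exact constant.

```python
import math

# Assumption: ~0.75 English words per token, per the rule of thumb above.
# Real ratios vary with language, vocabulary, and punctuation.
WORDS_PER_TOKEN = 0.75

def token_budget_for(target_words: int) -> int:
    """Roughly how many tokens to allow for a target word count."""
    return math.ceil(target_words / WORDS_PER_TOKEN)

def approx_words(max_tokens: int) -> int:
    """Roughly how many English words a given token budget yields."""
    return round(max_tokens * WORDS_PER_TOKEN)

print(token_budget_for(150))  # -> 200 tokens for a ~150-word response
print(approx_words(200))      # -> 150 words from a 200-token budget
```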
Troubleshooting
My response is cut off mid-sentence
Raise the Max Tokens value so the model has enough room to complete its output.
The model’s responses are too long or rambling
Lower the Max Tokens value to keep outputs shorter and more focused.
I don’t understand how many words my Max Tokens setting represents
As a rule of thumb, **1 token ≈ ¾ of a word in English**. So 100 tokens is roughly 75 words. The exact number varies depending on language and punctuation.
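To sanity-check the rule of thumb on your own text, you can count tokens and words directly. The sketch below again uses tiktoken as a stand-in tokenizer, so the ratio for a Minibase model may differ slightly.

```python
# Sanity-check the rule of thumb on your own text. tiktoken stands in for the
# real tokenizer here; a Minibase model may split text somewhat differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

sample = "Max Tokens caps how long a model's response can be during testing."
n_tokens = len(enc.encode(sample))
n_words = len(sample.split())
print(f"{n_words} words -> {n_tokens} tokens "
      f"({n_words / n_tokens:.2f} words per token)")
```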