What is Max Tokens?

How to adjust it on the Model Details page, when to raise or lower it, how it affects response length, and how to troubleshoot common issues.

Written by Niko McCarty
Updated this week

Overview

While testing a model on the Model Details page in Minibase, you’ll see a setting called Max Tokens. This controls the maximum number of tokens the model is allowed to generate in its response.

A token is a piece of text—sometimes a whole word (“cat”), sometimes part of a word (“ing”), or even punctuation (“.”). Models generate responses token by token, so limiting the number of tokens directly limits how long a response can be.
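To make the idea concrete, here is a toy tokenizer sketch in Python. Real models use learned subword vocabularies (such as BPE), so their token counts will differ from this simplification, but the principle is the same: text becomes a sequence of pieces, and Max Tokens caps how many pieces the model may emit.

```python
import re

def toy_tokenize(text):
    """Very rough stand-in for a subword tokenizer: splits text
    into words and punctuation marks. Real tokenizers also split
    words into subword pieces, so actual counts vary."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("The cat is jumping.")
print(tokens)       # ['The', 'cat', 'is', 'jumping', '.']
print(len(tokens))  # 5
```

Note that the period counts as its own token here, which is why token counts usually exceed word counts.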

How to Use

Step 1: Find the Max Tokens setting

  • Go to the Model Details page.

  • In the test/playground area, you’ll see a field for Max Tokens.

  • Adjust this number to control response length during testing.

Step 2: Decide when to lower Max Tokens

  • Use a lower value (e.g., 50–100 tokens) when:

    • You want short, snappy responses.

    • You’re testing classification or yes/no answers.

    • You want to keep responses fast and resource-efficient.

Step 3: Decide when to raise Max Tokens

  • Use a higher value (e.g., 300–500+ tokens) when:

    • You expect long explanations or multi-step reasoning.

    • You’re generating creative text, summaries, or translations of long passages.

    • You don’t want the model to get cut off mid-response.

Step 4: Balance length vs. control

  • If Max Tokens is too low, the model may stop in the middle of a sentence.

  • If Max Tokens is too high, the model may generate more than you need—longer responses, slower inference, and higher compute usage.
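The trade-off above can be simulated in a few lines of Python. This is not the Minibase API, just an illustration of what a token cap does: generation stops the moment the budget is spent, even mid-sentence.

```python
def truncate_to_max_tokens(tokens, max_tokens):
    """Simulates a Max Tokens cap: output stops once the
    token budget is exhausted, regardless of sentence state."""
    return tokens[:max_tokens]

full_response = ["Photosynthesis", "converts", "light",
                 "into", "chemical", "energy", "."]

# A tight cap cuts the reply off mid-sentence:
print(" ".join(truncate_to_max_tokens(full_response, 4)))
# Photosynthesis converts light into

# A generous cap lets the full reply through:
print(" ".join(truncate_to_max_tokens(full_response, 100)))
```

With the generous cap, the unused budget is simply never spent; Max Tokens is a ceiling, not a target length.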

Tips & Best Practices

  • Match the setting to your task: Keep Max Tokens tight for structured tasks, and higher for open-ended generation.

  • Iterate to find the sweet spot: Start with 200 tokens, then adjust up or down depending on your use case.

  • Don’t confuse it with Temperature: Temperature changes how random responses are, while Max Tokens changes how long they can be.

  • Think tokens, not words: One word is often 1–2 tokens. A Max Tokens of 200 usually means ~150 words, but this varies by language and punctuation.
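The tokens-to-words rule of thumb above is simple arithmetic. Here is a small helper that applies it; the 4-tokens-per-3-words ratio is an English-language estimate only, and the true ratio varies by language and punctuation.

```python
def estimated_words(max_tokens, tokens_per_word=4 / 3):
    """English rule of thumb: 1 token is roughly 0.75 words,
    i.e. about 4 tokens per 3 words. This is an estimate only."""
    return round(max_tokens / tokens_per_word)

print(estimated_words(200))  # 150
print(estimated_words(100))  # 75
```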

Troubleshooting

My response is cut off mid-sentence

Raise the Max Tokens value so the model has enough room to complete its output.

The model’s responses are too long or rambling

Lower the Max Tokens value to keep outputs shorter and more focused.

I don’t understand how many words my Max Tokens setting represents

As a rule of thumb, 1 token ≈ ¾ of a word in English, so 100 tokens is roughly 75 words. The exact number varies depending on language and punctuation.