How to Use
TL;DR: Use Small (135M) for speed and simple tasks, Standard (360M) for balance and general use, or Large (1.7B) for complex, nuanced, or long-form tasks.
Step 1: Start by assessing your task
Is it simple and repetitive, or does it require nuance and detailed responses?
How long are the outputs you expect (short answers vs. paragraphs)?
Will you be running the model on limited hardware, or do you have capacity for larger models?
Step 2: Match your needs to the model size
Small Base Model (135M)
Fastest and lightest option — runs smoothly on limited hardware.
Best for simple, structured tasks that don’t require nuanced reasoning.
Ideal use cases:
Text classification (e.g., labeling, spam detection).
Keyword or data extraction.
Simple factual Q&A with short answers.
Limitations: Struggles with long responses, subtle language, or multi-step reasoning.
Standard Base Model (360M)
Balanced model — more accurate and expressive than the Small model, but still efficient.
Great general-purpose choice for most fine-tuning projects.
Ideal use cases:
Instruction following with short or medium-length responses.
Sentiment analysis with multiple categories.
Text normalization or rewriting.
Multi-step but narrow reasoning (e.g., extracting data and then reformatting it).
Limitations: Not as strong for highly nuanced or long-form tasks.
Large Base Model (1.7B)
Most capable option — excels at nuance and detailed output.
Best for complex, varied, or open-ended tasks.
Ideal use cases:
Conversational agents or chatbots.
Summarization of longer text.
Knowledge-based Q&A where detail matters.
Complex classification or reasoning across longer context.
Limitations: Slower and requires more memory than smaller models.
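The matching in Step 2 can be sketched as a small decision function. This is a hypothetical helper, not part of any product API: the name `choose_model`, the three-level labels for complexity and output length, and the rule that limited hardware always selects the Small model are assumptions made for illustration.

```python
SMALL = "Small Base Model (135M)"
STANDARD = "Standard Base Model (360M)"
LARGE = "Large Base Model (1.7B)"

COMPLEXITY_LEVELS = {"simple", "moderate", "nuanced"}
OUTPUT_LENGTHS = {"short", "medium", "long"}


def choose_model(complexity: str, output_length: str, limited_hardware: bool) -> str:
    """Hypothetical rule of thumb mapping the Step 1 answers to a model size.

    complexity: "simple", "moderate", or "nuanced" (assumed labels).
    output_length: "short", "medium", or "long" (assumed labels).
    """
    if complexity not in COMPLEXITY_LEVELS:
        raise ValueError(f"unknown complexity: {complexity!r}")
    if output_length not in OUTPUT_LENGTHS:
        raise ValueError(f"unknown output length: {output_length!r}")

    # Assumption: on limited hardware, pick the lightest model even for harder tasks.
    if limited_hardware:
        return SMALL
    # Nuanced tasks or long outputs are where the Large model is recommended.
    if complexity == "nuanced" or output_length == "long":
        return LARGE
    # Simple tasks with short answers fit the Small model.
    if complexity == "simple" and output_length == "short":
        return SMALL
    # Everything in between defaults to the balanced Standard model.
    return STANDARD


print(choose_model("simple", "short", limited_hardware=False))
print(choose_model("moderate", "medium", limited_hardware=False))
print(choose_model("nuanced", "long", limited_hardware=False))
```

Treat the result as a starting point only; if the outputs fall short, Step 3 still applies.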
Step 3: Iterate if unsure
If you’re not sure which model to pick, start with one of the smaller models rather than the Large one.
Prototype with the Standard Base Model, then move to the Large Base Model if you need more nuance.
Tips & Best Practices
Prototype fast, then scale: Start with the Small or Standard Base Model for quick feedback, then train a larger model as needed.
Consider deployment environment: Use the Small model for mobile or edge use, and larger models for server or cloud deployment.
Think about output length: Longer, more complex outputs generally require the Large Base Model.
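To gauge whether a model fits a given deployment target, a rough first check is the memory taken by the weights alone: parameter count times bytes per parameter. The sketch below assumes common numeric formats (2 bytes per parameter for fp16/bf16, 4 for fp32, 1 for int8) and deliberately ignores activations, the KV cache used during generation, and the gradients and optimizer state needed for fine-tuning, all of which add to the total.

```python
# Parameter counts of the three base models, as given in this guide.
PARAM_COUNTS = {
    "Small": 135_000_000,
    "Standard": 360_000_000,
    "Large": 1_700_000_000,
}

# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(params: int, dtype: str = "bf16") -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 10**9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in PARAM_COUNTS.items():
    print(
        f"{name}: ~{weight_memory_gb(params):.2f} GB (bf16), "
        f"~{weight_memory_gb(params, 'fp32'):.2f} GB (fp32)"
    )
```

With these counts, the Large model's weights come to about 3.4 GB in bf16 versus about 0.27 GB for the Small model, which is consistent with the advice to use the Small model for mobile or edge deployment.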
Troubleshooting
I don’t know which model to start with
Pick the Standard Base Model — it’s the most balanced option and works for a wide variety of tasks.
The outputs are too shallow or simplistic
Move up to a larger model (the Standard or Large Base Model) for better reasoning and nuance.
My model runs too slowly
Switch to a smaller model (the Small or Standard Base Model) to improve speed.
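Taken together with Step 3, the troubleshooting advice amounts to starting at the Standard model and moving one size up or down depending on the symptom. The helper below is a hypothetical sketch of that loop; the function name `adjust_model`, the symptom labels, and the one-step-at-a-time rule are illustrative assumptions.

```python
# Model sizes ordered from smallest to largest.
SIZES = ["Small", "Standard", "Large"]

# Recommended starting point when unsure which model to use.
DEFAULT_START = "Standard"


def adjust_model(current: str, symptom: str) -> str:
    """Suggest the next model size to try for a troubleshooting symptom.

    symptom: "too_shallow" (outputs lack depth) or "too_slow" (inference is too slow).
    """
    index = SIZES.index(current)  # raises ValueError for an unknown size
    if symptom == "too_shallow":
        # Step up one size; stay at Large if already there.
        return SIZES[min(index + 1, len(SIZES) - 1)]
    if symptom == "too_slow":
        # Step down one size; stay at Small if already there.
        return SIZES[max(index - 1, 0)]
    raise ValueError(f"unknown symptom: {symptom!r}")


model = DEFAULT_START
model = adjust_model(model, "too_shallow")  # Standard -> Large
print(model)
```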