Overview
When you upload a dataset to Minibase, you’re uploading a fine-tuning dataset. Fine-tuning datasets are used to train models to perform specific tasks by showing them examples of inputs and desired outputs.
In the future, Minibase will also support pre-training datasets and reward-model training datasets, but for now, all uploads are structured as fine-tuning datasets.
How to Use
Prepare your dataset
Datasets can be uploaded in CSV, Excel, JSON, or JSON-L format.
Each row (or JSON object) is one training example, and its fields should map to the Instruction, Input, and Response fields described below.
When you upload, Minibase automatically converts your dataset into JSON-L for consistency.
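JSON-L (JSON Lines) stores one JSON object per line. As a rough illustration of what that conversion produces, the Python sketch below turns a CSV file with a header row into JSON-L using only the standard library. The function name csv_to_jsonl is hypothetical, and this is not Minibase’s actual converter; it only shows the shape of the output.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text (with a header row) into JSON-L: one JSON object per line.

    Hypothetical helper for illustration only; not Minibase's converter.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each data row becomes a dict keyed by the header names.
    lines = [json.dumps(row, ensure_ascii=False) for row in reader]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    sample = (
        "Input,Response\n"
        "Where is the Eiffel Tower located?,The Eiffel Tower is in Paris.\n"
    )
    print(csv_to_jsonl(sample), end="")
```

Each CSV row becomes one line keyed by the column headers. Passing ensure_ascii=False keeps characters such as “á” readable instead of escaping them.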
Include the correct fields
A fine-tuning dataset in Minibase has up to three fields:
Instruction – A description of what you want the model to do.
Example: “Translate this text from English to Spanish.”
Input (Prompt) – The content or question you’re giving to the model.
Example: “Where is the Eiffel Tower located?” or “The cat is on the table.”
Response (Output) – The correct or ideal answer you want the model to return.
Example: “The Eiffel Tower is in Paris.” or “El gato está sobre la mesa.”
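To see how the three fields fit together, the sketch below models one example as a Python dataclass and writes it as a single JSON-L line, leaving out any field that isn’t used. The Example class and the lowercase key names are assumptions for illustration; Minibase may name its keys differently.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """One fine-tuning example.

    The lowercase keys ("instruction", "input", "response") are assumed for
    illustration; Minibase's internal key names may differ.
    """

    response: str
    instruction: Optional[str] = None
    input: Optional[str] = None  # the Input (Prompt) field

    def to_jsonl_line(self) -> str:
        # Emit fields in Instruction, Input, Response order, skipping unused ones.
        fields = {
            "instruction": self.instruction,
            "input": self.input,
            "response": self.response,
        }
        used = {k: v for k, v in fields.items() if v is not None}
        return json.dumps(used, ensure_ascii=False)


translation = Example(
    instruction="Translate this text from English to Spanish.",
    input="The cat is on the table.",
    response="El gato está sobre la mesa.",
)
trivia = Example(
    input="Where is the Eiffel Tower located?",
    response="The Eiffel Tower is in Paris.",
)

if __name__ == "__main__":
    print(translation.to_jsonl_line())
    print(trivia.to_jsonl_line())
```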
Decide which fields to use
You can include Instruction + Input + Response, or just Input + Response, or Instruction + Response.
Not all three fields are required: every combination above includes a Response, paired with an Instruction, an Input, or both.
Examples:
For translation tasks, you’ll want all three fields.
For Q&A or trivia, you might only need Input + Response.
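If you want to check a dataset before uploading, the combinations listed above can be expressed as a small rule. The sketch below is a hypothetical helper, not part of Minibase; it assumes the columns are named exactly Instruction, Input, and Response and treats blank or whitespace-only cells as missing.

```python
from typing import Dict, FrozenSet, Optional

FIELDS = ("Instruction", "Input", "Response")

# The three field combinations described in this section.
ALLOWED_COMBINATIONS = {
    frozenset({"Instruction", "Input", "Response"}),
    frozenset({"Input", "Response"}),
    frozenset({"Instruction", "Response"}),
}


def present_fields(row: Dict[str, Optional[str]]) -> FrozenSet[str]:
    """Return the recognized fields that hold non-blank text in this row."""
    return frozenset(name for name in FIELDS if (row.get(name) or "").strip())


def is_valid_combination(row: Dict[str, Optional[str]]) -> bool:
    """True if the row's filled-in fields match one of the allowed combinations."""
    return present_fields(row) in ALLOWED_COMBINATIONS


if __name__ == "__main__":
    print(is_valid_combination({"Input": "Where is the Eiffel Tower located?",
                                "Response": "The Eiffel Tower is in Paris."}))
    print(is_valid_combination({"Response": "The Eiffel Tower is in Paris."}))
```

A row with only a Response fails the check, since none of the listed combinations allows it.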
Upload and review
Go to the Datasets tab and upload your file.
Minibase will extract the columns and format them correctly in JSON-L.
Before training, double-check that your fields are clean and that each column landed in the intended field (for example, that responses didn’t end up under Input).
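One way to do part of that check locally is to scan the JSON-L file line by line. The sketch below, a hypothetical helper rather than a Minibase feature, reports lines that aren’t valid JSON, lines that aren’t JSON objects, and string fields left empty. It skips blank lines, which is an assumption about how you want them treated.

```python
import json
from typing import List, Tuple


def find_problem_lines(jsonl_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for JSON-L lines that need attention.

    Hypothetical pre-training check; not a Minibase feature.
    """
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped rather than reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        # Flag string fields that are empty or whitespace-only.
        for key, value in record.items():
            if isinstance(value, str) and not value.strip():
                problems.append((lineno, f"empty value in {key!r}"))
    return problems


if __name__ == "__main__":
    text = '{"Input": "Hi", "Response": "Hello"}\n{"Input": "", "Response": "x"}\n'
    for lineno, reason in find_problem_lines(text):
        print(lineno, reason)
```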
Tips & Best Practices
Keep it clean: Datasets should be free of typos, duplicates, and irrelevant entries. A clean dataset produces better fine-tuned models.
Stay consistent: Use the same field structure across your dataset (e.g., don’t mix rows that have an Instruction with rows that don’t, unless that’s intentional).
Think about generalization: More varied examples generally help your model perform better on unseen inputs.
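The first two tips can be partly automated. The sketch below removes exact duplicate records and flags records whose set of filled-in fields differs from the most common one. Both helpers are hypothetical; treating the most common shape as the reference is an assumption, and flagged rows may be intentional. Near-duplicates and typos are not caught and still need a manual review.

```python
import json
from collections import Counter
from typing import Dict, FrozenSet, List


def dedupe(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        # Serialize with sorted keys so key order doesn't affect matching.
        key = json.dumps(record, sort_keys=True, ensure_ascii=False)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def inconsistent_rows(records: List[Dict[str, str]]) -> List[int]:
    """Return indexes of records whose non-blank fields differ from the most common set."""
    shapes: List[FrozenSet[str]] = [
        frozenset(k for k, v in r.items() if (v or "").strip()) for r in records
    ]
    if not shapes:
        return []
    most_common_shape, _ = Counter(shapes).most_common(1)[0]
    return [i for i, shape in enumerate(shapes) if shape != most_common_shape]


if __name__ == "__main__":
    rows = [
        {"Input": "a", "Response": "b"},
        {"Input": "a", "Response": "b"},
        {"Instruction": "t", "Input": "c", "Response": "d"},
        {"Input": "e", "Response": "f"},
    ]
    unique = dedupe(rows)
    print(len(unique), inconsistent_rows(unique))
```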
Troubleshooting
Do I have to include both “Instruction” and “Input”?
No. You can use either one or both, depending on your task. For some use cases (like translation), both are useful. For others (like trivia or Q&A), Input + Response is enough.
Why is my file converted to JSON-L after upload?
Minibase automatically standardizes all datasets into JSON-L, a widely used format for model training data, to keep datasets consistent across the platform.
What happens if my dataset has extra columns?
Only the recognized columns (Instruction, Input, Response) are used. Any additional columns will be ignored during processing.
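The sketch below mirrors that behavior so you can see in advance which columns would be dropped. It is a hypothetical illustration, not Minibase’s code, and assumes exact, case-sensitive column names; whether Minibase also accepts variants such as Prompt or Output is not covered here.

```python
from typing import Dict, Iterable, List

# Recognized column names, assumed to match exactly (case-sensitive).
RECOGNIZED = ("Instruction", "Input", "Response")


def ignored_columns(header: Iterable[str]) -> List[str]:
    """List the columns that would be ignored during processing."""
    return [name for name in header if name not in RECOGNIZED]


def keep_recognized(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the row containing only the recognized columns."""
    return {name: row[name] for name in RECOGNIZED if name in row}


if __name__ == "__main__":
    print(ignored_columns(["Input", "Response", "source", "id"]))
    print(keep_recognized({"id": "7", "Input": "q", "Response": "a"}))
```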