
What are fine-tuning datasets?

This article explains what fine-tuning data is and how to format it for use in Minibase.

Written by Michael McCarty
Updated over a week ago

Minibase Help: Fine-Tuning Data

What is fine-tuning data?

Fine-tuning data is a collection of examples that teach a base model how to behave for your tasks. Each example pairs an instruction or input with the response you want the model to produce. Better examples lead to more consistent task execution and instruction following.

Required schema

Minibase supports three fields per example:

  • instruction: A task or request for the model.

  • input: Optional context the model should use.

  • response: The expected output for the instruction and/or input.

Requirements

  • At least one of instruction or input must be provided.

  • response is required for every example.
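
The two requirements above can be checked locally before you upload. The following is a minimal sketch, not Minibase's own validator: the function name validate_example is hypothetical, and it assumes that an empty or whitespace-only string counts as "not provided", which matches how the samples in this article leave unused fields as "".

```python
from typing import Any


def validate_example(example: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one example (empty list if valid)."""
    problems: list[str] = []

    def has_text(field: str) -> bool:
        # Assumption: a missing key, a non-string, or a blank string
        # all count as "not provided".
        value = example.get(field)
        return isinstance(value, str) and value.strip() != ""

    # Rule 1: at least one of instruction or input must be provided.
    if not (has_text("instruction") or has_text("input")):
        problems.append("at least one of 'instruction' or 'input' must be provided")

    # Rule 2: response is required for every example.
    if not has_text("response"):
        problems.append("'response' is required")

    return problems


if __name__ == "__main__":
    sample = {"instruction": "", "input": "", "response": "Total: 2.00"}
    for problem in validate_example(sample):
        print(problem)
```

Returning a list of problems rather than raising on the first failure makes it easier to report every issue in a large dataset in one pass.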

Size limits

  • Target dataset size: about 10,000 examples.

  • Maximum examples: 100,000.

  • Maximum file size: 512 MB.
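
As a rough pre-upload check, the limits above can be compared against a local file. This sketch uses hypothetical helper names (count_jsonl_examples, check_limits) and assumes that 1 MB means 1024 × 1024 bytes; the article does not say how Minibase measures file size, so treat the byte threshold as approximate.

```python
import os

MAX_EXAMPLES = 100_000
# Assumption: 512 MB is interpreted as 512 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 512 * 1024 * 1024


def count_jsonl_examples(path: str) -> int:
    """Count non-blank lines in a JSONL file (one example per line)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_limits(num_examples: int, file_bytes: int) -> list[str]:
    """Return a list of exceeded limits (empty list if within limits)."""
    problems: list[str] = []
    if num_examples > MAX_EXAMPLES:
        problems.append(f"{num_examples} examples exceeds the maximum of {MAX_EXAMPLES}")
    if file_bytes > MAX_FILE_BYTES:
        problems.append(f"{file_bytes} bytes exceeds the maximum of {MAX_FILE_BYTES}")
    return problems


def check_file(path: str) -> list[str]:
    """Check a local JSONL file against both limits."""
    return check_limits(count_jsonl_examples(path), os.path.getsize(path))
```

For example, check_file("train.jsonl") (a hypothetical file name) returns an empty list when the file is within both limits.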

Supported upload formats

  • CSV

  • Excel (.xlsx or .xls)

  • JSON

  • JSONL

All uploads are automatically converted to JSONL to keep a common internal structure.
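
To illustrate what a JSONL version of a tabular upload might look like, here is a small sketch that converts CSV text to JSONL. It is only an approximation for previewing your data: Minibase performs the conversion for you, its actual logic is not described in this article, and the function name csv_to_jsonl and the assumption that the CSV column headers are exactly instruction, input, and response are illustrative.

```python
import csv
import io
import json

FIELDS = ("instruction", "input", "response")


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text with a header row into JSONL, one object per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Assumption: missing or empty cells become empty strings.
        record = {field: (row.get(field) or "") for field in FIELDS}
        # ensure_ascii=False keeps characters such as "ç" readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    sample_csv = (
        "instruction,input,response\n"
        'Translate to French,"Hello, how are you?","Bonjour, comment ça va ?"\n'
    )
    print(csv_to_jsonl(sample_csv))
```

Using the csv module rather than splitting on commas keeps quoted cells that contain commas, such as "Hello, how are you?", intact.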


Samples

JSONL

{"instruction":"Summarize the text","input":"The quick brown fox jumps over the lazy dog.","response":"A fox jumps over a dog."}
{"instruction":"Translate to French","input":"Hello, how are you?","response":"Bonjour, comment ça va ?"}
{"instruction":"","input":"User: What time is the meeting? Assistant: It starts at 3 PM.","response":"It starts at 3 PM."}

JSON

[
  {
    "instruction": "Format the name as Last, First",
    "input": "Ada Lovelace",
    "response": "Lovelace, Ada"
  },
  {
    "instruction": "",
    "input": "Item: Apples, Qty: 4, Price: 0.50",
    "response": "Total: 2.00"
  },
  {
    "instruction": "Generate a brief greeting email",
    "input": "",
    "response": "Hi there, thanks for reaching out. I’ll follow up with details shortly."
  }
]

Privacy and sharing

You can mark datasets as Private or Team.

  • Private means only you can see and use the dataset.

  • Team makes the dataset available to everyone on your team for training.


Need More Help?

Join our Discord support server to chat with our team and get real-time assistance.
