Overview
Currently, datasets on Minibase can only be downloaded in JSON-L format. JSON-L stands for “JSON Lines,” a plain-text format in which each line of the file is one complete JSON object. Because records are separated only by line breaks, the format is efficient to read and write and is widely used for structured data.
Minibase uses JSON-L as the default (and only) export format because it’s one of the most common formats for preparing data to train AI models. By converting all uploaded datasets into JSON-L, Minibase ensures that everything in our database is consistent, reliable, and ready to be used in machine learning workflows.
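To make the one-object-per-line layout concrete, here is a minimal sketch that builds a small JSON-L document in Python. The records and their field names (`instruction`, `input`, `output`) are made up for illustration and are not necessarily the fields Minibase exports.

```python
import json

# Hypothetical records; the field names are illustrative, not Minibase's schema.
records = [
    {"instruction": "Translate to French", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Translate to French", "input": "Thank you", "output": "Merci"},
]

def to_jsonl(rows):
    """Serialize each record as compact JSON on its own line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows) + "\n"

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")  # two lines, one JSON object per line
```

Note that there is no enclosing list and no comma between records: each line can be parsed on its own.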
How to Use
Step 1: Export your dataset
Navigate to the dataset you’d like to download.
Click the Download button.
Your file will automatically download in JSON-L format.
Step 2: Open or process your dataset
JSON-L files can be opened with any text editor (like VS Code, Sublime, or Notepad++) or imported into Python, R, or other data analysis tools.
Each line in the file represents a separate JSON object, which makes it easy to stream large files without loading everything into memory at once.
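As a rough sketch of what line-by-line processing can look like in Python, the generator below yields one record at a time using only the standard library. The function name `iter_jsonl` is our own, and it assumes the file is UTF-8 encoded.

```python
import json
from typing import Iterator

def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record at a time, so only a single line is held in memory."""
    with open(path, "r", encoding="utf-8") as f:  # assumes UTF-8 encoding
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines, such as a trailing empty line
                continue
            yield json.loads(line)
```

You could then loop over a download with `for record in iter_jsonl("dataset.jsonl"): ...`, where `dataset.jsonl` is a placeholder for your own file name.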
Step 3: Use your dataset for training
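Many machine learning tools can read JSON-L directly or with a few lines of code. Hugging Face datasets, for example, loads it through its JSON loader, while PyTorch and TensorFlow pipelines typically read it with a short custom loader.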
Because the structure is consistent, you won’t need to worry about converting files from one format to another before starting training.
Tips & Best Practices
Validate your JSON-L file: Make sure every line is valid JSON before using it in a training pipeline. Tools like `jq` or Python’s `json` library can help.
Keep your dataset clean: Consistent formatting and well-structured fields will yield better results during training.
Leverage streaming: For large datasets, read JSON-L line by line rather than loading it all at once. This avoids memory issues.
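The validation tip above can be sketched in a few lines of Python. This is one possible approach rather than an official Minibase tool: the helper name `find_invalid_lines` is hypothetical, and it assumes a UTF-8 file where blank lines are acceptable. It reads the file line by line, so it also works for large datasets.

```python
import json

def find_invalid_lines(path: str) -> list[tuple[int, str]]:
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:  # assumes UTF-8 encoding
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, exc.msg))
    return problems
```

An empty result means every non-blank line parsed successfully. Otherwise, the line numbers tell you where to look in your text editor.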
Troubleshooting
Why can’t I download my dataset in CSV or other formats?
Right now, Minibase only supports JSON-L. This ensures consistency across the platform and avoids the need for conversion before training. Support for additional formats may be added in the future.
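If you need CSV for another tool in the meantime, you can convert a download yourself. The sketch below uses only Python’s standard library and assumes every record is a flat JSON object (no nested lists or objects) and that the file fits in memory. The function name `jsonl_to_csv` and the file paths are placeholders.

```python
import csv
import json

def jsonl_to_csv(jsonl_path: str, csv_path: str) -> None:
    """Convert flat JSON-L records to CSV, using the union of all keys as columns."""
    with open(jsonl_path, "r", encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]

    # Collect column names in the order they first appear.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # restval fills in an empty cell when a record lacks a column.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

For example, `jsonl_to_csv("dataset.jsonl", "dataset.csv")` would write one CSV row per JSON-L line.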
My JSON-L file looks “broken” in a text editor. Is that normal?
Yes, it’s normal for JSON-L files to appear as one JSON object per line. They don’t look as neatly indented as standard JSON, but the structure is correct and widely supported.
How do I know if my JSON-L file is valid?
Try loading it with a tool like `jq` (`jq . file.jsonl`) or Python’s `json` module. If there are formatting issues, you’ll see an error message that helps you identify the problem.