What formats can I download datasets in?

Written by Michael McCarty
Updated over a week ago

Overview

Currently, datasets on Minibase can only be downloaded in JSON-L format. JSON-L stands for “JSON Lines”: a plain-text format in which each line contains exactly one complete JSON object. Because records are separated by newlines rather than wrapped in a single array, it is an efficient and widely used format for handling structured data.
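
As a rough illustration of the layout, the sketch below parses a small hand-written JSON-L sample with Python’s standard `json` module. The field names `instruction` and `response` are assumptions made for this example; the fields in your own export may differ.

```python
import json

# Hypothetical sample: three records, one JSON object per line.
# The field names "instruction" and "response" are illustrative only.
sample = (
    '{"instruction": "Translate to French: hello", "response": "bonjour"}\n'
    '{"instruction": "Add 2 and 3", "response": "5"}\n'
    '{"instruction": "Name a primary color", "response": "red"}\n'
)

# Each non-empty line parses independently of the others.
records = [json.loads(line) for line in sample.splitlines() if line.strip()]

print(len(records))            # number of records in the sample
print(records[1]["response"])  # value of one field in the second record
```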

Minibase uses JSON-L as its default (and only) export format because it is one of the most widely used formats for preparing training data for AI models. By converting all uploaded datasets into JSON-L, Minibase ensures that everything in our database is consistent, reliable, and ready to be used in machine learning workflows.

How to Use

Step 1: Export your dataset

  • Navigate to the dataset you’d like to download.

  • Click the Download button.

  • Your file will automatically download in JSON-L format.

Step 2: Open or process your dataset

  • JSON-L files can be opened with any text editor (like VS Code, Sublime, or Notepad++) or imported into Python, R, or other data analysis tools.

  • Each line in the file represents a separate JSON object, which makes it easy to stream large files without loading everything into memory at once.
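
The line-by-line streaming described above might look like the following in Python. This is a minimal sketch using only the standard library; `iter_jsonl` is a helper name invented for this example, and `dataset.jsonl` in the usage note is a placeholder filename.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed record per non-empty line.

    `lines` can be any iterable of strings, including an open file,
    so only one line needs to be held in memory at a time.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

To process a downloaded file, pass an open file handle, for example `with open("dataset.jsonl", encoding="utf-8") as handle:` followed by `for record in iter_jsonl(handle): ...`.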

Step 3: Use your dataset for training

  • Hugging Face datasets can load JSON-L files directly, and frameworks such as PyTorch and TensorFlow can consume them through a short loader that reads one record per line.

  • Because the structure is consistent, you won’t need to worry about converting files from one format to another before starting training.

Tips & Best Practices

  • Validate your JSON-L file: Make sure every line is valid JSON before using it in a training pipeline. Tools like jq or Python’s json library can help.

  • Keep your dataset clean: Consistent formatting and well-structured fields will yield better results during training.

  • Leverage streaming: For large datasets, read JSON-L line by line rather than loading it all at once. This avoids memory issues.
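
One way to carry out the validation tip is to try parsing each line and record where parsing fails. The sketch below uses Python’s standard `json` module; `find_invalid_lines` is a hypothetical helper name, and skipping blank lines is a choice made for this example rather than a Minibase rule.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped here; a stricter check could flag them
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems
```

An empty result means every non-blank line parsed successfully. Because the function accepts any iterable of lines, an open file can be passed in directly and checked without loading it all at once.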

Troubleshooting

Why can’t I download my dataset in CSV or other formats?

Right now, Minibase only supports JSON-L. This ensures consistency across the platform and avoids the need for conversion before training. Support for additional formats may be added in the future.
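
If another tool requires CSV in the meantime, a local conversion is possible after download. The sketch below is one possible approach using Python’s standard `csv` and `json` modules, not a Minibase feature. It assumes every record is a flat JSON object (no nested lists or objects), and `jsonl_to_csv` is a name invented for this example.

```python
import csv
import io
import json
from typing import Iterable


def jsonl_to_csv(lines: Iterable[str]) -> str:
    """Convert flat JSON objects (one per line) into CSV text."""
    # Note: this reads all records into memory, so it suits small to medium files.
    records = [json.loads(line) for line in lines if line.strip()]

    # Collect column names in first-seen order across all records.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields missing from a record are left empty
    return buffer.getvalue()
```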

My JSON-L file looks “broken” in a text editor. Is that normal?

Yes. JSON-L stores each record on a single line with no indentation, so long records can look dense or run past the edge of the editor window. The file doesn’t look as neatly indented as standard JSON, but the structure is correct and widely supported.
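
To inspect a single record in a more readable form, you can re-indent it yourself. A minimal sketch with Python’s standard `json` module follows; the record shown is a made-up example, not real Minibase output.

```python
import json

# Hypothetical record copied from one line of a JSON-L file.
line = '{"instruction": "Add 2 and 3", "response": "5"}'

# Parse the line, then serialize it again with indentation for reading.
pretty = json.dumps(json.loads(line), indent=2)
print(pretty)
```

This only changes how the record is displayed; the file itself should keep one record per line.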

How do I know if my JSON-L file is valid?

Try loading it with a tool like `jq` (`jq . file.jsonl`) or Python’s `json` module, parsing the file one line at a time rather than as a single JSON document. If there are formatting issues, you’ll see an error message that helps you identify the problem.
