How do I use and deploy my models?

Learn how to use and deploy your Minibase models, including with Minibase Cloud APIs.

Written by Niko McCarty
Updated over a week ago

Overview

After you’ve trained a model in Minibase, you have two primary options for using and deploying it:

  1. Download the model for local or edge deployment.

  2. Use the model via Minibase Cloud APIs, without needing to set up your own infrastructure.

These options give you flexibility depending on your needs: self-hosted control, or fast access through our managed inference service.

How to Use

Option 1: Download your model

  • Task-based models are available for download in Hugging Face (HF) format.

  • All other models (Chat, Language, Micro) are available in GGUF format.

  • Once downloaded, you can deploy the model on your own machine, server, or edge device.

  • Note: You are responsible for setting up the inference runtime locally (e.g., with libraries like Hugging Face Transformers or llama.cpp); a sketch follows below. If you get stuck, join the #support channel in our Discord.
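
For example, if you downloaded a task-based model in HF format, a minimal local inference sketch with Hugging Face Transformers might look like the following. The model folder path and prompt are placeholders, and the pipeline task should match your model's architecture (task-based models are sequence-to-sequence).

    # Minimal sketch: run a task-based model downloaded in HF format.
    # "./my-minibase-model" is a placeholder for the extracted model folder.
    from transformers import pipeline

    generator = pipeline(
        task="text2text-generation",   # task-based Minibase models are sequence-to-sequence
        model="./my-minibase-model",   # path to your downloaded model
    )

    result = generator("Summarize: Minibase lets you train and deploy small models.")
    print(result[0]["generated_text"])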

Option 2: Use your model with Minibase Cloud

  • Skip local deployment by using our Minibase Cloud APIs.

  • Train your model → deploy to Minibase Cloud → send API requests → receive responses (a request sketch follows this list). For more details on Minibase Cloud, see our Help Article.

  • This is the fastest way to get started if your device has internet access.

  • Ideal for prototyping, web apps, and integrations where you don’t want to manage infrastructure yourself.
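
The sketch below only illustrates the general request/response pattern against a hosted model; the endpoint URL, payload fields, and authentication shown are hypothetical placeholders. Refer to the Minibase Cloud Help Article for the actual API format.

    # Illustrative sketch of calling a hosted model over HTTP.
    # The URL, payload shape, and auth header are placeholders, not the real Minibase API.
    import requests

    API_URL = "https://example.minibase.ai/v1/models/MODEL_ID/infer"  # placeholder URL
    API_KEY = "YOUR_API_KEY"  # placeholder credential

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": "Summarize: Minibase lets you train and deploy small models."},
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())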

Tips & Best Practices

  • Pick the right format:

    • Use HF format for integrations with Hugging Face and Python-based environments.

    • Use GGUF format for efficient inference with tools like llama.cpp or edge deployments.

  • Start with Cloud for speed: If you just want to test your model quickly, use Minibase Cloud APIs before setting up local infrastructure.

  • Plan ahead for scaling: For larger deployments, containerization or orchestration (e.g., Docker, Kubernetes) may be useful. Support for containerized deployment is on our roadmap.

Troubleshooting

I don’t know how to run a GGUF model locally

You’ll need an inference library such as llama.cpp, or another tool that supports GGUF. Refer to their documentation for setup instructions.
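
As a concrete example, the llama-cpp-python bindings for llama.cpp can load a GGUF file directly. This is a minimal sketch; the model file name is a placeholder, and the prompt format depends on your model.

    # Minimal sketch: run a downloaded GGUF model locally with the llama-cpp-python
    # bindings for llama.cpp (pip install llama-cpp-python).
    # "./my-minibase-model.gguf" is a placeholder for your downloaded model file.
    from llama_cpp import Llama

    llm = Llama(model_path="./my-minibase-model.gguf", n_ctx=2048)

    output = llm("Q: What does Minibase do? A:", max_tokens=64, stop=["Q:"])
    print(output["choices"][0]["text"])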

Why are task-based models in a different format?

Task-based models use the Hugging Face (HF) format because it aligns with the sequence-to-sequence architecture that underpins them. Other models (like Chat or Language) are optimized for GGUF.

Can I use both Cloud and local deployment?

Yes. Many users prototype with Minibase Cloud, then move to a local or edge deployment once they’re ready to integrate into production.

My model is too slow locally

Try running it in Minibase Cloud, which provides optimized inference infrastructure and GPU access. Or switch to a smaller model variant if resources are limited.
