How do I use and deploy my models?

Learn how to use and deploy your Minibase models, including with Minibase Cloud APIs.

Written by Niko McCarty
Updated over a week ago

Overview

After you’ve trained a model in Minibase, you have two primary options for using and deploying it:

  1. Download the model for local or edge deployment.

  2. Use the model via Minibase Cloud APIs, without needing to set up your own infrastructure.

These options give you flexibility depending on your needs: self-hosted control, or fast access through our managed inference service.

How to Use

Option 1: Download your model

  • Task-based models are available for download in Hugging Face (HF) format.

  • All other models (Chat, Language, Micro) are available in GGUF format.

  • Once downloaded, you can deploy the model on your own machine, server, or edge device.

  • Note: You are responsible for setting up the inference runtime locally (e.g., with libraries like Hugging Face Transformers or llama.cpp); a sketch follows below. If you get stuck, join the #support channel in our Discord.
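
For example, if you downloaded a task-based model in HF format, a minimal local inference sketch with Hugging Face Transformers might look like the following. The model folder path and prompt are placeholders, and the pipeline task should match your model's architecture (task-based models are sequence-to-sequence).

    # Minimal sketch: run a task-based model downloaded in HF format.
    # "./my-minibase-model" is a placeholder for the extracted model folder.
    from transformers import pipeline

    generator = pipeline(
        task="text2text-generation",   # task-based Minibase models are sequence-to-sequence
        model="./my-minibase-model",   # path to your downloaded model
    )

    result = generator("Summarize: Minibase lets you train and deploy small models.")
    print(result[0]["generated_text"])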

Option 2: Use your model with Minibase Cloud

  • Skip local deployment by using our Minibase Cloud APIs.

  • Train your model → deploy to Minibase Cloud → send API requests → receive responses (a request sketch follows this list). For more details on Minibase Cloud, see our Help Article.

  • This is the fastest way to get started if your device has internet access.

  • Ideal for prototyping, web apps, and integrations where you don’t want to manage infrastructure yourself.
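
The sketch below only illustrates the general request/response pattern against a hosted model; the endpoint URL, payload fields, and authentication shown are hypothetical placeholders. Refer to the Minibase Cloud Help Article for the actual API format.

    # Illustrative sketch of calling a hosted model over HTTP.
    # The URL, payload shape, and auth header are placeholders, not the real Minibase API.
    import requests

    API_URL = "https://example.minibase.ai/v1/models/MODEL_ID/infer"  # placeholder URL
    API_KEY = "YOUR_API_KEY"  # placeholder credential

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": "Summarize: Minibase lets you train and deploy small models."},
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())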

Tips & Best Practices

  • Pick the right format:

    • Use HF format for integrations with Hugging Face and Python-based environments.

    • Use GGUF format for efficient inference with tools like llama.cpp or edge deployments.

  • Start with Cloud for speed: If you just want to test your model quickly, use Minibase Cloud APIs before setting up local infrastructure.

  • Plan ahead for scaling: For larger deployments, containerization or orchestration (e.g., Docker, Kubernetes) may be useful. Support for containerized deployment is on our roadmap.

Troubleshooting

I don’t know how to run a GGUF model locally

You’ll need an inference library such as llama.cpp, or another tool that supports GGUF. Refer to their documentation for setup instructions.
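
As a concrete example, the llama-cpp-python bindings for llama.cpp can load a GGUF file directly. This is a minimal sketch; the model file name is a placeholder, and the prompt format depends on your model.

    # Minimal sketch: run a downloaded GGUF model locally with the llama-cpp-python
    # bindings for llama.cpp (pip install llama-cpp-python).
    # "./my-minibase-model.gguf" is a placeholder for your downloaded model file.
    from llama_cpp import Llama

    llm = Llama(model_path="./my-minibase-model.gguf", n_ctx=2048)

    output = llm("Q: What does Minibase do? A:", max_tokens=64, stop=["Q:"])
    print(output["choices"][0]["text"])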

Why are task-based models in a different format?

Task-based models use the Hugging Face (HF) format because it aligns with the sequence-to-sequence architecture that underpins them. Other models (like Chat or Language) are optimized for GGUF.

Can I use both Cloud and local deployment?

Yes. Many users prototype with Minibase Cloud, then move to a local or edge deployment once they’re ready to integrate into production.

My model is too slow locally

Try running it in Minibase Cloud, which provides optimized inference infrastructure and GPU access. Or switch to a smaller model variant if resources are limited.
