Using Free Cloud-Based LLMs via Ollama on Ubuntu

Posted on Sun 19 April 2026 in GenAI Engineering


Ollama is a lightweight, open-source LLM runner. Its :cloud model suffix lets you route prompts to free-tier hosted models — no GPU, no paid API key required. Useful for learning, prototyping, and small projects on modest hardware.

This post covers the full Ubuntu setup: manual install, service startup, chatting with Kimi K2.5, fixing a common CLI error, and exposing the API to your local network.


Why Manual Install?

Official one-liner risk — The curl ... | sh script fetches assets from GitHub at runtime, so unstable GitHub connectivity can break the install partway through.

Offline alternative — Download ollama-linux-amd64.tar.zst (~1.9 GB) from the GitHub Releases page separately, transfer via USB or SCP, install without runtime dependency.


Step 1 — Download and Transfer

From a stable machine — Visit https://github.com/ollama/ollama/releases, grab ollama-linux-amd64.tar.zst from the latest release, copy to USB or SCP to Ubuntu.

VMware USB tip — If Ubuntu inside VMware doesn't detect the USB drive, go to VM Settings → USB Controller → set compatibility to USB 3.2.
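Before transferring a ~1.9 GB archive over USB or SCP, it's worth confirming it wasn't truncated mid-download. A minimal stdlib sketch that streams the file through SHA-256 (compare the result against the checksum published on the Releases page, if one is provided — the function name here is mine, not part of Ollama):

```python
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Stream a large file through SHA-256 in 1 MB chunks, without loading it into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()
```

Run it on both machines — if the hex digests match, the transfer was clean.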


Step 2 — Extract and Organize

Create a local bin — In your home directory, create ~/bin/ollama-install/ and copy the archive there.

Extract — Run tar -xf ollama-linux-amd64.tar.zst (GNU tar auto-detects the .zst compression as long as the zstd package is installed). You'll get bin/ and lib/ subfolders; the ollama executable lives inside bin/.

No system install needed — Run Ollama directly from this path. No sudo, no /usr/local/bin required.


Step 3 — Start the Service

cd ~/bin/ollama-install/bin
./ollama serve

Verify in browser — Open http://127.0.0.1:11434. You should see Ollama is running.

Verify in terminal — Run ps -ef | grep ollama and confirm ./ollama serve is listed.
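The browser check above can also be scripted. A minimal Python sketch, stdlib only — the banner text and default port 11434 are the ones shown above; the function name is mine:

```python
import urllib.request

def ollama_is_running(base_url: str = "http://127.0.0.1:11434",
                      timeout: float = 3.0) -> bool:
    """Return True if the Ollama root endpoint answers with its banner text."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return "Ollama is running" in resp.read().decode("utf-8", "replace")
    except OSError:
        # Connection refused / timed out: the service is not up (yet).
        return False

if __name__ == "__main__":
    print("up" if ollama_is_running() else "down")
```

Handy as a readiness probe in scripts that need to wait for the service to come up.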


Step 4 — Chat With a Free Cloud Model

Launch the interactive menu — In a second terminal:

./ollama launch

Select a cloud model — Choose chat with a model, then pick any model with a :cloud suffix, e.g., kimi-k2.5:cloud.

Authenticate — Ollama opens a browser tab for login. If it doesn't, copy the URL from the terminal manually (removing any stray spaces).

Example session:

You:   What LLM are you using?
Model: I am Kimi, developed by Dark Side of the Moon Technology Co., Ltd.,
       part of the Kimi K2.5 series.

The model runs on Kimi's distributed cloud servers — not your local machine.
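The same conversation can be driven programmatically over Ollama's HTTP API. A minimal non-streaming sketch against the /api/chat endpoint, stdlib only — the model name is the one used above, and the chat helper is my own wrapper, not part of Ollama:

```python
import json
import urllib.request

def chat(prompt: str, model: str = "kimi-k2.5:cloud",
         base_url: str = "http://127.0.0.1:11434") -> str:
    """Send one user message to /api/chat and return the assistant's reply."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a chunk stream
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/chat", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["message"]["content"]
```

With the service running, chat("What LLM are you using?") returns the model's reply as a plain string.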


Step 5 — Fix the launch Error

Error you may see — error running model: flag accessed but not defined: verbose

Root cause — The launch subcommand has parameter compatibility issues with certain cloud models.

Fix — Use run directly:

./ollama run kimi-k2.5:cloud

Best practice — Always prefer ollama run <model> for scripted or production use. Use launch for exploration only.


Step 6 — Expose Ollama to Other Machines

Find your Ubuntu IP:

sudo apt install net-tools  # if ifconfig is missing
ifconfig
# e.g. 192.168.204.129

Bind to all interfaces:

export OLLAMA_HOST=0.0.0.0:11434

Restart the service:

pkill -9 ollama
./ollama serve

Test locally first — Hit http://192.168.204.129:11434 from Ubuntu's browser. Then point any Ollama-compatible client (OpenClaw, Open WebUI, Continue) on another machine to the same URL.

Security note — This binds to your local network only. For internet-facing access, add firewall rules and authentication.
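From another machine, any HTTP client can query the exposed port directly. A small stdlib sketch that asks a (possibly remote) Ollama server which models it knows about, via its /api/tags endpoint — the IP below is the example address from this post, and the function name is mine:

```python
import json
import urllib.request

def list_models(base_url: str = "http://192.168.204.129:11434") -> list[str]:
    """Return the model names reported by an Ollama server's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    # /api/tags responds with {"models": [{"name": ...}, ...]}
    return [m["name"] for m in data.get("models", [])]
```

A quick way to confirm the bind to 0.0.0.0 worked before wiring up a full client.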


What's Next

Pull local models — ollama run llama3 runs fully offline once downloaded.

Python integration — Use the requests library against http://localhost:11434/api/generate.

Connect AI clients — Continue, OpenClaw, and Open WebUI all support Ollama as a backend out of the box.
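The Python integration above can be sketched without even installing requests — the standard library's urllib does the job. A minimal non-streaming call to /api/generate (the model name and function are my example choices):

```python
import json
import urllib.request

def generate(prompt: str, model: str = "llama3",
             base_url: str = "http://localhost:11434") -> str:
    """POST a prompt to /api/generate and return the full completion text."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object with the whole completion
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

Swap base_url for your LAN address (e.g. http://192.168.204.129:11434) to call the server from another machine.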