TinyLlama and Llama3 with Raspberry Pi 5

Run basic AI tasks on your Raspberry Pi 5 with the lightweight and speedy TinyLlama large language model.

Written By: Cherie Tan

Difficulty: Easy
Steps: 5
TinyLlama is a lightweight LLM, making it one of the fastest options for your Raspberry Pi. While it may not generate results as intricate as larger models, it's perfectly capable of answering basic questions. Plus, its speed makes it ideal for quick interactions.

Step 1 Run TinyLlama

If you haven't already, follow our guide on How to install Ollama on the Raspberry Pi 5.

Next, simply type this command into the terminal: 
ollama run tinyllama
This downloads the model and gets it ready for use. Due to its small size, the download should be relatively quick.
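
To confirm the download completed and the model is ready, you can ask Ollama to list the models it has installed:
ollama list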

Step 2 Interact with TinyLlama

Once the model is running, you can ask it questions and hold conversations directly at the prompt in your terminal.
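
For example, a short session looks something like this; the >>> prompt is where you type, and /bye ends the conversation and returns you to the terminal (the answer shown is only a placeholder, TinyLlama's actual reply will vary):
>>> Tell me a fun fact about the Raspberry Pi
...TinyLlama's answer appears here...
>>> /bye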

Step 3 Exploring Llama3

Llama3 is a heavyweight LLM, offering a significant leap in capabilities compared to TinyLlama. It can generate high-quality results, but keep in mind that it requires more processing power and takes longer to run on a Raspberry Pi.

Note: Make sure you have at least 4.7GB of free space on your Raspberry Pi to accommodate the larger model size.
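
You can check how much space is free from the terminal before starting the download:
df -h /
The -h flag prints human-readable sizes, and the "Avail" column shows how much space is left on your SD card or drive.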

If you're ready to test Llama3's potential, use this command in your terminal:
ollama run llama3
Once the download and setup are complete, you can start interacting with the model and explore its more advanced capabilities. Keep in mind that more complex prompts take longer to process and place a heavier load on the Pi's CPU and memory.
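
If you want to reclaim the disk space later, you can remove a downloaded model at any time:
ollama rm llama3
Running ollama run llama3 again will simply re-download it.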

Step 4 Interacting with Ollama's API on Raspberry Pi 5

Ollama's API lets you interact with different language models programmatically. Here's how to send requests and get results!

Crafting the Request:
Use curl to send a POST request to the Ollama API at http://localhost:11434/api/generate. The request body should be JSON containing:
  • "model" (string): Specify the LLM you want to use (e.g., "tinyllama").
  • "prompt" (string): The question or text prompt for the model (e.g., "Write a short and catchy limerick about a programmer").
  • "stream" (boolean, optional): Set to false to receive the entire response at once, or true for a stream of data (one word at a time).

Step 5 Example Request

Here is an example request with TinyLlama:
curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Write a short and catchy limerick about a programmer",
  "stream": false
}'
Ollama will return a JSON response containing the generated text and additional information like processing time.
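
If you only want the generated text, and you have the jq tool installed (sudo apt install jq), you can pull out the "response" field, which holds the model's answer. Here is a minimal sketch using the same request as above; the -s flag just hides curl's progress output:
curl -s http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Write a short and catchy limerick about a programmer",
  "stream": false
}' | jq -r '.response'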