How to Easily Run Llama 3 Locally without Hassle

A comprehensive guide on installing and running the powerful Llama 3 language models (8B and 70B versions) on your local machine using the Ollama tool.

Meta's Llama 3 is the latest iteration of their open-source large language model, boasting impressive performance and accessibility. With model sizes ranging from 8 billion (8B) to a massive 70 billion (70B) parameters, Llama 3 offers a potent tool for natural language processing tasks. However, running such massive models locally can be challenging, requiring substantial computational resources and technical expertise. Fortunately, Ollama, a streamlined open-source tool, simplifies the process of running open-source LLMs like Llama 3 on local machines.

What is Ollama?

Ollama is a user-friendly solution that bundles model weights, configurations, and datasets into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage, making it easier for developers and researchers to run large language models locally. Ollama supports a wide range of models, including Llama 3, allowing users to explore and experiment with these cutting-edge language models without the hassle of complex setup procedures.

System Requirements for Running Llama 3 Locally

Before diving into the installation process, it's essential to ensure that your system meets the minimum requirements for running Llama 3 models locally. The resource demands vary depending on the model size, with larger models requiring more powerful hardware.

For the 8B model, you'll need at least:

  • 8GB of VRAM
  • 16GB of RAM
  • A GPU such as the NVIDIA RTX 3070 or better (recommended for optimal performance)

As for the 70B model, you'll require:

  • A high-end GPU with at least 24GB of VRAM, such as the NVIDIA RTX 3090 or A100
  • At least 64GB of RAM
  • Sufficient free disk space — the 8B model takes a few gigabytes, while the 70B model weighs in at roughly 40 GB.
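As a quick sanity check before installing anything, you can query your machine's total RAM from Python's standard library. This is a minimal sketch (POSIX systems only; the 16 GB threshold refers to the 8B model's requirement above):

```python
import os

def total_ram_gb():
    """Return total physical RAM in gigabytes (POSIX systems)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / (1024 ** 3)

print(f"Total RAM: {total_ram_gb():.1f} GB")
print("Meets the 8B model's 16 GB requirement:", total_ram_gb() >= 16)
```

Checking VRAM is vendor-specific; on NVIDIA systems, running nvidia-smi in a terminal reports it directly.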

Installing Ollama

The installation process for Ollama is straightforward and can be accomplished with a single command. Open a terminal on your system and run the following:

curl -fsSL https://ollama.com/install.sh | sh

This command will download and install the latest version of Ollama on your system. Once the installation is complete, you can verify the installation by running ollama --version.

Downloading Llama 3 Models

Ollama provides a convenient way to download and manage Llama 3 models. To download the 8B model, run the following command:

ollama pull llama3:8b

For the 70B model, use:

ollama pull llama3:70b

These commands will download the respective models and their associated files to your local machine. Depending on your internet connection speed and system specifications, the download may take some time, especially for the larger 70B model. You can confirm which models are installed at any time with ollama list.
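Since the 70B download weighs in at roughly 40 GB, it is worth confirming you have room before pulling it. Here's a minimal sketch using Python's standard library (the path and the 40 GB threshold are illustrative):

```python
import shutil

# Check free space on the filesystem where Ollama stores models
# (by default under the user's home directory).
free_gb = shutil.disk_usage("/").free / (1024 ** 3)

print(f"Free disk space: {free_gb:.1f} GB")
print("Enough room for the ~40 GB 70B model:", free_gb >= 40)
```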

Running Llama 3 Models

Once you have the models downloaded, you can run them using Ollama's run command. For the 8B model, execute:

ollama run llama3:8b

For the 70B model, use:

ollama run llama3:70b

These commands will start an interactive session with the respective Llama 3 model, allowing you to input prompts and receive generated responses. Ollama will handle the necessary setup and configuration, making it easy to interact with the models without extensive technical knowledge.
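Beyond the interactive session, ollama run also accepts the prompt as a command-line argument, which makes one-shot calls easy to script. Here's a minimal Python sketch that shells out to the CLI (the helper names are our own, and it assumes Ollama is installed and the 8B model has been pulled):

```python
import shutil
import subprocess

def build_command(prompt, model="llama3:8b"):
    """Build the argument list for a one-shot, non-interactive run."""
    return ["ollama", "run", model, prompt]

def ask_llama(prompt, model="llama3:8b"):
    """Run a single prompt through the Ollama CLI and return the reply."""
    result = subprocess.run(
        build_command(prompt, model),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Only invoke the CLI if Ollama is actually installed on this machine.
if shutil.which("ollama"):
    print(ask_llama("In one sentence, what is a language model?"))
```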

Advanced Usage

Ollama offers several advanced features and options to enhance your experience with Llama 3 models. For example, you can control how many model layers are offloaded to the GPU, pull pre-quantized variants of a model for faster inference (via tags such as llama3:8b-instruct-q4_0), or adjust the context length and sampling parameters for optimal performance.

To explore these advanced options, refer to the Ollama documentation or run ollama run --help for a list of available options and their descriptions.
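Many of these settings can also be baked into a reusable custom model via a Modelfile. The sketch below is illustrative — it derives a lower-temperature, larger-context variant from the 8B model (the name my-llama3 and the parameter values are our own choices):

```
# Modelfile: a custom variant of Llama 3 8B
FROM llama3:8b

# Sampling and context settings (illustrative values)
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# Optional system prompt applied to every conversation
SYSTEM You are a concise technical assistant.
```

Build it with ollama create my-llama3 -f Modelfile, then start it like any other model with ollama run my-llama3.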

Integrating Llama 3 with Applications

While running Llama 3 models interactively is useful for testing and exploration, you may want to integrate them into your applications or workflows. Ollama provides an official Python library (installable with pip install ollama) that allows you to programmatically interact with the models, enabling seamless integration into your projects.

Here's an example of how to use the Ollama Python API to generate text with the Llama 3 8B model:

import ollama

# Ask the locally served Llama 3 8B model to continue a prompt
response = ollama.generate(
    model="llama3:8b",
    prompt="Once upon a time, there was a",
    options={"num_predict": 100},  # cap the reply at 100 new tokens
)
print(response["response"])

This snippet sends the prompt to the locally running Llama 3 8B model and prints the generated continuation, capped at 100 new tokens. You can customize the prompt, the output length, and other sampling parameters according to your needs.
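Ollama also exposes a local REST API (by default at http://localhost:11434), so any language with an HTTP client can talk to the model. A minimal sketch using only Python's standard library (the helper name build_request is our own; actually sending the request requires Ollama to be running):

```python
import json
import urllib.request

def build_request(prompt, model="llama3:8b"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Sending the request requires a running Ollama instance:
# with urllib.request.urlopen(build_request("Hello")) as resp:
#     print(json.loads(resp.read())["response"])
```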

Benchmarks and Performance of Llama 3 8B and Llama 3 70B

Llama 3 models have shown impressive performance on various benchmarks, often outperforming their predecessors and larger models. Here are some benchmark results:

General Benchmarks

Benchmark                       Llama 3 8B   Llama 3 70B
MMLU (5-shot)                   66.6         79.5
AGIEval English (3-5 shot)      45.9         63.0
CommonSenseQA (7-shot)          72.6         83.8
Winogrande (5-shot)             76.1         83.1
BIG-Bench Hard (3-shot, CoT)    61.1         81.3
ARC-Challenge (25-shot)         78.6         93.0

Knowledge Reasoning

Benchmark                       Llama 3 8B   Llama 3 70B
TriviaQA-Wiki (5-shot)          78.5         89.7

Reading Comprehension

Benchmark                       Llama 3 8B   Llama 3 70B
SQuAD (1-shot)                  76.4         85.6
QuAC (1-shot, F1)               44.4         51.1
BoolQ (0-shot)                  75.7         79.0
DROP (3-shot, F1)               58.4         79.7

These benchmarks demonstrate the impressive capabilities of Llama 3, with the 70B model often outperforming the 8B version, as expected. However, the 8B model still delivers remarkable performance, making it a viable option for those with limited computational resources.


Running large language models like Llama 3 locally has never been easier thanks to Ollama. With its user-friendly interface and streamlined setup process, Ollama empowers developers, researchers, and enthusiasts to harness the power of these cutting-edge models on their local machines. Whether you're working on natural language processing tasks, exploring the capabilities of Llama 3, or integrating it into your applications, Ollama provides a convenient and efficient solution. So, why wait? Download Ollama today and unlock the potential of Llama 3 on your local system!
