How to Use Llama 2 Locally

Llama 2 has arrived! The highly anticipated update to Meta’s language model is now available for local installation. We know many of you have been eager to get your hands on this powerful AI assistant. In this post, we’ll walk you through the steps for setting up Llama 2 locally on your own machine.

Whether you’re an AI enthusiast, developer, or business leader, having access to Llama 2 locally unlocks a world of possibilities. You’ll be able to utilize Llama’s advanced natural language capabilities for a wide range of applications, while keeping your data private and secure.

We’re thrilled to help guide you through the local setup process. With some simple configuration, you’ll have this remarkable AI assistant running smoothly in no time. The team at Meta has put in long hours to deliver this major update, and we think you’re going to love exploring everything Llama 2 has to offer.

Preparing for Local Use

Running Llama 2 locally offers a lot of flexibility, since it doesn’t require an Internet connection. We’ve already seen fascinating examples of its use, such as building a website that showcases interesting facts about llamas. Several open-source tools now make running it locally possible. Here are the main ones:

  • Llama.cpp (Mac/Windows/Linux)
  • Ollama (Mac)
  • MLC LLM (iOS/Android)

Let’s dive into each one of them.

Llama.cpp: A Versatile Port of Llama

Llama.cpp is a C/C++ port of Llama that enables running Llama 2 locally using 4-bit integer quantization on Macs, and it supports Linux and Windows as well.

To install it on your M1/M2 Mac, here is a line you can use:

```bash
LLAMA_METAL=1 curl -L "" | bash
```

This installation command will also run fine on an Intel Mac or a Linux machine, just without the `LLAMA_METAL=1` flag:

```bash
curl -L "" | bash
```

For Windows on WSL, use:

```bash
curl -L "" | bash
```
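Before piping an install script to bash, it can help to confirm the build prerequisites are present. Below is a minimal pre-flight sketch; the exact tool list is an assumption, based on the fact that building llama.cpp typically needs curl, git, make, and a C compiler:

```shell
#!/bin/bash
# Pre-flight check before running the install one-liner above.
# Prints one "ok:" or "missing:" line per required tool.
check_tools() {
  for tool in curl git make cc; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

check_tools
```

If anything shows up as missing, install it with your system’s package manager before running the installer.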

Ollama: A macOS App

Ollama is an open-source macOS app that lets you run, create, and share large language models through a command-line interface, and it already supports Llama 2.

To use the Ollama CLI, download the macOS app from the Ollama website. Once installed, you can freely download Llama 2 and start chatting with the model.

Here are the lines you can use to download the model:

```bash
# download the 7B model (3.8 GB)
ollama pull llama2

# or the 13B model (7.3 GB)
ollama pull llama2:13b
```

And then run the model:

```bash
ollama run llama2
```
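The pull-and-run flow above can also be scripted. Here is a hedged sketch that only attempts the download when the CLI is actually on your PATH (the CLI name is passed as a parameter so the sketch is easy to adapt):

```shell
#!/bin/bash
# Defensive wrapper around `ollama pull`: bail out with a message if the
# CLI is not installed. The model tag "llama2" matches the commands above.
ensure_llama2() {
  local cli="${1:-ollama}"
  if command -v "$cli" >/dev/null 2>&1; then
    "$cli" pull llama2
  else
    echo "$cli not found - install the macOS app first" >&2
    return 1
  fi
}
```

Calling `ensure_llama2` before `ollama run llama2` in a script avoids a confusing failure when the app hasn’t been installed yet.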

Windows: A Detailed Guide

To install Llama on Windows, you need to follow these steps:

  1. Clone the Llama repository from GitHub.
  2. Visit the Meta website and register to download the model/s. Once registered, you will get an email with a URL to download the models. You will need this URL when you run the script.
  3. Once you get the email, navigate to your cloned llama repository and run the download script. Make sure to grant execution permissions to the script first.
  4. During this process, you will be prompted to enter the URL from the email. Do not use the “Copy Link” option; manually copy the link from the email instead.
  5. Once the model/s you want have been downloaded, you can run the model locally using the command provided in the Quick Start section of the Llama repository.
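On WSL, the first few steps above can be sketched as a small shell function. The repository URL and the `download.sh` script name follow the official facebookresearch/llama repository; adjust them if the repo has been reorganized:

```shell
#!/bin/bash
# Sketch of steps 1-3. download.sh interactively prompts for the URL
# from Meta's email (step 4), so it is left commented out here.
clone_and_prepare() {
  git clone https://github.com/facebookresearch/llama.git &&
    cd llama &&
    chmod +x download.sh      # step 3: grant execution permissions
  # ./download.sh             # step 4: paste the emailed URL when prompted
}
```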

This gives Windows users a step-by-step path to downloading and running the Llama model, with GPU acceleration available through Nvidia’s CUDA Toolkit once the GitHub repository is cloned.

After following these steps, you can create a PowerShell function to quickly run prompts with `llama "prompt goes here"`.
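Mac and Linux users can sketch an analogous shell function. The binary and model paths below are assumptions; point them at wherever your llama.cpp build and quantized model actually live:

```shell
#!/bin/bash
# Hypothetical convenience wrapper around llama.cpp's CLI. LLAMA_BIN and
# LLAMA_MODEL are placeholders for your local build and model file.
LLAMA_BIN="${LLAMA_BIN:-./main}"
LLAMA_MODEL="${LLAMA_MODEL:-./models/llama-2-7b-chat.q4_0.bin}"

llama() {
  # -m: model file, -p: prompt, -n: max tokens to generate
  "$LLAMA_BIN" -m "$LLAMA_MODEL" -p "$1" -n 256
}
```

Dropping this into your shell profile gives you the same `llama "prompt goes here"` workflow described above.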


Running Llama 2 locally is becoming easier thanks to open-source tools designed to support its deployment across various platforms. Whether you are on a Mac, Windows, Linux, or even a mobile device, you can now harness the power of Llama 2 without the need for an Internet connection. As the ecosystem around Llama 2 continues to evolve, we can expect even more exciting developments in the near future.

