TLDR: We'll run a local HTTP server that exposes an OpenAI-compatible API, backed by Llama 3 8B on an Apple M1 laptop. We'll also install a Chrome extension that summarizes articles through this API.
Step 1: Run an HTTP server compatible with the OpenAI API
```shell
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Install ccache to cache compilation results
brew install ccache

# Build with Metal (Apple GPU) support
LLAMA_METAL=1 make

# Download the Llama 3 8B Instruct model (Q4_K_M quantization)
curl -L "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true" --output ./models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf

# Build the server (same Metal flag so both builds match)
LLAMA_METAL=1 make server

# Run the HTTP server on http://127.0.0.1:8080 with a 2048-token context
./server -m models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf -c 2048
```
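Before moving on to the extension, it's worth sanity-checking the server from a second terminal. The sketch below assumes the server from the step above is still running on 127.0.0.1:8080 and uses llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint; the request body is a minimal example, not the full set of supported fields.

```shell
# Write a minimal OpenAI-style chat request body to a temp file.
cat > /tmp/ping_request.json <<'EOF'
{"messages": [{"role": "user", "content": "Reply with one word: pong"}]}
EOF

# POST it to the local server (only succeeds while ./server is running).
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/ping_request.json || echo "server not running"
```

If everything is wired up, the response is a JSON object with a `choices` array containing the model's reply.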
Step 2: Install Chrome extension
Step 3: Configure Chrome extension
Prompt
Identify the key facts and main developments (in other words, the most important information) in this text. Don't miss anything that looks important. Write your answer as bullet points.
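For reference, what the extension sends boils down to an OpenAI-style chat completion with the summarization prompt attached. This is a hedged sketch of what the local server expects, not the extension's actual internals: the system/user split, the placeholder article text, and the `temperature` value are all assumptions for illustration.

```shell
# Build a chat request that applies the summarization prompt.
# The user message is a placeholder; in practice it holds the article text.
cat > /tmp/summarize_request.json <<'EOF'
{
  "messages": [
    {"role": "system",
     "content": "Identify the key facts and main developments (in other words, the most important information) in this text. Don't miss anything that looks important. Write your answer as bullet points."},
    {"role": "user", "content": "ARTICLE TEXT GOES HERE"}
  ],
  "temperature": 0.2
}
EOF

# POST it to the server from Step 1 (only succeeds while ./server is running).
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/summarize_request.json || echo "server not running"
```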
Extension configuration
Once the extension is configured, you're all set. Let's give it a try!
Here's an example of what it looks like when the extension is in use: