TLDR: We'll run a local HTTP server that exposes an OpenAI-compatible API, backed by Llama 3 8B on an Apple M1 laptop. We'll also install a Chrome extension that summarizes articles through this API.
Step 1: Run an HTTP server compatible with the OpenAI API
```shell
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Install ccache to cache compilation results
brew install ccache

# Build with Metal (Apple GPU) support
LLAMA_METAL=1 make

# Download the Llama 3 8B Instruct model (Q4_K_M quantization)
curl -L "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true" --output ./models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf

# Build the server (same Metal flag so both builds match)
LLAMA_METAL=1 make server

# Run the HTTP server on http://127.0.0.1:8080 with a 2048-token context
./server -m models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf -c 2048
```
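Before moving on to the extension, it's worth sanity-checking the server from a second terminal. The sketch below assumes the server from the step above is still running on 127.0.0.1:8080 and uses llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint; the request body is a minimal example, not the full set of supported fields.

```shell
# Write a minimal OpenAI-style chat request body to a temp file.
cat > /tmp/ping_request.json <<'EOF'
{"messages": [{"role": "user", "content": "Reply with one word: pong"}]}
EOF

# POST it to the local server (only succeeds while ./server is running).
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/ping_request.json || echo "server not running"
```

If everything is wired up, the response is a JSON object with a `choices` array containing the model's reply.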
Step 2: Install Chrome extension
Step 3: Configure Chrome extension
Prompt
Identify the key facts and main developments (in other words, the most important information) in this text. Don't miss anything that looks important. Write your answer as bullet points.
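For reference, what the extension sends boils down to an OpenAI-style chat completion with the summarization prompt attached. This is a hedged sketch of what the local server expects, not the extension's actual internals: the system/user split, the placeholder article text, and the `temperature` value are all assumptions for illustration.

```shell
# Build a chat request that applies the summarization prompt.
# The user message is a placeholder; in practice it holds the article text.
cat > /tmp/summarize_request.json <<'EOF'
{
  "messages": [
    {"role": "system",
     "content": "Identify the key facts and main developments (in other words, the most important information) in this text. Don't miss anything that looks important. Write your answer as bullet points."},
    {"role": "user", "content": "ARTICLE TEXT GOES HERE"}
  ],
  "temperature": 0.2
}
EOF

# POST it to the server from Step 1 (only succeeds while ./server is running).
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/summarize_request.json || echo "server not running"
```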
Extension configuration
Once the extension is configured, you're all set. Let's give it a try!
Here's an example of what it looks like when the extension is in use: