Running large language models locally requires shrinking their file size without destroying quality. This guide walks through the full llama.cpp pipeline: downloading a Hugging Face model, converting it to GGUF format, and quantizing it to Q4_K_M or other levels to fit consumer hardware.
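At a glance, the whole pipeline is only three commands. The sketch below is illustrative, not a copy-paste recipe: the model repository, directory names, and output filenames are placeholders, and the exact converter script name (`convert_hf_to_gguf.py` in recent llama.cpp, `convert-hf-to-gguf.py` in older checkouts) depends on your version. It assumes llama.cpp has already been cloned and built.

```bash
# 1. Download the original weights from Hugging Face
#    (replace the repo ID with the model you actually want).
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir ./hf-model

# 2. Convert the Hugging Face checkpoint to a full-precision GGUF file.
python convert_hf_to_gguf.py ./hf-model --outfile model-f16.gguf --outtype f16

# 3. Quantize the GGUF down to Q4_K_M so it fits in consumer RAM/VRAM.
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

The rest of this guide walks through each of these steps in detail, including how to pick a quantization level other than Q4_K_M.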