llama.cpp
An open-source C/C++ implementation for running LLM inference efficiently on CPUs and GPUs.
llama.cpp focuses on low-level performance. It uses the GGUF model file format and lands speed-oriented changes such as 64-byte aligned tile buffers. It is distinguished from Ollama by its efficiency in resource-constrained environments, and its compatibility updates push competitors like Ollama to broaden their model support.
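To show what a "64-byte aligned tile buffer" means, here is a minimal sketch. It is not llama.cpp's code: the names `make_tile_buffer`, `is_aligned`, `TileBuffer` and `kAlign` are hypothetical. The buffer's start address is a multiple of 64 bytes. That is the cache-line size on typical x86 CPUs and the width of an AVX-512 register, so aligned tiles avoid loads that straddle two cache lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical alignment constant: one typical x86 cache line.
constexpr std::size_t kAlign = 64;

// Frees memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(float* p) const noexcept { std::free(p); }
};

// Owning handle for a tile of floats (hypothetical type, not from llama.cpp).
using TileBuffer = std::unique_ptr<float[], FreeDeleter>;

// True if the address is a multiple of `align` bytes.
bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Allocates a rows x cols float tile whose start address is 64-byte aligned.
TileBuffer make_tile_buffer(std::size_t rows, std::size_t cols) {
    std::size_t bytes = rows * cols * sizeof(float);
    // std::aligned_alloc (C++17) requires the size to be a multiple of the
    // alignment, so round the byte count up to the next multiple of 64.
    bytes = (bytes + kAlign - 1) / kAlign * kAlign;
    if (bytes == 0) bytes = kAlign;
    void* p = std::aligned_alloc(kAlign, bytes);
    if (p == nullptr) throw std::bad_alloc();
    assert(is_aligned(p, kAlign));  // debug check on the returned address
    return TileBuffer(static_cast<float*>(p));
}
```

For example, `make_tile_buffer(32, 32)` returns a 4096-byte tile whose first element sits on a cache-line boundary. The design rounds the size up rather than failing because `std::aligned_alloc` rejects sizes that are not multiples of the alignment.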
Category: project · Also: llama cpp · Mentioned in 3 Cortex outputs