GGUF

A file format for efficient storage and loading of quantized LLMs in llama.cpp.

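GGUF is a single-file binary format that stores a model's tensors together with the metadata needed to run it, such as architecture hyperparameters and tokenizer data. Tensor data is written at aligned offsets, so a runtime can memory-map the file instead of copying it, which keeps load times short. The format itself does not quantize anything: it carries the quantized tensor types defined by llama.cpp, and those are what shrink memory use enough to make CPU inference practical. GGUF replaced the earlier GGML file formats in llama.cpp and is widely supported by local runtimes built on it.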
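
To make the layout concrete, here is a minimal Python sketch of reading the fixed-size header at the start of a GGUF file and computing an aligned offset. It assumes the version 2 or later layout (little-endian: 4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key/value count) and the default 32-byte alignment, which a file may override through its `general.alignment` metadata key. The function names `parse_gguf_header` and `align_offset` are illustrative and not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"
DEFAULT_ALIGNMENT = 32  # assumed default; a file may override it via `general.alignment`


def parse_gguf_header(buf: bytes) -> dict:
    """Parse the fixed-size GGUF header (assumes the v2+ little-endian layout)."""
    if len(buf) < 24:
        raise ValueError("need at least 24 bytes for a GGUF v2+ header")
    if buf[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={buf[:4]!r}")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def align_offset(offset: int, alignment: int = DEFAULT_ALIGNMENT) -> int:
    """Round `offset` up to the next multiple of `alignment`."""
    return offset + (alignment - offset % alignment) % alignment
```

With a real file, the first 24 bytes could be passed in with `parse_gguf_header(open(path, "rb").read(24))`, where `path` is a placeholder for a local model file. Reading the metadata and tensor descriptors that follow the header takes more code and is not shown here.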


Category: framework · Mentioned in 2 Cortex outputs