An independent evaluation on standard hardware shows that large language models can deliver usable performance without a GPU. The tests focused on speed and practicality for everyday tasks.
An Intel i5 laptop with 12 GB of RAM served as the test platform for eight models run through Ollama with GGUF quantization. Throughput ranged from 34 tokens per second for the smallest models down to around 4 tokens per second for the larger ones.
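Tokens per second is simply the number of tokens generated divided by the wall-clock time spent generating them. The sketch below shows one way to measure it from any streaming token source; the `fake_stream` generator is a hypothetical stand-in for a model's output stream (for a real run, you would iterate over the tokens an Ollama model emits instead), and the delay value is an illustrative assumption, not a measured figure.

```python
import time

def tokens_per_second(stream):
    """Consume a token stream and return (token_count, tokens/sec)."""
    start = time.perf_counter()
    count = 0
    for _ in stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, count / elapsed

def fake_stream(n_tokens, delay=0.001):
    """Hypothetical stand-in for a model's streaming output."""
    for i in range(n_tokens):
        time.sleep(delay)  # simulate per-token generation latency
        yield f"tok{i}"

n, rate = tokens_per_second(fake_stream(200))
print(f"{n} tokens at {rate:.1f} tok/s")
```

Measured this way, the rate includes any per-token overhead in the consumer loop, which is usually negligible next to CPU inference time.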