
Study explores why human language isn’t compressed like computer code

February 20, 2026

Reported by AI

Image generated by AI

Fact checked

A new model from linguists Richard Futrell and Michael Hahn suggests that many hallmark features of human language—such as familiar words, predictable ordering and meaning built up step by step—reflect constraints on sequential information processing rather than a drive for maximum data compression. The work was published in Nature Human Behaviour.

Human language is remarkably rich and intricate. From an information-theory standpoint, the same ideas could, in principle, be transmitted in far more compact strings—similar to how computers represent information using binary digits.

Michael Hahn, a linguist at Saarland University in Saarbrücken, Germany, and Richard Futrell of the University of California, Irvine, set out to address why everyday speech does not resemble a tightly compressed digital code. In a paper published in Nature Human Behaviour in November 2025, the researchers present a model in which “natural-language-like” structure arises when communication is constrained by limits on sequential prediction—how much information must be carried forward from what has already been heard to anticipate what comes next.

In that framework, language benefits from patterns that are easy for people to process as a stream. A ScienceDaily summary of the work, citing materials from Saarland University, uses examples to illustrate the idea: an invented word such as “gol” for a hybrid concept (half cat and half dog) would be hard to understand because it does not map cleanly onto shared experience, and a scrambled blend like “gadcot” is similarly difficult to interpret. By contrast, “cat and dog” is immediately meaningful.

The researchers also point to word order as a signal that helps listeners reduce uncertainty in real time. The ScienceDaily release highlights the German noun phrase “Die fünf grünen Autos” (“the five green cars”) as an example of how meaning can be built incrementally as each word narrows the set of plausible interpretations. Reordering those words—for example, “Grünen fünf die Autos”—disrupts that predictability and makes comprehension harder.
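The incremental narrowing described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' model: the eight "scenes", the mapping from words to constraints, and the assumption that all remaining candidates are equally likely are invented for the example.

```python
from math import log2

# Hypothetical toy world: every entry is invented for illustration.
SCENES = [
    {"count": 5, "color": "green", "kind": "car"},
    {"count": 5, "color": "red", "kind": "car"},
    {"count": 3, "color": "green", "kind": "car"},
    {"count": 5, "color": "green", "kind": "bike"},
    {"count": 2, "color": "blue", "kind": "bike"},
    {"count": 3, "color": "red", "kind": "tree"},
    {"count": 5, "color": "blue", "kind": "tree"},
    {"count": 2, "color": "green", "kind": "tree"},
]

# Assumed mapping from each word to the constraint it contributes.
CONSTRAINTS = {
    "die": None,  # the article rules nothing out in this toy world
    "fünf": ("count", 5),
    "grünen": ("color", "green"),
    "autos": ("kind", "car"),
}


def narrow(words, scenes=SCENES):
    """Return (word, candidates left, uncertainty in bits) after each word."""
    remaining = list(scenes)
    trace = []
    for word in words:
        constraint = CONSTRAINTS[word.lower()]
        if constraint is not None:
            key, value = constraint
            remaining = [s for s in remaining if s[key] == value]
        # Uniform assumption: uncertainty is log2 of the candidate count.
        bits = log2(len(remaining)) if remaining else float("nan")
        trace.append((word, len(remaining), bits))
    return trace


trace = narrow(["Die", "fünf", "grünen", "Autos"])
for word, n, bits in trace:
    print(f"{word:>7}: {n} candidates, {bits:.1f} bits of uncertainty")
```

Each content word halves the candidate set here, so uncertainty falls from three bits to zero. The sketch captures only the narrowing; it does not model why a scrambled order is harder, which depends on the expectations listeners bring to each position.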

Beyond explaining why language is not “maximally compressed,” the paper’s discussion connects the findings to machine learning. Futrell and Hahn argue that natural language is structured in a way that makes next-token prediction comparatively easier under cognitive constraints, a point they say is relevant to modern large language models.
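One rough way to make "easier next-token prediction" concrete is to measure how much the previous word reveals about the next one. The Python sketch below uses a plug-in estimate of the mutual information between adjacent tokens. It is a stand-in chosen for illustration, not the measure used in the paper, and the test sequence (one phrase repeated 50 times, then shuffled with a fixed seed) is an assumption of the example.

```python
import random
from collections import Counter
from math import log2


def adjacent_mutual_information(tokens):
    """Plug-in estimate, in bits, of the mutual information between
    each token and the token that follows it."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * log2(c * n / (left_counts[a] * right_counts[b]))
    return mi


# Toy data: a fixed word order versus the same tokens in random order.
phrase = ["the", "five", "green", "cars"]
ordered = phrase * 50
scrambled = ordered[:]
random.Random(0).shuffle(scrambled)

mi_ordered = adjacent_mutual_information(ordered)
mi_scrambled = adjacent_mutual_information(scrambled)
print(f"ordered:   {mi_ordered:.2f} bits")
print(f"scrambled: {mi_scrambled:.2f} bits")
```

In the ordered sequence the previous word almost fully determines the next, so the estimate comes out near two bits, while in the shuffled copy the previous word says very little. Real language sits between these extremes, and the plug-in estimate is biased upward on small samples, so the numbers should be read as a qualitative contrast only.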

Fact check

Confidence score

Confidence comment

Most concrete claims—including the authors, their affiliations, the paper’s title and publication date, the core “predictive information” argument, and the specific illustrative examples (“gol,” “gadcot,” and the German phrases)—are supported directly by the ScienceDaily release and the underlying Nature Human Behaviour paper. Two elements were softened because they were not cleanly substantiated as written: the article’s framing of a strict tradeoff against “maximum information compression,” and the exact figure of “roughly 7,000” languages, which appears in the release but is not established in the paper itself. Overall reliability is strong because the rewrite relies primarily on the peer-reviewed study and a consistent institutional summary.



