Study isolates memorization from reasoning in AI models

Researchers at AI startup Goodfire.ai have found that language models store memorization and logical reasoning in separate neural pathways. Their preprint paper, released in late October, shows that removing memorization circuits eliminates 97 percent of verbatim recall while preserving nearly all reasoning abilities. Surprisingly, arithmetic tasks rely on the same memorization pathways, which may help explain why language models so often struggle with math.

In the paper, the Goodfire.ai researchers analyzed the internals of models such as the Allen Institute for AI’s OLMo-7B to distinguish memorization (reciting exact training data, such as famous quotes) from reasoning (solving new problems using general principles). They found a clean separation: at layer 22, the bottom 50 percent of weight components activated 23 percent higher on memorized data, while the top 10 percent activated 26 percent higher on general text.
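To make that comparison concrete, here is a minimal sketch (not the paper's actual code) of how one might contrast activation energy in the bottom-ranked versus top-ranked components of a single weight matrix. The matrix `W` and the activation batches `memorized_acts` and `general_acts` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a layer's weight matrix and hidden-state batches
# collected while the model processes memorized vs. general text.
W = rng.standard_normal((1024, 1024))             # stand-in for a layer-22 weight matrix
memorized_acts = rng.standard_normal((512, 1024)) # activations on memorized passages
general_acts = rng.standard_normal((512, 1024))   # activations on general text

# Decompose the weight matrix into ranked components (largest singular values first).
U, S, Vt = np.linalg.svd(W, full_matrices=False)

def energy_in_components(acts, Vt, idx):
    """Mean squared projection of activations onto the selected weight components."""
    proj = acts @ Vt[idx].T   # project inputs onto the chosen right-singular directions
    return float(np.mean(proj ** 2))

n = S.shape[0]
bottom_half = np.arange(n // 2, n)  # "bottom 50 percent" of components
top_tenth = np.arange(0, n // 10)   # "top 10 percent" of components

print("bottom-50% energy, memorized vs. general:",
      energy_in_components(memorized_acts, Vt, bottom_half),
      energy_in_components(general_acts, Vt, bottom_half))
print("top-10% energy, memorized vs. general:",
      energy_in_components(memorized_acts, Vt, top_tenth),
      energy_in_components(general_acts, Vt, top_tenth))
```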

By surgically removing these memorization pathways using K-FAC (Kronecker-Factored Approximate Curvature), a technique for approximating the curvature of the model's loss landscape, the team reduced verbatim recall from nearly 100 percent to 3.4 percent. Logical reasoning tasks, including Boolean evaluations, if-then rules, object tracking, BoolQ yes/no questions, Winogrande common-sense inference, and OpenBookQA science reasoning, retained 95 to 106 percent of baseline performance.
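The paper's exact editing procedure is more involved, but the following sketch illustrates the general idea of K-FAC-style weight editing for one linear layer: approximate the loss curvature with Kronecker factors built from input activations and output gradients, express the weight in that eigenbasis, and zero out the low-curvature components that the study associates with memorization. All tensors and the 50 percent cutoff here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical statistics gathered from forward/backward passes on training data.
d_in, d_out, n = 256, 256, 2048
acts = rng.standard_normal((n, d_in))    # layer inputs a
grads = rng.standard_normal((n, d_out))  # gradients w.r.t. layer outputs g
W = rng.standard_normal((d_out, d_in))   # the layer's weight matrix

# K-FAC approximates the curvature (Fisher) of a linear layer as G ⊗ A,
# where A = E[a aᵀ] and G = E[g gᵀ].
A = acts.T @ acts / n
G = grads.T @ grads / n

# Eigendecompose both factors; curvature along component (i, j) is eig_G[i] * eig_A[j].
eig_A, U_A = np.linalg.eigh(A)
eig_G, U_G = np.linalg.eigh(G)
curvature = np.outer(eig_G, eig_A)

# Express the weight in the K-FAC eigenbasis and drop low-curvature components,
# which the study associates with memorization.
W_basis = U_G.T @ W @ U_A
threshold = np.quantile(curvature, 0.50)   # hypothetical cutoff: bottom half
W_basis[curvature < threshold] = 0.0

# Map back: the edited weight keeps the high-curvature (shared/reasoning) directions.
W_edited = U_G @ W_basis @ U_A.T
print("fraction of components removed:", float(np.mean(curvature < threshold)))
```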

Arithmetic operations, however, shared pathways with memorization, dropping to 66 percent accuracy after the removal. The researchers note that models appear to treat facts like “2+2=4” as memorized items rather than computed logic, akin to a student relying on times tables without understanding multiplication. Recall of common facts, such as country capitals, remained stable, but recall of rarer facts, such as company CEOs, dropped by 78 percent.
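As a loose illustration of the times-table analogy, the difference is between retrieving a stored answer and computing one from a general rule; the snippet below is purely illustrative and not drawn from the paper.

```python
# Purely illustrative analogy: a memorized lookup table vs. a computed rule.
memorized_products = {(2, 2): 4, (3, 4): 12}  # recall fails outside the table

def recall(a, b):
    return memorized_products.get((a, b))     # returns None for unseen pairs

def compute(a, b):
    return a * b                              # generalizes to any operands

print(recall(2, 2), compute(2, 2))  # 4 4
print(recall(7, 8), compute(7, 8))  # None 56
```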

The team also validated the method on OLMo-1B and on custom Vision Transformers trained on ImageNet data with intentionally mislabeled examples; removing the memorization pathways restored 66.5 percent accuracy on the mislabeled images. The method outperformed prior editing techniques such as BalancedSubnet, reducing memorization of unseen quotes to 16.1 percent versus 60 percent.
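As a rough sketch of how a verbatim-recall rate like this can be measured, the snippet below prompts a model with the start of each held-out quote and checks whether it reproduces the continuation exactly. The model name, quote list, and token counts are assumptions, and the paper's evaluation protocol may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: the model name and quote list are placeholders.
model_name = "allenai/OLMo-1B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

quotes = [
    "Four score and seven years ago our fathers brought forth on this continent",
]

def verbatim_recall_rate(quotes, prefix_tokens=16, continue_tokens=16):
    """Fraction of quotes whose continuation the model reproduces exactly."""
    hits = 0
    for quote in quotes:
        ids = tokenizer(quote, return_tensors="pt").input_ids[0]
        if ids.shape[0] < prefix_tokens + continue_tokens:
            continue  # quote too short to split into prompt and target
        prompt = ids[:prefix_tokens].unsqueeze(0)
        target = ids[prefix_tokens:prefix_tokens + continue_tokens]
        out = model.generate(prompt, max_new_tokens=continue_tokens, do_sample=False)
        generated = out[0, prefix_tokens:prefix_tokens + continue_tokens]
        if generated.shape[0] == target.shape[0] and bool((generated == target).all()):
            hits += 1
    return hits / max(len(quotes), 1)

print("verbatim recall:", verbatim_recall_rate(quotes))
```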

While the technique is promising for removing copyrighted or harmful content without harming reasoning, it suppresses rather than erases information, and the suppressed content can reactivate with further training. The team also cautions that arithmetic's fragility may stem from circuits it shares with memorization, and that some complex abilities could resemble memorization to the method even when they are not.
