Today, most large language models break text down into thousands of tiny units called tokens. This turns the text into representations that models can understand. However, these tokens quickly become expensive to store and compute with as conversations with end users grow longer. When a user chats with an AI for extended periods, this challenge can cause the AI to forget things the user has already told it and get information muddled, a problem some call "context rot."
The new methods developed by DeepSeek (and published in its recent paper) could help to overcome this issue. Instead of storing words as tokens, its system packs written information into image form, almost as if it's taking a picture of pages from a book. This allows the model to retain nearly the same information while using far fewer tokens, the researchers found.
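To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python of why this can pay off. The specific numbers below (characters per text token, patch size, compression factor) are illustrative assumptions, not figures from the paper:

```python
# A back-of-the-envelope sketch of the trade-off, not DeepSeek's pipeline.
# All numbers (characters per token, patch size, compression factor) are
# illustrative assumptions.

def text_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Rough rule of thumb for English: ~4 characters per text token.
    return round(len(text) / chars_per_token)

def vision_tokens(width: int, height: int, patch: int = 16, compress: int = 16) -> int:
    # A vision encoder splits the image into patch-size squares; a compressor
    # then merges groups of patches into a smaller set of tokens.
    return (width // patch) * (height // patch) // compress

page = "A" * 3000  # roughly one dense page of text
print(text_tokens(page))          # ~750 tokens as plain text
print(vision_tokens(1024, 1024))  # 256 tokens as a rendered page image
```

Under these assumptions, a page that would cost roughly 750 text tokens fits into about 256 vision tokens once the rendered image is split into patches and compressed.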
Essentially, the OCR model is a testbed for these new methods, which allow more information to be packed into AI models more efficiently.
Besides using visual tokens instead of just text ones, the model is built on a type of tiered compression that isn't unlike how human memories fade: older or less critical content is stored in a slightly blurrier form in order to save space. Despite that, the paper's authors argue that this compressed content can still remain accessible in the background, while maintaining a high level of system efficiency.
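The fading scheme can also be sketched in rough code. The following is our own illustration of the tiered idea, not the paper's implementation: older pages are re-rendered at lower resolution, so each age tier costs a quarter as many tokens as the one before it.

```python
# Our own illustration of tiered "fading," not the paper's code: every age
# tier halves a page's rendering resolution, quartering its token cost,
# down to a 128-pixel floor.

def tokens_for_page(side: int, patch: int = 16) -> int:
    # One token per patch of a square side-by-side image.
    return (side // patch) ** 2

def tiered_budget(page_ages: list[int]) -> int:
    total = 0
    for age in page_ages:
        side = max(1024 >> age, 128)  # blurrier with age, never below 128px
        total += tokens_for_page(side)
    return total

print(tiered_budget([0, 0, 0, 0]))  # four fresh pages: 16384 tokens
print(tiered_budget([0, 1, 2, 3]))  # same pages, fading: 5440 tokens
```

In this toy accounting, letting three of four pages fade cuts the context's token cost by roughly two-thirds while keeping every page present in some form.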
Text tokens have long been the default building block in AI systems. Using visual tokens instead is unconventional, and as a result, DeepSeek's model is quickly capturing researchers' attention. Andrej Karpathy, the former Tesla AI chief and a founding member of OpenAI, praised the paper on X, saying that images may ultimately be better than text as inputs for LLMs. Text tokens might be "wasteful and just terrible at the input," he wrote.
Manling Li, an assistant professor of computer science at Northwestern University, says the paper offers a new framework for addressing the existing challenges in AI memory. "While the idea of using image-based tokens for context storage isn't entirely new, this is the first study I've seen that takes it this far and shows it might actually work," Li says.
