IQ Times Media – Smart News for a Smarter You

Running AI models is turning into a memory game

By IQ TIMES MEDIA, February 17, 2026

When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs — but memory is an increasingly important part of the picture. As hyperscalers prepare to build out billions of dollars’ worth of new data centers, the price for DRAM chips has jumped roughly 7x in the last year.

At the same time, there’s a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time. The companies that master it will be able to make the same queries with fewer tokens, which can be the difference between folding and staying in business.

Semiconductor analyst Doug O’Laughlin has an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, chief AI officer at Weka. They’re both semiconductor guys, so the focus is more on the chips than the broader architecture, but the implications for AI software are pretty significant too.

I was particularly struck by this passage, in which Bercovici looks at the growing complexity of Anthropic’s prompt-caching documentation:

The tell is if we go to Anthropic’s prompt caching pricing page. It started off as a very simple page six or seven months ago, especially as Claude Code was launching — just “use caching, it’s cheaper.” Now it’s an encyclopedia of advice on exactly how many cache writes to pre-buy. You’ve got 5-minute tiers, which are very common across the industry, or 1-hour tiers — and nothing above. That’s a really important tell. Then of course you’ve got all sorts of arbitrage opportunities around the pricing for cache reads based on how many cache writes you’ve pre-purchased.

The question here is how long Claude holds your prompt in cached memory: You can pay for a 5-minute window, or pay more for an hour-long window. It’s much cheaper to draw on data that’s still in the cache, so if you manage it right, you can save an awful lot. There is a catch though: Every new bit of data you add to the query may bump something else out of the cache window.
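
To make that trade-off concrete, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption for illustration: the `CachePricing` class, the function names, and above all the multipliers, which are placeholders and not figures quoted from any provider's price list. It also assumes every follow-up call arrives before the cache entry expires, which is exactly the part that is hard to guarantee in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    """Hypothetical price multipliers, relative to the base input-token price."""
    write_5m: float = 1.25  # assumed premium for writing a 5-minute cache entry
    write_1h: float = 2.0   # assumed premium for writing a 1-hour cache entry
    read: float = 0.1       # assumed discounted rate for reading a cached token


def cost_without_cache(prefix_tokens: int, calls: int) -> float:
    """Cost, in base-token units, of resending the same prefix uncached each call."""
    return float(prefix_tokens * calls)


def cost_with_cache(prefix_tokens: int, calls: int, write_multiplier: float,
                    pricing: CachePricing = CachePricing()) -> float:
    """One cache write on the first call, then cache reads on every later call.

    Assumes no later call misses the cache (no expiry, no eviction).
    """
    if calls <= 0:
        return 0.0
    write = prefix_tokens * write_multiplier
    reads = prefix_tokens * pricing.read * (calls - 1)
    return write + reads


pricing = CachePricing()
uncached = cost_without_cache(10_000, calls=10)
cached_5m = cost_with_cache(10_000, 10, pricing.write_5m)
cached_1h = cost_with_cache(10_000, 10, pricing.write_1h)
print(f"uncached: {uncached:.0f}  5-minute tier: {cached_5m:.0f}  1-hour tier: {cached_1h:.0f}")
```

Under these assumed multipliers, ten calls that reuse a 10,000-token prefix cost roughly a fifth of the uncached total on the shorter tier. A single call with no reuse costs more with caching than without, because the write premium is never recovered.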

This is complex stuff, but the upshot is simple enough: Managing memory in AI models is going to be a huge part of AI going forward. Companies that do it well are going to rise to the top.

And there is plenty of progress to be made in this new field. Back in October, I covered a startup called Tensormesh that was working on one layer in the stack known as cache optimization.

Opportunities exist in other parts of the stack. For instance, lower down the stack, there’s the question of how data centers are using the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, although it’s pretty deep in the hardware weeds.) Higher up the stack, end users are figuring out how to structure their model swarms to take advantage of the shared cache.
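
One way to picture the shared-cache point is a toy model in which a cache can only reuse the leading part of a prompt that is identical across requests. That prefix-matching behavior is a simplifying assumption here, not a description of any specific provider, and the helper names (`shared_prefix_len`, `build_prompt`) and prompt segments are hypothetical. The sketch shows why agents that put their common context first can share more than agents that lead with their own instructions.

```python
from typing import List, Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the leading segments that two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_prompt(shared: List[str], agent_specific: List[str],
                 shared_first: bool = True) -> List[str]:
    """Assemble a prompt as an ordered list of segments."""
    return shared + agent_specific if shared_first else agent_specific + shared


# Hypothetical context that every agent in the swarm needs.
SHARED = ["system rules", "tool definitions", "project docs"]

# Shared context first: both agents start with the same three segments.
planner = build_prompt(SHARED, ["role: planner"])
coder = build_prompt(SHARED, ["role: coder"])
reusable_when_shared_first = shared_prefix_len(planner, coder)

# Agent-specific text first: the prompts diverge at the first segment.
planner_alt = build_prompt(SHARED, ["role: planner"], shared_first=False)
coder_alt = build_prompt(SHARED, ["role: coder"], shared_first=False)
reusable_when_shared_last = shared_prefix_len(planner_alt, coder_alt)

print(reusable_when_shared_first, reusable_when_shared_last)
```

In this toy model the first ordering leaves three segments reusable between the two agents, and the second leaves none, even though both orderings send exactly the same text.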

As companies get better at memory orchestration, they’ll use fewer tokens and inference will get cheaper. Meanwhile, models are getting more efficient at processing each token, pushing the cost down still further. As server costs drop, a lot of applications that don’t seem viable now will start to edge into profitability.


