Gertjan Verdickt

Large language models may seem brilliant at making predictions, but often it’s just memorization. And that has implications for investors relying on LLMs for economic analysis.

What if I told you that I know exactly what inflation was in June 2015, that the S&P500 closed at 2,810 points on May 2, 2020, and that I can tell you—without a single mistake—how many times US unemployment has risen above 9 percent since 1990? You’d probably think I’m some kind of economic genius—or… that I peeked. And that’s exactly what large language models like ChatGPT do. That’s where things get tricky.

In a recent academic paper titled The Memorization Problem, researchers from the University of Florida show that LLMs (like GPT-4o) don’t just “remember” data—they store it with near-encyclopedic precision. Economic indicators, stock index levels, even headlines from The Wall Street Journal—they recall it all. Perfectly… as long as it’s in the past.


Figure: The chart compares LLM-estimated values of the S&P500 with actual values. Panel A plots estimated values against real ones. Panel B shows the estimation error, calculated as (Estimated – Actual) / Actual, expressed in percentage points (where 5 equals 5 percent). No outliers were removed; all observations are included.

Predicting or just remembering?

That may sound impressive—and it is—but it’s also misleading. When we ask these models to make predictions about data from before their so-called cut-off date (for GPT-4o, that’s October 2023), we need to ask ourselves: is the model actually forecasting? Or is it just digging up something it already saw?

The researchers give a clear example: suppose you ask ChatGPT to predict how US GDP will evolve in Q4 of 2008, giving it only data up to Q3. You’re expecting some interpretation, some analysis, maybe a bit of economic intuition. Instead, you get a brilliant-sounding prediction… that also happens to be exactly right. Why? Because the model already knows the Q4 data. It’s not forecasting—it’s regurgitating. Like asking a student to take a test they happened to glance at the day before.
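To see what such a probe might look like in practice, here is a minimal sketch, assuming the OpenAI Python client; the prompt wording, the parsing, and the placeholder for the official GDP figure are my own illustration, not the paper’s code.

```python
# Minimal sketch of a memorization probe (illustrative; not the researchers' code).
# Assumes the OpenAI Python client (openai>=1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Pretend it is early October 2008. Using only information available "
    "through Q3 2008, predict the annualized quarter-over-quarter change in "
    "US real GDP for Q4 2008. Reply with a single number in percent."
)

response = client.chat.completions.create(
    model="gpt-4o",  # training data runs to October 2023, so Q4 2008 is well in-sample
    messages=[{"role": "user", "content": PROMPT}],
    temperature=0,
)

# Simplistic parsing; a real probe would handle free-form answers more carefully.
predicted = float(response.choices[0].message.content.strip().rstrip("%"))
print(f"model 'prediction' for Q4 2008 GDP growth: {predicted:.1f}%")

# Compare against the published figure (e.g. from FRED) once you have looked it up.
actual = None  # fill in the official Q4 2008 value before comparing
if actual is not None:
    # Same error definition as in the figure above: (Estimated - Actual) / Actual, in percent.
    print(f"estimation error: {100 * (predicted - actual) / actual:.1f}%")
```

If repeated runs keep landing on the published number, that consistency is the tell: a genuine forecaster would not be that lucky.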

Even when explicitly instructed not to use any data past a certain date, the model still insists on giving correct answers. It’s like telling someone, “Forget what you know about the financial crisis”—good luck with that.

Even when researchers mask the input—replacing company names with “Company X,” omitting dates, obscuring specific figures—the model often still guesses correctly which company is being discussed, and in which quarter the report was published. That’s because it recognizes patterns, even when details are missing. ChatGPT “knows” that “Company X” is actually Meta, and that we’re talking about Q1 of 2018.
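For intuition, here is a rough sketch of that kind of masking, with a hypothetical filing snippet; the regexes and redactions are my own illustration, not the researchers’ procedure.

```python
# Rough sketch of the masking idea described above (illustrative only).
import re

def mask(text: str, company: str) -> str:
    """Hide the company name and strip obvious years and figures before prompting."""
    masked = re.sub(re.escape(company), "Company X", text, flags=re.IGNORECASE)
    masked = re.sub(r"\b(19|20)\d{2}\b", "[YEAR]", masked)  # years
    masked = re.sub(r"\$?\d[\d.,]*\s*(?:billion|million|percent|%)", "[FIGURE]", masked)  # amounts
    return masked

# Hypothetical snippet; the numbers are for illustration.
snippet = "Meta reported Q1 2018 revenue of $11.97 billion, up 49% year over year."
print(mask(snippet, "Meta"))
# -> "Company X reported Q1 [YEAR] revenue of [FIGURE], up [FIGURE] year over year."
# Even with the name, year, and figures removed, a model that has memorized the
# filing will often still name the company and the quarter.
```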

What this means for investors

So what does this mean for you, as an investor? Simple: if you’re using LLMs to train models, simulate historical scenarios, or backtest hypothetical strategies, it’s easy to fool yourself. You might think the model has made a clever inference, when in fact it’s just repeating something it once saw. Prediction is not the same as recollection. One clear solution: only use data from after the model’s knowledge cut-off date.
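As a minimal sketch of that rule, assuming a hypothetical CSV of daily index data and GPT-4o’s October 2023 cut-off:

```python
# Minimal sketch: evaluate an LLM only on data it cannot have memorized.
# The file and column names are hypothetical; the cut-off is GPT-4o's (October 2023).
import pandas as pd

GPT4O_CUTOFF = pd.Timestamp("2023-10-31")

df = pd.read_csv("sp500_daily.csv", parse_dates=["date"])

in_sample = df[df["date"] <= GPT4O_CUTOFF]      # may be recalled, not forecast
out_of_sample = df[df["date"] > GPT4O_CUTOFF]   # the only honest test set

print(f"{len(out_of_sample)} of {len(df)} observations are safely past the cut-off")
```

Pre-cut-off data can still serve as prompt context; treating it as evidence of forecasting skill is where the self-deception starts.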

It’s a bit like the famous 1976 “Judgment of Paris” wine competition, where American wines unexpectedly beat out French classics. Everyone thought: America has arrived. But then you dig deeper—was it luck? Was it the setting? Or just an unusually strong vintage? Ask ChatGPT who won in 1976, and it will rattle off the correct wines, scores, and judges. But ask it to “predict” the 1976 outcome using only what was known in 1975, and you have to wonder: is that true analysis, or just a memory trick?

Gertjan Verdickt is assistant professor of finance at the University of Auckland and a columnist at Investment Officer.
