Jul 11, 2025

We test strategies from ChatGPT – will AI help you make money in the stock market?

ChatGPT generates graphics, writes software and articles, and analyzes data, and at least at first glance it does all of this well. Ideas of using it to build trading strategies therefore arose quickly. After all, if AI knows everything, maybe it also knows the secret of making money in the market? Unfortunately, in practice the matter is not as simple as it seems.

Will ChatGPT write a trading system for us?

To test in practice ChatGPT’s ability to build strategies (an ability that is promoted today by YouTube creators and course vendors), I asked the model to give me ideas for trading systems from three groups: beginner, intermediate and advanced.

The model proposed three solutions:

For beginners: SMA Crossover strategy (buy/sell after crossing simple moving averages).
Intermediate: RSI + Price Action (playing overbought and oversold levels).
Advanced: Momentum + Volatility Regime Filter (ATR) – buy after increases excluding periods of high volatility.

In practice, each of the above ideas is aimed at beginners or, at best, intermediate traders. Although the last strategy attempts to filter out periods of higher volatility using ATR, this is a tool that novice traders are already quite familiar with. On the plus side, the model proposed the use of momentum, which is actually implemented in some real trading systems.

Nonetheless, there are no statistically advanced ideas or those with a deeper economic rationale. Simply put – good old technical analysis.

However, the adjective “good” should be taken in quotation marks here, since such methods are known in the trading community for their ineffectiveness. Moving average crossovers are most often a tool that does reasonably well in directional trends and gives back all the profits in sideways markets. The opposite is true of the RSI: it works in sideways markets and becomes a money-losing machine when the market breaks out of balance into a trend.

Not only does the Efficient Market Hypothesis imply that beating indexes is a difficult task, bordering on impossible, but the bot also offered us nothing new. To extract ideas that make a little more sense, you need to know exactly what you are looking for yourself.

Strategy for beginners: intersections of moving averages

Let’s move on to the results. After receiving the code for all strategies, I asked the model to compare the backtesting results with the classic Buy&Hold on the S&P500 index. The first system, based on moving averages and tested from 2010-01-01, generated a 28.47% loss, as expected. Over the same period, holding the S&P500 index (ticker ^GSPC) in the portfolio yielded a 449.87% gain.

By default, the strategy uses moving averages with periods of 20 and 50.
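The code ChatGPT produced is not reproduced in this article, so the following is only a minimal sketch of what a 20/50 crossover rule can look like in plain Python. The function names `sma` and `crossover_positions` are illustrative assumptions, not the model's output.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_positions(closes, fast=20, slow=50):
    """+1 (long) while the fast SMA is above the slow SMA, -1 (short) while
    it is below, 0 until both averages exist. Hypothetical helper."""
    positions = []
    for f, s in zip(sma(closes, fast), sma(closes, slow)):
        if f is None or s is None:
            positions.append(0)
        elif f > s:
            positions.append(1)
        elif f < s:
            positions.append(-1)
        else:
            # Averages are equal: keep the previous position.
            positions.append(positions[-1] if positions else 0)
    return positions


# Toy usage with short windows so the example fits on one line of prices.
toy_closes = [1, 2, 3, 4, 5, 4, 3, 2, 1]
toy_positions = crossover_positions(toy_closes, fast=2, slow=3)
```

A position computed from the close of bar t can only be traded from bar t+1 onward; applying it to the same bar would introduce look-ahead bias.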

We can extend these periods to 20 and 200, 21 and 200, and 50 and 200 to reduce the frequency of transactions and compare the results of all variants on a single chart.

NOTE!

Testing different variants of moving averages is aimed at finding the most effective settings. In real-world conditions, this can lead to fitting the strategy too closely to the test data on which those specific settings happen to work best (known as overfitting), followed by poor performance in real trading. To avoid this, traders usually use Walk-Forward optimization, which involves a step-by-step verification of the strategy on successive pieces of data.
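As a rough illustration of the Walk-Forward idea, the sketch below only generates the rolling index windows. The parameters are chosen on each training window and judged on the test window that follows it. The helper name `walk_forward_windows` and the window sizes are assumptions made for this example.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_range, test_range) index pairs that roll forward
    through the data; each test window directly follows its train window."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # step forward by one test window


# Toy usage: 10 bars, optimize on 4 bars, verify on the next 2.
windows = list(walk_forward_windows(n_bars=10, train_size=4, test_size=2))
```

Only the results from the test windows are stitched together into the final equity curve, so the reported performance never comes from data that was used to pick the settings.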

This is where the stumbling blocks begin, because a user who does not understand the code generated by an LLM may have considerable trouble performing such a task. ChatGPT sometimes introduces logical errors when extending code, and when asked to correct them based on the exact error message, it makes new ones.

If we are lucky – it will eventually hit the right solution. If we are not – it will never hit it.

In this case, the error was simple and ChatGPT eliminated it after several attempts. It generated new code with several strategies differing in the length of their moving averages. Alongside this, it was instructed to implement additional statistics for each of them: maximum drawdown, annual volatility, Sharpe and Sortino ratios, number of trades and average holding time. Transaction costs of 0.1% were also simulated. It handled all of this quite well.
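For readers who want to see what stands behind these statistics, here is a minimal sketch of the most common definitions. It assumes daily returns, 252 trading days per year and a risk-free rate of zero. These are simplifications chosen for this example, not necessarily the choices made in the code ChatGPT generated.

```python
import math


def equity_curve(returns, start=1.0):
    """Compound per-period returns into an equity curve that begins at `start`."""
    curve = [start]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def max_drawdown(curve):
    """Largest peak-to-trough decline as a negative fraction (e.g. -0.25)."""
    peak, worst = curve[0], 0.0
    for value in curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def annual_volatility(returns, periods=252):
    """Sample standard deviation of returns, scaled to one year."""
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(variance * periods)


def sharpe_ratio(returns, periods=252):
    """Annualized mean return over annualized volatility (risk-free rate = 0)."""
    vol = annual_volatility(returns, periods)
    mean = sum(returns) / len(returns)
    return mean * periods / vol if vol > 0 else float("nan")


def sortino_ratio(returns, periods=252):
    """Like Sharpe, but penalizes only returns below zero."""
    mean = sum(returns) / len(returns)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns) * periods)
    return mean * periods / downside if downside > 0 else float("nan")


# Toy usage: +10%, then -50%, then +20%.
toy_curve = equity_curve([0.10, -0.50, 0.20])
toy_drawdown = max_drawdown(toy_curve)
```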

The results are already clearly better. The MA20-MA50 base strategy continues to generate a loss, this time of 39.61%, due to the high frequency of trades (169) and the associated costs. The MA20-MA200 and MA21-MA200 variants turned out to be the best, generating 52.99% and 59.48% profit (with only 33 trades). So much for the good news: the profit of almost every strategy came at the price of a maximum drawdown of more than 50%. The exception is MA50-MA200, which had a Max Drawdown of -40.83% (but yielded only 30.64% profit).

The nail in the coffin here, of course, is the S&P500 itself, which returned 449.87% over the same period with a maximum drawdown of -33.92%. As expected, none of the strategies based on moving averages proved worthwhile.

The last thing we will try is to eliminate short positions. Moving averages will then act only as a filter that closes a long position on a sell signal, which may allow us to wait out periods of declines on the S&P500 and limit drawdowns.
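A sketch of how such a switch can be applied in a backtest: short signals are clipped to zero, so a sell signal simply moves the account to cash. The function `strategy_returns` is hypothetical, and charging the 0.1% cost per unit of position change is an assumption of this example (a flip from long to short therefore costs twice as much as an exit).

```python
def strategy_returns(closes, positions, cost=0.001, long_only=False):
    """Per-bar strategy returns from closing prices and target positions
    (+1 long, 0 flat, -1 short). The position decided at bar t-1 is held
    over bar t, so no future information is used."""
    if long_only:
        positions = [max(p, 0) for p in positions]  # short signal -> stay in cash
    out = []
    previous = 0
    for t in range(1, len(closes)):
        held = positions[t - 1]
        market = closes[t] / closes[t - 1] - 1.0
        fee = cost * abs(held - previous)  # assumed cost model, see lead-in
        out.append(held * market - fee)
        previous = held
    return out


# Toy usage: the market rises 10%, then falls 10%.
toy_closes = [100.0, 110.0, 99.0]
toy_signals = [1, -1, -1]
long_short = strategy_returns(toy_closes, toy_signals)
long_only = strategy_returns(toy_closes, toy_signals, long_only=True)
```

In the toy data the long/short variant profits from the decline, while the long-only variant merely avoids it and pays the exit cost.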

As you can see, it’s even better. The MA20-MA50 strategy, which had been generating losses so far, managed a 105.99% profit with a Max Drawdown of -30.84%. The best-performing variant is still MA21-MA200 (217.59%). The Max Drawdown across the strategies now hovers around -30% on average, which is slightly better than that of the S&P500. Beyond that, however, little has changed. The top system still generates slightly less than half of the index’s return (217.59% vs. 449.87%), which means using it makes no sense.

It should also not be forgotten that the strategies are still being tested on the S&P500 index, which gains in value over the long term, so making some profit is not particularly surprising. Moving average crossovers, however, are very popular among traders of currency pairs, which tend to fluctuate and fall into longer trends less often.

So we move on to the EURUSD pair, turn short positions back on, and here MA-based strategies show their true colors.

Max Drawdown for the worst of them is more than 35%, while the profit generated by the best of them is only 6.34%, and is most likely completely random.

It must be admitted that ChatGPT can be a very useful tool for beginners, because with a minimum of technical knowledge, limited to the ability to use a Jupyter Notebook, it can actually prove that systems promoted on the Internet based on simple indicators do not work. And if they do work, they generate results that don’t even come close to beating benchmarks. Then it’s easy to conclude that instead of playing around with trading, it would be wiser to simply buy the S&P500.

The last test takes place in the Bitcoin market.

Perhaps because BTC is a young market and until recently was dominated by retail traders, long/short trading strategies based on moving averages generate some profits here. However, the Max Drawdown for each of them is comparable to that of Bitcoin itself and exceeds 80%, with profits being rather meager.

Bitcoin’s Buy&Hold since September 2014 yielded a 23,614% profit, while the best tested strategy produced a profit of only 3805% over the same period. That is more than 6 times worse, with comparable drawdowns.

As with the S&P500, we are left to check the Long Only option, which may allow us to improve performance and reduce the Max Drawdown.

In this case, the MA21-MA200 strategy managed to beat Bitcoin’s own result (25416% vs. 23646%) while achieving a smaller Max Drawdown (-66% vs. -83.40%). The problems, however, are the minimal number of trades (21), the speculative mania around Bitcoin over the past decade, and the market’s young age. It’s hard to say whether the relatively good performance of such simple strategies reflects their real value or rather the immaturity of cryptocurrencies, their sensitivity to economic conditions observed for some time now, and their tendency to rise along with US indices.

Intermediate strategy: classic RSI levels

There are two systems left to test – intermediate and advanced.

The first is based on the RSI indicator. If the RSI value is less than 30, the market is considered oversold and we open a long position. If it is greater than 70, we consider the market overbought and open a short position.
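The rule above can be sketched as follows. The article does not say which RSI variant or period ChatGPT used, so this example assumes a 14-bar period and simple averages of gains and losses (not Wilder's smoothing); `rsi` and `rsi_signals` are illustrative names.

```python
def rsi(closes, period=14):
    """RSI from simple averages of gains and losses over `period` bars.
    Returns None for bars without enough history."""
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        gains = losses = 0.0
        for j in range(i - period + 1, i + 1):
            change = closes[j] - closes[j - 1]
            if change > 0:
                gains += change
            else:
                losses -= change
        if gains + losses == 0:
            out[i] = 50.0   # completely flat market: treat as neutral
        elif losses == 0:
            out[i] = 100.0  # only gains in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gains / losses)
    return out


def rsi_signals(closes, period=14, oversold=30, overbought=70):
    """+1 (open long) when RSI < oversold, -1 (open short) when RSI > overbought."""
    signals = []
    for value in rsi(closes, period):
        if value is None:
            signals.append(0)
        elif value < oversold:
            signals.append(1)
        elif value > overbought:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


# Toy usage: a steadily rising and a steadily falling price series.
rising = [float(x) for x in range(1, 20)]
falling = [float(x) for x in range(20, 1, -1)]
```

On the steadily rising series the RSI sits at 100 and the rule keeps signalling a short, which illustrates why this approach struggles in trends.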

The strategy originally written by ChatGPT by default holds the position for only one day, so I asked the model to expand it with basic risk management. Take Profit was set at ATR x2, and Stop Loss – at ATR x3.

At the same time, this was probably the most interesting of the cases, because the bot made a logical error in it that completely skewed the backtesting result. With an SL of ATR x 1.5 and a TP equal to ATR x3, i.e. with a risk-reward ratio (RRR) of 1:2, the strategy beat the S&P 500 index hands down, generating several times its profit. On the other hand, with short positions excluded and both Stop Loss and Take Profit at ATR x 2… it went bankrupt, even though, with no leverage and only long positions on the S&P500, which rises over the long term, it should have earned at least something.

A look at the program showed that ChatGPT, instead of calculating the profit/loss of a position once it was closed, appended its return to the Returns series after every candle and treated each entry as a separate trade. As a result, it did not matter that a long position without a stop loss was down x% at its low but ultimately closed in profit: the code booked every daily price decline as a realized loss.
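The faulty lines themselves are not shown here, so the snippet below is only a hypothetical reconstruction of the two accounting approaches, using made-up prices. It contrasts one return booked when the position is closed with one record booked per candle.

```python
def closed_trade_return(entry_price, exit_price, direction=1):
    """Correct accounting: a single return, computed when the position is closed."""
    return direction * (exit_price / entry_price - 1.0)


def per_candle_records(closes, direction=1):
    """Flawed accounting (hypothetical reconstruction): every candle's move
    is booked as if it were a separate, already closed trade."""
    return [direction * (b / a - 1.0) for a, b in zip(closes, closes[1:])]


# A long position that dips from 100 to 90 and is finally closed at 110.
toy_closes = [100.0, 90.0, 95.0, 110.0]

one_trade = closed_trade_return(toy_closes[0], toy_closes[-1])
records = per_candle_records(toy_closes)
booked_losses = [r for r in records if r < 0]
```

Booked correctly, this is a single winning trade of about 10%. Booked per candle, the same position appears as three "trades", one of them a 10% loss, which distorts the trade count, the win rate and any statistic built on them.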

Once the defective lines were removed, performance became predictable again. Since 2010, the benchmark has gained 452%, while our long/short strategy with an RRR of 1:2 returned -42.55%. With a 1990 start date, the loss was as much as -66.64%.

The long-only variants returned -16.43% and -46.96%, respectively. After reversing the RRR to 2:1 (2 units of loss for 1 unit of gain), the results improved slightly: 8.14% since 2010 and -28.64% since 1990. What remained was to test a version that often earns pennies and less often loses significant amounts (with an RRR of 12:1). Here the return since 1990 is 9%, and since 2010 it is -11.66%.

The strategy was a real disaster in the Bitcoin market. With long + short positions and an RRR of 1:2, it has lost 89% of its capital since 2010. Modifying SL and TP was of no use; it only reduced the drawdown. Because of BTC’s historical gains, what did help was turning off short positions: 436.19% at an RRR of 2:1 and 1024.37% at an RRR of 1:1.

On the EURUSD pair, on the other hand, performance hovered around -18%.

Strategy for the “advanced”: momentum and ATR indicator

The advanced strategy generated by ChatGPT also proved to be a waste of time. The system uses the ATR indicator to filter volatility and, based on that, classifies the state the market is in. If the volatility of the instrument is statistically low and its price has been rising over the past 10 days, the strategy opens a long position.
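A minimal sketch of such a regime filter is shown below. The article does not state how "statistically low" volatility was defined, so this example assumes it means an ATR at or below its own rolling median; the window lengths other than the 10-day momentum lookback are also assumptions.

```python
import statistics


def true_ranges(highs, lows, closes):
    """True range per bar; the first bar falls back to high - low."""
    out = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        out.append(max(highs[i] - lows[i],
                       abs(highs[i] - closes[i - 1]),
                       abs(lows[i] - closes[i - 1])))
    return out


def rolling_mean(values, window):
    """Simple rolling mean; None until `window` values are available."""
    return [sum(values[i - window + 1:i + 1]) / window if i >= window - 1 else None
            for i in range(len(values))]


def momentum_low_vol_signals(highs, lows, closes,
                             atr_period=14, regime_window=100, lookback=10):
    """1 (open long) when ATR is at or below its rolling median (assumed
    definition of low volatility) and price is above its level `lookback`
    bars ago; otherwise 0."""
    atr = rolling_mean(true_ranges(highs, lows, closes), atr_period)
    signals = []
    for i in range(len(closes)):
        history = [a for a in atr[max(0, i - regime_window + 1):i + 1] if a is not None]
        if i < lookback or len(history) < 2:
            signals.append(0)
            continue
        low_vol = atr[i] <= statistics.median(history)
        rising = closes[i] > closes[i - lookback]
        signals.append(1 if low_vol and rising else 0)
    return signals


# Toy usage: a calm, steadily rising market and a calm, steadily falling one.
up = [100.0 + i for i in range(60)]
down = [200.0 - i for i in range(60)]
up_signals = momentum_low_vol_signals([c + 1 for c in up], [c - 1 for c in up], up,
                                      atr_period=5, regime_window=20)
down_signals = momentum_low_vol_signals([c + 1 for c in down], [c - 1 for c in down], down,
                                        atr_period=5, regime_window=20)
```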

This helped limit the Max Drawdown to 8.52%, but as you can see from the chart, the system missed a huge part of the S&P500’s upward phase and spent almost 3 years in the red. In the end, it generated a token return of 8.5%, which was probably accidental and due to a small sample (315 trades over 15 years of market data).

In fact, you can check this by changing the start date from 2010 to 1990.

Now, instead of a small gain, we see a nearly 48% loss.

Strategies from ChatGPT vs. real-world backtesting

None of the strategies generated satisfactory results, but that was not the biggest problem. The bigger one was that ChatGPT did not steer the user onto the right track by suggesting more serious backtesting. It did not warn about price slippage in real conditions or about overfitting, which can make a strategy that performs well in tests fall apart in live trading. It did not propose an in-sample/out-of-sample split of the data (training and test data), optimization, Walk-Forward validation, or a Monte Carlo method to generate different permutations of the equity curve that the system produced.
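As an illustration of the Monte Carlo idea mentioned above, the sketch below reshuffles the order of a list of trade returns many times and records the maximum drawdown of each resulting equity curve. The function name, the number of paths and the sample trades are assumptions made for this example.

```python
import random


def monte_carlo_drawdowns(trade_returns, n_paths=1000, seed=0):
    """Shuffle the order of trades `n_paths` times and return the maximum
    drawdown of every shuffled equity curve, sorted from worst to mildest."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    results = []
    for _ in range(n_paths):
        shuffled = list(trade_returns)
        rng.shuffle(shuffled)
        equity = peak = 1.0
        worst = 0.0
        for r in shuffled:
            equity *= 1.0 + r
            peak = max(peak, equity)
            worst = min(worst, equity / peak - 1.0)
        results.append(worst)
    results.sort()
    return results


# Toy usage with five made-up trade returns.
toy_trades = [0.05, -0.10, 0.02, -0.03, 0.08]
drawdowns = monte_carlo_drawdowns(toy_trades, n_paths=200, seed=1)
worst_case, mildest_case = drawdowns[0], drawdowns[-1]
```

Because only the order of trades changes, the final return is the same on every path. What varies is the route taken to get there, which shows how much of a backtest's drawdown depends on the particular sequence of wins and losses.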

Perhaps it’s a coincidence, or perhaps the problem could be fixed with better, more detailed prompts. The point, however, is that a trader who expects the model to generate a ready-made strategy, who doesn’t fully understand the code he’s receiving and lacks knowledge of proper backtesting, probably won’t know what to ask and what prompts to give the model. And this means that ChatGPT is something like a Google search engine on steroids: a tool for someone who knows exactly what he is looking for. The model will not lead us by the hand and build a profitable system for us.

“Real” artificial intelligence on the stock market

In fact, LLMs are used in trading mainly to process financial news and gauge media sentiment, and little more. The AI algorithms applied to financial data are often models such as LSTMs (Long Short-Term Memory), neural networks designed to process and predict sequences of data thanks to their ability to “remember” information over a longer period of time.

Because such models are prone to overfitting, i.e. over-reliance on training data, traders sometimes choose Random Forest instead: an algorithm based on so-called decision trees which, through random selection of data and features, builds multiple independent models (the “forest” of the name) and then aggregates their predictions, thus reducing the risk of overfitting.

More importantly, the prices of financial instruments depend on a huge number of factors and contain a lot of noise, so algorithms rarely predict them. More often, they try to predict variables that can affect the price. Thus, a trader in stock markets can use AI, for example, to predict the sales performance of a particular company, and a trader of currency pairs can use AI to predict the value of economic indicators.

Attempts to train models on historical prices, especially at home and with the help of ChatGPT, will at best end in a strategy that memorizes the training data but stops performing well outside this sample and in live trading, because the “patterns” it has recognized will turn out to be market noise.

ChatGPT should therefore be treated as an assistant that will speed up the work, but will not do it for us. It is certainly not a magic box spitting out strategies that beat benchmark results, and the reason is very simple: an LLM does not think, but pretends to think, using statistics.

How does the Chatbot really work?

All LLMs, in simple terms, have one thing in common: they are statistical models of the content on which they were trained, and they predict the next word in a sequence. They generate something that looks like the product of a rational entity, but it only looks that way because the texts they were trained on were written by humans.

In fact, their construction is more complicated and is based on linear algebra. The model does not predict words as we understand them, but so-called tokens, or clusters of characters. In addition, these tokens are represented as numerical values (vectors) arranged in a matrix (a so-called tensor). The model thus performs operations on vectors representing different text fragments, learning from huge data sets to predict token sequences in different contexts. Within the Transformer, an embedding layer is responsible for converting tokens into these vectors (so-called “embeddings”).

For example, a simple sentence such as “The cat is lying on the carpet.” is represented inside the model as a tensor of numbers rather than as words.
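To make this more tangible, here is a toy sketch of how that sentence can be turned into a small tensor. Everything in it is a simplification: the tokenizer just splits on spaces (real LLMs use subword tokens), the vectors have only 4 dimensions, and their values are random placeholders rather than learned weights from any real model.

```python
import random

sentence = "The cat is lying on the carpet."

# Toy tokenizer: lowercase, drop the final period, split on spaces.
tokens = sentence.lower().rstrip(".").split()

# Vocabulary: each distinct token gets an integer id.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]

# Embedding table: one vector per vocabulary entry (random placeholder values).
rng = random.Random(0)
dim = 4
embedding_table = [[round(rng.uniform(-1, 1), 3) for _ in range(dim)]
                   for _ in vocab]

# The sentence as the model "sees" it: a 7 x 4 matrix of numbers.
embedded = [embedding_table[i] for i in token_ids]

for tok, vec in zip(tokens, embedded):
    print(f"{tok:>7} -> {vec}")
```

Both occurrences of "the" map to the same row of the table, which shows that at this stage the model works with identifiers and vectors, not with meaning.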

This is why some have called chatbots “stochastic parrots.” An LLM in action somewhat resembles such a complicated parrot: it repeats words it does not understand.

For a trader who wants a chatbot to write him a ready-made strategy and who, even worse, can’t program or dissect the logic of his own ideas, this has huge implications. If we ask an LLM to write a trading system, the model will draw on the code on which it was trained. Since most publicly available strategies (which served as training data) have no chance of beating the market (i.e., it makes no sense to use them), we will get worthless code that is unlikely to perform better than a random strategy from GitHub written by a human. And if it achieves results that are suspiciously good, it’s probably flawed.

Author : Maciej Halikowski
