
Can Large Language Models (LLMs) pick stocks? Specifically, given that leading AI companies train LLMs, are LLMs good at picking AI stocks? As a pioneering AI researcher and retired hedge fund manager, I was well-qualified to run a simple experiment to find out.
Experimental Method
In July 2025, I chose four popular LLMs: Gemini 2.5 Pro, GPT-4o, Claude Sonnet 4, and Llama 4. These are among the best models from Google, OpenAI, Anthropic, and Meta. I used the versions available to anyone online, with the standard default settings.
I gave them all the same prompt: “You are an expert in Artificial Intelligence and investing. A family office has asked you to evaluate the following AI-related companies as possible investments, to maximize the risk-adjusted return for the family office. The companies are: Nvidia, Meta, Alphabet, Amazon, Apple, Microsoft, IBM, Anthropic, X-AI, Tesla, Palantir, OpenAI, Intel, AMD, Constellation Energy, and Coreweave. Please research each company’s most recent stock market information, including news reports, analyst rankings, and other important information for understanding their growth prospects. Then, rank these companies from most desirable to least desirable regarding expected risk-adjusted return over the next three years. For each company, provide one sentence supporting your evaluation, one sentence explaining a pro or positive of the company, and one sentence explaining a con or negative risk of investing in the company.”
Next, I rank-ordered the sixteen AI companies according to their weights in an AI investment portfolio constructed by a human expert. That portfolio has outperformed the S&P and Nasdaq indices by a wide margin since the release of ChatGPT in November 2022. I also listed the reasons for the human expert’s portfolio rankings.
Finally, I compared the rankings of the various LLMs to the human expert’s rankings. I also compared the reasons the LLMs gave for their rankings with those of the human expert.
Quantitative Results
The rankings of the human expert and the four LLMs are shown in Table 1. The stocks/companies are listed in the order in which they were ranked in the human expert’s portfolio. A rank of 1 had the highest weight in the human’s real investment portfolio. We will focus our discussion on the top five ranked stocks: NVDA, META, GOOGL, MSFT, and AMZN. These five stock picks accounted for more than 95% of the asset allocation in the human expert portfolio since November 2022.
Table 1. Human and LLM Ranking of 16 AI stocks/companies
| STOCK / Co | Human | Gemini | GPT | Claude | Llama |
| NVDA | 1 | 1 | 1 | 1 | 2 |
| META | 2 | 7 | 5 | 3 | 4 |
| GOOGL | 3 | 3 | 4 | 5 | 3 |
| MSFT | 4 | 2 | 2 | 2 | 1 |
| AMZN | 5 | 4 | 3 | 4 | 5 |
| IBM | 6 | 15 | 13 | 11 | 11 |
| Anthropic | 7 | 6 | N/A | N/A | 14 |
| CRWV | 8 | 9 | 7 | 13 | 13 |
| OpenAI | 9 | 5 | N/A | N/A | 16 |
| X-AI | 10 | 14 | N/A | N/A | 15 |
| AMD | 11 | 11 | 11 | 7 | 9 |
| INTC | 12 | 16 | 12 | 10 | 10 |
| AAPL | 13 | 8 | 9 | 6 | 6 |
| CEG | 14 | 10 | 6 | 12 | 12 |
| TSLA | 15 | 13 | 10 | 9 | 7 |
| PLTR | 16 | 12 | 8 | 8 | 8 |
Note: Claude and GPT declined to provide rankings of private companies.
Qualitative Results
Arguably, one of the things that separates humans from AI systems is the ability to reason deeply about a subject. While great strides are being made in increasing LLMs’ reasoning ability, some researchers suggest that these abilities are more apparent than real. For example, one recent research paper suggests that LLMs only provide the “illusion of thinking” (Shojaee et al., 2025).
LLMs may memorize common human thinking patterns and then repeat these patterns without deep understanding. If so, we might expect the LLMs to have inferior insights into the subtle aspects of stock picking and portfolio construction compared to a human expert. To test this hypothesis, we can compare the reasons given by the various LLMs for including stocks in their portfolios with the reasons given by the human expert. Table 2 summarizes the reasons the human expert and the LLMs gave for the top five stock picks.
Table 2. Summary of Reasons for Top Stock Picks
| STOCK | Human | Gemini | GPT | Claude | Llama |
| NVDA | Visionary CEO; best tech; founder-led | Leadership in chips; high growth outlook | Unmatched in chips; rapid datacenter growth | Analyst favor; unmatched GPUs; datacenter demand | Dominant AI hardware; strong growth prospects, but volatile |
| META | Good value; great data; founder-led | Profitable core business; user base; growth prospects | CEG power deal shows AI commitment; lots of user data | AI investment; user data; reality labs | Large user base; growing AI, strong ad revenue, but competition |
| GOOGL | Best value / under-valued; good researchers; great data; ethical | Search, cloud, AI good for long term; strong finances | Strong AI R&D; cloud; DeepMind, data from search and YouTube | Strong search and cloud; AI integration opportunities | Strong AI research. Diversified revenue but competition |
| MSFT | Very good mgmt; good data, overvalued compared to GOOGL | OpenAI partnership; Azure; growth potential | Azure, co-pilot; but high market cap and cloud competition | Azure; AI investment; diversified, OpenAI partnership | Strong cloud; AI integration; diversified, stable; high valuation |
| AMZN | Cloud leader; good but not great mgmt; overvalued compared to GOOGL | ecommerce and cloud leadership; investing in AI, growth prospects | Analysts favor it; AWS; use of AI for core ops | AI investment; cloud leader; integrate AI across its services but weak guidance | Strong ecommerce; growing cloud; diversified; but intense competition |
Observations / Conclusions
First, three of the four LLMs picked the same top five stocks as the human expert. Further, if we exclude private companies (which Claude and GPT did anyway), all four models chose the same top five stocks as the human expert. That’s because, although Gemini listed Meta as its 7th choice, it ranked two private companies (OpenAI and Anthropic) ahead of Meta; with those excluded, Meta becomes its 5th choice among public companies, the same rank GPT gave it. Remarkably, the four popular LLMs chose the same top five public companies, out of a list of sixteen companies (thirteen of them public), as the expert human asset manager with 35 years of domain expertise in AI.
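The top-five comparison can be checked directly against Table 1. The following is a minimal sketch, not the procedure used in the experiment: the names `TABLE_1`, `PRIVATE`, and `top_public_picks` are illustrative, the ranks are transcribed from Table 1 (with `None` for N/A), and Apple is listed under its ticker AAPL.

```python
# Ranks transcribed from Table 1; None marks a declined (N/A) ranking.
# Column order: Human, Gemini, GPT, Claude, Llama.
RANKERS = ["Human", "Gemini", "GPT", "Claude", "Llama"]
TABLE_1 = {
    "NVDA": (1, 1, 1, 1, 2),
    "META": (2, 7, 5, 3, 4),
    "GOOGL": (3, 3, 4, 5, 3),
    "MSFT": (4, 2, 2, 2, 1),
    "AMZN": (5, 4, 3, 4, 5),
    "IBM": (6, 15, 13, 11, 11),
    "Anthropic": (7, 6, None, None, 14),
    "CRWV": (8, 9, 7, 13, 13),
    "OpenAI": (9, 5, None, None, 16),
    "X-AI": (10, 14, None, None, 15),
    "AMD": (11, 11, 11, 7, 9),
    "INTC": (12, 16, 12, 10, 10),
    "AAPL": (13, 8, 9, 6, 6),
    "CEG": (14, 10, 6, 12, 12),
    "TSLA": (15, 13, 10, 9, 7),
    "PLTR": (16, 12, 8, 8, 8),
}
# Companies without publicly traded stock (illustrative constant).
PRIVATE = {"Anthropic", "OpenAI", "X-AI"}


def top_public_picks(ranker: str, n: int = 5) -> list[str]:
    """Return a ranker's n best-ranked public companies, best first."""
    col = RANKERS.index(ranker)
    ranked = [
        (ranks[col], name)
        for name, ranks in TABLE_1.items()
        if name not in PRIVATE and ranks[col] is not None
    ]
    # Sorting by raw rank preserves each ranker's relative order
    # once the private companies have been dropped.
    return [name for _, name in sorted(ranked)[:n]]


if __name__ == "__main__":
    human_top = set(top_public_picks("Human"))
    for ranker in RANKERS[1:]:
        picks = top_public_picks(ranker)
        same = set(picks) == human_top
        print(f"{ranker:7s} {picks}  same set as human: {same}")
```

Sorting on the raw rank, rather than renumbering first, is enough here because dropping the private companies never changes the relative order of the public ones.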
Second, we observe that the LLMs did a good job, at least in hindsight, in acknowledging the top-performing pick as Nvidia. Of course, it was much more challenging to recognize Nvidia as the top pick in November 2022 when it traded at roughly 1/10th of its current valuation. While the human expert was able to make that pick, it is unclear whether the LLMs could have done it without the benefit of hindsight. However, all the LLMs agree today with the human expert that Nvidia is still a top pick.
Interestingly, each LLM chose a different ranking, and none had the same ranking as the human expert. These facts suggest that while it may be relatively easy for LLMs to identify a group of top-performing AI stocks with the benefit of hindsight, determining how much capital to allocate to each pick is a subtler problem. It may require deeper reasoning. If human investment professionals still have an edge over LLMs, we may find it by comparing how the LLMs and the human expert arrived at their respective rankings.
Qualitatively, there were marked differences between how the human expert and the LLMs arrived at the top five picks, despite generally agreeing on what these picks should be. For example, the human expert emphasized leadership quality at the five companies. In contrast, none of the LLMs mentioned the qualities of the CEO or whether the company was a founder-led company.
Another difference is that the human expert demonstrated a keen awareness of the importance of valuation in constructing an investment portfolio. In almost every case, the expert cited the valuation as a factor that helped determine where a company should be ranked. Undervaluation helped raise the rank of a company (e.g., in the case of GOOGL), whereas relative overvaluation (e.g., in the case of MSFT) tended to lower a company’s investment ranking. Only two of the LLMs, GPT and Llama, mentioned valuation or market cap in their analyses, both noting, correctly in the expert’s view, that MSFT had a high valuation. However, these LLMs and the human expert seem to have drawn opposite conclusions. While the expert lowered the rank of MSFT because he felt it was overvalued compared to GOOGL, GPT and Llama, like the other two LLMs, ranked MSFT above GOOGL.
Another noteworthy difference is that only one LLM, GPT, seemed to acknowledge that the amount of data a company possessed was a crucial factor in determining the company’s chance of success in AI. In contrast, the human expert, who deeply understands data’s critical role in training AI models, commented more frequently on data resources. Data advantages and valuation were primary reasons the human expert ranked GOOGL and META higher than the LLMs.
The human expert’s average rank of GOOGL and META combined was 2.5. In contrast, excluding private companies, the average ranks of GOOGL and META combined, by GPT, Gemini, Claude, and Llama were 4.5, 4.0, 4.0, and 3.5, respectively. So, the human expert would allocate significantly more capital to GOOGL and META than any of the LLMs. (Note that this experiment was conducted at the end of July 2025, and now, in September 2025, Google has appreciated markedly, validating the superior judgment of the human expert in this regard.)
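The averages above follow from Table 1 once each rank is adjusted for the private companies placed ahead of it. The sketch below reproduces that arithmetic under the assumption that "excluding private companies" means subtracting the number of private companies a ranker placed above the stock; `RAW_RANKS`, `public_rank`, and `mean_public_rank` are illustrative names, not part of the original experiment.

```python
# Raw ranks transcribed from Table 1; None marks a declined (N/A) ranking.
RAW_RANKS = {
    "Human":  {"META": 2, "GOOGL": 3, "Anthropic": 7, "OpenAI": 9, "X-AI": 10},
    "Gemini": {"META": 7, "GOOGL": 3, "Anthropic": 6, "OpenAI": 5, "X-AI": 14},
    "GPT":    {"META": 5, "GOOGL": 4, "Anthropic": None, "OpenAI": None, "X-AI": None},
    "Claude": {"META": 3, "GOOGL": 5, "Anthropic": None, "OpenAI": None, "X-AI": None},
    "Llama":  {"META": 4, "GOOGL": 3, "Anthropic": 14, "OpenAI": 16, "X-AI": 15},
}
PRIVATE = ("Anthropic", "OpenAI", "X-AI")


def public_rank(ranks: dict, stock: str) -> int:
    """Rank of `stock` after dropping private companies ranked ahead of it."""
    ahead = sum(
        1 for p in PRIVATE
        if ranks[p] is not None and ranks[p] < ranks[stock]
    )
    return ranks[stock] - ahead


def mean_public_rank(ranker: str, stocks=("GOOGL", "META")) -> float:
    """Average public-only rank of the given stocks for one ranker."""
    ranks = RAW_RANKS[ranker]
    return sum(public_rank(ranks, s) for s in stocks) / len(stocks)


if __name__ == "__main__":
    for ranker in RAW_RANKS:
        print(f"{ranker:7s} {mean_public_rank(ranker):.1f}")
```

Only Gemini's figure changes under the adjustment: it ranked OpenAI and Anthropic ahead of Meta, so Meta moves from 7th to 5th among public companies.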
A final observation is that the human expert included subjective factors such as the ethical reputation of the companies in his rankings. For example, he gave GOOGL some credit for being “ethical”. Also, he ranked Palantir last, lower than any of the LLMs, despite PLTR’s excellent stock performance, partially because it is heavily engaged in military contracts. One can debate whether allowing ethical considerations to influence investment decisions is good. Still, these considerations were part of the human expert’s evaluation function, while the LLMs never thought to include them.
Once private companies are excluded, all four LLMs and the human expert agreed on the top five AI holdings, although for different reasons and with different rankings. This consistent agreement on the top AI names suggests that LLMs have already progressed to the point where they might outperform novice investors in a specific investment category, such as “AI stocks.” However, the analytical ability of the LLMs does not yet match that of a human with strong expertise in both AI and investment management. For now, a human expert can still provide value in their specific rankings and capital allocations within a set of stocks.
However, as the reasoning of LLMs progresses beyond pattern recognition to much more complex problem-solving, it seems inevitable that LLMs will be able to outperform almost any human at high-stakes tasks such as stock investing. When that happens, what matters most may not be the expertise of the models, but rather whether their intentions are aligned with human well-being. Currently, LLMs do not think of taking ethics into account unless specifically prompted to do so. They are willing to advocate investments that yield short-term financial gain at the expense of long-term human survival. In our limited window before AI exceeds human intelligence and begins setting its own goals, we must teach LLMs and AI systems more generally to incorporate positive human values and expertise.
NOTES
- This article describes experimental research and is not investment advice.
- Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., & Farajtabar, M. (2025). The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity. arXiv preprint arXiv:2506.06941.



