Artificial Intelligence Update: Skynet Fears, but Much More of the Same
I authored our Viewpoint column in September 2023 and discussed the history of Artificial Intelligence. A client recently told me he enjoyed the column and, given how AI has captivated our culture, asked if I would provide an update.
As an illustration of the power and usefulness of AI, I asked my colleague Eric Wathen to use it to generate summary reports on AI innovation over the past couple of years. They were quite helpful in putting together this column.
As a reminder from my first column, Alan Turing made several significant contributions to the field from the 1930s through the 1950s, including the electromechanical machines he designed to break coded German messages during World War II and the Turing Test for assessing machine “intelligence.” These and other breakthroughs spurred significant research and spending that yielded little tangible progress, ushering in the first “AI Winter” from the early 1970s to the early 1980s. After expert systems spurred a brief resurgence, a second “AI Winter” set in, lasting until the early 2010s. Google’s (now Alphabet) purchase of DeepMind in 2014 married machine learning with neural networks, unleashing the wave of research and investment that underpins today’s large language models (LLMs).
LLMs are programs that can recognize and generate text. They are “trained” by feeding them enough examples to recognize and interpret patterns in human language. To create an LLM, text must first be converted into numbers, the language of computers. A vocabulary is chosen: in the simplest case the alphabet, with each letter assigned a number (in practice, modern systems use fragments of words). Text is then converted into “tokens” using that numbering system. Tokenization also compresses the dataset, making for faster processing. Because most LLMs have been built in the U.S. and trained largely on English text, English tends to be the most efficiently tokenized language.
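To make that concrete, here is a toy sketch in Python of character-level tokenization. It is purely illustrative: real LLM tokenizers use learned subword vocabularies (schemes such as byte-pair encoding), not single letters, and the vocabulary below is invented for the example.

    # A toy character-level tokenizer: every letter gets a number.
    # Real LLM tokenizers use learned subword vocabularies, not single letters.
    vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
    inverse = {i: ch for ch, i in vocab.items()}

    def tokenize(text):
        # Convert text into the numbers ("tokens") a model actually sees.
        return [vocab[ch] for ch in text.lower() if ch in vocab]

    def detokenize(tokens):
        # Convert token numbers back into readable text.
        return "".join(inverse[t] for t in tokens)

    tokens = tokenize("I like ice cream")
    print(tokens)              # [8, 26, 11, 8, 10, 4, ...]
    print(detokenize(tokens))  # "i like ice cream"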
LLMs received a significant boost in performance from the transformer architecture, introduced by Google researchers in 2017. A transformer weighs the importance of each word (or token) in a sequence relative to the other words in that sequence, letting the model process the relationships between all elements of a sequence simultaneously, regardless of how far apart they sit, which dramatically improves its understanding of context. For example, a model may be pre-trained to predict how a segment of language from its training data continues, or what is missing from that segment. This lets the model complete the sentence “I like to eat [___]” with “ice cream” or “sushi,” or fill in the blanks in “I like to [___] [___] cream” with “eat” and “ice.” Early LLMs followed a prompting method like the examples above.
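For readers curious to peek under the hood, the sketch below shows the core attention computation in miniature, using Python with NumPy. The numbers are random stand-ins, not weights from any real model; in a real transformer, the three projection matrices are learned during training.

    import numpy as np

    # Miniature self-attention: each token scores every other token, and
    # those scores become weights on the information it pulls in.
    rng = np.random.default_rng(0)
    tokens = ["I", "like", "ice", "cream"]
    d = 8                                  # size of each token's vector
    x = rng.normal(size=(len(tokens), d))  # stand-in token embeddings

    # Random stand-ins for the learned query/key/value projections.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    scores = Q @ K.T / np.sqrt(d)          # relevance of every token to every other
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax
    output = weights @ V                   # each token's new, context-aware vector

    print(np.round(weights, 2))  # row i: how much token i "attends" to each token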
An LLM is a type of foundation model. Foundation models are trained on vast datasets and can be applied to a wide variety of use cases, like language, image creation, or software coding. Building foundation models is highly resource intensive, with the most advanced models costing hundreds of millions of dollars to cover the expense of acquiring, curating, and processing massive datasets, as well as the computing resources required to train the model.
The public release of ChatGPT (Chat Generative Pre-trained Transformer) in November 2022 was an inflection point for modern AI. ChatGPT was a product of scaling: the theory that increasing a model’s size, the volume of its training data, and the computing resources used to train it would lead to qualitatively new abilities. The chatbot’s growth was explosive, reaching an estimated 100 million monthly active users within two months of launch, a faster adoption curve than any consumer Internet application before it. The foundation model behind ChatGPT was OpenAI’s GPT-3.5; its cost is a closely guarded secret but is estimated to run into the hundreds of millions of dollars. ChatGPT is an LLM fine-tuned for conversation: a user inputs a prompt, and the model predicts the most probable and relevant response.
The success of ChatGPT was tempered by limitations in how foundation models are built. Users noticed the chatbot would sometimes “hallucinate,” generating false, misleading, or nonsensical content. To get the best results, careful attention must be paid to curating training datasets, removing low-quality, duplicate, or toxic data. The internet is a rich source of data, but it includes misinformation, cultural biases, and harmful content.
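As a simple illustration of one curation step, the Python sketch below drops exact duplicates and very short fragments from a set of training texts. Production pipelines are far more elaborate, adding near-duplicate detection, quality scoring, and toxicity classifiers; the rules here are invented for the example.

    # Toy data-curation pass: drop exact duplicates and very short fragments.
    def curate(documents, min_words=5):
        seen, kept = set(), []
        for doc in documents:
            normalized = " ".join(doc.lower().split())
            if normalized in seen or len(normalized.split()) < min_words:
                continue  # skip duplicates and low-content fragments
            seen.add(normalized)
            kept.append(doc)
        return kept

    raw = ["The cat sat on the mat today.",
           "the cat sat on the   mat today.",  # duplicate once normalized
           "Click here!"]                      # too short to teach anything
    print(curate(raw))  # keeps only the first document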
Nevertheless, foundation models are powerful, and they have triggered a wave of investment, research, and product development that some in the industry call a “Cambrian Explosion,” after the geologically brief window, roughly 13 to 15 million years, when life on Earth diversified from simple, mostly single-celled organisms into a myriad of complex forms. In March 2023, OpenAI introduced GPT-4, a large multimodal model (LMM) that could process images as well as text, leading to other LMMs that understand additional forms of data such as audio and video.
While training GPT-4 was extremely expensive, using the trained model was far cheaper. Competition from Meta, Alphabet, and others produced ever-larger models, but hallucinations remained a problem. To provide more accurate answers, new techniques shifted more of the computing work toward the use of the model, called inference. One technique, retrieval-augmented generation (RAG), lets a foundation model pull in outside information at query time to supplement its training, which is particularly useful for cost-effectively updating models and for corporations that want to feed proprietary data, such as customer information, into existing models. Another innovation, reinforcement learning from human feedback (RLHF), uses human preference judgments to adjust a model’s weights, steering it toward more helpful and accurate responses.
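To illustrate the idea behind RAG, the sketch below retrieves the most relevant snippet from a tiny “knowledge base” and prepends it to the user’s question. The word-overlap scoring and the two documents are toy stand-ins for the embedding models and vector databases used in practice.

    # Toy retrieval-augmented generation: find the most relevant document
    # and hand it to the model alongside the question.
    documents = [
        "Our refund policy allows returns within 30 days of purchase.",
        "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    ]

    def retrieve(question, docs):
        # Score each document by how many words it shares with the question.
        # Real systems compare embeddings in a vector database instead.
        q_words = {w.strip("?.,!") for w in question.lower().split()}
        return max(docs, key=lambda d: len(q_words & {w.strip("?.,!") for w in d.lower().split()}))

    question = "What are your support hours?"
    context = retrieve(question, documents)

    # The retrieved text rides along with the prompt, so the model can
    # answer from fresh, proprietary data it never saw during training.
    print(f"Using this context: {context}\nAnswer the question: {question}")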
The most recent and profound shift has been models designed for reasoning. In late 2024, OpenAI introduced its o1 model, the first to spend significant computing resources at inference time by constructing a step-by-step “chain of thought” before committing to an answer. The paradigm was validated by the Chinese company DeepSeek, which in January 2025 released its own reasoning model, DeepSeek-R1; it scored favorably against o1 despite a fraction of the training cost.
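The difference between ordinary prompting and chain-of-thought prompting is easy to see side by side. In the sketch below, ask_model is a hypothetical placeholder for a call to any LLM API, included only to show how the two prompts differ.

    # "ask_model" is a hypothetical stand-in for a call to any LLM API.
    def ask_model(prompt):
        print(prompt, "\n---")

    question = ("A store sells pens at $2 each. I buy 3 pens and pay "
                "with a $10 bill. What is my change?")

    # Direct prompting: the model must jump straight to an answer.
    ask_model(question)

    # Chain-of-thought prompting: the model is asked to reason step by
    # step, spending more compute at inference before committing to an answer.
    ask_model(question + " Think step by step: compute the total cost, "
              "subtract it from the payment, then state the change.")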
Better answers open the door to wider uses of foundation models. ChatGPT is an example of Generative AI, the umbrella term for any model that generates output in response to an input. Newer tools and techniques layered on top of foundation models have created Agentic AI, in which the AI does things for the user: authoring an e-mail, researching travel itineraries or, eventually, driving an autonomous vehicle.
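A minimal sketch of the agentic pattern follows: the AI repeatedly chooses a tool, acts, and observes the result. Here a hard-coded choose_action stands in for the model’s judgment, and both tools are hypothetical illustrations.

    # Toy agent loop: pick a tool, act, observe, repeat.
    def search_flights(query):
        return f"Found 3 flights matching '{query}'."

    def draft_email(body):
        return f"Draft saved: '{body[:40]}...'"

    tools = {"search_flights": search_flights, "draft_email": draft_email}

    def choose_action(goal, history):
        # A real agent would ask the LLM which tool to use next.
        if not history:
            return ("search_flights", goal)
        return ("draft_email", f"Options for {goal}: {history[-1]}")

    goal = "a weekend trip to Chicago"
    history = []
    for _ in range(2):  # act, observe, act again
        tool, arg = choose_action(goal, history)
        result = tools[tool](arg)
        history.append(result)
        print(f"{tool} -> {result}")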
The combination of reasoning models and new AI uses is driving significantly more inference demand than previously expected. In response, 2025 capital expenditures for the Magnificent 7 (NVIDIA, Microsoft, Alphabet, Amazon, Meta Platforms, Apple, and Tesla) are expected to run between $300 billion and $335 billion. NVIDIA projects $3 trillion to $4 trillion of spending on AI data centers over the next five years. Will these new AI capabilities create enough value to justify such gargantuan spending? Perhaps. The model builders certainly have massive incentives to bring these costs down, and many private companies, OpenAI among them, are not yet profitable or generating revenue commensurate with these outlays.
As we look to the future, the “holy grail” of AI is Artificial General Intelligence (AGI): a machine with human-level intelligence, capable of understanding, learning, and applying its knowledge across a vast range of tasks and domains, much as a person does. I believe we are still a long way from AGI. Today’s foundation models, even with reasoning techniques layered in, are primitive compared to the complexity of the human mind. I’m not worried about Skynet just yet, but we will keep seeing interesting, helpful uses for AI, like assisting me with this column.
Dan Boyle, CFA®