Unlocking AI's Potential: Exploring the Cognitive Limits of Large Language Models

Have you ever wondered how smart artificial intelligence (AI) really is?

Can it understand what you are saying, or is it just repeating what it has read on the internet?

Can it solve logical puzzles that require reasoning and awareness, or is it just following a set of rules?

These are some of the questions that researchers at the Bank for International Settlements (BIS) tried to answer in a recent bulletin.

They challenged one of the most advanced AI systems, called GPT-4, with a famous logic puzzle known as Cheryl’s birthday.

In this blog post, we will explain what GPT-4 is, how it works, and what the BIS researchers found out about its cognitive abilities. We will also discuss the implications of their findings for the future of AI and its applications in central banking and beyond.

Read my other posts here: Conventional Finance - FinFormed, Islamic Finance - FinFormed, Takaful - FinFormed, Career - FinFormed and Random Writings - FinFormed

What is GPT-4 and how does it work?

GPT-4 is a large language model (LLM) that can generate text on almost any topic, given some input words or sentences. It was developed by OpenAI, a research organization dedicated to creating and promoting ethical AI.

GPT-4 is based on a deep neural network, a type of machine learning model loosely inspired by the structure of the human brain. OpenAI has not disclosed how many parameters (the numerical values that determine how the network processes information) GPT-4 has, but its predecessor, GPT-3, had about 175 billion. To put such numbers in perspective, the human brain has about 86 billion neurons, the cells that transmit signals in the brain.

GPT-4 was trained on a massive amount of text data from the internet, such as books, news articles, social media posts, and Wikipedia pages. It learned to predict the next word or sentence, given the previous ones, by analyzing the patterns and relationships in the data. For example, if the input is “The capital of France is”, GPT-4 would likely generate “Paris” as the next word.
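To make the idea of next-word prediction concrete, here is a toy Python sketch. It is purely illustrative: a real LLM scores tens of thousands of sub-word tokens with a neural network, not a hand-written lookup table, but the basic step of "pick the most probable continuation" is the same.

```python
# Toy illustration of next-word prediction.
# A real LLM computes these probabilities with a neural network over
# sub-word tokens; here we fake them with a small hand-written table.
toy_probabilities = {
    "The capital of France is": {"Paris": 0.92, "Lyon": 0.03, "a": 0.02},
    "The capital of Japan is": {"Tokyo": 0.95, "Kyoto": 0.02, "Osaka": 0.01},
}

def predict_next_word(prompt: str) -> str:
    """Return the most probable next word for a prompt we have scores for."""
    candidates = toy_probabilities.get(prompt, {})
    if not candidates:
        return "<unknown>"
    return max(candidates, key=candidates.get)

print(predict_next_word("The capital of France is"))  # -> Paris
```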

GPT-4 can also perform a wide range of natural language tasks, such as answering questions, summarizing texts, writing essays, translating between languages, and even solving mathematical problems. It does so using a technique called "few-shot learning": rather than being retrained for each new task, it adapts on the fly just by seeing a few examples of the task in its prompt.

For instance, if you want GPT-4 to write a poem, you can simply give it a few lines of poetry as input, and it will generate more lines that follow the same style and rhyme scheme. Or, if you want GPT-4 to write code, you can give it a few lines of code as input, and it will generate more code that performs the same or similar function.
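As a rough illustration of what a few-shot prompt looks like in practice, here is a sketch using the OpenAI Python client's chat interface. The model name, the example prompt, and the assumption that an API key is set in the environment are all illustrative choices, not details from the BIS study.

```python
# Sketch of few-shot prompting, assuming the OpenAI Python client (openai>=1.0)
# and an OPENAI_API_KEY available in the environment.
from openai import OpenAI

client = OpenAI()

# Two worked examples, then a new case for the model to complete in the same style.
few_shot_prompt = """Rewrite each sentence as a rhyming couplet.

Sentence: The cat sat on the mat.
Couplet: The cat sat down upon the mat, / and there it stayed, content and fat.

Sentence: The rain fell on the town.
Couplet:"""

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name; use whichever GPT-4 variant you have access to
    messages=[{"role": "user", "content": few_shot_prompt}],
)

print(response.choices[0].message.content)
```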

What is Cheryl’s birthday puzzle and why is it hard to solve?

Cheryl’s birthday puzzle is a logic puzzle that went viral in 2015 and has its own Wikipedia page. It goes like this:

Cheryl has set her two friends Albert and Bernard the task of guessing her birthday. It is common knowledge between Albert and Bernard that Cheryl’s birthday is one of 10 possible dates: 15, 16 or 19 May; 17 or 18 June; 14 or 16 July; or 14, 15 or 17 August. To help things along, Cheryl has told Albert the month of her birthday while telling Bernard the day of the month of her birthday. Nothing else has been communicated to them. As things stand, neither Albert nor Bernard can make further progress. Nor can they confer to pool their information. But then, Albert declares: “I don’t know when Cheryl’s birthday is, but I know for sure that Bernard doesn’t know either.” Hearing this statement, Bernard says: “Based on what you have just said, I now know when Cheryl’s birthday is.” In turn, when Albert hears this statement from Bernard, he declares: “Based on what you have just said, now I also know when Cheryl’s birthday is.” Question: based on the exchange above, when is Cheryl’s birthday?

This puzzle is hard to solve because it requires several cognitive skills that are not easy for machines (or humans) to master. These skills include:

  • Reasoning about knowledge: The puzzle involves statements about what Albert and Bernard know or do not know, based on what Cheryl told them and what they said to each other. To solve the puzzle, one has to keep track of the different levels of knowledge and how they change over time.
  • Reasoning about counterfactuals: The puzzle involves statements about what would happen if something were true, even though it is not. For example, when Albert says he knows that Bernard does not know, listeners can rule out any month containing a day that would have let Bernard identify the birthday immediately. To solve the puzzle, one has to imagine different possible worlds and how they relate to the actual world.
  • Reasoning about logic: The puzzle involves statements that follow logical rules and principles, such as deduction, elimination, and inference. To solve the puzzle, one has to apply these rules and principles correctly and consistently, as the short sketch below makes concrete.
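To make the elimination logic concrete, here is a short Python sketch (our own illustration, not taken from the BIS bulletin) that solves the puzzle by filtering the ten candidate dates through the three statements in turn.

```python
from collections import Counter

# The ten candidate dates from the puzzle, as (month, day) pairs.
dates = [("May", 15), ("May", 16), ("May", 19),
         ("June", 17), ("June", 18),
         ("July", 14), ("July", 16),
         ("August", 14), ("August", 15), ("August", 17)]

def unique_days(candidates):
    """Days that appear exactly once among the candidate dates."""
    counts = Counter(day for _, day in candidates)
    return {day for day, n in counts.items() if n == 1}

# Statement 1: Albert (who knows the month) is sure Bernard doesn't know.
# So Albert's month cannot contain any day that is unique across all dates.
telltale_days = unique_days(dates)
ruled_out_months = {m for m, d in dates if d in telltale_days}
step1 = [(m, d) for m, d in dates if m not in ruled_out_months]

# Statement 2: Bernard (who knows the day) now knows the birthday.
# So his day must appear exactly once among the remaining candidates.
step2 = [(m, d) for m, d in step1 if d in unique_days(step1)]

# Statement 3: Albert now knows too.
# So his month must appear exactly once among the remaining candidates.
month_counts = Counter(m for m, _ in step2)
step3 = [(m, d) for m, d in step2 if month_counts[m] == 1]

print(step3)  # -> [('July', 16)]
```

Running the sketch leaves a single candidate, ('July', 16), which matches the widely published answer of 16 July.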

How did GPT-4 perform on Cheryl’s birthday puzzle?

The BIS researchers posed Cheryl's birthday puzzle to GPT-4 three times, using the well-known 2015 wording. After each run, they cleared the model's memory and started a fresh session. GPT-4 performed flawlessly on all three runs, laying out the solution with great fluency and clarity.

They also found that GPT-4 was able to paraphrase the puzzle and the solution in different ways, without losing the meaning or the logic. This suggested that GPT-4 had some degree of understanding and flexibility in dealing with natural language.

However, the researchers also noticed that GPT-4 relied heavily on the familiarity of the wording of the puzzle, rather than on the underlying structure and logic. When they changed some incidental details of the puzzle, such as the names of the characters or the months, GPT-4’s performance deteriorated dramatically.

GPT-4 made several mistakes and logical errors in its reasoning and often gave incorrect or inconsistent answers. It also showed signs of “muscle memory”, by mentioning the original names and months, even though they were not part of the new puzzle.

The researchers concluded that GPT-4’s apparent mastery of the logic was superficial and fragile, and that the model lacked the true understanding and awareness needed to solve the puzzle in a robust and generalizable way.

What are the implications of the findings for the future of AI and its applications?

The findings of the BIS researchers have important implications for the future of AI and its applications, especially in central banking and other domains that require rigorous and careful economic reasoning.

On the one hand, the findings show the remarkable progress and potential of large language models, such as GPT-4, in generating natural language and performing various tasks. These models can be useful and powerful tools for data management, macro analysis, and regulation/supervision, as well as for scientific and creative applications.

On the other hand, the findings also reveal the limitations and challenges of large language models, such as GPT-4, in engaging in true reasoning and understanding. These models may not be able to handle complex and novel problems that demand logic, awareness, and counterfactual thinking, as well as tacit knowledge that goes beyond language.

Therefore, the researchers suggest that caution should be exercised in deploying large language models in contexts that necessitate careful and rigorous economic reasoning. They also suggest that more research and development are needed to overcome the cognitive limits of large language models and to achieve artificial general intelligence.

Summary

In this blog post, we learned:

  • GPT-4 is a large language model that can generate text on almost any topic, given some input words or sentences.
  • Cheryl’s birthday puzzle is a logic puzzle that requires reasoning about knowledge, counterfactuals, and logic.
  • GPT-4 performed flawlessly on the original wording of the puzzle, but failed on the modified wording, suggesting a lack of true understanding and awareness.
  • The findings have implications for the future of AI and its applications, especially in central banking and other domains that require rigorous and careful economic reasoning.

P.S. We’d love to hear your thoughts on this topic! What do you think about the potential and limitations of large language models like GPT-4? Do you see any other applications for these models in your field? Share your insights in the comments below!

Disclaimer: The views expressed in this blog are not necessarily those of the blog writer and his affiliations and are for informational purposes only.

Follow us on social media @LinkedIn

If you found this blog post insightful, don’t forget to subscribe to our website for more updates on Finance. Your subscription will help us continue to bring you the latest insights into the world of finance. And if you think this post could benefit others, please feel free to share it. Let’s spread the knowledge together!

Sign up to FinFormed here