Understanding Large Language Models (LLMs): Scaling, Applications & The Future (Part 2)
Highlights: Large Language Models (LLMs) have become essential tools in Natural Language Processing, powering applications from chatbots to complex data analysis.
This is Part 2 of our earlier post, where we discussed Meta's LLM, LLaMA (Large Language Model Meta AI), and learned how LLaMA and models like ChatGPT are trained. In today's post, we'll continue our discussion of LLMs, touching upon scaling laws, real-life applications of LLMs, the concept of System 1 and System 2 thinking, some of the limitations LLMs face today, and the ongoing research that is shaping this technology's future. So let's begin!
Tutorial Overview:
- Scaling Laws in LLMs
- Real-World Applications of LLMs
- Multimodal Capabilities: Vision & Language Models
- System 1 vs System 2 Thinking
- Self-Improvement in AI: Lessons from AlphaGo
- Custom LLMs: The GPTs App Store by OpenAI
- The Emergence of the LLM Operating System (LLM OS)
1. Scaling Laws in LLMs
One of the fascinating aspects of Large Language Models (LLMs) is a concept known as scaling laws. Scaling laws reveal that the performance of these models is governed remarkably well by just two quantities: \(N\), the number of parameters in the network, and \(D\), the amount of text data used for training. These two variables play a crucial role in determining how well an LLM performs, essentially giving us a roadmap for improving model capabilities.
What’s particularly exciting is that these scaling curves do not show signs of “saturation” or “topping out,” meaning that model performance continues to improve as we increase the size of the network and the amount of data. This observation implies that, even without changing hardware architectures, simply increasing the model size and feeding it more data can lead to substantial improvements in performance.
Have a look at the graph below.
The graph above illustrates how model size correlates with loss, a measure of prediction error. As model size increases, loss decreases, indicating better performance. This means we can expect more “intelligence” from models as we scale up, effectively achieving better results “for free” through scaling—provided we have access to sufficient computational resources and data. With advancements in hardware and data availability, these scaling laws promise continued progress in the capabilities of LLMs, paving the way for more powerful and versatile AI applications.
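To make the shape of these scaling curves concrete, here is a minimal sketch of a Chinchilla-style loss law, \(L(N, D) = E + A/N^{\alpha} + B/D^{\beta}\). The constants below are illustrative placeholders, not fitted values from any paper:

```python
# Sketch of a Chinchilla-style scaling law. The constants are
# illustrative placeholders, NOT fitted values.

def scaling_loss(n_params, n_tokens,
                 E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
    """Predicted training loss L(N, D) for a model with n_params
    parameters trained on n_tokens tokens of text."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Loss falls smoothly as either N or D grows: no saturation point.
small = scaling_loss(1e9, 2e10)     # ~1B params, ~20B tokens
large = scaling_loss(7e10, 1.4e12)  # ~70B params, ~1.4T tokens
assert large < small
```

The irreducible term \(E\) caps how low the loss can go in principle, but within practical ranges both power-law terms keep shrinking as \(N\) and \(D\) grow, which is exactly the "no topping out" behavior the curves show.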
More Intelligence, Better Performance
Empirical evidence suggests a strong correlation between model size, training duration, and the accuracy of Large Language Models (LLMs) across various tests and evaluations. As we increase model size and training data, moving from models like GPT-3.5 to GPT-4, we consistently see performance improvements across a wide range of assessments. This scaling effect implies that by merely expanding the model’s size, training on more data, and providing more computational resources, we can almost “for free” boost the model’s accuracy in diverse areas.
The chart above demonstrates how GPT-3.5 and GPT-4 perform on a variety of standardized tests, from AP Calculus and AP English Literature to GRE Writing and SAT exams. Notably, GPT-4 consistently outperforms GPT-3.5, underscoring the scalability advantage in language models. This trend explains the ongoing “Gold Rush” in AI, where companies are investing heavily in larger GPU clusters and more extensive datasets, driven by confidence in scaling as a reliable path to creating more capable models.
Scaling remains one of the most promising strategies for advancing AI, as it allows for predictable gains in capability across domains. This approach doesn’t necessarily require groundbreaking innovations in architecture; instead, it relies on expanding computational power and data availability, providing a straightforward yet effective route to more intelligent and versatile language models.
2. Real-World Applications of LLMs
To illustrate the evolving capabilities of Large Language Models, let’s look at a practical example involving information retrieval. In this scenario, I asked ChatGPT to collect information about Scale AI’s funding rounds, including dates, amounts raised, and valuations, and organize it into a table.
Through its training and fine-tuning, ChatGPT understands that for queries like this, it should not rely solely on pre-trained knowledge. Instead, it uses external tools, such as a web browser, to gather real-time information. Just as a human would, the model initiates a search, processes the results, and synthesizes the data into a structured format.
In this case, ChatGPT used Bing Search to look up relevant funding data and compiled it into a table listing Series A through E funding rounds, with details on dates, amounts raised, and valuations. For Series A and B, it could only find the amounts raised, and it informed the user that the valuation data was unavailable. This transparency and accuracy-checking, including citations and source references, help ensure the reliability of the response.
This example demonstrates how language models are evolving beyond simple text generation. With integrated tools like browsing, they now mimic human research behaviors, enhancing their capacity to provide accurate, up-to-date information for practical applications. This shift from static knowledge to dynamic tool usage marks a significant advancement in the versatility and utility of large language models in real-world tasks.
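The retrieval workflow described above can be sketched as a simple decide, search, and synthesize loop. Everything here (the `web_search` stub, the routing rule, and the returned rows) is a hypothetical stand-in, not OpenAI's actual implementation:

```python
# Toy sketch of the tool-use loop: decide whether external data is
# needed, call a tool, then synthesize the results into a table.

def web_search(query):
    # Stand-in for a real search backend (e.g. the Bing tool ChatGPT used).
    return [{"round": "Series A", "raised": "$4.5M"},
            {"round": "Series B", "raised": "$18M"}]

def answer(query):
    # 1. Decide whether the query needs fresh, external data.
    needs_tool = any(word in query.lower() for word in ("funding", "latest"))
    if not needs_tool:
        return "answered from pre-trained knowledge"
    # 2. Call the tool, then 3. synthesize results into a structured form.
    rows = web_search(query)
    return "\n".join(f"{r['round']}: {r['raised']}" for r in rows)

print(answer("Collect Scale AI funding rounds"))
```

Real systems let the model itself emit the tool call and read back the results, but the control flow is the same: model output routes to a tool, and the tool output flows back into the model's context.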
3. Multimodal Capabilities: Vision & Language Models
A major advancement in LLMs is their growing multimodal capability—the ability to work with both text and images. This capability allows models not only to generate images but also to understand and interpret them.
In a demonstration by Greg Brockman, one of the founders of OpenAI, ChatGPT was shown an image of a simple hand-drawn sketch for a website labeled “My Joke Website.” Remarkably, ChatGPT could interpret the sketch and generate the corresponding HTML and JavaScript code to create a functional website. The resulting website allowed users to read jokes and click a button to reveal punchlines, demonstrating how the model can translate visual information into actionable code.
This example highlights the practical power of multimodality in LLMs, as models like ChatGPT can now take input in multiple forms, including images alongside text. This means users can input sketches, diagrams, or other visual references, and the model can produce relevant responses based on those images.
As multimodality continues to evolve, these models are likely to expand further into audio and other data types, enabling a richer and more versatile human-computer interaction. This progress makes it possible to seamlessly integrate text, images, and potentially audio into AI applications, vastly expanding the range of tasks language models can accomplish.
4. System 1 vs System 2 Thinking
The concept of System 1 and System 2 thinking was popularized by Daniel Kahneman’s book Thinking, Fast and Slow. It describes two distinct modes of thinking within the human brain:
- System 1 Thinking: This is the quick, instinctive, and automatic part of our cognition. It is emotional and unconscious, allowing us to make rapid decisions with minimal effort. For example, answering “What is 2 + 2?” doesn’t require actual calculation because it’s cached knowledge. System 1 operates similarly in situations like speed chess, where instinctive moves are based on pattern recognition without deliberate analysis.
- System 2 Thinking: This is the slower, deliberate, and rational part of our brain. It’s used for complex decision-making and logical reasoning, requiring conscious effort. For example, solving “What is 17 x 24?” involves a step-by-step calculation, engaging a more focused and conscious process. In chess competitions, System 2 thinking involves analyzing each possible move deeply and creating a decision tree of outcomes.
Limitations of LLMs based on System 1 Thinking
LLMs, such as ChatGPT, primarily operate with a System 1-like approach. They rely on fast, instinctive predictions based on prior patterns without deep reasoning. When prompted, the model generates words in a sequence, simply predicting the next word based on context. This process is much like the cartoon image shown below, where the character is laying down tracks as he moves forward—LLMs “lay down” words one at a time, predicting each based on the prior sequence.
Currently, these models lack System 2 capabilities. They don’t analyze, reason through complex problems, or create a “tree of possibilities.” Each word prediction is immediate, with each step taking roughly the same time, regardless of the complexity of the task.
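A toy bigram model makes this "one word at a time" behavior concrete. Real LLMs use a neural network over a long context, but the generation loop below has the same System 1 character: each step is a single fast lookup, with no planning ahead:

```python
# Toy sketch of System 1-style generation: each step picks the next
# token from counts of what followed the current token in training
# text, with no lookahead or planning.

from collections import Counter, defaultdict

def train_bigram(text):
    follows = defaultdict(Counter)
    words = text.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1
    return follows

def generate(follows, start, steps):
    out = [start]
    for _ in range(steps):
        nxt = follows[out[-1]]
        if not nxt:
            break
        # Greedy "instinctive" choice: the single most likely next word.
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

model = train_bigram("the model lays down words one at a time "
                     "the model predicts the next word")
print(generate(model, "the", 4))
```

Note that every step costs the same amount of work regardless of how hard the question is, which is precisely the limitation described above.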
The Future of LLMs Based on System 2 Thinking
There is growing interest in expanding LLM capabilities to include System 2-like thinking. This would involve adding reasoning and deliberation functions, enabling the model to handle complex problem-solving tasks with a structured approach, similar to how humans tackle difficult calculations or strategic game moves. Adding System 2 to LLMs could enable them to go beyond simple pattern matching, allowing them to analyze and reason about information in a more human-like way.
The ‘Tree of Thoughts’ Approach in System 2 Thinking
Current language models like ChatGPT primarily function with System 1 thinking, which is rapid, intuitive, and often automatic. However, researchers are exploring ways to enable System 2 thinking in these models—an approach that involves deliberate, reflective problem-solving. One promising concept is the “Tree of Thoughts” (ToT) method, inspired by how humans tackle complex tasks.
In the Tree of Thoughts model, instead of generating a response immediately, the language model would map out possible thought paths (like a tree in chess decision-making) and evaluate different approaches before producing an answer. This would allow the model to reflect, rephrase, and iterate through possible solutions, ultimately arriving at a more confident and accurate response.
Imagine a time-accuracy graph, where time (x-axis) is used as a resource to increase accuracy (y-axis). With current language models, accuracy doesn’t significantly improve with more time. However, a model that can think through problems, using additional time for reflection, could yield a response with higher accuracy and reliability. Though this capability isn’t yet available, it’s an area of active research. Achieving System 2 thinking in language models would mark a significant advancement, enabling AI to handle complex, multi-step reasoning tasks and produce well-considered responses, rather than quick predictions.
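Under the hood, a Tree of Thoughts search can be sketched as a small beam search over candidate "thoughts." The `propose` and `score` functions below are toy stand-ins (in a real system both would themselves be LLM calls), and the digit-sum task is purely illustrative:

```python
# Hedged sketch of the Tree of Thoughts idea: instead of committing to
# one answer immediately, branch into candidate "thoughts", score each,
# and expand only the most promising ones (a small beam search).

def propose(partial):
    # Stand-in for "LLM proposes next thought": try appending a digit.
    return [partial + [d] for d in range(10)]

def score(partial, target):
    # Stand-in for "LLM evaluates a thought": closeness to the target sum.
    return -abs(target - sum(partial))

def tree_of_thoughts(target, depth=3, beam=2):
    frontier = [[]]
    for _ in range(depth):
        candidates = [c for p in frontier for c in propose(p)]
        candidates.sort(key=lambda c: score(c, target), reverse=True)
        frontier = candidates[:beam]          # keep only the best branches
    return frontier[0]

print(tree_of_thoughts(target=17))
```

Spending more time here means a deeper or wider search, which is how extra compute could buy extra accuracy on the time-accuracy graph described above.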
5. Self-Improvement in AI: Lessons from AlphaGo
The development of AlphaGo by DeepMind offers a fascinating blueprint for AI self-improvement. AlphaGo, a program designed to master the game of Go, underwent two major stages in its learning:
- Imitation Learning: In its first stage, AlphaGo learned by imitating expert human players. DeepMind trained the model on a large dataset of games played by highly skilled players, teaching it to replicate the strategies of top human competitors. This approach enabled AlphaGo to become a strong player, but it was limited by the quality of human data—its performance was capped at the level of the best human players.
- Self-Improvement through Self-Play: To surpass human limitations, DeepMind introduced a second phase where AlphaGo engaged in self-play. By playing millions of games against itself in a closed, controlled environment with a clear reward function (winning the game), AlphaGo could evaluate its strategies and improve iteratively. This reward-driven approach allowed it to surpass human players and eventually become the best Go player in the world, achieving new strategies beyond human reach.
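The structure (though certainly not the scale) of that self-play loop can be sketched as follows. The one-dimensional "policy," the reward function, and the mutation step are all toy assumptions; the point is that only a reward signal, not human data, drives the improvement:

```python
# Toy sketch of the self-play/self-improvement loop: no human data,
# only a reward signal. A "policy" parameter is mutated, challengers
# are compared against the current best, and the winner is kept.

import random

def reward(policy):
    # Stand-in for "win rate": here, closeness to an unknown optimum.
    return -abs(policy - 0.73)

def self_improve(rounds=200, seed=0):
    rng = random.Random(seed)
    best = 0.0                                   # arbitrary starting policy
    for _ in range(rounds):
        challenger = best + rng.uniform(-0.1, 0.1)   # self-play variation
        if reward(challenger) > reward(best):        # clear reward criterion
            best = challenger                    # keep the stronger player
    return best

final = self_improve()
assert reward(final) > reward(0.0)               # improved beyond the start
```

The loop works only because `reward` is cheap, automatic, and objective, which is exactly the ingredient that open-domain language lacks, as the next section discusses.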
The Challenge of Self-Improvement in LLMs
The success of AlphaGo’s self-improvement stage raises an intriguing question: What would self-improvement look like for LLMs? Currently, LLMs are trained primarily through imitation learning. Human labelers create training data, and the models learn by replicating human responses. However, this imitation approach limits the model to human-level accuracy.
The major challenge in implementing a self-improvement phase for LLMs lies in the lack of a clear reward criterion. Language is an open domain with a vast range of tasks, and there is no straightforward, universally applicable reward function like “winning the game” in Go. Without an automatic and objective way to determine if a generated response is “good” or “bad,” it becomes difficult for LLMs to improve beyond human-like responses.
That said, in narrow domains where specific reward criteria can be defined, self-improvement may be achievable for language models. However, developing a general self-improvement mechanism for open-domain language modeling remains an open question in the field. Researchers continue to explore how LLMs might evolve autonomously, but significant breakthroughs are needed to replicate the success of AlphaGo’s self-play in the complex, subjective world of language.
6. Custom LLMs: The GPTs App Store by OpenAI
Recently, Sam Altman announced the launch of a GPTs App Store at OpenAI's Dev Day. This initiative introduces a new layer of customization for large language models, allowing users to create their own specialized versions of GPT tailored to specific needs.
Currently, there are two main ways to customize a GPT on ChatGPT:
- Custom Instructions: Users can specify detailed instructions to personalize how their custom GPT should behave, including what it should prioritize or avoid in responses.
- Knowledge Upload: Users can add files to provide the GPT with custom reference material. This process, called Retrieval-Augmented Generation (RAG), allows the model to retrieve and reference information from the uploaded documents to generate responses. Instead of browsing the internet, the custom GPT can “browse” these files to pull in relevant information, making it useful for tasks requiring specific, document-based knowledge.
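Here is a minimal sketch of that RAG flow, assuming plain word overlap as a stand-in for the embedding-based similarity that real systems use:

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG) for uploaded
# files: chunk the documents, retrieve the chunk most relevant to the
# question, and place it into the model's context.

def tokens(text):
    # Lowercase and strip basic punctuation for word-overlap matching.
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def retrieve(question, chunks):
    q = tokens(question)
    # Score each chunk by how many question words it shares.
    return max(chunks, key=lambda c: len(q & tokens(c)))

def build_prompt(question, chunks):
    context = retrieve(question, chunks)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

docs = ["The refund policy allows returns within 30 days.",
        "Shipping takes 5 business days within the US."]
print(build_prompt("What is the refund policy?", docs))
```

The model then answers from the retrieved context rather than from its pre-trained weights alone, which is what makes document-grounded answers possible.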
These customization options open up possibilities for creating GPTs that specialize in particular domains, rather than relying on a single, general-purpose model. In the future, this could expand further to include fine-tuning—where users train the model on their datasets, making it even more specialized and capable of handling nuanced tasks.
OpenAI’s GPTs App Store represents a step toward a modular ecosystem of language models, each one optimized for unique applications, thereby transforming how users interact with AI by allowing them to create task-specific, expert models tailored to individual needs.
7. The Emergence of the LLM Operating System (LLM OS)
Rather than viewing LLMs as simple chatbots or text generators, we can begin to think of them as the kernel of an emerging operating system—a foundational process that orchestrates multiple resources, tools, and memory for complex problem-solving. This LLM OS concept envisions an integrated environment where the language model becomes a central coordinator, handling a wide range of capabilities. Here’s how it might look in a few years.
- Core Abilities:
- Text Generation and Comprehension: LLMs can read and generate text across all subjects, potentially possessing more knowledge than any single human.
- Web and Local File Access: Through browsing and retrieval-augmented generation (RAG), they can pull in current information from the internet or user-provided files.
- Existing Software Integration: They could leverage conventional tools like calculators, Python interpreters, and command terminals, using them as needed for complex tasks.
- Multimodal Capabilities:
- Image, Video, and Music Generation: Beyond text, these LLMs could generate or interpret images, videos, and audio—allowing for versatile, creative outputs.
- Audio Interactivity: They could potentially hear and speak, creating a more interactive user experience.
- Advanced Reasoning (System 2 Thinking):
- Long-Term Reflection: By adopting System 2 thinking (as discussed in the “Tree of Thoughts” concept), they could engage in deeper reasoning over extended periods, processing complex problems more accurately.
- Self-Improvement: In specialized domains with clear reward criteria, LLMs could employ self-improvement techniques, iteratively refining their responses.
- Customization and Fine-Tuning:
- Specialized GPT Models: LLM OS could host a vast ecosystem of specialized GPTs or LLMs, each fine-tuned for specific tasks or industries. This resembles an app store filled with various AI experts, each optimized for particular applications.
- Memory and Resource Management:
- Context Window as RAM: Just like RAM in traditional computers, the context window in LLMs acts as their working memory. It’s a finite resource that holds relevant information for generating responses, requiring efficient paging of data in and out.
- Disk or Internet Access as Storage: Just as we store data on disks, the LLM OS can access external knowledge through browsing or local files, adding depth and context to responses.
- Equivalents to Modern OS Ecosystems:
- Proprietary LLM OS Platforms: Much like desktop OS options like Windows or macOS, we may see proprietary LLM platforms developed by different companies, each with its unique ecosystem and capabilities.
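The "context window as RAM" analogy can be sketched as a paging loop that evicts the oldest conversation turns when a token budget is exceeded. Counting tokens by whitespace words is a simplification of real tokenizers, and the budget here is arbitrary:

```python
# Sketch of "context window as RAM": the window is a fixed budget, so
# older conversation turns are paged out to make room for new ones.

def n_tokens(message):
    # Simplification: real tokenizers split text into subword tokens.
    return len(message.split())

def page_context(history, new_message, budget=20):
    """Keep the newest messages that fit in the token budget."""
    history = history + [new_message]
    while sum(n_tokens(m) for m in history) > budget and len(history) > 1:
        history.pop(0)          # evict the oldest turn, like paging out RAM
    return history

chat = ["hello there", "long earlier answer about many unrelated things " * 2]
chat = page_context(chat, "what is an LLM OS?")
print(chat)
```

Production systems often summarize evicted turns instead of dropping them outright, but the resource-management problem is the same: a finite working memory must be allocated to the most relevant information.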
This LLM OS vision hints at a future where language models act as sophisticated, multifunctional operating systems, coordinating resources and dynamically adjusting to user needs. The result could be a transformative ecosystem where individuals and businesses interact with specialized assistants and applications built upon a powerful, adaptable AI kernel. This paradigm shift would revolutionize how we think about computing, placing LLMs at the heart of a new digital ecosystem.
On that note, it’s time to wrap up. Let’s do a quick revision of this post.
- Scaling laws show that LLM performance is governed primarily by two factors: the number of parameters and the size of the training dataset
- Scaling up these two factors improves both the intelligence and the performance of LLMs across domains
- A key application of LLMs is information retrieval: gathering up-to-date information from the web, as models like ChatGPT do with integrated browsing tools
- LLMs are gaining multimodal capabilities, with the ability to interpret and generate images alongside text
- Currently, LLMs rely on only one of the two modes of thinking used by the human brain, i.e., System 1 thinking
- Researchers are working to equip LLMs with System 2 thinking: slow, deliberate, and rational decision-making instead of quick, automatic prediction
- A promising way to apply the System 2 approach is the 'Tree of Thoughts' concept
- With the rise of domain-specific GPTs, LLMs are becoming more customizable, fine-tunable, and capable of complex problem-solving
Summary
So, folks, this was Part 2 of our post regarding Large Language Models (LLMs). We hope you enjoyed the two parts. Apart from the information we shared, do go online and scout through research papers, videos, and articles to understand the rapid advancements happening in LLMs. In the meantime, we’ll prepare another rocking tutorial post for you. We’ll catch you soon. Take care! 🙂