Exploring the World of Large Language Models: Overview and List
Explore our list of the leading LLMs: GPT-4, LLAMA, Gemini, and more. Understand what they are, how they evolved, and how they differ from each other.

The rapid growth of Large Language Models (LLMs) has reshaped how we build and interact with AI-powered systems. With new models launching at a steady pace, it can be difficult to keep track of what each one offers—and how they differ.
This guide provides a clear overview of today’s most widely discussed LLMs, highlighting their key characteristics, strengths, and common use cases. Whether you’re comparing open-source and proprietary options or simply trying to understand what sets GPT-4 apart from Claude or LLaMA, this list is a starting point for navigating the current LLM landscape.
Choosing an LLM is just the start. Discover how Lakera secures GenAI across every model and architecture.
The Lakera team has accelerated Dropbox’s GenAI journey.
“Dropbox uses Lakera Guard as a security solution to help safeguard our LLM-powered applications, secure and protect user data, and uphold the reliability and trustworthiness of our intelligent features.”
Disclaimer: This analysis focuses on prominent LLMs from various sources, both open and closed-source, selected for their notable impact and popularity. Due to the vast and ever-evolving field of LLMs, our coverage is not exhaustive. It aims to spotlight models leading in innovation, performance, and relevance, providing insights into those most pertinent to professionals and enthusiasts. This selection reflects current trends while recognizing the myriad other LLMs contributing to the field's growth.
Before diving into specific profiles, it's essential to understand how the size and complexity of a Large Language Model (LLM) are determined. Two critical metrics stand out: parameters and tokens.
As LLMs grow in complexity, they can capture and reflect richer content. Models with more parameters have the bandwidth to absorb and analyze extensive information, sharpening their ability to recognize subtle nuances, relationships, and contextual indicators in the data they process. Tokens, meanwhile, are the chunks of text (whole words or sub-word pieces) that a model reads and generates; both training-corpus size and context windows are measured in tokens.
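To make the token metric concrete, here is a minimal sketch using tiktoken, OpenAI's open-source tokenizer library (the encoding name below is one used by several recent OpenAI models):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is an encoding used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Large Language Models process text as tokens, not characters."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
print(tokens[:5])  # the integer ids the model actually sees
```

A short sentence often maps to more tokens than words, which is why context windows and API pricing are quoted in tokens rather than words.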
OpenAI's GPT-3, launched in June 2020, marks a significant leap in AI language models with its 175 billion parameters, making it one of the most sophisticated models available at its debut.
This third installment in the GPT series enhanced natural language processing capabilities to unprecedented levels, enabling the creation of text—from essays and code to poetry—that rivals human output.
Following GPT-3, OpenAI introduced GPT-3.5 as part of its ongoing iteration on the series, fine-tuning performance and reducing bias to keep the model at the cutting edge.
GPT-3 is built on the transformer architecture, a deep learning model introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017.
The transformer model utilizes self-attention mechanisms, which allow it to weigh the importance of different words in the input data, significantly improving its ability to understand the context and generate coherent and relevant text outputs.
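As an illustration of the core idea (a toy sketch, not GPT-3's actual implementation, which adds learned projections, many attention heads, and causal masking), here is scaled dot-product self-attention in a few lines of NumPy:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Pairwise relevance scores between tokens, scaled for numerical stability.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Softmax turns the scores into attention weights over positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens, 8-dimensional embeddings
# In a real transformer, Q, K, and V come from learned projections of x.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (5, 8)
```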
Notable capabilities of GPT-3 include natural language understanding and generation (NLU/NLG), code generation, translation, language learning, and extensive customization options.
GPT-4, the fourth iteration of the Generative Pre-trained Transformer series by OpenAI, was released in March 2023.
This release marks a significant leap forward in artificial intelligence language models, building upon the groundbreaking work of its predecessor, GPT-3. GPT-4 further enhances the model's capabilities in understanding and generating human-like text, showcasing remarkable improvements in accuracy, context comprehension, and the ability to handle nuanced instructions.
With advancements in architecture and training methodologies, GPT-4 sets new standards for natural language processing tasks, offering unparalleled versatility across various applications, from content creation to complex problem-solving.
GPT-4 is built on an evolved transformer architecture, maintaining the core principles that made its predecessors successful while incorporating significant innovations to improve performance and efficiency.
Key features of GPT-4 include its vision-enhanced capability, known as GPT-4V, which allows the model to interpret and analyze images provided by users.
This development represents a significant advancement, integrating multimodal inputs (like images) with large language models (LLMs), a move many consider a crucial frontier in AI research.
Multimodal LLMs, like GPT-4V, extend the capabilities of text-only models, enabling them to undertake a broader array of tasks and offer new user experiences through diverse interfaces.
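To make this concrete, here is a hedged sketch of a vision request using OpenAI's v1 Python SDK; the model name and image URL are placeholders and may differ depending on your API access:

```python
# pip install openai  -- requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model; name may change
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```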
Additionally, GPT-4 showcases superior natural language understanding and generation (NLU/NLG), making it applicable in specialized domains such as legal analysis, advanced technical support, and nuanced creative writing. It also emphasizes improved safety measures and bias mitigation.
Moreover, GPT-4 provides enhanced interactivity and customization options, allowing developers to tailor the model for specific needs or conform to certain styles, thereby increasing its applicability in personalized applications.
OpenAI's ambitious journey towards achieving artificial general intelligence (AGI) is set to take a significant leap forward with the development of GPT-5, the latest iteration in the groundbreaking Generative Pre-trained Transformer series.
As the quest for AGI intensifies, GPT-5 emerges as a focal point of OpenAI's technological ambitions, promising to surpass its predecessors in intelligence, versatility, and capability. During a presentation at the World Governments Summit in Dubai, OpenAI's CEO, Sam Altman, shed light on the anticipated capabilities of GPT-5, highlighting its potential to significantly outperform earlier models by being "a little smarter...a little better at everything."
This evolution underscores a broader, more effective application across various tasks, driven by OpenAI's aggressive funding pursuits to expedite AI innovation.
GPT-5's training strategy involves leveraging expansive internet datasets and exclusive organizational data to refine reasoning and conversation abilities.
Altman's emphasis on multimodality—integrating speech, images, and eventually video—aims to cater to the increasing demand for versatile AI interactions. Moreover, enhancing the model's reasoning capacity and reliability is central to achieving consistently high-quality outputs, addressing the current limitations faced by GPT-4.
As GPT-5's capabilities continue to unfold, its development signals a significant leap towards realizing AGI, promising a new era of AI that surpasses human intelligence in various domains.
The inclusion of Sora, OpenAI's text-to-video model, into the company's technology stack is a testament to its pursuit of AGI by enhancing AI's ability to process and generate multimodal data.
By advancing beyond text and images to the dynamic realm of video, OpenAI is addressing the increasing demand for AI systems that can seamlessly operate across different types of content, thus making AI interactions more versatile and reflective of human-like understanding and creativity.
Furthermore, Sora's development, grounded in safety and ethical considerations through adversarial testing and collaboration with domain experts, aligns with OpenAI's approach to responsible AI development. This ensures that as OpenAI progresses towards AGI, it remains committed to mitigating risks associated with misinformation, bias, and other ethical concerns.
Incorporating Sora's groundbreaking text-to-video capabilities into the future outlook, alongside the anticipated advancements of GPT-5, underscores OpenAI's strategy to achieve a more intelligent, versatile, and capable AI.
This combination of linguistic intelligence with visual creativity and understanding is pivotal in OpenAI's mission to realize AGI, promising a new era of AI that not only surpasses human intelligence in analytical tasks but also in creating and interpreting complex visual narratives.
Google's journey in AI innovation is marked by significant milestones that have fundamentally enhanced how billions of people interact with digital information.
These milestones range from the introduction of BERT, Google's early Transformer model that revolutionized the understanding of human language, to the development of MUM, a more powerful model capable of multilingual understanding and video content analysis.
These advancements laid the groundwork for Google's exploratory conversational AI service, initially known as Bard and powered by LaMDA. Bard, announced by Google and Alphabet CEO Sundar Pichai in February 2023, aimed to merge the expansive knowledge of the internet with the capabilities of Google's large language models.
However, its initial release in March 2023 revealed significant shortcomings, prompting Google to evolve Bard into a more sophisticated AI model.
Acknowledging the need for a more advanced system, Google introduced PaLM 2 at Google I/O in May 2023, setting the stage for Gemini.
The rebranding of Bard to Gemini in February 2024 signified a pivotal shift towards leveraging Google's most advanced LLM technology.
This name change reflected a strategic move to distance the chatbot from its early criticisms and align it with the advancements embedded within the Gemini model, whose most capable version had been released in December 2023. The transformation from Bard to Gemini wasn't merely cosmetic but a transition to a more efficient, high-performing AI model.
Google's Gemini represents a monumental stride in the evolution of artificial intelligence technology. As part of Google's broader mission to pioneer advancements in AI, Gemini stands out as their most sophisticated and versatile large language model (LLM) to date.
Gemini is designed to cater to a wide range of complexities and is segmented into three distinct versions: Ultra, Pro, and Nano.
This stratification ensures that Gemini's groundbreaking capabilities are accessible across various platforms, from high-demand enterprise applications to on-device functionalities in consumer electronics.
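For developers, the Pro tier is reachable through Google's google-generativeai Python SDK. A minimal sketch, with the API key as a placeholder and the model name subject to change as versions evolve:

```python
# pip install google-generativeai -- requires a Google AI Studio API key.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

model = genai.GenerativeModel("gemini-pro")  # mid-tier model; names may vary
response = model.generate_content(
    "Summarize the transformer architecture in two sentences.")
print(response.text)
```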
Gemini's groundbreaking architecture is rooted in a transformer model-based neural network, expertly designed to manage complex contextual sequences across diverse data types such as text, audio, and video.
This architecture has been enhanced to include efficient attention mechanisms within the transformer decoder, enabling the models to handle and interpret extensive contextual data adeptly.
The introduction of Gemini 1.5 Pro marks a significant leap in AI capabilities, blending superior efficiency with quality that rivals its predecessor, Gemini 1.0 Ultra. Central to this advancement is incorporating a Mixture-of-Experts (MoE) architecture, elevating the model's ability to dynamically and efficiently process large and complex datasets across various modalities.
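The routing idea behind MoE can be shown in a toy sketch (an illustration of the technique, not Gemini's proprietary implementation): a small router scores each token against a pool of expert networks, and only the top-scoring experts are evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2
x = rng.normal(size=(d_model,))  # one token's hidden state

# Each "expert" is a small feed-forward layer; the router decides which ones run.
experts = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts)) * 0.1

logits = x @ router                # one routing score per expert
top = np.argsort(logits)[-top_k:]  # keep only the top-k experts
weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts

# The other experts are never evaluated -- that skipped work is the efficiency gain.
y = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
print(y.shape)  # (16,)
```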
Gemini 1.5 Pro, a versatile, mid-size multimodal model, achieves performance on par with Gemini 1.0 Ultra and introduces an innovative approach to long-context understanding.
Initially offering a context window of 128,000 tokens, this model expands the frontier of AI capabilities by providing a context window upgradable to 1 million tokens, available through a private preview in AI Studio and Vertex AI.
This feature sets a new benchmark in the model's ability to process and analyze vast amounts of information, showcasing Gemini's continuous evolution in addressing the challenges and opportunities of modern AI applications.
Gemini's architecture and training strategies culminate in key features that set these models apart, such as extensive contextual understanding, multimodal interactions, multilingual competence, and customization.
Google's roadmap for Gemini aims to redefine AI's potential, focusing on advanced enhancements in planning, memory, and processing to broaden its contextual understanding.
This evolution will refine Gemini's conversational accuracy and depth, maintaining its leadership in AI dialogue systems.
Beyond mere improvements, Gemini aspires to transform AI interaction, leveraging Google's AI heritage to deliver superior assistance and innovation, thus enriching digital experiences globally.
The expansion of Gemini will see its integration into key Google services, including Chrome for an enriched browsing experience and the Google Ads platform, offering novel engagement strategies for advertisers.
This strategic extension underscores Google's commitment to infusing AI across its ecosystem, heralding new user interaction and engagement possibilities.
In February 2023, Meta AI (formerly Facebook AI) unveiled LLaMA, a revolutionary large language model poised to accelerate AI research.
Emphasizing open science, LLaMA delivers compact yet potent models that make top-tier AI research accessible to a broad spectrum of users, including those with limited computational means, thereby making research on sophisticated AI technologies more scalable and widespread.
Built on the transformer architecture, LLaMA incorporates cutting-edge enhancements like the SwiGLU activation function, rotary positional embeddings, and root-mean-squared layer normalization to boost its efficiency and effectiveness.
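As a flavor of one of these components, here is a minimal NumPy sketch of root-mean-squared layer normalization (an illustration of the technique, not Meta's actual code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Unlike standard LayerNorm, RMSNorm rescales by the root mean square of
    # the activations only, without subtracting the mean -- cheaper to compute.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-dim activations
w = np.ones(8)                                    # learned gain, initialized to 1
print(rms_norm(x, w).shape)  # (4, 8)
```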
The initial release of LLaMA featured four model variants with parameter counts of 7, 13, 33, and 65 billion. Notably, the developers highlighted that the 13-billion-parameter model surpassed the performance of the significantly larger GPT-3 across most NLP benchmarks.
Initially intended for a select group of researchers and organizations, the model's weights were leaked and quickly spread across the internet by early March 2023, becoming accessible to a broader audience. In response to the widespread dissemination of its code, Meta chose to support the open distribution of LLaMA, aligning with its commitment to open science and broadening the impact of this advanced AI technology.
July 2023 saw the launch of LLaMA-2 in collaboration with Microsoft, marking an evolution of the original model with a 40% increase in training data and enhancements aimed at improving data handling and safety, focusing on bias reduction and model security.
LLaMA 2, still open source and free for research and commercial uses, advances the LLaMA legacy with models available in 7B, 13B, and 70B parameters, including the dialogue-enhanced LLaMA 2 Chat.
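Because the weights are openly distributed, LLaMA 2 can be run locally through the Hugging Face transformers library. A minimal sketch, assuming you have accepted Meta's license for the gated checkpoint on the Hub and have enough GPU memory for the 7B chat variant:

```python
# pip install transformers torch -- the meta-llama checkpoints are gated and
# require accepting Meta's license on the Hugging Face Hub first.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # 7B dialogue-tuned variant
)
out = generator("Explain rotary positional embeddings in one sentence.",
                max_new_tokens=60)
print(out[0]["generated_text"])
```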
Meta enhanced accessibility by releasing model weights and adopting more flexible licensing for commercial applications, demonstrating an ongoing commitment to responsible AI development amidst concerns over bias, toxicity, and misinformation.
The key goals of LLaMA and LLaMA 2 include democratizing AI research by providing smaller, efficient models that open new avenues for exploration and enable specialized applications for users with limited computational resources.
Additionally, the public release of these models promotes collaborative research efforts, addressing critical challenges such as bias and toxicity within AI. Furthermore, this approach supports the creation of private model instances, thereby reducing reliance on external APIs and bolstering data privacy.
By providing open access to LLaMA and LLaMA 2, Meta propels AI research forward and sets a precedent for the responsible development and application of LLMs.
Meta is advancing the development of Llama 3, targeting improvements in code generation and advanced reasoning, aiming to match Google's Gemini model's capabilities.
CEO Mark Zuckerberg stated that while Llama 2 was a leading open-source model, the goal for Llama 3 is to achieve industry-leading status with cutting-edge features. Zuckerberg also outlined Meta's commitment to open-source AI models and detailed organizational changes to enhance AI efforts. He further announced plans to acquire over 340,000 Nvidia H100 GPUs by year's end, bringing Meta's total computing power to nearly the equivalent of 600,000 H100s.
This significant investment underscores Meta's ambition to lead AI research and development.
Anthropic, an AI safety and research company, has taken a significant leap in AI with the development of Claude, focusing on creating reliable, interpretable, and steerable AI systems.
Introduced in March 2023, Claude marked Anthropic's entry into publicly accessible AI models aimed at enhancing AI safety and ethics. It emerged as a response to the unpredictability, unreliability, and opacity of large AI systems.
Claude 2 followed in July 2023, building on its predecessor's foundation with improved performance and broader application capabilities while emphasizing ethical AI development.
Claude distinguishes itself through the Constitutional AI framework: a 52-billion-parameter autoregressive model trained on a vast unsupervised text corpus, akin to GPT-3's training methodology but with a focus on ethics and safety.
Claude's architecture reflects a commitment to innovation, adopting similar architectural choices to those outlined in Anthropic's research but with a unique twist.
Unlike models trained through reinforcement learning from human feedback (RLHF), Claude uses a model-generated ranking system, aligning with the Constitutional AI approach.
This method starts with a set of ethical principles that form a "constitution" guiding the model's development and output alignment, showcasing Anthropic's commitment to AI systems shaped by beneficence, non-maleficence, and autonomy.
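The critique-and-revision loop at the heart of this approach can be sketched in a few lines. The ask function below is a stub standing in for any chat-completion call, and the principle is paraphrased rather than quoted from Anthropic's actual constitution:

```python
def ask(prompt: str) -> str:
    # Stub: in practice this would call a chat-completion API.
    return f"[model output for: {prompt[:40]}...]"

principle = "Choose the response that is most helpful, honest, and harmless."

# 1. Draft an answer, 2. critique it against the principle, 3. revise it.
draft = ask("User question: How do I pick a strong password?")
critique = ask(f"Critique this answer against the principle:\n{principle}\n\n"
               f"Answer:\n{draft}")
revised = ask(f"Rewrite the answer to address this critique:\n{critique}\n\n"
              f"Original answer:\n{draft}")
print(revised)
```

In Anthropic's published method, preference labels produced this way train a reward model, replacing human preference labels in the usual RLHF pipeline.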
Anthropic's key goals with Claude include democratizing AI research and fostering an environment of open research to collaboratively tackle AI's inherent challenges, such as bias and toxicity.
By offering Claude, Anthropic enables more secure and private model usage, reducing external API dependencies and promoting data privacy.
Claude's versatility shines across a wide range of applications.
Anthropic is preparing to launch Claude 3, a milestone in AI that promises to push the frontiers of technology with its advanced language processing, reasoning, and versatility.
Integrating the Constitutional AI framework, the model is rumored to target as many as 100 trillion parameters (an unconfirmed figure) to enhance human-like interaction, analytical ability, and creative output anchored in trust and safety.
The strategic rollout of Claude 3 underscores Anthropic's commitment to a balanced progression in AI, prioritizing both innovation and ethical considerations.
The creation of Claude 3 involves refining its Constitutional Corpus to promote beneficial and secure conversations.
Through external reviews and safety assessments, Anthropic is dedicated to minimizing risks associated with AI advancements, ensuring Claude 3's capabilities are leveraged without unintended consequences.
With the impending launch of Claude 3, Anthropic is focusing on enhancing integration capabilities, broadening use cases, and customizing AI assistants to meet diverse organizational needs.
The company anticipates regular updates to the Claude series, with Claude 3 marking a critical step towards achieving artificial general intelligence, reflecting a conscientious approach to harnessing AI's potential responsibly.
Cohere for AI has introduced Aya, a groundbreaking open-source, multilingual large language model.
Aya represents an exciting breakthrough in breaking down language barriers by supporting an impressive 101 languages. Its development addresses a critical concern in AI progress: overcoming the language limitations of existing models to make AI more accessible and equitable for diverse communities worldwide.
Aya's name, the Twi word for "fern," symbolizes endurance and resourcefulness.
This speaks to Cohere's commitment to empowering communities worldwide through innovative, globally accessible AI tools.
With a co-founder who also co-authored the visionary "Attention is All You Need" paper, Cohere for AI leverages its strong background in enterprise AI solutions (semantic search, text generation, summarization, classification) to push the boundaries of language accessibility with Aya.
Aya is built on a foundation of advanced machine-learning principles and leverages fine-tuning on a diverse, multilingual instruction dataset to provide state-of-the-art performance across various tasks and languages.
This model's architecture aims to capture cultural nuances and contextual understanding, which departs from existing models that often focus predominantly on English or a limited number of languages.
Aya is instruction-tuned rather than foundational: unlike base LLMs, it focuses on precisely following instructions, which is key for practical task accomplishment.
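At the time of writing, an Aya checkpoint is published on the Hugging Face Hub as CohereForAI/aya-101, a roughly 13-billion-parameter sequence-to-sequence model. A minimal usage sketch, assuming that checkpoint name and sufficient memory:

```python
# pip install transformers torch -- aya-101 is a ~13B seq2seq model,
# so this needs substantial RAM or a large GPU.
from transformers import pipeline

aya = pipeline("text2text-generation", model="CohereForAI/aya-101")
print(aya("Translate to Somali: How are you today?")[0]["generated_text"])
```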
Aya represents a pioneering leap in language model technology, distinguishing itself with unparalleled multilingual support for 101 languages, including those like Somali and Uzbek, which were not catered to by existing LLMs.
This broad linguistic range is a step towards true global AI inclusivity, bridging the gap for both widely spoken and under-represented languages.
The model's dataset, enriched with about 204,000 prompts annotated by fluent speakers across 67 languages, ensures Aya's proficiency in capturing cultural nuances and contextual accuracy.
Designed with an enterprise focus, Aya excels in applications such as semantic search, embeddings, text generation, summarization, and classification, demonstrating its broad utility in various business contexts.
Beyond language inclusivity, Aya sets a new standard in instruction-based tasks, understanding and executing complex commands across an array of languages and domains.
Its real-world potential is vast, promising to revolutionize translation services, enable customer support systems tailored to diverse user bases, and facilitate multilingual content creation, among other yet-to-be-discovered applications.
The release of Aya showcases tremendous strides towards AI for all. With its focus on linguistic and geographic inclusion, Aya has the potential to democratize AI access and pave the way for far-reaching, globally significant developments.
Hugging Face, often dubbed the GitHub for Large Language Models (LLMs), has promoted an open ecosystem for LLMs.
Initially focusing on natural language processing, the company pivoted significantly towards LLMs in 2020 by creating the Transformers library.
This library, which harmonizes various LLM architectures, has become one of the fastest-growing open-source projects in the field.
Hugging Face's platform, known as the "Hub," is a comprehensive repository of models, tokenizers, datasets, and demo applications (spaces), all available as open-source resources.
This blend of open-source contributions and traditional SaaS offerings has positioned Hugging Face as a pivotal player in democratizing AI development.
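The Hub is also scriptable. As a small illustration using the huggingface_hub client library (assuming the filter string matches the Hub's task tags), the snippet below lists the most-downloaded text-generation models:

```python
# pip install huggingface_hub
from huggingface_hub import list_models

# Fetch the five most-downloaded models tagged for text generation.
for model in list_models(filter="text-generation", sort="downloads", limit=5):
    print(model.id)
```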
In 2022, Hugging Face launched BLOOM, a 176-billion-parameter transformer-based autoregressive LLM, under open licenses.
Trained on about 366 billion tokens, BLOOM is the main product of the BigScience initiative, a year-long research workshop led by Hugging Face, and stands as a testament to collaborative AI research.
This workshop brought together hundreds of researchers and engineers from around the globe, backed by significant computational resources from the French supercomputer Jean Zay.
Additionally, Hugging Face recently introduced a ChatGPT competitor named HuggingChat, further expanding its suite of innovative AI tools.
The company also hosts the Open LLM Leaderboard, which provides a platform for tracking, ranking, and evaluating open LLMs and chatbots, including popular models like Falcon and Mistral as well as emerging projects.
This initiative underscores Hugging Face’s commitment to transparency and progress in AI, facilitating a collaborative environment for AI innovation and evaluation.
Hugging Face is on track to solidify its status as the premier hub for Large Language Models (LLMs), outpacing traditional AI communities in growth and engagement.
With ever more developers and companies integrating its Transformers library and tokenizers into their workflows, Hugging Face is lowering the barriers to LLM innovation, much like GitHub revolutionized software development. The platform does not just facilitate access to LLM technologies; it is poised to spur new markets and enhance human-AI collaboration, marking a significant leap forward in technological advancement.
In conclusion, the evolution of LLMs is reshaping the landscape of artificial intelligence, offering unprecedented opportunities for innovation across various sectors.
Exploring the expansive terrain unveils a dynamic interplay of innovation and accessibility. As the field grows, navigating the plethora of available models to find the right fit for specific needs becomes increasingly crucial.
With advancements in multilingual capabilities and the push towards more open and inclusive AI development, platforms are emerging as key facilitators of this technological progress. At the moment, the leading LLMs covered in this guide include:
- GPT-4 (OpenAI)
- Gemini (Google)
- LLaMA 2 (Meta)
- Claude 2 (Anthropic)
- Aya (Cohere for AI)
- BLOOM (Hugging Face / BigScience)
Together, these models and the platforms behind them democratize access to cutting-edge AI tools and foster a collaborative ecosystem that accelerates innovation.
As we stand on the brink of new AI horizons, the future promises a more interconnected, inclusive, and intelligent world powered by AI systems that are more adaptable, reliable, and aligned with human values.