The List of 11 Most Popular Open Source LLMs of 2024
Discover the top 11 open-source Large Language Models (LLMs) that are shaping the landscape of AI. Explore their features, benefits, and challenges in this comprehensive guide to stay updated on the latest developments in the world of language technology.
In today's digital era, large language models (LLMs) have undergone a significant transformation. They've progressed from struggling with human speech intricacies to generating text that closely resembles human writing. These LLMs now excel not only in contextual conversations but also in programming tasks.
The beginnings of LLMs are closely tied to the open-source movement. Pioneering minds and scholars recognized the potential within these models, while understanding the substantial computing resources needed to train them.
This led to the emergence of open-source alternatives, providing practical options for researchers and developers. In this article, we'll explore the top 11 open-source LLMs, comparing their capabilities. We'll also delve into LLM leaderboards and offer guidance on choosing the right LLM for your needs.
Here’s what we’ll cover:
Open source LLMs examples
Leaderboards to Compare LLMs
Open source model development challenges
Choosing the right LLM for your use case: Best practices
What's Next for Open Source LLMs: Summary
But before that…
**💡 Pro tip: Looking for a reliable tool to protect your LLM applications? We've got you covered! Try Lakera Guard for free.**
Now, let’s dive in!
Exploring Popular Open-Source LLMs
While several proprietary LLMs have carved their niche, the open-source arena is bustling with innovation, presenting models that are not only powerful but also accessible to a broader audience.
Let’s take a look.
1. Llama 2
Llama 2 is a cutting-edge collection of pre-trained and fine-tuned generative text models. The series offers models ranging from 7 billion to 70 billion parameters, making it a state-of-the-art tool. Llama-2-Chat, the fine-tuned versions, are designed explicitly for dialogue applications and have been optimized to provide superior performance compared to open-source chat models. They have been evaluated by humans and have received high marks in both helpfulness and safety, putting them on par with popular closed-source models like ChatGPT and PaLM.
Here are the details of this model:
Parameters: 7B, 13B, and 70B
License: Custom commercial license available at Meta's website.
Training Database: Llama 2 was pre-trained on 2 trillion tokens from public data, then fine-tuned with over a million human-annotated instances and public instruction datasets. Meta claims that no Meta user data was used in either phase.
Variants: Llama 2 is available in multiple parameter sizes, including 7B, 13B, and 70B. Both pre-trained and fine-tuned variations are available.
Fine-tuning Techniques: The model employs supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to better align with human preferences, ensuring helpfulness and safety.
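As a quick illustration, here is a minimal sketch of loading a Llama-2-Chat checkpoint with the Hugging Face transformers library. It assumes you have accepted Meta's license and been granted access to the gated meta-llama/Llama-2-7b-chat-hf repository on the Hub:

```python
# Minimal sketch: loading Llama-2-Chat with Hugging Face transformers.
# Assumes access to the gated meta-llama/Llama-2-7b-chat-hf repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory
    device_map="auto",          # spread layers across available devices
)

prompt = "Explain what an open-source LLM is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```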
2. OpenLLaMA
OpenLLaMA is an open-source reproduction of Meta AI's famous LLaMA model. The creators of OpenLLaMA have made this permissively licensed model available to the general public. OpenLLaMA is available in 3 billion, 7 billion, and 13 billion parameter sizes, trained on up to 1 trillion tokens.
Training Database: OpenLLaMA was trained using the RedPajama dataset, which has over 1.2 trillion tokens. The developers followed the same preprocessing and training hyperparameters as the original LLaMA paper.
Fine-tuning Techniques: OpenLLaMA uses the same model architecture, context length, training steps, learning rate schedule, and optimizer as described in the original LLaMA paper. The main difference between OpenLLaMA and the original LLaMA is the training dataset.
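Because OpenLLaMA mirrors the LLaMA architecture, it can be loaded with the standard LLaMA classes in transformers. A minimal sketch, assuming the openlm-research/open_llama_7b checkpoint:

```python
# Minimal sketch: OpenLLaMA as a drop-in LLaMA replacement in transformers.
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_id = "openlm-research/open_llama_7b"  # assumed Hub checkpoint
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```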
3. Falcon
Falcon models were developed by the Technology Innovation Institute in Abu Dhabi. The Falcon family of language models is state-of-the-art among open models, with Falcon-40B being the most notable, competitive with several closed-source LLMs.
Falcon-40B: The heavyweight of the Falcon family, powerful and efficient, outperforming LLaMA-65B while requiring about 90GB of GPU memory.
Falcon-7B: A top-performing, smaller version that needs only about 15GB of GPU memory, making it suitable for consumer hardware.
Training Database: The Falcon-7B and Falcon-40B models were trained on vast amounts of data, 1.5 trillion and 1 trillion tokens, respectively. The primary training source is the RefinedWeb dataset, which accounts for over 80% of their training material; it is a massive web-scale corpus built on CommonCrawl with an emphasis on quality and scale.
Architecture Notes: Falcon models use multi-query attention, sharing keys and values across attention heads to improve inference scalability.
System Requirements: Falcon-40B: Requires ~90GB of GPU memory, and Falcon-7B: Requires ~15GB of GPU memory.
Package Version Requirements: For optimal performance, it's recommended to use the bfloat16 datatype, which requires a recent version of CUDA and is best suited for modern graphics cards.
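To illustrate the bfloat16 recommendation above, here is a minimal sketch of running Falcon-7B through a transformers pipeline; it assumes the tiiuae/falcon-7b checkpoint and a GPU that supports bfloat16:

```python
# Minimal sketch: running Falcon-7B in bfloat16 with a transformers pipeline.
import torch
from transformers import AutoTokenizer, pipeline

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # recommended datatype for Falcon
    device_map="auto",
)

print(generator("Open-source LLMs are", max_new_tokens=32)[0]["generated_text"])
```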
4. Dolly
Dolly, officially known as dolly-v2-12b, is an instruction-following large language model developed by Databricks. It is based on the pythia-12b model and was fine-tuned on roughly 15,000 instruction/response records created by Databricks employees on the Databricks machine learning platform.
It covers a range of capability domains from the InstructGPT paper, such as brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. Although Dolly is not considered a state-of-the-art model, especially after Databricks acquired MosaicML, it displays exceptional instruction-following behaviour that is not typical of the foundation model it is built upon.
Variants: There are two versions of Dolly: Dolly-v2-7b, which has 6.9 billion parameters and is based on Pythia-6.9b, and Dolly-v2-3b, which has 2.8 billion parameters and is based on Pythia-2.8b.
Database Used for Training: The dataset used for training the model is databricks-dolly-15k. This dataset contains fine-tuning records created by Databricks employees.
Techniques Used for Fine-Tuning: The model was fine-tuned using data from various domains per the InstructGPT paper.
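Dolly ships as a standard Hugging Face checkpoint whose model card relies on custom pipeline code, hence trust_remote_code=True in the minimal sketch below (the prompt is illustrative):

```python
# Minimal sketch: instruction-following generation with dolly-v2-12b.
import torch
from transformers import pipeline

generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Dolly's instruction pipeline lives in the model repo
    device_map="auto",
)

res = generate_text("Explain the difference between fine-tuning and pre-training.")
print(res[0]["generated_text"])
```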
5. MPT
The MosaicML company has developed MPT-30B, a decoder-based transformer pre-trained on 1T tokens of both English text and code. It's part of the Mosaic Pretrained Transformer (MPT) series, designed for efficient fine-tuning and LLM deployment. MPT-30B boasts features like an 8k token context window, context-length extrapolation through ALiBi, and the FlashAttention mechanism for fast training and inference.
The model is compatible with both HuggingFace and NVIDIA's FasterTransformer, and its size is optimized for deployment on single GPU setups. The MosaicML NLP team developed MPT-30B on their platform using the LLM codebase found in the llm-foundry repository, which is recommended for fine-tuning and inference.
Variants: There are two models available: MPT-7B and MPT-30B. Each model comes with an instruction and a chat version.
Database Used for Training: 1T tokens of English text and code
System Requirements: The model can be deployed on a single GPU: either 1x A100-80GB in 16-bit precision or 1x A100-40GB in 8-bit precision.
Package Version Requirements for Training: MosaicML recommends utilizing the MosaicML llm-foundry repository to train and fine-tune the model for optimal results. It's worth noting that the MPT-30B tokenizer used in the training process is identical to the EleutherAI/gpt-neox-20b tokenizer.
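As a rough sketch of the points above, MPT checkpoints can be loaded with transformers by enabling trust_remote_code (the model definition lives in the Hub repository), reusing the gpt-neox-20b tokenizer, and optionally raising max_seq_len for ALiBi context-length extrapolation; the exact values below are illustrative:

```python
# Minimal sketch: loading MPT-30B with its custom model code from the Hub.
import torch
import transformers

model_id = "mosaicml/mpt-30b"  # assumed Hub checkpoint
config = transformers.AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.max_seq_len = 16384  # illustrative: ALiBi allows extrapolating past the 8k training context

# MPT-30B uses the same tokenizer as EleutherAI/gpt-neox-20b
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
```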
6. Guanaco
The Guanaco models are LLMs fine-tuned with the QLoRA technique developed by Tim Dettmers and the UW NLP team. With QLoRA, it is possible to fine-tune a 65B-parameter model on a single 48GB GPU without sacrificing performance relative to full 16-bit fine-tuning.
The Guanaco series has reported strong results on chatbot benchmarks, outperforming previously released open models. Because these models are derived from the original LLaMA series, they are not licensed for commercial use. Although Guanaco is not the most advanced model on the market, it introduces the QLoRA method, an efficient fine-tuning technique that enables individuals and smaller businesses to fine-tune models with up to 65 billion parameters.
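To make the QLoRA idea concrete, here is a minimal sketch using the transformers, bitsandbytes, and peft libraries: the base model is loaded in 4-bit NF4 precision and small LoRA adapter matrices are attached for training. The base checkpoint and LoRA hyperparameters below are illustrative, not the exact Guanaco recipe:

```python
# Minimal QLoRA sketch: 4-bit quantized base model + LoRA adapters (illustrative settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "openlm-research/open_llama_7b"  # illustrative base checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as introduced by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
    bnb_4bit_use_double_quant=True,         # double quantization saves further memory
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # make the quantized model trainable

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```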
7. BLOOM
BLOOM, which stands for BigScience Large Open-science Open-access Multilingual Language Model, is a powerful language model, trained with large computational resources, that generates text from a given prompt. It is the largest model on this list, with about 176 billion parameters.
It can produce coherent text almost indistinguishable from human-generated content in 46 natural languages and 13 programming languages. When given input text, BLOOM can continue the text to generate relevant continuations by examining the preceding words.
While the direct application of BLOOM is primarily for text generation, the model can be adapted for tasks such as Information Extraction, Question Answering, and text summarization by framing them as text generation tasks.
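For instance, question answering can be framed as plain text continuation. A minimal sketch using a small BLOOM variant (bigscience/bloom-560m is assumed here so the example fits on modest hardware; the full 176B model needs far more memory):

```python
# Minimal sketch: framing question answering as text generation with BLOOM.
from transformers import pipeline

# Small BLOOM variant used for illustration; the full model is ~176B parameters.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = "Question: What does the acronym LLM stand for?\nAnswer:"
print(generator(prompt, max_new_tokens=16)[0]["generated_text"])
```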
Compute Infrastructure: The model was trained on the Jean Zay public supercomputer using 416 A100 80GB GPUs in total, 384 of them across 48 nodes (8 GPUs per node, linked by NVLink within a node and 4 OmniPath links between nodes), with the remainder held in reserve. Each node has 512GB of CPU memory and 640GB of aggregate GPU memory. Megatron-DeepSpeed, DeepSpeed, PyTorch, and apex were used to train the model.
8. Stanford Alpaca
Alpaca is a language model that follows instructions and generates outputs based on provided data. It has been fine-tuned from a 7B LLaMA model using 52K instruction-following data. In preliminary human evaluations, Alpaca has shown behaviour similar to the text-davinci-003 model in the Self-Instruct instruction-following evaluation suite.
Training Database: The model was fine-tuned on 52K instruction data using modified techniques from the Self-Instruct paper. Data generation leveraged text-davinci-003, a simplified pipeline, and produced one instance per instruction. Fine-tuning employed the Hugging Face training code.
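Alpaca models expect prompts in the instruction format used during fine-tuning. Below is a minimal sketch of building such a prompt; the template follows the format published in the Stanford Alpaca repository, and should be adjusted if your checkpoint was tuned differently:

```python
# Minimal sketch: formatting a prompt in the Alpaca instruction style.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca prompt template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

print(build_prompt("List three benefits of open-source LLMs."))
```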
9. OpenChatKit
OpenChatKit is an open-source toolset that empowers users to create general and specialized chatbot applications. One of the models developed for this toolkit is GPT-NeoXT-Chat-Base-20B-v0.16, an LLM with 20B parameters.
This model is fine-tuned from EleutherAI's GPT-NeoX and focuses on dialogue-style interactions. Its primary function is to perform tasks like answering questions, classification, extraction, and summarization. The model has undergone extensive training with over 40 million instructions on 100% carbon-negative computing.
Training Database: The model was fine-tuned on a collection of over 43 million high-quality instructions. The exact datasets used are listed in the togethercomputer/OpenDataHub repository.
Fine-tuning Techniques: This model has been enhanced and fine-tuned using EleutherAI's GPT-NeoX and feedback data, resulting in better adaptation for human conversation.
System Requirements: Running the GPT-NeoXT-Chat-Base-20B model requires a minimum of about 41GB of free VRAM, with each prompt consuming an additional 100-200 MB. The project's guide recommends using at least one GPU, and inference fits in under 48GB of VRAM.
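As a rough sketch, the chat model can be loaded directly with transformers. The model card formats conversations as alternating <human>: and <bot>: turns; the prompt format below is taken from that card as an assumption:

```python
# Minimal sketch: one-turn inference with GPT-NeoXT-Chat-Base-20B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Assumed chat format: alternating <human>: and <bot>: turns.
prompt = "<human>: Summarize what OpenChatKit is in one sentence.\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```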
10. GPT4All
GPT4All is an ecosystem for training and deploying large language models that run locally on consumer-grade CPUs. It provides assistant-style, instruction-tuned language models that anyone, from individuals to enterprises, can use, distribute, and build upon.
This ecosystem enables users to create and use language models specific to their requirements. These models can operate efficiently on standard CPUs without requiring an internet connection or GPU. Direct installer links are available for macOS, Windows, and Ubuntu.
Here are more details about their models:
Parameters: Model files range from roughly 3GB to 8GB on disk, which typically corresponds to models with 7B to 13B parameters.
Fine-tuning Techniques: The GPT4All software ecosystem supports multiple Transformer architectures, including Falcon, LLaMA (including OpenLLaMA), MPT (including Replit), and GPT-J.
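For local, CPU-only use, the gpt4all Python bindings download a quantized model file and run it without a GPU. A minimal sketch follows; the model filename is illustrative, and any model from the GPT4All catalog can be substituted:

```python
# Minimal sketch: running a GPT4All model locally on CPU.
from gpt4all import GPT4All

# Downloads the model file on first use; the filename is illustrative.
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

with model.chat_session():
    reply = model.generate("What is an open-source LLM?", max_tokens=128)
    print(reply)
```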
11. FLAN-T5
FLAN-T5 is an improved version of T5 that is specifically designed for zero-shot and few-shot NLP tasks. With over 1000 additional tasks and multiple languages covered, it is a powerful language model optimized for research purposes, including reasoning and question answering.
Google has released several variants of the model, from flan-t5-small with 80 million parameters to flan-t5-xxl with 11 billion parameters. The largest model, flan-t5-xxl, lists support for English, German, and French, while smaller models such as flan-t5-xl list support for 50+ languages.
Variants: Google's FLAN-T5 has been released in 5 variants: flan-t5-small with 80M parameters, flan-t5-base with 250M parameters, flan-t5-large with 780M parameters, flan-t5-xl with 3B parameters, and the largest, flan-t5-xxl, with 11B parameters.
Fine-tuning Techniques: Based on the pretrained T5 model and fine-tuned with instructions for improved zero-shot and few-shot performance.
System Requirements: The model was trained on Google Cloud TPU Pods (TPU v3 or TPU v4, with a minimum of four chips) using the t5x codebase together with jax, so those packages are required for training.
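Because FLAN-T5 is a sequence-to-sequence model, it is loaded with the seq2seq classes in transformers. A minimal zero-shot sketch with the smallest variant:

```python
# Minimal sketch: zero-shot inference with FLAN-T5 (sequence-to-sequence).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Translate English to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```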
Each of these 11 LLMs comes with distinctive features and specifications that cater to a range of users. Whether your focus is on portability, performance, or budget-friendliness, you will find a model designed to match your requirements.
However, while open-source options offer great advantages, the process of developing and selecting the right model can pose its challenges. Let’s explore them.
Leaderboards to Compare LLMs: Navigating the Ever-Evolving Landscape
LLMs are always changing, as new models keep appearing. While having lots of options is exciting, it can also be a bit overwhelming for developers, researchers, and tech enthusiasts. To help with this changing landscape, LLM leaderboards give us a clear picture of how different language models perform.
The HuggingFace Open LLM Leaderboard is a platform designed to track, rank and assess LLMs and chatbots as they gain popularity. It is unique because it is open to the community, allowing anyone to submit their model for automatic evaluation on the HuggingFace GPU cluster. The only requirement is that the model is a HuggingFace Transformers model with weights available on the Hub. They also allow for model evaluations with delta-weights for non-commercial licensed models, like the original LLaMa release. Users can easily filter models based on their type, whether pre-trained, fine-tuned, instruction-tuned or RL-tuned.
The evaluation process used by the Chatbot Arena Leaderboard involves three benchmarks: Chatbot Arena, MT-Bench, and MMLU (5-shot). Models compete on Chatbot Arena in randomized, anonymous battles, answer multi-turn questions on MT-Bench, and undergo a multitask accuracy test on MMLU (5-shot) across 57 tasks. The leaderboard then aggregates these results into ratings and scores.
The AlpacaEval Leaderboard has been designed to evaluate LLMs' ability to follow instructions. Models are ranked by their win rate against a reference model, with average output length also reported.
Open Source Model Development Challenges
Open-source development for Large Language Models (LLMs) brings numerous advantages, like collaboration, transparency, and innovation. However, building and maintaining these models presents its share of challenges, including:
Cost: Developing and maintaining open-source models can be financially burdensome, particularly for smaller teams and individual developers.
Companies like MosaicML and Databricks are trying to make fine-tuning more accessible through their platforms. Others, like Lambda, are working on reducing GPU costs. Still, cost-related issues persist.
Privacy Issues: Navigating sensitive data collection and storage within open-source models poses privacy hurdles demanding careful management.
Bias/Fairness: Eliminating biases and fostering impartial outcomes in open-source models is a pivotal challenge for achieving equitable AI.
Model Interpretability: Enhancing interpretability of intricate models is vital, particularly as open-source models can resemble "black boxes," obscuring decision-making transparency.
Version Control and Compatibility: The collaborative nature of open-source development introduces hurdles in maintaining version control and ensuring cross-platform compatibility.
Choosing the Right LLM - Best Practices
Selecting the appropriate Large Language Model (LLM) for your business use case requires a systematic approach. Here's a short step-by-step guide to help you make the right choice:
Identify Your Needs: Define your specific use case and objectives. Are you aiming for content generation, sentiment analysis, translation, or something else?
Analyze Data Requirements: Assess the amount and nature of data available for training and fine-tuning the model. Some LLMs require substantial data for optimal performance.
Consider Model Size: Choose a model size that aligns with your computational resources and performance needs. Larger models might offer better accuracy but come with increased resource demands.
Evaluate Pre-trained Models: Investigate existing pre-trained models that match your use case. Models like GPT-3, BERT, and T5 offer different strengths, such as language understanding or generation.
Customization Flexibility: Determine if the model allows fine-tuning for domain-specific language and nuances. Some models offer greater customization options.
Check Interpretability: Ensure the model provides insights into its decision-making process. Transparent models are crucial for understanding and troubleshooting.
Assess Bias and Fairness: Examine how the model addresses bias in language generation. Consider models that prioritize fairness and inclusivity.
API and Integration: Evaluate the availability and ease of using the model's API. Compatibility with your existing systems is essential for seamless integration.
Resource Requirements: Consider the hardware and computational resources needed to run the model effectively. Choose a model that aligns with your infrastructure.
Test and Validate: Before finalizing, conduct testing and validation to ensure the chosen model performs well on your specific tasks.
Monitor Performance: Regularly assess the model's performance in real-world scenarios and fine-tune if necessary to maintain accuracy and relevance.
What's Next for Open Source LLMs: Summary
Looking ahead to 2024 and beyond, we can expect the open-source LLM landscape to keep flourishing, with new models introduced regularly.
The 11 models that we’ve listed have made powerful language processing accessible, overcoming cost and proprietary hurdles.
However, they also face a multitude of challenges like cost, privacy, bias, and scalability. Users must consider these against benefits like customization, cost savings, and security, compared to proprietary LLMs that offer support but with fees and less flexibility.
Yet, the open-source community is committed to ethical, user-centric models. As technology evolves, these LLMs will progress, driving innovative, collaborative, and responsible AI-driven language processing.
We are excited to see what lies ahead for the AI community and hope you are, too!
Learn how to protect against the most common LLM vulnerabilities
Download this guide to delve into the most common LLM security risks and ways to mitigate them.