Meta Unveils Llama 3.1

Introducing Meta’s Llama 3.1
Meta’s Llama 3.1 405B is a groundbreaking model that rivals the top AI models with its state-of-the-art capabilities in general knowledge, steerability, mathematics, tool use, and multilingual translation. The release of the 405B model promises to drive innovation with unprecedented opportunities for growth and exploration. This latest generation of Llama is poised to ignite new applications and modeling paradigms, including synthetic data generation for the enhancement and training of smaller models, as well as model distillation, which has never before been possible at this scale in the open-source community.
Enhanced Models and New Licensing
With this major release, Meta introduces upgraded versions of the 8B and 70B models. These models are multilingual, feature an extended context length of 128K tokens, exhibit state-of-the-art tool use, and offer stronger reasoning capabilities. These advancements support complex tasks such as long-form text summarization, multilingual conversational agents, and coding assistants. Additionally, Meta has updated its licensing terms to allow developers to use outputs from Llama models, including the 405B, to improve other models. True to Meta’s commitment to open-source principles, these models are available for the community to download from llama.meta.com and Hugging Face, and are ready for immediate development on Meta’s broad ecosystem of partner platforms.
Model Evaluations
For this release, Meta evaluated performance across over 150 benchmark datasets spanning numerous languages. Extensive human evaluations were conducted to compare Llama 3.1 with competing models in real-world scenarios. Experimental results suggest that the flagship model is competitive with leading foundation models such as GPT-4, GPT-4o, and Claude 3.5 Sonnet across various tasks. The smaller models likewise perform competitively with closed and open models of similar parameter sizes.
Model Architecture
Training Llama 3.1 405B on over 15 trillion tokens posed a substantial challenge. To manage training at this scale and achieve efficient results, Meta optimized the entire training stack and deployed over 16 thousand H100 GPUs, making the 405B the first Llama model trained at this magnitude. The design choices focused on maintaining scalability and simplicity in the model development process. Meta opted for a standard decoder-only transformer model architecture with minor adaptations to ensure training stability. The iterative post-training procedure, involving supervised fine-tuning and direct preference optimization, enabled the creation of high-quality synthetic data and enhanced each capability’s performance. Improvements in data quantity and quality for pre- and post-training were achieved through meticulous pre-processing, curation, and rigorous quality assurance and filtering methods.
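The decoder-only transformer design mentioned above can be illustrated with a minimal sketch. This is not Meta's implementation (Llama additionally uses grouped-query attention, rotary position embeddings, and other refinements); it is a generic single-head causal self-attention layer in NumPy, with hypothetical dimensions, showing the key property of a decoder-only model: each token attends only to itself and earlier tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: position t attends only to
    positions <= t, the defining constraint of a decoder-only model."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # strictly-future positions
    scores[mask] = -np.inf                            # forbid attending ahead
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
T, d = 4, 8  # hypothetical: 4 tokens, 8-dim embeddings
x = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the first token's output depends only on the first token's value vector, which is exactly what allows autoregressive generation.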
To support large-scale production inference for the 405B model, Meta quantized the models from 16-bit (BF16) to 8-bit (FP8) numerics, effectively reducing compute requirements and enabling the model to run on a single server node.
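The idea behind that quantization step can be sketched with a simple symmetric absmax scheme: pick one scale per tensor so the largest weight maps onto the edge of an 8-bit grid. Note that this sketch uses an int8 grid as a stand-in; Meta's actual deployment uses FP8 floating-point numerics, and the scaling granularity in the real inference stack differs.

```python
import numpy as np

def quantize_absmax(w, n_bits=8):
    """Symmetric absmax quantization: map high-precision weights onto
    an 8-bit integer grid using a single per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1            # 127 for 8 bits
    scale = np.abs(w).max() / qmax          # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)   # hypothetical weight tensor
q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)
# rounding error is bounded by half a quantization step
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)  # True
```

The payoff is the one described above: the quantized tensor occupies half the memory of 16-bit weights, which is what lets a 405B-parameter model fit on far less hardware.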
Instruction and Chat Fine-Tuning
With Llama 3.1 405B, Meta aimed to enhance the model’s helpfulness, quality, and detailed instruction-following capabilities while maintaining high safety standards. The main challenges were supporting more capabilities, the 128K context window, and increased model sizes. In post-training, Meta produced final chat models through multiple rounds of alignment, involving Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). Synthetic data generation was used to create the majority of SFT examples, iterating to produce higher quality data across all capabilities. Various data processing techniques were employed to filter this synthetic data to the highest quality, enabling the scaling of fine-tuning data across capabilities.
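The Direct Preference Optimization step mentioned above can be sketched for a single preference pair. This follows the standard published DPO objective, not Meta's training code; the log-probabilities below are hypothetical toy values, not outputs of a real model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logpi_w - logref_w) - (logpi_l - logref_l))),
    which pushes the policy to prefer the chosen answer more strongly
    than the frozen reference model does."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# Toy log-probabilities (hypothetical): policy already leans toward the
# chosen answer relative to the reference -> small loss.
loss_good = dpo_loss(-12.0, -20.0, -14.0, -18.0)
# Policy leans toward the rejected answer -> larger loss.
loss_bad = dpo_loss(-20.0, -12.0, -14.0, -18.0)
print(loss_good < loss_bad)  # True
```

Unlike RLHF with an explicit reward model, this loss needs only the preference pairs themselves, which is part of why it scales well across the many capabilities and alignment rounds described above.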
Meta carefully balanced the data to maintain high quality across all capabilities, ensuring the model’s performance on short-context benchmarks even when extending to 128K context. The model continues to provide highly helpful answers while incorporating safety mitigations.
The Llama System
Llama models are designed to function as part of a broader system that orchestrates multiple components, including external tools. Meta’s vision extends beyond foundation models, providing developers with a flexible system to create custom offerings aligned with their own goals. This approach began last year, when Meta first introduced components that operate outside the core LLM.
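The system-level orchestration described here can be sketched as a simple loop: the model emits a structured tool call, the surrounding system executes the tool outside the LLM, and the result is returned. Everything below is a hypothetical stand-in, including the fake model, the tool names, and the JSON call format; real Llama deployments define their own tool protocols.

```python
import json

# Hypothetical tool registry; a real system might expose search,
# code execution, or other external tools here instead.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def fake_model(prompt):
    """Stand-in for an LLM: always emits a JSON tool call for this demo."""
    if "sum" in prompt:
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return json.dumps({"tool": "upper", "args": {"text": prompt}})

def run_with_tools(prompt):
    """One orchestration step: parse the model's tool call,
    execute the named tool outside the LLM, and return its result."""
    call = json.loads(fake_model(prompt))
    tool = TOOLS[call["tool"]]
    return tool(**call["args"])

print(run_with_tools("what is the sum of 2 and 3?"))  # 5
print(run_with_tools("hello"))                        # HELLO
```

The point of the design is the separation of concerns: the model only decides which tool to call and with what arguments, while the surrounding system owns execution, so developers can swap tools without retraining anything.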