LLaMA 3.3 vs. Previous Generations: What’s New and Why It Matters
LLaMA stands for Large Language Model Meta AI, developed by Meta (Facebook’s AI team).
Model Information:
The Meta Llama 3.3 multilingual large language model (LLM) is an instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.
Model developer: Meta
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
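To make this concrete, here is a minimal sketch of running the model through the Hugging Face transformers text-generation pipeline. The model ID comes from the Hugging Face page linked in the references; running the 70B model this way assumes access to the gated repository and substantial GPU memory, so treat this as an illustrative sketch rather than a production setup.

```python
# Minimal sketch: running Llama 3.3 70B Instruct with the transformers pipeline.
# Assumes access to the gated model repo and enough GPU memory (the 70B model
# typically needs multiple GPUs or quantization).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain black holes in simple terms."},
]
out = pipe(messages, max_new_tokens=128)
# The pipeline returns the full chat; the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```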
Llama 3.3 model: pretrained on roughly 15 trillion tokens of publicly available data (token counts refer to pretraining data only). All model versions use Grouped-Query Attention (GQA) for improved inference scalability.
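The intuition behind GQA can be shown in a few lines: several query heads share one key/value head, so the key/value cache kept in memory during generation is smaller. The sketch below is a rough PyTorch illustration, not Meta's implementation, and the head counts are made up for readability rather than Llama 3.3's real configuration.

```python
# Rough sketch of grouped-query attention (GQA): many query heads share a
# smaller set of key/value heads, shrinking the KV cache at inference time.
# Head counts are illustrative only.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2            # 4 query heads share each KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand K/V so each group of query heads attends to its shared KV head.
k = k.repeat_interleave(group, dim=1)   # -> (batch, n_q_heads, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(attn.shape)                       # torch.Size([1, 8, 16, 64])
```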
What Does “Autoregressive” Mean in LLaMA 3.3?
An autoregressive (AR) model is a type of model where future values are predicted based on past values. It is commonly used in time series forecasting and underlies language models like GPT and LLaMA. (BERT, by contrast, is a masked language model, not an autoregressive one.)
Key Concept
- Auto → “Self”
- Regressive → “Depends on past values”
An autoregressive model predicts the next value based on its own previous values. Instead of using independent input features, it recursively takes its own past outputs as inputs.
In Natural Language Processing (NLP), autoregressive models like GPT, LLaMA, and LLaMA Instruct generate text token-by-token.
Example in Text Generation:
- Given “The weather is”, the model predicts “sunny”
- Then, given “The weather is sunny”, it predicts “today”
- This continues iteratively, making the model autoregressive
💡 Key Characteristic: Each token depends on previous tokens.
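This loop is easy to express in code. The sketch below uses a small GPT-2 model as a stand-in for LLaMA so it runs quickly; greedy decoding picks the most likely next token, appends it to the input, and repeats, which is exactly the autoregressive pattern described above.

```python
# Minimal sketch of autoregressive (greedy) decoding: the model's own previous
# outputs are appended to the input and fed back in at every step.
# GPT-2 is used here as a small stand-in for a LLaMA-style causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tokenizer("The weather is", return_tensors="pt").input_ids
for _ in range(5):                                           # generate 5 new tokens
    with torch.no_grad():
        logits = model(ids).logits                           # (1, seq_len, vocab_size)
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick of next token
    ids = torch.cat([ids, next_id], dim=-1)                  # feed the output back in

print(tokenizer.decode(ids[0]))
```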
Supervised Fine-Tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF) in LLaMA
LLaMA models (like LLaMA 3 and LLaMA Instruct) are trained using a two-step process to make them more useful, aligned, and human-friendly:
1️⃣ Supervised Fine-Tuning (SFT)
Supervised Fine-Tuning is the first step where the model is trained on labeled datasets with correct input-output pairs.
How it Works
- The model is given human-annotated datasets (e.g., question-answer pairs, instructions, or summaries).
- It learns from ground truth responses instead of randomly predicting text.
- The loss function (like Cross-Entropy Loss) helps adjust the model weights.
Example
📌 Dataset Example (Instruction → Expected Response)
- Instruction: "Explain black holes..."
- Expected Response: "A black hole is..."
📌 Training Process
- Model takes the input ("Explain black holes...").
- Compares its generated response with the ground truth ("A black hole is...").
- Adjusts weights to minimize the difference between its answer and the correct one.
🎯 Goal: Train the model to follow instructions well using high-quality supervised data.
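Here is a minimal sketch of a single SFT update step, again using GPT-2 as a small stand-in for LLaMA: the instruction and the ground-truth response are concatenated, and the standard causal-LM cross-entropy loss nudges the weights toward reproducing the reference answer. Real SFT pipelines usually mask the loss so only the response tokens are penalized.

```python
# Minimal sketch of one supervised fine-tuning (SFT) step on a single
# instruction/response pair. GPT-2 is a small stand-in for LLaMA.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

instruction = "Explain black holes in simple terms."
response = "A black hole is a region of space where gravity is so strong that nothing can escape."

batch = tokenizer(instruction + "\n" + response, return_tensors="pt")
# labels = input_ids makes the model predict each next token of the reference
# text; the built-in shift-by-one and cross-entropy loss handle the rest.
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()      # gradients of the cross-entropy loss
optimizer.step()             # adjust weights toward the ground-truth response
optimizer.zero_grad()
print(f"cross-entropy loss: {outputs.loss.item():.3f}")
```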
2️⃣ Reinforcement Learning with Human Feedback (RLHF)
After SFT, the model might still generate incorrect, biased, or misleading responses. RLHF helps improve alignment by using human preference data.
How it Works
RLHF involves three steps:
1️⃣ Data Collection (Ranking Responses)
- Humans are given multiple model responses to the same prompt.
- They rank responses from best to worst based on helpfulness, accuracy, and safety.
2️⃣ Reward Model Training
- A separate reward model is trained to predict human preferences.
- It learns from ranked examples and assigns a “reward score” to new responses.
3️⃣ Fine-Tuning with Reinforcement Learning (PPO Algorithm)
- The main LLaMA model is trained using Proximal Policy Optimization (PPO).
- It generates responses → gets a reward → updates itself to maximize rewards.
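Before the example, here is a toy sketch of the reward-model step: a pairwise (Bradley-Terry style) ranking loss pushes the score of the human-preferred response above the rejected one. The tiny scoring network and random token IDs are placeholders for illustration, not Meta's actual reward model; PPO then fine-tunes the main model to maximize the rewards this model assigns.

```python
# Toy sketch of reward-model training on human preference pairs: the loss
# pushes the score of the preferred ("chosen") response above the rejected one,
# a Bradley-Terry style pairwise ranking loss commonly used in RLHF.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Maps a (crudely pooled) token sequence to a scalar reward score."""
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids):                   # (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)  # mean pooling, for illustration only
        return self.score(pooled).squeeze(-1)       # (batch,) scalar rewards

reward_model = TinyRewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-3)

# Fake tokenized responses to the same prompt: humans ranked `chosen` above `rejected`.
chosen = torch.randint(0, 1000, (4, 20))
rejected = torch.randint(0, 1000, (4, 20))

r_chosen, r_rejected = reward_model(chosen), reward_model(rejected)
# -log(sigmoid(r_chosen - r_rejected)) is minimized when chosen outscores rejected.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
print(f"ranking loss: {loss.item():.3f}")
```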
Example
📌 Prompt: “How to fix a flat tire?”
- The reward model learns that detailed step-by-step instructions are preferred.
- The LLaMA model updates itself to generate better responses over time.
🎯 Goal: Make LLaMA more aligned, helpful, and less harmful by learning from human preferences.
Comparison: SFT vs. RLHF
- SFT: learns from labeled instruction → response pairs; the objective is to imitate the ground-truth answer (cross-entropy loss); it teaches basic instruction-following.
- RLHF: learns from human preference rankings; the objective is to maximize a learned reward model's score (via PPO); it aligns responses with human judgments of helpfulness, accuracy, and safety.
Which Models Use SFT & RLHF?
- LLaMA 3 Base → pretraining only (no instruction tuning).
- LLaMA 3 Instruct → Uses SFT + RLHF (better at following instructions).
Final Takeaways
- Supervised Fine-Tuning (SFT) teaches the model basic instruction-following using labeled data.
- RLHF refines the model further by aligning responses with human feedback.
- LLaMA 3 Instruct models go through both steps, making them better at chat, reasoning, and safety.
Understanding LLaMA 3, Its Versions, and the History of LLaMA Models
1. Why is it called LLaMA 3? What were LLaMA 1 and 2?
The number (1, 2, 3, etc.) represents different generations of the model.
🔹 LLaMA 1 (Feb 2023)
- First release by Meta.
- Sizes: 7B, 13B, 33B, 65B.
- Not instruction-tuned (needed extra fine-tuning to follow tasks well).
🔹 LLaMA 2 (July 2023)
- Improved training dataset (2T tokens).
- Added LLaMA-2-Chat (trained with RLHF for better conversations).
- Sizes: 7B, 13B, 70B.
- Openly available for commercial use.
🔹 LLaMA 3 (April 2024)
- Even larger dataset (~15T tokens).
- Better instruction-following and reasoning.
- More efficient inference (cheaper to run).
- Includes LLaMA 3 Base (pretrained model) & LLaMA 3 Instruct (fine-tuned for chat).
2. What is the difference between LLaMA 3, 3.1, 3.3, etc.?
- LLaMA 3 (April 2024): 8B and 70B base and instruct models with an 8K-token context window.
- LLaMA 3.1 (July 2024): added a 405B model, extended the context window to 128K tokens, and broadened multilingual coverage.
- LLaMA 3.2 (September 2024): added lightweight 1B and 3B text models plus 11B and 90B vision models.
- LLaMA 3.3 (December 2024): a 70B, text-only, instruction-tuned model that approaches the quality of the much larger 3.1 405B model at a far lower inference cost.
References:
https://ai.meta.com/blog/meta-llama-3/
https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct#responsible-deployment