The 'LLM Architecture Gallery' illustrates the architectures of various large-scale language models such as GPT, Llama, and Grok. – GIGAZINE

Many large language models exist, such as OpenAI's GPT series, xAI's Grok, and Meta's Llama. The ‘LLM Architecture Gallery’, which illustrates the structures of these models, is publicly available online.

LLM Architecture Gallery | Sebastian Raschka, PhD

For example, clicking ‘Llama 4 Maverick’ displays a diagram of its architecture. Click the diagram to enlarge it.

The enlarged diagram looks like this. Clicking ‘View in article’ in the upper right corner of the screen lets you read Mr. Raschka's explanation of each model.

Raschka explains the similarities and differences between the various large language models by comparing them with one another.

For example, Llama 4 employs an architecture very similar to DeepSeek V3, and both use a machine learning approach called ‘Mixture-of-Experts (MoE)’. The main difference is that Llama 4 uses Grouped-Query Attention (GQA) to improve the efficiency of the attention mechanism in the Transformer model, while DeepSeek V3 uses Multi-Head Latent Attention (MLA).
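To make the efficiency argument for GQA concrete, here is a minimal sketch of its core idea: several query heads share one key/value head, which shrinks the key/value cache. The function names and all sizes below are hypothetical values chosen for illustration; they are not taken from Llama 4 or from the gallery.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the shared key/value head used by a query head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_elements(n_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Count cached key and value elements per layer for one sequence."""
    return 2 * n_kv_heads * head_dim * seq_len  # factor 2 = keys + values


# Hypothetical sizes for illustration only (assumptions, not real model configs).
N_Q_HEADS, N_KV_HEADS, HEAD_DIM, SEQ_LEN = 8, 2, 64, 1024

# Which KV head each of the 8 query heads reads from.
mapping = [kv_head_for_query_head(h, N_Q_HEADS, N_KV_HEADS) for h in range(N_Q_HEADS)]

# Standard multi-head attention keeps one KV head per query head.
mha_cache = kv_cache_elements(N_Q_HEADS, HEAD_DIM, SEQ_LEN)
# GQA keeps only the shared KV heads.
gqa_cache = kv_cache_elements(N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(mapping)                 # [0, 0, 0, 0, 1, 1, 1, 1]
print(mha_cache // gqa_cache)  # 4
```

With these assumed sizes, the cache shrinks by the ratio of query heads to key/value heads. MLA takes a different route (compressing keys and values into a lower-dimensional latent representation) and is not covered by this sketch.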

While GPT-OSS and Qwen3 use similar components, there are differences in the number of Transformer blocks (GPT-OSS has 24, while Qwen3 has 48), as well as differences in

While Grok 2.5 has a fairly standard structure overall, its MoE layer is made up of only eight individual subnetworks (experts), considerably fewer than Qwen3's 128. Since newer designs favor using more experts, Grok reflects the older trend. Mr. Raschka also explained that it is interesting that Grok uses an additional
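The expert counts discussed above can be illustrated with a minimal sketch of top-k routing in an MoE layer: a router scores every expert for a token, and only the highest-scoring few are run. The function names, the router scores, and the `top_k` values are all assumptions made for illustration; the article gives only the total expert counts (8 and 128), and real models may normalize router weights differently.

```python
import math


def route_token(router_logits: list[float], top_k: int) -> dict[int, float]:
    """Pick the top_k experts for one token and normalize their weights."""
    if not 0 < top_k <= len(router_logits):
        raise ValueError("top_k must be between 1 and the number of experts")
    # Indices of the top_k highest-scoring experts.
    chosen = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:top_k]
    # Softmax over the chosen logits only (one common choice, assumed here).
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def active_fraction(n_experts: int, top_k: int) -> float:
    """Fraction of experts that actually run for a single token."""
    return top_k / n_experts


# Hypothetical router scores for a layer with 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 0.3]
weights = route_token(logits, top_k=2)
print(sorted(weights))  # [1, 4]

# Assumed top_k values, for comparison only: few experts vs. many experts.
few = active_fraction(n_experts=8, top_k=2)
many = active_fraction(n_experts=128, top_k=8)
print(few, many)  # 0.25 0.0625
```

Under these assumed settings, a layer with many experts activates a smaller fraction of them per token, which is one way to read the trend toward larger expert counts.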

in AI,   Web Service, Posted by log1h_ik
