What Is a Large Language Model?

What LLMs Are

In traditional software, given an input, a program produces a definite output. Function calls have clear return values. Conditional branches have explicit rules.

Large Language Models (LLMs) break this paradigm. An LLM is not a set of if-else rules — it's a probability model that, given some text, predicts what's most likely to come next.

The Core Mechanism: Next Token Prediction

At its heart, an LLM does exactly one thing:

Given all preceding text, predict a probability distribution over the next token.

For example, given "The weather today is", it might produce:

  • "nice" → 65% probability
  • "bad" → 15% probability
  • "really" → 12% probability
  • others → 8%

The model samples one token from this distribution, appends it to the text, and then predicts the next token from the extended text. This loop repeats until a complete response has been generated.

This is called autoregressive generation — producing output one token at a time. The "typing" effect you see in ChatGPT is a direct manifestation of this process.
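The loop above can be sketched in a few lines of Python. The toy "model" below is just a lookup table, and its probabilities are invented for illustration; a real LLM computes the distribution with a neural network, but the generation loop has the same shape:

```python
import random

# Toy "model": maps a context string to a next-token distribution.
# All probabilities here are made up for illustration.
TOY_MODEL = {
    "The weather today is": {"nice": 0.65, "bad": 0.15, "really": 0.12, "<end>": 0.08},
    "The weather today is nice": {"<end>": 1.0},
    "The weather today is bad": {"<end>": 1.0},
    "The weather today is really": {"nice": 0.7, "bad": 0.3},
    "The weather today is really nice": {"<end>": 1.0},
    "The weather today is really bad": {"<end>": 1.0},
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    """Autoregressive loop: sample a token, append it, repeat."""
    text = prompt
    for _ in range(max_tokens):
        dist = TOY_MODEL.get(text, {"<end>": 1.0})
        tokens = list(dist)
        weights = list(dist.values())
        token = random.choices(tokens, weights=weights)[0]
        if token == "<end>":  # model signals the response is complete
            break
        text = f"{text} {token}"
    return text
```

Calling `generate("The weather today is")` repeatedly produces different completions, because each step samples from a distribution rather than following a rule.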

How It Differs from Traditional Software

  Dimension              Traditional Software              LLM
  Driven by              Rules                             Probabilities
  Output determinism     Same input → same output          Same input → possibly different output
  Capability source      Human-written logic               Patterns learned from massive data
  Error characteristics  Crashes, exceptions, logic bugs   "Hallucinations" — confidently wrong answers
  Boundaries             Well-defined (documented)         Fuzzy (requires exploration and testing)

The key mindset shift: an LLM doesn't "understand" your intent — it performs statistical pattern matching. Its responses look like "thinking," but underneath it's computing probabilities.
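A small simulation makes the "same input, possibly different output" row concrete. Using the made-up distribution from earlier, repeated sampling gives varying answers, yet the frequencies track the underlying probabilities:

```python
import collections
import random

def sample_next(dist: dict, rng: random.Random) -> str:
    """Draw one next token from a probability distribution."""
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights)[0]

# Illustrative distribution; not from any real model.
dist = {"nice": 0.65, "bad": 0.15, "really": 0.12, "other": 0.08}

rng = random.Random(42)
counts = collections.Counter(sample_next(dist, rng) for _ in range(1000))
# Individual draws vary, but over 1000 samples "nice" dominates,
# roughly in proportion to its 65% probability.
```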

What Makes Them "Large"

The "large" in Large Language Model refers to parameter count. Think of parameters as internal knobs — the more parameters, the more complex language patterns the model can capture.

  • GPT-2 (2019): 1.5 billion parameters
  • GPT-3 (2020): 175 billion parameters
  • GPT-4 (2023): undisclosed, estimated over 1 trillion
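Parameter count translates directly into hardware cost. A back-of-the-envelope sketch, assuming 2 bytes per parameter (16-bit weights) and ignoring activations and other runtime overhead:

```python
def weights_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory footprint of the model weights alone."""
    return n_params * bytes_per_param / 1e9

# GPT-2:  1.5 billion params -> ~3 GB of weights at 16 bits
# GPT-3: 175 billion params -> ~350 GB of weights at 16 bits
```

By this rough measure, GPT-2 fits on a single consumer GPU, while GPT-3-scale weights already require a multi-GPU server before a single token is generated.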

This explosive growth in parameters led to an interesting phenomenon: emergent abilities. When models get large enough, they suddenly exhibit capabilities that smaller models don't have — logical reasoning, code generation, multilingual translation. It's like heating water to 100°C and watching it suddenly boil — a phase transition from quantity to quality.

Key Takeaways

  1. LLMs are probability machines, not knowledge bases. Their answers are "most likely," not "certainly correct."
  2. Their capabilities come from training data. Domains not covered in training will yield poor or fabricated results.
  3. They don't execute logic. What looks like reasoning is pattern matching. This explains why they sometimes make surprisingly basic mistakes.
  4. Non-determinism is a feature, not a bug. The same question may yield different answers — that's a natural consequence of probabilistic sampling.
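The amount of randomness in point 4 is usually tunable. A common mechanism is temperature scaling: the model's raw scores (logits) are divided by a temperature before being converted to probabilities with a softmax. This sketch uses invented logits; low temperature sharpens the distribution toward determinism, high temperature flattens it:

```python
import math

def softmax(logits: list, temperature: float = 1.0) -> list:
    """Turn raw scores into a probability distribution.

    Dividing by a small temperature exaggerates score differences
    (near-deterministic output); a large temperature washes them out
    (near-uniform sampling).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]           # invented scores for three tokens
sharp = softmax(logits, temperature=0.1)   # top token ~always picked
flat = softmax(logits, temperature=10.0)   # close to uniform
```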