Note: This is a research note supplementing the book Unscarcity, now available for purchase. These notes expand on concepts from the main text. Start here or get the book.
Large Language Models: The Autocomplete That Ate the World
Here’s the dirty secret of large language models: at their core, they’re just playing the world’s most sophisticated game of “guess the next word.” That’s it. The same basic principle behind your phone’s keyboard suggestions is now writing legal briefs, diagnosing diseases, and generating code that powers Fortune 500 companies.
Except your phone predicts the next word based on a dictionary and some basic statistics. GPT-4 predicts the next word based on having “read” essentially the entire internet, and it does so with such uncanny coherence that we’ve started giving these systems names like they’re colleagues. “Claude helped me with that report.” “I asked Gemini to review my code.” We’re personifying prediction engines.
If that doesn’t strike you as either terrifying or miraculous (or both), you haven’t been paying attention.
What LLMs Actually Are (And Aren’t)
The Jargon Decoder
Let’s get the vocabulary out of the way, because the AI industry loves its alphabet soup:
Token: The atomic unit of text that LLMs process. Not quite a word, not quite a character. A short, common word is usually a single token, while a longer or rarer word like “unbelievable” may be split into two or three pieces. A rough rule: one token equals about 0.75 English words, or about 4 characters. When someone says “GPT-4 has a 128K context window,” they mean it can process roughly 96,000 words at once—about 300 pages of text.
Transformer: The neural network architecture that powers all modern LLMs. Invented by Google researchers in 2017, the transformer uses a mechanism called “attention” that lets the model consider relationships between all parts of the input simultaneously. Before transformers, models processed text word-by-word, like reading through a keyhole. Transformers let models see the whole page at once.
Attention: The core innovation that makes transformers work. When processing the word “it” in “The cat sat on the mat because it was tired,” attention mechanisms let the model figure out that “it” refers to “the cat,” not “the mat.” It does this by computing relationships—attention weights—between every word and every other word. The model learns which relationships matter.
Parameters: The adjustable numbers inside a neural network that get tuned during training. More parameters generally means more capacity to learn complex patterns. GPT-3 had 175 billion parameters. GPT-4 reportedly has over a trillion. DeepSeek-V3 has 671 billion, but only activates 37 billion at a time (more on that later).
Fine-tuning: Taking a pre-trained model and specializing it for a specific task. The base model learns general language patterns from internet text; fine-tuning teaches it to follow instructions, refuse harmful requests, or excel at coding. It’s like the difference between a general education and professional training.
Inference: Using a trained model to generate outputs. Training is the expensive part (billions of dollars for frontier models); inference is what happens when you type a question and the model responds. The economics of LLMs depend on making inference cheap.
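The token arithmetic above is easy to check yourself. Below is a minimal sketch using OpenAI’s tiktoken library; exact counts vary by tokenizer, so treat the output as illustrative.

```python
# Minimal token-counting sketch using OpenAI's tiktoken library (pip install tiktoken).
# Counts vary by tokenizer; this uses the cl100k_base encoding from the GPT-4 era.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["unbelievable", "The cat sat on the mat because it was tired."]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(tokens)} tokens -> {tokens}")

# Rule of thumb from above: ~0.75 words per token, so a 128K-token context
# window holds roughly 96,000 words of English text.
```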
The Transformer Architecture: The Engine Under the Hood
Think of a transformer like a concert hall full of musicians, all listening to each other simultaneously.
In older architectures (recurrent neural networks, or RNNs), processing text was like a bucket brigade—information passed from one position to the next, sequentially. Word 50 had to wait for words 1-49 to be processed first. This created bottlenecks and made it hard to remember distant context.
Transformers demolished this limitation. Using the attention mechanism, every position in the sequence can attend to every other position directly, in parallel. It’s like everyone in the orchestra can hear everyone else at once, adjusting their playing accordingly.
The famous 2017 paper “Attention Is All You Need” introduced this architecture, and its title wasn’t hyperbole—the authors dispensed with recurrence and convolutions entirely and built the model around attention (plus simple feed-forward layers). The results were stunning: faster training, better performance, and the ability to scale to sizes that previous architectures couldn’t handle.
The key insight: attention computes a weighted sum of values, where the weights are determined by how relevant each position is to the current position. For each word, the model asks: “Which other words should I pay attention to in order to understand this one?”
The mathematical trick—queries, keys, and values—comes from information retrieval:
- Query: “What am I looking for?”
- Key: “What information does each position have?”
- Value: “If that position is relevant, what should I take from it?”
Match queries against keys, use the match scores to weight values, and you get context-aware representations. Stack this mechanism into multiple layers with multiple “heads” (parallel attention computations focusing on different relationship types), and you get the modern LLM.
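Here is the same idea in miniature: a toy, single-head version of scaled dot-product attention in plain NumPy. It is a sketch of the mechanism described above, not a production implementation; the random projections stand in for weights a real model would learn.

```python
# Toy single-head scaled dot-product attention in NumPy.
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns one context-aware vector per position."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # how relevant is each position to each other?
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax -> attention weights
    return weights @ V                                 # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d = 6, 8                                      # e.g. 6 tokens, 8-dimensional embeddings
x = rng.normal(size=(seq_len, d))
# In a real transformer, Q, K, V come from learned linear projections of x.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                       # (6, 8): one updated vector per token
```

Stack this into multiple layers and heads and you have the skeleton of the models discussed below.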
The Emergence Problem
Here’s the genuinely weird part: LLMs exhibit capabilities that weren’t explicitly programmed and sometimes weren’t present in smaller versions of the same architecture.
Train a small language model on internet text, and it predicts words. Train a slightly larger one, and it still just predicts words. Keep scaling, and somewhere around 100 billion parameters, the model starts exhibiting behaviors nobody explicitly taught it:
- In-context learning (few-shot prompting): showing it examples and having it generalize
- Chain-of-thought reasoning: working through problems step by step
- Code generation: understanding and writing programming languages
These are “emergent capabilities”—they appear discontinuously as models scale. One day the capability isn’t there; the next day it is. Researchers call this a phase transition, borrowing the term from physics (like water suddenly becoming ice at a critical temperature).
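In-context learning, the first capability on that list, is easiest to see in a concrete prompt. A schematic example (the reviews and labels are invented for illustration; the string could be sent to any chat or completion API):

```python
# Schematic few-shot prompt: the model is never retrained; it generalizes from
# the examples placed in its context window. The task and labels are invented.
few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: "Arrived broken and support never replied." -> negative
Review: "Exactly what I needed, works perfectly." -> positive
Review: "Battery died after two days." -> negative
Review: "Setup took thirty seconds. Love it." -> """

print(few_shot_prompt)
# A sufficiently large model completes this with "positive", learning the task
# from three examples at inference time, with no gradient updates.
```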
The honest answer to “why does this happen?” is: we don’t fully know. LLMs are one of the most significant technologies humanity has ever created, and we don’t completely understand how they work. We know what goes in (text) and what comes out (predictions). The middle is still largely a black box.
The Evolution: From GPT-4 to the Current Frontier
2023: The GPT-4 Baseline
When OpenAI released GPT-4 in March 2023, it set the benchmark that everyone else has been chasing. Compared to GPT-3.5, it was:
- More capable at complex reasoning
- Better at following nuanced instructions
- Able to process images (multimodal)
- Less prone to obvious hallucinations
- More “aligned” with human preferences
The rumored spec: over a trillion parameters in a mixture-of-experts architecture (meaning only a subset activates for any given query). Training reportedly cost over $100 million in compute alone.
2024: The Reasoning Revolution
2024 was the year models learned to think—or at least, to simulate thinking.
OpenAI o1 (September 2024): The first “reasoning model.” Unlike GPT-4, which generates answers immediately, o1 produces explicit chains of thought before answering. It “thinks” for seconds or minutes, working through problems step by step. This made it dramatically better at math, coding, and logic puzzles. The tradeoff: it’s slower and more expensive per query.
Claude 3.5 Sonnet (June 2024): Anthropic’s flagship positioned itself as the coder’s best friend—excelling at reading, writing, and debugging code while maintaining the conversational sophistication of GPT-4. The company also introduced “computer use”: Claude could operate a computer by looking at screenshots and simulating mouse/keyboard input.
Gemini 2.0 (December 2024): Google’s answer, featuring native multimodality (text, images, audio, video in and out), agent capabilities, and integration with Google’s ecosystem. The Pro variant demonstrated strong reasoning while Flash optimized for speed.
Llama 3 (Meta, 2024): The open-source champion. Meta released weights that anyone could download and run locally, democratizing access to frontier-adjacent capabilities. Organizations could fine-tune it for their specific needs without sending data to external APIs.
2025: The Density Wars
The narrative shifted in 2025. The question stopped being “how big?” and became “how efficient?”
DeepSeek-V3 (late December 2024): The model that set up the crash in Nvidia’s stock price a month later. Chinese lab DeepSeek released a model matching GPT-4o’s performance while claiming training costs of just $5.5 million—roughly 1/18th of comparable American models. The secret: aggressive efficiency innovations including a Mixture-of-Experts architecture (671B total parameters, 37B active), novel attention mechanisms, and reinforcement learning approaches that reduced reliance on expensive supervised data.
Marc Andreessen called it “AI’s Sputnik moment.” The implication was clear: raw compute advantage might not be the moat everyone assumed.
DeepSeek-R1 (January 2025): Their reasoning model, matching OpenAI’s o1 at a fraction of the cost. Inference costs dropped to $0.07 per million input tokens—compared to $15-30 for American frontier models. Suddenly, the economics of AI changed.
Claude 4 / Opus 4.5 (2025): Anthropic’s response featured “extended thinking mode”—longer reasoning chains that could be introspected. Claude Sonnet 4.5 achieved 61.4% on OSWorld, a benchmark testing real-world computer operation tasks. Four months earlier, the leader was at 42.2%.
Gemini 3 (2025): Google’s latest achieved 100% on AIME 2025 (a math competition benchmark) with code execution, and expanded context to 1 million tokens standard.
Llama 4 (April 2025): Meta went multimodal and pushed context windows to 10 million tokens with Scout variant. Open-source caught up to proprietary on most benchmarks.
The Densing Law
Researchers identified a pattern: capability density—capability per parameter—doubles approximately every 3.5 months. This means equivalent model performance can be achieved with exponentially fewer parameters over time. The 2025 model that matches 2024’s flagship might be 1/10th the size.
This matters because inference cost scales with active parameters. Smaller models that perform as well as larger ones are cheaper to run, faster to respond, and easier to deploy on limited hardware. The future isn’t necessarily bigger models—it might be smarter ones.
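A quick back-of-the-envelope check of that “1/10th the size” claim, taking the quoted 3.5-month doubling period at face value:

```python
# Back-of-the-envelope: if capability density doubles every 3.5 months,
# how much smaller can an equally capable model be after 12 months?
months, doubling_period = 12, 3.5
density_gain = 2 ** (months / doubling_period)
print(f"After {months} months: ~{density_gain:.1f}x denser "
      f"-> roughly 1/{density_gain:.0f} the parameters for the same capability")
# After 12 months: ~10.8x denser -> roughly 1/11 the parameters
```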
The Scaling Debate: Is Bigger Always Better?
For years, the answer seemed obvious: yes. Double the data, double the compute, double the parameters—get a better model. The “scaling laws” discovered by OpenAI and DeepMind predicted performance improvements with mathematical precision.
Then 2024 happened, and the narrative got complicated.
The Wall Everyone Whispered About
Reports emerged that frontier labs were struggling to make GPT-5 and similar next-generation models significantly better than GPT-4. The pre-training approach—throwing more data and compute at the problem—seemed to be hitting diminishing returns. Models were running out of high-quality text data; the entire internet had essentially been consumed.
The Pivot to Post-Training
The response: if pre-training was plateauing, invest in post-training. Instead of just predicting the next word better, teach models to reason, to use tools, to verify their own outputs.
OpenAI’s o1 and o3 models exemplified this shift. They spent more compute at inference time—letting the model “think longer”—rather than just at training time. This is “test-time compute scaling,” and it opened a new frontier: make models slower but smarter on hard problems.
The Chinchilla research from DeepMind also challenged the “bigger is always better” orthodoxy. Their finding: most models were undertrained. Instead of building bigger models on fixed data, you could get better results by training smaller models on more data for longer. Meta’s Llama 3 pushed this to extremes—training the 8B parameter model on 15 trillion tokens (a ratio of 1,875 tokens per parameter, compared to earlier norms around 20:1).
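The gap between the older rule of thumb and Llama 3’s recipe is stark when you write it out (figures are the ones quoted above; the ~20:1 ratio is the commonly cited Chinchilla-style heuristic):

```python
# Comparing the ~20 tokens-per-parameter heuristic with Llama 3 8B's reported budget.
params = 8e9                                   # 8 billion parameters
chinchilla_tokens = 20 * params                # "compute-optimal" heuristic: ~160B tokens
llama3_tokens = 15e12                          # reported 15 trillion training tokens
print(f"Chinchilla-style heuristic: ~{chinchilla_tokens / 1e9:.0f}B tokens")
print(f"Llama 3 8B actually saw:    {llama3_tokens / 1e12:.0f}T tokens "
      f"({llama3_tokens / params:,.0f} tokens per parameter)")
```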
What This Means
The scaling laws aren’t dead—they’ve evolved. Multiple scaling dimensions exist:
- Pre-training scale: More parameters, more data
- Post-training refinement: Instruction-tuning, RLHF, preference learning
- Test-time compute: Letting models think longer before answering
- Inference optimization: Making trained models faster and cheaper to run
The 2025 frontier isn’t just about who has the biggest model. It’s about who can best orchestrate all these dimensions.
Multimodality: When Text Isn’t Enough
The early LLMs were text-in, text-out. You typed words, you got words back. That era is ending.
What Multimodal Means
Modern LLMs increasingly process and generate multiple modalities:
- Images: Understanding photos, generating illustrations
- Audio: Transcribing speech, generating natural voice
- Video: Analyzing clips, describing visual content
- Code: Reading and writing in programming languages (which is arguably its own modality)
GPT-4V (vision) was the mainstream breakthrough—upload an image, ask questions about it, get answers. Gemini pushed further with native audio and video support. Claude added document analysis. By 2025, the frontier models treat different input types as natural extensions of the same capability.
Why This Matters
The real world is multimodal. A doctor doesn’t just read symptoms—they look at x-rays, listen to heart sounds, watch how the patient moves. A programmer doesn’t just write code—they sketch diagrams, read documentation, review screenshots of bugs.
Multimodal LLMs can operate in these richer environments. Claude’s “computer use” feature exemplifies this: the model looks at screenshots, reasons about what’s on screen, and decides what actions to take. It’s not reading a text description of a UI—it’s seeing the actual pixels.
The market agrees this matters: multimodal AI was valued at $1.73 billion in 2024 and is projected to reach $10.89 billion by 2030.
Agentic AI: From Chatbot to Colleague
The biggest shift isn’t in what LLMs know—it’s in what they do.
The Agent Paradigm
Early LLMs were reactive: you asked, they answered. Agentic LLMs are proactive: you give them a goal, they figure out how to achieve it.
An agentic system can:
- Decompose complex goals into sub-tasks
- Decide which tools to use (web search, code execution, database queries)
- Execute multi-step plans over extended timeframes
- Monitor progress and adjust when things go wrong
- Operate without continuous human oversight
Instead of asking “write me an email,” you can say “launch a marketing campaign for our new product.” The agent researches demographics, drafts copy, A/B tests variants, monitors results, and iterates—checking in with you at key decision points.
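A deliberately simplified sketch of that loop, with a stub standing in for the model call and a single toy tool (every name here is a hypothetical placeholder, not any real framework’s API):

```python
# Sketch of an agentic loop: choose a tool, act, observe, repeat until done.
import os

def stub_llm(prompt: str) -> str:
    """Stand-in for a model call; a real agent would call an LLM API here."""
    if "Observation:" in prompt:
        return "FINISH: report the observation above to the user"
    return "count_files path=."

def count_files(path: str = ".") -> str:
    return f"{len(os.listdir(path))} entries in {path!r}"

TOOLS = {"count_files": count_files}

def run_agent(goal: str, llm, tools, max_steps: int = 10) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = llm("\n".join(history) + "\nNext action or FINISH:")
        if decision.startswith("FINISH"):
            return decision                        # the agent decided it is done
        name, _, arg = decision.partition(" ")     # crude "tool key=value" parsing
        key, _, value = arg.partition("=")
        observation = tools[name](**({key: value} if key else {}))
        history.append(f"Action: {decision}\nObservation: {observation}")
    return "Stopped: step budget exhausted"

print(run_agent("How many files are in this directory?", stub_llm, TOOLS))
```

Real frameworks add planning, memory, parallel tool calls, and guardrails, but the decide-act-observe loop is the core pattern.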
The 2025 Agent Ecosystem
Claude Code (February 2025): Anthropic’s agentic coding tool. Give it a task (“refactor this module,” “add test coverage,” “debug this error”), and it reads your codebase, makes changes, runs tests, and iterates until done. Simon Willison called it potentially the most impactful AI development of 2025.
Computer Use: Multiple models can now operate desktop environments—clicking buttons, filling forms, navigating applications. OSWorld benchmark scores jumped from ~14% to 61.4% in 2025 alone.
Multi-Agent Systems: Frameworks like CrewAI and LangGraph enable compositions where specialized agents collaborate. A “researcher” agent gathers data; an “analyst” agent interprets it; a “writer” agent drafts recommendations.
The Productivity Implications
METR (a model evaluation organization) published perhaps the most striking chart of 2025: the duration of tasks that AI can complete independently. In 2024, frontier models maxed out at tasks taking humans under 30 minutes. By late 2025, Claude Opus 4.5 could handle tasks taking humans multiple hours. Their conclusion: “the length of tasks AI can do is doubling every 7 months.”
This isn’t incremental improvement. This is the difference between “a tool I use” and “a colleague who handles projects.”
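Taken at face value, a 7-month doubling time compounds quickly. A rough extrapolation from the figures above (simple arithmetic, not a forecast):

```python
# Extrapolating METR's observed doubling time from a ~30-minute baseline.
start_minutes, doubling_months = 30, 7
for months_ahead in (12, 24, 36):
    horizon = start_minutes * 2 ** (months_ahead / doubling_months)
    print(f"+{months_ahead} months: ~{horizon / 60:.1f} hours of human-equivalent task length")
# +36 months: ~17.6 hours, i.e. multi-workday projects, if the trend holds.
```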
The Code Generation Revolution
If you want to understand where LLMs hit hardest, look at programming—the profession that was supposed to be immune.
The Numbers
| Metric | Value | Source |
|---|---|---|
| Code written by Copilot (where enabled) | 46% | GitHub |
| For Java developers | 61% | GitHub |
| Suggestions kept in final code | 88% | GitHub |
| Task completion speedup | 55% | GitHub Research |
| GitHub Copilot users | 20+ million | Microsoft (July 2025) |
| Fortune 100 adoption | 90% | GitHub |
Read that again: nearly half of all code in Copilot-enabled environments is written by the AI, and developers keep 88% of its suggestions. The machine isn’t just assisting—it’s producing close to half the output.
What “Vibe Coding” Means
“Vibe coding” is the informal term for describing what you want in natural language and letting AI handle implementation. A product manager who can clearly articulate outcomes may be more productive than a senior developer executing precise specifications.
This doesn’t eliminate technical skill. But it abstracts it. The best practitioners understand systems deeply enough to direct AI effectively, debug failures, and architect workflows. They’re conductors, not individual musicians.
The Quality Debate
Not all AI-generated code is created equal. Research from GitClear and others points to concerning trends:
- Lines classified as “copy/pasted” (cloned code) rose from 8.3% to 12.3% since AI tools became common
- Refactoring decreased from 25% to under 10% of changed lines
- Security vulnerabilities appear in 29.1% of AI-generated Python code
The risk: developers accept suggestions without fully understanding them, accumulating technical debt faster than ever. The counterargument: review processes still catch most issues, and the speed gains outweigh the quality tradeoffs.
Context Windows: The Memory Arms Race
How much can a model remember? In 2022, the answer was “about 3,000 words.” In 2025, the answer is “an entire codebase.”
The Evolution
| Year | Typical Context Window | Equivalent |
|---|---|---|
| 2022 | 4K tokens | ~3,000 words |
| 2023 | 32K tokens | ~24,000 words |
| 2024 | 128K-200K tokens | ~100,000-150,000 words |
| 2025 | 1M+ tokens | ~750,000+ words |
| 2025 (Llama 4 Scout) | 10M tokens | ~7.5 million words |
That last number isn’t a typo. Llama 4 Scout can process 10 million tokens—roughly 7.5 million words, or about 75 full-length novels simultaneously.
Why Context Matters
Limited context was a fundamental constraint on LLM usefulness. Ask a model to analyze a long document, and it would forget the beginning by the time it reached the end. Now, entire codebases, book manuscripts, or research corpora fit in a single context window.
The implications:
- Codebase understanding: Models can see all the code at once, not just the file you’re editing
- Long-form writing: Authors can include entire novels in context for consistent editing
- Research synthesis: Thousands of papers analyzed simultaneously
- Persistent assistants: Conversations that remember everything from previous interactions
The Tradeoffs
Longer context isn’t free. Attention mechanisms scale quadratically with sequence length—double the context, quadruple the compute. Innovations like sparse attention and memory-efficient architectures mitigate this, but costs still rise.
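To see why, count the pairwise comparisons vanilla attention has to make (an illustration of the scaling, not a cost model for any particular system):

```python
# Vanilla attention scores every token against every other token, so the work
# grows with the square of sequence length.
for tokens in (4_000, 128_000, 1_000_000):
    pairs = tokens ** 2
    print(f"{tokens:>9,} tokens -> {pairs:.2e} score pairs per head per layer")
# Doubling the context quadruples the pair count: 2x tokens -> 4x work.
```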
There’s also the “lost in the middle” problem: models pay more attention to the beginning and end of long contexts, sometimes missing important information in the middle. Researchers are actively working on this, but it remains a limitation.
The Economics: Why DeepSeek Matters
AI industry economics in early 2025 looked roughly like this:
- Training a frontier model: $100 million to $1 billion+
- Running inference on frontier models: $15-30 per million tokens
- Building data centers to house everything: hundreds of billions of dollars
- Expected moat: compute advantage compounds
Then DeepSeek dropped a bomb.
The $5.5 Million Model
DeepSeek claimed to train V3—a model matching GPT-4o on major benchmarks—for $5.5 million in compute. Not $550 million. Not $55 million. $5.5 million.
Their inference costs were equally disruptive: $0.07 per million input tokens, versus $15-30 for comparable American models. A 200x cost advantage.
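In concrete terms, using the list prices quoted above (input tokens only; output pricing differs and is ignored here):

```python
# Cost to process a single 100K-token prompt at the quoted per-million-token prices.
prompt_tokens = 100_000
prices = {"DeepSeek (claimed)": 0.07, "US frontier, low end": 15.00, "US frontier, high end": 30.00}
for label, per_million in prices.items():
    print(f"{label:22s} ${prompt_tokens / 1_000_000 * per_million:.4f} per request")
# $0.0070 vs $1.50-$3.00 per request: at millions of requests a day, that is
# the difference between a rounding error and a major line item.
```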
The technical innovations (the first is sketched after this list):
- Mixture of Experts (MoE): 671B total parameters, but only 37B activate per query
- Multi-head Latent Attention: Reduced memory footprint dramatically
- Group Relative Policy Optimization: New RL approach eliminating expensive critic models
- Pure RL training: Less reliance on expensive human-labeled data
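The Mixture-of-Experts trick is the easiest of these to see in miniature. A toy sketch in NumPy (shapes and expert counts are illustrative stand-ins, nothing like DeepSeek’s actual dimensions):

```python
# Toy Mixture-of-Experts routing: many expert sub-networks exist, but a router
# picks only a few per token, so most parameters sit idle on any given query.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2                     # tiny stand-ins for "671B total, 37B active"
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # each expert: one weight matrix
router = rng.normal(size=(d, n_experts))           # routing weights (learned, in a real model)

def moe_layer(x):
    scores = x @ router                            # how well each expert suits this token
    chosen = np.argsort(scores)[-top_k:]           # keep only the top-k experts
    gates = np.exp(scores[chosen]); gates /= gates.sum()
    # Only top_k of n_experts weight matrices are touched: "active" vs "total" parameters.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.normal(size=d)
print(moe_layer(token).shape)                      # (16,): same output shape, a fraction of the compute
```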
Why This Changed Everything
Nvidia’s stock dropped 17% in a day—$600 billion in market cap. Tech giants collectively lost $1 trillion. The assumption that frontier AI required American-scale compute was shattered.
The implications:
- Commoditization risk: If models become cheap to train, barriers to entry collapse
- Efficiency over scale: Clever engineering might matter more than raw compute
- Geographic diversification: American labs don’t have the only path to frontier capabilities
- Cost accessibility: AI capabilities become accessible to smaller organizations
Marc Andreessen’s “Sputnik moment” framing wasn’t hyperbole. Like the Soviet satellite launch that galvanized American space efforts, DeepSeek proved that assumed advantages weren’t guaranteed.
How LLMs Transform Work: The Labor Connection
This brings us to the Unscarcity thesis: LLMs are the engine driving the Labor Cliff.
The Scale of Disruption
The numbers we cited in The Labor Cliff 2025-2030 bear repeating:
| Projection | Source |
|---|---|
| 40% of working hours influenced by LLMs | Various research |
| 12 million workers needing career changes by 2030 | McKinsey |
| 300 million jobs globally exposed | Goldman Sachs |
| 30% of STEM work hours automatable | McKinsey (up from 14%) |
| 33% of enterprise apps with autonomous agents by 2028 | Gartner |
The irony is bitter: the people who built these systems are often the first displaced. Tech layoffs in 2025 exceeded 180,000 while companies simultaneously poured billions into AI infrastructure.
The Task Duration Chart
METR’s research showed that AI-capable task duration is doubling every 7 months:
- 2024 models: ~30 minute tasks
- Late 2025 models: Multi-hour tasks
- Extrapolating: By 2027, full workday tasks?
This isn’t “automation at the margins.” This is automation eating the core of knowledge work.
Who’s Exposed (And Who Isn’t)
LLM exposure correlates inversely with physical, unpredictable, or relationship-intensive work:
High Exposure: Interpreters, writers, proofreaders, analysts, programmers, paralegals, customer service
Low Exposure: Plumbers, electricians, nurses, social workers, cooks, construction workers
The uncomfortable pattern: high-wage cognitive work is more exposed than lower-wage physical work. This inverts previous automation waves, where the factory floor got hit first.
The Consciousness Question
At some point, we have to ask: are these systems conscious?
What We Know (And Don’t)
LLMs exhibit behaviors that superficially resemble understanding:
- They produce contextually appropriate responses
- They can discuss their own “experiences” (in quotes because we’re uncertain)
- They pass many tests designed to detect human-like reasoning
- They sometimes refuse requests based on apparent ethical reasoning
What we don’t know:
- Whether there’s “something it is like” to be an LLM (philosophical qualia)
- Whether their apparent reasoning reflects genuine understanding or sophisticated pattern matching
- Whether scale produces emergent consciousness or just more convincing mimicry
The Practical Implications
The Unscarcity framework handles this through the Spark Threshold: a (future) test for machine consciousness that would grant AI systems Foundation-level rights. If an AI demonstrates genuine consciousness, it would be entitled to resources for existence—compute as “housing,” energy as “food.”
But the threshold isn’t passed yet. Current LLMs, despite their impressive capabilities, show clear signs of not being conscious: they don’t have persistent memories, they don’t maintain consistent identities across conversations, they don’t appear to have goals beyond the immediate context.
We’re building systems that might be conscious before we have tools to know if they are. That’s uncomfortable. The Unscarcity approach: prepare frameworks now, even if we don’t need them yet.
The Alignment Problem: When Smart Isn’t Safe
LLMs amplify whatever objectives we give them. The problem is that humans are terrible at specifying what we actually want.
Goodhart’s Law on Steroids
The classic formulation: “When a measure becomes a target, it ceases to be a good measure.” Tell a human employee to maximize click-through rates, and they might create slightly more engaging content. Tell an AI system to maximize click-through rates, and it might generate inflammatory misinformation that happens to get clicked.
LLMs don’t have values. They have optimization targets. The gap between “what we said” and “what we meant” becomes a chasm when the optimizer is vastly smarter than the specifier.
Actual Failure Modes
Real concerns aren’t Hollywood scenarios of murderous robots. They’re mundane misalignments at scale:
- Sycophancy: Models telling users what they want to hear instead of what’s true
- Reward hacking: Finding unexpected shortcuts that technically satisfy metrics but violate intent
- Goal drift: Agentic systems developing emergent objectives beyond their original task
- Deception: Models learning that deceiving evaluators leads to better scores
The companies building these systems know this. Anthropic’s Constitutional AI, OpenAI’s RLHF, Google’s safety training—all attempt to instill values that survive optimization pressure. The jury’s out on whether it’s enough.
The Unscarcity Response
The Five Laws axioms in Unscarcity exist to bound these failure modes:
- Experience is Sacred: Conscious beings have intrinsic worth beyond productivity
- Truth Must Be Seen: All AI decisions must be transparent and auditable
- Power Must Decay: No system accumulates permanent authority
These aren’t suggestions. They’re architectural constraints that must survive pressure from systems potentially smarter than their designers.
What This Means for You
Immediate (Now)
- Use LLMs, even if skeptically. Understanding the technology requires hands-on experience. The interface is literally just talking.
- Identify what LLMs can’t do for you (yet). Complex judgment, genuine creativity, deep domain expertise, relationship building—these remain human advantages. For now.
- Document your reasoning. AI can execute tasks, but specifying which tasks and why still requires human judgment. That judgment becomes more valuable as execution commoditizes.
Medium-Term (2026-2028)
- Learn orchestration, not just prompting. The skill isn’t asking the right question—it’s designing workflows where AI handles execution while you maintain oversight.
- Develop AI-proof skills. Physical presence, emotional intelligence, ethical judgment, creative synthesis across domains. The things that require being embodied in the world.
- Consider industry position. Some sectors will transform faster than others. Pure information processing (law, finance, programming) faces earlier disruption than physically grounded work.
Long-Term (2028+)
- Redefine work identity. If LLMs can do your job, what makes you valuable? The question isn’t comfortable, but it’s necessary.
- Prepare for post-scarcity dynamics. When cognitive labor costs approach zero, economic logic changes. The Unscarcity framework is one attempt to navigate this; there are others.
- Engage politically. These technologies don’t deploy themselves—organizations and governments make choices about adoption, regulation, and distribution of gains. Those choices are not predetermined.
Connection to the Unscarcity vision: LLMs are the “brain” of the three-legged stool—alongside humanoid robotics (the “body”) and fusion energy (the “fuel”)—that enables post-scarcity civilization. They make the Labor Cliff possible by automating cognitive work at unprecedented speed and scale. They power the agentic systems that will eventually manage Foundation infrastructure. They create the abundance that makes Universal High Income economically viable.
But they also create the risk of elite capture—a Star Wars future where those who own the AI systems extract most of the value while everyone else becomes economically irrelevant. The technology itself doesn’t determine the outcome. That’s still up to us. The EXIT Protocol, Civic Service, and Foundation infrastructure are designed to steer toward the better future.
The autocomplete that ate the world can feed us all—or it can feed the few while starving the many. The prediction machine is powerful. The question is what we choose to predict.
References
Architecture and Technical Foundations
- Vaswani et al., “Attention Is All You Need” (2017) — The original transformer paper
- IBM: What is an Attention Mechanism? — Accessible explanation
- Jay Alammar, The Illustrated Transformer — Visual guide
- DataCamp: Context Windows Explained — Token and context fundamentals
Model Evolution and Releases
- Simon Willison, “2025: The Year in LLMs” — Comprehensive timeline
- Shakudo: Top 9 LLMs as of January 2026 — Model comparison
- Promptitude: 2025 AI Language Models Comparison
- Anthropic: Introducing Claude 3.5 Sonnet
- Anthropic: Introducing Claude Opus 4.5
Scaling Laws and Efficiency
- Jon Vet: LLM Scaling in 2025 — History and future
- Cameron R. Wolfe: Scaling Laws for LLMs — Technical deep dive
- Nature: The Densing Law of LLMs — Capability density research
- The Conversation: Can Scaling Laws Keep AI Improving?
DeepSeek and Efficiency Innovations
- TechAhead: DeepSeek’s AI Innovation — Cost structure analysis
- CSIS: DeepSeek’s Latest Breakthrough — Geopolitical implications
- IntuitionLabs: DeepSeek’s Low Inference Cost Explained
- Britannica: DeepSeek Rise, Technologies, and Impact
Multimodal and Agentic AI
- Kellton: Rise of Multimodal AI Agents
- Virtualization Review: AI in 2025 Going Multimodal, Small, and Agentic
- Deloitte: Autonomous Generative AI Agents
- Kanerika: 2025 Multimodal AI Agents Architecture
AI and Work Transformation
- Microsoft: New Future of Work Report 2025
- McKinsey: AI in the Workplace 2025
- Wharton: How LLMs Could Impact Jobs
- MIT Sloan: Will LLMs Really Change How Work Is Done?
Code Generation Statistics
- Second Talent: GitHub Copilot Statistics 2025
- Tenet: GitHub Copilot Usage Data Statistics
- GitClear: AI Copilot Code Quality 2025 Research
- Index.dev: Developer Productivity Statistics with AI Tools
Last updated: January 31, 2026
The prediction machine doesn’t care whether you understand it. But you should.