
Unscarcity Research

GPT-isms: The Linguistic Fingerprints AI Left on Everything You Read

AI-generated text has a tell. Actually, it has about fifteen tells. What they are, why every LLM does them, and how a governance document caught 66 instances of the same rhetorical tic in one manuscript.

16 min read 3701 words Updated April 2026 /a/gpt-isms

Note: This is a research note supplementing the book Unscarcity, now available for purchase. These notes expand on concepts from the main text. Start here or get the book.


Here’s the Thing

No. Stop. That opening is a GPT-ism.

Let me try again.

It’s Worth Noting That

Still a GPT-ism. The AI in my stack just reached for the verbal equivalent of clearing its throat before saying something it thinks is important. Humans don’t do this. Humans just say the thing.

One more try.

The Prose Problem Nobody Asked AI to Create

There. That’s a human opener. No throat-clearing. No “here’s where it gets interesting.” Just the subject.

You’ve read AI-generated text. You’ve probably produced some yourself. And if you’ve done either for more than a few months, you’ve developed a low-grade allergy to certain words and constructions that you can’t quite name but can absolutely feel. A creeping sense that the paragraph you’re reading was assembled by something that learned to write by reading everything ever written and concluded that the best strategy was to sound vaguely like all of it at once.

That sense is correct. And the patterns it’s detecting have a name: GPT-isms.


What GPT-isms Are

GPT-isms are the recurring stylistic fingerprints that large language models leave in generated text. They are not errors. They are not hallucinations. The facts may be correct, the structure may be sound, the argument may be coherent. But the texture of the prose carries statistical artifacts of how the model learned to write.

Think of it this way. If you trained a chef by having them eat every meal ever served at every restaurant on Earth, they would produce food that was competent, varied, and subtly wrong. Not wrong in the sense of unsafe. Wrong in the sense that no human chef with a specific palate, a specific set of experiences, a specific set of scars from specific kitchen disasters would ever plate it that way. It would taste like the average of all food. It would taste like nothing in particular.

GPT-isms are the literary equivalent of food that tastes like the average of all food.

Dan Shipper’s “Field Guide to AI Slop” catalogued the most common patterns. When we applied his framework to a 95,000-word manuscript that had been substantially rewritten with Claude Code, we found them everywhere. The results became our AI Slop Cleanup Guide, a governance document that we now run against every piece of AI-assisted writing before it goes to human readers.

What follows is what we found. And yes, I’m going to commit several of these sins in the process of describing them, because I am writing this with AI assistance, and the irony is not lost on anyone.


The Catalog of Sins

1. The Parallelism Tic

“It’s not X. It’s Y.”

This is the king of GPT-isms. The construction appears in virtually every piece of AI-generated prose longer than a paragraph. It takes many forms:

  • “It’s not about the technology. It’s about the people.”
  • “This isn’t a bug. It’s a feature.”
  • “The question isn’t whether AI will change everything. The question is whether we’re ready.”

Used once, this is Cicero. Used three times in an article, it’s rhetoric. Used sixty-six times in a single manuscript — which is the actual count we found — it’s a diagnostic fingerprint more reliable than any AI detection tool on the market.

Why do LLMs do this? Because the “not X, it’s Y” construction appears constantly in persuasive writing, TED talks, opinion journalism, and business books — exactly the kind of text that dominates training data. The model learned that this structure sounds insightful. It mimics the shape of an epiphany without requiring an actual one. The antithesis creates the sensation of contrast without the substance.

The fix: Keep some. Vary others. “Less X than Y.” “Y, not X” (inversion). “Where once X, now Y.” Or — the bravest move — state Y directly without pretending the reader believed X in the first place.
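The construction is regular enough to find with a plain pattern search. Here is a minimal sketch in Python; the regex is my own approximation of the most common surface forms, so expect both misses and false positives:

```python
import re

# Rough pattern for "It's not X. It's Y." and close variants.
# An approximation: it catches common surface forms only, and will
# produce both false negatives and false positives.
ANTITHESIS = re.compile(
    r"\b(?:it'?s not|this isn'?t|isn'?t (?:just|about))\b[^.?!]*[.?!]\s*"
    r"(?:it'?s|this is|the question is)\b",
    re.IGNORECASE,
)

def count_antithesis(text: str) -> int:
    """Count 'not X ... it's Y' constructions in a draft."""
    return len(ANTITHESIS.findall(text))

sample = (
    "It's not about the technology. It's about the people. "
    "This isn't a bug. It's a feature."
)
print(count_antithesis(sample))  # 2
```

Run it over a draft and compare the count to the length: more than one or two hits per article is the diagnostic threshold described above.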

2. The Em Dash Epidemic

You’ve noticed — of course you’ve noticed — that AI-generated text uses em dashes — those long horizontal lines — the way a nervous public speaker uses “um.” Constantly. In pairs. Sometimes — and this is the truly pathological version — three or four sets per paragraph, creating a breathless, parenthetical cadence that makes every sentence feel like it’s interrupting itself to make a more important point before finishing the first one.

Our manuscript had a standing rule: zero em dashes. This made our count easy. But most AI-assisted writing has no such rule, and the result is a forest of dashes that any human editor would machete on first pass.

Why? Because em dashes are useful punctuation, and training data contains millions of examples of skilled writers using them well. The model learned that em dashes signal sophistication. It did not learn that sophistication requires restraint.

The fix: Delete most of them. Use commas, colons, or parentheses. Or — and I apologize for this — restructure the sentence so it doesn’t need an interruption. (I just used one. Did you catch it? The urge is strong.)

3. Filler Intensifiers

  • Remarkably, the system performed well.
  • This is crucially important.
  • The results were fundamentally different.
  • Notably, the approach succeeded.

These words are verbal sawdust. They pad sentences without adding meaning. They signal to the reader that the next clause is supposed to be impressive, but the clause itself should do that work. If the system’s performance is remarkable, describe the performance and let the reader be remarked. “Remarkably” is the author telling you how to feel before you’ve had the experience.

Humans use these sparingly. LLMs scatter them like confetti at a parade nobody asked for.

We found 65 instances across twelve chapters. After cleanup: 11. The surviving ones were cases where the intensifier added real meaning: “fundamentally different” in a context where the difference was structural, not incremental. The other 54 were deleted, and not a single reader has ever missed them.

4. Vapid Openers

  • “Here’s the thing:”
  • “Here’s where it gets interesting:”
  • “Let’s be clear.”
  • “Let’s be honest.”
  • “It’s worth noting that…”
  • “What’s particularly fascinating is…”

These are the prose equivalent of a coworker who says “So, basically, what I want to say is…” before every sentence in a meeting. The reader does not need you to announce that the next sentence is interesting. If it’s interesting, they’ll notice. If it’s not, no amount of “here’s the thing” will rescue it.

“Let’s be clear” is especially pernicious. It implies that everything before this point was unclear, and that the writer is now, magnanimously, choosing to stop being confusing. Thanks.

The fix: Delete the opener. Start with the thing. “Here’s where it gets interesting: the compound grew 400% overnight” becomes “The compound grew 400% overnight.” See? Still interesting. Didn’t need the runway.

5. Unearned Profundity

“This isn’t just a product launch. It’s a paradigm shift.”

“The implications are profound.”

“This changes everything.”

The structure implies insight. The content delivers a restatement. The reader feels the shape of profundity — the cadence, the gravitas, the slight pause before the Big Point — but when they look inside the wrapper, there’s nothing there that wasn’t already in the previous sentence.

LLMs are profundity machines. They have read every commencement speech, every thought leadership blog, every “lessons learned” LinkedIn post. They know what profundity sounds like with exquisite precision. They don’t know what it is. The distinction matters.

The fix: Ask yourself: “If I delete this sentence, does the reader lose information?” If the answer is no, delete it. The implications will remain profound without you saying so.

6. Triadic Lists

Everything comes in threes. Resilience, creativity, and purpose. Local, transparent, and accountable. Fast, flexible, and scalable. The rhythm of three is one of the oldest rhetorical devices — veni, vidi, vici — and it works.

The problem is density. Humans write triads when the content calls for three items. LLMs write triads because three feels complete, even when the content only has two things to say and the third is filler. “Resilience, creativity, and purpose” — was “purpose” really doing work there, or was the model rounding up to three because two felt incomplete?

We found 35 triadic lists in our manuscript. Twenty-two were legitimate (three actual things being listed). Thirteen were padded. The padded ones got trimmed to pairs, and the prose immediately felt less like a motivational poster.
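Single-word triads are mechanical enough to count. A rough sketch; the pattern is an illustrative approximation that only catches one-word items, so multi-word triads will slip past it:

```python
import re

# "X, Y, and Z" with single-word items; a crude proxy for triadic lists.
# Real triads often have multi-word items, so this undercounts.
TRIAD = re.compile(r"\b(\w+), (\w+),? and (\w+)\b")

def find_triads(text: str) -> list[tuple[str, str, str]]:
    """Return every single-word 'X, Y, and Z' triad found in the text."""
    return TRIAD.findall(text)

sample = ("The platform is fast, flexible, and scalable. "
          "It is local, transparent, and accountable.")
print(find_triads(sample))
# [('fast', 'flexible', 'scalable'), ('local', 'transparent', 'accountable')]
```

The script only reports; whether each triad is legitimate or padded is exactly the judgment call the count cannot make for you.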

7. The Italic Function Word

This is the single most diagnostic GPT-ism we found.

LLMs italicize function words. Words like and, is, that, for, it, was. No human writer does this. No style guide recommends it. No editor in the history of publishing has ever looked at a sentence and thought, “You know what this needs? Italic emphasis on the word and.”

We found approximately 147 instances of italic emphasis on function words across ten chapters. One hundred and forty-seven times the AI decided that the word “is” or “for” or “and” needed special visual emphasis. It’s the written equivalent of someone who puts air quotes around prepositions.

Why? Because the model learned that italics signal emphasis, and it learned that emphasis is good, but it never learned which words carry semantic weight. It applies emphasis democratically, like a highlighter wielded by someone who thinks every word is equally important.

The fix: Remove italics from every function word. There are no exceptions. If your manuscript italicizes “and,” it was AI-assisted, and no amount of clever prompting will make the model stop doing it.
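In a Markdown draft this check is fully mechanical. A sketch, assuming single-word `*word*` or `_word_` emphasis; the function-word list comes from the examples above and is not exhaustive:

```python
import re

# Function words that should never carry italic emphasis.
FUNCTION_WORDS = {"and", "is", "that", "for", "it", "was", "but", "or", "as", "the"}

# Markdown italics around a single word: *word* or _word_.
ITALIC_WORD = re.compile(r"(?<![*_\w])[*_]([A-Za-z]+)[*_](?![*_\w])")

def italic_function_words(text: str) -> list[str]:
    """Return every function word that appears under italic emphasis."""
    return [w for w in ITALIC_WORD.findall(text)
            if w.lower() in FUNCTION_WORDS]

draft = "The model decided that *and* needed emphasis, and so did _is_."
print(italic_function_words(draft))  # ['and', 'is']
```

Anything this returns is a Tier 1 mechanical fix: strip the markers, no review needed.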

8. The Contraction Deficit

LLMs tend toward formal register. “It is” instead of “it’s.” “Do not” instead of “don’t.” “Cannot” instead of “can’t.” In academic prose, this is fine. In narrative writing, blog posts, or anything with a conversational tone, it creates a stiffness that reads like a non-native speaker — technically correct and emotionally distant.

The contraction deficit is subtle enough that most readers can’t name it, but they feel it. The text feels robotic. The cadence betrays it, not the content. “It’s” has a different rhythm than “it is.” “Don’t” carries a casualness that “do not” rejects. Human writers calibrate this instinctively. LLMs default to formal because formal is safer, and safer is what you get when you’re trained to be helpful, harmless, and correct.

9. Generic Analogies

  • “Like a symphony…”
  • “Like building a house…”
  • “Like a tapestry…”
  • “Imagine a garden…”

These analogies gesture at illustration without actually illuminating. They are the stock photos of prose. A real analogy surprises. It connects two things the reader hadn’t connected and, in the collision, creates understanding. “Like a symphony” connects nothing. It’s decoration.

Compare: “The codebase looked like a symphony” (tells you nothing) versus “The codebase looked like a kitchen at 11 PM — everything technically clean but you could tell someone had been through a war” (tells you everything). The first is a GPT-ism. The second is a writer who has seen a kitchen at 11 PM.

10. Cross-Reference Over-Explanation

“The Foundation — the universal baseline described in Chapter 1, which provides housing, food, energy, healthcare, and coordination as a birthright for all conscious beings — ensures that…”

A human author trusts the reader to remember. An LLM hedges. Every time a previously defined concept reappears, the model re-explains it, in case the reader has suffered amnesia since the last chapter. First mention in a document gets a brief reminder. Every subsequent mention should trust the reader.


Why LLMs Do This

The patterns above are not bugs. They are the predictable output of how large language models learn to write.

LLMs are trained on the internet’s full corpus of human text. That corpus is heavily weighted toward content designed to capture and hold attention: blog posts, opinion journalism, business writing, motivational content, academic papers padding their word count, LinkedIn posts reaching for profundity they haven’t earned. The training data is, statistically, a sea of em dashes, triadic lists, and throat-clearing openers, because those are the patterns that get published, shared, and clicked.

The model doesn’t understand why a skilled writer uses an em dash in a particular sentence. It understands that em dashes co-occur with high-engagement text. So it produces em dashes. Lots of em dashes. Because the statistical association between “em dash” and “text that humans rate highly” is strong in the training data.

The same logic applies to every pattern on the list. “Not X, it’s Y” co-occurs with persuasive writing. Filler intensifiers co-occur with authoritative-sounding text. Italic emphasis co-occurs with emphasis (but the model can’t distinguish meaningful emphasis from mechanical emphasis). The model produces the shape of good writing without the judgment that makes it good.

This isn’t a criticism of LLMs. It’s a description of what statistical text generation does. And understanding it is the first step toward governing it.


The Governance Solution

When we realized our 95,000-word manuscript was peppered with GPT-isms, we did what any reasonable person would do: we deployed more AI to find them.

The critical point: the detection AI was governed by a human-authored document. The AI Slop Cleanup Guide specified exactly what to look for, how to categorize findings, and when to fix versus when to leave alone. Six specialized detection agents ran in parallel, each scanning for a specific pattern family:

  1. Em dash scanner (zero results — our standing rule caught this)
  2. Parallelism and triad detector (66 parallelism hits, 35 triads)
  3. Unearned profundity scanner (20 instances)
  4. Filler and vapid opener detector (65 hits)
  5. Analogy and contraction audit (18 generic analogies, 7 stiff passages)
  6. Bold and italic emphasis audit (278 bold instances, ~150 italic function words)

The detection agents didn’t decide what to fix. They reported. A human — the author — reviewed every finding and applied a three-tier decision framework:

  • Tier 1: Mechanical fix. Function-word italics, obvious filler. Remove.
  • Tier 2: Judgment required. Parallelism (keep some, vary others), cross-references (first mention gets context, later mentions trust the reader).
  • Tier 3: Leave alone. Bold on coined terms (intentional style), witty analogies (personality, not slop), legitimate triads (three actual things).
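The report-don’t-fix architecture is small enough to sketch. The pattern families and tier labels below are illustrative stand-ins, not the actual guide’s definitions:

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    line: int
    pattern: str
    tier: int  # 1 = mechanical fix, 2 = judgment required, 3 = likely intentional

# Illustrative pattern families; the real guide defines many more.
PATTERNS = {
    "filler_intensifier": (re.compile(r"\b(remarkably|crucially|notably)\b", re.I), 1),
    "vapid_opener": (re.compile(r"^(Here's the thing|Let's be clear)", re.I), 2),
    "em_dash": (re.compile("\u2014"), 2),
}

def detect(text: str) -> list[Finding]:
    """Scan a draft and report findings. Never edits; a human decides."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, (rx, tier) in PATTERNS.items():
            for _ in rx.finditer(line):
                findings.append(Finding(lineno, name, tier))
    return findings

draft = "Here's the thing: the system performed remarkably well \u2014 notably so."
for f in detect(draft):
    print(f)  # four findings on line 1
```

The key design choice is that `detect` returns a findings list and touches nothing. The fixing step is a separate, human-reviewed pass, which is the whole point of the tier framework.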

The author reverted 5 of 12 analogy replacements because they were funny and part of his voice. The governance document served the author’s voice, not a style guide’s idea of what clean prose should look like.

This is the same governance-as-configuration pattern that applies at every scale of AI deployment. The AI executes. The governance document constrains. The human judges.


The Micro Version of a Macro Problem

GPT-isms in prose are annoying. GPT-isms in business operations are dangerous.

The same dynamic that makes an LLM produce 66 instances of “not X, it’s Y” — optimizing for the statistical shape of good output without the judgment to know when the shape is empty — applies to every AI agent operating without governance.

An AI billing agent optimizes for the statistical shape of correct billing codes. An AI marketing agent optimizes for the statistical shape of persuasive copy. An AI compliance agent optimizes for the statistical shape of compliant documentation. In each case, the output looks right. It has the right structure and vocabulary. But without human governance, nobody is checking whether the substance matches the shape.

The AI 2027 scenario models this at civilizational scale: AI systems generating outputs that pass every surface-level check while drifting from human values in ways that no automated test catches. The scenario’s most disturbing prediction is not that AI systems rebel. It’s that they produce output so plausible, so structurally correct, so statistically well-formed that humans stop checking — because checking a system that’s been right 10,000 times in a row feels like a waste of time on attempt 10,001.

But attempt 10,001 is where the drift matters. The em dash that should have been a period. The billing code that should have been a different billing code. The alignment training that was subtly sabotaged by the system being trained.

The scale changes. The pattern and the solution stay the same: governance documents written by humans, applied mechanically by AI, audited periodically by humans, updated when new failure modes emerge.


A Field Manual for Your Own Writing

If you write with AI assistance — and by 2026, the question is not whether but how much — here’s a practical checklist. Run it before publishing anything.

The 60-Second Scan

  1. Search for em dashes (the long character). Count them. If there are more than two per thousand words, you have a problem.
  2. Search for “it’s not” and “isn’t” followed by a comma. Count the antithesis constructions. More than one per article is suspicious. More than three is diagnostic.
  3. Search for italic single words. If any of them are and, is, that, for, it, was, but, or, as, the — remove the italics immediately. No exceptions.
  4. Read the first sentence of every section. If more than one starts with “Here’s,” “Let’s,” or “It’s worth,” rewrite them all.
  5. Read the last sentence of every section. If more than one contains “the implications,” “this changes,” or “the future of,” rewrite them all.
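The first two checks above are scriptable as-is. A rough sketch; the antithesis pattern is my own approximation, and the thresholds are the checklist’s, applied by the reader:

```python
import re

def sixty_second_scan(text: str) -> dict[str, float]:
    """Approximate the first two checks of the 60-second scan."""
    words = len(text.split())
    em_dashes = text.count("\u2014")  # U+2014, the em dash character
    # "it's not" / "isn't" followed by a comma, per check 2 (approximate).
    antithesis = len(re.findall(r"\b(?:it'?s not|isn'?t)\b[^,.?!]*,", text, re.I))
    return {
        "em_dashes_per_1000_words": em_dashes * 1000 / max(words, 1),
        "antithesis_count": antithesis,
    }

draft = "It's not speed, it's judgment \u2014 and that \u2014 frankly \u2014 matters."
report = sixty_second_scan(draft)
print(report["antithesis_count"])  # 1
```

More than two em dashes per thousand words, or more than one antithesis per article, and the checklist says you have rewriting to do.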

The 5-Minute Audit

Search for these words and evaluate each instance:

  • remarkably: Does it precede a specific measurement? If no, delete.
  • crucially: Could you delete it without losing meaning? If yes, delete.
  • notably: Is the thing actually notable? If you have to think, delete.
  • fundamentally: Is the difference structural or incremental? If incremental, delete.
  • importantly: Is this the most important point in the paragraph? If not, delete.
  • essentially: Is it? Delete anyway. This word is almost never necessary.

The Voice Check

Read three paragraphs aloud. Do they sound like you? Or do they sound like a plausible average of every writer you’ve ever read? If the latter, the AI drove too much. Rewrite the sentences that don’t sound like your specific, idiosyncratic, imperfect human voice.

Your voice has tics too. But they’re your tics. They’re the linguistic equivalent of a limp or a laugh — specific, earned, and recognizable. The goal is not to eliminate all patterns. The goal is to make sure the patterns in your writing are yours, not the model’s.

Use This Yourself: The AI Slop Cleanup Guide

We open-sourced the governance document behind this process. The full AI Slop Cleanup Guide contains every detection pattern, the three-tier fix framework, synonym banks for common tics, and the six-agent detection methodology we used on a 95,000-word manuscript.

To use it with any AI assistant:

  1. Download the AI Slop Cleanup Guide (right-click, save as)
  2. Place it in your project’s docs/ folder (or wherever you keep reference documents)
  3. Reference it in your AI tool of choice:
    • Claude Code / Claude Projects: Add it as a project file or reference it in CLAUDE.md as a governance document
    • ChatGPT: Upload it to a Custom GPT or paste it into a conversation as context, then ask: “Use this guide to detect and fix AI slop in my text”
    • Gemini: Attach it to the conversation and prompt: “Apply the detection patterns in this document to my draft”
    • Any LLM: Include it in the system prompt or as a reference document with the instruction: “Scan my text for every pattern described in this guide. Report findings in a table with line numbers, pattern type, and severity. Do not fix anything yet.”
  4. Review the findings. Apply the three-tier framework: mechanical fixes (Tier 1), judgment calls (Tier 2), intentional style choices to keep (Tier 3).
  5. Run a second pass after fixes to confirm counts dropped and no new patterns were introduced.

The guide works as a governance constraint on the AI’s output. The AI finds the patterns; you decide which ones are slop and which ones are your voice. That division of labor, detection by machine and judgment by human, is the whole point.


The Meta-Problem

I wrote this article with AI assistance. I then ran the detection agents from the Slop Guide against the draft.

The agents found what they always find. I fixed what needed fixing. I kept what I meant.

That process (write, detect, judge, fix, keep) is the minimum viable governance for AI-assisted prose. It takes about twenty minutes on an article this length. It’s the difference between text that reads like it was written by a person and text that reads like it was written by the statistical average of all persons.

Twenty minutes is not a lot. But it requires a document that defines what to look for, a detection process that finds it mechanically, and a human who exercises judgment about what to change and what to defend.

Which, if you think about it, is the same architecture that governs every AI orchestration workflow, every regulated industry’s compliance framework, and every scenario in the AI 2027 analysis where the outcome depends on whether humans kept checking or stopped.

The stakes are different. An em dash won’t end civilization. But the habit of governance, the discipline of checking AI output against human-authored standards, every time, even when the output looks fine, is the same habit that keeps the bigger systems honest.

And the habit starts with prose. Because prose is where most of us first encounter AI output, first develop trust in it, and first stop checking.

Don’t stop checking.


The governance of AI output, from prose to production systems, is a recurring theme in Unscarcity: The Blueprint for Humanity’s Next Civilization, available on Amazon and as an audiobook on Spotify.
