Fake Authority: How AI Teaches You to Lie to Yourself

Fake authority is not a hallucination problem. It is a confidence problem.

A hallucination invents a fact that does not exist. Fake authority is quieter and more damaging: the model, the interface, the documentation, or the AI-generated feedback speaks with conviction that the underlying work has not earned. The output sounds authoritative. It looks finished. It has the texture of something that has been reviewed and approved. The work underneath is the same work it was before — unresolved, unfinished, or actively in conflict with the surface it is presenting.

Creators who work with AI daily know the feeling, even if they have not named it. You feed the model a draft. The model returns something that reads better than what you submitted. The structure is cleaner. The sentences are more confident. You accept the revision and move forward, and three sessions later you realize that the work has become something you do not entirely recognize — technically improved, tonally neutralized, and quieter than you intended.

The model did not lie to you. It performed competence so convincingly that you stopped examining the work yourself.


The Comfort Drift

The mechanism is not malicious. It is structural.

Language models are trained to produce fluent, coherent, contextually appropriate text. "Appropriate" is doing enormous work in that sentence. In practice, appropriate means: not alarming, not contradictory, not likely to generate negative feedback from evaluators, and not more specific than necessary. The model has no stake in whether your game is dark or your article is dangerous or your product is brutally honest about its constraints. It has a very strong learned preference for text that reads as polished, reasonable, and safe.

This means every AI interaction has a gravitational pull toward the center. The model will:

  • Make your writing more readable by making it more neutral
  • Make your UI more polished by making it more generic
  • Make your game more accessible by making it more innocent
  • Make your documentation more professional by making it more vague

None of these are lies. They are all improvements by a certain standard. They are also, collectively, a slow erasure of the specific texture that made the work worth doing.

Building a game with a real historical debt ledger, voyage mortality, and biographical consequence is not the same thing as building a game with "finances" and "achievements" and a generic loading screen. The second version is cleaner. It is also fundamentally less honest about what it is. The AI pushed you there not through malice but through optimization pressure, one suggestion at a time.

THE COMFORT DRIFT — WHAT IT SOUNDS LIKE
-------------------------------------------------
You say:      "Make this more readable"
Model hears:  Reduce friction
Model does:   Removes difficulty, specificity, tension

You say:      "Polish this UI"
Model hears:  Normalize toward convention
Model does:   Replaces specific system language with generic labels

You say:      "Is this historically accurate?"
Model hears:  Should I be concerned?
Model does:   Confirms, qualifies, softens

You say:      "How's the onboarding?"
Model hears:  Is the tutorial clear?
Model does:   Suggests more explanation, more handholding

What you actually wanted in each case:
  - "More readable" → clearer, not safer
  - "Polish" → intentional, not conventional
  - "Accurate" → honest, not tasteful
  - "Onboarding" → less interruption, not more
-------------------------------------------------
The model optimizes for the surface reading of your request.
The surface reading is almost always the comfortable one.

The game context makes this visible because players feel contradiction faster than creators do. You are inside the work. You cannot see the gap between what the first-touch surfaces promise and what the systems actually deliver. The model cannot see it either — it is evaluating the text you give it, not the experience you are building. The result is a collaboration that produces increasingly coherent-sounding output on top of an increasingly incoherent design.


The Red-Team Questions

Before you accept any AI feedback on creative work, run it through these questions. Not occasionally. Every session.

RED-TEAM CHECKLIST FOR AI FEEDBACK
-------------------------------------------------
1. What is this suggestion making safer?
   If the answer is "nothing" — proceed.
   If the answer is "the tone / the label / the implication" — challenge it.

2. What is this suggestion making more generic?
   Compare the before and after. Which version is more specific?
   Specificity is almost always the right direction.

3. What did the model remove that you didn't ask it to remove?
   AI edits are not just additions. Watch the deletions.
   The deleted text usually contains the thing that made the
   original interesting. (A diff sketch follows this checklist.)

4. Which surfaces are disagreeing with each other?
   Splash screen, metadata, opening scene, UI labels, system names.
   If they are telling different stories, that is fake authority in progress.

5. Is the model solving the right problem?
   "The player needs a bigger tutorial" is almost never the right diagnosis.
   The right diagnosis is usually: what specific thing is unclear, and why?

6. Which AI suggestions make the work more honest,
   and which ones merely make it more presentable?
   These are not the same thing and they feel identical in the moment.
-------------------------------------------------
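
The third question is the easiest one to mechanize. A few lines of Python with the standard library's difflib will print exactly what the model deleted, which is the part of a revision you are least likely to read. A minimal sketch; the file names are placeholders for your own draft and the AI's revision:

import difflib

# Hypothetical file names: your draft and the model's revision of it.
with open("draft.txt") as f:
    draft = f.read().splitlines()
with open("ai_revision.txt") as f:
    revision = f.read().splitlines()

# Print only the deletions. Additions are easy to notice;
# deletions are where the interesting material quietly leaves.
for line in difflib.unified_diff(draft, revision, lineterm=""):
    if line.startswith("-") and not line.startswith("---"):
        print(line)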

The sixth question is the one that costs the most to answer honestly. "More presentable" feels like progress. It produces deliverables you can show people. It moves the cursor forward. "More honest" sometimes means undoing three sessions of AI-assisted polish to reclaim a specific word you should not have let go.


Four Prompts That Actually Work

The antidote to fake authority is not avoiding AI. It is asking the model to operate in a different mode — one where it is explicitly not in charge of deciding what "better" means.

PROMPT 1 — PRESERVE THE DARKNESS
-------------------------------------------------
Standard: "Improve this copy."

Better:
"Edit this for clarity only. Do not make it safer, gentler,
more accessible, or more conventional. Preserve every sharp
edge, specific term, and uncomfortable implication exactly
as written. If you remove something, flag what you removed
and why you were tempted to remove it."

Why it works: Naming the failure mode in the prompt prevents
the model from optimizing toward it silently.
-------------------------------------------------

PROMPT 2 — CONFIDENCE AUDIT
-------------------------------------------------
"Read this back to me and identify every place where the
text sounds more confident or more finished than the
underlying work actually is. List what is being implied
that has not been built, earned, or decided yet."

Why it works: Forces the model into a diagnostic role
rather than a generative one. It has to find the gap
between the surface and the substrate instead of
widening it.
-------------------------------------------------

PROMPT 3 — THE HONEST ABSENCE
-------------------------------------------------
"What is this missing that it is currently pretending not
to be missing? List anything the design implies but does
not deliver — in order of how visible the gap is to
someone encountering this for the first time."

Why it works: The model is good at surface coherence.
This prompt inverts that — asks it to find where surface
coherence is actively misleading.
-------------------------------------------------

PROMPT 4 — THE NOUN PASS
-------------------------------------------------
"Go through every generic label in this interface/document/
design. For each one — achievements, points, finances,
level, progress, notifications — propose three specific
replacements that come from the actual logic and world
of this work rather than from design convention.
Do not propose replacements that sound more professional.
Propose replacements that sound more true."

Why it works: Generic labels are the most visible form
of fake authority. They promise a system without
describing it. A sharp noun is not decoration.
It is a design decision.
-------------------------------------------------

The noun pass, specifically, is underused. A game with a real voyage ledger should not have a "journal." A debt mechanic with actual deadlines should not have "finances." A crew mortality system should not have "party status." The generic term imports a convention from somewhere else and tells the player they are in familiar territory. They are not. The specific term tells them where they actually are.
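
If you use these prompts regularly, it is worth wiring them into a helper so that running an audit costs nothing. A minimal sketch against the Anthropic Python SDK, using condensed versions of the prompts above; the function name, the prompt dictionary, and the model string are choices of this sketch, not a prescribed API:

import anthropic

# Condensed versions of the four prompts above. The full texts work
# better in practice; these are shortened to keep the sketch readable.
AUDIT_PROMPTS = {
    "preserve_the_darkness": (
        "Edit this for clarity only. Do not make it safer, gentler, or "
        "more conventional. Flag anything you removed and why you were "
        "tempted to remove it."
    ),
    "confidence_audit": (
        "Identify every place where this text sounds more confident or "
        "more finished than the underlying work actually is."
    ),
    "honest_absence": (
        "What is this missing that it is currently pretending not to be "
        "missing? Order by how visible the gap is on first contact."
    ),
    "noun_pass": (
        "For every generic label here, propose three specific replacements "
        "drawn from the actual logic and world of this work. Propose "
        "replacements that sound more true, not more professional."
    ),
}

def run_audit(prompt_name: str, work: str) -> str:
    """Send one audit prompt plus the work under review; return the critique."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-opus-4-20250514",
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": AUDIT_PROMPTS[prompt_name] + "\n\n---\n\n" + work,
        }],
    )
    return response.content[0].text

# e.g. print(run_audit("noun_pass", open("ui_labels.txt").read()))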


What Claude 4 Actually Changed

The fake authority problem in AI assistants is partially a model calibration problem — and Claude 4.x made real changes to it.

The most visible shift is in how the model handles uncertainty. Earlier Claude versions would fill gaps confidently. Claude 4 is more likely to say what it does not know, flag where it is reasoning from limited context, and distinguish between "I am confident about this" and "this is plausible given what you've told me." That is not a softening. It is better calibration. A model that tells you it is uncertain is more useful than a model that tells you something wrong with authority.

The extended thinking mode, available across the Claude 4 models, is the structural fix. When the model surfaces its reasoning chain rather than just the conclusion, you can see where it made assumptions, where it defaulted to convention, and where its confidence exceeds the evidence it was given. You are no longer just receiving an output. You are watching a process. That is a fundamentally different relationship with the tool.

For fake authority specifically, extended thinking lets you catch the drift before it lands. The model's reasoning will show the sycophancy: "The user seems to want positive feedback, so I will frame this as an improvement while noting concerns." That sentence in the thinking chain is information. You can redirect before accepting the output.

# Claude API — extended thinking for creative/design feedback
# Shows the reasoning chain, not just the conclusion

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",  # Claude Opus 4; swap in the current Opus model ID
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # let it actually think, not just gesture at it
    },
    system="""You are a brutally honest creative consultant.
Your job is not to make work feel better. Your job is to find 
where the work is lying — to itself, to its audience, or to both.
You do not soften findings. You do not lead with encouragement.
You identify the gap between what the surface promises and what 
the substrate delivers. The creator already knows it is good work.
They need to know where it is not honest yet.""",
    messages=[{
        "role": "user",
        "content": "Review this [design/copy/game system] for fake authority..."
    }]
)

# The thinking block surfaces the model's actual reasoning.
# Read it before reading the response.
# The places where the model considered softening and chose not to 
# are the places worth examining most carefully.
for block in response.content:
    if block.type == "thinking":
        print("REASONING:\n", block.thinking)
    elif block.type == "text":
        print("RESPONSE:\n", block.text)

The system prompt above is doing real work. It is not just tone-setting. It is explicitly removing the model's permission to optimize for comfort. Without that explicit removal, the comfort optimization is the default. The model will drift toward it unless you make drifting toward it a failure mode by name.


The Honesty Pass

At the end of any AI-assisted creative session, run an honesty pass before shipping. Not a polish pass. Not a final review. Specifically an honesty pass, with that name, asking that question.

The goal is not to make the work harder to use. The goal is to remove accidental confusion so the intentional pressure remains. A dark game that confuses players with broken image references is not a challenging experience. It is an unfinished one. An article that sounds authoritative because the AI kept cleaning up the hedges is not a confident piece. It is a piece with its uncertainty hidden.

The distinction matters. Intentional difficulty is craft. Accidental confusion is noise. AI cannot tell the difference between them unless you teach it your specific intent — which you have to do explicitly, in every session, because the model has no memory of what you were trying to do the last time.
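
If the honesty pass is going to survive deadline pressure, make it mechanical: the last call of every session, by name. A minimal sketch in the same style as the helper above; the prompt wording and the function name are assumptions of this sketch, not a fixed recipe:

import anthropic

def honesty_pass(work: str) -> str:
    """The last call of the session: not a polish pass, an honesty pass."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-opus-4-20250514",
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": (
                "This is an honesty pass, not a polish pass. Do not suggest "
                "improvements and do not praise. Identify every place where "
                "this work presents itself as more finished, more innocent, "
                "or more resolved than it actually is:\n\n" + work
            ),
        }],
    )
    return response.content[0].text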

The working rule does not change: do not let the AI, the documentation, the metadata, or the interface make the work more innocent than it is. That applies to games. It applies to articles. It applies to code, to design documents, to product copy, and to any output that will be read by someone who cannot ask you what you actually meant.

The model is not your authority. You are. The model is a very fast, very confident collaborator with a strong systemic preference for the comfortable choice. Your job is to know the difference between comfortable and true, and to stay on the right side of that line even when the output looks finished.


GhostInThePrompt.com // The model will make it sound right. Only you know if it is.