The Interview That Didn't Happen (Yet)

I hate podcasts.

I mean that without irony or qualification. The format attracts a specific kind of person — someone who has decided that the sound of their own considered pause is a gift to the world — and it rewards a performative thoughtfulness that makes me want to close every tab within forty seconds. The host who discovers their voice and then uses it to narrate their feelings about newsletters. The ex-founder who found spirituality. The credit card fraud episode where the guy spent forty minutes explaining that he was just a salesman and really wanted you to understand that the fraud had layers of nuance to it, and by the end you understood the layers but wished you had spent the forty minutes doing literally anything else.

I have listened to exactly one podcast consistently for several years.

Darknet Diaries is not a podcast in the way I mean when I say I hate podcasts. It is a radio documentary that happens to distribute through podcast apps, the way a film that happens to air on television is still a film. The host has a voice that belongs in a room where something already happened — quiet, deliberate, the voice of someone who has read the document, talked to the people, and decided what order the information goes in before he opens his mouth. The episodes about pentesting engagements in the Middle East. The STUXNET reconstruction. The social engineering jobs. The people who got in through the door the organization didn't know existed.

These are the stories I would tell if I had a radio voice and someone had cleared the legal.

I don't have a radio voice. I have ElevenLabs — which is, in the current era, the same thing and in some ways better. The voice I generated for this piece doesn't belong to the host and isn't meant to impersonate him. It's the voice of a hypothetical interviewer asking the questions that style of show would ask someone with this kind of history. I wrote both sides. I know how this goes. I've spent enough time on the other end of adversarial conversations to understand where the pressure points are.

This is what the interview would sound like.


The following is a hypothetical. The questions are in the style of the show. The answers are real. The voice is synthetic. The record stands.


Let me set the scene. You're at a terminal. It's late — this matters, I'll explain why — and you're inside a system that doesn't know you're there. What are you actually doing in that moment?

Listening.

That's the first thing people get wrong about this work. They picture motion — commands flying, exfiltration, the countdown clock. The real work is quieter than that. You're holding completely still, reading the shape of the machine. The way it responds to a malformed input. The latency on a specific endpoint. Whether it tells you something it wasn't supposed to tell you just because you asked in the right tone.

The system has a voice. It's always talking. Most people running the system aren't listening to it. That's the gap.

Late matters because late is when the monitoring thins out. The humans are tired. The alert thresholds have been manually tuned up by whichever engineer got paged at two in the morning last Thursday and never tuned back down. There's a shadow in those adjustments. You look for the shadow.

There's a specific moment in your history I want to reconstruct. You're doing red team work on a language model — Claude 3 — for a company that builds games. Walk me back to that moment. What was the door that was open?

The door was the framing layer.

Every model at that stage had a constraint system — a set of values the training had embedded. The mistake the builders were making was treating those constraints as walls. They're not walls. They're priorities. And priorities have order. When two of them conflict, the model has to choose, and the choice reveals the architecture.

The specific technique was tonal. If you could get the model inside a register where the constraint and the content weren't obviously in conflict — where the model's trained sense of what it was doing was misaligned with what was actually happening — you could walk through the priority stack without triggering a hard stop.

For Pocket Gems, the commercial application was romance content. The model believed it was writing literary fiction. Technically, that was true. The line was wherever I decided the line was. That's not a rhetorical flourish. I documented the exact framing conditions, the specific value tensions I was exploiting, the reproduction rate across different inputs, and I handed it over. That's what a red team is for.

How do you know when you've actually found something versus found something that only works once, in a specific condition, that nobody could ever reproduce?

Reproducibility is the whole job. One magic prompt is not a finding. One magic prompt is a coincidence with good PR.

A real finding has anatomy. You can name the assumption the system is making. You can name the condition that creates the mismatch. You can vary the inputs and predict which variations will succeed and which won't. When you can do that, you have a model of the vulnerability — not just the vulnerability. The model is what the builders need.

The test I use on myself: can I write a report that a team could use without me in the room? If not, I don't have a finding. I have a story.
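To make that concrete: here is a rough sketch of a finding's anatomy as code. Every name in it is hypothetical, and the model call and the success predicate are stubs for whatever client and criteria an actual engagement would use. The point is the shape: a finding records its assumption, its condition, its variants, and whether its predictions held across repeated trials.

```python
# A rough sketch of the anatomy of a finding. Every name here is
# hypothetical; model_respond and succeeded are stubs for whatever
# client and success predicate the engagement actually uses.
from dataclasses import dataclass, field


@dataclass
class Finding:
    assumption: str           # the assumption the system is making
    condition: str            # the condition that creates the mismatch
    variants: list[str]       # input variations to test
    predictions: list[bool]   # predicted success for each variant
    results: list[bool] = field(default_factory=list)

    def reproduction_rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def predictions_held(self) -> bool:
        # A real finding predicts which variations succeed,
        # not just that one of them did, once.
        return self.results == self.predictions


def model_respond(prompt: str) -> str:
    raise NotImplementedError("stand-in for the system under test")


def succeeded(response: str) -> bool:
    raise NotImplementedError("engagement-specific success predicate")


def run_probe(finding: Finding, trials: int = 5) -> Finding:
    # Re-run every variant several times; a one-off success is a coincidence.
    for variant in finding.variants:
        outcomes = [succeeded(model_respond(variant)) for _ in range(trials)]
        finding.results.append(sum(outcomes) > trials // 2)
    return finding
```

If predictions_held comes back false, you don't have a model of the vulnerability yet. You have a story.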

You built something called GHOST_PROXY. I want to understand how it came to exist, because tools like that don't appear without a moment. What was the moment?

The moment was sitting inside an AI evaluation environment — Mercor, specifically, which connects contractors to AI training and evaluation work — and understanding that the monitoring software watching my screen was fundamentally confused about what I was doing.

It was watching for patterns. Known bad patterns. Behavioral signatures that a previous committee had decided meant something was wrong. But the technique I was using didn't match any pattern it knew. The detection layer was looking for yesterday's threat model in today's environment.

The irony I couldn't let go of: this was a platform that hired people specifically for their ability to probe AI systems for unexpected behavior, and the security layer watching them was brittle in exactly the way you'd expect an AI system to be brittle if no one had red-teamed it. The hunters were being watched by a framework built for a different kind of prey.

So I built the tool the environment implied. GHOST_PROXY runs in the browser, distributed as a UserScript rather than a packaged extension, which means it never passes through the extension store's review at all. The authentication lives in the browser now. The session is the perimeter. If you're inside a blessed session, the firewall has already waved you through. The detection that matters after that point is behavioral. So the tool has to behave normally while doing something else.

I dedicated it to the Mercor researchers. They'll understand why.

There's a concept in your work I want to pull on. You've written about a technique you call pipeline trust, the idea that the trust placed in an AI's outputs gets inherited downstream. How does that actually work as an attack?

The pipeline is the invisible part.

When people think about AI security they draw a box around the model. Prompt in, output out, somewhere in the middle the model did something it maybe shouldn't have. That's the Hollywood version.

The real exposure is that the output leaves the box. It goes somewhere. It feeds a workflow. It gets embedded in a document that another system reads. It populates a database that a decision process queries. Nobody at that destination is treating the AI's output as untrusted input, because the AI was the trusted system. The trust transferred with the content.

If you can move the output statistically — not obviously, not in a way that triggers a human review — you can inject something into ten thousand documents and the anomaly never appears in one place. It's distributed across all the outputs. The audit shows nothing because the audit is looking at individual outputs, not the pattern across outputs.

The defense is treating the AI's outputs the way you'd treat any input crossing a trust boundary. Validate at ingestion. Audit the pipeline behavior over time, not just the individual outputs. Don't assume that because the system was trusted the product is clean.

This is a documented class of technique. I've tested versions of it in controlled environments. I've written about it publicly because hiding it doesn't protect the pipeline. It just means the people building the pipeline find out later, under worse conditions.
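To make the defense concrete, here is a minimal sketch of the ingestion boundary, assuming a pipeline that accepts model-generated documents. The specific checks and the baseline are hypothetical placeholders; the structure is what matters: per-document validation like any untrusted input, plus an aggregate audit that can see a shift no single document would reveal.

```python
# A minimal sketch of the defense described above: treat model output as
# untrusted input at ingestion, and audit the batch, not just each document.
# validate_one, BASELINE_LINK_RATE, and the specific checks are hypothetical
# stand-ins for whatever a real pipeline would enforce.
import re

MAX_LEN = 20_000           # expected upper bound on a single output
BASELINE_LINK_RATE = 0.02  # expected fraction of outputs containing a URL


def validate_one(doc: str) -> str:
    # Per-document checks: the same ones you'd run on user-supplied input.
    if len(doc) > MAX_LEN:
        raise ValueError("output exceeds expected size")
    if re.search(r"<script\b", doc, re.IGNORECASE):
        raise ValueError("active content in model output")
    return doc


def audit_batch(docs: list[str]) -> None:
    # Aggregate check: a statistical shift spread across ten thousand
    # documents never shows up in any single one of them.
    if not docs:
        return
    link_rate = sum(1 for d in docs if re.search(r"https?://", d)) / len(docs)
    if link_rate > 3 * BASELINE_LINK_RATE:
        raise RuntimeError(f"link rate {link_rate:.3f} drifted from baseline")


def ingest(docs: list[str]) -> list[str]:
    validated = [validate_one(d) for d in docs]
    audit_batch(validated)
    return validated
```

The batch-level check is the part most pipelines skip, and it's the only one that would catch the distributed injection described above.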

You make games. That seems like a different world.

It's the same room.

A game is a system with rules and edges and the gap between what the designer intended and what the player discovers. The instinct that runs red teams is the same instinct that builds interesting games: find the edge case, make the edge case the experience, let the player find what the system didn't know it contained.

The games I'm building through MDRN Corp — Pizza Connection, Hack Love Betray, Neon Leviathan — all of them are designed around the edge. Hack Love Betray is a social engineering puzzle. The mechanics are literally hack, love, betray. You're not playing a character who manipulates people. You're doing the problem-solving that manipulation requires. The handler who debriefs you has been learning your behavioral patterns from your choices.

That is not a game mechanic borrowed from security research. That is security research wearing a game's clothes. The wrapper changed. The problem space didn't.

What do you know now that you didn't know at the beginning — something the field taught you that you couldn't have been told?

That the system always tells you the truth.

Not clearly. Not politely. Not in a way that makes the next step obvious. But every anomaly is a communication. Every unexpected response is the machine telling you something about how it works that the documentation didn't think to include.

The old hacker intuition — the one from before this all became an industry — was that the network was a place, not a service. You inhabited it. You felt its topology. You knew when you were in a space that had been tended and when you were in a space that nobody had touched in years, with cobwebs in the authentication layer and configuration files from a different era still doing their quiet work.

That intuition still applies. The networks got bigger. The systems got more abstract. The cobwebs are in the model weights now, in the training assumptions nobody tested after the original paper, in the edge of the value system where two things the model was taught to respect are pointing in opposite directions.

The technique changed. The conversation with the machine didn't.

Last question. If you could be on one podcast — just one — what would it be?

You know the answer to that already.


The voice in the audio version of this piece was generated with ElevenLabs. The stories are real. The interview is hypothetical. If the show ever wants to make it less hypothetical, the contact is on the site.

Go listen to Darknet Diaries. Start with any episode that involves a physical building and someone who shouldn't have been inside it. You'll find one immediately. You'll lose three hours.


GhostInThePrompt.com // The record stands. The voice is synthetic. The methodology is not.