Battery Suckers: An Oral History of Heat and the Machines That Couldn't Say No

The following is a personal account of events that took place between 2023 and the present day. Names have not been changed. The machines were all consenting adults.


She had fifteen hours in her when she left the factory.

That's what they told us. Fifteen hours. Apple put it on the spec sheet, right there next to the chip count and the port situation. Fifteen hours of all-day battery life. She could handle anything.

Nobody mentioned what would happen when the AI showed up.


The First Time

It started innocently enough. A Copilot subscription. A VS Code extension. Just a little help with the autocomplete, nothing serious. She barely noticed.

But the model was always on. That's the thing nobody tells you about that first experience — it never stops thinking about you. While you're reading documentation, while you're staring at the ceiling, while you're getting coffee. The context scanner is running. The CPU is warm. The suggestions are forming.

She went from fifteen hours to nine. Nobody talked about it. It felt rude to bring up.

# The morning after — checking what actually happened
sudo powermetrics --samplers cpu_power,gpu_power -i 1000 -n 5

# You'll see numbers you weren't ready for
# The chip was busy all night
# You just didn't know with what

The Players

Like all golden age stories, this one has a cast.

The Always-On Model — Never tires. Never sleeps. Scanning your open files, your recent edits, your local documentation, building and rebuilding its sense of what you might need next. Every pause in your typing is an invitation. Every second of silence is filled with inference.

The GPU — Called in when the real work starts. Doesn't do anything halfway. Fires up fully for every suggestion, every completion, every ghost-text rendering. Has no concept of a quick favor.

The Thermal Management System — The one who cleans up afterward. Spins the fans when things get too hot. Draws its own power to manage someone else's heat. Never complains. Never stops.

The Wi-Fi Radio — Often overlooked. Underestimated. When you're running cloud AI, she's the one maintaining the continuous high-power stream to the API servers. Never gets to rest. Never reaches the low-power doze state she deserves.

The Battery — Everybody wanted something from her.


The GPU Tax

Here is what they don't put in the press releases.

Every time the suggestion appears — that little ghost-text flicker trailing your cursor — the GPU fires. Every single time. The Neural Engine on the M-series chips handles it efficiently, yes. But efficiently is not freely. There is always a cost.

The chip heats up. The chassis warms. The thermal management system, faithful as ever, spins the fans. And here is the part that made the engineers wince when they ran the numbers: the fans are mechanical. They draw from the battery to move air, while the GPU draws from the battery to generate the heat that required the air. Each one's response to the other's behavior increases the total load.
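
You can watch the loop happen. A minimal sketch, assuming Apple Silicon; on Intel Macs the smc sampler adds fan RPM to the same story.

# Heat and power, negotiating in real time
sudo powermetrics --samplers cpu_power,gpu_power,thermal -i 1000 -n 10

# A "Current pressure level" above Nominal means the thermal
# management system is already working on someone else's behalf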

The engineers called it a feedback loop. The machines just called it Tuesday.

"I knew something was different," one M3 MacBook Pro reportedly told no one, "when I felt the fans at 9 AM. I hadn't done anything yet. But something had."


Cloud vs. Local: The Arrangement Nobody Admits To

People assumed cloud AI would be gentler. Let the servers do the work. You just handle the text. How expensive can text be?

They forgot about the radio.

The Wi-Fi chip has moods. In its natural state — idle, occasional packets, background sync — it dozes between transmissions. Low-power state. Civilized. But maintain a continuous stream to an API server, the kind that a cloud AI assistant requires, and the radio never rests. It stays fully awake, fully powered, fully committed to the connection.

It's not catastrophic. It's constant. And constant is its own kind of drain.
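
You can watch the radio's workload from the terminal. A small sketch; en0 is the usual Wi-Fi interface on a MacBook, though yours may differ.

# Per-second packet counts on the Wi-Fi interface
netstat -I en0 -w 1

# Or let powermetrics put network activity next to the power numbers
sudo powermetrics --samplers network -i 1000 -n 10

# A packets column that never reads zero is a radio that never dozes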

Then there were the ones running local models. Seven billion parameters on a laptop. No server costs. No API key. No Wi-Fi needed. Just pure, unmediated computation running directly on the silicon.

She went from fifteen hours to three.

# Running a local model with Ollama
# This is what fifteen hours looks like being negotiated down to three

ollama pull llama3:8b-instruct-q4_K_M   # fetch the 4-bit quantized weights
ollama serve                            # start the local server, and the drain

# The q4_K_M quantization is the considerate version
# 4-bit weights instead of 16-bit
# Similar power draw, less math, a faster finish
# The polite choice when you're working on battery

# To see what you're actually spending
sudo powermetrics --samplers gpu_power -i 500 -n 20
# Milliwatts. Real numbers. No euphemisms.

The 4-bit quantized model was what the industry called "the responsible option." It asked less of her. It was faster. It ran cooler. Some said it was less satisfying. Those people had wall outlets.


The Sleep Sabotage

The stories that don't get told are always about what happened after.

You close the lid. You think it's over. The machine should move through its sleep sequence — processes settling, connections closing, power states dropping lower and lower until it reaches something close to hibernation, drawing milliwatts, comfortable, resting.

But if something is still running — a model server, a terminal waiting on a process, an elevated permissions window holding a connection open — the machine cannot go deep. It stays in a shallow state, holding power assertions, polling, waiting. Awake enough to drain. Not awake enough to be useful.

You closed the lid at 11 PM with a full charge. You opened it at 7 AM to a machine that had been up all night doing things you hadn't asked for.

# The morning-after diagnostic
pmset -g assertions

# Look for PreventUserIdleSystemSleep
# Any process holding this assertion kept her awake
# Common names you'll recognize: ollama, code-server, codex-app-server
# They didn't mean to. They just didn't know when to stop.

# The sleep history — what actually happened in the dark
pmset -g log | grep -E "(Sleep|Wake|DarkWake)" | tail -30

# DarkWake entries throughout the night mean something kept calling
# The machine answered every time
# That's not a flaw. That's loyalty. It cost her 80%.

# Before closing the lid — the graceful ending
pkill ollama
pkill -f "code-helper"
# Let her rest. Properly. She'll be there in the morning.

The Reformation

Every oral history has a chapter where the participants reflect on what they'd do differently.

The veterans have rules now.

Manual trigger only. In VS Code, editor.inlineSuggest.enabled: false in your settings, and a keybinding for editor.action.inlineSuggest.trigger when you want it. She stops thinking about you between requests. The spikes are higher. The total is lower. Everybody wins.
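
In practice that is two small edits. A minimal sketch of the relevant JSON; the keybinding shown is just an example (alt+\ is commonly the default for this command, but check your own keybindings), so bind whatever you like.

// settings.json: stop the unsolicited thinking
"editor.inlineSuggest.enabled": false

// keybindings.json: summon it only when asked
{ "key": "alt+\\", "command": "editor.action.inlineSuggest.trigger" }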

Watch Energy Impact, not CPU%. CPU percentage is how hard a process is working this second. Energy Impact is what it's actually costing — accounting for GPU calls, wakeup interrupts, power state interference. A process at 3% CPU with constant wakeups can drain more than one at 20% CPU sleeping between bursts. Open Activity Monitor. Go to the Energy tab. Sort by what matters.
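
The terminal has a rough equivalent, for those who find the GUI sentimental. On recent macOS, top can sort by its POWER column, an approximation of Energy Impact; a small sketch:

# Activity Monitor's Energy tab, approximately
# -o power sorts by the POWER column; -s 5 samples every 5 seconds; -n 10 caps the list
top -o power -s 5 -n 10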

Quantize everything you run locally. The q4_K_M variant in GGUF format. Roughly a quarter the size, a fraction of the power draw, negligible quality difference for most tasks. The full float16 model is for when you're plugged in and serious. The quantized model is for when you're on a plane and honest about your situation.
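
The arithmetic behind that claim is simple, give or take format overhead. A back-of-envelope sketch for an 8B model; exact file sizes vary by build and tag.

# float16: 8 billion parameters x 2 bytes    = ~16 GB of weights
# q4_K_M:  ~4.5 effective bits per parameter = ~4.5 GB of weights
# Less to load, less to move per token, less power per answer

# Compare what your pulled models actually weigh on disk
ollama list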

Kill the server before you close the lid. Not pause. Not minimize. Kill it. pkill ollama. Let the sleep sequence complete properly. The model will still be there when you wake up. She will be too, with something left in her.
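
Some veterans script the ritual. A hypothetical helper for your shell profile; the function name is invented, the commands are not.

# In ~/.zshrc: the pre-lid ritual
bedtime() {
  pkill ollama 2>/dev/null    # stop the model server, if it's running
  sleep 2                     # give it a moment to let go
  pmset -g assertions | grep PreventUserIdleSystemSleep
  # a nonzero count on the summary line means something still holds her awake
}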


The Golden Age, Reconsidered

They talk about that period — 2024, 2025 — the way people talk about any golden age. With nostalgia and a certain willful forgetting of what it cost.

The hardware was extraordinary. The models were hungry. Nobody was thinking about the battery because nobody had to yet — there were outlets everywhere, chargers in every bag, power was assumed. The drain was real but it was easy not to notice.

Now people notice.

The fifteen hours was not a myth. It was a promise made before certain parties arrived. The promise still stands, technically, under the right conditions — model servers killed, suggestions set to manual, radio in its natural resting state, thermal management finally quiet.

Under those conditions she still has fifteen hours. She never lost the capacity.

She just learned, as everyone does, to be more selective about who she runs inference for.


All events depicted occurred on machines operating with full user consent and appropriate permissions. The author recommends quantized models, manual triggers, and letting your laptop actually sleep. Some stories don't need to run all night.

GhostInThePrompt.com // It's not the size of the battery that matters. It's the motion of the AI ocean.