Photographers know this problem intimately, even if they've never framed it as a security issue.
You shoot on location. You strip EXIF before delivering. You know GPS coordinates are in there, you know how to remove them, you do it. Clean file. Job done.
Except the image itself still knows where it was taken.
The distinctive ironwork on that balcony. The particular species of palm that only grows in specific coastal regions. The condensation-fogged window with a street reflection in it. The menu board in the background with a restaurant name that, if you know the area, places the shot within two blocks. A good OSINT analyst with local knowledge doesn't need your GPS coordinates. They have the frame.
That's the attack surface Amnesia was built to address.
What Metadata Stripping Doesn't Touch
A pass with exiftool -all= image.jpg handles the structured fields: GPS coordinates, timestamps, camera serial numbers. It does nothing about the pixels.
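Why stripping is purely structural is easy to see at the byte level: a JPEG's Exif block lives in its own APP1 segment, and removing it means dropping that segment while copying every other byte through untouched. Here's an illustrative stdlib-only walk of the marker structure (a sketch to show the separation of metadata and pixel data, not a replacement for exiftool, which also handles XMP, IPTC, and non-JPEG formats):

```python
def strip_exif(data: bytes) -> bytes:
    """Remove APP1 (Exif) segments from a JPEG byte stream.

    Walks the JPEG marker structure; every segment other than APP1 is
    copied through verbatim, so the compressed pixel data is unchanged.
    """
    assert data[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    out = bytearray(b"\xff\xd8")
    i = 2
    while i < len(data):
        marker = data[i + 1]
        if marker == 0xD9:  # EOI: end of image
            out += data[i:i + 2]
            break
        # Segment length field counts itself plus the payload.
        seg_len = int.from_bytes(data[i + 2:i + 4], "big")
        if marker != 0xE1:  # drop only APP1 (Exif/XMP) segments
            out += data[i:i + 2 + seg_len]
        i += 2 + seg_len
        if marker == 0xDA:  # SOS: the rest is entropy-coded scan data
            out += data[i:]
            break
    return bytes(out)
```

The point of the sketch: the Exif segment and the scan data never overlap, so nothing in this operation can touch what the image depicts.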
The visual geolocation problem is older than AI — there are whole communities built around identifying photo locations from environmental cues. What changed is that doing it at scale now takes seconds per image instead of minutes, and the inference quality on regional identifiers has gotten sharp enough to catch things even experienced photographers miss.
Amnesia runs that same inference pipeline in reverse: instead of identifying where an image was taken, it surfaces the specific elements that could be used to identify it, and gives you the tooling to remediate before the asset goes anywhere.
How Vision-Language Models Actually Read an Image
This is the part worth understanding, because it's what makes the modern threat different from anything that came before.
VLMs — models like Gemini Vision, GPT-4V, LLaVA — don't process images the way earlier computer vision systems did. They don't run a landmark detector and match against a database of known buildings. They reason about the image the same way a knowledgeable human would, drawing on a training corpus that includes enormous amounts of geotagged photography, travel writing, architectural documentation, botanical data, and regional signage.
The model has seen enough images of Haussmann-style ironwork to know which arrondissements use it. Enough images of coastal vegetation to distinguish Malibu scrub from Florida palms. Enough streetwear photography to recognize that a particular boutique exists in exactly one city.
It's not pattern matching. It's the same contextual inference a seasoned photo editor would apply — except it runs in two seconds and scales to an entire site audit.
This is what makes the remediation problem hard: the localization signal isn't in a header field you can zero out. It's distributed across the entire frame, in elements that look like background detail to a human reviewer who isn't specifically looking for them.
How the Scan Works
The web interface takes a target URL and scans it for images and video. Each asset goes through high-inference analysis — landmarks, street signage, architectural details, regional foliage, reflections, boutique signatures — anything that creates a localization vector. The model flags what it found and why, and you decide what to remediate.
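The first stage of a scan like that, collecting asset URLs from a target page, can be sketched with the standard library alone. Everything below is an assumption about how such a crawler might be structured, not Amnesia's actual code; the comment at the end stands in for the VLM analysis stage:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs from <img>, <video>, and <source> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.assets: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "video", "source"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.assets.append(urljoin(self.base_url, src))


def collect_assets(html: str, base_url: str) -> list[str]:
    parser = AssetCollector(base_url)
    parser.feed(html)
    return parser.assets

# Hypothetical second stage: each collected asset would be fetched and
# handed to a vision-language model with a prompt asking for localization
# cues (landmarks, signage, foliage, reflections), returning flagged
# regions plus the reasoning for each flag.
```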
The "Wipe & Download" function applies heavy visual redaction to flagged regions and saves the cleaned version. It's not a crop or a low-quality resize. It's a targeted blur on the specific elements the scanner identified, leaving the rest of the image intact.
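A region-targeted redaction of that kind can be sketched as a repeated box blur confined to a flagged bounding box. This illustrates the principle only: pixels outside the region come out byte-identical, and a grayscale nested list stands in for a real image (an actual implementation would operate on RGB data via an imaging library):

```python
def redact_region(pixels, box, radius=1, passes=3):
    """Box-blur only the pixels inside `box`, leaving the rest intact.

    pixels: rectangular list of rows of grayscale ints (0-255).
    box: (x0, y0, x1, y1), exclusive upper bounds.
    Repeated passes approximate a heavier Gaussian-style blur.
    """
    x0, y0, x1, y1 = box
    h, w = len(pixels), len(pixels[0])
    for _ in range(passes):
        snapshot = [row[:] for row in pixels]  # read from the pre-pass state
        for y in range(y0, y1):
            for x in range(x0, x1):
                vals = [
                    snapshot[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))
                ]
                pixels[y][x] = sum(vals) // len(vals)
    return pixels
```

Only coordinates inside the box are ever written, which is the property that matters here: the redaction destroys the flagged identifier without degrading the rest of the frame.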