Deepfakes and AI: Understanding the Technology Behind Digital Deception

How deepfake technology actually works, the harm it’s already doing, and why the AI tools meant to catch synthetic media keep falling a step behind.

The Hong Kong call that ended my faith in video

In early 2024, a finance worker at Arup sits down for what looks like a routine video call. Arup is one of the biggest engineering firms going. On screen are the chief financial officer and a handful of colleagues, all nodding along in their little rectangles, all asking for a wire transfer. So the worker sends it. $25 million, gone to criminals. Every face on that call except the worker’s own was generated by AI, and Hong Kong police only confirmed the fraud after the money had already left the building.

I’ve covered tech for about fifteen years, and plenty of so-called existential threats have rolled through and quietly fizzled. This one I can’t shake. The people building generative models, the folks running voice-authentication systems at banks, the academics in forensics labs keep landing on roughly the same uncomfortable point: the tools for making a convincing fake get cheaper and easier every month, and our ability to catch them isn’t keeping pace. Not even close, from what I can tell.

So somewhere along the way I stopped trusting what I see on a screen by default. Maybe that reads as paranoid. Probably is, a little. But after months of poking at how this stuff actually functions under the hood, I’m honestly not sure paranoia isn’t the rational setting right now.

What a deepfake actually is, minus the mystique

The word smashes “deep learning” together with “fake,” and at the plainest level that’s the whole story. Synthetic media (a video, a chunk of audio, a still image) made or altered by a machine.

The mechanics get more interesting once you look closer. For years most deepfakes leaned on generative adversarial networks, GANs for short. Picture two neural networks locked in a fight. One churns out fake content. Its rival tries to flag that content as fake. They volley back and forth, and the generator keeps improving until its output fools the detector every time. Here’s the part that sits with me: if a network purpose-built to spot fakes throws up its hands, what odds does a tired human on a Thursday-afternoon call really have? Slim ones.
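
Here’s a minimal sketch of the two scores being fought over, for anyone who’d rather see the tug-of-war as arithmetic. It is an illustration, not code from any real deepfake tool: the function names are hypothetical, the probabilities are hand-picked stand-ins for a detector’s outputs, and there is no actual network or training loop. It assumes the standard cross-entropy form of the GAN losses.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy for the detector: it wants real samples scored near 1, fakes near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: it shrinks as the detector scores fakes closer to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hand-picked detector outputs (assumptions, not measurements).
# Early in training: the detector is confident and correct.
early = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])

# Theoretical endpoint: the detector answers 0.5 for everything it sees.
stalemate = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(f"detector loss early:     {early:.3f}")
print(f"detector loss stalemate: {stalemate:.3f}")
print(f"generator loss early:    {generator_loss([0.1, 0.05]):.3f}")
print(f"generator loss late:     {generator_loss([0.5, 0.5]):.3f}")
```

At the stalemate the detector’s loss sits at 2 ln 2, which is what you get from answering fifty-fifty on every sample. That is the more precise version of “fools the detector”: the rival network has been reduced to guessing.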

GANs stopped being the only game a while ago, mind you. Diffusion models, the same family behind Stable Diffusion from Stability AI and OpenAI’s DALL-E 3, have wandered into deepfake work. They learn by adding noise to training data and then reversing the process, conjuring fresh content out of what is basically static. The results are realistic in a way that’s hard to sit with. Then you’ve got variational autoencoders, or VAEs, which squeeze a face down into a mathematical shorthand and decode it onto somebody else’s head. Real-time face-swapping, no supercomputer required.
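
The “adding noise” half is simple enough to sketch in a few lines. What follows is an illustration only, assuming a linear noise schedule with commonly used textbook values (1,000 steps, noise rates running from 0.0001 to 0.02). The function names are hypothetical, the “image” is a flat list of eight numbers, and the learned network that reverses the process, which is the hard part, is left out entirely.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal's variance that survives after t noising steps."""
    product = 1.0
    for step in range(t):
        # Linear schedule (an assumption): the noise rate grows a little each step.
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        product *= 1.0 - beta
    return product

def add_noise(x0, t, rng):
    """Jump straight to step t: sqrt(abar) * x0 + sqrt(1 - abar) * gaussian noise."""
    abar = alpha_bar(t)
    return [
        math.sqrt(abar) * value + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)
        for value in x0
    ]

rng = random.Random(0)           # fixed seed so the sketch is repeatable
signal = [1.0] * 8               # stand-in for pixel values
slightly_noisy = add_noise(signal, t=10, rng=rng)
mostly_static = add_noise(signal, t=1000, rng=rng)

print(f"signal kept after 10 steps:   {alpha_bar(10):.4f}")
print(f"signal kept after 1000 steps: {alpha_bar(1000):.6f}")
```

By the last step almost nothing of the original survives, which is the “basically static” starting point. A trained model generates content by walking that path backwards, one denoising step at a time.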

Why dwell on the plumbing? Because five years back, a convincing deepfake meant pricey hardware, real skill, and days of render time. That’s all collapsed. Open-source kits like DeepFaceLab and FaceFusion happily run on an ordinary gaming GPU. A teenager with a decent PC and a couple of YouTube tutorials can knock out a face-swap clip in under an hour. The jump from hard-to-do to almost-trivial is the whole reason this keeps me up.

The voice problem is worse than the video one

I think the audio side is scarier, and I doubt I’m alone in that.

Outfits like ElevenLabs, Resemble AI, and Descript’s Overdub can clone a voice from as little as three seconds of sample audio. Three seconds. That’s shorter than the throwaway voicemail greeting you’ve recorded a dozen times. Feed the model that scrap and it’ll generate your voice saying whatever the operator wants, complete with the breathing, the little emotional shifts, the pauses you’d swear were real.

Pindrop, the voice-authentication company run by Vijay Balasubramaniyan, handles this exact problem for major banks, and the picture from there is grim. Voice deepfakes are turning up in social-engineering attacks on bank call centers, and the rate’s been doubling roughly every six months. Human agents can no longer reliably tell a cloned voice from a real one; only the AI detection layer catches them, and even that needs near-constant retraining. Across north of five billion calls a year, the company finds synthetic voice in about one out of every 2,000 calls to financial institutions. Sounds like a rounding error until you do the multiplication.
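
The multiplication, spelled out. This assumes the one-in-2,000 rate applies across the whole five-billion-call figure, which the numbers above don’t strictly guarantee, and it treats “doubling every six months” as holding steady for a full year. The variable names are illustrative.

```python
calls_per_year = 5_000_000_000   # "north of five billion", so treat this as a floor
one_in = 2000                    # roughly one synthetic call per 2,000

synthetic_per_year = calls_per_year // one_in
synthetic_per_day = synthetic_per_year / 365

# Doubling every six months means two doublings per year.
doublings_per_year = 12 // 6
growth_after_one_year = 2 ** doublings_per_year

print(f"synthetic calls per year: {synthetic_per_year:,}")
print(f"synthetic calls per day:  {synthetic_per_day:,.0f}")
print(f"growth factor after a year of doubling: {growth_after_one_year}x")
```

Under those assumptions that’s 2.5 million synthetic calls a year, a little under 7,000 a day, with the rate quadrupling over the following twelve months if the doubling holds.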

Banks aren’t the only target. In January 2024, robocalls went out to New Hampshire voters carrying a cloned version of President Biden’s voice, telling them to skip the primary. Investigators traced the audio to a political consultant who’d run it through ElevenLabs. His whole production cost roughly a dollar. One dollar. The FCC later ruled that AI-generated voices in robocalls count as “artificial” under the Telephone Consumer Protection Act, which makes them illegal without consent. Enforcing that across a million phone lines is, of course, an entirely separate headache.

Elections are where this gets ugly fast

Slovakia’s September 2023 parliamentary vote is the case study I keep returning to. About 48 hours before polls opened, a deepfake audio clip landed online, supposedly catching the liberal candidate Michal Simecka discussing how to rig the election and, of all the absurd details, raising the price of beer. Pure fabrication. But it tore through social media during a mandated quiet period, the stretch when candidates legally couldn’t respond or push back. Simecka’s party lost. Whether that clip swung the result is genuinely debatable; the timing was either very lucky or very deliberate.

Since Slovakia, synthetic political clips have surfaced in elections across Argentina and Bangladesh, in Indonesia and Pakistan, in the United States, and the shape repeats itself each time. A fake drops at the worst possible moment. It outruns every fact-checker. And even once it’s debunked, the doubt it planted just sits there. Law professors Bobby Chesney and Danielle Citron gave the effect a name: the “liar’s dividend.” Once everybody knows deepfakes exist, real footage can be dismissed as fake while fakes get sold as real. The technology doesn’t even need to be deployed in a given case to do damage. Knowing it’s out there is enough to corrode the trust.

Plenty of misinformation researchers are openly nervous about the 2026 US midterms, and I don’t blame them. The tech has improved enormously since 2024. Detection is stuck in an arms race it keeps losing. And the social platforms have hollowed out the trust-and-safety teams that might once have caught a synthetic clip before it went viral. Nobody seems to have a clean answer.

Not every use is sinister

Worth saying plainly, because the alarm can drown it out: a fair amount of this work is legitimate. Hollywood has done face replacement for years. They de-aged Robert De Niro in “The Irishman.” Peter Cushing came back as Grand Moff Tarkin for “Rogue One.” Paul Walker’s character got to finish “Furious 7” after the actor’s death. Handled with care, in plenty of cases.

The capability has since blown past anything Industrial Light & Magic was managing a few years ago. You can now generate whole synthetic performances that are nearly impossible to separate from a real actor on screen. That leap is precisely what pushed performers onto the picket line during the 2023 SAG-AFTRA strike. Their worry, and I’d say it was a fair one, was that a studio could scan an actor’s face and body in a single session, then spin up unlimited performances forever without paying them again. The resulting contract did add consent and compensation rules for digital replicas. Enforcement hasn’t really been stress-tested yet. And the protections only reach SAG-AFTRA members anyway. Voice actors in smaller markets, performers overseas, the independent creator grinding away alone? They’ve got nothing.

Music’s wrestling with the same knot. AI tracks aping major artists have gone viral on TikTok and Spotify. Back in April 2023, a song called “Heart on My Sleeve,” riding AI-cloned vocals of Drake and The Weeknd, racked up millions of streams before Universal Music Group got it taken down. The legal footing here strikes me as genuinely murky. Copyright protects a specific recording and a specific composition. Someone cloning the identity of your voice with AI? The law hasn’t squarely addressed that one.

The detectors are losing ground

Sometimes they catch a fake. Not reliably, though, and the gap’s widening.

The early fakes practically waved at you. Strange blinking. Light that fell wrong. Blurry seams where the face met the neck. Teeth that looked rendered on a Nintendo 64. Those tells have mostly evaporated. A modern deepfake handles micro-expressions, skin texture, the way hair moves, the way light wraps a cheekbone, all with an accuracy that, frankly, unsettles me.

Detection tools do exist, and some are genuinely clever. Microsoft’s Video Authenticator walks through a clip frame by frame, hunting for blending boundaries and grayscale inconsistencies a human eye can’t register. Intel built a thing called FakeCatcher that uses photoplethysmography, which is a long word for reading the faint color shifts in skin as blood pulses through the capillaries underneath; deepfakes don’t reproduce that signal cleanly. The Media Forensic Lab at the University at Buffalo studies spectral frequencies and compression artifacts. All of it works, to a point. Every one of these methods also wants the original media file, real processing time, and an operator who knows what they’re staring at.
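
A toy version of the photoplethysmography idea, to make it less abstract. This is not Intel’s method, whose internals aren’t described here. It is a sketch that assumes you already have the average green-channel value of a face region for every frame (the hard part in practice) and simply asks which frequency in a plausible heart-rate band carries the most energy. The function name, the band limits, and the synthetic 72-beats-per-minute signal are all assumptions for illustration.

```python
import math

def dominant_pulse_bpm(green_means, fps, low_hz=0.7, high_hz=4.0, step_hz=0.01):
    """Scan the heart-rate band and return the strongest frequency, in beats per minute."""
    mean = sum(green_means) / len(green_means)
    centered = [g - mean for g in green_means]   # remove the constant skin tone

    best_hz, best_power = low_hz, -1.0
    steps = int(round((high_hz - low_hz) / step_hz))
    for k in range(steps + 1):
        hz = low_hz + k * step_hz
        # Correlate the signal with a cosine and a sine at this frequency.
        re = sum(v * math.cos(2 * math.pi * hz * i / fps) for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * hz * i / fps) for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_hz, best_power = hz, power
    return best_hz * 60.0

# Synthetic stand-in for ten seconds of 30 fps video: a steady skin tone plus
# a faint 1.2 Hz (72 bpm) ripple of the kind blood flow would produce.
fps = 30
frames = [120.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]

bpm = dominant_pulse_bpm(frames, fps)
print(f"estimated pulse: {bpm:.1f} bpm")
```

A real detector would still have to track the face, cope with motion and lighting changes, and decide how clean a pulse has to be before the footage counts as live. None of that is attempted here.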

The deeper trouble is the lopsidedness of the whole arrangement. Making a fake takes minutes; verifying one can eat hours. And every time a detector learns a new tell, the generation side patches it on the next release. Hao Li, a computer science professor at UC Berkeley and one of the most cited deepfake researchers anywhere, has put the bind about as bluntly as anyone: detection is a losing game over the long haul, the field will always be playing catch-up, and the real fix has to be provenance, proving content is authentic at the moment it’s created rather than trying to prove it’s fake after the fact. That framing has stuck with me.

Li’s point gets at what a growing share of researchers see as the smarter long game. Stop chasing fakes after they ship. Instead, build a verifiable chain of custody for genuine content and prove a thing is real from the instant it’s captured.

The Coalition for Content Provenance and Authenticity, C2PA, is the main body pushing this. Adobe and Microsoft founded it alongside Intel and the BBC. They’ve built an open standard that bakes cryptographic metadata straight into a media file as it’s created: which device shot it, when, where, whether anyone has edited it since. All signed, all checkable.
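
A rough sketch of the signing idea, with one loud caveat: this is not the C2PA format. The real standard relies on certificate-based public-key signatures and its own manifest structure. The stand-in below uses a shared-secret HMAC from Python’s standard library purely because it runs anywhere. The key, the field names, and both functions are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical device secret. A real camera would hold a private key
# tied to a certificate, not a shared secret like this.
DEVICE_KEY = b"hypothetical-camera-secret"

def sign_capture(media_bytes, metadata, key=DEVICE_KEY):
    """Bind capture metadata to the exact bytes of the file, then sign the bundle."""
    manifest = dict(metadata, content_sha256=hashlib.sha256(media_bytes).hexdigest())
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": signature}

def verify_capture(media_bytes, record, key=DEVICE_KEY):
    """True only if neither the manifest nor the media has changed since signing."""
    payload = json.dumps(record["manifest"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, record["signature"])
    content_ok = (
        hashlib.sha256(media_bytes).hexdigest() == record["manifest"]["content_sha256"]
    )
    return signature_ok and content_ok

record = sign_capture(
    b"raw pixels",
    {"device": "example-cam", "captured_at": "2024-02-01T09:00:00Z"},
)

print(verify_capture(b"raw pixels", record))   # untouched file
print(verify_capture(b"raw pixelz", record))   # one byte changed
```

The design point is that the signature covers a hash of the content itself, so editing the pixels or editing the claimed device and timestamp both break verification. What a sketch like this can’t show is the trust chain: who issued the key, and why a viewer should believe it.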

Camera makers have started coming along. Nikon, Sony, and Leica have shipped bodies with built-in C2PA signing. Adobe’s Content Credentials surfaces provenance data right inside Photoshop and the rest of Creative Cloud. Google announced in late 2025 that Android devices would support C2PA metadata natively starting with Android 16. Good steps, every one. But adoption is patchy, and the standard only does anything if the platforms actually read and display the provenance, which most social networks still haven’t committed to with any consistency.

Watermarking comes at the same goal from another direction. Google DeepMind’s SynthID stamps invisible marks into AI-generated images and audio; specialist tools can read them, while regular viewers notice nothing. Meta has folded similar watermarking into its own AI output. The snag is that a watermark can be stripped or worn away by a screenshot, by compression, by a format conversion. And watermarking only ever labels the synthetic stuff. It does precisely nothing to prove that human-made content is the real thing.
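
To see why lossy processing is such a problem, here is a deliberately naive watermark that hides one bit in the lowest bit of each pixel value. This is not how SynthID works, and production watermarks are built to be far sturdier than this. The sketch only illustrates the failure mode, with hypothetical function names and a crude re-quantization step standing in for compression.

```python
def embed(pixels, bits):
    """Overwrite the least significant bit of each pixel with one watermark bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract(pixels):
    """Read the least significant bit back out of each pixel."""
    return [p & 1 for p in pixels]

def requantize(pixels, step=4):
    """Crude stand-in for lossy compression: snap each value down to a multiple of step."""
    return [(p // step) * step for p in pixels]

pixels = [200, 13, 77, 146, 91, 34, 255, 8]   # made-up 8-bit pixel values
mark = [1, 0, 1, 1, 0, 0, 1, 0]               # made-up watermark bits

marked = embed(pixels, mark)
recovered_clean = extract(marked)
recovered_after_loss = extract(requantize(marked))

print("largest pixel change:", max(abs(a - b) for a, b in zip(pixels, marked)))
print("readable before processing:", recovered_clean == mark)
print("readable after processing: ", recovered_after_loss == mark)
```

The mark is invisible, since no pixel moves by more than one level, and it is gone the moment the values are rounded. Sturdier schemes spread the signal across the whole image to survive that, which raises the bar without removing the underlying trade-off.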

Where the law stands

Honestly? A bit of a mess.

The US still has no comprehensive federal statute on deepfakes. The closest it has come is the TAKE IT DOWN Act, signed in May 2025, which makes it a federal crime to publish non-consensual intimate imagery, AI-generated included, and requires platforms to remove it on request. Other bills have been floated. One of them, the DEFIANCE Act, which appeared in January 2024, would hand victims of non-consensual deepfake pornography a federal civil cause of action. A separate AI Labeling Act would require disclosure whenever content is AI-made. As of early 2026, neither has become law. Some states moved quicker; Texas, California, Virginia, and a dozen or so others have laws covering specific deepfake uses, mostly revenge pornography and election interference.

Europe’s AI Act began landing in stages from 2024, labeling rules for AI-generated content among them. China went further with its Deep Synthesis Provisions, demanding consent from anyone whose likeness ends up in synthetic media and requiring a visible label on every deepfake. Fine on paper. Cross-border enforcement is basically a fantasy, though. A fake produced in a jurisdiction with zero rules reaches every corner of the internet in seconds.

Legal scholars tend to reach for phrases like “wildly inadequate” when describing where things sit. Danielle Citron, a professor at the University of Virginia School of Law and one of the sharpest legal minds working on this, has framed it roughly this way: the law always lags technology, but the gap here is unusually dangerous, because we’re dealing with something that can undermine elections, wreck reputations, and enable fraud at scale, and our legal machinery was designed for a world where seeing was believing.

Who actually gets hurt the most

This is the hardest part to write, and I think the political and financial coverage tends to crowd it out of view.

A 2023 study by Home Security Heroes found that 98% of deepfake videos online are pornographic, and 99% of the victims are women. Overwhelmingly the tech is being used to lift a woman’s face from her social-media photos and paste it onto explicit content she never consented to. The psychological damage, and I’ve read firsthand accounts I haven’t been able to forget, runs deep and lasts.

South Korea hit a breaking point in 2024 when it emerged that students, plenty of them minors, were generating and circulating deepfake sexual images of their own classmates and teachers through Telegram channels. More than 500 schools were caught up in it. Emergency legislation followed, but by then the scale had already shown itself. Comparable patterns have surfaced in schools across the US and the UK, and again in Australia and India. The tools have gotten so simple that children are producing this material, often without fully grasping what they’re doing or the harm they’re handing out.

For a victim, the aftermath is brutal in a way that’s easy to underestimate. Once a deepfake image is loose, scrubbing it off the internet is close to impossible. A search engine might delist the links, sure, but the files survive on servers, in private group chats, on platforms in countries that ignore takedown requests entirely. Anxiety, depression, pulling back from people, careers knocked sideways; the toll mirrors other forms of image-based abuse, except it’s sharpened by the knowledge that the means of making these images keeps getting more accessible, never less.

What a regular person can actually do

Fair question, and the answer’s a little frustrating, because there’s no single fix.

Start from skepticism. Not cynicism, just skepticism. If a clip surfaces right before an election, or seems precision-built to make you furious, or shows up with no clear source attached, sit on it before you share. Check whether established outlets are reporting the same claim. Hunt down the original if you can. I know that sounds basic. It is basic. Most people still skip it.

Then shrink your own attack surface. The more high-resolution footage of you that’s floating around publicly, the easier it is to assemble a convincing fake of you. I’m not telling you to nuke your whole online presence; that ship sailed for most of us years ago. But maybe think twice before posting crisp video with clean audio. Tighten the privacy settings. Limit who can download what. Small moves, and they do add up.

Back the provenance standards where you get the chance. If a platform or a device offers content-authenticity features, switch them on. When you’re trying to judge whether something’s real, look for C2PA credentials or other provenance markers. And lean on the platforms you use to actually implement and show that data, because user pressure is one of the few levers that reliably moves a tech company.

Push for better law, too. Contact your representatives. Support groups like the Cyber Civil Rights Initiative and the Electronic Frontier Foundation, which are working on policy that doesn’t overcorrect. The rules written over the next few years will probably govern this technology for a long stretch.

Is anyone building a defense that holds?

Some people are trying hard. Whether it holds is an open question.

The DARPA-funded Semantic Forensics program, SemaFor, pulls researchers from a spread of universities and private companies into building integrated detection systems, the kind that fuse facial dynamics with audio spectral analysis and linguistic patterns and metadata checks inside a single platform. The idea is to stop leaning on any one signal and move toward multi-modal authentication that’s much harder for a generator to beat all at once. It looks promising. Whether it’ll hold as the generation side keeps improving, nobody can say yet.

Startups are circling too. Reality Defender, a New York company founded in 2021, sells real-time deepfake detection for enterprise video calls, exactly the sort of thing that might have flagged the Arup scam before the twenty-five million walked out the door. Truepic, out of San Diego, does photo and video verification through cryptographic provenance, and insurance companies and humanitarian groups have picked it up. Hive AI sells content-moderation APIs with deepfake detection baked in, already wired into several social platforms.

But I keep coming back to a single shared caveat, the one nearly everyone working in the field repeats. Detection is necessary. It’s also nowhere near enough on its own. You’d need the technology and the regulation and the platform accountability, plus media literacy taught in schools, plus some cultural norm about sharing synthetic media responsibly. None of those pieces carries the weight alone. Stacked together, all of them just might. Maybe.

Where this is heading

Real-time deepfake generation, swapping a face and a voice live during a call rather than in a pre-baked clip, already works in lab conditions and in a few semi-commercial tools. As compute gets cheaper and the models get leaner, doing it during an ordinary video call slides toward trivial. That rewrites the whole threat model. Being wary of a clip making the rounds on social media is one thing. Wondering whether the person you’re actually talking to, live, this second, is who they appear and sound like? That’s a different kind of vertigo.

I’d bet this collides with augmented and virtual reality in ways none of us have fully thought through. Inside an immersive environment, the seam between real and synthetic blurs further still. And the generative models keep getting better, so the distance between authentic and fabricated media keeps shrinking, plausibly down to a point where even the best forensic tools can’t call it.

I don’t have a tidy ending, because there isn’t one to hand you. Deepfakes pick at something we’ve taken as a given for more than a century: that a photo records a real event, that a video captures an actual moment, that the voice on the phone belongs to the person you picture. We built journalism and courts and police work and elections on the quiet assumption that recorded media could stand as evidence. On my commute the other morning I caught myself half-listening to a radio interview and wondering, just for a second, whether the voice was even the person it claimed to be. A year ago that thought wouldn’t have crossed my mind. That small involuntary doubt, multiplied across a few billion people, might end up being the real story here, long before any single fake makes the news.
