Efficient deepfake detection on consumer hardware

Generative models for image and audio synthesis have improved considerably over the past few years. Modern GAN-based and diffusion-based approaches produce face swaps and voice clones that pass casual inspection and evade detectors trained on the artifacts that older generators left behind.

Detection has not kept pace at the deployment level. Most published detectors are large CNN or transformer architectures trained on curated benchmarks. Running them at inference time requires a GPU or a cloud endpoint, which restricts use outside a research setting.

In this article, we address the computational side of the problem. We propose a detection pipeline that combines lightweight neural networks with inference-time optimizations. On standard deepfake benchmarks, it reaches accuracy competitive with heavier baselines while running within the compute budget of a consumer device.
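
This summary does not specify which inference-time optimizations are used. One common option, shown here purely as an illustration and not as the pipeline's actual method, is post-training weight quantization, which stores weights as 8-bit integers plus a single scale factor. The sketch below uses plain Python; the function names (quantize_int8, dequantize) and the weight values are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of float weights to the int8 range.

    Illustrative assumption only: not taken from the article.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero (or empty) tensor needs no scaling; 1.0 avoids dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


# Made-up weights standing in for one layer of a small detector.
weights = [0.82, -1.27, 0.003, 0.5, -0.41]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest step bounds the per-weight error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight then occupies one byte instead of four, at the cost of a rounding error of at most half a quantization step per weight. Whether that trade-off preserves detection accuracy has to be measured on the benchmark in question.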

The detection problem is also non-stationary: every generation method that gains adoption shifts the distribution detectors are trained against. Practical detection therefore depends as much on deployability as on benchmark accuracy.

Full article: doi.org/10.1038/s41598-024-82223-y