
Meet the company working with the Air Force to detect deepfakes

DeepMedia plans to release a public deepfake detection product in late 2022 or early 2023.

DeepMedia makes money in two ways: by creating deepfakes and by detecting them.

Rijul Gupta, a former machine learning engineer, co-founded the Bay Area startup with Emma Brown in 2017; the idea grew out of his desire to communicate in Hindi with his extended family. The tool they built, DubSync, currently allows someone to appear to be speaking any one of 10 languages, using translation, vocal synthesis, dubbing, and facial animation.

Now, the company claims it’s working with some of the biggest names in streaming, though it declined to name specific clients or partnerships. According to Gupta, it’s currently on track to generate between $300,000 and $500,000 in revenue this year.

In the years since DeepMedia’s debut, the team, now a group of about 20, has also begun pursuing its other mission: detecting synthetic audio and video. And it’s doing so via partnerships, including one with the Air Force Research Laboratory, the Air Force’s primary scientific research and development arm and part of the Department of Defense. Announced in April, the grant involves developing deepfake detectors for faces, voices, and aerial imagery. One of the ways DeepMedia trains these detectors is by constantly creating datasets of advanced deepfakes, using the company’s own generation tools.

In late 2022 or early 2023, DeepMedia plans to release a public deepfake detection product: a web-based tool that lets individual consumers or enterprises (political candidates looking to prove a video’s authenticity, for example) pay to upload content and receive a report that says whether the content is falsified, identifies the algorithm originally used to create the deepfake, and explains how the company reached that conclusion.

As of now, Gupta said, the company’s deepfake detectors work at about 95% accuracy across “most deepfake modalities on synthetic faces, voices and aerial imagery.” He said the company won’t release them until they’re at 99% accuracy.

“The worst thing for us would be to release a tool, and it says something’s fake when it’s actually real, or it says something’s real, when it’s actually fake,” Gupta said, noting that “no machine learning algorithm is ever 100% accurate—those types of issues will always be present.”

Diving into detection

A range of deepfake detection tools already exists, so DeepMedia had competition for the DOD grant. To win, the team demonstrated that the deepfakes it had generated could be detected only by the company’s own detectors, not by any other existing tool. Gupta attributes the win, in part, to how quickly overall deepfake quality is advancing: existing deepfake datasets, like the Deepfake Detection Challenge dataset (DFDC), DeeperForensics, and FaceForensics, can become obsolete in just a few years.


“[Existing datasets] don’t represent the modern quality of deepfakes that we’re seeing, like deepfake Zelenskyy video, for example,” Gupta told us, referencing a viral synthetic video of Volodymyr Zelenskyy from March, in which the Ukrainian president appears to tell the country’s troops to surrender. “The algorithms used to create that deepfake are not in the deepfake datasets that are publicly available.”

Those datasets are also significantly lacking in data on Black and Latino individuals, and the data skews male, Gupta said, meaning that tools trained solely on them could tend to classify videos of minorities as authentic even when they’re synthetic.

DeepMedia uses those existing datasets as benchmarking tools for its model training.

The company’s AI detection process starts with a lot of pre-processing steps. Say you uploaded a video that seemed to feature President Biden and wanted to check its authenticity: First, the models would need to detect the president’s face in that video, analyze facial landmarks, pick out the voice and extract it from background noise, and more—all before the deepfake detection begins. Next, the content is run through a series of detectors, including a binary classifier of real versus fake for both video and audio. Finally, the content is examined by convolutional neural networks, which attempt to pick out which algorithm, or algorithms, were used to create a deepfake.
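To make that sequence concrete, here is a minimal Python sketch of a pipeline with the same shape. This is not DeepMedia’s code: the face-detection step uses OpenCV’s stock Haar cascade, and binary_real_vs_fake and attribute_generator are hypothetical stubs standing in for the company’s proprietary models.

```python
# A minimal sketch of a detection pipeline like the one described above.
# Only the OpenCV face-detection step is real; the classifier functions
# are hypothetical placeholders, not DeepMedia's models.
import cv2

def extract_faces(video_path: str) -> list:
    """Pre-processing: find and crop faces in each frame of the video."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    faces = []
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
            faces.append(frame[y:y + h, x:x + w])
    capture.release()
    return faces

def binary_real_vs_fake(face_crops) -> float:
    """Hypothetical stand-in for the binary real-vs-fake classifier."""
    return 0.0  # probability of being fake; a trained model would go here

def attribute_generator(face_crops) -> str:
    """Hypothetical stand-in for the CNNs that guess the source algorithm."""
    return "unknown"

def analyze(video_path: str) -> dict:
    faces = extract_faces(video_path)      # pre-processing
    p_fake = binary_real_vs_fake(faces)    # real-vs-fake classification
    source = attribute_generator(faces)    # generator attribution
    return {"p_fake": p_fake, "suspected_generator": source}
```

In a production system, each stub would be a trained model; the sketch only illustrates the order of operations the company describes.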

DeepMedia is also beginning to use vision-based Transformer models—the same model architecture Google uses to pull up relevant search results—to build detection networks. Transformer models can train 10 times faster than the convolutional neural networks that the company currently uses, Gupta said, and end up with similar accuracy metrics.
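For readers curious what a vision transformer looks like in code, here is a toy PyTorch sketch assembled from standard library components. The patch size, embedding width, and two-class head are illustrative assumptions, not DeepMedia’s configuration.

```python
# A toy vision-transformer classifier, sketched with standard PyTorch
# modules, to show the general shape of the architecture. All sizes here
# are illustrative assumptions.
import torch
import torch.nn as nn

class TinyViTDetector(nn.Module):
    def __init__(self, image_size=224, patch_size=16, dim=256, depth=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution slices the image into patches and linearly
        # embeds each one in a single step (the standard ViT trick).
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch_size,
                                    stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 2)  # two classes: real vs. fake

    def forward(self, images):             # images: (batch, 3, 224, 224)
        x = self.to_patches(images)        # (batch, dim, 14, 14)
        x = x.flatten(2).transpose(1, 2)   # (batch, 196, dim)
        x = self.encoder(x + self.pos_embed)
        return self.head(x.mean(dim=1))    # average-pool, then classify

logits = TinyViTDetector()(torch.randn(1, 3, 224, 224))  # shape: (1, 2)
```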

The company is continuing to build out its detectors’ capabilities by developing new datasets, but the potential for bias against vulnerable communities remains.

“It’s not perfect—we don’t have an exact equal number between all different races,” he said. “But it’s close enough to being equal, where we can get accuracy metrics that do work. So as opposed to having 100 videos of people of color and 100,000 of white people, it might be 50,000 [of] white people and 55,000 [of] people of color.”
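As a rough illustration of that “close enough” standard, a balance check like the sketch below could flag a dataset whose largest group outnumbers its smallest by more than some tolerance. The group labels and the 10% threshold are assumptions for illustration, not DeepMedia’s actual criteria.

```python
# A quick sketch of the kind of balance check the quote describes:
# near-equal group counts are treated as "close enough". The tolerance
# and group names are illustrative assumptions.
from collections import Counter

def is_roughly_balanced(labels: list[str], tolerance: float = 0.10) -> bool:
    """True if the largest group exceeds the smallest by at most `tolerance`."""
    counts = Counter(labels)
    smallest, largest = min(counts.values()), max(counts.values())
    return (largest - smallest) / largest <= tolerance

# The ratio from the quote: 55,000 vs. 50,000 is within ~9%, so it passes.
sample = ["white"] * 50_000 + ["people_of_color"] * 55_000
print(is_roughly_balanced(sample))  # True
```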
