AI-Generated Face Swapping Video Technology Rampant in China

By Shawn Lin
Shawn Lin is a Chinese expatriate living in New Zealand. He has contributed to The Epoch Times since 2009, with a focus on China-related topics.
April 22, 2022 Updated: April 22, 2022

News Analysis

Do you ever worry about identity theft? Well, now there’s “face swapping,” the latest AI technology that could get us all into a whole lot more trouble. It is not only fast, but the fakes it produces can pass for the real thing. The barriers to using this technology keep falling, and in China a fake video of you doing or saying something can be synthesized for a small fee.

According to an April 14 report by Shanghai-based media The Paper, it is now very easy to generate dynamic videos of human faces on a computer. Last year in Hefei City, Anhui Province, the police found that several people were able to create a dynamic video from static photos. The figures produced through this simulation not only nod and move their heads, but also make expressions such as winking, opening the mouth, and frowning. The effect is extremely realistic.

Chinese police found more than a dozen gigabytes of face data on one group’s computers, with face photos and ID photos stored in different folders according to different categories. “Photos of the front and back of the ID card, photos of people holding ID cards, self-portraits, etc., are called a set.” A complete set of photos is called “materials.” Due to the ease and simplicity of production, the cost of a video is only 2 to 10 yuan (about $0.30 to $1.55).

The eight people involved in this case were said not to be highly educated; some had not even finished high school. They downloaded the software by following online tutorials and spent months teaching themselves to use it.

In addition to obtaining other people’s photos, there are also people who buy other people’s voices and other “materials.” Only a small audio and video sample is needed to synthesize fake audio and video that is comparable to real images and sounds.

Therefore, anyone who obtains data such as a person’s photo and voice recording can generate video and audio of that person on a computer. In other words, any video or audio recording of a person may not be authentic.

Zhu Jun, the director of the Basic Theory Research Center of the Institute of Artificial Intelligence of Tsinghua University, said that the rapid development of deep synthesis technology has made “seeing no longer believing.”

There is even a “face changing tutorial” on YouTube. The presenter takes a photo of President Joe Biden and combines it with a video of a Chinese singer. It only takes a few minutes to generate a video of President Biden singing songs in Chinese. The lips move so realistically that it is difficult to tell the video is fake.

A popular science article from China’s Shanghai Science and Technology Museum introduces the specific method of AI face changing. The article said that in addition to facial features, factors such as age, gender, personality, and emotions are reflected in the face. All these factors are treated as parameters in the artificial intelligence neural network model. The computer obtains these parameters by learning from and summarizing a large number of pictures, a process called “training.”

The article said that in order to make the pictures or videos look more realistic, AI face changing is usually realized with the help of “Generative Adversarial Networks” (GANs), which usually consist of two parts. One is the Generative Model (G for short), which synthesizes faces; the other is the Discriminative Model (D for short), which tries to tell the synthesized faces from real ones. The two are trained using a large number of pictures, and they try to challenge each other. Once the discriminative model can no longer judge authenticity, the training is considered relatively mature. When the model is trained and mature, one can input a human face, generate various expressions on it, and blend it into other pictures or videos to achieve seamless integration.
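As a rough illustration of the two-part training described above, here is a toy Python sketch. It is not the software used in the reported cases and is not taken from the museum article. Instead of face images it uses single numbers: the "real" data are values centred on 4.0, the generator G is a straight line, and the discriminator D is a logistic classifier. The names `train` and `sigmoid`, the toy data, the learning rate and the choice of the common "non-saturating" generator loss are all assumptions made for the example.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train(steps=2000, batch=32, lr=0.05, seed=0):
    """Alternate D and G updates on 1-D toy data; returns (w, c, a, b)."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + c)
    a, b = 1.0, 0.0   # generator G(z) = a*z + b
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]   # "real" samples
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]                     # G's forgeries

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        gw = gc = 0.0
        for xr, xf in zip(real, fake):
            dr = sigmoid(w * xr + c) - 1.0   # d loss / d logit, real sample
            df = sigmoid(w * xf + c)         # d loss / d logit, fake sample
            gw += dr * xr + df * xf
            gc += dr + df
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: minimise -log D(G(z)), i.e. try to fool D.
        ga = gb = 0.0
        for z in noise:
            dx = (sigmoid(w * (a * z + b) + c) - 1.0) * w   # d loss / d G(z)
            ga += dx * z
            gb += dx
        a -= lr * ga / batch
        b -= lr * gb / batch
    return w, c, a, b


if __name__ == "__main__":
    w, c, a, b = train()
    print(f"generator offset b = {b:.2f} (real data is centred on 4.0)")
```

Each pass first lets D practise separating real samples from forgeries, then nudges G in whatever direction makes D more likely to accept its output. Models this small tend to circle around the target rather than settle on it, so the sketch shows the back-and-forth itself, not a finished result.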

This technology was originally used for the post-production of film and television works. Later, some people used the technology to play pranks, using photos of celebrities and politicians to “face-swap” them into videos, and to synthesize speeches that didn’t happen. In addition, some pornography websites also swap faces of celebrities into photos or videos, causing them a lot of trouble.

In China, it was discovered that some people have used the technology to register mobile phone cards or deceive payment systems with fake IDs. They do this to evade or counter the Chinese authorities’ large-scale surveillance system. In China, real-name registration is required to purchase a mobile phone card, and real-name authentication is required to speak online or even buy a kitchen knife. Thus, by changing faces one can evade the supervision of the CCP.
