Actor Tom Hanks has played a wide range of characters over the years, yet we always recognize him as Tom Hanks. Why? Is it his appearance? His mannerisms? The way he moves?
A new study moves closer to an answer by showing that machine learning algorithms can capture the “persona” of a well-photographed person like Tom Hanks and build a digital model of him from the vast number of images available on the Internet.
With enough visual data to mine, the algorithms can also animate the digital model of Tom Hanks to deliver speeches that the real actor never performed.
“One answer to what makes Tom Hanks look like Tom Hanks can be demonstrated with a computer system that imitates what Tom Hanks will do,” says lead author Supasorn Suwajanakorn, a graduate student in computer science and engineering at the University of Washington.
The technology relies on advances in 3-D face reconstruction, tracking, alignment, multi-texture modeling, and puppeteering that have been developed over the last five years by a research group led by Ira Kemelmacher-Shlizerman, assistant professor of computer science and engineering.
The team’s latest advances include the ability to transfer expressions and the way a particular person speaks onto the face of someone else—for instance, mapping former president George W. Bush’s mannerisms onto the faces of other politicians and celebrities.
It’s one step toward a grand goal: Create fully interactive, 3-D digital personas from family photo albums and videos, historic collections, or other existing visuals.
Learning ‘In the Wild’
As virtual and augmented reality technologies develop, researchers envision using family photographs and videos to create an interactive model of a relative living overseas or a far-away grandparent, rather than simply Skyping in two dimensions.
“You might one day be able to put on a pair of augmented reality glasses and there is a 3-D model of your mother on the couch,” says senior author Kemelmacher-Shlizerman. “Such technology doesn’t exist yet—the display technology is moving forward really fast—but how do you actually re-create your mother in three dimensions?”
One day the reconstruction technology could even be taken a step further.
“Imagine being able to have a conversation with anyone you can’t actually get to meet in person—LeBron James, Barack Obama, Charlie Chaplin—and interact with them,” says coauthor Steve Seitz, professor of computer science and engineering. “We’re trying to get there through a series of research steps. One of the true tests is can you have them say things that they didn’t say but it still feels like them? This paper is demonstrating that ability.”
Existing technologies to create detailed 3-D holograms or digital movie characters like Benjamin Button often rely on bringing a person into an elaborate studio. Those setups painstakingly capture every angle of the person and the way they move, something that can’t be done in a living room.
Other approaches still require a person to be scanned by a camera to create basic avatars for video games or other virtual environments. But computer vision experts wanted to digitally reconstruct a person based solely on a random collection of existing images.
To reconstruct celebrities like Tom Hanks, Barack Obama, and Daniel Craig, the machine learning algorithms mined a minimum of 200 Internet images taken over time in various scenarios and poses—a process known as learning “in the wild.”
“We asked, ‘Can you take Internet photos or your personal photo collection and animate a model without having that person interact with a camera?’” says Kemelmacher-Shlizerman. “Over the years we created algorithms that work with this kind of unconstrained data, which is a big deal.”
Suwajanakorn more recently developed techniques to capture expression-dependent textures—small differences that occur when a person smiles or looks puzzled or moves his or her mouth, for example.
By manipulating the lighting conditions across different photographs, he developed a new approach to densely map the differences from one person’s features and expressions onto another person’s face. That breakthrough enables the team to “control” the digital model with a video of another person, and could potentially enable a host of new animation and virtual reality applications.
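The idea of carrying one person's expression over to another face can be illustrated with a deliberately simplified sketch. It is not the team's method, which works on dense 3-D reconstructions and textures. It assumes each face is reduced to a few 2-D landmark points, and the function name `transfer_expression`, the `scale` parameter, and the sample coordinates are all hypothetical. The sketch measures how far each landmark moves on the source face between a neutral and an expressive pose, then adds that movement to the target's neutral landmarks, so the target keeps its own proportions while picking up the source's motion.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def transfer_expression(
    source_neutral: List[Point],
    source_expressive: List[Point],
    target_neutral: List[Point],
    scale: float = 1.0,
) -> List[Point]:
    """Add the source's landmark displacements to the target's neutral landmarks.

    Hypothetical helper for illustration only; landmarks must be listed
    in the same order for all three faces.
    """
    if not (len(source_neutral) == len(source_expressive) == len(target_neutral)):
        raise ValueError("all landmark lists must have the same length")

    result: List[Point] = []
    for (snx, sny), (sex, sey), (tnx, tny) in zip(
        source_neutral, source_expressive, target_neutral
    ):
        # Displacement of this landmark on the source face.
        dx, dy = sex - snx, sey - sny
        # Apply the same displacement to the target's own resting position.
        result.append((tnx + scale * dx, tny + scale * dy))
    return result


# Made-up data: three mouth landmarks (left corner, center, right corner).
source_neutral = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
source_smile = [(0.0, 0.5), (1.0, 0.0), (2.0, 0.5)]   # corners lift by 0.5
target_neutral = [(10.0, 5.0), (11.5, 5.0), (13.0, 5.0)]  # a wider mouth

target_smile = transfer_expression(source_neutral, source_smile, target_neutral)
print(target_smile)  # [(10.0, 5.5), (11.5, 5.0), (13.0, 5.5)]
```

In this toy version the target's mouth stays wider than the source's, which loosely mirrors the goal Seitz describes below: the motion comes from one person while the identity stays with the other.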
“How do you map one person’s performance onto someone else’s face without losing their identity?” asks Seitz. “That’s one of the more interesting aspects of this work. We’ve shown you can have George Bush’s expressions and mouth and movements, but it still looks like George Clooney.”
The research, presented this week at the International Conference on Computer Vision in Chile, was funded by Samsung, Google, Intel, and the University of Washington.