Usually, creating an accurate 3D reconstruction of someone’s face that looks realistic rather than creepy takes expensive equipment and expertise. Now, Carnegie Mellon University researchers have achieved a breakthrough using video recorded on an ordinary smartphone.
Using a smartphone to shoot a continuous video of the front and sides of the face produces a dense stream of data. A two-step method created by CMU’s Robotics Institute uses that data, with some help from deep learning algorithms, to build a digital reconstruction of the face. The team’s experiments show that their method can achieve sub-millimeter accuracy, far better than other camera-based processes.
A digital face might be used to build an avatar for gaming or for virtual or augmented reality, and could also be used in animation, biometric identification and even medical procedures. An accurate 3D rendering of the face could, for instance, be useful in building customized surgical masks or respirators.
“Building a 3D reconstruction of the face has been an open problem in computer vision and graphics because people are very sensitive to the look of facial features,” said Simon Lucey, an associate research professor in the Robotics Institute. “Even small errors in the reconstruction can make the end result look unrealistic.”
Laser scanners, structured light and multicamera studio setups can generate highly accurate scans of the face, but these specialized sensors are prohibitively expensive for most applications. CMU’s newly developed method, however, requires nothing more than a smartphone.
The new method, which Lucey developed with master’s students Shubham Agrawal and Anuj Pahuja, was presented in early March at the IEEE Winter Conference on Applications of Computer Vision (WACV) in Snowmass, Colorado. It begins with shooting 15-20 seconds of video; in this case, the researchers used an iPhone X in the slow-motion setting.
According to Lucey, the high frame rate of slow motion is one of the key features of their method because it yields a dense point cloud.
The researchers then employ a commonly used technique called visual simultaneous localization and mapping (SLAM). Visual SLAM triangulates points on a surface to calculate its shape, while at the same time using that information to ascertain the position of the camera. This creates an initial geometry of the face, while missing data leave gaps in the model.
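The core geometric operation behind visual SLAM is triangulation: given the same surface point observed from two known camera poses, its 3D position can be solved for. The following is a minimal sketch of two-view triangulation using the standard linear (DLT) method with synthetic cameras; it illustrates the principle only, not the team’s actual pipeline (all matrices and points below are made-up example values).

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate a 3D point from two views via the linear DLT method.

    P1, P2 -- 3x4 camera projection matrices
    x1, x2 -- (u, v) pixel observations of the same point in each view
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P row 3) - (P row 1/2) applied to X = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # X is the null vector of A: the right singular vector with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Two synthetic pinhole cameras: one at the origin, one shifted along x
# (illustrative intrinsics, roughly phone-camera-like).
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.05, -0.02, 0.6])
h = np.append(X_true, 1.0)
x1 = (P1 @ h)[:2] / (P1 @ h)[2]
x2 = (P2 @ h)[:2] / (P2 @ h)[2]

print(np.round(triangulate(P1, P2, x1, x2), 4))  # recovers X_true
```

With noiseless observations the DLT solution is exact; in a real SLAM system many such points are triangulated while the camera poses themselves are refined jointly.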
The second step of the method bridges those gaps, starting with deep learning algorithms. Deep learning plays a limited role: it is used to identify the person’s profile and landmarks such as the ears, eyes and nose. Classical computer vision techniques are then used to fill in the gaps.
“Deep learning is a powerful tool that we use every day,” Lucey said. “But deep learning has a tendency to memorize solutions,” which works against efforts to include distinguishing details of the face. “If you use these algorithms just to find the landmarks, you can use classical computer vision methods to fill in the gaps.”
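The gap-filling idea can be sketched with a classical technique: interpolating a dense depth map from sparse known samples. The random data and the interpolation scheme below are illustrative assumptions only, not the paper’s actual reconstruction step.

```python
import numpy as np
from scipy.interpolate import griddata

# Sparse depth samples, standing in for triangulated SLAM points and
# detected landmarks (hypothetical synthetic data).
rng = np.random.default_rng(0)
known_uv = rng.uniform(0, 1, size=(200, 2))            # known pixel locations
known_z = np.sin(3 * known_uv[:, 0]) + known_uv[:, 1]  # depths at those pixels

# Dense grid: every cell not covered by a sample is a "gap" to fill.
grid_u, grid_v = np.mgrid[0:1:64j, 0:1:64j]
filled = griddata(known_uv, known_z, (grid_u, grid_v), method="linear")

# Linear interpolation is undefined outside the convex hull of the
# samples, so fall back to nearest-neighbor there.
nearest = griddata(known_uv, known_z, (grid_u, grid_v), method="nearest")
filled = np.where(np.isnan(filled), nearest, filled)

print(filled.shape)  # a dense 64x64 depth map with no remaining gaps
```

Classical interpolation like this reproduces only what the data support, which is why, as Lucey notes, it avoids the memorization pitfalls of an end-to-end learned reconstruction.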
The method isn’t necessarily quick; it takes 30-40 minutes of processing time. But the entire process can be performed on a smartphone.
In addition to face reconstructions, the CMU team’s methods might also be employed to capture the geometry of almost any object, Lucey said. Digital reconstructions of those objects can then be incorporated into animations or perhaps transmitted across the internet to sites where the objects could be duplicated with 3D printers.
Source: Carnegie Mellon University