Last Updated on March 29, 2021 by Larious
Spotting faces in a scene is easy when the punims are nice and close to the camera. But what about group shots where the faces are tiny? That, I fear, robots have a harder time with.
A new research project by Deva Ramanan, associate professor of robotics, and Peiyun Hu, a Ph.D. student in robotics at Carnegie Mellon, fixes that problem by assessing the context of images. Instead of just looking for two eyes and a mouth the system looks for bodies, arms, legs, and other parts that suggest that a face might be in the offing.
“It’s like spotting a toothpick in someone’s hand,” said Ramanan. “The toothpick is easier to see when you have hints that someone might be using a toothpick. For that, the orientation of the fingers and the motion and position of the hand are major clues.”
When used, the system “reduced error by a factor of two,” and 81 percent of the faces it found were actual faces, “compared with 29 to 64 percent for prior methods.” This means, for example, your phone won’t face swap you with your cat. The system can also spot small faces in a crowd, which allows for better headcounts.
From the release:
The method that he and Hu developed uses “foveal descriptors” to encode context in a way similar to how human vision is structured. Just as the center of the human field of vision is focused on the retina’s fovea, where visual acuity is highest, the foveal descriptor provides sharp detail for a small patch of the image, with the surrounding area shown as more of a blur.
By blurring the peripheral image, the foveal descriptor provides enough context to be helpful in understanding the patch shown in high focus, but not so much that the computer becomes overwhelmed. This allows Hu and Ramanan’s system to make use of pixels that are relatively far away from the patch when deciding if it contains a tiny face.
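The idea is easy to sketch in code. Below is a minimal, illustrative version of a foveal descriptor: a small patch is kept at full detail while a larger surrounding window is blurred. The function names, patch sizes, and the simple box blur are all assumptions for illustration, not taken from Hu and Ramanan's actual implementation.

```python
import numpy as np

def box_blur(img, k=5):
    # Simple k-by-k box blur standing in for the peripheral blur.
    # (The actual paper's blur/encoding scheme may differ.)
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def foveal_descriptor(img, cy, cx, fovea=8, context=32):
    """Return (sharp_patch, blurred_context) centered at (cy, cx).

    The small 'fovea' patch keeps full detail, while the larger
    'context' window is blurred -- mimicking how peripheral vision
    supplies coarse context around the high-acuity center.
    Sizes here are hypothetical, chosen for illustration.
    """
    h, w = img.shape
    blurred = box_blur(img)

    def crop(src, half):
        y0, y1 = max(cy - half, 0), min(cy + half, h)
        x0, x1 = max(cx - half, 0), min(cx + half, w)
        return src[y0:y1, x0:x1]

    sharp = crop(img, fovea // 2)       # full-resolution center
    periphery = crop(blurred, context // 2)  # coarse surroundings
    return sharp, periphery
```

A detector can then score the sharp patch while conditioning on the blurred periphery, using far-away pixels (a shoulder, an arm) as cheap context without paying for them at full resolution.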
Now, perhaps, we’ll finally know just how many people are in a crowd for, say, a football game, a party, or an inauguration.