Low-resolution face recognition
- 24 July, 2017
- Raymond Veldhuis, Professor Biometric Pattern Recognition
During the last 10 years, automatic face recognition has matured into a reliable and accepted technology for recognising people. For instance, in the e-gates at Amsterdam Airport it is used for border control, specifically to verify whether the person who carries a passport is the rightful holder of that passport. This is verified by automatically comparing a facial image stored on the chip of the passport with a live scan of the face of the carrier of the passport. If these images are sufficiently similar, it is decided that they must originate from the same individual, and hence that the carrier of the passport must be the person to whom it was issued. If not, a border-control official will look into the case.
Under controlled conditions, i.e. good quality frontal images with uniform illumination and a neutral expression, state-of-the-art face recognition systems are generally considered reliable and sometimes even more accurate at comparing faces than humans are. Gradually we see that the recognition performance of these systems is also improving under less favourable conditions. Many systems are nowadays capable of handling some variations of pose and illumination. In particular, the face recognition systems based on deep neural nets as used by Google and Facebook for face recognition applications on social media show impressive results.
For surveillance applications, especially when we are dealing with very low-resolution facial images, the situation is different and remains challenging. Improved face recognition performance is relevant in this domain as it can help to identify offenders better from surveillance video footage. Resolution is often expressed as the number of pixels between the eye centres. For passport photographs, the minimum required eye-to-eye distance is 60 pixels, but at least 90 pixels is recommended (ISO/IEC 19794-5, the ISO standard on biometric data formats). In surveillance videos we find facial images with eye-to-eye distances below 10 pixels. An illustration of surveillance images is given in Figure 1.
Figure 1 Surveillance images at varying distances. The gallery image is a high-resolution image taken at 1 meter that serves as a reference.
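As a rough illustration of the eye-to-eye measure described above, the sketch below computes the distance between two eye centres and compares it with the 60-pixel and 90-pixel figures quoted in the text. The function names, the category labels and the example coordinates are assumptions made for this illustration only; they are not taken from the standard.

```python
import math

# Thresholds quoted in the text for passport photographs (in pixels).
MIN_PASSPORT_EYE_DISTANCE = 60
RECOMMENDED_EYE_DISTANCE = 90


def eye_distance(left_eye, right_eye):
    """Euclidean distance in pixels between two eye centres given as (x, y)."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])


def resolution_class(distance):
    """Hypothetical labels relating an eye-to-eye distance to the thresholds."""
    if distance >= RECOMMENDED_EYE_DISTANCE:
        return "recommended for passport photographs"
    if distance >= MIN_PASSPORT_EYE_DISTANCE:
        return "meets passport minimum"
    return "below passport minimum"


# Hypothetical eye coordinates, invented for illustration.
passport_like = eye_distance((100, 200), (190, 200))
surveillance_like = eye_distance((100, 120), (103, 124))

print(passport_like, resolution_class(passport_like))
print(surveillance_like, resolution_class(surveillance_like))
```

With these made-up coordinates, the first face sits at the recommended level and the second, at 5 pixels, falls in the range typical of surveillance footage.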
We observe that the recognition performance of high-end face recognition systems, which rely heavily on high-resolution details, drops abruptly when the image resolution decreases. In fact, we also see that simpler face recognition systems are less sensitive to a decrease in image resolution and, at low resolutions, actually outperform the high-end systems. Nevertheless, there is a huge gap between the face recognition performance achieved on surveillance images and that achieved on high-resolution images. For instance, on high-resolution, high-quality images, a state-of-the-art system will have a true-match rate (the probability that two images of the same person are classified as such) of 99% at a false-match rate (the probability that two facial images of different individuals are classified as coming from the same person) of 0.1%. For surveillance images with an eye-to-eye distance of 10 pixels, the state of the art is a true-match rate of 70% at a false-match rate of 10%, which is much less accurate.
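To make the two error measures concrete, here is a minimal sketch of how a true-match rate and a false-match rate could be estimated from comparison scores at a given decision threshold. It assumes that the system outputs a similarity score and that a pair is accepted as "same person" when the score reaches the threshold. The scores and the threshold are invented for illustration and do not come from any real system.

```python
def match_rates(genuine_scores, impostor_scores, threshold):
    """Return (true-match rate, false-match rate) at a similarity threshold.

    genuine_scores: scores of image pairs that show the same person.
    impostor_scores: scores of image pairs that show different persons.
    A pair is classified as 'same person' when its score >= threshold.
    """
    tmr = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
    fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return tmr, fmr


# Hypothetical similarity scores, invented for illustration only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.59, 0.95, 0.72, 0.88, 0.43, 0.81]
impostor = [0.12, 0.35, 0.48, 0.22, 0.61, 0.30, 0.18, 0.41, 0.27, 0.09]

tmr, fmr = match_rates(genuine, impostor, threshold=0.5)
print(f"TMR = {tmr:.0%}, FMR = {fmr:.0%}")
```

Raising the threshold lowers both rates and lowering it raises both, which is why a true-match rate is only meaningful when it is quoted together with the false-match rate at which it was measured.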
The as yet unanswered question is: ‘How, and to what extent, can we improve the face recognition performance for very low-resolution surveillance images in order to make it more suitable for law-enforcement applications?’ Below we present a few directions towards possible solutions.
Common face recognition systems compare images of similar sizes. When dealing with surveillance data, a comparison must be made between a high-resolution image, e.g. a mug shot of a suspect, and a low-resolution surveillance image. This calls for a system that is capable of comparing images at different resolutions, maybe with different properties.
Proper design and training
Face recognition systems are classifiers that must be designed for, and trained on, relevant data. Figure 2 illustrates that low-resolution surveillance images differ from down-scaled high-resolution images in more aspects than resolution alone. Many low-resolution face recognition systems proposed in the literature are trained on down-scaled high-resolution images and fail when tested on real surveillance images. This implies that effort must be spent on collecting proper training data.
Figure 2 Left: Real low-resolution surveillance images. Middle: High-resolution images. Right: Down-scaled high-resolution images. All images are displayed at the same size. It is clear that real low-resolution images differ in many aspects from down-scaled images.
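For readers unfamiliar with down-scaling, the following sketch shows one simple way in which synthetic low-resolution images can be produced from high-resolution ones, namely by averaging blocks of pixels. This is an assumed, simplified procedure chosen for illustration, not necessarily the one used in the literature referred to above, and the tiny example image is invented.

```python
def downscale(image, factor):
    """Down-scale a greyscale image by averaging factor x factor pixel blocks.

    image: list of rows, each a list of pixel intensities.
    Rows and columns that do not fill a complete block are discarded.
    """
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# Hypothetical 4 x 4 image, invented for illustration.
high_res = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
low_res = downscale(high_res, 2)
print(low_res)
```

Such a procedure reduces only the number of pixels. It does not reproduce the other ways in which real surveillance images differ, which is consistent with the observation that systems trained on down-scaled data fail on real footage.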
Proper alignment and pose correction
Prior to comparison by a face recognition system, the images are aligned and small deviations from frontal pose are corrected. This is commonly based on landmarks: specific feature points in the face, such as the eyes, the nose and the mouth corners, that can easily be found in high-resolution images. These landmarks are used to define operations on the image that take care of alignment and pose correction. In low-resolution images, these landmarks often cannot be found, or are located very inaccurately, which results in poor alignment and pose correction. Therefore, landmark-independent methods for alignment and pose correction are needed.
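The landmark-based alignment described above can be sketched as follows, under the simplifying assumption that only the two eye centres are used and that alignment is a similarity transform (rotation, scaling and translation). Points are represented as complex numbers for brevity. All coordinates and function names are hypothetical, and real systems typically use more landmarks and more elaborate pose correction.

```python
import math


def eye_alignment(left_eye, right_eye, target_left, target_right):
    """Similarity transform z -> a*z + b mapping two detected eye centres
    onto two target positions. Points are (x, y); a and b are complex."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    a = (q2 - q1) / (p2 - p1)  # encodes rotation and scale
    b = q1 - a * p1            # translation
    return a, b


def transform_point(transform, point):
    """Apply the similarity transform to a single (x, y) point."""
    a, b = transform
    z = a * complex(*point) + b
    return z.real, z.imag


def rotation_error_deg(eye_distance_px, vertical_error_px=1.0):
    """Rotation error caused by a vertical localisation error in one eye."""
    return math.degrees(math.atan2(vertical_error_px, eye_distance_px))


# Hypothetical detected and target eye positions, for illustration only.
t = eye_alignment((40, 52), (100, 60), (30, 45), (90, 45))
print(transform_point(t, (40, 52)))   # lands on the left target, up to rounding
print(transform_point(t, (100, 60)))  # lands on the right target, up to rounding

# Effect of a one-pixel landmark error at 90 and at 10 pixels eye distance.
print(rotation_error_deg(90), rotation_error_deg(10))
```

The last line illustrates why inaccurate landmarks hurt more at low resolution: a one-pixel error in one eye position rotates the face by roughly 0.6 degrees at an eye-to-eye distance of 90 pixels, but by roughly 5.7 degrees at 10 pixels.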
Deep neural nets
Deep neural nets form a new category of classifiers that have been applied with great success in many applications, including face recognition. They could possibly also be applied successfully to low-resolution face recognition, but they need extensive relevant training data, on the order of millions of training samples. As we remarked above, at present this data is not available in sufficient quantities, and generating it by down-scaling high-resolution data will not help much because of the differences mentioned above between real low-resolution surveillance data and down-scaled data. Once the data problem has been resolved, mixed-resolution deep neural nets can be designed and trained for this problem.
With these steps, the gap in recognition performance between high-resolution face recognition and low-resolution face recognition will not be closed completely, but first experiments indicate that it can be made substantially smaller.
The Delft Safety & Security Institute (DSyS) is partner of Amsterdam Security, 31 October – 2 November in RAI Amsterdam.