Niklas Zdarsky 1*, Stefan Treue 1,2,3,4 and Moein Esghaei 1*

1 Cognitive Neuroscience Lab, German Primate Center – Leibniz Institute for Primate Research, Göttingen, Germany
2 Faculty of Biology and Psychology, University of Göttingen, Göttingen, Germany

Real-time gaze tracking provides crucial input to psychophysics studies and neuromarketing applications. Many modern eye-tracking solutions are expensive, mainly because of the high-end hardware specialized for processing infrared-camera images. Here, we introduce a deep learning-based approach that uses the video frames of low-cost web cameras. Using DeepLabCut (DLC), an open-source toolbox for extracting points of interest from videos, we obtained facial landmarks critical to gaze location and estimated the point of gaze on a computer screen via a shallow neural network. Tested for three extreme poses, this architecture reached a median error of about one degree of visual angle. Our results contribute to the growing field of deep-learning approaches to eye-tracking, laying the foundation for further investigation by researchers in psychophysics or neuromarketing.

Tracking the point of gaze as a window to the internal state of the human mind is a key requirement in cognitive tasks, where it is important to control the attention of human subjects. This application has been the focus of classic studies (Yarbus, 1967). More recently, the approach has been used not only as a clinical tool to detect neurological and neuropsychiatric disorders by studying patients' gaze patterns (Adhikari and Stark, 2017), but has also been shown to be useful in everyday applications, such as analyzing the trustworthiness of phishing emails (McAlaney and Hills, 2020). While different eye-tracking technologies exist on the market, an affordable and practical technology is still missing, which limits the use of this technique by a broader audience.

There are two major approaches to video camera-based eye-tracking: model-based and appearance-based. Model-based approaches calculate the point of gaze using a 3D model of the eye and the reflected infrared patterns on the cornea. Guestrin and Eizenman (2006) documented that pose-invariance of gaze estimation is highly dependent on the number of cameras and infrared light sources, suggesting a stronger pose-invariance when using more than a single infrared source (see the Eyelink 1000 system, SR Research, Mississauga, ON, Canada, as an example of a pose-invariant model-based eye-tracking system with an array of infrared LEDs). (2008) eliminated the errors induced by the 3D model, using a novel mathematical approach.

The appearance-based approach uses only the video camera's image data. Here, machine-learning techniques are often used to map the extracted coordinates of facial landmarks to the respective point of gaze (Lu et al., 2017). This is a straightforward problem when the subject's head is stationary, leading to a simple transformation of the facial landmarks' relative positions to the point of gaze. However, when the head pose varies, this transformation differs as a function of the specific head pose. Here, machine learning-based approaches could estimate such complex relations between the position of facial landmarks and the point of gaze (Ranjan et al., 2018). Note, however, that the open-source machine learning approaches proposed so far have not yielded accuracy sufficient for psychophysics experiments (Zhang et al., 2017; Lemley et al., 2018).
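Accuracy in this literature, including the median error of about one degree reported for this architecture, is expressed in degrees of visual angle rather than in screen pixels. The following is a minimal sketch of that conversion, not the authors' evaluation code. It assumes a flat screen perpendicular to the line of sight, square pixels, and an offset roughly centred on the line of sight. The function names, the pixel pitch, and the viewing distance are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def gaze_error_deg(predicted_px, target_px, pixel_pitch_cm, viewing_distance_cm):
    """Angular size (degrees) of the on-screen offset between two points.

    Assumes the offset is roughly centred on the line of sight and lies
    in a screen plane perpendicular to it.
    """
    dx_cm = (predicted_px[0] - target_px[0]) * pixel_pitch_cm
    dy_cm = (predicted_px[1] - target_px[1]) * pixel_pitch_cm
    offset_cm = math.hypot(dx_cm, dy_cm)
    # Angle subtended at the eye by a segment of length offset_cm.
    return math.degrees(2.0 * math.atan(offset_cm / (2.0 * viewing_distance_cm)))


def median_gaze_error_deg(predictions_px, targets_px, pixel_pitch_cm, viewing_distance_cm):
    """Median angular error over paired predicted and true gaze points."""
    errors = [
        gaze_error_deg(p, t, pixel_pitch_cm, viewing_distance_cm)
        for p, t in zip(predictions_px, targets_px)
    ]
    return median(errors)


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    predictions = [(960, 540), (1000, 560), (930, 500)]
    targets = [(950, 545), (990, 540), (940, 520)]
    print(median_gaze_error_deg(predictions, targets,
                                pixel_pitch_cm=0.027, viewing_distance_cm=60.0))
```

The median is used here because it matches the summary statistic quoted for the architecture and is less sensitive than the mean to occasional large misestimates.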