Title: EG-Net: Appearance-based eye gaze estimation using an efficient gaze network with attention mechanism
Authors: Wu, XM (Wu, Xinmei); Li, L (Li, Lin); Zhu, HH (Zhu, Haihong); Zhou, G (Zhou, Gang); Li, LF (Li, Linfeng); Su, F (Su, Fei); He, S (He, Shen); Wang, YG (Wang, Yanggang); Long, X (Long, Xue)
Source: EXPERT SYSTEMS WITH APPLICATIONS Volume: 238 Article Number: 122363 DOI: 10.1016/j.eswa.2023.122363 Early Access Date: NOV 2023 Part: F Published: MAR 15 2024
Abstract: Gaze estimation, which has a wide range of applications, is a challenging task under unconstrained conditions. Because information from both full-face and eye images helps improve gaze estimation, many multiregion gaze estimation models have been proposed in recent studies. However, most of them simply apply the same regression method to both eye and face images. This overlooks two points: the eye region may contribute more fine-grained features than the full-face region, and differences between an individual's left and right eyes caused by head pose, illumination, and partial occlusion may lead to inconsistent estimates. To address these issues, we propose an appearance-based end-to-end learning network architecture with an attention mechanism, named efficient gaze network (EG-Net), which employs a two-branch network for gaze estimation. Specifically, a base CNN is used for full-face images, while an efficient eye network (EE-Net), which is scaled up from the base CNN, is used for left- and right-eye images. EE-Net uniformly scales up the depth, width and resolution of the base CNN with a set of constant coefficients for eye feature extraction, and adaptively weights the left- and right-eye images via an attention network according to their "image quality". Finally, features from the full-face image, the two individual eye images and the head pose vectors are fused to regress the eye gaze vectors. We evaluate our approach on three public datasets, where the proposed EG-Net model achieves substantially better performance than prior methods. In particular, our EG-Net-v4 model outperforms state-of-the-art approaches on the MPIIFaceGaze dataset, with prediction errors of 2.41 cm and 2.76 degrees in 2D and 3D gaze estimation, respectively. It also reduces the error to 1.58 cm on GazeCapture and 4.55 degrees on EyeDIAP, improvements of 23.4% and 14.2% over prior art on the two datasets, respectively.
The code related to this project is open-source and available at https://github.com/wuxinmei/EE_Net.git.
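The two mechanisms named in the abstract, compound scaling of a base CNN and attention-based weighting of the two eyes before feature fusion, can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names are hypothetical, the default coefficients (1.2, 1.1, 1.15) are borrowed from EfficientNet-style compound scaling rather than from this paper, and the softmax over two scalar quality scores is an assumed stand-in for the attention network.

```python
import math


def compound_scale(base_depth, base_width, base_resolution, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth, width and input resolution together.

    alpha, beta, gamma are constant coefficients (assumed values, taken
    from EfficientNet-style scaling); phi is the compound exponent.
    """
    depth = math.ceil(base_depth * alpha ** phi)
    width = math.ceil(base_width * beta ** phi)
    resolution = math.ceil(base_resolution * gamma ** phi)
    return depth, width, resolution


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(face_feat, left_feat, right_feat,
                  left_score, right_score, head_pose):
    """Weight each eye's features by an attention weight, then concatenate.

    left_score and right_score stand in for the "image quality" scores an
    attention network would produce (hypothetical inputs here).
    """
    w_left, w_right = softmax([left_score, right_score])
    weighted_left = [w_left * x for x in left_feat]
    weighted_right = [w_right * x for x in right_feat]
    # Fused vector: face features, both weighted eye features, head pose.
    fused = face_feat + weighted_left + weighted_right + head_pose
    return fused, (w_left, w_right)


if __name__ == "__main__":
    # Hypothetical base configuration: 4 layers, 32 channels, 64 px input.
    print(compound_scale(4, 32, 64, phi=1))
    # A partially occluded right eye would receive a lower quality score.
    fused, weights = fuse_features([0.1, 0.2], [1.0, 2.0], [3.0, 4.0],
                                   left_score=2.0, right_score=0.5,
                                   head_pose=[0.0, 0.1])
    print(weights, len(fused))
```

In a real model the fused vector would feed a regression head that outputs the gaze vector; the sketch stops at fusion because the abstract gives no details about that head.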
Author Keywords: Gaze estimation; Appearance-based method; EG-Net; Attention mechanism; Compound model scaling
Addresses: [Wu, Xinmei; Li, Lin; Zhu, Haihong; Zhou, Gang] Wuhan Univ, Sch Resource & Environm Sci, Wuhan, Peoples R China.
[Li, Linfeng; Wang, Yanggang] Wuhan Highway Technol Corp, Bldg B3,Zone 2,Hangyu,WHU Sci Pk,Wudayuan RdEast L, Wuhan, Peoples R China.
[Su, Fei] Shandong Jianzhu Univ, Sch Surveying & Geoinformat, Jinan 250101, Peoples R China.
[He, Shen] Wuhan Metro Operat Co Ltd, 99 Jinghan Ave, Wuhan, Peoples R China.
[Long, Xue] Beijing Inst Struct & Environm Engn, 1 Donggaodi South Dahongmen Rd, Beijing, Peoples R China.
Corresponding Address: Li, L (corresponding author), Wuhan Univ, Sch Resource & Environm Sci, Wuhan, Peoples R China.
E-mail Addresses: xinmwu@whu.edu.cn; lilin@whu.edu.cn; hhzhu@whu.edu.cn; 2014301130059@whu.edu.cn; linfengl@hwtc.com.cn; sufei21@sdjzu.edu.cn; shenhe09@qq.com; yanggangw@hwtc.com.cn; dragonme1@126.com
Impact Factor: 8.5