Abstract
With the elderly population in constant growth, governments must cope with higher elder-care expenses from year to year. Helping older adults extend their independent lifestyle is of pivotal importance to minimise those costs, and that is the goal of the Ambient Assisted Living research field. Through the use of Information and Communication Technologies, it is possible to provide solutions that help the elderly live independently for as long as possible, or that predict mental health issues that could seriously harm their independence. The key enablers for these solutions are egocentric cameras and egocentric action recognition techniques for the analysis of egocentric videos. This chapter proposes several such techniques focused on the exploitation of intrinsic egocentric cues.
Acknowledgements
We gratefully acknowledge the support of the Basque Government’s Department of Education for the predoctoral funding of the first author. This work has been supported by the Spanish Government under the FuturAAL-Ego project (RTI2018-101045-A-C22) and the FuturAAL-Context project (RTI2018-101045-B-C21) and by the Basque Government under the Deustek project (IT-1078-16-D).
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Núñez-Marcos, A., Azkune, G., Arganda-Carreras, I. (2021). Exploiting Egocentric Cues for Action Recognition for Ambient Assisted Living Applications. In: Alja’am, J., Al-Maadeed, S., Halabi, O. (eds) Emerging Technologies in Biomedical Engineering and Sustainable TeleMedicine. Advances in Science, Technology & Innovation. Springer, Cham. https://doi.org/10.1007/978-3-030-14647-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14646-7
Online ISBN: 978-3-030-14647-4
eBook Packages: Earth and Environmental Science (R0)