Exploiting Egocentric Cues for Action Recognition for Ambient Assisted Living Applications

Chapter in Emerging Technologies in Biomedical Engineering and Sustainable TeleMedicine

Abstract

With the elderly population in constant growth, governments must cope with ever-increasing expenses for elder care. Helping the elderly extend their independent lifestyle is therefore of pivotal importance to contain those costs, and this is the goal of the Ambient Assisted Living research field. Through Information and Communication Technologies, it is possible to provide solutions that help the elderly live independently for as long as possible, or that predict mental health issues which could seriously harm their independence. The key enablers for these solutions are egocentric (wearable) cameras and the action recognition techniques that analyse the videos they capture. This chapter proposes several such techniques, each focused on exploiting intrinsic egocentric cues.
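
To make "intrinsic egocentric cues" concrete, the sketch below shows one generic way such a cue can be exploited: a per-pixel cue map (for instance a hand-segmentation mask or a gaze-fixation heatmap) is encoded by a second branch and fused with standard RGB appearance features before the action is classified. This is a minimal, hypothetical PyTorch illustration; all class names, shapes, and dimension choices are assumptions for exposition, not the architecture proposed in this chapter.

    # Illustrative sketch only (assumed names/shapes): fuse RGB appearance
    # with a single-channel egocentric cue map (hand mask or gaze heatmap).
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EgoCueActionNet(nn.Module):
        def __init__(self, num_actions: int = 44):
            super().__init__()
            # Appearance branch: ResNet-18 trunk without its final fc layer.
            rgb = models.resnet18(weights=None)
            self.rgb_trunk = nn.Sequential(*list(rgb.children())[:-1])  # (B, 512, 1, 1)
            # Cue branch: a light CNN over the per-pixel egocentric cue map.
            self.cue_trunk = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),                                # (B, 32, 1, 1)
            )
            self.classifier = nn.Linear(512 + 32, num_actions)

        def forward(self, frames, cues):
            appearance = self.rgb_trunk(frames).flatten(1)  # (B, 512)
            cue = self.cue_trunk(cues).flatten(1)           # (B, 32)
            return self.classifier(torch.cat([appearance, cue], dim=1))

    if __name__ == "__main__":
        net = EgoCueActionNet(num_actions=44)
        frames = torch.randn(2, 3, 224, 224)  # batch of RGB frames
        cues = torch.randn(2, 1, 224, 224)    # matching egocentric cue maps
        print(net(frames, cues).shape)        # torch.Size([2, 44])

Late fusion by feature concatenation, as above, is only one option; egocentric methods in the literature also use such cue maps as spatial attention over the appearance features instead of a separate branch.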

Acknowledgements

We gratefully acknowledge the support of the Basque Government’s Department of Education for the predoctoral funding of the first author. This work has been supported by the Spanish Government under the FuturAAL-Ego project (RTI2018-101045-A-C22) and the FuturAAL-Context project (RTI2018-101045-B-C21) and by the Basque Government under the Deustek project (IT-1078-16-D).

Author information

Corresponding author

Correspondence to Adrián Núñez-Marcos.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Núñez-Marcos, A., Azkune, G., Arganda-Carreras, I. (2021). Exploiting Egocentric Cues for Action Recognition for Ambient Assisted Living Applications. In: Alja’am, J., Al-Maadeed, S., Halabi, O. (eds) Emerging Technologies in Biomedical Engineering and Sustainable TeleMedicine. Advances in Science, Technology & Innovation. Springer, Cham. https://doi.org/10.1007/978-3-030-14647-4_10
