Abstract
Making machines anticipate human action is a complex research problem. Recent studies in computer vision and assistive driving have reported that anticipating a driver’s action a few seconds in advance is challenging. These studies are based on tracking the driver’s head movement, eye gaze, and spatiotemporal interest points. This study addresses an important question: how to anticipate a driver’s action while driving and improve the anticipation time. Its goal is to review the existing deep learning frameworks for assistive driving. This paper differs from the existing solutions in two ways. First, it proposes a simplified framework that uses only video of the vehicle’s interior and develops a driver’s movement tracking (DMT) algorithm; the majority of the existing state of the art relies on both inside and outside features of the vehicle. Second, the proposed work improves image pattern recognition by fusing spatiotemporal interest points (STIPs) for movement tracking with eye cuboids, and then anticipates the action using deep learning. The proposed DMT algorithm tracks the driver’s movement using STIPs from the input video, while a fast eye gaze algorithm tracks eye movements. The features extracted from STIPs and eye gaze are fused and analyzed by a deep recurrent neural network to improve the prediction time, thereby giving a few extra seconds to anticipate the driver’s correct action. The performance of the DMT algorithm was compared with previous algorithms; DMT offers a 30% improvement in anticipating the driver’s action over two recently proposed deep learning algorithms.
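The pipeline described above (per-frame STIP movement features and eye-gaze features, fused and fed to a recurrent network that outputs an anticipated maneuver) can be sketched as follows. This is a minimal illustration, not the paper’s implementation: the feature dimensions, the random placeholder features, and the single Elman-style recurrent layer standing in for the deep recurrent network are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper):
# 30 frames, 8 STIP movement features and 4 eye-gaze features per
# frame, a 16-unit hidden state, and 5 candidate driver actions.
T, D_STIP, D_GAZE, H, N_ACTIONS = 30, 8, 4, 16, 5

# Placeholder per-frame features; in the real system these would come
# from the DMT (STIP-based) tracker and the fast eye gaze algorithm.
stip_feats = rng.normal(size=(T, D_STIP))
gaze_feats = rng.normal(size=(T, D_GAZE))

# Feature-level fusion: concatenate the two streams frame by frame.
fused = np.concatenate([stip_feats, gaze_feats], axis=1)  # shape (T, 12)

# A single Elman-style recurrent layer standing in for the deep RNN.
Wxh = rng.normal(scale=0.1, size=(D_STIP + D_GAZE, H))
Whh = rng.normal(scale=0.1, size=(H, H))
Why = rng.normal(scale=0.1, size=(H, N_ACTIONS))

h = np.zeros(H)
for x in fused:                      # unroll over the video frames
    h = np.tanh(x @ Wxh + h @ Whh)

# Softmax over the final hidden state gives a probability per action;
# emitting this before the maneuver completes is what buys the extra
# anticipation time.
logits = h @ Why
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```

In practice the recurrent layer would be a trained deep network (e.g. stacked LSTMs) and the probabilities would be re-emitted every frame, so that a confident prediction can be raised as early as possible in the maneuver.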
Gite, S., Agrawal, H. & Kotecha, K. Early anticipation of driver’s maneuver in semiautonomous vehicles using deep learning. Prog Artif Intell 8, 293–305 (2019). https://doi.org/10.1007/s13748-019-00177-z