Abstract
This paper presents a method to reduce the time a cognitive robot spends looking for objects in unknown locations. It describes how machine learning techniques can be used to decide which places should be inspected first, based on images that the robot acquires passively. The proposal is composed of two concurrent processes. The first uses these images to build a description of the types of objects found in each object container seen by the robot. This is done passively, regardless of the task being performed. The containers can be tables, boxes, shelves or any other kind of container of known shape whose contents can be seen from a distance. The second process uses these previously computed content estimates to decide which container is most likely to hold the object being sought. This second process is deliberative and takes place only when the robot needs to find an object, either because it is explicitly asked to locate one or because finding it is a necessary step in the robot's mission. If the first guess is wrong, the robot keeps guessing until the object is found. Guesses are based on the semantic distance between the object to find and the description of the types of objects found in each container. The paper provides quantitative results comparing the efficiency of the proposed method with that of two baseline approaches.
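The deliberative ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding vectors, the use of cosine distance as the semantic distance, the choice of the minimum distance over a container's labels as its score, and all function and container names are assumptions made purely for illustration.

```python
import math

# Hypothetical toy embeddings; a real system would load learned word vectors.
EMBEDDINGS = {
    "mug": (0.9, 0.1, 0.0),
    "cup": (0.8, 0.2, 0.1),
    "plate": (0.7, 0.3, 0.0),
    "stapler": (0.1, 0.9, 0.1),
    "pen": (0.0, 0.8, 0.2),
    "book": (0.1, 0.6, 0.6),
    "novel": (0.1, 0.5, 0.7),
}


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def container_distance(target, labels):
    """Assumed score: smallest distance from the target to any label
    that was passively observed in the container."""
    known = [label for label in labels if label in EMBEDDINGS]
    if not known:
        return float("inf")  # nothing known about this container
    return min(cosine_distance(EMBEDDINGS[target], EMBEDDINGS[l]) for l in known)


def rank_containers(target, containers):
    """Order container names from most to least promising."""
    return sorted(containers, key=lambda name: container_distance(target, containers[name]))


def search(target, containers, contains):
    """Inspect containers in ranked order until the target is found.
    `contains(name, target)` stands in for physically inspecting a container."""
    visited = []
    for name in rank_containers(target, containers):
        visited.append(name)
        if contains(name, target):
            break
    return visited


# Labels produced by the (assumed) passive labelling process.
containers = {
    "kitchen_table": ["cup", "plate"],
    "desk": ["stapler", "pen"],
    "shelf": ["book", "novel"],
}
order = rank_containers("mug", containers)
```

With these toy vectors, the container holding cups and plates is inspected first when looking for a mug. If that guess fails, `search` simply moves on to the next container in the ranking, mirroring the repeated guessing described above.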
Notes
Let us assume that a robot located in room \(r_1\) must approach a table \(t_1\), located in room \(r_2\), to fetch a bottle of water for a user. A possible plan would be to move to room \(r_2\), approach table \(t_1\), and finally detect a bottle of water on it. Let us also assume that another bottle of water enters the robot's field of view as it moves towards room \(r_2\). Only if the bottle detector is already active before the robot approaches table \(t_1\) can this bottle be detected and the plan optimized to use it instead.
Acknowledgements
This work has been partially supported by the MICINN Project TIN2015-65686-C5-5-R, by the Extremaduran Government Project GR15120, by the Red de Excelencia “Red de Agentes Físicos” TIN2015-71693-REDT and by MEC project PHBP14/00083. Funding was provided by Junta de Extremadura (Ayudas Consolidación Grupos Investigación Catalogados).
Additional information
Handling editor: Antonio Bandera (University of Malaga); Reviewers: David Meger (McGill University), Antonio Palomino (Fundación Magtel).
This article is part of the Special Issue on ‘Cognitive Robotics’ guest-edited by Antonio Bandera, Jorge Dias, and Luis Manso.
Cite this article
Manso, L.J., Gutierrez, M.A., Bustos, P. et al. Integrating planning perception and action for informed object search. Cogn Process 19, 285–296 (2018). https://doi.org/10.1007/s10339-017-0828-3
DOI: https://doi.org/10.1007/s10339-017-0828-3