Abstract
The continued success of deep convolutional neural networks (CNNs) in computer vision can be directly linked to the vast amounts of data and tremendous processing resources available for training such non-linear models. However, the amount of available data varies significantly between tasks. Robotic systems in particular usually rely on small datasets, since producing and annotating the data is highly robot- and task-specific (e.g. grasping) and therefore prohibitively expensive. To address this problem of small datasets in robotic vision, a common recent practice is to reuse features already learned by a CNN on a large-scale task and apply them to a different small-scale one. This transfer learning shows promising results as an alternative, but it nevertheless cannot match the performance of a CNN trained from scratch for the specific task. Thus, many researchers have turned to synthetic datasets for training, since these can be produced easily and cost-effectively. The main shortcoming of existing synthetic datasets is their lack of photorealism, both in background and in lighting. Herein, we propose a framework for generating completely synthetic datasets that include all the types of data that state-of-the-art object recognition and tracking algorithms need for training. In this way, robotic perception can be improved without deploying the robot in time-consuming real-world scenarios.
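To make the notion of a 2.5D training sample concrete, the following is a minimal, self-contained sketch (not the paper's actual pipeline, which uses a photorealistic renderer) of what such a sample contains: a per-pixel depth map of a simple object in front of a background, the segmentation mask that serves as its annotation, and simulated depth-sensor noise. All function names, dimensions, and the Gaussian noise model are illustrative assumptions.

```python
import numpy as np

def render_sphere_depth(width=64, height=64, radius=0.2,
                        center_z=1.0, background_z=2.0):
    """Render an orthographic depth map (in metres) of a sphere in front of
    a flat background, plus the per-pixel mask used as the annotation."""
    # Pixel coordinates mapped to [-0.5, 0.5] in both axes.
    xs = np.linspace(-0.5, 0.5, width)
    ys = np.linspace(-0.5, 0.5, height)
    x, y = np.meshgrid(xs, ys)
    r2 = x ** 2 + y ** 2
    mask = r2 <= radius ** 2                 # annotation: sphere vs. background
    depth = np.full((height, width), background_z)
    # Front surface of the sphere: z = center_z - sqrt(radius^2 - x^2 - y^2).
    depth[mask] = center_z - np.sqrt(radius ** 2 - r2[mask])
    return depth, mask

def add_sensor_noise(depth, sigma=0.002, seed=0):
    """Simulate depth-sensor measurement noise with zero-mean Gaussian
    jitter (sigma in metres) -- a simplifying assumption."""
    rng = np.random.default_rng(seed)
    return depth + rng.normal(0.0, sigma, depth.shape)

depth, mask = render_sphere_depth()
noisy_depth = add_sensor_noise(depth)
```

A real generator would replace the analytic sphere with rendered scenes (e.g. in Blender, with HDRI environment lighting for photorealism), but the output per sample is the same triplet: an image, a depth map, and an annotation.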
Notes
1. An example dataset generated using the proposed framework will be publicly available upon the publication of the paper at hand.
Acknowledgement
This work has been supported by the project “Co-production CeLL performing Human-Robot Collaborative AssEmbly (CoLLaboratE)”, funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 820767.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Peleka, G., Mariolis, I., Tzovaras, D. (2021). Generating 2.5D Photorealistic Synthetic Datasets for Training Machine Vision Algorithms. In: Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E. (eds) 15th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2020). SOCO 2020. Advances in Intelligent Systems and Computing, vol 1268. Springer, Cham. https://doi.org/10.1007/978-3-030-57802-2_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57801-5
Online ISBN: 978-3-030-57802-2
eBook Packages: Intelligent Technologies and Robotics (R0)