Abstract
Recognizing which parts of an object are graspable is important for intelligent robots performing complex tasks. To obtain good grasping performance, it is crucial to learn rich representations efficiently from multi-modal RGB-D images. To address this problem, we propose an effective multi-modal deep extreme learning machine (ELM) structure. In this structure, an unsupervised hierarchical ELM extracts features from the RGB and depth modalities separately. A shared layer then combines the RGB and depth features. Finally, an ELM serves as a supervised classifier that makes the final decision. Experiments on the Cornell grasping dataset show that the proposed multi-modal fusion method achieves better grasp recognition performance.
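The abstract does not give the layer equations, so the pipeline it describes (per-modality unsupervised hierarchical ELM, a shared fusion layer, then a supervised ELM classifier) is sketched below in NumPy under stated assumptions. Each unsupervised layer is assumed to be a ridge-regularized ELM autoencoder (ELM-AE): random hidden weights, output weights solved in closed form so the hidden layer reconstructs its input, and the learned representation taken as g(X βᵀ). Hierarchical ELM variants that use sparse (ℓ1-regularized) autoencoders would replace that closed-form step. The class names, layer sizes, sigmoid activation, regularization constant `C`, and per-modality z-score normalization are all hypothetical choices, not the authors' settings.

```python
import numpy as np


def _sigmoid(z):
    # Clip to avoid overflow in np.exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def _ridge(H, T, C):
    # Closed-form regularized least squares: (H^T H + I/C)^{-1} H^T T.
    return np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)


class ELMAutoencoder:
    """One unsupervised ELM-AE layer (hypothetical helper, ridge variant).

    Random weights map X to H = g(XA + b); output weights beta are fit in
    closed form so that H @ beta ~= X. The new representation is g(X beta^T).
    """

    def __init__(self, n_hidden, rng, C=1e3):
        self.n_hidden, self.rng, self.C = n_hidden, rng, C

    def fit(self, X):
        A = self.rng.standard_normal((X.shape[1], self.n_hidden))
        b = self.rng.standard_normal(self.n_hidden)
        H = _sigmoid(X @ A + b)            # (n_samples, n_hidden)
        self.beta = _ridge(H, X, self.C)   # (n_hidden, n_features)
        return self

    def transform(self, X):
        return _sigmoid(X @ self.beta.T)   # (n_samples, n_hidden)


class ELMClassifier:
    """Single-hidden-layer ELM fit to one-hot targets by ridge regression."""

    def __init__(self, n_hidden, rng, C=1e3):
        self.n_hidden, self.rng, self.C = n_hidden, rng, C

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot
        self.A = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = _ridge(_sigmoid(X @ self.A + self.b), T, self.C)
        return self

    def predict(self, X):
        scores = _sigmoid(X @ self.A + self.b) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


class MultiModalDeepELM:
    """RGB stack + depth stack -> shared ELM-AE layer -> ELM classifier."""

    def __init__(self, modal_layers=(64, 64), shared=64, n_hidden=200, seed=0):
        self.modal_layers, self.shared = modal_layers, shared
        self.n_hidden, self.seed = n_hidden, seed

    def _scale(self, X_rgb, X_depth):
        # Per-modality z-score normalization (an assumption for this sketch).
        return [(X - m) / s for X, (m, s) in zip((X_rgb, X_depth), self.stats)]

    def _fit_stack(self, X, rng):
        # Greedy layer-wise unsupervised training of one modality's stack.
        stack = []
        for size in self.modal_layers:
            enc = ELMAutoencoder(size, rng).fit(X)
            X = enc.transform(X)
            stack.append(enc)
        return stack, X

    @staticmethod
    def _encode(X, stack):
        for enc in stack:
            X = enc.transform(X)
        return X

    def fit(self, X_rgb, X_depth, y):
        rng = np.random.default_rng(self.seed)
        self.stats = [(X.mean(0), X.std(0) + 1e-8) for X in (X_rgb, X_depth)]
        Z_rgb, Z_depth = self._scale(X_rgb, X_depth)
        self.rgb_stack, F_rgb = self._fit_stack(Z_rgb, rng)
        self.depth_stack, F_depth = self._fit_stack(Z_depth, rng)
        joint = np.hstack([F_rgb, F_depth])           # concatenate modalities
        self.shared_ae = ELMAutoencoder(self.shared, rng).fit(joint)
        self.clf = ELMClassifier(self.n_hidden, rng).fit(
            self.shared_ae.transform(joint), y)
        return self

    def features(self, X_rgb, X_depth):
        Z_rgb, Z_depth = self._scale(X_rgb, X_depth)
        joint = np.hstack([self._encode(Z_rgb, self.rgb_stack),
                           self._encode(Z_depth, self.depth_stack)])
        return self.shared_ae.transform(joint)

    def predict(self, X_rgb, X_depth):
        return self.clf.predict(self.features(X_rgb, X_depth))
```

A usage sketch on synthetic stand-in data (not the Cornell dataset): `MultiModalDeepELM(seed=0).fit(X_rgb, X_depth, y).predict(X_rgb, X_depth)`, where `X_rgb` and `X_depth` are per-sample feature matrices for the two modalities and `y` holds graspable/not-graspable labels. Every layer is trained in closed form with a single linear solve, which is the main efficiency argument for ELMs over back-propagation-trained deep networks.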
Acknowledgments
This work was supported in part by the National Key Project for Basic Research of China under Grant 2013CB329403; in part by the National High-tech Research and Development Plan under Grant 2015AA042306; in part by the National Natural Science Foundation of China under Grants 61210013 and 61450011; and in part by the Tsinghua University Initiative Scientific Research Program under Grant 20131089295.
Cite this article
Wei, J., Liu, H., Yan, G. et al. Robotic grasping recognition using multi-modal deep extreme learning machine. Multidim Syst Sign Process 28, 817–833 (2017). https://doi.org/10.1007/s11045-016-0389-0
DOI: https://doi.org/10.1007/s11045-016-0389-0