Align-Yolact: a one-stage semantic segmentation network for real-time object detection

Lin, Shaodan; Zhu, Kexin; Feng, Chen; Chen, Zhide

doi:10.1007/s12652-021-03340-4

Original Research
Published: 19 June 2021

Volume 14, pages 863–870, (2023)
Journal of Ambient Intelligence and Humanized Computing

Shaodan Lin ORCID: orcid.org/0000-0001-5095-1515¹,
Kexin Zhu²,
Chen Feng³ &
…
Zhide Chen³

Abstract

Object detection is a classic problem in computer vision, and its main bottleneck lies in the fusion of multi-scale features. In this paper, we systematically study the design choices of neural network architectures for real-time object detection and propose Align-Yolact to improve instance segmentation accuracy. First, we propose a weighted bounding box, which improves the localization accuracy of the bounding box. Second, we add a bi-directional feature pyramid network to the feature fusion stage, which improves mask quality and accuracy on small targets. Owing to these optimizations and better backbones, we achieve state-of-the-art results in both detection efficiency and accuracy.
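As an illustration of the weighted feature fusion that a bi-directional feature pyramid network typically performs at each node, the following is a minimal sketch of the fast normalized fusion described for BiFPN in the EfficientDet paper. It is not taken from Align-Yolact itself: the function name `fast_normalized_fusion`, the flat-list representation of feature maps, and the value of `eps` are assumptions made for this example.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-sized feature maps with non-negative, normalized weights.

    features: list of feature maps, each a flat list of floats (assumed
              already resized to a common resolution).
    weights:  one learnable scalar per input feature map.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("feature maps must have the same size")

    # ReLU keeps every weight non-negative so the normalization is stable.
    clipped = [max(0.0, w) for w in weights]
    # eps avoids division by zero when all weights are clipped to zero.
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    # Element-wise weighted sum of the input feature maps.
    return [
        sum(n * f[i] for n, f in zip(normalized, features))
        for i in range(size)
    ]


# Hypothetical inputs: a top-down feature and a lateral feature at one level.
top_down = [1.0, 2.0, 3.0]
lateral = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([top_down, lateral], [1.0, 1.0])
```

With equal weights the result is close to the element-wise mean of the two inputs; a weight that training drives below zero is clipped, so that input stops contributing to the fused map.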

Acknowledgements

The authors would like to thank all the participants who took part in the experiments. This work was supported in part by the National Science Foundation of China (Grant No. 61841701), the Fujian Vocational College Intelligent Equipment Application Technology Collaborative Innovation Center Construction Project (Grant No. 2016-7), and the Science and Technology Project of the Transportation Department of Fujian Province (Grant No. 201934).

Author information

Authors and Affiliations

College of Information and Intelligent Transportation, Fujian Chuanzheng Communications College, Fuzhou, 350007, China
Shaodan Lin
Department of Information Engineering, Sun Yat-Sen University of Taiwan, Kaohsiung, Taiwan, 80424, China
Kexin Zhu
College of Mathematics and Informatics, Fujian Normal University, Fuzhou, 350007, China
Chen Feng & Zhide Chen

Corresponding author

Correspondence to Shaodan Lin.

About this article

Cite this article

Lin, S., Zhu, K., Feng, C. et al. Align-Yolact: a one-stage semantic segmentation network for real-time object detection. J Ambient Intell Human Comput 14, 863–870 (2023). https://doi.org/10.1007/s12652-021-03340-4

Received: 28 August 2020
Accepted: 07 June 2021
Published: 19 June 2021
Issue Date: February 2023
DOI: https://doi.org/10.1007/s12652-021-03340-4
