DCT-DWT-FFT Based Method for Text Detection in Underwater Images

Banerjee, Ayan; Shivakumara, Palaiahnakote; Pal, Soumyajit; Pal, Umapada; Liu, Cheng-Lin

doi:10.1007/978-3-031-02444-3_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13189)

Included in the following conference series: Asian Conference on Pattern Recognition

Abstract

Text detection in underwater images remains an open challenge because of distortions caused by refraction, absorption of light, particles, and variations in depth, color, and the nature of the water. Unlike existing methods, which target text detection in natural scene images, this paper proposes a novel method for text detection in underwater images built on a new enhancement model. Based on the observation that the fine details of text in an image are associated with high energy, spatial resolution, and brightness, we use the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT) for image enhancement to highlight text features. The enhanced image is fed to a modified Character Region Awareness for Text Detection (CRAFT) model to detect text in underwater images. To explore enhancement methods, we evaluate six combinations of these techniques, one for each ordering of the three transforms: DCT-DWT-FFT, DCT-FFT-DWT, DWT-DCT-FFT, DWT-FFT-DCT, FFT-DCT-DWT, and FFT-DWT-DCT. Experimental results on our dataset of underwater images and on benchmark datasets for natural scene text detection, namely MSRA-TD500, ICDAR 2019 MLT, ICDAR 2019 ArT, Total-Text, CTW1500, and COCO Text, show that the proposed method performs well on both underwater and natural scene images and outperforms the existing methods on all the datasets.
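The six combinations named in the abstract are the six possible orderings of three stages. The following is a minimal sketch of how such orderings could be enumerated and applied in sequence. It is not the authors' implementation: the abstract does not describe how each transform is used for enhancement, so `identity_stage`, `STAGES`, `stage_orderings`, `run_pipeline`, and `pipeline_name` are hypothetical names, and each stage is a placeholder that returns the image unchanged.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[float]]          # toy image type: rows of pixel intensities
Stage = Callable[[Image], Image]   # one enhancement step


def identity_stage(image: Image) -> Image:
    """Placeholder stage (assumption): a real stage would enhance the
    image in one transform domain. Here the image is returned unchanged."""
    return image


# Hypothetical registry mapping a transform name to its enhancement stage.
STAGES: Dict[str, Stage] = {
    "DCT": identity_stage,
    "DWT": identity_stage,
    "FFT": identity_stage,
}


def stage_orderings(
    names: Sequence[str] = ("DCT", "DWT", "FFT"),
) -> List[Tuple[str, ...]]:
    """Return every ordering of the given stage names (3! = 6 for three)."""
    return list(permutations(names))


def pipeline_name(order: Sequence[str]) -> str:
    """Join stage names with hyphens, e.g. ('DCT', 'DWT', 'FFT') -> 'DCT-DWT-FFT'."""
    return "-".join(order)


def run_pipeline(
    image: Image,
    order: Sequence[str],
    stages: Dict[str, Stage] = STAGES,
) -> Image:
    """Apply the named stages one after another, in the given order."""
    for name in order:
        image = stages[name](image)
    return image


if __name__ == "__main__":
    toy_image: Image = [[0.0, 1.0], [1.0, 0.0]]
    for order in stage_orderings():
        enhanced = run_pipeline(toy_image, order)
        print(pipeline_name(order), enhanced)
```

Keeping the stages in a dictionary means each placeholder can later be swapped for a real transform-domain enhancement without changing the enumeration or the pipeline loop.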

Acknowledgements

The work of Cheng-Lin Liu was supported by the National Key Research and Development Program under Grant No. 2018AAA0100400 and the National Natural Science Foundation of China under Grant No. 61721004. This work was also partially supported by TIH, ISI, Kolkata.

Author information

Authors and Affiliations

Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
Ayan Banerjee, Soumyajit Pal & Umapada Pal
Department of Computer System and Technology, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Palaiahnakote Shivakumara
National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu

Corresponding author

Correspondence to Palaiahnakote Shivakumara.

Editor information

Editors and Affiliations

Korea University, Seoul, Korea (Republic of)
Christian Wallraven
Nanjing University, Nanjing, China
Qingshan Liu
Osaka University, Osaka, Japan
Hajime Nagahara

Cite this paper

Banerjee, A., Shivakumara, P., Pal, S., Pal, U., Liu, C.-L. (2022). DCT-DWT-FFT Based Method for Text Detection in Underwater Images. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_16

DOI: https://doi.org/10.1007/978-3-031-02444-3_16
Published: 10 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02443-6
Online ISBN: 978-3-031-02444-3
eBook Packages: Computer Science (R0)
