Multi-task Model for Comic Book Image Analysis

Nguyen, Nhu-Van; Rigaud, Christophe; Burie, Jean-Christophe

doi:10.1007/978-3-030-05716-9_57

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11296)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Comic book image analysis methods often rely on multiple algorithms or models, one for each task, such as panel and character detection, balloon segmentation, and text recognition. In this work, we aim to reduce the complexity of comic book image analysis by proposing a single model, called Comic MTL, that learns multiple tasks. In addition to the detection and segmentation tasks, we integrate a relation analysis task for balloons and characters into the Comic MTL model. Experiments are carried out on the eBDtheque dataset, which contains annotations for panels, balloons, and characters, as well as balloon-character relations. We show that the Comic MTL model can detect the association between balloons and their speakers (comic characters) and handle other tasks, such as panel and character detection and balloon segmentation, with promising results.

Acknowledgement

This work is supported by the CPER NUMERIC programme funded by the Region Nouvelle Aquitaine, CDA, Charente Maritime French Department, La Rochelle conurbation authority (CDA) and the European Union through the FEDER funding.

Author information

Authors and Affiliations

Laboratoire L3i, Université de La Rochelle, 17042, La Rochelle CEDEX 1, France
Nhu-Van Nguyen, Christophe Rigaud & Jean-Christophe Burie

Corresponding author

Correspondence to Nhu-Van Nguyen.

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis

Cite this paper

Nguyen, NV., Rigaud, C., Burie, JC. (2019). Multi-task Model for Comic Book Image Analysis. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_57

DOI: https://doi.org/10.1007/978-3-030-05716-9_57
Published: 11 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science, Computer Science (R0)
