Abstract
The BoW variant of the W2VV++ model, integrated into the VIRET and SOMHunter systems, proved its effectiveness at the previous Video Browser Showdown competition in 2020. As the next experimental interactive search prototype to benchmark, we consider a simple system relying on the more complex BERT variant of the W2VV++ model, which accepts rich text input. The input can be provided by keyboard or by speech processed by a third-party cloud service. The motivation for the more complex BERT variant is its good performance on the rich text descriptions that can be provided for known-item search tasks. At the same time, users will be instructed to specify as rich a text description of the searched scene as possible.
Notes
1. Note that we are aware that reverse events could be quite common within a single video as similar cuts may repeat frequently. Nonetheless, this would still reduce the task at hand to merely finding the right scene within a video.
2. Note that padding is employed on the edges of individual videos.
Acknowledgements
This paper has been supported by the Charles University Grant Agency (GA UK) project number 1310920, by the Czech Science Foundation (GAČR) project 19-22071Y, and by Charles University grant SVV-260588.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Peška, L., Kovalčík, G., Souček, T., Škrhák, V., Lokoč, J. (2021). W2VV++ BERT Model at VBS 2021. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_46
DOI: https://doi.org/10.1007/978-3-030-67835-7_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science, Computer Science (R0)