Abstract
Extended Entity Relationship (EER) modeling is an important step after the application requirements for data analysis have been gathered, and is critical for translating user requirements into a given executable data model (e.g., relational or, in this paper, Multilayer Networks or MLNs). EER modeling provides a more precise understanding of the application and data requirements and an unambiguous representation from which the data model (on which analysis is performed) can be generated algorithmically. EER has played a central role in mapping user-level requirements to the relational and object-oriented data models, among others. UML, whose roots are in EER modeling, is used extensively in industry.
Although big data analysis has warranted many new data models, not much attention has been paid to their modeling from requirements. Going straight from application requirements to data model and analysis, especially for complex data sets, is likely to be difficult, error-prone, and hard to extend. Hence, for data models used in big data analysis, such as Multilayer Networks, there is a need to transform the user/application requirements using a modeling approach such as EER.
In this paper, we start with the application requirements of complex data sets, including analysis objectives, and show how the EER approach can be leveraged to model the given data, generate MLNs, and derive appropriate analysis expressions on them. This is timely, as MLNs (which also subsume graphs) are gaining popularity as a meaningful data representation for big data analysis.
To demonstrate the algorithm and the applicability of the proposed approach, we apply it to three data sets to generate MLNs and to map analysis requirements into expressions on those MLNs. We also demonstrate it for three types of MLNs. The data sets are from DBLP (Database Bibliography: Computer Science Publications), IMDb (a large international movie data set), and US commercial airlines. Our experimental analysis validates the modeling and mapping. We do not elaborate on computations, as that is a separate topic in itself. The correctness of the results is verified using independently available ground truth.
Notes
- 1. Note that the relationship details can change based on analysis objectives.
- 2. Choice of coefficient reflects relationship quality and its value can be based on how actors are weighted against genres. We have chosen 0.9 for relating actors in their top genres.
Acknowledgments
For this work, Dr. Chakravarthy was partly supported by NSF grant 1955798 and Dr. Bhowmick was partly supported by NSF grant 1916084.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Komar, K.S., Santra, A., Bhowmick, S., Chakravarthy, S. (2020). EER→MLN: EER Approach for Modeling, Mapping, and Analyzing Complex Data Using Multilayer Networks (MLNs). In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds) Conceptual Modeling. ER 2020. Lecture Notes in Computer Science, vol 12400. Springer, Cham. https://doi.org/10.1007/978-3-030-62522-1_41
DOI: https://doi.org/10.1007/978-3-030-62522-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62521-4
Online ISBN: 978-3-030-62522-1
eBook Packages: Computer Science, Computer Science (R0)