A multi-source heterogeneous spatial big data fusion method based on multiple similarity and voting decision

  • Mathematical Methods in Data Science
Soft Computing

Abstract

Data fusion improves accuracy and supports more specific inferences by fusing and aggregating data from different sensors. However, spatial data are increasingly complex, massive, and multi-source heterogeneous, and in some situations existing methods do not adequately meet the requirements for data integrity and fusion accuracy. Taking the geographical properties of spatial data into account, this paper proposes a multi-source heterogeneous spatial big data fusion method based on multiple similarity and voting decision (SDFSV). The method develops a three-step record linking algorithm to improve the quality of entity recognition for the incremental fusion of massive data. A one-time voting algorithm is then introduced to significantly reduce data conflicts and thereby improve the accuracy of data fusion. A relation deduction method based on rules and entity recognition is presented to enhance data integrity. In addition, the method calls for a data traceability mechanism to promote the traceability and interpretability of fusion results. Experiments on data about Beijing medical institutions collected from 10 data sources show that SDFSV achieves improved performance.
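
The abstract names the building blocks (similarity-based record linking, voting over conflicting values, traceability) without giving their details, so the following is only a minimal sketch of the general idea and not the SDFSV algorithm itself. It assumes two similarity measures (a string ratio on names and a great-circle distance on coordinates), a simple majority vote per attribute, and a `sources` list as a stand-in for traceability. All function names, thresholds, and records are invented for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib matching ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in km."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def same_entity(r1: dict, r2: dict, name_threshold: float = 0.8, max_km: float = 0.5) -> bool:
    """Assumed linking rule: names are similar AND locations are close."""
    close = haversine_km(r1["lat"], r1["lon"], r2["lat"], r2["lon"]) <= max_km
    return close and name_similarity(r1["name"], r2["name"]) >= name_threshold


def vote(values):
    """Majority vote over non-missing values; ties go to the first value seen."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]


def fuse(records, attributes=("name", "phone", "level")):
    """Greedily group linked records, then resolve each attribute by voting."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if same_entity(cluster[0], rec):  # compare with the cluster's first record
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    fused = []
    for cluster in clusters:
        entity = {attr: vote([r.get(attr) for r in cluster]) for attr in attributes}
        entity["sources"] = [r["source"] for r in cluster]  # keep provenance
        fused.append(entity)
    return fused


# Invented records from four hypothetical sources (not data from the paper).
records = [
    {"source": "A", "name": "Sunrise Community Clinic", "lat": 39.9250, "lon": 116.4560,
     "phone": "010-0000-0001", "level": "Grade 3"},
    {"source": "B", "name": "Sunrise Community clinic", "lat": 39.9252, "lon": 116.4562,
     "phone": "010-0000-0001", "level": "Grade 2"},
    {"source": "C", "name": "Sunrise Comm. Clinic", "lat": 39.9251, "lon": 116.4559,
     "phone": None, "level": "Grade 3"},
    {"source": "D", "name": "Harbor Street Hospital", "lat": 39.9500, "lon": 116.3000,
     "phone": "010-0000-0002", "level": "Grade 3"},
]

fused = fuse(records)
for entity in fused:
    print(entity)
```

In this sketch the three "Sunrise" records are linked into one entity, the conflicting `level` values are settled by majority, and the `sources` field records which inputs contributed to each fused entity. A real system would need blocking or indexing to avoid comparing every record against every cluster.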

Data availability

The data used in this study were generated in the authors' own experiments.

Funding

This research was funded by the National Key Research and Development Program of China (grant number 2016YFB0501805) and the National Development and Reform Commission of China (grant number JZNYYY001).

Author information

Contributions

Each author contributed to this article.

Corresponding author

Correspondence to Ruizhi Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Z., Zhou, J. & Sun, R. A multi-source heterogeneous spatial big data fusion method based on multiple similarity and voting decision. Soft Comput 27, 2479–2492 (2023). https://doi.org/10.1007/s00500-022-07734-0

  • DOI: https://doi.org/10.1007/s00500-022-07734-0
