Research of LSH and Outliers Detection

Wang, Ying-yan; Zeng, Rui; Li, Ming-zhong; Li, Fang

doi:10.1007/978-3-642-34038-3_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 307)

Included in the following conference series:

International Conference on Information Computing and Applications

Abstract

In data mining, a sample drawn from a data repository can stand in for the original database, which reduces the time spent repeatedly scanning that database. Many algorithms have therefore been proposed for sampling a data set in a reasonable way, so that the sample reflects the original database more faithfully. Starting from a randomly selected set, these algorithms select, delete, or swap noisy transaction records so that more meaningful rules can be extracted. We observe that a sample data set is composed of clusters of transaction data, and that each cluster consists of records that are similar in some of their attributes. Outliers should therefore be removed cluster by cluster rather than over the entire sample. To address the curse of dimensionality that arises with high-dimensional data, we study LSH (Locality-Sensitive Hashing) as the technique for partitioning the data into clusters. The main idea is that, under multiple combined hash functions, highly similar transaction records have a higher chance of being gathered into the same cluster, whereas dissimilar records have a lower chance of colliding with each other.
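The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a hash family, so this example assumes that each transaction is a non-empty set of items, uses MinHash (an LSH family for Jaccard similarity) as a stand-in, and scores outliers by mean Jaccard similarity within a cluster. All function names, parameters, and thresholds are hypothetical.

```python
import zlib
from collections import defaultdict


def minhash_signature(items, num_hashes=2):
    """Return one min-hash value per seeded hash function.

    Assumes `items` is a non-empty collection of strings. CRC32 with a
    seed prefix stands in for a family of independent hash functions.
    """
    return tuple(
        min(zlib.crc32(f"{seed}:{item}".encode()) for item in items)
        for seed in range(num_hashes)
    )


def lsh_clusters(transactions, num_hashes=2):
    """Group transaction indices whose combined signatures collide.

    Two records share a bucket only if all `num_hashes` min-hashes agree,
    so similar records collide more often than dissimilar ones.
    """
    buckets = defaultdict(list)
    for index, items in enumerate(transactions):
        buckets[minhash_signature(items, num_hashes)].append(index)
    return dict(buckets)


def jaccard(a, b):
    """Jaccard similarity of two item collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_outliers(transactions, members, min_mean_sim=0.4):
    """Flag members whose mean similarity to the rest of the cluster is low.

    The decision uses only the given cluster, not the whole sample.
    """
    flagged = []
    for i in members:
        others = [j for j in members if j != i]
        if not others:
            continue  # a singleton cluster has nothing to compare against
        mean_sim = sum(
            jaccard(transactions[i], transactions[j]) for j in others
        ) / len(others)
        if mean_sim < min_mean_sim:
            flagged.append(i)
    return flagged
```

In this sketch, `lsh_clusters` would be run once over the sample and `cluster_outliers` once per resulting bucket. Requiring more hash functions to agree makes buckets stricter, while fewer makes them looser; suitable settings depend on the data and are not given in the abstract.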

Author information

Authors and Affiliations

School of Electro-Mechanical and Information Technology, Yiwu Industrial & Commercial College, Yiwu, China
Ying-yan Wang, Rui Zeng, Ming-zhong Li & Fang Li

Editor information

Editors and Affiliations

College of Sciences, Hebei United University, 063000, Tangshan, Hebei, China
Chunfeng Liu & Aimin Yang
Northeastern University at Qinhuangdao, Hebei, China
Leizhen Wang

About this paper

Cite this paper

Wang, Yy., Zeng, R., Li, Mz., Li, F. (2012). Research of LSH and Outliers Detection. In: Liu, C., Wang, L., Yang, A. (eds) Information Computing and Applications. ICICA 2012. Communications in Computer and Information Science, vol 307. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34038-3_11

DOI: https://doi.org/10.1007/978-3-642-34038-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34037-6
Online ISBN: 978-3-642-34038-3
eBook Packages: Computer Science, Computer Science (R0)
