Abstract
In data mining, a sampled subset of a data repository can stand in for the original database, reducing the time spent repeatedly scanning it. Many algorithms have therefore been proposed for drawing a reasonable sample, so that the sampled data set reflects the original database more faithfully. Starting from a randomly selected set, these algorithms select, delete, or swap noisy transaction records so that more meaningful rules can be extracted later. We observe that a sampled data set is composed of clusters of transactions, and that each cluster consists of records that are similar in some of their attributes. Outliers should therefore be removed cluster by cluster rather than over the sample as a whole. To cope with the curse of dimensionality that arises with high-dimensional data, we study Locality-Sensitive Hashing (LSH) as the technique for partitioning the data into clusters. By combining multiple hash functions, transaction records that are highly similar have a higher chance of being gathered into the same cluster, while dissimilar records are less likely to collide.
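The idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not name the hash family or the outlier criterion, so the sketch assumes MinHash signatures with banding as the LSH scheme and a mean Jaccard similarity threshold inside each bucket as the outlier test. All function names and parameters (`lsh_buckets`, `per_cluster_outliers`, `n_hashes`, `bands`, `min_sim`) are hypothetical.

```python
import random
import zlib
from collections import defaultdict

PRIME = 2_147_483_647  # 2^31 - 1, modulus for the random linear hash functions


def make_hash_funcs(n, seed=0):
    """Return n (a, b) pairs defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(transaction, hash_funcs):
    """MinHash signature of a non-empty transaction (a set of items)."""
    # crc32 gives a stable integer id per item across Python processes.
    ids = [zlib.crc32(str(item).encode()) for item in transaction]
    return tuple(min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs)


def lsh_buckets(transactions, n_hashes=12, bands=4, seed=0):
    """Hash each transaction into one bucket per band (assumed LSH scheme)."""
    rows = n_hashes // bands
    funcs = make_hash_funcs(n_hashes, seed)
    buckets = defaultdict(set)
    for idx, t in enumerate(transactions):
        sig = minhash_signature(t, funcs)
        for b in range(bands):
            # Records collide in a band only if all rows of that band agree.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)
    return buckets


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def per_cluster_outliers(transactions, buckets, min_sim=0.5):
    """Flag records that fit no bucket well (assumed criterion).

    A record is kept if, in at least one bucket shared with other records,
    its mean Jaccard similarity to those records reaches min_sim.
    """
    best = [0.0] * len(transactions)
    for members in buckets.values():
        if len(members) < 2:
            continue
        for i in members:
            sims = [jaccard(transactions[i], transactions[j])
                    for j in members if j != i]
            best[i] = max(best[i], sum(sims) / len(sims))
    return {i for i, s in enumerate(best) if s < min_sim}


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"x", "y", "z"}]
buckets = lsh_buckets(transactions)
outliers = per_cluster_outliers(transactions, buckets)
print(sorted(outliers))
```

Each record is compared only with the records in its own buckets, never with the whole sample, which is the point of evaluating outliers per cluster. Using more rows per band makes collisions between dissimilar records rarer, while using more bands gives similar records more chances to meet.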
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Yy., Zeng, R., Li, Mz., Li, F. (2012). Research of LSH and Outliers Detection. In: Liu, C., Wang, L., Yang, A. (eds) Information Computing and Applications. ICICA 2012. Communications in Computer and Information Science, vol 307. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34038-3_11
DOI: https://doi.org/10.1007/978-3-642-34038-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34037-6
Online ISBN: 978-3-642-34038-3
eBook Packages: Computer Science, Computer Science (R0)