Abstract
This paper analyses a method of zonal correlation text analysis that compares the distributions of word counts in the J1 zones of two or more texts. Each compared text is divided into the J0, J1, and J2 zones in accordance with a previously developed interpretation of Bradford’s law. The parameters of the words contained in the J1 zones are then compared, and a distance that indicates the degree of semantic proximity of the compared texts is calculated. The method can be applied to automatic classification and author attribution of texts, and it yields adequate results from a limited number of parameters.
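The pipeline outlined in the abstract (rank words by count, split them into three zones, compare the J1 zones, compute a distance) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure from the paper: the abstract gives neither the zone boundaries nor the distance formula, so the equal-share zoning rule, the Euclidean distance, the mapping of the middle zone to J1, and all function names below are hypothetical.

```python
from collections import Counter
import math
import re


def ranked_counts(text):
    """Return (word, count) pairs sorted by descending count, ties alphabetically."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))


def split_zones(ranked):
    """Split a ranked word list into three zones (assumed order: J0, J1, J2).

    ASSUMPTION: each zone covers roughly one third of all word occurrences.
    The paper's own zoning rule (its interpretation of Bradford's law) is
    not given in the abstract and may differ.
    """
    total = sum(count for _, count in ranked)
    zones = ([], [], [])
    running = 0
    for word, count in ranked:
        # Zone index is chosen from the cumulative share before this word.
        idx = min(2, running * 3 // total)
        zones[idx].append((word, count))
        running += count
    return zones


def j1_profile(text):
    """Map each J1-zone word to its relative frequency in the whole text."""
    ranked = ranked_counts(text)
    total = sum(count for _, count in ranked)
    _, j1, _ = split_zones(ranked)
    return {word: count / total for word, count in j1}


def zonal_distance(text_a, text_b):
    """ASSUMPTION: Euclidean distance between the two J1 profiles."""
    pa, pb = j1_profile(text_a), j1_profile(text_b)
    vocab = set(pa) | set(pb)
    return math.sqrt(sum((pa.get(w, 0.0) - pb.get(w, 0.0)) ** 2 for w in vocab))
```

Under these assumptions, a smaller value of `zonal_distance` would be read as closer semantic proximity; for classification or attribution, a text could be assigned to the reference text or corpus with the smallest distance.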
Additional information
Original Russian Text © V.A. Yatsko, 2014, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 2014, No. 10, pp. 26–30.
About this article
Cite this article
Yatsko, V.A. The method of zonal correlation text analysis. Autom. Doc. Math. Linguist. 48, 259–263 (2014). https://doi.org/10.3103/S0005105514050057
DOI: https://doi.org/10.3103/S0005105514050057