A sentence structure-based approach to unsupervised author identification

Ferilli, Stefano

doi:10.1007/s10844-014-0349-9

Published: 19 December 2014

Volume 46, pages 1–19, (2016)

Journal of Intelligent Information Systems

Abstract

Assessing whether two documents were written by the same author is a crucial task, especially in the Internet age, with possible applications to philology and forensics. The problem has been tackled in the literature using frequency-based approaches, numeric techniques, or writing style analysis. Focusing on the last of these perspectives, this paper proposes a novel technique that takes into account the structure of sentences, on the assumption that it is strictly related to the author’s writing style. Specifically, a text (or collection of texts) in natural language written by a given author is translated into a set of First-Order Logic descriptions, and a model of the author’s writing habits is obtained by clustering these descriptions. Then, if the models of a known author and of an unknown one overlap, the conclusion can be drawn that they are the same person. Among its advantages, this approach needs no training phase and also performs well on short texts and/or small collections.
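
The pipeline outlined in the abstract (describe sentences, cluster the descriptions into a per-author model, test two models for overlap) can be illustrated with a minimal sketch. This is not the paper's implementation, and everything in it is an assumption made for illustration: each sentence description is reduced to a flat set of hand-written atoms such as `subj(s,n1)`, Jaccard similarity stands in for a similarity measure over First-Order Logic descriptions, the clustering is a simple greedy threshold procedure, and the names `jaccard`, `cluster`, `models_overlap` and the threshold value are hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity between two sets of atoms (a stand-in measure,
    not the similarity framework used in the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(descriptions, threshold):
    """Greedy threshold clustering (assumed for illustration): a description
    joins the first cluster that has a member at least `threshold` similar
    to it, otherwise it starts a new cluster."""
    clusters = []
    for d in descriptions:
        for c in clusters:
            if any(jaccard(d, m) >= threshold for m in c):
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters


def models_overlap(model_a, model_b, threshold):
    """Two models overlap if some description in a cluster of one model is
    at least `threshold` similar to a description in a cluster of the other."""
    return any(
        jaccard(x, y) >= threshold
        for ca, cb in product(model_a, model_b)
        for x, y in product(ca, cb)
    )


# Hypothetical sentence descriptions: sets of ground atoms, written by hand.
known = [
    frozenset({"subj(s,n1)", "verb(s,v1)", "obj(s,n2)"}),
    frozenset({"subj(s,n1)", "verb(s,v1)", "obj(s,n2)", "adv(v1,a1)"}),
]
unknown_similar = [
    frozenset({"subj(s,n1)", "verb(s,v1)", "obj(s,n2)", "adj(n2,j1)"}),
]
unknown_different = [
    frozenset({"passive(s)", "agent(s,n1)", "rel_clause(n1,c1)"}),
]

THRESHOLD = 0.7  # arbitrary value chosen for this toy example

known_model = cluster(known, THRESHOLD)
same_author = models_overlap(known_model, cluster(unknown_similar, THRESHOLD), THRESHOLD)
other_author = models_overlap(known_model, cluster(unknown_different, THRESHOLD), THRESHOLD)
```

In this toy setting no labelled training data is used: each model is built only from its own author's descriptions, which mirrors the unsupervised character claimed in the abstract. A real system would obtain the descriptions from a syntactic parser rather than writing them by hand.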

Acknowledgments

The author would like to thank Fabio Leuzzi and Fulvio Rotella for their contribution in implementing the proposed approach, and for the useful hints in setting up the strategy. This work was partially funded by the Italian PON 2007-2013 project PON02_00563_3489339 “Puglia@Service”.

Author information

Authors and Affiliations

Dipartimento di Informatica – Centro Interdipartimentale per la Logica e sue Applicazioni, Università di Bari, Bari, Italy
Stefano Ferilli

Corresponding author

Correspondence to Stefano Ferilli.

About this article

Cite this article

Ferilli, S. A sentence structure-based approach to unsupervised author identification. J Intell Inf Syst 46, 1–19 (2016). https://doi.org/10.1007/s10844-014-0349-9

Received: 27 May 2014
Revised: 26 November 2014
Accepted: 27 November 2014
Published: 19 December 2014
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10844-014-0349-9
