APIReal: an API recognition and linking approach for online developer forums

Ye, Deheng; Bao, Lingfeng; Xing, Zhenchang; Lin, Shang-Wei

doi:10.1007/s10664-018-9608-7

Published: 05 March 2018

Empirical Software Engineering, volume 23, pages 3129–3160 (2018)

Deheng Ye¹,²,
Lingfeng Bao³ (ORCID: orcid.org/0000-0003-1846-0921),
Zhenchang Xing⁴ &
Shang-Wei Lin¹

Abstract

When discussing programming issues on social platforms (e.g., Stack Overflow, Twitter), developers often mention APIs in natural language text. Extracting these API mentions is a prerequisite for effectively indexing and searching API-related information in software engineering social content. The task involves two steps: 1) distinguishing API mentions from other English words (i.e., API recognition), and 2) disambiguating a recognized API mention to its unique fully qualified name (i.e., API linking). Software engineering social content lacks consistent API mentions and sentence writing formats. As a result, API recognition and linking have to deal with the inherent ambiguity of API mentions in informal text. For example, a common word may carry either its API sense or its normal sense (e.g., append, apply and merge), the simple name of an API can map to several APIs of the same library or of different libraries, and different writing forms of an API should be linked to the same API. In this paper, we propose a semi-supervised machine learning approach that exploits name synonyms and the rich semantic context of API mentions for API recognition in informal text. Building on the results of our API recognition approach, we further propose an API linking approach that leverages a set of domain-specific heuristics, including mention-mention similarity, scope filtering, and mention-entry similarity, to determine which API in the knowledge base a recognized API mention actually refers to. To evaluate our API recognition approach, we use 1205 API mentions of three libraries (Pandas, Numpy, and Matplotlib) from Stack Overflow text. We also evaluate our API linking approach with 120 recognized API mentions of these three libraries.
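The linking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a mention has already been recognized, and it replaces the paper's heuristics with simplified stand-ins. The knowledge base, the use of post tags for scope filtering, the token-overlap score, and all function names (`candidates`, `scope_filter`, `link`) are hypothetical choices made for illustration; mention-mention similarity is omitted.

```python
import re

# Hypothetical miniature knowledge base of fully qualified API names
# (illustrative entries only, not the knowledge base used in the paper).
KNOWLEDGE_BASE = [
    "pandas.DataFrame.append",
    "pandas.DataFrame.merge",
    "pandas.DataFrame.apply",
    "pandas.Series.apply",
    "numpy.append",
    "matplotlib.pyplot.plot",
]


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a text or dotted name."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidates(mention):
    """Return knowledge-base entries whose simple name matches the mention.

    Different writing forms such as 'apply', 'apply()' and 'df.apply()'
    all reduce to the simple name 'apply'.
    """
    simple = mention.strip("()").split(".")[-1].lower()
    return [fqn for fqn in KNOWLEDGE_BASE if fqn.split(".")[-1].lower() == simple]


def scope_filter(cands, tags):
    """Keep candidates whose library appears in the post tags (assumed scope signal).

    Falls back to the unfiltered candidates when no library matches.
    """
    kept = [c for c in cands if c.split(".")[0] in tags]
    return kept or cands


def link(mention, sentence, tags):
    """Link a recognized mention to one fully qualified name, or None."""
    cands = scope_filter(candidates(mention), tags)
    if not cands:
        return None
    context = tokens(sentence)
    # Simplified mention-entry similarity: number of tokens shared between
    # the sentence and the candidate's fully qualified name.
    return max(cands, key=lambda fqn: len(tokens(fqn) & context))


if __name__ == "__main__":
    print(link("apply()", "I call apply on each value of my Series", {"pandas"}))
    print(link("append", "use append to add a row", {"numpy"}))
```

In this sketch the simple name `apply` maps to two APIs of the same library, and the surrounding words decide between them; `append` maps to APIs of different libraries, and the assumed tag-based scope decides.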

Notes

https://github.com/google/code-prettify
Scrapy, http://scrapy.org/
CRFSuite, http://www.chokkan.org/software/crfsuite/
Brown Clustering, https://github.com/percyliang/brown-cluster
Word2vec, https://code.google.com/archive/p/word2vec/
Sofia-ML, https://code.google.com/archive/p/sofia-ml/
http://www.signll.org/conll/

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, Singapore
Deheng Ye & Shang-Wei Lin
Tencent AI Lab, Shenzhen, China
Deheng Ye
College of Computer Science, Zhejiang University, Hangzhou, China
Lingfeng Bao
Research School of Computer Science, Australian National University, Canberra, Australia
Zhenchang Xing

Corresponding author

Correspondence to Lingfeng Bao.

Additional information

Communicated by: Denys Poshyvanyk

About this article

Cite this article

Ye, D., Bao, L., Xing, Z. et al. APIReal: an API recognition and linking approach for online developer forums. Empir Software Eng 23, 3129–3160 (2018). https://doi.org/10.1007/s10664-018-9608-7

Issue Date: December 2018