Abstract
Developers commonly use a web search engine such as Google to locate online resources that improve their productivity. A better understanding of what developers search for could help us understand their behaviors and the problems they encounter during the software development process. Unfortunately, we have a limited understanding of what developers frequently search for and of the search tasks that they often find challenging. To address this gap, we collected search queries from 60 developers and surveyed 235 software engineers from more than 21 countries across five continents. In particular, we asked our survey participants to rate the frequency and difficulty of 34 search tasks, which are grouped along the following seven dimensions: general search, debugging and bug fixing, programming, third-party code reuse, tools, database, and testing. We find that searching for explanations of unknown terminology, explanations of exceptions/error messages (e.g., HTTP 404), reusable code snippets, solutions to common programming bugs, and suitable third-party libraries/services are the most frequent search tasks that developers perform, while searching for solutions to performance bugs, solutions to multi-threading bugs, public datasets to test newly developed algorithms or systems, reusable code snippets, best industrial practices, database optimization solutions, solutions to security bugs, and solutions to software configuration bugs are the search tasks that developers consider most difficult. Our study sheds light on why practitioners often perform some of these tasks and why they find some of them challenging. We also discuss the implications of our findings for future research in several areas, e.g., code search engines, domain-specific search engines, and automated generation and refinement of search queries.
Notes
Note that although Google is blocked in China, developers can still access it through a proxy service such as Shadowsocks. See https://shadowsocks.com/ for more details.
CSDN is one of the largest technical blog sites in China; see http://www.csdn.net/ for more details.
Zhihu is one of the largest Q&A sites in China; see http://www.zhihu.com/ for more details.
Cloudera is a software company that provides Apache Hadoop-based software, support and services, and training to business customers (https://www.cloudera.com/).
We identified another 6 search tasks in the open-ended interviews.
Wensong Zhang is the co-founder of the Linux Virtual Server project; see https://en.wikipedia.org/wiki/Linux_Virtual_Server for more details.
An expletive was masked out.
https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers
Acknowledgments
The authors thank all the developers who participated in this study. This research is supported by the NSFC Program (No. 61602403) and the National Key Technology R&D Program of the Ministry of Science and Technology of China under grant 2015BAH17F01.
Additional information
Communicated by: Emerson Murphy-Hill
Cite this article
Xia, X., Bao, L., Lo, D. et al. What do developers search for on the web?. Empir Software Eng 22, 3149–3185 (2017). https://doi.org/10.1007/s10664-017-9514-4
DOI: https://doi.org/10.1007/s10664-017-9514-4