Abstract
Information retrieval systems based on keyword matching are evolving into question answering systems that return short passages or direct answers to questions, rather than URLs pointing to whole pages. Most open-domain question answering systems depend on manually designed hierarchies of question types. A question is first classified into a fixed type, and hand-engineered rules associated with that type then yield keywords and/or predictive annotations that are likely to match indexed answer passages. Here we seek a more data-driven approach, assisted by machine learning. We propose a simple log-linear model over a pair of feature vectors, one derived from the question and the other from a candidate passage. Features are extracted using a lexical network and surface context, as in named entity extraction, except that no direct supervision is available in the form of fixed entity types and their examples. Using the log-linear model, we filter candidate passages and see substantial improvement in the mean rank at which the first answer is found. The model parameters distill and reveal linguistic artifacts coupling questions and their answers, which can be used for better annotation and indexing.
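As an illustration only, the following Python sketch shows one way a log-linear model over a pair of feature vectors could be set up. The abstract does not give the exact parameterization, so everything here is an assumption: the pairwise weight form (one weight per question-feature/passage-feature pair), the sigmoid link, the gradient-ascent training loop with an L2 penalty, and the toy feature names (`who`, `how_many`, `person`, `number`) are all hypothetical and are not taken from the paper.

```python
import math

def score(weights, q_feats, a_feats):
    """Probability that a passage answers the question, under an assumed
    log-linear model with one weight per (question feature, passage feature) pair."""
    z = sum(weights.get((q, a), 0.0) for q in q_feats for a in a_feats)
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Fit pair weights by stochastic gradient ascent on the L2-penalized
    log-likelihood. Each example is (question features, passage features, label)."""
    weights = {}
    for _ in range(epochs):
        for q_feats, a_feats, label in examples:
            err = label - score(weights, q_feats, a_feats)
            for q in q_feats:
                for a in a_feats:
                    w = weights.get((q, a), 0.0)
                    weights[(q, a)] = w + lr * (err - l2 * w)
    return weights

def rank_passages(weights, q_feats, passages):
    """Order candidate passages (given as feature sets) by decreasing model score."""
    return sorted(passages, key=lambda a: score(weights, q_feats, a), reverse=True)

# Toy, made-up training data: label 1 means the passage contains an answer.
examples = [
    ({"who"}, {"person"}, 1),
    ({"who"}, {"number"}, 0),
    ({"how_many"}, {"number"}, 1),
    ({"how_many"}, {"person"}, 0),
]
weights = train(examples)
ranked = rank_passages(weights, {"who"}, [{"number"}, {"person"}])
```

In this sketch, a large positive weight on a pair such as `("how_many", "number")` is the kind of learned coupling between question and answer features that the abstract says the model parameters reveal, and `rank_passages` stands in for filtering candidate passages by model score.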
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chakrabarti, S. (2006). Discovering Links Between Lexical and Surface Features in Questions and Answers. In: Mobasher, B., Nasraoui, O., Liu, B., Masand, B. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2004. Lecture Notes in Computer Science, vol 3932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11899402_8
DOI: https://doi.org/10.1007/11899402_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47127-1
Online ISBN: 978-3-540-47128-8
eBook Packages: Computer Science, Computer Science (R0)