
CatS: A Classification-Powered Meta-Search Engine

  • Chapter
Advances in Web Intelligence and Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 23)

Summary

CatS is a meta-search engine that uses text classification techniques to improve the presentation of search results. After submitting a query, the user can refine the results by browsing a category tree derived from the dmoz Open Directory topic hierarchy. This paper describes key aspects of the system (HTML parsing, classification, and display of results), outlines the text categorization experiments performed to choose the classification parameters, and places the system in the context of related work on (meta-)search engines. The separate category tree extends the standard relevance list and lets the user refine the search on demand, offering an unobtrusive but potentially powerful tool for locating needed information quickly and efficiently. The current implementation of CatS may be considered a baseline on which many enhancements are possible.
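As a rough illustration of the idea in the summary, the sketch below groups search results into a small category tree. It is not the CatS implementation: the category paths, the keyword lists, and the function names `classify` and `build_category_tree` are all hypothetical, and a simple keyword-overlap rule stands in for the trained text classifier that the chapter describes.

```python
# Hypothetical keyword lists standing in for a trained text classifier.
# The category paths only mimic the style of a topic hierarchy.
CATEGORY_KEYWORDS = {
    "Computers/Programming": {"python", "code", "compiler", "software"},
    "Science/Biology": {"snake", "species", "reptile", "habitat"},
}


def classify(text):
    """Return the category whose keyword set overlaps most with the text."""
    words = set(text.lower().split())
    best, best_score = "Unclassified", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:  # ties keep the earlier category
            best, best_score = category, score
    return best


def build_category_tree(results):
    """Group (title, snippet) pairs into a nested dict keyed by category path."""
    tree = {}
    for title, snippet in results:
        node = tree
        # Walk down the path, creating one dict level per path component.
        for part in classify(f"{title} {snippet}").split("/"):
            node = node.setdefault(part, {})
        # Titles assigned to a category are stored under a reserved key.
        node.setdefault("_results", []).append(title)
    return tree


results = [
    ("Python tutorial", "learn to write python code"),
    ("Python (snake)", "a large reptile species"),
    ("Weather today", "forecast for the weekend"),
]
tree = build_category_tree(results)
```

Here the flat result list stays available as-is, and the tree is an additional view the user may browse or ignore, which mirrors the "extension of the standard relevance list" described in the summary.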




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Radovanović, M., Ivanović, M. (2006). CatS: A Classification-Powered Meta-Search Engine. In: Last, M., Szczepaniak, P.S., Volkovich, Z., Kandel, A. (eds) Advances in Web Intelligence and Data Mining. Studies in Computational Intelligence, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33880-2_20


  • DOI: https://doi.org/10.1007/3-540-33880-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33879-6

  • Online ISBN: 978-3-540-33880-2

  • eBook Packages: Engineering, Engineering (R0)
