CatS: A Classification-Powered Meta-Search Engine

Radovanović, Miloš; Ivanović, Mirjana

doi:10.1007/3-540-33880-2_20


Part of the book series: Studies in Computational Intelligence (SCI, volume 23)


Summary

CatS is a meta-search engine that uses text classification techniques to improve the presentation of search results. After posting a query, the user can refine the results by browsing a category tree derived from the dmoz Open Directory topic hierarchy. This paper describes key aspects of the system (HTML parsing, classification, and display of results), outlines the text categorization experiments performed to choose the classification parameters, and places the system in the context of related work on (meta-)search engines. The separate category tree extends the standard relevance list and lets the user refine the search on demand, offering a non-imposing but potentially powerful tool for locating needed information quickly and efficiently. The current implementation of CatS may be considered a baseline on top of which many enhancements are possible.
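The idea of sorting a relevance list into topic categories can be illustrated with a minimal sketch. The summary does not say which classifier or document representation CatS uses, so everything below is an assumption made for illustration: a small multinomial naive Bayes classifier with add-one smoothing, three invented top-level topics, and made-up training snippets and search results. It is not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphabetic tokens (no stemming or stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an illustrative choice)."""

    def fit(self, labelled_texts):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of training texts
        for text, category in labelled_texts:
            self.word_counts[category].update(tokenize(text))
            self.doc_counts[category] += 1
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category in sorted(self.doc_counts):
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[category] / total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


def group_by_category(results, classifier):
    """Return {category: [titles]}, keeping the original relevance order per category."""
    tree = defaultdict(list)
    for title, snippet in results:
        tree[classifier.predict(title + " " + snippet)].append(title)
    return dict(tree)


# Hypothetical training data: short descriptions labelled with invented topics.
TRAINING = [
    ("compiler programming language software source code", "Computers"),
    ("operating system kernel linux software", "Computers"),
    ("football league match score team", "Sports"),
    ("tennis tournament player ranking match", "Sports"),
    ("physics quantum particle experiment", "Science"),
    ("biology cell genome experiment research", "Science"),
]

# Hypothetical merged result list for the query "python".
RESULTS = [
    ("Python tutorial", "Learn the programming language and write software"),
    ("Monty Python football sketch", "The philosophers football match with two team lineups"),
    ("Python genome project", "Research on the genome and cell biology of snakes"),
]

clf = NaiveBayes().fit(TRAINING)
tree = group_by_category(RESULTS, clf)
for category, titles in tree.items():
    print(category, titles)
```

In this sketch the flat result list stays available in its original order, and the grouped view is an additional structure the user may consult or ignore, which mirrors the non-imposing role the summary assigns to the category tree.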



Author information

Authors and Affiliations

Faculty of Science, Department of Mathematics and Informatics, University of Novi Sad, Trg D. Obradovića 4, 21000, Novi Sad, Serbia and Montenegro
Miloš Radovanović & Mirjana Ivanović


Editor information

Editors and Affiliations

Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Mark Last
Institute of Computer Sciences, Technical University of Lodz, ul. Wolczanska 215, 93-1005, Lodz, Poland
Piotr S. Szczepaniak
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Piotr S. Szczepaniak
Department of Software Engineering, ORT Braude College, POB. 78, 21982, Karmiel, Israel
Zeev Volkovich
Department of Computer Science and Engineering, University of South Florida, 4202 E. Fowler Ave., ENB 118, Tampa, FL, 33620, USA
Abraham Kandel


About this chapter

Cite this chapter

Radovanović, M., Ivanović, M. (2006). CatS: A Classification-Powered Meta-Search Engine. In: Last, M., Szczepaniak, P.S., Volkovich, Z., Kandel, A. (eds) Advances in Web Intelligence and Data Mining. Studies in Computational Intelligence, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33880-2_20


DOI: https://doi.org/10.1007/3-540-33880-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33879-6
Online ISBN: 978-3-540-33880-2
eBook Packages: Engineering, Engineering (R0)
