
Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 285))

Abstract

Existing clustering techniques have several drawbacks, including a tendency to become trapped in local optima. In this paper, we introduce the use of a metaheuristic, the Firefly algorithm (FA), to increase solution diversity. FA is a nature-inspired algorithm that has been applied to many optimization problems. We apply FA to document clustering and evaluate it on the Reuters-21578 collection. The algorithm identifies the document that has the highest light intensity in the search space and takes it as a centroid. It then recognizes similar documents using the cosine similarity function: documents that are similar to the centroid are placed in one cluster, and dissimilar documents in another. Experiments on the chosen dataset produce high values of Purity and F-measure, suggesting that the proposed Firefly algorithm is a viable approach to document clustering.
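The centroid-selection and assignment step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define light intensity, so the sketch assumes it is the sum of a document's term weights, and the function names, the similarity threshold of 0.5, and the toy vectors are all hypothetical. A Purity helper (fraction of documents that fall in their cluster's majority class) is included because the abstract reports that metric.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def brightest_centroid_split(docs, threshold=0.5):
    """Pick the brightest document as centroid, then split by similarity.

    ASSUMPTION: light intensity is the total term weight of a document;
    the abstract does not give the actual definition. The threshold is
    also a hypothetical parameter.
    """
    intensity = [sum(d) for d in docs]
    centroid = max(range(len(docs)), key=intensity.__getitem__)
    similar, dissimilar = [], []
    for i, d in enumerate(docs):
        if cosine(docs[centroid], d) >= threshold:
            similar.append(i)
        else:
            dissimilar.append(i)
    return centroid, similar, dissimilar


def purity(clusters, labels):
    """Fraction of documents that belong to their cluster's majority class."""
    total = sum(len(c) for c in clusters)
    majority = 0
    for cluster in clusters:
        counts = {}
        for i in cluster:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        if counts:
            majority += max(counts.values())
    return majority / total if total else 0.0


# Toy term-weight vectors (made up for illustration, not Reuters-21578 data).
docs = [[3, 2, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 2]]
labels = ["grain", "grain", "trade", "trade"]
centroid, similar, dissimilar = brightest_centroid_split(docs)
print(centroid, similar, dissimilar, purity([similar, dissimilar], labels))
```

In practice the vectors would be term weights computed over the corpus (for example TF-IDF), and the split would presumably be repeated on the remaining documents to obtain more than two clusters; both of these are assumptions rather than details given in the abstract.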



Corresponding author

Correspondence to Athraa Jasim Mohammed.

Copyright information

© 2014 Springer Science+Business Media Singapore

About this paper

Cite this paper

Mohammed, A.J., Yusof, Y., Husni, H. (2014). Weight-Based Firefly Algorithm for Document Clustering. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_30

  • DOI: https://doi.org/10.1007/978-981-4585-18-7_30

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-4585-17-0

  • Online ISBN: 978-981-4585-18-7

  • eBook Packages: Engineering, Engineering (R0)
