Stratified linear systematic sampling based clustering approach for detection of financial risk group by mining of big data

  • Original article
International Journal of System Assurance Engineering and Management

Abstract

Risk analysis is an essential business activity because it uncovers previously unknown risks, such as financial, recovery, investment, operational, credit, and debit risk. Clustering is a data mining technique that uses the behavior and nature of the data to discover unexpected risks in business data. In a big data setting, however, clustering algorithms face challenges in execution time and cluster quality that stem from the primary attribute of big data. This study proposes a Stratified Systematic Sampling Extension (SSE) approach for clustering-based risk analysis in big data mining on a single machine. Sampling is a data reduction technique that saves computation time and improves the cluster quality, scalability, and speed of the clustering algorithm. The proposed sampling plan first forms the strata by selecting the minimum-variance dimension and then draws samples from each stratum using random linear systematic sampling. The clustering algorithm builds robust risk and non-risk clusters from the sample data and then extends the sample-based clustering to the final clustering result using Euclidean distance. The SSE-based clustering algorithm is compared with the existing K-means and K-means++ algorithms on financial risk datasets, using the Davies-Bouldin score, Silhouette coefficient, Scattering Density between clusters Validity, Scattering Distance Validity, and CPU time as validation metrics. The experimental results show that the SSE-based clustering algorithm better meets the clustering objectives of cluster compaction, separation, density, and variance while reducing the number of iterations, distance computations, data comparisons, and computational time. A Friedman test indicates that the results obtained with the proposed sampling plan are statistically significant.
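The pipeline described in the abstract (stratify, sample systematically, cluster the sample, extend by Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the "minimum variance dimension" is the feature with the smallest variance, that strata are equal-size slices of the data sorted on that feature, and that a plain Lloyd-style K-means runs on the sample. All function names and parameters (`stratified_systematic_sample`, `n_strata`, `step`, and so on) are hypothetical, and the two-blob data is synthetic.

```python
import math
import random
from statistics import pvariance


def stratified_systematic_sample(data, n_strata, step, rng):
    """Return indices sampled by linear systematic sampling within each stratum."""
    dims = range(len(data[0]))
    # Assumption: stratify on the feature with the smallest variance.
    key = min(dims, key=lambda d: pvariance([row[d] for row in data]))
    order = sorted(range(len(data)), key=lambda i: data[i][key])
    size = math.ceil(len(order) / n_strata)
    sample = []
    for s in range(0, len(order), size):
        stratum = order[s:s + size]
        # Random start, then every `step`-th unit of the stratum.
        start = rng.randrange(min(step, len(stratum)))
        sample.extend(stratum[start::step])
    return sample


def nearest(point, centroids):
    """Index of the centroid closest to `point` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd-style K-means; stands in for whichever variant the paper uses."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def sse_style_clustering(data, k=2, n_strata=4, step=5, seed=0):
    """Cluster a sample, then label every point by its nearest sample centroid."""
    rng = random.Random(seed)
    idx = stratified_systematic_sample(data, n_strata, step, rng)
    centroids = kmeans([data[i] for i in idx], k, rng)
    labels = [nearest(p, centroids) for p in data]
    return labels, centroids, idx


# Synthetic stand-in for "non-risk" and "risk" records (not real financial data).
gen = random.Random(1)
low = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
high = [(gen.gauss(8, 1), gen.gauss(8, 1)) for _ in range(200)]
data = low + high
labels, centroids, idx = sse_style_clustering(data)
```

With these settings K-means sees only a fifth of the records, and every remaining record costs one pass of `k` distance computations in the extension step. That is the kind of saving in iterations and distance computations the abstract refers to.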

Funding

This study received no external funding.

Author information

Corresponding author

Correspondence to Kamlesh Kumar Pandey.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pandey, K.K., Shukla, D. Stratified linear systematic sampling based clustering approach for detection of financial risk group by mining of big data. Int J Syst Assur Eng Manag 13, 1239–1253 (2022). https://doi.org/10.1007/s13198-021-01424-0

  • DOI: https://doi.org/10.1007/s13198-021-01424-0
