A novel prediction model for educational planning of human resources with data mining approach: a national tax administration case study

Arfaee, Mohammad; Bahari, Arman; Khalilzadeh, Mohammad

doi:10.1007/s10639-021-10699-6


Published: 13 August 2021

Volume 27, pages 2209–2239, (2022)

Education and Information Technologies


Abstract

Training is considered an effective way to empower human resources. Organizations try to plan training for this valuable resource effectively by identifying shortcomings through a needs assessment. This study provides a model based on organizational data analysis to produce an appropriate, individual training plan for each staff member. To this end, job performance, organizational promotion, and lay-off were taken as the basis for staff training planning. The tax assessors' information was investigated, and the project was implemented following the CRISP-DM methodology. A decision tree model was selected to extract unknown rules and patterns for educational decision-making about staff, and a neural network model was selected as the predictive model for the target variables. The results showed that the decision tree was more effective in predicting the job performance and organizational promotion status variables, while the neural network model was more effective in predicting the service lay-off variable.

Notes

Using MATLAB software.
This variable has two classes which indicate whether it is upgraded (1) or not (2).
The ratio of the confidence index of each rule to its initial probability.
Recall (sensitivity) is the percentage of actual positive cases that the model correctly identifies (Lui et al., 2021).
Precision (positive predictive value) is the percentage of the model's positive predictions that are actually relevant (Lui et al., 2021).
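
The two measures in the notes above can be illustrated with a minimal sketch. The function name and the toy labels are hypothetical and are not taken from the study; the class coding (1 = upgraded, 2 = not upgraded) follows the note on the promotion variable.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) with one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: share of positive predictions that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = upgraded, 2 = not upgraded
y_true = [1, 1, 1, 2, 2, 2, 2, 1]
y_pred = [1, 1, 2, 2, 1, 1, 2, 1]
precision, recall = precision_recall(y_true, y_pred, positive=1)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here three of the five positive predictions are correct, and three of the four actual positives are found, so the two measures differ even on the same predictions.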

References

Abri Aghdam, K., Aghajani, A., Kanani, F., Sanjari, M. S., Chaibakhsh, S., Shirvaniyan, F., Moosavi, D., & Moghaddasi, M. (2021). A novel decision tree approach to predict the probability of conversion to multiple sclerosis in Iranian patients with optic neuritis. Multiple Sclerosis and Related Disorders, 47, 102658. https://doi.org/10.1016/j.msard.2020.102658
Abtahi, S. H. (2004). Training and upgrading human capital. Poyandeh Publications. (In Persian).
Akhavan, M., & Kazemi Gorji, A. (2019). The impact of training on productivity and human resources to investigate the role of intermediary organizational agility and intellectual capital (the case of the eighth base Babai martyr of prey). Journal of Training in Police Sciences, 26(26), 25–54. (In Persian).
Alipour, K., Prdsry, I. G., & Zolfaghari Zafarani, R. (2019). Provide a model to improve the efficiency of human resource training in Islamic Azad University. Journal of New Approaches in Educational Administration, 10(38), 179–208. (In Persian).
Aparnak, A., & Ghasemi, P. (2016). Measuring the performance of bank employees with a multi-criteria decision approach. In: 2nd International Conference on Modern Research in Management and Industrial Engineering.
Ashraf, M., Zaman, M., & Ahmed, M. (2020). An intelligent prediction system for educational data mining based on ensemble and filtering approaches. Procedia Computer Science, 167, 1471–1483. https://doi.org/10.1016/j.procs.2020.03.358
Asif, R., Merceron, A., Ali, S. A., & Ghani Haider, N. (2017). Analyzing undergraduate students’ performance using educational data mining. Computers & Education, 113, 177–194. https://doi.org/10.1016/j.compedu.2017.05.007
Burgos, C., Campanario, M. L., Peña, D. D., Lara, J. A., Lizcano, D., & Martínez, M. (2018). Data mining for modeling students’ performance: A tutoring action plan to prevent academic dropout. Computers & Electrical Engineering, 66, 541–556. https://doi.org/10.1016/j.compeleceng.2017.03.005
Carnevale, J. B., & Hatak, I. (2020). Employee Adjustment and well-being in the Era of COVID-19: Implications for human resource management. Journal of Business Research, 116, 183–187. https://doi.org/10.1016/j.jbusres.2020.05.037
Costa, E. B., Fonseca, B., Almeida Santana, M., de Araújo, F. F., & Rego, J. (2017). Evaluating the effectiveness of educational data mining techniques for early prediction of students’ academic failure in introductory programming courses. Computers in Human Behavior, 73, 247–256. https://doi.org/10.1016/j.chb.2017.01.047
Chung, J., Ko, N., Kim, H., & Yoon, J. (2021). Inventor profile mining approach for prospective human resource scouting. Journal of Informetrics, 15(1), 101103. https://doi.org/10.1016/j.joi.2020.101103
Entezari, M. S. (2015). The role of education on labor productivity and quality management in education and business excellence model. In: 2nd International Conference on New Research in Management, Economics and Accounting. https://civilica.com/doc/439929. (In Persian).
Hand, D. J., Christen, P., & Kirielle, N. F. (2021). An interpretable transformation of the F-measure. Machine Learning, 110, 451–456. https://doi.org/10.1007/s10994-021-05964-1
Hatami, J. (2016). The challenge of teaching humanities in Iranian universities: A qualitative study. Journal of Research in Education Systems, 10(32), 234–273. (In Persian).
Helal, S., Li, J., Liu, L., Ebrahimie, E., Dawson, S., Murray, D. J., & Long, Q. (2018). Predicting academic performance by considering student heterogeneity. Knowledge-Based Systems, 161, 134–146. https://doi.org/10.1016/j.knosys.2018.07.042
Huber, S., Wiemer, H., Schneider, D., & Ihlenfeldt, S. (2019). DMME: Data mining methodology for engineering applications – a holistic extension to the CRISP-DM model. Procedia CIRP, 79, 403–408. https://doi.org/10.1016/j.procir.2019.02.106
Imani, F., Aghabakhshi, H., & Ghaedi Mohammadi, M. J. (2013). The effect of short-term in-service training courses on the performance of municipal employees in Tehran’s District 7 in 1989. Case study. Social Research, 5(17), 29–46. (In Persian).
Liu, P., Qingqing, W., & Liu, W. (2021). Enterprise human resource management platform based on FPGA and data mining. Microprocessors and Microsystems, 80, 103330. https://doi.org/10.1016/j.micpro.2020.103330
Liu, S., Jiang, H., Wu, Z., & Li, X. (2022). Data synthesis using deep feature enhanced generative adversarial networks for rolling bearing imbalanced fault diagnosis. Mechanical Systems and Signal Processing, 163, 108139. https://doi.org/10.1016/j.ymssp.2021.108139
Mills, K. E., Weary, D. M., & von Keyserlingk, M. A. G. (2021). Graduate student literature review: Challenges and opportunities for human resource management on dairy farms. Journal of Dairy Science, 104(1), 1192–1202. https://doi.org/10.3168/jds.2020-18455
Molaei, N., Goldar, Z., & Emdadifar, O. (2010). Investigating the relationship between in-service training and dimensions of human resource empowerment in staff and operations managers of Shazand Arak oil refinery. Human Resource Management in the Oil Industry, 1(4), 101–126.
Rahmani, K., & Daryadel, A. (2017). Modeling the qualification of human resources in organizations with the approach of neural networks. In: The Second International Conference on Management and Accounting, Tehran. https://civilica.com/doc/642989. (In Persian).
Tomasevic, N., Gvozdenovic, N., & Vranes, S. (2020). An overview and comparison of supervised data mining techniques for student exam performance prediction. Computers & Education, 143, 103676. https://doi.org/10.1016/j.compedu.2019.103676
Zhou, H. F., Zhang, J. W., Zhou, Y. Q., Guo, X. J., & Ma, Y. (2021). A feature selection algorithm of decision tree based on feature weight. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2020.113842

Acknowledgements

The authors acknowledge the support of the Iranian National Tax Administration.

Author information

Authors and Affiliations

Department of Industrial Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Mohammad Arfaee
Department of Industrial Engineering, Faculty of Industry and Mining, University of Sistan and Baluchestan, Zahedan, Iran
Arman Bahari
CENTRUM Católica Graduate Business School, Pontificia Universidad Católica del Perú, Lima, Peru
Mohammad Khalilzadeh


Contributions

MA wrote the literature review and gathered and analyzed the data. AB translated the manuscript. MK edited the manuscript.

Corresponding author

Correspondence to Mohammad Khalilzadeh.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Appendix 1 Data preparation

In the data processing and preparation stage, data cleansing operations were performed in accordance with the data mining standard to improve the quality of the dataset. At this stage, Excel and Modelling software were used to check the data for logical consistency, outliers, and missing data, and the following actions were taken.

1. The personnel ID number field was excluded from the project variables because no computational operations are performed on it. This field served only to link the datasets and has no role in data modeling as an input variable.
2. Because the personnel number field consists of digits, the data mining software recognized it as numeric. The storage format of this field was therefore changed from numeric to string, and the decimal digits introduced when the data were entered into the software were removed.
3. The personal and job information received from the human resources department came in multiple databases, and the educational information databases were transactional, so they had to be aggregated and integrated with each other during data preparation. Sequential unification was therefore performed for the training dataset, with the personnel number field as the key field.
4. The personal, professional, and training datasets were aggregated in the first step. The resulting set was a record dataset with 473 records and 120 initial fields.
5. The records with the job title "Tax assessor", the target population of this study, were separated from the other records, which reduced the integrated dataset to 370 records.
6. The functional dataset for the tax assessors, including the annual performance evaluation score and the number of suggestions submitted to the suggestion system from 90 to 95, was added to the software and integrated with the integrated dataset of the first stage.
7. To predict the amount of human resource promotion, a field called promotion score was prepared from the information in the human resources dataset. Each person's promotion score was determined from the number of promotions received during 2012–2017. The maximum number of promotions for tax staff is 4 degrees, from audit assistant to senior auditor.
8. The tax assessment test datasets for the tax assessors of the General Administration of Large Taxpayers were entered into the software and linked to the dataset accumulated in the previous steps, using the personnel ID number as the key field.
9. After all datasets had been entered and aggregated, the number of fields for the 370 assessor records increased to 143.
10. The first step of reducing variables (dimensions) was to remove 107 unrelated and unnecessary fields based on the opinions of the organization's experts, leaving 36 main fields. According to the experts, the deleted fields were unnecessary for solving the problems raised in this research.
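
The integration steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation (the study used Excel and dedicated software): the function names, the field names (`personnel_id`, `job_title`, `age`, `course_count`), and the toy records are all hypothetical.

```python
def normalize_id(raw):
    """Return the personnel ID as a string key, dropping any decimal part added on import."""
    return str(int(float(raw)))


def aggregate_training(transactions):
    """Collapse transactional training rows into one summary row per person."""
    summary = {}
    for row in transactions:
        entry = summary.setdefault(normalize_id(row["personnel_id"]), {"course_count": 0})
        entry["course_count"] += 1
    return summary


def build_dataset(personnel, training_summary, job_title, keep_fields):
    """Join on personnel ID, keep one job title, and retain only the selected fields."""
    records = []
    for person in personnel:
        if person["job_title"] != job_title:
            continue
        key = normalize_id(person["personnel_id"])
        merged = {**person, **training_summary.get(key, {"course_count": 0})}
        # The ID is used only for linking, so it is not carried into the model inputs.
        records.append({field: merged[field] for field in keep_fields})
    return records


# Hypothetical toy data: IDs arrive as decimals in one source and as strings in another.
personnel = [
    {"personnel_id": 1001.0, "job_title": "Tax assessor", "age": "41"},
    {"personnel_id": 1002.0, "job_title": "Clerk", "age": "35"},
    {"personnel_id": 1003.0, "job_title": "Tax assessor", "age": "29"},
]
training = [
    {"personnel_id": "1001", "course": "A"},
    {"personnel_id": "1001", "course": "B"},
    {"personnel_id": "1002", "course": "A"},
]
dataset = build_dataset(
    personnel,
    aggregate_training(training),
    job_title="Tax assessor",
    keep_fields=["job_title", "age", "course_count"],
)
print(dataset)
```

Normalizing the key before joining is what makes the decimal and string forms of the same personnel number match, and listing the retained fields explicitly mirrors the expert-driven reduction of the field set.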

After the data aggregation and integration operations, the following steps were taken to improve data quality:

1. The format of the age variable (field) was converted from a string to a quantity (integer) so that the necessary calculations could be performed.
2. To improve the quality of modeling, two indicators were constructed: "average performance appraisal score during the period under review" and "suggestion system score during the period under review".
3. The average performance appraisal and suggestion system scores, computed from the scores of these two variables during 2012–2017, were introduced as two new variables that replaced the previous performance appraisal and suggestion system variables.
4. The course date variable was converted into the minimum and maximum dates of participation in the training courses. Two important educational indicators were then created: the interval in years between each person's first and last training course, and the time elapsed since the last training course.
5. The training course date variable was removed from the list of primary variables after it had been converted into these two key indicators.
6. The diploma and associate's degree levels together comprised 19 people. These two levels were combined into one degree, "Associate", to improve the quality of this variable.
7. Given the variety of values of the field of study variable and the frequency of the different fields, the values were classified into economics, accounting, management, and other.
8. Because only one record had the job title "audit assistant", this record was combined with the job title "tax auditor" to improve the quality of modeling. The position of representative of the organization in the tax dispute resolution board was left unmerged because of its importance, despite its small size (Appendix 1: Table 9).
9. The human resources dataset contained 64 records with less than 1 year of service; this value was changed to one year during data preparation.
10. The format of the record of service field was converted from a string to an integer, and coding and conditioning were applied to replace values of less than one year with one year (high threshold).
11. Because the variables "years" and "age" did not have a normal distribution, which could affect the data mining algorithms in constructing the model, they were converted from continuous to ordered-class format based on Appendix 1: Table 10.
12. For outliers, the intervals considered were 3σ to 5σ for variables with a normal distribution and an interquartile range multiple of 2–3 for variables with a non-normal distribution. This evaluation detected two outlier values, which were replaced by the threshold values.
13. After the outliers, the missing data were reviewed. Because an office automation system has been used for human resources over the last 5 years, human resources information has been stored in such a way that little missing data was observed. In the information system of the education department, however, the information from previous years could not be retrieved because the educational information system was launched only in the last year. For example, 32% of the records of variables such as "training course scores" were empty in the staff training file database and could not be retrieved or counted. Because the range of the scores was unknown (for example, the maximum score and the score of each lesson were not clear), no clear conclusion could be drawn from them. Thus, this variable, as well as other useful variables that could have served as criteria for evaluating courses, was omitted.
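
Several of the data quality steps above (deriving the two training-date indicators, flooring the service record at one year, converting a continuous variable to ordered classes, and replacing outliers with threshold values) can be sketched as follows. This is a hedged illustration, not the procedure the authors ran in their software: the function names, the class boundaries, and the sample values are hypothetical, and the actual levels are those of Table 10, which is not reproduced here.

```python
from bisect import bisect_right
from datetime import date
from statistics import mean, pstdev


def training_intervals(course_dates, today):
    """Return (years between first and last course, years since the last course)."""
    first, last = min(course_dates), max(course_dates)
    return last.year - first.year, today.year - last.year


def floor_service_years(raw):
    """Convert a string service record to an integer, treating under one year as one year."""
    return max(1, int(float(raw)))


def to_ordinal(value, edges, labels):
    """Map a continuous value to an ordered class; needs len(labels) == len(edges) + 1."""
    return labels[bisect_right(edges, value)]


def cap_outliers(values, k=3.0):
    """Replace values beyond mean +/- k standard deviations with the threshold value."""
    mu, sigma = mean(values), pstdev(values)
    low, high = mu - k * sigma, mu + k * sigma
    return [min(max(v, low), high) for v in values]


# Hypothetical usage
span, since_last = training_intervals(
    [date(2013, 5, 1), date(2016, 3, 1), date(2014, 1, 1)], today=date(2017, 6, 1)
)
service = [floor_service_years(s) for s in ["0.5", "7", "12"]]
service_class = [to_ordinal(s, [5, 10, 20], ["<5", "5-9", "10-19", "20+"]) for s in service]
capped = cap_outliers([10] * 19 + [1000])
print(span, since_last, service, service_class, round(capped[-1], 1))
```

The 3σ rule shown here corresponds to the treatment described for normally distributed variables; a quartile-based rule would be the analogous choice for the non-normal ones.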

Table 9 "Organizational post title" Variable in the project data set


Table 10 Leveling of “organizational service record” variable


Appendix 2

See Table 11.

Table 11 Proposed algorithm (C5.0) for target variables


Appendix 3

See Table 12.

Table 12 Performance of target variables under default and advanced settings mode


Appendix 4

See Fig. 1.

Appendix 5

See Table 13.

Table 13 Ranking of influential variables in the formation of the selected predictor tree


Appendix 6

See Table 14.

Table 14 Extracted important rules for significant classes of target variables



About this article

Cite this article

Arfaee, M., Bahari, A. & Khalilzadeh, M. A novel prediction model for educational planning of human resources with data mining approach: a national tax administration case study. Educ Inf Technol 27, 2209–2239 (2022). https://doi.org/10.1007/s10639-021-10699-6


Received: 01 May 2021
Accepted: 30 July 2021
Published: 13 August 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10639-021-10699-6
