Interrater reliability and agreement of performance ratings: A methodological comparison

Journal of Business and Psychology

Abstract

This paper demonstrates and compares methods for estimating the interrater reliability and interrater agreement of performance ratings. These methods can be used by applied researchers to investigate the quality of ratings gathered, for example, as criteria for a validity study, or as performance measures for selection or promotional purposes. While estimates of interrater reliability are frequently used for these purposes, indices of interrater agreement appear to be rarely reported for performance ratings. A recommended index of interrater agreement, the T index (Tinsley & Weiss, 1975), is compared to four methods of estimating interrater reliability (Pearson r, coefficient alpha, mean correlation between raters, and intraclass correlation). Subordinate and superior ratings of the performance of 100 managers were used in these analyses. The results indicated that, in general, interrater agreement and reliability among subordinates were fairly high. Interrater agreement between subordinates and superiors was moderately high; however, interrater reliability between these two rating sources was very low. The results demonstrate that interrater agreement and reliability are distinct indices and that both should be reported. Reasons are discussed as to why interrater reliability should not be reported alone.
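
As a rough illustration of how the four reliability indices compared in the paper can be computed, the sketch below estimates a pairwise Pearson r, the mean correlation between raters, coefficient alpha (raters treated as items), and the intraclass correlation ICC(2,1) of Shrout and Fleiss (1979) from a ratees-by-raters matrix. This is not the authors' code; the data, function name, and choice of ICC variant are illustrative assumptions, and the T index of interrater agreement is not shown.

```python
import numpy as np

def reliability_estimates(ratings: np.ndarray) -> dict:
    """Four interrater reliability estimates for an (n_ratees, n_raters) matrix."""
    n, k = ratings.shape

    # Pearson r between a single pair of raters (here, the first two columns).
    pearson_r = np.corrcoef(ratings[:, 0], ratings[:, 1])[0, 1]

    # Mean correlation over all rater pairs.
    rater_corrs = np.corrcoef(ratings, rowvar=False)
    mean_r = rater_corrs[np.triu_indices(k, 1)].mean()

    # Coefficient alpha, treating the k raters as "items" of a k-item scale.
    item_var = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1.0 - item_var / total_var)

    # Intraclass correlation ICC(2,1): two-way random effects, single rater,
    # absolute agreement (Shrout & Fleiss, 1979).
    ms_rows = k * ratings.mean(axis=1).var(ddof=1)   # between-ratee mean square
    ms_cols = n * ratings.mean(axis=0).var(ddof=1)   # between-rater mean square
    ss_total = ((ratings - ratings.mean()) ** 2).sum()
    ms_error = (ss_total - ms_rows * (n - 1) - ms_cols * (k - 1)) / ((n - 1) * (k - 1))
    icc_21 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

    return {"pearson_r": pearson_r, "mean_r": mean_r, "alpha": alpha, "icc_2_1": icc_21}

# Illustrative data (not the study's data): 100 ratees, 3 raters, 5-point scale.
rng = np.random.default_rng(0)
true_perf = rng.normal(3.0, 1.0, size=(100, 1))
ratings = np.clip(np.rint(true_perf + rng.normal(0.0, 0.7, size=(100, 3))), 1, 5)
print(reliability_estimates(ratings))
```

With raters who differ only by random error, the four estimates come out similar; they diverge when raters differ systematically in mean level (e.g., leniency), which is one reason correlational reliability indices and agreement indices can tell different stories about the same ratings.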

References

  • Berry, K., & Mielke, P. (1988). A generalization of Cohen's kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement, 48, 921–933.

  • Berry, K., & Mielke, P. (1990). A generalized agreement measure. Educational and Psychological Measurement, 50, 123–125.

  • Campion, M., & Pursell, E. (1981). Plymouth Fiber Extraboard Validation Report. New Bern, NC: Weyerhaeuser Co.

  • Campion, M., Pursell, E., & Brown, B. (1988). Structured interviewing: Raising the psychometric properties of the employment interview. Personnel Psychology, 41, 25–42.

  • Cronbach, L., Gleser, G., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: Wiley.

  • Ghiselli, E., Campbell, J., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. San Francisco: W. H. Freeman.

  • Guilford, J., & Fruchter, B. (1978). Fundamental statistics in psychology and education. New York: McGraw-Hill.

  • Hays, W. (1988). Statistics. Fort Worth, TX: Holt, Rinehart and Winston.

  • James, L., Demaree, R., & Wolf, G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology, 69, 322–327.

  • Kozlowski, S., & Hattrup, K. (1992). A disagreement about within-group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology, 77, 161–167.

  • Lawlis, F., & Lu, E. (1972). Judgment of counseling process: Reliability, agreement, and error. Psychological Bulletin, 78, 17–20.

  • Rothstein, H. (1990). Interrater reliability of job performance ratings: Growth to asymptote level with increasing opportunity to observe. Journal of Applied Psychology, 75, 85–98.

  • Saal, F., Downey, R., & Lahey, M. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88, 413–428.

  • SAS Institute. (1990). SAS/STAT user's guide (Vol. 2). Cary, NC: SAS Institute.

  • Schneider, B., & Schmitt, N. (1986). Staffing organizations. Glenview, IL: Scott, Foresman.

  • Shrout, P., & Fleiss, J. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428.

  • Tinsley, H., & Weiss, D. (1975). Interrater reliability and agreement of subjective judgments. Journal of Counseling Psychology, 22, 358–376.

  • Tornow, W. (1993). Perceptions or reality: Is multi-perspective measurement a means or an end? Human Resource Management, 32, 221–229.

  • Winer, B. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.

Additional information

This paper is based, in part, on a thesis submitted to East Carolina University by the second author. Portions of this study were presented at the American Psychological Association meeting in New Orleans, LA, August, 1989. The authors would like to thank Michael Campion and two anonymous reviewers for their comments on earlier drafts of this paper.

About this article

Cite this article

Fleenor, J.W., Fleenor, J.B. & Grossnickle, W.F. Interrater reliability and agreement of performance ratings: A methodological comparison. J Bus Psychol 10, 367–380 (1996). https://doi.org/10.1007/BF02249609
