Psychometric properties of chronic low back pain diagnostic classification systems: a systematic review

Abdelnaeem, Ahmed Omar; Rehan Youssef, Aliaa; Mahmoud, Nesreen Fawzy; Fayaz, Nadia Abdalazeem; Vining, Robert

doi:10.1007/s00586-020-06712-0

Review Article
Published: 20 January 2021

Volume 30, pages 957–989 (2021)

European Spine Journal

Abstract

Objectives

To identify and critically appraise studies evaluating psychometric properties of functionally oriented diagnostic classification systems for non-specific chronic low back pain (NS-CLBP).

Methods

This review employed methodology consistent with PRISMA guidelines. Electronic databases and journals (PubMed, EMBASE, Cochrane, PEDro, CINAHL, Index to Chiropractic Literature, ProQuest, Physical Therapy, Journal of Physiotherapy, Canadian Physiotherapy, and Physiotherapy Theory and Practice) were searched from inception until January 2020. Included studies evaluated the validity and reliability of NS-CLBP diagnostic classification systems in adults. Risk of bias was assessed using a critical appraisal tool.

Results

Twenty-two studies were eligible: five investigated inter-rater reliability, and 17 analyzed the validity of O’Sullivan’s classification system (OCS, n = 15), the motor control impairment (MCI) test battery (n = 1), and the Pain Behavior Assessment (PBA, n = 1). Evidence from multiple low risk of bias studies demonstrates that OCS has moderate to excellent inter-rater reliability (kappa > 0.4). Two low risk of bias studies also support the OCS-MCI subcategory. Three tests within the MCI test battery show acceptable inter- and intra-rater reliability for clinical use (the “sitting knee extension,” the “one leg stance,” and the “pelvic tilt” tests). Evidence for the reliability and validity of the PBA is limited to one high risk of bias study.

Conclusions

Multiple low risk of bias studies demonstrate strong inter-rater reliability for the OCS classification, specifically the OCS-MCI subcategory. Future studies with low risk of bias are needed to evaluate the reliability and validity of the MCI test battery and the PBA.
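
The kappa threshold reported in the Results (kappa > 0.4) refers to chance-corrected agreement between raters. A minimal sketch of how Cohen's kappa is computed for two raters is given below. The ratings and the category labels "A", "B" and "C" are invented for illustration and are not data from any included study; the interpretation bands are those of Landis and Koch (1977), which appears in the reference list.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per subject."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map kappa to the descriptive bands of Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical classifications of ten patients by two raters (illustrative only).
rater_a = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
rater_b = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")  # kappa = 0.70 (substantial)
```

In this example the raters agree on 8 of 10 patients (observed agreement 0.80) while chance agreement is 0.33, so kappa is (0.80 - 0.33) / (1 - 0.33), roughly 0.70.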

References

Hartvigsen J, Hancock MJ, Kongsted A et al (2018) What low back pain is and why we need to pay attention. Lancet 391:2356–2367. https://doi.org/10.1016/S0140-6736(18)30480-X
Woolf AD, Pfleger B (2003) Burden of major musculoskeletal conditions. Bull World Health Organ 81:646–656
Buchbinder R, van Tulder M, Öberg B et al (2018) Low back pain: a call for action. Lancet 391:2384–2388. https://doi.org/10.1016/S0140-6736(18)30488-4
da C Menezes Costa L, Maher CG, Hancock MJ et al (2012) The prognosis of acute and persistent low-back pain: a meta-analysis. CMAJ 184:E613–E624. https://doi.org/10.1503/cmaj.111271
Burton AK, McClune TD, Clarke RD, Main CJ (2004) Long-term follow-up of patients with low back pain attending for manipulative care: outcomes and predictors. Man Ther 9:30–35. https://doi.org/10.1016/s1356-689x(03)00052-3
Wáng YXJ, Wu A-M, Ruiz Santiago F, Nogueira-Barbosa MH (2018) Informed appropriate imaging for low back pain management: a narrative review. J Orthop Transl 15:21–34. https://doi.org/10.1016/j.jot.2018.07.009
Hancock MJ, Maher CG, Latimer J et al (2007) Systematic review of tests to identify the disc, SIJ or facet joint as the source of low back pain. Eur Spine J 16:1539–1550. https://doi.org/10.1007/s00586-007-0391-1
Maher C, Underwood M, Buchbinder R (2017) Non-specific low back pain. Lancet 389:736–747. https://doi.org/10.1016/S0140-6736(16)30970-9
Balagué F, Mannion AF, Pellisé F, Cedraschi C (2012) Non-specific low back pain. Lancet 379:482–491. https://doi.org/10.1016/S0140-6736(11)60610-7
Vining RD, Minkalis AL, Shannon ZK, Twist EJ (2019) Development of an evidence-based practical diagnostic checklist and corresponding clinical exam for low back pain. J Manipulative Physiol Ther 42:665–676. https://doi.org/10.1016/j.jmpt.2019.08.003
Patel S, Psychol C, Friede T et al (2012) Systematic review of randomized controlled trials of clinical prediction rules for physical therapy in low back pain. Spine. https://doi.org/10.1097/BRS.0b013e31827b158f
Amundsen PA, Evans DW, Rajendran D et al (2018) Inclusion and exclusion criteria used in non-specific low back pain trials: a review of randomised controlled trials published between 2006 and 2012. BMC Musculoskelet Disord 19:113. https://doi.org/10.1186/s12891-018-2034-6
Foster NE, Hill JC, Hay EM (2011) Subgrouping patients with low back pain in primary care: are we getting any better at it? Man Ther 16:3–8. https://doi.org/10.1016/j.math.2010.05.013
Petersen T, Laslett M, Thorsen H et al (2003) Diagnostic classification of non-specific low back pain. A new system integrating patho-anatomic and clinical categories. Physiother Theory Pract 19:213–237. https://doi.org/10.1080/09593980390246760
Vining R, Potocki E, Seidman M, Morgenthal P (2013) An evidence-based diagnostic classification system for low back pain. J Can Chiropr Assoc 57:189–204
Spitzer WO, LeBlanc FE, Dupuis M, Abenhaim L, Belanger AY, Bloch R, Bombardier C, Cruess RL, Drouin G, Duval-Hesler N, Laflamme J, Lamoureux G, Nachemson A, Page JJ, Rossignol M, Salmi LR, Salois-Arsenault S, Suissa SW-DS (1987) Scientific approach to the assessment and management of activity-related spinal disorders. A monograph for clinicians. Report of the Quebec Task Force on Spinal Disorders. Spine 12:S1-59
Alrwaily M, Timko M, Schneider M et al (2016) Treatment-based classification system for low back pain: revision and update. Phys Ther 96:1057–1066. https://doi.org/10.2522/ptj.20150345
Cosio D, Lin E (2018) Role of active versus passive complementary and integrative health approaches in pain management. Glob Adv Heal Med 7:216495611876849. https://doi.org/10.1177/2164956118768492
Alhowimel A, AlOtaibi M, Radford K, Coulson N (2018) Psychosocial factors associated with change in pain and disability outcomes in chronic low back pain patients treated by physiotherapist: a systematic review. SAGE Open Med. https://doi.org/10.1177/2050312118757387
Booth A, Clarke M, Dooley G et al (2012) The nuts and bolts of PROSPERO: an international prospective register of systematic reviews. Syst Rev 1:2. https://doi.org/10.1186/2046-4053-1-2
Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6:e1000097. https://doi.org/10.1371/journal.pmed.1000097
Brink Y, Louw QA (2012) Clinical instruments: reliability and validity critical appraisal. J Eval Clin Pract 18:1126–1132. https://doi.org/10.1111/j.1365-2753.2011.01707.x
May S, Littlewood C, Bishop A (2006) Reliability of procedures used in the physical examination of non-specific low back pain: a systematic review. Aust J Physiother 52:91–102. https://doi.org/10.1016/S0004-9514(06)70044-7
May S, Chance-Larsen K, Littlewood C et al (2010) Reliability of physical examination tests used in the assessment of patients with shoulder problems: a systematic review. Physiotherapy 96:179–190
Barrett E, McCreesh K, Lewis J (2014) Reliability and validity of non-radiographic methods of thoracic kyphosis measurement: a systematic review. Man Ther 19:10–17. https://doi.org/10.1016/j.math.2013.09.003
Vibe Fersum K, O’Sullivan PB, Kvale A, Skouen JS (2009) Inter-examiner reliability of a classification system for patients with non-specific low back pain. Man Ther 14:555–561. https://doi.org/10.1016/j.math.2008.08.003
Luomajoki H, Kool J (2007) Reliability of movement control tests in the lumbar spine. BMC Musculoskelet Disord 8:90. https://doi.org/10.1186/1471-2474-8-90
Dankaerts W, O’Sullivan PB, Straker LM et al (2006) The inter-examiner reliability of a classification method for non-specific chronic low back pain patients with motor control impairment. Man Ther 11:28–39. https://doi.org/10.1016/j.math.2005.02.001
Enoch F, Kjaer P, Elkjaer A et al (2011) Inter-examiner reproducibility of tests for lumbar motor control. BMC Musculoskelet Disord 12:114. https://doi.org/10.1186/1471-2474-12-114
O’Sullivan PB, Mitchell T, Bulich P et al (2006) The relationship beween posture and back muscle endurance in industrial workers with flexion-related low back pain. Man Ther 11:264–271. https://doi.org/10.1016/j.math.2005.04.004
O’Sullivan K, Verschueren S, Van Hoof W et al (2013) Lumbar repositioning error in sitting: healthy controls versus people with sitting-related non-specific chronic low back pain (flexion pattern). Man Ther 18:526–532. https://doi.org/10.1016/j.math.2013.05.005
O’Sullivan PB, Beales DJ, Beetham JA et al (2002) Altered motor control strategies in subjects with sacroiliac joint pain during the active straight-leg-raise test. Spine 27:E1-8. https://doi.org/10.1097/00007632-200201010-00015
O’Sullivan PB, Burnett A, Floyd AN et al (2003) Lumbar repositioning deficit in a specific low back pain population. Spine 28:1074–1079. https://doi.org/10.1097/01.BRS.0000061990.56113.6F
Hungerford B, Gilleard W, Hodges P (2003) Evidence of altered lumbopelvic muscle recruitment in the presence of sacroiliac joint pain. Spine 28:1593–1600. https://doi.org/10.1097/00007632-200307150-00022
Burnett A, Cornelius M, Dankaerts W, O’Sullivan P (2004) Spinal kinematics and trunk muscle activity in cyclists: a comparison between healthy controls and non-specific chronic low back pain subjects—a pilot investigation. Man Ther 9:211–219. https://doi.org/10.1016/j.math.2004.06.002
Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Differences in sitting postures are associated with nonspecific chronic low back pain disorders when patients are subclassified. Spine 31:698–704. https://doi.org/10.1097/01.brs.0000202532.76925.d2
Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Altered patterns of superficial trunk muscle activation during sitting in nonspecific chronic low back pain patients: importance of subclassification. Spine 31:2017–2023. https://doi.org/10.1097/01.brs.0000228728.11076.82
Dankaerts W, O’Sullivan P, Burnett A et al (2009) Discriminating healthy controls and two clinical subgroups of nonspecific chronic low back pain patients using trunk muscle activation and lumbosacral kinematics of postures and movements: a statistical classification model. Spine 34:1610–1618. https://doi.org/10.1097/BRS.0b013e3181aa6175
Beales DJ, Ther MM, O’Sullivan PB, Briffa NK (2009) Motor control patterns during an active straight leg raise in chronic pelvic girdle pain subjects. Spine 34:861–870. https://doi.org/10.1097/BRS.0b013e318198d212
Sheeran L, Sparkes V, Caterson B et al (2012) Spinal position sense and trunk muscle activity during sitting and standing in nonspecific chronic low back pain: classification analysis. Spine 37:E486–E495. https://doi.org/10.1097/BRS.0b013e31823b00ce
Van Hoof W, Volkaerts K, O’Sullivan K et al (2012) Comparing lower lumbar kinematics in cyclists with low back pain (flexion pattern) versus asymptomatic controls—field study using a wireless posture monitoring system. Man Ther 17:312–317. https://doi.org/10.1016/j.math.2012.02.012
Hemming R, Sheeran L, van Deursen R, Sparkes V (2019) Investigating differences in trunk muscle activity in non-specific chronic low back pain subgroups and no-low back pain controls during functional tasks: a case-control study. BMC Musculoskelet Disord 20:459. https://doi.org/10.1186/s12891-019-2843-2
Hemming R, Sheeran L, van Deursen R, Sparkes V (2017) Non-specific chronic low back pain: differences in spinal kinematics in subgroups during functional tasks. Eur Spine J. https://doi.org/10.1007/s00586-017-5217-1
Sheeran L, Sparkes V, Whatling G et al (2019) Identifying non-specific low back pain clinical subgroups from sitting and standing repositioning posture tasks using a novel Cardiff Dempster–Shafer theory classifier. Clin Biomech. https://doi.org/10.1016/j.clinbiomech.2019.10.004
Biele C, Moller D, von Piekartz H et al (2019) Validity of increasing the number of motor control tests within a test battery for discrimination of low back pain conditions in people attending a physiotherapy clinic: a case–control study. BMJ Open 9:e032340. https://doi.org/10.1136/bmjopen-2019-032340
Meyer K, Klipstein A, Oesch P et al (2016) Development and validation of a pain behavior assessment in patients with chronic low back pain. J Occup Rehabil 26:103–113. https://doi.org/10.1007/s10926-015-9593-2
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
Ford J (2003) A systematic review on methodology of classification system research for low back pain. In: Musculoskeletal physiotherapy Australia 13th biennial conference, Sydney, Australia, 2003
Anderson JA (1977) Problems of classification of low-back pain. Rheumatol Rehabil 16:34–36. https://doi.org/10.1093/rheumatology/16.1.34
Deyo RA, Haselkorn J, Hoffman R, Kent DL (1994) Designing studies of diagnostic tests for low back pain or radiculopathy. Spine 19:2057S-2065S. https://doi.org/10.1097/00007632-199409151-00007
Fairbank JCT, Pynsent PB (1992) Syndromes of back pain and their classification. In: The Lumbar spine and back pain. Edinburgh: Churchill Livingstone
Petersen T, Thorsen H, Manniche C, Ekdahl C (1999) Classification of non-specific low back pain: a review of the literature on classifications systems relevant to physiotherapy. Phys Ther Rev 4:265–281. https://doi.org/10.1179/108331999786821690
Ford J, Story I, O’Sullivan P, McMeeken J (2007) Classification systems for low back pain: a review of the methodology for development and validation. Phys Ther Rev 12:33–42
Woolf CJ, Bennett GJ, Doherty M et al (1998) Towards a mechanism-based classification of pain. Pain 77:227–229
McCarthy CJ, Arnall FA, Strimpakos N et al (2004) The biopsychosocial classification of non-specific low back pain: a systematic review. Phys Ther Rev 9:17–30. https://doi.org/10.1179/108331904225003955
Fairbank J, Gwilym S, France J, Daffner S (2011) The role of classification of chronic low back pain. Spine 1:36. https://doi.org/10.1097/BRS.0b013e31822ef72c
Salvioli S, Pozzi A, Testa M (2019) Movement control impairment and low back pain: state of the art of diagnostic framing. Medicina (Kaunas). https://doi.org/10.3390/medicina55090548
Carlsson H, Rasmussen-Barr E (2013) Clinical screening tests for assessing movement control in non-specific low-back pain. A systematic review of intra-and inter-observer reliability studies. Man Ther 18:103–110. https://doi.org/10.1016/j.math.2012.08.004
Murphy SE, Blake C, Power CK, Fullen BM (2016) Comparison of a stratified group intervention (STarT back) with usual group care in patients with low back pain: a nonrandomized controlled trial. Spine 41:645–652. https://doi.org/10.1097/BRS.0000000000001305
Mjøsund HL, Boyle E, Kjaer P et al (2017) Clinically acceptable agreement between the ViMove wireless motion sensor system and the Vicon motion capture system when measuring lumbar region inclination motion in the sagittal and coronal planes. BMC Musculoskelet Disord 18:124. https://doi.org/10.1186/s12891-017-1489-1
Gracovetsky S, Newman N, Pawlowsky M et al (1995) A database for estimating normal spinal motion derived from noninvasive measurements. Spine 20:1036–1046. https://doi.org/10.1097/00007632-199505000-00010
Mannion AF, Knecht K, Balaban G et al (2004) A new skin-surface device for measuring the curvature and global and segmental ranges of motion of the spine: reliability of measurements and comparison with data reviewed from the literature. Eur Spine J 13:122–136. https://doi.org/10.1007/s00586-003-0618-8
Öztuna D, Elhan AH, Tüccar E (2006) Investigation of four different normality tests in terms of type 1 error rate and power under different distributions. Turkish J Med Sci 36:171–176
Thode HC (2002) Testing for normality. Statistics: textbooks and monographs, vol 164. CRC Press, New York, NY

Author information

Authors and Affiliations

Faculty of Physical Therapy, Cairo University, Cairo, Egypt
Ahmed Omar Abdelnaeem, Aliaa Rehan Youssef, Nesreen Fawzy Mahmoud & Nadia Abdalazeem Fayaz
Faculty of Physical Therapy, Ahram Canadian University, Giza, Egypt
Aliaa Rehan Youssef
Palmer Center for Chiropractic Research, Palmer College of Chiropractic, Davenport, IA, USA
Robert Vining
Cairo, Egypt
Ahmed Omar Abdelnaeem

Corresponding author

Correspondence to Ahmed Omar Abdelnaeem.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1 Search strategies of the searched databases and journals

Database/journal	Last citation no	Keywords
PubMed	364	(((((Non specific OR non-specific OR nonspecific OR mechanical))) AND ((low back pain OR simple backache OR lumbar strain OR spinal degeneration))) AND ((clinical test OR clinical examination OR clinical sign))) AND ((valid* OR reliabl*)) simple search
EMBASE	738	(clinical) AND (test* OR exam* OR sign) AND (non-specific OR nonspecific OR 'non specific' OR mechanical OR simple) AND (low back pain OR back pain OR LBP) AND (reliab OR valid*) in English only and limited to human plus searching in EMBASE only
Cochrane	92	(Non specific or non-specific or nonspecific or mechanical) and (low back pain or simple backache or lumbar strain or spinal degeneration) and (clinical test or clinical examination or clinical sign) and (valid* or reliabl*) in search manager choose in Trials, Methods Studies, Technology Assessments and Economic Evaluations (Word variations have been searched)
PEDro	226	Non specific low back pain (abstract and title) in advanced search (method clinical trials)
CINAHL	286	(Non specific OR non-specific OR nonspecific OR mechanical) AND (low back pain OR simple backache OR lumbar strain OR spinal degeneration) AND (clinical test OR clinical examination OR clinical sign) AND (valid* OR reliabl*) in advanced search
ProQuest	358	(("non specific" OR "non-specific" OR "nonspecific" OR "mechanical back pain") AND ("back pain" OR "lumbar strain" OR "simple backache") AND ("clinical test" OR "clinical examination" OR "clinical sign") AND ("valid" OR "reliab")) AND la.exact("ENG")
Physical therapy journal	460	"non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid" "reliab"
Chiroindex	79	"non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid" "reliab"
Australian journal of physiotherapy	54	Non specific in Title/Abs/Keywords OR nonspecific in Title/Abs/Keywords OR non-specific in Title/Abs/Keywords AND Low Back Pain in Title/Abs/Keywords OR Mechanical low back pain in Title/Abs/Keywords OR simple backache in Title/Abs/Keywords
Canadian physiotherapy	113	Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical tests OR clinical examination OR clinical sign AND valid* OR reliabl* in advanced search
Physiotherapy theory and practice journal	113	Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical test OR clinical examination OR clinical sign AND valid* OR reliabl*
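
Most of the strategies in Appendix 1 share one pattern: synonyms within a concept are joined with OR, and the four concepts (non-specificity, low back pain, clinical examination, validity or reliability) are joined with AND. A minimal sketch of that composition is shown below. The helper name `build_query` is hypothetical, the output is a plain string rather than a call to any database interface, and the bracketing is simplified relative to the nested parentheses in the PubMed row.

```python
def build_query(concept_groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    clauses = []
    for terms in concept_groups:
        if not terms:
            raise ValueError("each concept group needs at least one term")
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


# Term groups copied from the CINAHL row of Appendix 1.
concept_groups = [
    ["Non specific", "non-specific", "nonspecific", "mechanical"],
    ["low back pain", "simple backache", "lumbar strain", "spinal degeneration"],
    ["clinical test", "clinical examination", "clinical sign"],
    ["valid*", "reliabl*"],
]

query = build_query(concept_groups)
print(query)
```

Wrapping every OR group in parentheses makes the grouping explicit, which matters for the journal-site rows above that mix AND and OR without brackets and therefore depend on each site's operator precedence.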

Appendix 2 Systematic review critical appraisal tool (Reproduced from Brink and Louw (2011))

Item 1: If human subjects were used, did the authors give a detailed description of the sample of subjects used to perform the (index) test on?

Why the criterion should be evaluated: The validity and reliability of a test will be affected by the sample characteristics or composition, and therefore, the study has to report on the sample characteristics because the validity and reliability scores will then only be applicable to that particular population. A study does not contribute to validity and reliability testing if the subjects were not recruited appropriately
This item can be scored yes if:
1 the sample characteristics (e.g., height, weight, age, diagnosis and symptom status) were described or the manner of recruiting subjects was stated or if selection criteria were applied
If none of the above have been described or if insufficient information was provided, select “no.” If non-human or inanimate objects were used, select “N/A”

Item 2: Did the authors clarify the qualification, or competence of the rater(s) who performed the (index) test?

Why the criterion should be evaluated: The amount of experience of the rater(s), performing the (index) test, will influence the validity and reliability scores and needs to be explained
This item can be scored yes if:
1 the rater(s) characteristics (e.g., qualification, specialization and amount of experience using the instrument under investigation) have been described
If the above have not been described or insufficient information was provided, select “no”

Item 3: Was the reference standard explained?

Why the criterion should be evaluated: The index test scores need to be compared to the scores obtained from the reference standard in order to test validity, and therefore, the reference standard needs to be explained appropriately
This item can be scored yes if:
1 the reference standard is likely to produce correct measurements;
2 the reference standard is the best method available; and
3 details (name of the instrument, references to the accuracy of the instrument) of the reference standard are reported
If none of the above is applicable to the reference standard’s description, then select “no”

Item 4: If inter-rater reliability was tested, were raters blinded to the findings of other raters?

Why the criterion should be evaluated: When raters have access to the findings of other raters, it compromises the quality of the reliability testing procedure by inflating the agreement among the raters, and therefore, blinding needs to be performed
This item can be scored yes if:
1 it is stated that the raters were blinded to each other’s findings or if a description that implies that the raters were blinded was reported
If no information is provided, then select “no.” If intra-rater reliability was examined, then select “N/A”

Item 5: If intra-rater reliability was tested, were raters blinded to their own prior findings of the test under evaluation?

Why the criterion should be evaluated: If raters have knowledge of their own prior findings, it will influence the findings of their repeated measurements and could inflate the rater agreement, and therefore, appropriate measures, depending on the characteristics or the study design of the research study, need to be applied to ensure blinding
This item can be scored yes if:
1 the rater(s) examined the same subjects on more than one occasion and it is stated whether the rater(s) was/were blinded to the subjects they had examined previously
If insufficient information is provided, then select “no.” If inter-rater reliability was examined, then select “N/A”

Item 6: Was the order of examination varied?

Why the criterion should be evaluated: Varying the order in which the raters examine the subjects when inter-rater reliability is tested reduces the risk of systematic bias. Varying the order in which subjects are examined by one rater when intra-rater reliability is tested reduces the risk of the rater recalling the previous test scores and so reduces bias
This item can be scored yes if:
1 the order in which subjects were tested varied between raters if inter-rater reliability was tested;
2 the order of subjects was varied when intra-rater reliability was tested
If insufficient information is provided, then select “no.” If varied order of examination is unnecessary or impractical (e.g., rater(s) digitizing or reading X-rays) then select “N/A”

Item 7: If human subjects were used, was the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?

Why the criterion should be evaluated: The index test and the reference standard should be performed at the same time; however, this is not always possible. It becomes important to know whether it is possible that the test variable did not change between the two tests, otherwise it will affect the index test’s validity performance
This item can be scored yes if:
1 results from the index test and the reference standard were collected on the same subjects at the same time;
2 where a delay between measurements occurred, the target condition did not change between measurements
If the time period between performing the index test and the reference standard was sufficiently long that the target condition may have changed between the two tests or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select “N/A”

Item 8: Was the stability (or theoretical stability) of the variable being measured considered when determining the suitability of the time interval between repeated measures?

Why the criterion should be evaluated: For reliability, the test variable should not change between repeated measures, otherwise it will decrease the amount of agreement obtained between and within the rater(s)
This item can be scored yes if:
1 the stability of the variable is known or reported, and reviewers then decide on an appropriate time interval between repeated measures (stability of a test variable can only be determined if there is a reference standard);
2 where there is no reference standard, the reviewers agree upon the theoretical stability of the variable and decide on an appropriate time interval between repeated measures
If insufficient information is provided, then select “no”

Item 9: Was the reference standard independent of the index test?

Why the criterion should be evaluated: If the reference standard and the index test are not independently performed, then the index test cannot replace the reference standard on its own
This item can be scored yes if:
1 it is clear from the study that the index test did not form part of the reference standard
If it appears that the index test formed part of the reference standard, then select “no”

Item 10: Was the execution of the (index) test described in enough detail to permit replication of the test?

Why the criterion should be evaluated: Variations in the execution of the reference standard and the (index) test might affect the agreement between the two tests and it is also important to be able to replicate the same study procedure in another setting when needed
This item can be scored yes if:
1 the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2 citations of methodology were supplied
The extent to which detail is expected to be reported depends on the ability of different procedures to influence the results and on the type of instrument or test under evaluation
If insufficient information is provided, then select “no”

Item 11: Was the execution of the reference standard described in enough detail to permit its replication?

Why the criterion should be evaluated: For the same reason as item 10
This item can be scored yes if:
1 the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2 citations were supplied
If insufficient information is provided, then select “no”

Item 12: Were withdrawals from the study explained?

Why the criterion should be evaluated: The sample composition will influence the validity and reliability performance of the (index) test; therefore, it is important to know whether any withdrawals from the sample might have changed the composition of the sample
This item can be scored yes if:
1 it is clear what happened to all subjects who entered the study;
2 subjects who entered but did not complete the study are considered
If it appears that subjects who entered but did not complete the study were not accounted for or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select “N/A”

Item 13: Were the statistical methods appropriate for the purpose of the study?

Why the criterion should be evaluated: The aim of validity and reliability studies is to report on an estimate of validity and reliability for the particular test and appropriate statistical methods need to be implemented in order to produce this estimate
This item can be scored yes if:
1 the analysis is appropriate in terms of the type of data (e.g., categorical, continuous and dichotomous);
2 statistical analysis for validity studies incorporates, for example means, differences between measurements, 95% confidence interval and ANOVA; and
3 statistical analysis for reliability studies incorporates, for example, intraclass correlation coefficient and 95% confidence interval
If the analysis is not appropriate or if insufficient information was provided, then select “no”
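
Each of the 13 items above is answered yes, no or, where the item wording allows it, N/A. One possible way to summarize a study's appraisal is the share of applicable items answered yes, sketched below. The summary rule, the function name `summarize_appraisal` and the example answers are assumptions made for illustration; the text reproduced here does not state how item scores were converted into a risk of bias rating.

```python
# Items whose wording in Appendix 2 offers an N/A option.
NA_ALLOWED = {1, 4, 5, 6, 7, 12}


def summarize_appraisal(answers):
    """Return (yes_count, applicable_count, proportion_yes) for items 1 to 13.

    `answers` maps item number to "yes", "no" or "n/a" (case-insensitive).
    The proportion is an illustrative summary, not a rule taken from the tool.
    """
    if set(answers) != set(range(1, 14)):
        raise ValueError("answers must cover items 1 to 13 exactly")
    yes_count = 0
    applicable = 0
    for item, answer in answers.items():
        answer = answer.strip().lower()
        if answer == "n/a":
            if item not in NA_ALLOWED:
                raise ValueError(f"item {item} has no N/A option")
            continue  # N/A items are excluded from the denominator
        if answer not in ("yes", "no"):
            raise ValueError(f"item {item}: unexpected answer {answer!r}")
        applicable += 1
        yes_count += answer == "yes"
    return yes_count, applicable, yes_count / applicable


# Hypothetical answers for an inter-rater reliability study (item 5 is N/A).
example = {
    1: "yes", 2: "yes", 3: "no", 4: "yes", 5: "N/A", 6: "no", 7: "yes",
    8: "yes", 9: "yes", 10: "yes", 11: "no", 12: "yes", 13: "yes",
}

yes_count, applicable, proportion = summarize_appraisal(example)
print(f"{yes_count}/{applicable} applicable items scored yes ({proportion:.0%})")
```

Excluding N/A items from the denominator keeps reliability-only and validity-only studies comparable, since each design leaves some items inapplicable.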

Appendix 3 Classification processes of OCS

About this article

Cite this article

Abdelnaeem, A.O., Rehan Youssef, A., Mahmoud, N.F. et al. Psychometric properties of chronic low back pain diagnostic classification systems: a systematic review. Eur Spine J 30, 957–989 (2021). https://doi.org/10.1007/s00586-020-06712-0

Received: 23 November 2020
Revised: 23 November 2020
Accepted: 27 December 2020
Published: 20 January 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s00586-020-06712-0
