
Information Bias

ActivEpi Companion Textbook

Abstract

Information bias is a systematic error in a study that arises because of incorrect information obtained on one or more variables measured in the study. The focus here is on the consequences of having inaccurate information about exposure and disease variables that are dichotomous, that is, on misclassification of exposure and disease that leads to a bias in the resulting measure of effect. More general situations, such as several categories of exposure or disease, continuous exposure or disease, adjusting for covariates, matched data, and mathematical modeling approaches, are beyond the scope of the activities provided below.


References on Information/Misclassification Bias

Overviews

  • Greenberg RS, Daniels SR, Flanders WD, Eley JW, Boring JR. Medical Epidemiology (3rd Ed). Lange Medical Books, New York, 2001.

  • Hill H, Kleinbaum DG. Bias in Observational Studies. In Encyclopedia of Biostatistics, pp 323-329, Oxford University Press, 1999.

  • Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. John Wiley and Sons Publishers, New York, 1982.

Special Issues

  • Barron BA. The effects of misclassification on the estimation of relative risk. Biometrics 1977;33(2):414-8.

  • Copeland KT, Checkoway H, McMichael AJ, Holbrook RH. Bias due to misclassification in the estimation of relative risk. Am J Epidemiol 1977;105(5):488-95.

  • Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol 1990;132(4):746-8.

  • Espeland MA, Hui SL. A general approach to analyzing epidemiologic data that contain misclassification errors. Biometrics 1987;43(4):1001-12.

  • Greenland S. The effect of misclassification in the presence of covariates. Am J Epidemiol 1980;112(4):554-69.

  • Greenland S, Kleinbaum DG. Correcting for misclassification in two-way tables and matched-pair studies. Int J Epidemiol 1983;12(1):93-7.

  • Reade-Christopher SJ, Kupper LL. Effects of exposure misclassification on regression analyses of epidemiologic follow-up study data. Biometrics 1991;47(2):535-48.

  • Satten GA, Kupper LL. Inferences about exposure-disease associations using probability-of-exposure information. J Am Stat Assoc 1993;88:200-8.

  • Wynder EL. Investigator bias and interviewer bias: the problem of reporting systematic error in epidemiology. J Clin Epidemiol 1994;47(8):825-7.


Appendices

Homework

1.1 ACE-1. Radiation Exposure vs. GI Tumors

An investigator is interested in studying the relationship between radiation exposure and development of gastrointestinal (GI) tumors. He assembles a cohort of cancer-free subjects, divides them into "high" and "low" radiation exposure groups, and follows them over 10 years for evidence of GI tumors.

  a. Suppose that the investigator uses a diagnostic test that fails to identify 20 % of all GI tumors but never registers a false positive result (i.e., no one without the cancer is ever misdiagnosed as having cancer). Describe this situation in terms of sensitivities and specificities for the exposed and unexposed subjects:

     Sensitivity (D | E) = ______    Sensitivity (D | not E) = ______

     Specificity (D | E) = ______    Specificity (D | not E) = ______

The following summarizes the data from the SOURCE population (no misclassification):

| | High Radiation | Low Radiation | Total |
|---|---|---|---|
| GI Tumor | 600 | 200 | 800 |
| No Tumor | 1400 | 1300 | 2700 |
| Total | 2000 | 1500 | 3500 |

  b. What is the unbiased estimate of the risk ratio (RR)?

  c. Use the sensitivities and specificities from part a to show the data that would have been observed by the investigator:

| | High Radiation | Low Radiation | Total |
|---|---|---|---|
| GI Tumor | ______ | ______ | ______ |
| No Tumor | ______ | ______ | ______ |
| Total | ______ | ______ | ______ |

  d. Calculate the observed RR. Is there no bias, bias toward the null, bias away from the null, or switchover bias?

  e. The following table summarizes the OBSERVED data from a different study of radiation and GI tumors:

| | High Radiation | Low Radiation | Total |
|---|---|---|---|
| GI Tumor | 476 | 376 | 852 |
| No Tumor | 524 | 624 | 1148 |
| Total | 1000 | 1000 | 2000 |

     Assuming the scenario described in part a above, what is the corrected (unbiased) estimate of the RR for the relationship between radiation exposure and GI tumors for this study? Show your calculations. What is the nature of the bias, if any?

  f. Calculate an odds ratio (rather than a RR) for both the observed and corrected data in part e. Is your conclusion regarding bias the same?
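The arithmetic behind parts a through d can be organized as in the Python sketch below. It is an illustration only, not part of the original exercise: the function names are hypothetical, and the sensitivity of 0.80 and specificity of 1.00 are one reading of "fails to identify 20 % of all GI tumors but never registers a false positive", assumed to apply equally to both exposure groups.

```python
def misclassify_disease(true_cases, true_noncases, sensitivity, specificity):
    """Expected observed (cases, noncases) in one exposure group."""
    # True cases detected plus noncases falsely labelled as cases.
    observed_cases = sensitivity * true_cases + (1 - specificity) * true_noncases
    observed_noncases = true_cases + true_noncases - observed_cases
    return observed_cases, observed_noncases


def risk_ratio(cases_exposed, total_exposed, cases_unexposed, total_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / total_exposed) / (cases_unexposed / total_unexposed)


# SOURCE population counts from the ACE-1 table: (GI tumor, no tumor).
high_radiation = (600, 1400)
low_radiation = (200, 1300)

# Assumed reading of part a: same test performance in both exposure groups.
sensitivity, specificity = 0.80, 1.00

observed_high = misclassify_disease(*high_radiation, sensitivity, specificity)
observed_low = misclassify_disease(*low_radiation, sensitivity, specificity)

true_rr = risk_ratio(600, 2000, 200, 1500)
observed_rr = risk_ratio(observed_high[0], sum(observed_high),
                         observed_low[0], sum(observed_low))

print("observed high radiation:", observed_high)
print("observed low radiation:", observed_low)
print("true RR:", round(true_rr, 2), "observed RR:", round(observed_rr, 2))
```

Under these assumptions the observed risk in each group is simply the sensitivity times the true risk, so the common factor cancels in the ratio. The same cancellation does not generally hold for an odds ratio, which is the point of part f.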

1.2 ACE-2. CHD and Behavior

The following data represent the SOURCE population in a case-control study of coronary heart disease (CHD) and type A behavior:

 

| | Type A | Non-Type A | Total |
|---|---|---|---|
| CHD | 80 | 25 | 105 |
| No CHD | 50 | 55 | 105 |
| Total | 130 | 80 | 210 |

  a. Calculate the unbiased estimate of the exposure odds ratio (EOR).

  b. Suppose that exposure was misclassified with the following sensitivities and specificities:

     Cases: Sensitivity = .8, Specificity = 1

     Controls: Sensitivity = .9, Specificity = .6

     Show the data that would have been observed, given the misclassification:

| | Type A | Non-Type A | Total |
|---|---|---|---|
| CHD | ______ | ______ | ______ |
| No CHD | ______ | ______ | ______ |
| Total | ______ | ______ | ______ |

  c. Calculate the EOR from the observed data. Is there no bias, bias toward the null, bias away from the null, or switchover bias?

1.3 ACE-3. Sleep Disturbance vs. Clinical Depression

The Johns Hopkins Precursors Study, a long-term prospective cohort study, was used to evaluate the relation between self-reported sleep disturbances and subsequent clinical depression. A total of 1,053 men at several U.S. universities provided information on sleep habits during medical school and then were followed for 20 years for development of depression. Subjects underwent extensive psychological testing and lengthy structured interviews conducted by psychiatrists trained in the diagnosis of depression. Results of the study are summarized below (you may assume that these data are correctly classified):

 

| | Sleep Disturbance: Yes | Sleep Disturbance: No |
|---|---|---|
| Depression | 168 | 93 |
| No Depression | 258 | 534 |
| Total | 426 | 627 |

Now suppose that the investigators had limited funds and were not able to use the sophisticated diagnostic tools described above. They had two options available to them for diagnosing depression among the study subjects: (1) an interview or (2) a self-administered questionnaire.

The interview was able to classify depression status with the following sensitivities and specificities:

Exposed: Sensitivity = 0.84 and Specificity = 0.90

Unexposed: Sensitivity = 0.84 and Specificity = 0.90

The questionnaire was able to classify depression status with the following sensitivities and specificities:

Exposed: Sensitivity = 0.80 and Specificity = 0.88

Unexposed: Sensitivity = 0.90 and Specificity = 0.95

  a. What is the true relative risk (RR) that describes the association between sleep disturbance and depression?

  b. What is the observed RR when the interview is used to classify depression?

  c. What is the observed RR when the questionnaire is used to classify depression?

  d. Taking only issues of validity into account, which tool for diagnosing depression is preferable: the interview or the questionnaire? Justify your answer.

1.4 ACE-4. Aspirin Use and Gastrointestinal Bleeding

An epidemiologist was interested in determining whether use of a new aspirin-containing pain reliever was associated with an increased risk of gastrointestinal bleeding. S/he identified 600 patients who were taking the drug on a regular basis and 600 unexposed subjects. Subjects were followed for one year to detect the occurrence of gastrointestinal bleeding. Due to publicity about the potential hazards of the new drug, physicians participating in the study followed their exposed subjects more closely than unexposed subjects, and were thus more likely to diagnose gastrointestinal bleeds when they occurred in this group of study subjects.

  a. Which of the following describe(s) the situation? [You may choose more than one]:

     i. Nondifferential misclassification of exposure

     ii. Differential misclassification of exposure

     iii. Nondifferential misclassification of disease

     iv. Differential misclassification of disease

     v. Recall bias

     vi. Detection bias


Suppose that the following table represents the SOURCE population (i.e. correctly classified) for the study of the new drug and gastrointestinal bleeding:

 

| | Use of New Drug: Yes | Use of New Drug: No |
|---|---|---|
| GI bleed | 400 | 300 |
| no GI bleed | 200 | 300 |

  b. Assume that when the study was carried out, GI bleeds were classified with sensitivity = 0.9 and specificity = 0.75 for subjects who were using the drug. For subjects not using the drug, GI bleeds were classified with sensitivity = 0.6 and specificity = 0.75. Use this information to fill in the table below:

| | Use of New Drug: Yes | Use of New Drug: No |
|---|---|---|
| GI bleed | ______ | ______ |
| no GI bleed | ______ | ______ |

  c. Calculate the appropriate ratio measure of association for the OBSERVED (misclassified) data. Indicate whether there is bias, and if so, in what direction.

  d. Suppose the study were repeated, with GI bleeds detected by physicians who were blinded as to the exposure status of the subjects. If the sensitivity for unexposed subjects were increased to 0.9, while all other sensitivities and specificities remained the same, what impact would this have on the bias?

     i. The bias would be eliminated.

     ii. The magnitude of the bias would decrease, but the direction would remain the same.

     iii. The magnitude of the bias would increase, but the direction would remain the same.

     iv. The direction of the bias would change.

     v. None of the above.

1.5 ACE-5. Antidepressant Medication and Breast Cancer Risk

A study entitled “Antidepressant Medication and Breast Cancer Risk” was published in a recent issue of the American Journal of Epidemiology. According to the methods section of the paper, “Cases were an age-stratified (<50 and ≥50 years of age) random sample of women aged 25-74 years, diagnosed with primary breast cancer during 1995 and 1996 (pathology report confirmed) and recorded in the population-based Ontario Cancer Registry. As the 1-year survival for breast cancer is 90 percent, surrogate respondents were not used. Population controls, aged 25-74 years, were randomly sampled from the property assessment rolls of the Ontario Ministry of Finance; this database includes all home owners and tenants and lists age, sex, and address.”

  a. Discuss the authors’ approach to the identification of cases with respect to the potential for misclassification bias.

  b. Discuss the authors’ approach to the identification of controls with respect to the potential for (a) selection and (b) misclassification bias.

  c. Discuss the pros and cons of the decision not to allow surrogate respondents.

The methods section of the paper goes on to say: “Data were collected through mailed, self-administered, structured questionnaires that included information on (1) sociodemographic data; (2) duration, dosage, timing, and type of antidepressant medications used; and (3) potential confounders. Subjects were asked “have you ever taken antidepressants for at least 2 weeks at any time in your life?” (a list of 11 antidepressants was given to provide examples).”

  d. Discuss the pros and cons of the authors’ approach to exposure assessment. List and describe how information bias could result from this approach.

ACE-6. Validation Study

You have conducted:

  i) a cohort study of a disease and dichotomous exposure; and

  ii) a separate validation study to assess sensitivity and specificity of the measure used in the cohort study, since disease status was misclassified differentially with respect to exposure.

Results of the validation study:

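Exposed:

| Measured Disease | True Disease + | True Disease - |
|---|---|---|
| + | 95 | 10 |
| - | 5 | 90 |

Unexposed:

| Measured Disease | True Disease + | True Disease - |
|---|---|---|
| + | 97 | 5 |
| - | 3 | 95 |

Results of the cohort study (observed data):

| Disease Status | Exposure Status + | Exposure Status - |
|---|---|---|
| + | 540 | 284 |
| - | 1460 | 1716 |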

Using the above information, estimate the true risk ratio adjusted for misclassification.

(To answer this question, you may wish to use the Data Desk template for correcting for misclassification., Specificity.ise)
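For readers without the Data Desk template, the correction can also be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the template's own method: the function names are hypothetical, the validation tables are read with rows as measured disease and columns as true disease, and each exposure group is corrected with its own sensitivity and specificity because misclassification is described as differential.

```python
def sensitivity_specificity(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity and specificity from one validation table."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


def corrected_cases(observed_cases, group_total, sensitivity, specificity):
    """Solve observed = Se*true + (1 - Sp)*(total - true) for the true case count."""
    q = sensitivity + specificity - 1
    if abs(q) < 1e-12:
        raise ValueError("sensitivity + specificity = 1: correction is undefined")
    return (observed_cases - (1 - specificity) * group_total) / q


# Validation study (assumed layout: rows = measured, columns = true).
se_exp, sp_exp = sensitivity_specificity(true_pos=95, false_pos=10,
                                         false_neg=5, true_neg=90)
se_unexp, sp_unexp = sensitivity_specificity(true_pos=97, false_pos=5,
                                             false_neg=3, true_neg=95)

# Observed cohort data: diseased and total in each exposure group.
exposed_total = 540 + 1460
unexposed_total = 284 + 1716

true_exposed_cases = corrected_cases(540, exposed_total, se_exp, sp_exp)
true_unexposed_cases = corrected_cases(284, unexposed_total, se_unexp, sp_unexp)

observed_rr = (540 / exposed_total) / (284 / unexposed_total)
corrected_rr = ((true_exposed_cases / exposed_total)
                / (true_unexposed_cases / unexposed_total))

print("corrected cases:", round(true_exposed_cases, 1), round(true_unexposed_cases, 1))
print("observed RR:", round(observed_rr, 2), "corrected RR:", round(corrected_rr, 2))
```

Comparing `observed_rr` with `corrected_rr` then indicates the direction of the bias.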

2.1 ACE-7. Diagnostic Testing: Baseball Fever

A researcher at Javier Lopez University developed a new test for detecting baseball fever. The new test was evaluated in a population of 10,000 people, 21 % of whom were definitely known to have baseball fever. The number of negative tests was 8,162. The positive predictive value was discovered to be 91.4 %.

  a. Use the information provided above to fill in the following 2 x 2 table:

| Test Result | Baseball Fever Present | Baseball Fever Absent |
|---|---|---|
| Positive | ______ | ______ |
| Negative | ______ | ______ |

  b. What are the test’s sensitivity and specificity? Provide an interpretation of the sensitivity, using the value you just calculated.

  c. If you were to apply this test to a patient and the result came back negative, what would you advise that patient regarding his/her chances of having baseball fever?
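The table in part a is determined by the four quantities given in the problem statement. The Python sketch below shows one way to back out the cells; the function name is hypothetical, and rounding to whole persons is an assumption made here because the stated positive predictive value (91.4 %) is itself rounded.

```python
def table_from_summary(n_total, prevalence, n_negative_tests, ppv):
    """Return (true_pos, false_pos, false_neg, true_neg) from summary figures."""
    n_diseased = round(n_total * prevalence)
    n_positive_tests = n_total - n_negative_tests
    # PPV is the share of positive tests that are true positives.
    true_pos = round(ppv * n_positive_tests)
    false_pos = n_positive_tests - true_pos
    false_neg = n_diseased - true_pos
    true_neg = n_negative_tests - false_neg
    return true_pos, false_pos, false_neg, true_neg


tp, fp, fn, tn = table_from_summary(n_total=10_000, prevalence=0.21,
                                    n_negative_tests=8_162, ppv=0.914)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
npv = tn / (tn + fn)  # probability of no disease given a negative test

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("sensitivity:", round(sensitivity, 3), "specificity:", round(specificity, 3))
print("negative predictive value:", round(npv, 3))
```

The negative predictive value is the quantity relevant to part c, since it describes a patient who has already tested negative.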

2.2 ACE-8. Predictive Value: Diabetes

In a certain community, eight percent of all adults over age 50 have diabetes. If a health service in this community correctly diagnoses 95 % of all persons with diabetes as having the disease and incorrectly diagnoses ten percent of all persons without diabetes as having the disease, find the probabilities that:

  a. The health service will diagnose an adult over age 50 as having diabetes.

  b. A person over 50 diagnosed by the health service as having diabetes actually has the disease.
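Both probabilities follow from the law of total probability and Bayes' theorem. A minimal Python sketch is given below; the function names are hypothetical, and the 95 % and ten percent figures are interpreted as the sensitivity and the false positive rate (1 minus specificity) of the health service's diagnosis.

```python
def probability_diagnosed(prevalence, sensitivity, false_positive_rate):
    """P(diagnosed) = P(D)P(+|D) + P(not D)P(+|not D)."""
    return prevalence * sensitivity + (1 - prevalence) * false_positive_rate


def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(D | diagnosed) by Bayes' theorem."""
    p_diagnosed = probability_diagnosed(prevalence, sensitivity, false_positive_rate)
    return prevalence * sensitivity / p_diagnosed


prevalence = 0.08           # adults over 50 with diabetes
sensitivity = 0.95          # diabetics correctly diagnosed
false_positive_rate = 0.10  # non-diabetics incorrectly diagnosed

p_diag = probability_diagnosed(prevalence, sensitivity, false_positive_rate)
ppv = positive_predictive_value(prevalence, sensitivity, false_positive_rate)

print("P(diagnosed):", round(p_diag, 3))
print("P(diabetes | diagnosed):", round(ppv, 3))
```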

2.3 ACE-9. Diagnostic Testing: Prostate Cancer

A group of 50,000 men over 60 years of age were tested for prostate cancer using the PSA test with the following results:

 

| PSA Test Result | Prostate Cancer: Yes | Prostate Cancer: No |
|---|---|---|
| Positive | 2,900 | 20,000 |
| Negative | 100 | 27,000 |
| Total | 3,000 | 47,000 |

  a. What are the sensitivity and the specificity of this test?

  b. What is the positive predictive value of this test?

  c. If a person gets a PSA test result that is negative, should he worry about having prostate cancer? Explain.

  d. If a person gets a PSA test result that is positive, should he worry about having prostate cancer? Explain.

  e. Suppose a new test was developed that had the same sensitivity as the PSA test, but had a specificity that was 95 %. Assuming the same numbers of subjects with and without (true) prostate cancer, describe the misclassification table that would result using the new test.

| New Test Result | Prostate Cancer: Yes | Prostate Cancer: No |
|---|---|---|
| Positive | ______ | ______ |
| Negative | ______ | ______ |
| Total | 3,000 | 47,000 |

  f. What is the positive predictive value of the new test?

  g. If a person gets a new test result that is negative, should he worry about having prostate cancer? Explain.

  h. If a person gets a new test result that is positive, should he worry about having prostate cancer? Explain.

2.4 ACE-10. Prostate Cancer Screening

Suppose that a new screening test for prostate cancer has been under development and is almost ready to be put on the market. Subjects undergoing screening are required to provide a small blood sample that is then tested to determine the level of a certain factor (Factor P). The higher the level of Factor P, the more likely the presence of prostate cancer. However, Factor P may be elevated as a result of other non-cancerous conditions involving the prostate.

  a. Scientists in Europe and those in the United States have disagreed on the appropriate cut-point for determining whether the screening test is to be considered positive. Discuss the implications of the choice of cut-point for this test. Think in terms of evaluation of the test’s performance as well as ramifications for patient care.

  b. Design an epidemiologic study that would be appropriate for evaluating the effectiveness of the new screening test. Be sure to comment on the study design, study subjects, exposure(s) of interest, outcome(s) of interest, analytic plan, potential biases, and any other important aspects of your study.

  c. As soon as the new screening test becomes available for use, it receives a great deal of media attention. It is quickly endorsed by the American Medical Association, the American Urological Association, and the American College of Surgeons. A community-based health advocacy group founded by prostate cancer patients and their families begins to call for widespread screening using the new test. Their goal is to see that every adult male in the United States is screened each year for prostate cancer. Discuss the pros and cons of such a plan for widespread screening of the general public for prostate cancer.

Answers to Study Questions and Quizzes

3.1 Q9.1

  1. 2

  2. 7

  3. 100 x (6/8) = 75 %

  4. 100 x (7/10) = 70 %

3.2 Q9.2

  1. False. Typically, several times and locations are used in the same residence, and a time-weighted average (TWA) is often calculated.

  2. False. Good instrumentation for measuring time-weighted average has been available for some time.

  3. False. A system of wire codes to measure distance and configuration has been used consistently since 1979 to rank homes crudely according to EMF intensity. However, the usefulness of this system for predicting past exposure remains an open question.

  4. True

  5. True

  6. True

(Note: there are no questions numbered 7 to 9)

  10. Interviewer bias. Subjects known to have experienced a venous thrombosis might be probed more extensively than controls for a history of oral contraceptive use.

  11. Away from the null. The proportion of exposed among controls would be less than it should have been if both cases and controls were probed to the same extent. Consequently, the odds ratio in the misclassified data would be higher than it should be.

  12. Keep the interviewers blind to case-control status of the study subject.

  13. 5

  14. 70

  15. 100 x (95/100) = 95 %

  16. 100 x (70/80) = 87.5 %

3.3 Q9.3

  1. The estimated risk ratio for the observed data is (380/1000)/(240/1000) = 1.58.

  2. Because the observed risk ratio of 1.58 is meaningfully different from the true (i.e., correct) risk ratio of 3.14.

  3. Towards the null, since the biased estimate of 1.58 is closer to the null value than is the correct estimate.

  4. No way to tell from one example, but the answer is no, the bias might be either towards the null or away from the null.

  5. The observed OR of 2.6 that results from misclassifying exposure is meaningfully different from the true odds ratio of 3.5.

  6. The bias is towards the null. The biased OR estimate of 2.6 is closer to the null value of 1 than is the correct OR.

3.4 Q9.4

  1. Sensitivity = 720 / 900 = .8 or 80 %

  2. Specificity = 910 / 1100 = .83 or 83 %

  3. Yes. Both sensitivity and specificity are smaller than one. However, without correcting for the bias, it is not clear that the amount of bias will be large.

3.5 Q9.5

  1. Sensitivity = 480/600 = .80 or 80 % and Specificity = 380/400 = .95 or 95 %.

  2. Sensitivity = 240/300 = .80 or 80 % and Specificity = 665/700 = .95 or 95 %.

  3. The sensitivities for CHD cases and non-cases are equal. Also, the specificities for CHD cases and non-cases are equal. The sensitivity information indicates that 20 % of both cases and non-cases with low intake of fruits and vegetables tend to over-estimate their intake. The specificity information indicates that only 5 % of both cases and non-cases with high intake tend to under-estimate their intake.

  4. In the correctly classified 2x2 table, a = 600, b = 400, c = 300, and d = 700, so the estimated odds ratio is ad/bc = (600 x 700) / (400 x 300) = 3.5.

  5. The observed OR of 2.6 that results from misclassifying exposure is meaningfully different from the true odds ratio of 3.5.

  6. The bias is towards the null. The biased OR of 2.6 is closer to the null value of 1 than the correct OR.

3.6 Q9.6

  1. Sensitivity = 580/600 = .97 or 97 % and Specificity = 380/400 = .95 or 95 %.

  2. Sensitivity = 240/300 = .80 or 80 % and Specificity = 665/700 = .95 or 95 %.

  3. No. Although the specificities for cases and non-cases are equal (i.e., 95 %), the sensitivity for the cases (97 %) is quite different from the sensitivity for the non-cases (80 %). This difference in sensitivities indicates that cases with low intake of fruits and vegetables are less likely to over-estimate their intake than non-cases.

  4. Not much. The observed OR of 3.95 that results from misclassifying exposure is slightly higher than the true odds ratio of 3.50.

  5. The bias is slightly away from the null. The biased OR of 3.95 is further away from the null value of 1 than is the correct OR of 3.5.
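The observed odds ratios of 2.6 (Q9.5) and 3.95 (Q9.6) can be reproduced from the correctly classified table (a = 600, b = 400, c = 300, d = 700) and the sensitivities and specificities quoted in the answers. The Python sketch below is illustrative only; the function names are hypothetical, and "exposed" is used here as a generic label for the exposure category whose correct classification the sensitivity describes.

```python
def misclassify_exposure(true_exposed, true_unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts within cases or controls."""
    observed_exposed = sensitivity * true_exposed + (1 - specificity) * true_unexposed
    observed_unexposed = true_exposed + true_unexposed - observed_exposed
    return observed_exposed, observed_unexposed


def odds_ratio(a, b, c, d):
    """ad/bc with a, b = exposed, unexposed cases and c, d = exposed, unexposed controls."""
    return (a * d) / (b * c)


true_or = odds_ratio(600, 400, 300, 700)

# Controls are classified the same way in both scenarios.
c_obs, d_obs = misclassify_exposure(300, 700, sensitivity=240 / 300, specificity=0.95)

# Q9.5: cases share the controls' sensitivity and specificity (nondifferential).
a_nd, b_nd = misclassify_exposure(600, 400, sensitivity=480 / 600, specificity=0.95)
or_nondifferential = odds_ratio(a_nd, b_nd, c_obs, d_obs)

# Q9.6: cases have a higher sensitivity than controls (differential).
a_d, b_d = misclassify_exposure(600, 400, sensitivity=580 / 600, specificity=0.95)
or_differential = odds_ratio(a_d, b_d, c_obs, d_obs)

print("true OR:", round(true_or, 2))
print("nondifferential observed OR:", round(or_nondifferential, 2))
print("differential observed OR:", round(or_differential, 2))
```

Changing only the cases' sensitivity moves the observed odds ratio from below the true value to above it, which is why the direction of bias cannot be assumed when misclassification is differential.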

3.7 Q9.7

  1. No. Assuming independent misclassification is not equivalent to assuming nondifferential misclassification. The latter assumes that how a subject classifies exposure will not vary with their true disease status, i.e., Pr(classifying disease status|truly E) = Pr(classifying disease status|truly not E) or Pr(classifying exposure status|truly D) = Pr(classifying exposure status|truly not D).

  2. Differential because (Sensitivity D | E) = .8 is not equal to (Sensitivity D | not E) = .75.

  3. Yes, the following are missing:

     (Specificity D | E)

     (Specificity D | not E)

     (Specificity E | not D)

  4. Pr(D′ E′ | D E) = Pr(D′ | D E) x Pr(E′ | D E) = (Sensitivity D | E) x (Sensitivity E | D)

  5. Pr(D′ E′ | D E) = .8 x .9 = .72.

  6. Pr(D′ E′ | D not E) = Pr(D′ | D not E) x Pr(E′ | D not E) = (Sensitivity D | not E) x (1 – (Specificity E | D)).

  7. Pr(D′ E′ | D not E) = .75 x (1 - .95) = .0375.
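The products in answers 4 to 7 rest on the independence assumption: the probability of being classified as both diseased and exposed is the product of the two separate classification probabilities. A small Python sketch of that arithmetic follows; the variable names are hypothetical, and the numeric values are the ones used in the answers.

```python
# Classification probabilities taken from the answers above.
sens_d_given_e = 0.80      # Pr(classified D | truly D, truly E)
sens_d_given_not_e = 0.75  # Pr(classified D | truly D, truly not E)
sens_e_given_d = 0.90      # Pr(classified E | truly E, truly D)
spec_e_given_d = 0.95      # Pr(classified not E | truly not E, truly D)


def joint_probability(p_classified_d, p_classified_e):
    """Independent misclassification: joint probability is the product."""
    return p_classified_d * p_classified_e


# Truly diseased and exposed, classified as diseased and exposed.
p_from_d_e = joint_probability(sens_d_given_e, sens_e_given_d)

# Truly diseased and unexposed, classified as diseased and exposed:
# the exposure error is a false positive, with probability 1 - specificity.
p_from_d_not_e = joint_probability(sens_d_given_not_e, 1 - spec_e_given_d)

print(round(p_from_d_e, 4), round(p_from_d_not_e, 4))
```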

3.8 Q9.8

  1. False – in addition to the sensitivities, if the specificities for both exposed and unexposed are the same, then the bias must be towards the null.

  2. True

  3. True

3.9 Q9.9

  a) Away

  b) Towards

  c) Towards

  d) Away

  e) Away

  f) Away

  g) Towards

  h) Towards

  1. It depends. We know that the bias must be towards the null. If the direction of the bias is all that we are interested in, then we do not need to correct for the bias. However, if we want to determine the extent of the bias and to obtain a quantitative measure of the true effect, then we need to correct for the bias.

  2. We can either reason that misclassification is nondifferential from our knowledge or experience with the exposure and disease variables of our study, or we can base our decision on reliable estimates of the sensitivity and specificity parameters.

  3. It depends. The bias may be either towards the null or away from the null. We might be able to determine the direction of the bias by logical reasoning about study characteristics. Otherwise, the only way we can determine either the extent or direction of the bias is to compare a corrected estimate with an observed estimate.

  4. The biased (i.e., misclassified) observed odds ratio is closer to the null than the corrected odds ratio.

  5. The greatest amount of bias is seen when the observed OR of 1.5 is compared to the corrected OR of 3.5, which occurs when both the sensitivity and specificity are 80 %.

  6. The bias is smallest when the corrected OR is 1.8, which results when both sensitivity and specificity are 90 %.

  7. One way to decide is to choose the corrected OR corresponding to the most realistic set of values for sensitivity and specificity. Another way is to choose the corrected OR (here, 3.5) that is most distant from the observed OR. A third alternative is to choose the corrected OR that changes least (here, 1.8) from the observed OR.

3.10 Q9.10

  1. For males, SeD = 32/40 = 80 % and SpD = 54/60 = 90 %.

  2. For females, SeD = 16/20 = 80 % and SpD = 72/80 = 90 %.

  3. Nondifferential: The SeD for males and females are equal at 80 % and the SpD for males and females are equal at 90 %.

  4. RR(adjusted) = (A/1000)/(B/1000) = (400/1000)/(200/1000) = 2.0.

  5. The bias is towards the null because the biased risk ratio estimate of 1.58 is closer to the null value than is the corrected risk ratio.

  6. q will be zero if the SeD and SpD add up to 1. For example, if both the SeD and SpD equal .5, then q = 0. In this case there would be an equal chance of being misclassified into any one of the four cells of the 2x2 tables. There would be no point in computing corrected effect estimates for such a situation, since misclassification would have completely invalidated one’s study results.
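The role of q in answers 4 and 6 can be made explicit with a short derivation. This is a sketch under assumptions: q is taken to mean SeD + SpD − 1, which is consistent with answer 6; A* and N₁ are notation introduced here for the observed number of cases and the group size among the exposed; and the observed counts of 380 and 240 per 1000 are those in the observed risk ratio (380/1000)/(240/1000) = 1.58 quoted in the answers to Q9.3.

```latex
\[
\begin{aligned}
A^{*} &= Se_D \, A + (1 - Sp_D)\,(N_1 - A) \\[4pt]
A &= \frac{A^{*} - (1 - Sp_D)\, N_1}{Se_D + Sp_D - 1}
   = \frac{A^{*} - (1 - Sp_D)\, N_1}{q} \\[4pt]
A &= \frac{380 - 0.10 \times 1000}{0.80 + 0.90 - 1} = \frac{280}{0.70} = 400,
\qquad
B = \frac{240 - 0.10 \times 1000}{0.70} = \frac{140}{0.70} = 200
\end{aligned}
\]
```

The first line says that observed cases are correctly detected true cases plus false positives among true non-cases; solving for A gives the second line. When q = 0 the division is undefined, which is the indeterminate situation described in answer 6.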

3.11 Q9.11

  1. SeE = 48/60 = 80 % and SpE = 133/140 = 95 %.

  2. You would need to stratify the misclassification table into two tables, one for cases and the other for controls, and then determine whether corresponding sensitivities and specificities for cases and controls were equal. Without such information, you might be able to reason that since both cases and controls have cancer that is gastrointestinal, they may tend to have similar reporting tendencies about PEU history. Such an argument suggests that misclassification is likely to be nondifferential.

  3. Note: there is no question 3 in this section.

  4. OR(adjusted) = (A x D)/(B x C) = (80 x 280)/(220 x 20) = 5.1.

  5. The bias is towards the null, because the biased odds ratio estimate of 3.0 is closer to the null than the corrected odds ratio of 5.1.

3.12 Q9.12

  1. Stratify the classification information on pollution level by true illness status, and stratify the classification information on illness by true pollution level.

  2. Not really because the true stratified misclassification information involving the true illness status and pollution level is not provided.

  3. Non-differential misclassification is assumed since no stratum-specific sensitivity or specificity values are provided in the sub-sample or the previous pollution study to be applied here. The non-differential misclassification assumption does appear reasonable. It is unlikely that illness would be misreported one week later according to pollution level, or that water quality was measured incorrectly or misreported according to illness level.

  4. It is reasonable to assume independent classification since the exposure and disease variables were measured at different times and likely by different investigators.

  5. RR(corrected) = (339.3/416.7)/(410.8/2083.4) = 4.1

  6. The bias is towards the null because the biased risk ratio estimate of 1.6 is closer to the null value than is the corrected risk ratio of 4.1.

  7. The biased estimate of 1.6 jumps quite a lot to 4.1 when corrected. Such a large jump from biased to correct estimate often occurs when both disease and exposure are misclassified, even when the sensitivity and specificity parameters are close to 100 %.

  8. If either the sensitivity and specificity for disease sums to 1 or the sensitivity and specificity for exposure sums to 1, then the corrected cell frequencies are undefined because q* = 0.

3.13 Q9.13

  1. 7.4 – To answer this question, you will need to use the appropriate formula to correct for nondifferential misclassification of disease; the corrected table is:

| | E | Not E | Total |
|---|---|---|---|
| D | 68 | 12 | 80 |
| Not D | 52 | 68 | 120 |
| Total | 120 | 80 | 200 |

  2. towards

  3. true – If the sum of the specificity and sensitivity equals 1, you will obtain indeterminate results.

  4. true

  5. false – Since the sensitivity and specificity sum to 1 for either disease or exposure, the results will be indeterminate.

3.14 Q9.14

  1. Differential because the sensitivities of 96.7 % and 80 % are different, even though the specificities are the same.

  2. A CHD case, who might be concerned about the reasons for his or her illness, is not as likely to over-estimate his or her intake of fruits and vegetables as is a control.

  3. OR(corrected) = (599.7 x 700)/(400.2 x 300) = 3.5. This is the same value that we previously obtained for the true odds ratio in our previous presentation about differential misclassification that showed how to obtain observed cell frequencies when starting out with the true cell frequencies.

  4. The bias is away from the null because the biased odds ratio estimate of 3.95 is further away from the null value than the corrected odds ratio of 3.5.

3.15 Q9.15

  1. 3.5 – See the appropriate formulas required to calculate this estimate. A = 600, B = 400, C = 300, D = 700.

  2. Away from – Since the observed OR is further from the null than the adjusted estimate, the observed estimate must be away from the null.

3.16 Q9.16

  1. Sensitivity = 48 / 60 = 0.80.

  2. The patient is very unlikely to have the disease, since the probability of getting a negative test result for a patient with the disease is .01, which is very small.

  3. Specificity = 126 / 140 = 0.90

  4. The patient is very likely to have the disease, because the probability of getting a positive result for a patient without the disease is .01, which is very small.

  5. Prevalence of true disease = 60 / 200 = 0.30.

  6. Cannot fully answer this question. Both the sensitivity and specificity are relatively high at .80 and .90, but the prevalence is only 30 %. What is required is the proportion of total ultrasound positives that truly have DVT, which in this study is 48 / 62 = 0.77, which is high but not over .90 or .95.

3.17 Q9.17

  1. Choice A is the predictive value and Choice B is sensitivity.

  2. PV+ = 48 / 62 = 0.77

  3. Based on the table, the prior probability of developing DVT is 60 / 200 = 0.30, which is the estimated prevalence of disease among patients studied.

  4. Yes, the prior probability was 0.30, whereas the (post-test) probability using an ultrasound increased to 0.77 given a positive result on the test.

  5. PV- = 126 / 138 = 0.91

  6. Based on the table, the prior probability of not developing DVT is 140 / 200 = 0.70, which is 1 minus the estimated prevalence of disease among patients studied.

  7. Yes, the prior probability of not developing DVT was 0.70 whereas the (post-test) probability of not developing DVT using an ultrasound increased to 0.91 given a negative test result.

  8. Sensitivity = 16 / 20 = 0.80, specificity = 162 / 180 = 0.90, prevalence = 20 / 200 = 0.10.

  9. Corresponding sensitivity and specificity values are identical in both tables, but the prevalence computed for these data is much lower at 0.10 than that computed for the previous table (0.30).

  10. PV+ = 16 / 34 = 0.47 and PV- = 162 / 166 = 0.98.

  11. PV+ has decreased from 0.77 to 0.47 and PV- has increased from 0.91 to 0.98 as the prevalence has dropped from 0.30 to 0.10, while sensitivity and specificity have remained the same and high.

  12. If the prevalence decreases, the predictive value positive will decrease and may be quite low even if sensitivity and specificity are high. Similarly, the predictive value negative will increase and may be very high, even if the sensitivity and specificity are not very high.
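The comparison in answers 8 to 12 can be checked directly from cell counts. In the Python sketch below the function name is hypothetical, and the four cells of each table are inferred from the fractions quoted in the answers (for example, 48/60, 126/140, 48/62, and 126/138 imply 48 true positives, 12 false negatives, 14 false positives, and 126 true negatives).

```python
def diagnostic_summary(true_pos, false_neg, false_pos, true_neg):
    """Sensitivity, specificity, prevalence, and predictive values from a 2x2 table."""
    total = true_pos + false_neg + false_pos + true_neg
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "prevalence": (true_pos + false_neg) / total,
        "pv_positive": true_pos / (true_pos + false_pos),
        "pv_negative": true_neg / (true_neg + false_neg),
    }


# Cell counts inferred from the fractions in the answers (an assumption).
higher_prevalence = diagnostic_summary(true_pos=48, false_neg=12,
                                       false_pos=14, true_neg=126)
lower_prevalence = diagnostic_summary(true_pos=16, false_neg=4,
                                      false_pos=18, true_neg=162)

for label, summary in [("prevalence 0.30", higher_prevalence),
                       ("prevalence 0.10", lower_prevalence)]:
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

Sensitivity and specificity are identical in the two tables, so the change in predictive values is driven entirely by the change in prevalence.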

3.18 Q9.18

  1. 90 %

  2. 90 %

  3. 33.3 %

  4. 81.8 %

  5. 16.7 %

  6. 64.3 %

  7. smaller

  8. 10 %

  9. 50 %

  10. smaller

  11. small, high


© 2013 Springer Science+Business Media New York


Kleinbaum, D.G., Sullivan, K.M., Barker, N.D. (2013). Information Bias. In: ActivEpi Companion Textbook. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5428-1_9
