Statistics
DEFINITIONS
• General definitions
Variable: anything that is measured or manipulated in an experiment.
Independent variable—one varied by and under the control of the experimenter.
Dependent variable—one that responds to manipulation.
Nominal variable—a named category, for example: sex, diagnosis.
Ordinal variable: a set of ordered categories, for example, stages of cancer, which are ordered although the size of the difference between steps is not known.
Interval variable: a measurement in which the step between values is meaningful, for example, temperature, age.
Ratio variable: an interval variable with a true zero, so that the ratio of two values has meaning, for example, weight.
Parametric—data that follow a normal distribution.
Nonparametric—data that do not follow a normal distribution (nominal and ordinal).
Incidence—current number of new events/population at risk in same time interval.
Prevalence: total number of existing events/population at risk at a given time. For a chronic condition, prevalence is usually higher than incidence because cases accumulate over time.
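The difference between incidence and prevalence can be illustrated with a small calculation. This is a minimal sketch; the function names and the case counts are hypothetical and are not taken from any study.

```python
def incidence(new_cases, population_at_risk):
    """New events in a time interval divided by the population at risk."""
    return new_cases / population_at_risk


def prevalence(existing_cases, population_at_risk):
    """All existing events at a given time divided by the population at risk."""
    return existing_cases / population_at_risk


# Hypothetical figures: 50 new cases in one year, 400 existing cases in total
yearly_incidence = incidence(new_cases=50, population_at_risk=10_000)
point_prevalence = prevalence(existing_cases=400, population_at_risk=10_000)

print(f"incidence:  {yearly_incidence:.3f} per person per year")
print(f"prevalence: {point_prevalence:.3f}")
```

In this made-up example the prevalence exceeds the incidence because existing cases persist from earlier years.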
• Measures of central tendency
Mode—value most often reported.
Median—value with half the responses below and half above (nonparametric).
Mean—average of all values.
• Measures of dispersion
Standard deviation (SD): the square root of the variance. The smaller the SD, the less each score varies from the mean. For normally distributed data, about 68% of values fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.
Variance: the average of the squared differences from the mean: Σ(value of point − mean)²/total number of data points.
Range—the difference between the highest value and the lowest value.
Percentile: the percentage of results that fall at or below a given result, on a scale of 100.
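The measures of central tendency and dispersion above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` sample and the helper `share_within` are hypothetical and used only for illustration. The variance here divides by the total number of data points, as in the definition above.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of scores, chosen only for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
variance = statistics.pvariance(scores)  # mean squared difference, divides by N
sd = statistics.pstdev(scores)           # square root of the variance
value_range = max(scores) - min(scores)


def share_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(mean, median, mode, variance, sd, value_range)
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

When the data are a sample used to estimate a population value, `statistics.variance` and `statistics.stdev` (which divide by N − 1) are the usual alternative.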
METHODS TO ANALYZE DATA
There are two methods to analyze data. Descriptive statistics communicate results but do not generalize beyond the sample. Inferential statistics communicate the likelihood that observed differences occurred by a chance combination of unforeseen variables.
• Null hypothesis: by statistical convention, it is assumed that the speculated hypothesis is wrong and that the observed phenomena simply occur by chance. It is this null hypothesis that the test either rejects (nullifies) or fails to reject. When the null hypothesis is rejected, it is possible to conclude that the data support the alternative hypothesis.
• Significance level: the significance level (alpha) is the probability threshold, chosen in advance (commonly 0.05), below which the null hypothesis is rejected. The test yields a p value; the smaller the p value, the less likely it is that the phenomena in question could have been produced by chance alone.
• Statistics for inference (hypothesis) testing
Confidence intervals (CIs): used to indicate the reliability of an estimate. The confidence level is 1 − alpha; for example, alpha = 0.05 gives a 95% CI.
Standard error (SE): used to help determine whether the result is true or occurs by chance. SE = SD/square root of sample size. Error in a measurement can be either systematic, where the same wrong measure is taken each time, or random, where the answer is different each time the experiment is run.
Margin of error—the amount the results are expected to change from one experiment to another.
Central limit theorem (CLT): if the sample size is sufficiently large (a common rule of thumb is n ≥ 30), the sample mean will be approximately normally distributed regardless of the original distribution. This theorem allows the parametric assessment of nonparametric data.
Z test—compares the sample mean with the known population mean.
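The standard error, confidence interval, and Z test described above can be sketched together. This is a minimal illustration, not a definitive implementation: the function names and the numbers are hypothetical, the CI uses a normal critical value (which assumes a known SD or a large sample), and the Z test is two-sided.

```python
import math
from statistics import NormalDist


def standard_error(sd, n):
    """SE = SD / square root of sample size."""
    return sd / math.sqrt(n)


def confidence_interval(sample_mean, sd, n, alpha=0.05):
    """(1 - alpha) CI for the mean, using a normal critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * standard_error(sd, n)  # margin of error
    return sample_mean - margin, sample_mean + margin


def z_test(sample_mean, population_mean, population_sd, n):
    """Two-sided Z test of a sample mean against a known population mean."""
    z = (sample_mean - population_mean) / standard_error(population_sd, n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical values: sample mean 105, population mean 100, SD 15, n = 36
se = standard_error(sd=15, n=36)
low, high = confidence_interval(sample_mean=105, sd=15, n=36)
z, p = z_test(sample_mean=105, population_mean=100, population_sd=15, n=36)

print(f"SE = {se:.2f}, 95% CI = ({low:.1f}, {high:.1f})")
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the p value falls just below 0.05, so the null hypothesis would be rejected at alpha = 0.05.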
• Sensitivity: sensitivity relates to the test’s ability to identify positive results. The sensitivity of a test is the proportion of people who have the disease who test positive for it. For example, a sensitivity of 100% means that the test recognizes all actual positives—that is, all sick people are recognized as being ill. Thus, in contrast to a high-specificity test, a negative result in a high-sensitivity test is used to rule out the disease.
This can be written as follows:
Sensitivity = Number of true positives/(Number of true positives + Number of false negatives) or
True positives/All with disease
If a test has high sensitivity, then a negative result would suggest the absence of disease.
• Specificity: specificity relates to the ability of the test to identify negative results. The specificity of a test is defined as the proportion of patients who do not have the disease who will test negative for it. This can also be written as follows:
Specificity = Number of true negatives/(Number of true negatives + Number of false positives) or
True negatives/All without disease.
If a test has high specificity, then a positive result would suggest the presence of disease.
• Positive predictive value (PPV): this value is the probability that a subject with a positive test result truly has the condition being tested for.
• Negative predictive value (NPV): this value is the proportion of subjects with a negative test result who are correctly diagnosed. A high NPV means that when the test yields a negative result, it is most likely correct in its assessment (Table 9.1).
| | Disease positive | Disease negative |
|---|---|---|
| Positive exp/screen | A | B |
| Negative exp/screen | C | D |

| Measure | Formula |
|---|---|
| Sensitivity (true positives/all with disease) | A/(A + C) |
| Specificity (true negatives/all without disease) | D/(B + D) |
| PPV | A/(A + B) |
| NPV | D/(C + D) |

NPV, negative predictive value; PPV, positive predictive value.
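The four formulas in the table can be applied directly to the cell counts A, B, C, and D. This is a minimal sketch; the function name `diagnostic_metrics` and the counts are hypothetical and are used only to show the arithmetic.

```python
def diagnostic_metrics(a, b, c, d):
    """Compute test characteristics from a 2 x 2 table.

    a: test positive, disease positive (true positives)
    b: test positive, disease negative (false positives)
    c: test negative, disease positive (false negatives)
    d: test negative, disease negative (true negatives)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical screening results for 1,000 subjects, 100 of whom have the disease
metrics = diagnostic_metrics(a=90, b=30, c=10, d=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are computed down the columns of the table (by disease status), whereas PPV and NPV are computed across the rows (by test result), so the predictive values change with how common the disease is in the tested group.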