The incorporation of new technologies into medical practice has been slower than in other industries. Medical culture, while aggressively developing new approaches to serious problems, has also been notoriously resistant to adopting them. The timing of adoption of new techniques varies widely, with physicians, institutions, and countries ranging across a spectrum from “early adopters” to “late adopters.” Many factors underlie this variability, including technological capabilities, the resources available to implement new technologies, their costs and benefits, return on investment, and the perceived reductions in and exposures to liability that such moves bring.1,2
A technology usually replaces another only when two requirements are met: the new technology has been reasonably vetted and found to be better or cheaper than the existing one, and clinicians become uncomfortable staying with the old approach. With few exceptions, there is never universal agreement that a new technology should immediately replace the old one, just as there is rarely universal acceptance that a new paradigm should replace an older one.
Acceptance depends on a combination of factors that must all be in place for the process to move forward, including perception in the community, solutions to practical problems of implementation, and technical assessments of the evidence. Currently debated, disruptive developments in obstetrics include the use of cell-free fetal DNA (cffDNA) versus diagnostic procedures with microarray analysis, as well as pan-ethnic carrier screening.3,4,5 Use of cffDNA has grown faster than that of any other recent technology. It clearly identifies a higher percentage of fetuses with Down syndrome, but at the cost of abandoning diagnostic procedures whose samples, analyzed by microarray, could detect a far greater number of serious disorders.4,6 The gap will widen further as whole exome and whole genome sequencing come online over the next several years.7 Conversely, basic carrier screening, let alone pan-ethnic carrier screening, is underused. Most clinicians are not aware that even in well-defined risk groups, such as the Ashkenazi Jewish population, pan-ethnic screening identifies more carriers of conditions outside the typical Ashkenazi panel than of conditions within it.8
New screening approaches need not involve new technologies per se. Our work on fetal monitoring has shown that incorporating other, already known variables, such as increased uterine contractions and the presence of maternal, fetal, and obstetric risk factors, can significantly improve the performance metrics of screening.9,10,11
Resistance to change is likewise multifactorial in scope and intensity. For example, after almost all experts had agreed on the utility of antenatal steroids for fetal lung maturity, it still took many years for their use to become the standard of care.12 More recently, the debate over preeclampsia screening has been intense. National bodies, such as the American College of Obstetricians and Gynecologists (ACOG), are typically late to encourage adoption because, as many critics explain, such endorsement could create liability exposure for clinicians who are slow to adopt the new technologies.13
In the 1990s, ACOG warned its membership that failure to offer maternal serum alpha-fetoprotein (MSAFP) screening, in which low values signal an increased risk of Down syndrome, could create medicolegal exposure. The warning was intended to protect the membership but had the effect of making such screening a standard of care.14
The practice of medicine involves routine use of both diagnostic and screening tests, and obstetrics relies on them much more than most specialties do. Although the Pap smear, a screening test, has been standard practice for nearly a century, most patients and, frankly, many physicians do not understand the difference between screening and diagnostic tests15 (Table 15.1). Diagnostic tests are meant to give a definitive answer; they may carry risks, may be invasive, may be expensive, and are meant only for patients whose risk is high enough to warrant them. Screening tests, conversely, are meant for everyone: they separate those at high enough risk to warrant diagnostic testing from those who are not. Screening tests do not give definitive answers.16 How well a screening test does its job is described by four metrics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value.
Table 15.1 Characteristics of Screening Tests Versus Diagnostic Tests
Diagnostic Tests
Performed only on at-risk population
Commonly expensive
Commonly have risk
Results give definitive answer
Screening Tests
Offered to general population of patients
Healthy patients
Inexpensive
Easy to perform
Reliable
Rapid return of results
Identify at-risk population
Results do not give definitive answer
The principles for evaluating test performance were introduced into medical practice in the 1970s by Galen and Gambino.16 These performance characteristics define a common playing field and scoring system on which competing approaches can be evaluated.
Up to 10 criteria are generally considered necessary before deciding, from a public policy perspective, to screen for a condition.17 We have focused on seven salient, generalizable ones15 (Table 15.2). However, not all screening tests currently in use actually follow these guidelines. Variable criteria for use can lead to disproportionate expectations, expenditures, and complications from follow-up diagnostic testing that are likely unwarranted. Whereas the individual patient and physician are interested in the outcome for a specific person (PPV and negative predictive value), a screening program is population based (sensitivity and specificity). Its goal is to detect the maximum number of affected individuals while calling the fewest people screen positive. Where to place the cutoff is arbitrary, but once it has been chosen to maximize efficiency, it must be maintained. Accordingly, a program cannot be judged by whether any particular patient’s problem was or was not identified.16
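The cutoff trade-off described above can be made concrete with a minimal Python sketch. It assumes a hypothetical continuous marker that is normally distributed in both affected and unaffected pregnancies; the means and standard deviations are invented for illustration and do not describe any real assay. Raising the cutoff calls fewer unaffected people screen positive but also detects fewer affected ones; fixing the false positive rate (5% is a common reference point) is one way to decide where to place it.

```python
from statistics import NormalDist

# Hypothetical marker distributions (arbitrary units), not real assay data.
UNAFFECTED = NormalDist(mu=0.0, sigma=1.0)
AFFECTED = NormalDist(mu=2.0, sigma=1.0)


def rates_at_cutoff(cutoff: float) -> tuple[float, float]:
    """Detection rate and false positive rate when values >= cutoff screen positive."""
    detection = 1.0 - AFFECTED.cdf(cutoff)
    false_positive = 1.0 - UNAFFECTED.cdf(cutoff)
    return detection, false_positive


def cutoff_for_fpr(target_fpr: float) -> float:
    """Cutoff that calls exactly target_fpr of unaffected people screen positive."""
    return UNAFFECTED.inv_cdf(1.0 - target_fpr)


for target in (0.01, 0.05, 0.10):
    cutoff = cutoff_for_fpr(target)
    detection, fpr = rates_at_cutoff(cutoff)
    print(f"FPR {fpr:.0%}: cutoff {cutoff:.2f}, detection rate {detection:.0%}")
```

Under these made-up distributions, moving from a 1% to a 10% false positive rate roughly doubles the detection rate; the point is the shape of the trade-off, not the specific numbers.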
Diagnosing patients with disease usually involves tests or procedures performed on persons believed to be at increased risk. Procedures to determine such status may include clinical examinations, laboratory testing, minor invasive procedures (such as obtaining blood), or even major surgical ones. Only a small portion of the overall population generally has enough risk to justify expensive or significant invasive procedures.
Particularly for genetic disorders, there are often population subgroups known to be at disproportionately high risk. Well-known examples include women of advanced maternal age (AMA) for Down syndrome, people of Ashkenazi Jewish heritage for Tay-Sachs disease (TSD), people of African heritage for sickle cell disease, and numerous others.8 However, for many disorders, although each individual in the high-risk category has a higher risk than anyone in the low-risk category, the majority of affected individuals actually come from the low-risk group when the high-risk category makes up only a small proportion of the population.3 Particularly with the advent of increasingly sophisticated molecular technologies, we can now look for literally thousands of potential disorders in an individual who may be completely asymptomatic.21
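The arithmetic behind this point is a weighted sum: each group contributes its share of the population multiplied by its per-person risk. The following Python sketch uses made-up figures (a high-risk group making up 10% of the population, with four times the per-person risk of everyone else) purely for illustration.

```python
def share_of_affected(groups):
    """Fraction of all affected individuals contributed by each group.

    groups: iterable of (name, fraction_of_population, per_person_risk).
    """
    expected_cases = {name: fraction * risk for name, fraction, risk in groups}
    total = sum(expected_cases.values())
    return {name: cases / total for name, cases in expected_cases.items()}


# Hypothetical population: 10% high risk (0.4% risk), 90% low risk (0.1% risk).
groups = [("high risk", 0.10, 0.004), ("low risk", 0.90, 0.001)]
for name, share in share_of_affected(groups).items():
    print(f"{name}: {share:.0%} of affected individuals")
```

Despite its fourfold higher individual risk, the high-risk group here contributes under a third of all cases, so a program that tests only that group misses most affected individuals.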
Screening test results are by definition not pathognomonic for the disease15; rather, they delineate who needs further testing. With regard to genetic diseases, for example, asking a patient “How old are you?” is nothing more than a cheap screening test. Historically, using a maternal age of 35 years as the cutoff detected only 30% of chromosomal abnormalities such as Down syndrome, because that was the proportion occurring in women older than 35 years (before the large increase in fertility treatments raised the number of “older women” having their own children). In the United States, the AMA group has routinely been offered diagnostic testing. Combining screening methodologies can now raise the detection of certain chromosomal abnormalities dramatically, from 30% (by age alone) to about 80% to 90%,4 but the principle has been the same for decades: screen widely, routinely, and cheaply (if that is technologically, programmatically, logistically, and fiscally possible), and then perform more accurate follow-up tests. Changes in technological capabilities, however, are challenging that approach, specifically for aneuploidy and copy number variants (CNVs).4,17
Conditions should have a carrier frequency of 1 in 100 or greater | Conditions should be severe enough that at-risk couples would consider having a prenatal diagnosis | Condition should cause cognitive disability, necessitate surgical or medical intervention, and/or have an effect on quality of life
Conditions should have a well-defined phenotype | Conditions with variable expressivity, incomplete penetrance, or mild phenotype should be optional | Prenatal diagnosis can lead to prenatal intervention, delivery management, and/or prenatal education of parents
Conditions should have a detrimental effect on quality of life | Patients should provide consent for any adult-onset disorders tested | Exclude conditions with adult onset, low penetrance, and those that cannot effectively be identified by molecular techniques
Conditions should cause cognitive or physical impairment | Causative gene(s), mutations, and mutation frequencies should be known in the population being tested | —
Conditions should require surgical or medical intervention | Validated clinical association between the mutation(s) detected and the severity of the disorder should exist | —
Conditions should have an onset early in life | — | —
Conditions can be diagnosed prenatally | — | —
ACMG, American College of Medical Genetics and Genomics; ACOG, American College of Obstetricians and Gynecologists.
Key Measures of Screening Tests: Sensitivity, Specificity, PPV, and Negative Predictive Value
Four key measures are used to evaluate screening tests: sensitivity, specificity, PPV, and negative predictive value15,16 (Figure 15.1). Sensitivity and specificity are fundamentally epidemiologic questions. Sensitivity answers the question: of all the people with a condition, what percentage were identified by the test? Specificity answers the converse: of all the people who do not have the condition, what percentage test negative? Physicians, however, are generally more interested in different questions, because usually only a patient with a positive test receives additional follow-up care. The question then becomes: of all patients who have a positive test, what percentage actually have the disease? This is the PPV. The negative predictive value is the mirror image: of all the people who have a negative test, what percentage are actually free of the disease?
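These four definitions map directly onto the cells of a 2x2 table of test result versus true status. Here is a minimal Python sketch; the counts (100 affected and 9,900 unaffected people) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # affected, test positive
    fp: int  # unaffected, test positive
    fn: int  # affected, test negative
    tn: int  # unaffected, test negative

    @property
    def sensitivity(self) -> float:
        # Of all people with the condition, the fraction the test identified.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Of all people without the condition, the fraction testing negative.
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Of all positive tests, the fraction that are truly affected.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Of all negative tests, the fraction that are truly unaffected.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=90, fp=495, fn=10, tn=9405)
print(f"sensitivity {table.sensitivity:.1%}, specificity {table.specificity:.1%}")
print(f"PPV {table.ppv:.1%}, NPV {table.npv:.2%}")
```

With these counts the test is 90% sensitive and 95% specific, yet only about 15% of positive results are true positives, which is why the PPV matters to the patient who has just screened positive.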
Figure 15.1 Screening metrics.
In principle, sensitivity and specificity do not vary with prevalence unless other factors influence the test; PPV and negative predictive value, however, do. This was particularly relevant in the mid-1980s, when HIV testing first became a subject of public debate. One of the Reagan White House’s suggestions was mandatory testing of traditional male/female couples about to marry. In a population with very low prevalence, a much higher proportion of positive results will be false positives than in a population with very high prevalence, where a large proportion of positives will in fact be true positives. In both high- and low-prevalence areas, the sensitivity and specificity of the tests should be the same, but the positive and negative predictive values will differ widely. If a test is completely useless, the probability of disease after the test will be the same as the population risk before it. Some tests have performed even worse than that, with less than a coin flip’s chance of determining the correct outcome.
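The dependence of predictive values on prevalence follows from Bayes’ theorem. A minimal Python sketch, assuming a hypothetical test that is 99% sensitive and 99% specific (not the performance of any actual HIV assay), applied to populations with different prevalences:

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical test (99% sensitive, 99% specific) in three populations:
# only the predictive values change.
for prevalence in (0.0001, 0.01, 0.20):
    ppv, npv = predictive_values(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}, NPV {npv:.3%}")

# A useless test (sensitivity + specificity = 1) leaves risk where it started:
# its PPV equals the pretest prevalence.
useless_ppv, _ = predictive_values(0.30, 0.70, 0.05)
print(f"useless test: PPV {useless_ppv:.1%} at 5.0% prevalence")
```

At 1 in 10,000, only about 1% of positives are true positives, whereas at 20% prevalence roughly 96% are, with the test itself unchanged.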
Newer tests are developed to refine the sensitivity and specificity of screening and to reduce the overall costs of screening programs per se. The goal is to reduce the expense of the invasive testing that follows a positive screen. In addition, although it is not often mentioned, a good prenatal screening test will in practice reduce the cost of caring for affected newborns, because affected pregnancies may be detected by screening and, at the wishes of the parents, terminated.21,22,23 Changing attitudes in society and the declining paternalism of medicine are bringing new tests onto the market; some, such as ancestry determination, are now available to the public without medical supervision and are aggressively marketed. It remains to be seen how such screening tests will be received, used, and misused.
Use and Misuse of Statistics
The use and misuse of statistical data to justify particular approaches to medical care have been omnipresent for decades. In the 1970s, Galen and Gambino were the first to show that properly applying statistical principles to the interpretation of laboratory data could significantly improve the quality of clinical care.16 However, the same statistics have also been abused to convince clinicians and patients of the merits of less than optimal therapies. For example, for a disorder of low prevalence, even a great test will have a low PPV, and the negative predictive value will be extremely high even before any test is done. If the population prevalence of disease X is 2%, then simply saying “hello” to the patient will be associated with a 98% negative predictive value.
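The “hello” example can be checked with the same Bayes arithmetic. In this minimal Python sketch, the 2% figure comes from the text; the 99%-sensitive, 99%-specific comparison test and the 1 in 1,000 prevalence are hypothetical. A non-test that calls everyone negative has sensitivity 0 and specificity 1, so it never produces a positive result and its PPV is undefined.

```python
from typing import Optional


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[Optional[float], Optional[float]]:
    """Return (PPV, NPV); a value is None when the test never gives that result."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else None
    npv = true_neg / (true_neg + false_neg) if true_neg + false_neg > 0 else None
    return ppv, npv


# Saying "hello": everyone is called negative, at 2% prevalence.
_, npv_hello = predictive_values(0.0, 1.0, 0.02)
# A hypothetical excellent test at the same prevalence.
_, npv_good = predictive_values(0.99, 0.99, 0.02)
# The same hypothetical test for a rarer (1 in 1,000) disorder.
ppv_rare, _ = predictive_values(0.99, 0.99, 0.001)

print(f"NPV of saying hello: {npv_hello:.1%}")
print(f"NPV of a 99%/99% test: {npv_good:.2%}")
print(f"PPV of that test at 1 in 1,000: {ppv_rare:.1%}")
```

A reported NPV is meaningful only when compared with the pretest value (here 98%), and even an excellent test yields a PPV of only about 9% for the rarer disorder.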
Each test must first be judged on its own properties. Only if a screen has both good sensitivity and good specificity is there a chance that it may be clinically useful. However, other criteria relate less to the test than to the disease in question and to the society or health system in which patients and physicians are embedded.17 Some screening criteria relate to the disease itself and will change over time as better screening tests and treatments for those disorders are developed. As the natural history of diseases becomes better understood, treatments will generally become more effective and, ideally, more widely available. Additionally, for screening to be appropriate, large populations must be reached and convinced that screening is worthwhile, and confirmatory testing and follow-up must be available. None of these can be taken for granted, but perhaps the most challenging is reaching population segments that could benefit from screening but, for a variety of reasons, may not be eager to participate.
Screening and diagnostic testing are powerful public health tools. As specific risk factors are taken into account and the relevant populations become more narrowly defined, the line between screening and diagnostic testing can blur. This has been most obvious with electronic fetal monitoring, where the distinction between screening and testing has been blurred and performance metrics have been very poor.9,10,11
History of Screening in Obstetrics
Beyond the Pap smear and blood pressure measurement, which are directly relevant to gynecology and indeed to all medical specialties, the first obstetric screening test was for Rh status. With the development of RhoGAM in the 1970s, it was remarkably successful in preventing sensitization and its tragic consequences in future pregnancies. Although this has been a landmark public health achievement, it is sad to report that low use in the developing world has left hundreds of thousands of pregnancies at high risk and untreated.24
In screening for intrinsic fetal problems, MSAFP, used in the 1970s (in the preultrasound era), was shown to have about 90% sensitivity for neural tube defects (NTDs) at a 5% false positive rate. The screening test was developed after amniotic fluid alpha-fetoprotein had been used for diagnosis in couples known to be at high risk. However, because about 95% of all NTDs occur in women in the low-risk population, and the risk of amniocentesis was felt at the time to be as much as 2%, offering diagnostic procedures to the population at large as the primary approach was neither programmatically nor financially feasible; it would likely have caused far more procedure-related complications than the number of abnormalities detected could justify.
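The programmatic argument can be sketched as a comparison of two strategies. This minimal Python sketch takes the 90% sensitivity, 5% false positive rate, and 2% procedure risk from the text, but the population size and the NTD prevalence (2 per 1,000) are hypothetical, and it assumes for simplicity that everyone accepts the test offered and that the diagnostic procedure detects every case.

```python
def compare_strategies(population, prevalence, sensitivity, false_positive_rate, loss_rate):
    """Procedures, cases detected, and procedure-related losses for two strategies."""
    affected = population * prevalence
    unaffected = population - affected

    # Strategy A: offer the diagnostic procedure (amniocentesis) to everyone,
    # assumed here to detect every affected pregnancy.
    universal = {
        "procedures": population,
        "detected": affected,
        "losses": population * loss_rate,
    }

    # Strategy B: screen everyone, then offer the procedure only to screen positives.
    procedures = sensitivity * affected + false_positive_rate * unaffected
    screen_first = {
        "procedures": procedures,
        "detected": sensitivity * affected,
        "losses": procedures * loss_rate,
    }
    return {"universal procedure": universal, "screen first": screen_first}


# Hypothetical population size and prevalence; other inputs from the text.
results = compare_strategies(population=100_000, prevalence=0.002,
                             sensitivity=0.90, false_positive_rate=0.05, loss_rate=0.02)
for strategy, r in results.items():
    print(f"{strategy}: {r['procedures']:,.0f} procedures, "
          f"{r['detected']:.0f} cases detected, about {r['losses']:.0f} losses")
```

Under these assumptions, screening first finds 90% of the cases with about 5% of the procedures and roughly one-twentieth of the procedure-related losses.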
Next came the discovery that low MSAFP levels were associated with increased risk of aneuploidy, specifically trisomies 21 and 18. MSAFP screening performed much worse for aneuploidy than for NTDs, but it was still an improvement over AMA alone. Given that a 35-year-old woman’s midtrimester risk of carrying a fetus with Down syndrome is about 1/270, finding Down syndrome in about 1/140 patients with a low MSAFP result was an improvement. Double, triple, and quad screening raised the PPV to about 1/50.25 In relative terms, this was a big improvement, but much was still left to be desired. First-trimester combined screening and cffDNA have significantly improved the performance metrics and are discussed in Chapter 11.
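Figures quoted as “1 in N” are easier to compare once converted. In this short Python sketch, N is taken from the text (1/270 for age alone, 1/140 after a low MSAFP, 1/50 for multiple-marker screening); the conversion assumes, as a simplification, that each screen-positive patient undergoes one diagnostic procedure, so N is also the number of procedures per affected fetus found.

```python
AGE_ALONE_N = 270  # midtrimester Down syndrome risk at age 35, from the text


def describe(n: int, baseline_n: int = AGE_ALONE_N) -> tuple[float, float]:
    """Convert a '1 in n' risk or PPV into (percent, fold improvement over age alone)."""
    return 100 / n, baseline_n / n


# Assumes one diagnostic procedure per screen positive, so n is also
# the number of procedures needed to find one affected fetus.
for label, n in [("age 35 alone", 270), ("low MSAFP", 140), ("double/triple/quad screen", 50)]:
    percent, fold = describe(n)
    print(f"{label}: 1/{n} = {percent:.2f}%, {fold:.1f}x age alone, "
          f"about {n} procedures per affected fetus found")
```

On this view, a low MSAFP roughly doubles the yield of diagnostic procedures over age alone, and multiple-marker screening raises it about fivefold.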
The first forays into carrier (Mendelian disorder) screening focused on conditions with simple genetics and simple laboratory requirements. Sickle cell anemia (SSA) is the prototype, because every person with SSA has the same single base-pair substitution. Successive assessments by “sickle cell prep,” hemoglobin electrophoresis, and molecular technologies improved performance metrics.
TSD, a disorder with severe consequences, likewise met the criteria of having a defined population perceived to be at high risk and a laboratory method, enzyme analysis, that was felt to be sufficiently accurate. Molecular analysis revealed a manageable number of mutations to investigate. In response, the “Dor Yeshorim” (looking to the future) program was established in the early 1970s for the observant Jewish community in New York. By custom and practice, these families had arranged marriages and commonly had large families.26 TSD carrier status was very common, so the disease occurred frequently. The program offered TSD carrier testing to high school students, but the students were not told their results. When parents wanted their children to marry, the prospective couple and families would meet with the “matchmaker,” who would then check their carrier status in secret. If they were an at-risk couple, she would decide, without explaining why, that they were “not a good match.” Hundreds of marriages between carrier couples were prevented, and, because large families were expected, large numbers of affected children were never conceived.