• We simulate CF birth incidence to account for changes in live births by race and Hispanic ethnicity, and we account for the introduction of newborn screening for CF by state and year of implementation.
• The prevalence of CF in the United States in 2020 is approximately 40,000 individuals.
• Participation in the CF Foundation Patient Registry is nearly 80% of the total population of individuals living with CF, with highest participation among children under 15 years of age.
Abstract
Background
The Cystic Fibrosis Foundation Patient Registry (CFFPR) collects data on individuals with cystic fibrosis (CF) in the United States (US). In 2012, the US CF population was estimated at 33,292 to 34,327 individuals, with 81-84% CFFPR participation.
Methods
In this study, we update these estimates via simulation to account for uncertainty in CF incidence by race or Hispanic ethnicity, initiation of CF newborn screening (NBS) programs by state, and updated cumulative survival for CF births 1968-2020. We defined registry participation as the proportion of individuals alive as of 2020 with any prior CFFPR participation as well as the proportion with contributing data in 2019 or 2020; we summarize CFFPR participation for those born prior to 1968.
Results
We estimated the 2020 prevalent CF population among individuals born between 1968 and 2020 to be 38,804 (95% Uncertainty Interval (UI): 38,532 to 39,065), with 77% of the prevalent CF population contributing recent data. CFFPR participation differs by age (54% of those born in 1968) and exceeds 90% of the population born in 2009 or later.
Conclusions
We demonstrate that the CFFPR remains a valid data source generalizable to the CF population. High participation among younger individuals may reflect the success of newborn screening programs and early referral to CF care. If engagement can be sustained, the percentage of individuals participating in the CFFPR will grow over time. There is also an opportunity to identify factors associated with loss to follow-up among older individuals to optimize the quality of the CFFPR data.
Cystic fibrosis (CF) is a disease caused by variation in the cystic fibrosis transmembrane conductance regulator (CFTR) gene that leads to dysfunction of the CFTR protein, which is responsible for chloride ion transport across apical membranes of epithelial cells in tissues [
]. CF produces excessively dry and thick mucus in the lungs, which can result in recurrent infections and inflammation that contribute to lung function decline. CFTR dysfunction also leads to other comorbidities such as exocrine pancreatic insufficiency and CF-related diabetes [
]. First established in 1968, the Cystic Fibrosis Foundation Patient Registry (CFFPR) collects detailed longitudinal data on persons with CF who attend accredited CF care centers in the United States (US) [
]. CFFPR data serve a variety of uses, from benchmarking CF care center performance to characterizing groups within the CF population who might benefit from novel therapeutics. CFFPR-based studies describe long-term trends in population-level outcomes such as the prevalence of infections or complications, incidence of transplantation, and overall improvements in survival. The current CFFPR is updated annually and includes data from individuals participating in the CFFPR at any time from 1986 to the present. The history of the CFFPR and data collection methods were previously detailed in Knapp et al [
The objective of this analysis was to estimate the size of the CF population in the US and CFFPR participation in 2020. CFFPR participation was first quantified in 2012 at 81-84% based on an estimated prevalent US population between 33,292 and 34,327 people [
]. Those estimates were derived from published CF incidence rates, aggregate national-level vital statistics data and deaths reported by the Centers for Disease Control and Prevention WONDER database [
Centers for Disease Control and Prevention National Center for Health Statistics. National Vital Statistics System, Mortality: Compressed Mortality File 1999-2020 on CDC WONDER Online Database, released 2021. Data are from the Multiple Cause of Death Files, 1999-2020, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program 2022.
], a decline observed in other settings coincident with long-term implementation of newborn screening (NBS) and increased prenatal carrier screening, including the 2017 recommendation to offer CF carrier screening to all women considering pregnancy or currently pregnant [
]. Advances in the management of CF disease including the introduction and expansion of access to CFTR modulator therapies from 2012-2020 may also influence participation in CFFPR. Finally, median predicted survival in 2020 is estimated at 50.0 years (95% confidence interval (CI): 48.5; 51.3 years), compared to 38.0 years (95% CI: 36.96; 39.1 years) in 2010 [
]. As such, prior estimates of registry participation may not reflect the present period and understanding CFFPR participation by age is increasingly important as individuals with CF survive longer into adulthood.
2. Methods
To estimate the prevalence of CF in the US in 2020, we implemented a simulation study which allowed assumptions related to CF birth incidence (pre-NBS), the proportion of CF cases detected post-NBS, and cumulative survival to change with every iteration. We repeated the analysis 500 times (iterations) and used the distribution of the 500 estimates to quantify the prevalent population. Fig. 1 presents an overview of the model inputs, estimation steps and aggregation steps. First, we extracted the total number of live births reported by state, race, and Hispanic ethnicity from 1968 to 2009 from US vital statistics data [
] (see Supplementary Materials). Total live births by state were calculated for mutually exclusive categories defined as White non-Hispanic, Black non-Hispanic, Other non-Hispanic (including Asians and indigenous people) and any Hispanic. These four categories were chosen to reflect differences in genetic risk and to allow for comparison across US vital statistics and CFFPR data. According to the most recent census, the US population is 19% Hispanic, with 12% and 62% identified as Black and White, respectively [
]. We created an aggregate non-Hispanic/Other category given the small number of incident CF reported for Asian, Native Hawaiian or Pacific Islander, Native American or Alaska Native or other race. For state-year combinations prior to NBS introduction, we estimated incidence of CF by multiplying the total reported live births by CF incidence (by race or ethnicity) drawn from a uniform distribution with upper and lower bounds set according to published incidence estimates (Supplement Table 1). To estimate incident CF after implementation of NBS programs [
], we adjusted CFFPR-reported total births from 2010 to 2020 for possible under-detection dependent on the age of each birth cohort as of 2020 and differential rates of detection by race and Hispanic ethnicity, varying the proportion detected in each iteration of the simulation (Supplement Fig. 1 and Tables 2-3). Individuals with no reported state of birth were assumed to be US born and those births were proportionally allocated to states to reflect the distribution of US live births by race and Hispanic ethnicity. All births that occurred post-NBS implementation were further inflated to account for 1.5% non-consent to registry participation based on CFF administrative records for 2019-2020 (data not presented).
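The birth-estimation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (the study was run in SAS): the live-birth counts, incidence bounds, registry count and detection proportion are hypothetical placeholders, and dividing by the proportion detected and by (1 - 0.015) is only one assumed way to apply the under-detection and non-consent adjustments.

```python
import random

# Hypothetical inputs for illustration only; none of these are study values.
LIVE_BIRTHS = {  # (state, year, group) -> reported live births
    ("WI", 1975, "white_nh"): 60_000,
    ("WI", 1975, "hispanic"): 2_000,
}
# Assumed lower/upper bounds on CF incidence per live birth (placeholders).
INCIDENCE_BOUNDS = {
    "white_nh": (1 / 3500, 1 / 2500),
    "hispanic": (1 / 13500, 1 / 8000),
}
NON_CONSENT = 0.015  # 1.5% non-consent, as reported in the text


def simulate_pre_nbs_births(rng):
    """One iteration: draw an incidence per group, multiply by live births."""
    draws = {g: rng.uniform(lo, hi) for g, (lo, hi) in INCIDENCE_BOUNDS.items()}
    return sum(n * draws[group] for (_, _, group), n in LIVE_BIRTHS.items())


def inflate_post_nbs_births(registry_births, proportion_detected):
    """Scale registry-reported births up for under-detection and non-consent.

    Dividing by each proportion is an assumption about how the adjustment
    is applied; the source does not give the exact formula.
    """
    return registry_births / proportion_detected / (1 - NON_CONSENT)


rng = random.Random(2020)  # fixed seed so the sketch is reproducible
pre_nbs = [simulate_pre_nbs_births(rng) for _ in range(500)]
post_nbs = inflate_post_nbs_births(registry_births=700, proportion_detected=0.97)
```

Each element of `pre_nbs` is one simulated count of incident CF births for the placeholder state-year; the spread across the 500 draws is what later feeds the uncertainty intervals.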
Fig. 1. Flowchart to illustrate the estimation process.
Second, we estimated the proportion of incident CF births that survived through the end of 2020. We extracted the total number of deaths by birth cohort between 1986-2020 from the CFFPR and deaths among CFFPR participants lost-to-follow-up linked to National Death Index (NDI) data from 1986-2018 [
Ostrenga JS, Brown AW, Elbert A, Fink AK, Faro A, Marshall BC, et al. Impact of loss to follow-up on survival estimation for cystic fibrosis Annals of Epidemiology. At review.
]. We excluded death data among those born prior to 1986 to avoid potential immortal time bias as deaths that occurred prior to 1986 could not be reported in the CFFPR. Using the Kaplan-Meier method, we extracted the mean and 95% upper and lower confidence limits of cumulative survival through 2020 for each birth cohort stratum. The cumulative survival estimates per birth cohort were then used to back-extrapolate cumulative survival proportions for births from 1968 to 1985 using a linear model (mean, upper and lower estimates of survival were extracted and modeled separately). For each iteration of the simulation, we sampled from a uniform distribution defined by these upper and lower confidence limits. We tested several alternate approaches (see Supplement) to estimate cumulative survival for birth cohorts 1968-1985 for comparison. We compared all death estimates from the model to those reported by the Centers for Disease Control and Prevention (CDC) WONDER [
Centers for Disease Control and Prevention National Center for Health Statistics. National Vital Statistics System, Mortality: Compressed Mortality File 1999-2020 on CDC WONDER Online Database, released 2021. Data are from the Multiple Cause of Death Files, 1999-2020, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program 2022.
National Center for Health Statistics Centers for Disease Control and Prevention. National Vital Statistics System. Mortality: Compressed Mortality File 1968-1978. CDC WONDER Online Database, compiled from compressed mortality file CMF 1968-1988. 2022. Series 20, No. 2A, 2000.
National Center for Health Statistics. Centers for Disease Control and Prevention. National Vital Statistics System Mortality: Compressed Mortality File 1979-1998. CDC WONDER on-line database, compiled from compressed mortality file CMF 1968-1988.
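The back-extrapolation of cumulative survival can be illustrated as follows. This is a sketch under stated assumptions: the confidence limits in `KM_LIMITS` are invented placeholders rather than CFFPR Kaplan-Meier output, only the lower and upper limits are modeled, and the least-squares helper `fit_line` stands in for whatever linear model procedure was actually used.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def clamp(p):
    """Keep an extrapolated proportion inside [0, 1]."""
    return min(1.0, max(0.0, p))


# Placeholder (lower, upper) survival limits by birth cohort; not study data.
KM_LIMITS = {
    1986: (0.52, 0.58),
    1990: (0.60, 0.66),
    1995: (0.70, 0.76),
    2000: (0.80, 0.86),
}

years = sorted(KM_LIMITS)
lower_fit = fit_line(years, [KM_LIMITS[y][0] for y in years])
upper_fit = fit_line(years, [KM_LIMITS[y][1] for y in years])


def survival_bounds(birth_year):
    """Extrapolate the lower and upper limits (modeled separately) to a cohort."""
    lo = clamp(lower_fit[0] + lower_fit[1] * birth_year)
    hi = clamp(upper_fit[0] + upper_fit[1] * birth_year)
    return lo, hi


def draw_survival(birth_year, rng):
    """One iteration's survival proportion: uniform between the two limits."""
    lo, hi = survival_bounds(birth_year)
    return rng.uniform(lo, hi)


rng = random.Random(1968)
bounds_1968 = survival_bounds(1968)
sample_1968 = draw_survival(1968, rng)
```

Fitting the limits separately mirrors the text's description; a linear trend can extrapolate outside [0, 1], which is why the sketch clamps the result.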
Finally, the total prevalent CF population was calculated by multiplying incident CF births by the cumulative proportion assumed to have survived through the end of 2020 for every birth cohort. This quantity (in aggregate or by birth year) serves as the denominator to calculate the proportion of prevalent individuals with CF participating in the CFFPR. We first included any individual with a CF diagnosis born between 1968-2020 and no reported death date, minus the total number of individuals matched to NDI records (by birth cohort). This first estimate quantifies the proportion of prevalent individuals who have ever participated in the CFFPR (at any point in their lifetime). We then quantified recent participation calculated as the total number of individuals with no reported death date who contributed data to the CFFPR in either 2019 or 2020. We then subtracted CFFPR participant counts from the number of prevalent individuals to estimate the absolute number of people who did not participate in the CFFPR. All final point estimates represent the median and the 95% Uncertainty Intervals (UI) reflect the 2.5th and 97.5th percentiles of the 500 iterations. Statistical analyses were implemented using SAS version 9.4 (Cary, NC).
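The aggregation step can be sketched as below: each iteration multiplies births by a survival proportion per cohort, and the 500 results are summarized by their median and 2.5th and 97.5th percentiles. The cohort bounds and the participant count are hypothetical, births are drawn directly rather than from incidence and live births, and the linear-interpolation percentile is an assumed definition that may differ from the one SAS applies.

```python
import random
import statistics

# Hypothetical inputs: birth year -> ((min, max) births, (min, max) survival).
COHORTS = {
    1968: ((900.0, 1100.0), (0.20, 0.30)),
    2010: ((950.0, 1000.0), (0.98, 1.00)),
}
REGISTRY_PARTICIPANTS = 1_050  # hypothetical count of living participants


def percentile(ordered, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def one_iteration(rng):
    """Prevalence = sum over cohorts of incident births x proportion surviving."""
    total = 0.0
    for (b_lo, b_hi), (s_lo, s_hi) in COHORTS.values():
        total += rng.uniform(b_lo, b_hi) * rng.uniform(s_lo, s_hi)
    return total


def summarize(values):
    """Return (median, 2.5th percentile, 97.5th percentile)."""
    ordered = sorted(values)
    return (
        statistics.median(ordered),
        percentile(ordered, 2.5),
        percentile(ordered, 97.5),
    )


rng = random.Random(1968)
prevalence = [one_iteration(rng) for _ in range(500)]
# Participation uses each iteration's prevalence as the denominator.
participation = [REGISTRY_PARTICIPANTS / p for p in prevalence]
prev_median, prev_lo, prev_hi = summarize(prevalence)
part_median, part_lo, part_hi = summarize(participation)
```

Because participation is a ratio with the simulated prevalence in the denominator, its uncertainty interval is taken from the distribution of ratios rather than from dividing by the interval endpoints.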
We did not estimate historical incidence of CF in US territories or account for migration. As the model excluded foreign and territorial births, these individuals were also excluded from calculations of CFFPR participation. We summarized the number of CFFPR participants born prior to 1968 as well as those foreign born.
3. Results
We estimated that the 2020 prevalence of CF among those born in the US between 1968 and 2020 was 38,804 individuals (95% UI: 38,532, 39,065). We simulated a median total of 53,900 (95% UI: 53,692, 54,114) incident births between 1968 and 2020 (Fig. 2), of which 45,714 were among the White non-Hispanic population, 2,482 among the Black non-Hispanic population, 930 among other race/non-Hispanic individuals and 4,780 among individuals with any Hispanic ethnicity; model results were compared to CFFPR reported births (Supplement Figs. 2-5).
Fig. 2. Total CF births (median and 95% UI) predicted in simulation.
We estimated a cumulative total of 15,100 deaths (95% UI: 14,850, 15,322) among individuals born from 1968-2020. Extrapolation of Kaplan-Meier survival estimates assumed between 70.2% and 80.1% of the 1968 birth cohort did not survive through 2020; we assumed >98% of individuals born since 2010 are still alive (Supplement Table 4). Median predicted deaths and 95% UI by birth year are presented in Fig. 3 and are consistent with CDC reported CF deaths (Supplement Fig. 10). Sensitivity analysis with a quadratic back-extrapolation of survival estimates resulted in a higher median death estimate of 16,583 (95% UI: 16,414, 16,753), with a correspondingly lower estimate of the 2020 CF population of 37,316 (95% UI: 37,129, 37,514). Sensitivity analysis allowing larger uncertainty assumptions for the linear back-extrapolation estimated prevalence at 38,869 (95% UI: 38,366, 39,405).
Fig. 3. Total deaths (median and 95% UI) predicted in simulation.
As of 2020, there were 36,518 registry participants born in the US between 1968-2020 with a CF diagnosis and no reported death date, which included 5,014 individuals with missing state of birth (assumed US born). We removed an additional 1,769 individuals who matched to an NDI record among CFFPR participants lost to follow-up, resulting in a total of 34,749 individuals alive in 2020 who have ever participated in the CFFPR (90% of prevalent individuals). To assess if misclassification of CF diagnosis could bias our results, we calculated the proportion missing confirmatory diagnostic testing. A total of 479 individuals were missing a sweat test value and had no report of CFTR genotyping. Among CFFPR participants born in 1968, 11% were missing both sweat chloride values and genotype data compared to <1% of participants born 1987 or later. Admittedly, completeness of data entry may confound this assessment.
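The participation arithmetic in this paragraph can be reproduced directly from the reported counts. The sketch below uses only figures stated in the text (the median prevalence of 38,804 comes from the first Results paragraph); the variable names are illustrative.

```python
# Counts reported in the text.
registry_no_death_date = 36_518  # US-born 1968-2020, CF diagnosis, no death date
ndi_matched = 1_769              # lost to follow-up and matched to an NDI record
missing_state_of_birth = 5_014   # assumed US born
prevalence_median = 38_804       # median simulated prevalent population

# Living individuals who ever participated in the registry.
ever_participated = registry_no_death_date - ndi_matched

# Shares expressed as whole percentages, as in the text.
ever_share_pct = round(100 * ever_participated / prevalence_median)
missing_state_pct = round(100 * missing_state_of_birth / registry_no_death_date)
```

Here `ever_participated` is 34,749 and `ever_share_pct` is 90, matching the reported figures; `missing_state_pct` is 14, the share of participants with no reported state of birth.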
Restricting the CFFPR population to those with any data reported in 2019 or 2020 (excluding foreign and territorial births, as well as those born prior to 1968), exactly 30,000 people participated in the CFFPR (77% of the prevalent population). CFFPR participation differs by age (Fig. 4), with the highest participation in children. Among individuals <15 years of age, participation in 2019 or 2020 was between 85% and 93%. In contrast, among adults born between 1968-1975, we estimated 54.8% (95% UI: 48.3%, 63.1%) contributed recent data. Among those born in 1968, we estimated 138 individuals (95% UI: 90, 185) did not recently participate in the CFFPR (51% of those estimated alive) compared to 47 individuals (95% UI: 46, 48) born in 2020 (Fig. 5). Since 42 states and the District of Columbia did not implement NBS until 2005-2009 and testing is imperfect, there are still undiagnosed CF cases even in the pediatric population and there remain individuals who will enter the CFFPR at older ages. In total, individuals who entered the CFFPR at age 30 or older accounted for 4% to 7% of the model-estimated CF population in each birth cohort 1968-1980. While we did not model CF among those born prior to 1968, there were 1,642 individuals with CF born prior to 1968 and a total of 470 individuals identified as foreign births 1968-2020 reported to the CFFPR in 2019 or 2020.
Fig. 4. Median proportion and 95% UI of individuals who participate in the CFFPR, by year of birth.
The objective of this study was to estimate the prevalence of CF in the United States and quantify CFFPR participation in 2020. We estimated overall CFFPR participation in 2019 or 2020 at 77% and show lower CFFPR participation among older individuals compared to children. While we are unable to identify the factors that contribute to less participation among adults, as survival increases [
] and the number of CF births decreases, loss-to-follow-up may play a role. The CFFPR reported a lost-to-follow-up proportion of 2.9% each year from 2018 through 2020 (approximately 900 individuals per year) [
]. In the pediatric population, our estimates show CFFPR participation exceeded 90% in children ages 10 years and younger, which may reflect NBS programs referring infants to CF care programs for diagnosis, improving subsequent retention in CF care and CFFPR participation among children.
The primary strength of this analysis is the use of simulation to quantify uncertainty in CF incidence, accounting for state-level NBS implementation as well as updated cumulative survival through 2020, which better reflects possible variation in CF incidence and survival from year to year. This is important for older birth cohorts as we have less confidence in total births estimated from CF incidence prior to NBS. Given that NBS programs reduced the dependency between CF-specific symptoms and age of CF diagnosis at the population-level [
], we assume higher confidence in the detection of incident CF post-NBS compared to the pre-NBS era, reflected by narrower uncertainty intervals. While there is circularity in using CFFPR-reported incidence to estimate total CF births, our assumptions are conservative compared to the Wisconsin NBS trial which reported that 98% of screened children were diagnosed by one year of age [
In estimating CF births, we assume published CF incidence rates by race or ethnicity reflect plausible bounds for true CF incidence and no factors contributed to declining incidence of CF beyond secular trends in fertility prior to NBS. We tested several published estimates of CF incidence in the White non-Hispanic population and found they likely under-estimate CF when compared with CF diagnoses reported to CFFPR. Bias in prior CF incidence estimates may result from reliance on CF diagnoses reported in the pre-NBS era subject to under-reporting [
]. We further assumed that race and ethnicity as reported in US vital statistics data reflect a common genetic risk of CF for all births in each category. Given that race and ethnic classifications are social constructs [
], there is certainly misclassification of CF risk by reported race or ethnicity. Nevertheless, our median estimates of total births align with the pattern of CFFPR reported births by race or Hispanic ethnicity, suggesting our assumptions are plausible at the population level. While this is the first analysis to account for differential detection of CF via NBS by race or ethnicity in the US, we did not account for changes to NBS algorithms over time as many states have since added or expanded genetic testing [
Clarification of laboratory and clinical variables that influence cystic fibrosis newborn screening with initial analysis of immunoreactive trypsinogen.
]. Although we adjust for CFFPR non-consent among incident CF births post-NBS, we cannot determine if consent patterns have changed over time. The COVID-19 pandemic may have delayed the consent process for children born in late 2019 and 2020, under-estimating CF births in those years.
We assumed the risk of death through 2020 was the same among all prevalent individuals living with CF (within a birth cohort), regardless of transplant status or CFFPR participation. It is possible that older individuals born pre-NBS (and even now given limitations in NBS and heterogeneity in comprehensiveness of genetic testing, as well as the presence of diagnosis bias particularly in non-White populations) who do not present with “typical” CF symptoms are yet-to-be-diagnosed [
]. Such individuals would not be captured in the CFFPR, and we are unable to quantify any survival advantage or disadvantage they may experience. We did not estimate survival proportions by race or ethnicity, because of potential immortal time bias from later age at diagnosis (individuals who died undiagnosed cannot be reported as CF) and the small numbers of individuals per birth cohort among the Black/non-Hispanic and Other/non-Hispanic categories. Deaths that occur in early infancy in the post-NBS era should not bias prevalence given their small numbers: CDC reports 4 total deaths associated with CF among individuals less than 1 year of age in 2020 [
Centers for Disease Control and Prevention National Center for Health Statistics. National Vital Statistics System, Mortality: Compressed Mortality File 1999-2020 on CDC WONDER Online Database, released 2021. Data are from the Multiple Cause of Death Files, 1999-2020, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program 2022.
]. Among lung transplant recipients, approximately 720 people have no reported death date but did not provide data to the CFFPR in 2019 or 2020; if we over-estimated survival for this group it is unlikely this bias would greatly impact the population estimate.
We are unable to account for migration in the model as it would require data on immigration patterns for people with CF from 1968 to 2020 as well as CF incidence and survival for countries of origin, much of which is unknown. We acknowledge that 14% of registry participants have no state of birth reported; some of these individuals may be immigrants. Given that the majority of individuals who immigrated to the United States in the past 30 years were from settings with lower CF incidence [
], we cannot assume the proportion of foreign births reported in the US for a given year is the same for those who participate in the CFFPR.
While it is possible to operationalize the simulation for any birth cohort, we chose to focus on people with CF born 1968-2020. Data on US live births by race or ethnicity are increasingly subject to bias in historical time periods as reporting was less detailed. For example, to estimate incidence of CF among individuals with Hispanic ethnicity, we based our assumptions on 1989 data for live births between 1968-1988, which is likely inaccurate. In terms of deaths, since CDC cause of death data are not reported prior to 1968, we do not have additional sources of data to assess assumptions related to CF survival among individuals born in earlier years. Virtually no population-level data exist to estimate survival probabilities for individuals born before the introduction of modern methods [
] for CF diagnosis (the Cooke & Gibson method was introduced in 1959, the Macroduct system® has been available since 1983; the CFTR gene was not identified until 1989). Survival estimates for birth cohorts from the 1940s onwards published from registry data in the United Kingdom provide a visual comparison of mean estimates, but do not report enough detail to parameterize model inputs or uncertainty [
Our analysis suggests that over 38,000 individuals with CF born 1968-2020 are currently living in the United States; the prevalent population may be as large as 40,000 people after accounting for the 1,600 individuals born in earlier years with recent CFFPR participation. We demonstrate recent CFFPR participation exceeds 90% among children born in the past 10 years and speculate NBS programs have led not only to more timely diagnosis but also earlier referral to the CF care model and possibly improved CFFPR participation.
Mechanisms driving retention in the CF care network among adults with CF are complex, as the adult CF care model did not reach broad geographic scale until the early 2000s [
]. We hypothesize that among those diagnosed at older ages, some individuals have an established relationship with a provider outside of the CF care network and not all young adults transition to an adult CF center. In the post-transplant population, some choose to be followed by their transplant center, which may not report to the CFFPR. Additional factors such as healthcare access and affordability also influence participation in the CF care model, which subsequently impacts CFFPR data availability. If the healthcare needs of an increasingly older CF population become more individualized, the CFFPR will need to consider novel mechanisms of data capture to accommodate changing patterns in CF care. Looking to the future, the prevalence of CF may continue to increase as CF survival increases, resulting in growth in CF center patient volumes and influencing CFFPR participation. Improvements in CF diagnosis and increased awareness of CF as a disease that affects non-White populations, coupled with improved gene sequencing techniques, may also result in more diagnoses, particularly among adults. In spite of lower participation among older adults, the CFFPR remains the most representative source of data on individuals with CF, with 90% of the prevalent population contributing CFFPR data at some point in their lives.
CRediT authorship contribution statement
Elizabeth A. Cromwell: Conceptualization, Methodology, Visualization, Formal analysis, Writing – original draft. Joshua S. Ostrenga: Data curation, Methodology, Writing – review & editing. Jonathan V. Todd: Data curation, Methodology, Writing – review & editing. Alexander Elbert: Data curation, Investigation, Writing – review & editing. A. Whitney Brown: Writing – review & editing. Albert Faro: Writing – review & editing. Christopher H. Goss: Writing – review & editing. Bruce C. Marshall: Supervision, Writing – review & editing.
Conflict of interest statement
The authors have no conflict of interest to disclose.
Acknowledgements
The authors would like to thank the Cystic Fibrosis Foundation for the use of CF Foundation Patient Registry data to conduct this study. Additionally, we would like to thank the patients, care providers, and clinic coordinators at CF centers throughout the United States for their contributions to the CF Foundation Patient Registry. This work was funded by the Cystic Fibrosis Foundation.