Université Paris René Descartes, Sorbonne Paris Cité, Paris, France; AP-HP, Hôpital Cochin, Service de Physiologie et Explorations Fonctionnelles, Paris, France
Nasal Potential Difference (NPD) is a biomarker of CFTR activity used to diagnose CF and monitor experimental therapies. Limited studies have been performed to assess agreement between expert readers of NPD interpretation using a scoring algorithm.
Methods
We developed a standardized scoring algorithm for “interpretability” and “confidence” for PD (potential difference) measures, and sought to determine the degree of agreement on NPD parameters between trained readers.
Results
There was excellent agreement for interpretability between NPD readers for CF tracings and fair agreement for normal tracings, but only slight agreement for indeterminate tracings. Amongst interpretable tracings, excellent correlation of mean scores for Ringer's Baseline PD, ΔAmiloride, and ΔCl-free+Isoproterenol was observed. There was slight agreement regarding confidence of the interpretable PD tracings, resulting in divergence of the Ringer's Baseline, ΔAmiloride, and ΔCl-free+Isoproterenol PDs between “high” and “low” confidence CF tracings.
Conclusion
A multi-reader process with adjudication is important for scoring NPDs for diagnosis and in monitoring of CF clinical trials.
Nasal Potential Difference (NPD) measurements have been important for diagnostic evaluations of cystic fibrosis (CF) since the technique was developed almost 40 years ago [
While improved NPD methods now allow for the use of NPD in multi-center trials, with electronic data capture and blinded interpretation that have improved its reliability [
] and that have been agreed upon by the US TDN and EU CTN, no standardized interpretation protocol has yet been proposed to ensure uniform interpretation within clinical trials. A standardized scoring system would also allow a multiple-reader approach to adjudicate the inclusion of questionable tracings, which could improve the performance of NPD, particularly for smaller early-phase and proof-of-concept clinical trials.
In the present study, we investigated the agreement of expert NPD readers using a standardized scoring algorithm originally developed at the Center for CFTR Detection at UAB. Six expert NPD readers were trained on this algorithm, which provides a method for quantifying NPD values and applies a 2-tier approach to rating the quality of NPD tracings: a lower-stringency tier determines “interpretability” (i.e. whether a tracing should be included in a dataset), and a higher-stringency tier determines whether tracings are “high” or “low confidence” for measurements of sodium and chloride. Here we evaluated this scoring system and measured inter-reader agreement among expert readers from two continents.
2. Methods
2.1 Development of the scoring system
The UAB Center for CFTR Detection (CCD) has previously developed a scoring algorithm for use in multi-center clinical trials of CFTR modulators employing NPD as an outcome for CFTR function. This scoring system was reviewed by the 2 co-directors of the CCD and 4 additional expert readers in the Therapeutics Development Network (TDN) and the European Clinical Trials Network (CTN).
This scoring system was developed to assess stringent criteria of “interpretability” based on completeness of the tracing protocol, biological plausibility of the response data, and appropriate control responses; these criteria enhance and standardize the quality standards for tracings set by the current standard operating procedures of the Therapeutics Development Network. In addition, we analyzed a less stringent “confidence” score that reflects more subtle abnormalities in the tracing that do not alter its ability to be analyzed. As measures of sodium and chloride transport are derived from different time segments of the tracing, each component was scored separately. The criteria for rating “interpretability” and “confidence” are detailed in Table 1. The resultant scoring system was approved by the review committee.
Table 1. Criteria for interpretability and confidence of NPD analysis.
Interpretability: “Interpretable” = absence of all of the following
1. Missing portion of tracing
2. Incomplete tracing
3. >1 mV shift in the last 30 s of each perfusion tracing
4. Displaced catheter without recovery to pre-displacement value
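Criterion 3 lends itself to a simple automated pre-check before a reader scores a tracing. The sketch below is illustrative only and is not part of the scoring algorithm described here. It assumes that one perfusion segment is available as evenly sampled PD values in mV, and it interprets a “shift” as the peak-to-peak range within the final 30 s, which is an assumption made for illustration. The function name and its parameters are hypothetical.

```python
def exceeds_shift_limit(pd_mv, sample_rate_hz, window_s=30.0, limit_mv=1.0):
    """Return True if the PD range over the final window exceeds limit_mv.

    pd_mv: evenly sampled potential difference values (mV) for one perfusion.
    "Shift" is taken here as max - min over the window (an assumption).
    """
    n = int(round(window_s * sample_rate_hz))
    if n < 1 or n > len(pd_mv):
        raise ValueError("perfusion segment is shorter than the analysis window")
    window = pd_mv[-n:]  # samples covering the last window_s seconds
    return (max(window) - min(window)) > limit_mv
```

A tracing flagged by such a check would still need reader review, since the remaining criteria (missing or incomplete portions, unrecovered catheter displacement) depend on judgment that this sketch does not attempt to capture.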
After approval, each committee member was acquainted with the scoring algorithm in an in-person training session and was provided with review materials that detail the scoring system. Examples of “interpretable” tracings are detailed in Fig. 1A (high confidence) and Fig. 1B (low confidence). Examples of uninterpretable tracings are shown in Fig. 1C–D. The original LabChart files corresponding to these examples are shown in Supplementary Fig. 1.
Fig. 1. A. Training example of a high-confidence, interpretable CF tracing demonstrating lack of interfering artifacts, shifts, or catheter displacements, with biologically plausible values and absence of missing or incomplete tracing portions. B. Training example of a low-confidence, interpretable CF tracing due to excessive artifacts interfering with interpretability but displaying biologically plausible responses and the absence of missing or incomplete portions of the tracing. C. Training example of an uninterpretable CF tracing due to an incomplete chloride portion of the tracing caused by catheter displacement, with biologically implausible responses to amiloride and ATP controls. D. Training example of an uninterpretable CF tracing due to excessive artifacts affecting the signal-to-noise ratio throughout, with biologically implausible responses to amiloride and ATP controls.
After appropriate training, each reader was assigned blinded tracings to determine the correlation of qualitative scores and quantitative values for key NPD parameters: Ringer's Baseline potential difference (PD), ΔAmiloride PD, and ΔCl-free+Isoproterenol PD. Each of these PD measures was quantified as the mean of the last 10 s of the perfusion as previously described [
]. Each reader was assigned 40 single-nostril CF, 40 single-nostril non-CF, and 20 indeterminate (non-diagnostic) tracings for CFTR function. Tracings were chosen from recent clinical trial databases and our database of diagnostic NPDs sent for interpretive over-read to our center. Since the scoring system was developed for individual nostril tracings, the readers analyzed tracings in this manner. All NPDs were collected using the Therapeutics Development Network Standard Operating Procedure (NPD SOP 528.0), and conformed to the general quality standards in that document at the time. Each NPD was selected at random and blinded by study staff before review by expert readers. The diagnosis of “CF” was defined by the presence of sweat chloride ≥60 mEq/L and/or 2 disease-causing mutations on CFTR genetic analysis [
]. “Indeterminate” was defined by a questioned clinical diagnosis of CF with indeterminate sweat chloride values (40–60 mEq/L) and <2 disease-causing mutations on CFTR genetic analysis. We grouped level of agreement into 3 categories: “complete” (all 6 readers agree), “moderate” (4–5 of 6 readers agree), and “poor” (only 3 readers agree).
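The three agreement categories can be expressed as a short rule on the size of the largest group of readers giving the same rating. The following is a minimal sketch under the assumption that each tracing carries exactly 6 ratings; the function name is hypothetical and the code is an illustration rather than the software used in the study.

```python
from collections import Counter

def agreement_level(ratings):
    """Classify agreement among 6 readers as complete, moderate, or poor.

    ratings: the 6 readers' labels for one tracing (e.g. True/False for
    interpretable, or "high"/"low" for confidence).
    """
    if len(ratings) != 6:
        raise ValueError("expected exactly 6 reader ratings")
    largest_group = Counter(ratings).most_common(1)[0][1]
    if largest_group == 6:
        return "complete"
    if largest_group >= 4:
        return "moderate"
    # With two possible labels the largest group is at least 3, so this
    # branch corresponds to a 3-versus-3 split.
    return "poor"
```

For example, a tracing rated interpretable by 5 readers and uninterpretable by 1 falls in the “moderate” category.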
2.4 Statistical analysis
After unblinding at the CCD, kappa statistics (κ) for inter-reader comparisons were calculated for “interpretability” and “confidence” of the ENaC-mediated and CFTR-mediated portions of the tracing. Significant correlation was determined when p < 0.05 (ANOVA). Comparative statistics were calculated using SPSS 13.0 (IBM Corporation, Armonk, NY) and GraphPad Prism 6.0 (GraphPad Software, Inc., La Jolla, CA). The level of agreement was assessed as previously described [
In addition, the intra-class correlation coefficient (ICC) for all quantitative values of sodium measures (Ringer's Baseline PD, ΔAmiloride PD) and chloride measures (ΔCl-free+Isoproterenol PD) was assessed using SPSS. ICCs were considered significant at p < 0.05 (ANOVA).
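To make the ICC calculation concrete, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a tracings-by-readers table of PD values. The ICC model used in SPSS for this study is not stated in the text, so the choice of ICC(2,1) is an assumption, and the function name is hypothetical. The code does not compute the associated p-value.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per tracing and one column per reader.
    """
    n = len(scores)                    # number of tracings
    k = len(scores[0]) if n else 0     # number of readers
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 rows and columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_err) / denom
```

When every reader reports the same value for each tracing, the error and reader mean squares are zero and the function returns 1, which is the limiting case of perfect agreement.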
Finally, to determine the effect of “confidence” on the key NPD parameters, we calculated the mean of each quantitative value from “high” versus “low” confidence tracings. For this study, a tracing was “high” confidence when all the readers scored it as “high” confidence, and “low” confidence when 1 or more of the readers marked it as “low” confidence. The mean quantitative values were compared between “high” and “low” confidence tracings, with significant differences defined by p < 0.05 (ANOVA).
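The grouping rule for this comparison can be sketched as follows. The example assumes each tracing is represented by the readers' confidence labels and a single quantitative PD value in mV; the function names and the data layout are hypothetical, and the statistical comparison of the two groups is not reproduced here.

```python
from statistics import mean

def consensus_confidence(reader_scores):
    """Return "high" only if every reader scored the tracing "high"."""
    return "high" if all(s == "high" for s in reader_scores) else "low"

def group_means(tracings):
    """Mean PD value (mV) per consensus confidence group.

    tracings: iterable of (reader_scores, value_mv) pairs.
    Returns None for a group that contains no tracings.
    """
    groups = {"high": [], "low": []}
    for reader_scores, value_mv in tracings:
        groups[consensus_confidence(reader_scores)].append(value_mv)
    return {label: (mean(vals) if vals else None) for label, vals in groups.items()}
```

Because a single “low” score moves a tracing into the low-confidence group, this rule is deliberately conservative about what counts as “high” confidence.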
3. Results
3.1 Inter-reader agreement of interpretability
Fewer indeterminate tracings (40%) and non-CF tracings (80%) were deemed interpretable compared to CF tracings (93%). We found complete agreement in only 8 of 20 indeterminate tracings and 32 of 40 non-CF tracings (Fig. 2A). The remaining distribution of level of agreement of “interpretability” is shown in Fig. 2A. The agreement of qualitative scoring is shown in Table 2.
Fig. 2. A. Histogram of the level of agreement on “interpretability” amongst readers (n = 40 CF, n = 40 non-CF, and n = 20 “indeterminate” tracings), showing that CF and non-CF tracings had higher levels of complete agreement. B. Histogram of the level of agreement on “confidence” for sodium measures amongst readers, showing higher agreement for CF and non-CF tracings (n = 40 CF, n = 40 non-CF, and n = 20 “indeterminate” tracings). C. Histogram of the level of agreement on “confidence” for chloride measures amongst readers, showing higher agreement for CF and non-CF tracings (n = 40 CF, n = 40 non-CF, and n = 20 “indeterminate” tracings).
All of the tracings deemed interpretable by at least one reader were subsequently analyzed for confidence of the sodium-dependent and chloride-dependent portions of the tracing, a higher-stringency criterion that also had some subjectivity (see Table 2). There was fair agreement of confidence ratings for both sodium and chloride measures in CF tracings (κ, 0.206 and 0.283, respectively, p < 0.001 for both), although this was less than the agreement observed for interpretability. We observed complete agreement of confidence level in sodium PD measures in only 19 of 40 CF tracings, 18 of 40 non-CF tracings, and 0 of 20 indeterminate tracings (Fig. 2B). The remaining distribution of level of agreement of “confidence” is shown in Fig. 2B. A similar distribution of agreement was observed in chloride transport measures amongst the readers (Fig. 2C).
In contrast to the fair agreement of confidence in CF tracings, there was only slight agreement of confidence for non-CF tracings, which was statistically significant for sodium transport (κ = 0.188, p < 0.05) but not for chloride measures (κ = 0.081, p = NS). No agreement of confidence was observed in either sodium or chloride measures for indeterminate tracings (κ, −0.078 and −0.091, respectively, both p = NS).
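For readers less familiar with multi-rater κ, the sketch below computes Fleiss' kappa, a standard generalization of κ to more than two raters. The text does not state which multi-rater κ variant was calculated in SPSS, so the use of Fleiss' kappa is an assumption, and the function names are hypothetical. No significance test is included.

```python
def to_counts(ratings, categories):
    """Convert per-tracing reader labels into per-category counts."""
    return [[row.count(c) for c in categories] for row in ratings]

def fleiss_kappa(counts):
    """Fleiss' kappa for a table of counts.

    counts[i][j] = number of readers who assigned tracing i to category j.
    Every tracing must be rated by the same number of readers.
    """
    if not counts:
        raise ValueError("no tracings supplied")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each tracing needs the same number (>= 2) of ratings")

    n_cats = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    p_expected = sum(p * p for p in p_cat)

    # Mean observed pairwise agreement across tracings.
    p_observed = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)
```

Values near zero or below, such as those reported for indeterminate tracings, indicate agreement no better than expected by chance given how often each category was used.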
3.2 Inter-reader agreement of quantitative NPD values
Each of the 6 reviewers determined CFTR and ENaC activity using quantitative metrics per the standard operating procedure. We sought to determine the inter-reader agreement of quantitative measures of key NPD parameters from interpretable tracings. When CF tracings were analyzed, we found excellent correlation of quantitative data for Ringer's Baseline PD, ΔAmiloride PD, and ΔCl-free+Isoproterenol PD (ICC 0.989, 0.968, and 0.976, respectively, all p < 0.001) for all six readers. Similarly strong correlations for mean scores of Ringer's Baseline PD, ΔAmiloride PD, and ΔCl-free+Isoproterenol PD were also observed for non-CF tracings (each = 1.00; all p < 0.001) and for “indeterminate” NPDs (0.997, 0.985, and 0.998, respectively; all p < 0.001), see Table 3. These data demonstrate that despite differences in qualitative interpretability ratings, NPD values for both chloride and sodium measures were extremely close, regardless of the underlying diagnosis of the tracing.
Table 3. Intraclass correlation of quantitative score amongst 6 NPD scorers.
Parameter                 CF ICC   CF P-value   Non-CF ICC   Non-CF P-value   Indeterminate ICC   Indeterminate P-value
Ringer's                  0.989    <0.001       1.000        <0.001           0.997               <0.001
ΔAmiloride                0.968    <0.001       1.000        <0.001           0.985               <0.001
ΔCl-free+Isoproterenol    0.976    <0.001       1.000        <0.001           0.998               <0.001
ICC, Intraclass Correlation Coefficient; NS = Not Significant.
Due to reduced agreement of confidence amongst readers, especially in indeterminate NPD tracings, we sought to determine whether confidence level affected quantitative NPD data. We compared tracings deemed “high confidence” by all readers with tracings deemed “low confidence” by one or more readers. For CF tracings, reduced confidence was associated with diminished Ringer's Baseline PD (−35.6 mV for high versus −27.0 mV for low confidence, p = 0.005), ΔAmiloride PD (21.3 mV for high versus 16.6 mV for low confidence, p < 0.05), and ΔCl-free+Isoproterenol PD (3.8 mV for high versus 0.2 mV for low confidence, p < 0.001). For non-CF tracings, confidence level was also associated with a significant divergence of Ringer's Baseline PD (−15.5 mV for high versus −10.1 mV for low confidence, p < 0.001) and ΔAmiloride PD (9.6 mV for high versus 5.8 mV for low confidence, p < 0.01), but not of ΔCl-free+Isoproterenol PD (−19.1 mV for high versus −17.4 mV for low confidence, p = NS). For indeterminate tracings, no divergence was seen for Ringer's Baseline PD (−14.1 mV for high versus −13.9 mV for low confidence, p = NS), ΔAmiloride PD (5.7 mV for high versus 5.2 mV for low confidence, p = NS), or ΔCl-free+Isoproterenol PD (−5.1 mV for high versus −4.6 mV for low confidence, p = NS; Table 4). These data indicate that reduced confidence is associated with diminished values for both CF and non-CF tracings; this could be because less dynamic tracings influence confidence level or, alternatively, because poor-quality tracings that dampen confidence are also less responsive owing to variability in the ability of the catheter to detect small changes in the PD.
Table 4. Quantitative comparison of high vs low confidence tracings.
This study represents the first systematic investigation of a multi-reader scoring process for NPD measurements and included both quantitative and qualitative metrics to assist with an adjudication process. Such a system is important for ensuring that accurate and acceptable tracings are included for analyses in clinical trials employing NPD. In this study, we developed a standardized scoring system and demonstrated the feasibility of training multiple readers in this process.
After training, the pool of readers demonstrated good correlation of the key qualitative score for interpretability, although this was diminished for indeterminate tracings. There was excellent agreement amongst the readers about the interpretability of CF tracings, whereas there was fair agreement for non-CF tracings (κ, 0.388, p < 0.001) and only slight agreement for indeterminate tracings. This is in agreement with a recent report by Naehrlich et al. demonstrating poor inter-reader agreement for indeterminate tracings [
]. This suggests that caution must be employed when using NPD to confirm a diagnosis of CF in this patient population. In addition, clinical trials of CFTR-directed therapies should exclude patients without clear evidence for CF on NPD despite the clinical diagnosis, as higher variability and poorer signal-to-noise seem to exist in this patient population. Finally, in our study, we found more disagreement than in a previous report of inter-reader agreement regarding non-CF tracings [
]. However, when there was agreement on interpretability, the quantitative data values of key PD measures were consistent. This suggests that a multi-reader adjudication process may improve precision for NPD-based diagnosis and monitoring in clinical trials.
Amongst the interpretable tracings, which are the clear majority (≥90% in recent studies) of tracings performed in a clinical trial setting, we observed excellent correlations for quantitative scores amongst readers. This indicates that interpretable tracings result in highly consistent and reproducible quantitative values. In addition, it suggests that a standardized electronic scoring process can be implemented in a trans-continental fashion without excess variance amongst readers.
In contrast to interpretability, there was significant disagreement amongst readers regarding confidence assessments. We posit that this reflects the stringent nature of the confidence scoring system as compared to the interpretability scoring, which was less subjective in nature. We also postulate that reduced agreement between reviewers is due to inherent biological differences in individuals who generate indeterminate tracings (i.e. patients with CFTR-related disorders or acquired CFTR dysfunction due to smoking or COPD) and who exhibit less polarized ion transport responses, which are reflected in the interpreters' evaluation of confidence. In addition, this observation may be due to bias arising when these tracings did not fit a reader's pre-conceived notion of “normal values”, resulting in falsely reduced confidence in the tracing. Alternatively, poor tracings may be due to technical limitations of the device, especially high-resistance agar electrodes, whose frequent electrical artifacts can diminish otherwise normal polarized responses and may be perceived as reducing confidence. Regardless, this reflects the need for more training on objective assignment of confidence scores to ensure that bias is removed. If successfully implemented, this could support the use of poor confidence scores derived from multiple readers to eliminate poor-quality tracings, which can diminish discrimination between treatments in the setting of controlled trials.
This study has several strengths and also limitations that should be noted. We rigorously trained expert readers to read blinded tracings in a protocolized fashion. In addition, to improve rigor and external validity, we randomly selected tracings performed by trained and qualified operators in the setting of clinical trials conducted within the CF Therapeutics Development Network. However, we did not provide repeated or follow-up training to readers or assess whether additional training reduced the divergence of confidence scores. We note that in prior studies (unpublished), a confidence score assigned by a single interpreter did not improve the capacity to discriminate effective CFTR-directed therapeutics; thus, such an effort may not yield positive results.
In conclusion, we show that a multi-interpreter scoring system for NPD is both plausible and feasible, and yields remarkably similar quantitative results between readers. This analysis sheds light on the divergent nature of qualitative assessment of tracings in the indeterminate range, which may be particularly important for understanding the efficacy of CFTR modulators in the context of clinical trials. Future studies will be conducted to determine the importance of subgroup analysis of tracings based on confidence level, which may strengthen the application of a multi-interpreter process.
The following are the supplementary data related to this article.
Supplementary Fig. 1. A. LabChart 7.0 training example of a high-confidence, interpretable CF tracing demonstrating lack of interfering artifacts, shifts, or catheter displacements, with biologically plausible values, absence of missing or incomplete tracing portions, and stable baseline at the end of perfusions. B. LabChart 7.0 training example of a low-confidence, interpretable CF tracing due to excessive artifacts interfering with interpretability but displaying biologically plausible responses and the absence of missing or incomplete portions of the tracing. C. LabChart 7.0 training example of an uninterpretable CF tracing due to an incomplete chloride portion of the tracing caused by catheter displacement, with biologically implausible responses to amiloride and ATP controls. D. LabChart 7.0 training example of an uninterpretable CF tracing due to excessive artifacts affecting the signal-to-noise ratio throughout, with biologically implausible responses to amiloride and ATP controls. In all panels, the horizontal axis is time (small boxes correspond to 60 s intervals).
G.M.S. and S.M.R. designed the study. G.M.S., I.S.G., I.F., M.W., F.V., B.L., and S.M.R. performed experiments and NPD analyses. G.M.S., S.M.R., and B.L. analyzed the data. G.M.S. and S.M.R. prepared the manuscript. All authors reviewed the manuscript and approved the final version before submission.
Sources of funding
The authors gratefully acknowledge funding support for this work from the NIH (NIH P30 DK072482 to S.M.R. and G.M.S.) and the Cystic Fibrosis Foundation (CLANCY09Y0 to G.M.S. and SORSCH15RO to G.M.S.).
Conflicts of interest
None.
References
Knowles M.R., Carson J.L., Collier A.M., Gatzy J.T., Boucher R.C. Measurements of nasal transepithelial electric potential differences in normal human subjects in vivo.
Results of a phase IIa study of VX-809, an investigational CFTR corrector compound, in subjects with cystic fibrosis homozygous for the F508del-CFTR mutation.
No detectable improvements in cystic fibrosis transmembrane conductance regulator by nasal aminoglycosides in patients with cystic fibrosis with stop mutations.