Accurate identification of COVID-19 diagnoses in patient medical records is essential for studies that use administrative data to examine morbidity, mortality, and risk factors associated with COVID-19.¹ Before April 1, 2020, the Centers for Disease Control and Prevention suggested using the existing International Statistical Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) code B97.29 (other coronavirus as the cause of diseases classified elsewhere) as the primary diagnostic code for patients infected with COVID-19.² On April 1, 2020, a new code, U07.1 (2019-nCoV acute respiratory disease), was added to ICD-10-CM³ and was rapidly adopted by hospitals.⁴ Using a national medical claims data set in the US, our study examined how hospital and nonhospital health care professionals have used these diagnostic codes in practice. We analyzed the comorbidities associated with COVID-19 diagnosis to assess the specificity of the legacy code and the importance of using both codes.
In this quality improvement study, we used the deidentified Clinformatics Data Mart Database (Optum), which comprises commercial and Medicare Advantage health plan members who are similar to the US commercially insured population with respect to demographic characteristics.⁵ Our analytic sample contained the longitudinal medical records of 28 853 694 patients across the US from January 1, 2018, to September 30, 2020. We examined the frequency of B97.29 and U07.1 in 2020 to understand their adoption by hospital and nonhospital health care professionals. We considered only the first encounter for each patient to avoid double counting. Because the legacy code may be used for other coronavirus infections, we identified the most common co-occurring diagnoses before and after January 1, 2020, and calculated the correlation of their frequencies using the SciPy package (version 1.5.2) in Python. The same analysis was conducted for U07.1 for comparison. This study was approved by the Indiana University institutional review board and followed the Standards for Quality Improvement Reporting Excellence (SQUIRE) reporting guideline. Owing to the use of deidentified patient data, the need for informed consent was waived by the institutional review board.⁶
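The co-occurrence and correlation steps described above can be sketched as follows. This is an illustrative sketch only, not the study's code: the toy claims, the record layout, and the helper names `cooccurrence_counts`, `pearson_r`, and `frequency_correlation` are all hypothetical, and aligning the two frequency vectors over the union of observed codes is an assumption about how the comparison might be set up. The Pearson coefficient is computed by hand so the example needs only the standard library; with SciPy installed, `scipy.stats.pearsonr` would return the same coefficient together with a P value.

```python
from collections import Counter
from datetime import date
from math import sqrt

# Hypothetical toy claims: (patient_id, service_date, diagnosis codes on the claim).
CLAIMS = [
    ("p1", date(2019, 2, 1), {"B97.29", "J06.9"}),
    ("p1", date(2019, 3, 1), {"B97.29", "J20.9"}),   # later encounter, ignored
    ("p2", date(2019, 5, 9), {"B97.29", "J06.9", "R05"}),
    ("p3", date(2020, 3, 20), {"B97.29", "R05", "R50.9"}),
    ("p4", date(2020, 4, 2), {"B97.29", "R05", "J12.89"}),
    ("p5", date(2020, 5, 5), {"U07.1", "R05", "J12.89"}),
    ("p6", date(2020, 6, 7), {"U07.1", "R05", "R50.9"}),
    ("p6", date(2020, 7, 1), {"U07.1", "J96.01"}),   # later encounter, ignored
]


def cooccurrence_counts(claims, target, start, end):
    """Count codes that co-occur with `target`, using only each patient's
    first encounter carrying `target` within [start, end] (an assumption
    about what "first encounter" means here)."""
    first = {}
    for pid, day, codes in sorted(claims, key=lambda claim: claim[1]):
        if target in codes and start <= day <= end and pid not in first:
            first[pid] = codes
    counts = Counter()
    for codes in first.values():
        counts.update(codes - {target})
    return counts


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def frequency_correlation(counts_a, counts_b):
    """Correlate two co-occurrence profiles over the union of their codes."""
    codes = sorted(set(counts_a) | set(counts_b))
    return pearson_r([counts_a[c] for c in codes], [counts_b[c] for c in codes])


legacy_pre = cooccurrence_counts(CLAIMS, "B97.29", date(2018, 1, 1), date(2019, 12, 31))
legacy_2020 = cooccurrence_counts(CLAIMS, "B97.29", date(2020, 1, 1), date(2020, 9, 30))
new_2020 = cooccurrence_counts(CLAIMS, "U07.1", date(2020, 1, 1), date(2020, 9, 30))

r_vs_new = frequency_correlation(legacy_2020, new_2020)
r_vs_pre = frequency_correlation(legacy_2020, legacy_pre)
print(f"B97.29 (2020) vs U07.1: r = {r_vs_new:.2f}")
print(f"B97.29 (2020) vs B97.29 (2018-2019): r = {r_vs_pre:.2f}")
```

In the toy data, the 2020 profile of B97.29 tracks the U07.1 profile more closely than it tracks its own prepandemic profile, which is the pattern the comparison is meant to detect.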
Of the 18 975 615 patients (mean [SD] age, 47.9 [24.1] years; 9 832 556 women [51.8%]; 9 143 059 men [48.2%]) in the data set between January 1 and September 30, 2020, 26 414 (0.14%) had a B97.29 diagnosis code, and 279 066 (1.47%) had a U07.1 diagnosis code. The number of patients with a B97.29 code increased in March 2020 but rapidly diminished after the introduction of U07.1 (Figure). Although hospitals stopped using the legacy code shortly after the introduction of U07.1, some nonhospital health care professionals continued to use it (Figure). In 2020, 6 of the 10 most frequent diagnoses that co-occurred with B97.29 were associated with COVID-19 according to the Centers for Disease Control and Prevention guideline,³ whereas in 2018 and 2019, only 1 of the 10 most frequent diagnoses that co-occurred with B97.29 was associated with COVID-19 (Table). The frequency of diagnostic codes that co-occurred with B97.29 in 2020 was more closely correlated with the frequency of those that co-occurred with U07.1 (Pearson r, 0.92; P < .001) than with the frequency of those that co-occurred with B97.29 in 2018 and 2019 (Pearson r, 0.58; P < .001). Using only U07.1 to identify patients with COVID-19 after April 1, 2020, missed 9714 patients diagnosed only with B97.29, who accounted for 3.4% of the 286 161 patients with either the legacy or the new code. However, the number of false positives from including B97.29 when screening for patients with COVID-19 would be small, because the code’s drastic increase in 2020 can be attributed to COVID-19-related symptoms (Table).
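The proportions reported above can be rechecked from the counts given in the text. The short sketch below does only that arithmetic; the helper name `percent` is hypothetical, and the counts are copied from this paragraph.

```python
def percent(part, whole, digits):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


TOTAL_PATIENTS_2020 = 18_975_615   # patients in the data set, Jan 1 to Sep 30, 2020
WITH_B97_29 = 26_414               # patients with the legacy code
WITH_U07_1 = 279_066               # patients with the new code
LEGACY_ONLY_AFTER_APRIL = 9_714    # patients missed when only U07.1 is used
EITHER_CODE_AFTER_APRIL = 286_161  # patients with either code

legacy_share = percent(WITH_B97_29, TOTAL_PATIENTS_2020, 2)
new_share = percent(WITH_U07_1, TOTAL_PATIENTS_2020, 2)
missed_share = percent(LEGACY_ONLY_AFTER_APRIL, EITHER_CODE_AFTER_APRIL, 1)

print(f"B97.29: {legacy_share}% of patients")
print(f"U07.1: {new_share}% of patients")
print(f"Missed by a U07.1-only definition: {missed_share}%")
```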
Using a hospital discharge data set, Kadri et al⁴ showed that the legacy code B97.29 was quickly replaced by U07.1 and that its use decreased to prepandemic levels. Our quality improvement study confirmed their findings for hospitals using large-scale medical claims data, but our results suggest that some nonhospital health care professionals continued to use B97.29 for COVID-19 diagnoses in 2020. It is possible that patients diagnosed with B97.29 had coronavirus infections other than COVID-19, and our findings may not generalize to other data sources. However, future research on COVID-19 using claims data should consider both B97.29 and U07.1 when identifying patients with COVID-19 to avoid introducing a systematic bias across hospital and nonhospital health care professionals, given the strong socioeconomic disparities in rates of COVID-19 testing and infection.
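One way a claims-based study might apply this recommendation is sketched below. It is a minimal illustration under stated assumptions, not a validated case definition: the record layout, the function name `classify_covid_patients`, and the choice to count B97.29 only on encounters from January 1, 2020, onward (because earlier use of the legacy code reflects other coronaviruses) are all hypothetical.

```python
from datetime import date

LEGACY_CODE = "B97.29"
NEW_CODE = "U07.1"
# Assumption: legacy-code encounters before this date are not treated as COVID-19.
PANDEMIC_START = date(2020, 1, 1)


def classify_covid_patients(claims):
    """Map each likely COVID-19 patient to the code that identified them.

    `claims` is an iterable of (patient_id, service_date, set_of_codes).
    Patients with U07.1 are labeled "U07.1"; patients who only ever carry
    the legacy code on or after PANDEMIC_START are labeled "B97.29 only".
    """
    codes_seen = {}
    for pid, day, codes in claims:
        if day < PANDEMIC_START:
            continue
        codes_seen.setdefault(pid, set()).update(codes & {LEGACY_CODE, NEW_CODE})

    labels = {}
    for pid, seen in codes_seen.items():
        if NEW_CODE in seen:
            labels[pid] = "U07.1"
        elif LEGACY_CODE in seen:
            labels[pid] = "B97.29 only"
    return labels


# Hypothetical example records.
example_claims = [
    ("a", date(2019, 1, 15), {"B97.29", "J06.9"}),   # prepandemic, excluded
    ("b", date(2020, 5, 3), {"B97.29", "R05"}),      # legacy code only
    ("c", date(2020, 3, 28), {"B97.29", "J12.89"}),
    ("c", date(2020, 4, 10), {"U07.1", "J12.89"}),   # later recoded with U07.1
    ("d", date(2020, 6, 1), {"J06.9"}),              # no COVID-19 code
]

labels = classify_covid_patients(example_claims)
print(labels)
```

Keeping the "B97.29 only" label separate lets an analysis report results with and without the legacy-code patients, for example as a sensitivity check.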
Accepted for Publication: July 8, 2021.
Published: September 8, 2021. doi:10.1001/jamanetworkopen.2021.24643
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Yang KC et al. JAMA Network Open.
Corresponding Author: Brea L. Perry, PhD, Department of Sociology, Indiana University-Bloomington, 1020 E Kirkwood Ave, 767 Ballantine Hall, Bloomington, IN 47405 (blperry@indiana.edu).
Author Contributions: Mr Yang and Dr Perry had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Yang, Lee, Ahn.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Yang.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Yang.
Obtained funding: Ahn, Perry.
Administrative, technical, or material support: Yang, Ahn, Perry.
Supervision: Ahn, Perry.
Conflict of Interest Disclosures: None reported.
Funding/Support: This research was funded by grant R01 DA039928 from the National Institute on Drug Abuse (Dr Perry).
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The findings and conclusions in this study are those of the authors and do not necessarily represent the official position of the National Institute on Drug Abuse.
Additional Contributions: We thank Kosali Simon, PhD, and the College of Arts and Sciences at Indiana University-Bloomington for their support. They did not receive financial compensation for their contribution.
1. Schwab P, Mehrjou A, Parbhoo S, et al. Real-time prediction of COVID-19 related mortality using electronic health records. Nat Commun. 2021;12(1):1058.
2. Centers for Disease Control and Prevention. ICD-10-CM official coding guidelines - supplement coding encounters related to COVID-19 coronavirus outbreak. February 20, 2020. Accessed October 9, 2020.
3. Centers for Disease Control and Prevention. ICD-10-CM official coding and reporting guidelines: April 1, 2020 through September 30, 2020. Accessed October 9, 2020.
4. Kadri SS, Gundrum J, Warner S, et al. Uptake and accuracy of the diagnosis code for COVID-19 among US hospitalizations. JAMA. 2020;324(24):2553-2554.
5. Wallace PJ, Shah ND, Dennen T, Bleicher PA, Crown WH. Optum Labs: building a novel node in the learning health care system. Health Aff (Millwood). 2014;33(7):1187-1194.
6. Ogrinc G, Davies L, Goodman D, Batalden P, Davidoff F, Stevens D. SQUIRE 2.0 (Standards for Quality Improvement Reporting Excellence): revised publication guidelines from a detailed consensus process. BMJ Qual Saf. 2016;25(12):986-992.