Each year, JAMA Surgery receives hundreds of submissions that retrospectively analyze large surgical databases. Although many of these attempt to shed light on new and important questions, most are not published. Most submissions are not even sent out for peer review, because they have clear flaws in their data analytic techniques or attempt to address a research question that cannot be adequately answered with the proposed data set. Of those that are sent out for peer review, many are recommended for rejection by expert reviewers who find major methodological flaws in the use of these otherwise powerful data sets. Published articles frequently come from a select group of investigators who have mastered specific data sets and the analytic techniques required to harness their full potential.
To help more investigators develop the skills needed to appropriately use the growing number of large surgical data sets, the editors of JAMA Surgery have commissioned this series of statistical methodology articles. The series aims to provide short, practical guides for academic surgeons and researchers on the use of the most widely available surgical data sets across the research continuum, from conceptualization to peer-reviewed publication. To achieve this, JAMA Surgery is pleased to partner with the Surgical Outcomes Club to publish a series that will be instrumental in elevating the science of surgical outcomes research.
This 13-part series provides a succinct overview of the 11 most widely used data sets1-11 (Box 1), including their specific features, strengths, limitations, and important statistical considerations. In addition, this Editorial presents a 10-item checklist (Box 2) that authors can use to ensure that they have covered what is “at minimum” expected from a manuscript that uses 1 of these databases. Finally, the series is accompanied by an Editorial12 from our biostatistician colleagues, who provide more in-depth information on the statistical methods mentioned in the practical guides and on pitfalls to avoid. To ensure that these guides are truly practical and relevant, we have leveraged our partnership as the official journal of the Surgical Outcomes Club to assemble a 3-person authorship team that includes (1) a surgeon investigator who is a senior member of the Surgical Outcomes Club with extensive experience using that particular data set; (2) a member of the JAMA Surgery Editorial Board who commonly reviews such manuscripts; and (3) a JAMA Surgery biostatistician who is routinely consulted to evaluate the methods of these types of papers (in some cases, the JAMA Surgery board member is also an expert methodologist, obviating the need for a biostatistician). This authorship strategy ensures that each guide is presented in terms relevant to surgeons, even those without previous experience with biostatistics or the data set involved, and that it includes the basic information required to prepare a manuscript for the rigorous JAMA Surgery peer review process.
Box 1. Databases Covered in This Series
Agency for Healthcare Research and Quality Healthcare Cost and Utilization Project databases: National Inpatient Sample, State Inpatient Databases, and Kids’ Inpatient Database1
Surveillance, Epidemiology, and End Results Program2
Medicare Claims Data3
Military Health System Tricare Encounter Data4
Veterans Affairs Surgical Quality Improvement Program5
National Surgical Quality Improvement Program6
Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program7
National Cancer Database8
National Trauma Data Bank9
Society for Vascular Surgery Vascular Quality Initiative10
The Society of Thoracic Surgeons National Database11
Box 2. Checklist to Elevate the Science of Surgical Database Research
1. Have a solid research question and a clear hypothesis. Consider using the FINER (Feasible, Interesting, Novel, Ethical, Relevant) or PICO (Patient, Population, or Problem; Intervention, Prognostic Factor, or Exposure; Comparison or Intervention; Outcome) criteria to develop these.
2. Ensure compliance with institutional review board requirements and data use agreements.
3. Conduct a thorough literature review. Use a reference management program to ease manuscript preparation.
4. Make sure this is the best data set available and that it contains the variables needed to answer your research question.
5. Clearly define the inclusion criteria, exclusion criteria, and outcome variables. Use a flow diagram to describe final patient selection.
6. Identify potential confounders and use risk adjustment to minimize bias. Consider using a directed acyclic graph to represent potential associations. Avoid causal language when reporting the results of these observational studies.
7. Determine whether variable definitions have changed over time; if they have, account for these changes.
8. Ensure that competing risks are identified and addressed.
9. Ensure that data issues, such as missing data, are discussed and that any sensitivity analyses or imputations performed are reported clearly and cohesively.
10. Ensure that your article has a clear take-home message that addresses how your research advances current knowledge and has important policy or clinical implications.
To help authors improve the quality of their submissions, we have developed a 10-item checklist (Box 2). The first item encourages authors to pursue hypothesis-driven science. Defining a solid research question is key to translating a problem into an operational hypothesis. The FINER (Feasible, Interesting, Novel, Ethical, Relevant) criteria or the PICO (Patient, Population, or Problem; Intervention, Prognostic Factor, or Exposure; Comparison or Intervention; Outcome) format can help in developing a meaningful research question.13,14 Adequately defining the population of interest lays a solid groundwork for the interpretation, applicability, and generalizability of the research findings. We understand that in many cases, authors may be using these large databases for “hypothesis-generating” research. That is acceptable, but even then one must start with a solid research question to conduct a meaningful project that generates important hypotheses, which can then be studied further with translational or prospective approaches. Some authors ask whether it is acceptable to explore a data set they have access to, without a real research question, to see what they can find. This is never acceptable.
Second, we remind authors to seek approval or an exemption from an institutional review board and to properly document and comply with applicable data use agreements. These steps are often overlooked, but compliance with the applicable rules is necessary to protect patient privacy, among other important reasons. Third, a thorough literature review helps ensure that the best database is selected to answer the research question and that the question has not already been answered. Fourth, we encourage authors to invest enough time early on to get to know the database, confirm that it contains the appropriate variables, and understand its methodological considerations, to make sure it is the best data set available for the study. Fifth, a clear definition of the inclusion and exclusion criteria, as well as the outcome variables, is necessary for reviewers and readers to understand the population under study. It also facilitates querying the database and extracting a complete and useful data set.
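As an illustration of how explicit inclusion and exclusion criteria translate into the counts reported in a patient selection flow diagram, the following Python sketch applies exclusion criteria in a fixed order and records how many records each step removes. The field names (`age`, `procedure`, `emergency`), the records, and the criteria are hypothetical placeholders, not variables from any particular database in this series.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# A criterion is a (label, predicate) pair; the predicate returns True to EXCLUDE.
Criterion = Tuple[str, Callable[[Record], bool]]


def apply_exclusions(
    records: List[Record], criteria: List[Criterion]
) -> Tuple[List[Record], List[Tuple[str, int, int]]]:
    """Apply exclusion criteria in order.

    Returns the final cohort and one (label, n_excluded, n_remaining) row
    per step, which can be transcribed directly into a flow diagram.
    """
    cohort = list(records)
    flow = [("Records in data set", 0, len(cohort))]
    for label, is_excluded in criteria:
        kept = [r for r in cohort if not is_excluded(r)]
        flow.append((label, len(cohort) - len(kept), len(kept)))
        cohort = kept
    return cohort, flow


# Hypothetical records and criteria (illustration only).
records = [
    {"age": 45, "procedure": "colectomy", "emergency": False},
    {"age": 17, "procedure": "colectomy", "emergency": False},
    {"age": 60, "procedure": "hernia repair", "emergency": False},
    {"age": 70, "procedure": "colectomy", "emergency": True},
    {"age": 52, "procedure": "colectomy", "emergency": False},
]
criteria = [
    ("Age < 18 years", lambda r: r["age"] < 18),
    ("Procedure other than colectomy", lambda r: r["procedure"] != "colectomy"),
    ("Emergency operation", lambda r: bool(r["emergency"])),
]

cohort, flow = apply_exclusions(records, criteria)
for label, excluded, remaining in flow:
    print(f"{label}: excluded {excluded}, remaining {remaining}")
```

Because the per-step counts depend on the order in which exclusions are applied, the order used in the analysis should be the order shown in the published flow diagram.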
Sixth, authors must identify potential confounders and covariates and use risk adjustment to minimize bias. Given the observational nature of the data in these surgical registries, 1 approach is to construct a directed acyclic graph,15 which visually depicts the association being explored along with the covariates and confounders that must be considered or accounted for when studying that association. Please refer to the Editorial by Kaji et al12 for further details. Authors should also avoid causal language when describing the results of these observational studies. Seventh, authors must account for any updates or substantial changes to the variables of interest over time, because such changes can invalidate comparisons between and across years (for example, in the National Cancer Database, the definition of sentinel lymph node biopsy for breast cancer and melanoma has changed during the last 10 years, and this must be accounted for). Eighth, authors should determine whether competing risks affect their outcomes.16 For example, when studying complication rates 30 days after surgery, one must account for patients who died before developing a complication and were therefore no longer at risk for it. Ninth, authors must ensure that any data issues, such as missing data, are discussed openly and in a clear, cohesive, and replicable way. Authors must describe any data limitations, how they were addressed, and the measures taken to reduce their impact (eg, sensitivity analyses or multiple imputation17 for missing data). Finally, as the last item in the checklist, we encourage authors to state a clear take-home message: how the study advances the science, which gaps in knowledge it addresses, which further research opportunities it highlights, and what its important policy or clinical implications are.
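To make the 30-day complication example concrete, the following Python sketch contrasts two estimates of complication incidence on a small hypothetical cohort: a naive 1 minus Kaplan-Meier estimate that treats death as ordinary censoring, and the Aalen-Johansen cumulative incidence estimate, which treats death as a competing event. The cohort, the day-level event times, and the assumption that each patient contributes only a first event are illustrative assumptions; a real analysis would use an established survival analysis package.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Patient:
    day: int    # day of first event, or last day of follow-up if no event
    event: str  # "complication", "death", or "none" (censored at `day`)


def _event_times(patients: List[Patient], horizon: int) -> List[int]:
    return sorted({p.day for p in patients if p.day <= horizon})


def naive_incidence(patients: List[Patient], horizon: int = 30) -> float:
    """1 - Kaplan-Meier for complications, treating deaths as censored.

    This overstates incidence because it implicitly assumes that patients
    who died could still have gone on to develop a complication.
    """
    survival = 1.0
    for t in _event_times(patients, horizon):
        at_risk = sum(1 for p in patients if p.day >= t)
        complications = sum(1 for p in patients if p.day == t and p.event == "complication")
        survival *= 1 - complications / at_risk
    return 1 - survival


def cumulative_incidence(patients: List[Patient], horizon: int = 30) -> float:
    """Aalen-Johansen cumulative incidence of complications, with death as a competing event."""
    event_free = 1.0  # probability of being free of ANY event just before time t
    incidence = 0.0
    for t in _event_times(patients, horizon):
        at_risk = sum(1 for p in patients if p.day >= t)
        complications = sum(1 for p in patients if p.day == t and p.event == "complication")
        deaths = sum(1 for p in patients if p.day == t and p.event == "death")
        incidence += event_free * complications / at_risk
        event_free *= 1 - (complications + deaths) / at_risk
    return incidence


# Hypothetical cohort of 10 patients: 2 die on day 5, 2 develop a
# complication on day 10, and 6 are event-free through day 30.
example_cohort = (
    [Patient(5, "death")] * 2
    + [Patient(10, "complication")] * 2
    + [Patient(30, "none")] * 6
)
print(f"Naive 1 - KM: {naive_incidence(example_cohort):.2f}")              # 0.25
print(f"Cumulative incidence: {cumulative_incidence(example_cohort):.2f}")  # 0.20
```

With complete 30-day follow-up and no censoring, the cumulative incidence equals the crude proportion of patients with a complication (2 of 10), whereas the naive estimate is inflated to 0.25 because the 2 patients who died are treated as if they remained at risk.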
We recommend that authors use this checklist, the practical guide for their chosen data set, and the statistical tips for analyzing data sets as a 3-part series to consult before submission of their manuscript. We hope that by following these simple guides, authors can benefit from the collective wisdom of so many colleagues who have successfully completed similar analyses in the past. We look forward to the opportunity to publish analytically advanced studies and hope that these guides will help elevate the science of surgical database research.
Corresponding Author: Adil H. Haider, MD, MPH, Center for Surgery and Public Health, Department of Surgery, Brigham and Women’s Hospital, 1620 Tremont St, Ste 4-020, Boston, MA 02120 (ahhaider@bwh.harvard.edu).
Published Online: April 4, 2018. doi:10.1001/jamasurg.2018.0628
Conflict of Interest Disclosures: Dr Haider reports receiving grants from the Henry M. Jackson Foundation of the Department of Defense, the Orthopaedic Research and Education Foundation, and the National Institutes of Health, and nonfinancial research support from the Centers for Medicare and Medicaid Services Office of Minority Health. Dr Bilimoria was the president of the Surgical Outcomes Club from 2016 to 2017. No other disclosures were reported.
Funding/Support: This work is supported by the Henry M. Jackson Foundation for the Advancement of Military Medicine of the Department of Defense (Dr Haider).
Role of the Funder/Sponsor: The funder had no role in the preparation, review, or approval of the manuscript and decision to submit the manuscript for publication.
1. Stulberg JJ, Haut ER. AHRQ Healthcare Cost and Utilization Project Databases: National Inpatient Sample (NIS) [published online April 4, 2018]. JAMA Surg. doi:
2. Doll KM, Rademaker A, Sosa JA. Longitudinal outcomes reporting using the Surveillance, Epidemiology, and End Results (SEER) Database [published online April 4, 2018]. JAMA Surg. doi:
3. Ghaferi AA, Dimick JB. Longitudinal outcomes reporting using Medicare claims [published online April 4, 2018]. JAMA Surg. doi:
4. Schoenfeld AJ, Kaji AH, Haider AH. Outcomes reporting using Tricare claims [published online April 4, 2018]. JAMA Surg. doi:
5. Massarweh NM, Kaji AH, Itani KMF. Veterans Affairs Surgical Quality Improvement Program [published online April 4, 2018]. JAMA Surg. doi:
6. Raval MV, Pawlik TM. National Surgical Quality Improvement Program (NSQIP) and pediatric NSQIP [published online April 4, 2018]. JAMA Surg. doi:
7. Telem DA, Dimick JB. Metabolic and Bariatric Surgery Accreditation and Quality Program (MBSAQIP) [published online April 4, 2018]. JAMA Surg. doi:
8. Merkow RP, Rademaker AW, Bilimoria KY. National Cancer Database [published online April 4, 2018]. JAMA Surg. doi:
9. Hashmi ZG, Kaji AH, Nathens AB. National Trauma Data Bank [published online April 4, 2018]. JAMA Surg. doi:
10. Desai SS, Kaji AH, Upchurch G. Society for Vascular Surgery Vascular Quality Improvement Program [published online April 4, 2018]. JAMA Surg. doi:
11. Farjah F, Kaji AH, Chu D. Society of Thoracic Surgery (STS) Dataset [published online April 4, 2018]. JAMA Surg. doi:
12. Kaji AH, Rademaker AW, Hyslop T. Tips for analyzing large data sets from the JAMA Surgery statistical editors [published online April 4, 2018]. JAMA Surg. doi:
13. Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB, eds. Designing Clinical Research. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2007:19-22.
14. Haynes RB. Forming research questions. J Clin Epidemiol. 2006;59(9):881-886.
15. Shrier I, Platt RW. Reducing bias through directed acyclic graphs. BMC Med Res Methodol. 2008;8:70.
16. Sun M, Choueiri TK, Hamnvik OP, et al. Comparison of gonadotropin-releasing hormone agonists and orchiectomy: effects of androgen-deprivation therapy. JAMA Oncol. 2016;2(4):500-507.
17. Oyetunji TA, Crompton JG, Ehanire ID, et al. Multiple imputation in trauma disparity research. J Surg Res. 2011;165(1):e37-e41.