Received: 21 July 2017
Revised: 25 July 2017
Accepted: 25 July 2017
DOI: 10.1002/pds.4295
ORIGINAL REPORT
Reporting to Improve Reproducibility and Facilitate Validity
Assessment for Healthcare Database Studies V1.0
Shirley V. Wang1,2 | Sebastian Schneeweiss1,2 | Marc L. Berger3 | Jeffrey Brown4 | Frank de Vries5 | Ian Douglas6 | Joshua J. Gagne1,2 | Rosa Gini7 | Olaf Klungel8 | C. Daniel Mullins9 | Michael D. Nguyen10 | Jeremy A. Rassen11 | Liam Smeeth6 | Miriam Sturkenboom12 | on behalf of the joint ISPE-ISPOR Special Task Force on Real World Evidence in Health Care Decision Making
1 Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, MA, USA
2 Department of Medicine, Harvard Medical School, MA, USA
3 Pfizer, NY, USA
4 Department of Population Medicine, Harvard Medical School, MA, USA
5 Department of Clinical Pharmacy, Maastricht UMC+, The Netherlands
6 London School of Hygiene and Tropical Medicine, England, UK
7 Agenzia regionale di sanità della Toscana, Florence, Italy
8 Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht University, Utrecht, Netherlands
9 Pharmaceutical Health Services Research Department, University of Maryland School of Pharmacy, MD, USA
10 FDA Center for Drug Evaluation and Research, USA
11 Aetion, Inc., NY, USA
12 Erasmus University Medical Center Rotterdam, Netherlands
Correspondence
S. V. Wang, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital and Harvard Medical School, United States.
Email: [email protected]
Abstract
Purpose: Defining a study population and creating an analytic dataset from longitudinal healthcare databases involves many decisions. Our objective was to catalogue scientific decisions underpinning study execution that should be reported to facilitate replication and enable assessment of validity of studies conducted in large healthcare databases.
Methods: We reviewed key investigator decisions required to operate a sample of macros and software tools designed to create and analyze analytic cohorts from longitudinal streams of healthcare data. A panel of academic, regulatory, and industry experts in healthcare database analytics discussed and added to this list.
Contributors to the joint ISPE-ISPOR Special Task Force on Real World Evidence in Health Care Decision Making paper co-led by Shirley V. Wang and Sebastian Schneeweiss. The writing group contributors are the following: Marc L. Berger, Jeffrey Brown, Frank de Vries, Ian Douglas, Joshua J. Gagne, Rosa Gini, Olaf Klungel, C. Daniel Mullins, Michael D. Nguyen, Jeremy A. Rassen, Liam Smeeth and Miriam Sturkenboom. The contributors who participated in small group discussion and/or provided substantial feedback prior to ISPE/ISPOR membership review are the following: Andrew Bate, Alison Bourke, Suzanne Cadarette, Tobias Gerhard, Robert Glynn, Krista Huybrechts, Kiyoshi Kubota, Amr Makady, Fredrik Nyberg, Mary E Ritchey, Ken Rothman and Sengwee Toh. Additional information is listed in the Appendix.
This article is a joint publication by Pharmacoepidemiology and Drug Safety and Value in Health.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2017 The Authors. Pharmacoepidemiology & Drug Safety published by John Wiley & Sons Ltd.
Pharmacoepidemiol Drug Saf. 2017;26:1018–1032.
Conclusion: Evidence generated from large healthcare encounter and reimbursement databases is increasingly being sought by decision-makers. Varied terminology is used around the world for the same concepts. Agreeing on terminology and which parameters from a large catalogue are the most essential to report for replicable research would improve transparency and facilitate assessment of validity. At a minimum, reporting for a database study should provide clarity regarding operational definitions for key temporal anchors and their relation to each other when creating the analytic dataset, accompanied by an attrition table and a design diagram. A substantial improvement in reproducibility, rigor and confidence in real world evidence generated from healthcare databases could be achieved with greater transparency about operational study parameters used to create analytic datasets from longitudinal healthcare databases.

KEYWORDS
transparency, reproducibility, replication, healthcare databases, pharmacoepidemiology, methods, longitudinal data
1 | INTRODUCTION
Modern healthcare encounter and reimbursement systems produce an abundance of electronically recorded, patient-level longitudinal data. These data streams contain information on physician visits, hospitalizations, diagnoses made and recorded, procedures performed and billed, medications prescribed and filled, lab tests performed or results recorded, as well as many other date-stamped items. Such temporally ordered data are used to study the effectiveness and safety of medical products, healthcare policies, and medical interventions and have become a key tool for improving the quality and affordability of healthcare.1,2 The importance and influence of such "real world" evidence is demonstrated by the commitment of governments around the world to develop infrastructure and technology to increase the capacity for use of these data in comparative effectiveness and safety research as well as health technology assessments.3-12

Research conducted using healthcare databases currently suffers from a lack of transparency in reporting of study details.13-16 This has led to high profile controversies over apparent discrepancies in results and reduced confidence in evidence generated from healthcare databases. However, subtle differences in scientific decisions regarding specific study parameters can have significant impacts on results and interpretation, as was discovered in the controversies over 3rd generation oral contraceptives and risk of venous thromboembolism or statins and the risk of hip fracture.17,18 Clarity regarding key operational decisions would have facilitated replication, assessment of validity and earlier understanding of the reasons that studies reported different findings.

The intertwined issues of transparency, reproducibility and validity cut across scientific disciplines. There has been an increasing movement towards "open science", an umbrella term that covers study registration, data sharing, public protocols and more detailed, transparent reporting.19-28 To address these issues in the field of healthcare database research, a Joint Task Force between the International Society for Pharmacoepidemiology (ISPE) and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) was convened to address transparency in process for database studies (e.g. "what did you plan to do?") and transparency in study execution (e.g. "what did you actually do?"). This paper, led by ISPE, focuses on the latter topic: reporting of the specific steps taken during study implementation to improve reproducibility and assessment of validity.

Transparency and reproducibility in large healthcare databases is dependent on clarity regarding 1) cleaning and other pre-processing of raw source data tables, 2) operational decisions to create an analytic dataset and 3) analytic choices (Figure 1). This paper focuses on reporting of design and implementation decisions to define and create a temporally anchored study population from raw longitudinal source data (Figure 1, Step 2). A temporally anchored study population is identified by a sentinel event, an initial temporal anchor. Characteristics of patients, exposures and/or outcomes are evaluated during time periods defined in relation to the sentinel event.

However, understanding how source data tables are cut, cleaned and pre-processed prior to implementation of a research study (Figure 1, Step 1), how information is extracted from unstructured data (e.g. natural language processing of free text from clinical notes), and how the created dataset is analyzed (Figure 1, Step 3) are also important parts of reproducible research. These topics have been covered elsewhere;14,29-36 however, we summarize key points for those data provenance steps in the online appendix.

1.1 | Transparency

Transparency in what researchers initially intended to do protects against data dredging and cherry picking of results. It can be achieved with pre-registration and public posting of protocols before initiation of analysis. This is addressed in detail in a companion paper led by ISPOR.37 Because the initially planned research and the design and methodology underlying reported results may differ, it is also important to have transparency regarding what researchers actually did to obtain the reported results from a healthcare database study. This can be achieved with clear reporting on the detailed operational decisions made by investigators during implementation. These decisions include how to define a study population (whom to study), and how to design and conduct an analysis (what to measure, when and how to measure it).
FIGURE 1 Data provenance: transitions from healthcare delivery to analysis results.
1.2 | Reproducibility and replicability

Reproducibility is a characteristic of a study or a finding. A reproducible study is one for which independent investigators implementing the same methods in the same data are able to obtain the same results (direct replication38). In contrast, a reproducible finding is a higher order target than a reproducible study, which can be tested by conducting multiple studies that evaluate the same question and estimand (target of inference) but use different data and/or apply different methodology or operational decisions (conceptual replication38) (Table 1).

Direct replicability is a necessary, but not sufficient, component of high quality research. In other words, a fully transparent and directly replicable research study is not necessarily rigorous nor does it necessarily produce valid findings. However, the transparency that makes direct replication possible means that validity of design and operational decisions can be evaluated, questioned and improved.
TABLE 1 Reproducibility and replicability
Higher order issues such as conceptual replication of the finding can and should be evaluated as well; however, without transparency in study implementation, it can be difficult to ascertain whether superficially similar studies address the same conceptual question.

For healthcare database research, direct replication of a study means that if independent investigators applied the same design and operational choices to the same longitudinal source data, they should be able to obtain the same results (or at least a near exact reproduction). In contrast, conceptual replication and robustness of a finding can be assessed by applying the same methods to different source data (or different years from the same source). Here, lack of replicability would not necessarily mean that one result is more "correct" than another, or refutes the results of the original. Instead, it would highlight a need for deeper inquiry to find the drivers of the differences, including differences in data definitions and quality, temporal changes or true differences in treatment effect for different populations. Conceptual replications can be further evaluated through application of different plausible methodologic and operational decisions to the same or different source data to evaluate how much the finding is influenced by the specific parameter combinations originally selected. This would encompass evaluation of how much reported findings vary with plausible alternative parameter choices, implementation in comparable data sources, or after a flawed design or operational decision is corrected. However, the scientific community cannot evaluate the validity and rigor of research methods if implementation decisions necessary for replication are not transparently reported.

The importance of achieving consistently reproducible research is recognized in many reporting guidelines (e.g. STROBE,34 RECORD,39 PCORI Methodology Report,40 EnCePP33) and is one impetus for developing infrastructure and tools to scale up capacity for generating evidence from large healthcare database research.3,41-45 Other guidelines, such as the ISPE Guidelines for Good Pharmacoepidemiology Practice (GPP), broadly cover many aspects of pharmacoepidemiology from protocol development, to responsibilities of research personnel and facilities, to human subject protection and adverse event reporting.46 While these guidelines certainly increase transparency, even strict adherence to existing guidance would not provide all the information necessary for full reproducibility. In recognition of this issue, ISPE formed a joint task force with ISPOR specifically focused on improving transparency, reproducibility and validity assessment for database research, and supported a complementary effort to develop a version of the RECORD reporting guidelines with a specific focus on healthcare database pharmacoepidemiology.

Any replication of database research requires an exact description of the transformations performed upon the source data and how missing data are handled. Indeed, it has been demonstrated that when researchers go beyond general guidance and provide a clear report of the temporal anchors, coding algorithms, and other decisions made to create and analyze their study population(s), independent investigators following the same technical/statistical protocol and using the same data source are able to closely replicate the study population and results.47

1.3 | The current status of transparency and reproducibility of healthcare database studies

Many research fields that rely on primary data collection have emphasized creation of repositories for sharing study data and analytic code.48,49 In contrast to fields that rely on primary data collection, numerous healthcare database researchers routinely make secondary use of the same large healthcare data sources. However, the legal framework that enables healthcare database researchers to license or otherwise access raw data for research often prevents public sharing both of the raw source data itself as well as created analytic datasets, due to patient privacy and data security concerns. Access to data and code guarantees the ability to directly replicate a study. However, the current system for multi-user access to the same large healthcare data sources often prevents public sharing of that data. Furthermore, database studies require thousands of lines of code to create and analyze a temporally anchored study population from a large healthcare database. This is several orders of magnitude larger than the code required for analysis of a randomized trial or other dataset based on primary collection. Transparency requires clear reporting of the decisions and parameters used in study execution. While we encourage sharing data and code, we recognize that for many reasons, including data use agreements and intellectual property, this is often not possible. We emphasize that simply sharing code without extensive annotation to identify where key operational and design parameters are defined would obfuscate important scientific decisions. Clear natural language description of key operational and design details should be the basis for sharing the scientific thought process with the majority of informed consumers of evidence.

1.4 | Recent efforts to improve transparency and reproducibility of healthcare database studies

To generate transparent and reproducible evidence that can inform decision-making at a larger scale, many organizations have developed infrastructure to more efficiently utilize large healthcare data sources.9,50-56 Recently developed comprehensive software tools from such organizations use different coding languages and platforms to facilitate identification of study populations, creation of temporally anchored analytic datasets, and analysis from raw longitudinal healthcare data streams. They have in common the flexibility for investigators to turn "gears and levers" at key operational touchpoints to create analytically usable, customized study populations from raw longitudinal source data tables. However, the specific parameters that must be user specified, the flexibility of the options, and the underlying programming code differ. Many, but not all, reusable software tools go through extensive quality checking and validation processes to provide assurance of the fidelity of the code to intended action. Transparency in quality assurance and validation processes for software tools is
critically important to prevent exactly replicable findings that lack fidelity to intended design and operational parameters.

Even with tools available to facilitate creation and analysis of a temporally anchored study population from longitudinal healthcare databases, investigators must still take responsibility for publicly reporting the details of their design and operational decisions. Due to the level of detail, these can be made available as online appendices or web links for publications and reports.

1.5 | Objective

The objective of this paper was to catalogue scientific decisions made when executing a database study that are relevant for facilitating replication and assessment of validity. We emphasize that a fully transparent study does not imply that reported parameter choices were scientifically valid; rather, the validity of a research study cannot be evaluated without transparency regarding those choices. We also note that the purpose of this paper was not to recommend specific software or suggest that studies conducted with software platforms are better than studies based on de novo code.

2 | METHODS

In order to identify an initial list of key parameters that must be defined to implement a study, we reviewed 5 macro based programs and software systems designed to support healthcare database research (listed in the appendix). We used this as a starting point because such programs are designed with flexible parameters to allow creation of customized study populations based on user specified scientific decisions.54,57-60 These flexible parameters informed our catalogue of operational decisions that would have to be transparent for an independent investigator to fully understand how a study was implemented and be able to directly replicate a study.

Our review included a convenience sample of macro based programs and software systems that were publicly available, developed by, or otherwise accessible to members of the Task Force. Although the software systems used a variety of coding languages, from a methodologic perspective, differences in code or coding languages are irrelevant so long as study parameters are implemented as intended by the investigator.

In our review, we identified places where an investigator had to make a scientific decision between options or create study specific inputs to create an analytic dataset from raw longitudinal source data, including details of data source, inclusion/exclusion criteria, exposure definition, outcome definition, follow up (days at risk), and baseline covariates, as well as reporting on analysis methods. As we reviewed each tool, we added new parameters that had not been previously encountered and synonyms for different concepts.

After the list of parameters was compiled, the co-authors, an international group of database experts, corresponded about these items and suggested additional parameters to include. In-person discussions took place following the ISPE mid-year meeting in London (2017). This paper was opened to comment by ISPE membership prior to publication and was endorsed by ISPE's Executive Board on July 20, 2017. The paper was also reviewed by ISPOR membership and endorsed by ISPOR leadership.

3 | RESULTS

Our review identified many scientific decisions necessary to operate software solutions that would facilitate direct replication of an analytic cohort from raw source data captured in a longitudinal healthcare data source (Table 2). After reviewing the first two comprehensive software solutions, no new parameters were added with review of additional software tools (a "saturation point"). The general catalogue includes items that may not be relevant for all studies or study designs.

The group of experts agreed that the detailed catalogue of scientific decision points would enhance transparency and reproducibility but noted that even if every parameter were reported, there was room for different interpretation of the language used to describe choices. Therefore, future development of clear, shared terminology and design visualization techniques would be valuable. While sharing source data and code should be encouraged (when permissible by data use agreements and intellectual property), this would not be a sufficient substitute for transparent, natural language reporting of study parameters.

3.1 | Data source

Researchers should specify the name of the data source, who provided the data (A1), the data extraction date (DED) (A2), data version, or data sampling strategy (A3) (when appropriate), as well as the years of source data used for the study (A4). As summarized in the appendix, source data may have subtle or profound differences depending on when the raw source data was cut for research use. Therefore, if an investigator were to run the same code to create and analyze a study population from the same data source twice, the results may not line up exactly if the investigator uses a different data version or raw longitudinal source data cut by the data holding organization at different time points.

When a researcher is granted access to only a subset of raw longitudinal source data from a data vendor, the sampling strategy and any inclusions or exclusions applied to obtain that subset should be reported. For example, one could obtain access to a 5% sample of Medicare patients flagged with diabetes in the chronic condition warehouse in the years 2010-2014.

It is also important for researchers to describe the types of data available in the data source (A5) and characteristics of the data, such as the median duration of person-time within the data source. This is important for transparency and the ability of decision-makers unfamiliar with the data source to assess the validity or appropriateness of selected design choices. The data type has implications for comprehensiveness of patient data capture. For example, is the data based on administrative or electronic health records? If the latter, does the data cover only primary care, inpatient settings, or an integrated health system? Does it include lab tests, results or registry data? Does it contain data on prescribed medications or dispensed medications? Is there linkage between outpatient and inpatient data? Is there linkage to other data sources? (A6) If so, then who did the linkage, when and how?
TABLE 2 Reporting specific parameters to increase reproducibility of database studies*

A. Reporting on data source should include:

A.1 Data provider
Description: Data source name and name of organization that provided data.
Example: Medicaid Analytic Extracts data covering 50 states from the Centers for Medicare and Medicaid Services.

A.2 Data extraction date (DED)
Description: The date (or version number) when data were extracted from the dynamic raw transactional data stream (e.g. date that the data were cut for research use by the vendor).
Synonyms: Data version, data pull

A.3 Data sampling
Description: The search/extraction criteria applied if the source data accessible to the researcher is a subset of the data available from the vendor.

A.4 Source data range (SDR)
Description: The calendar time range of data used for the study. Note that the implemented study may use only a subset of the available data.
Synonyms: Study period, query period
Example (A.2-A.4): The source data for this research study was cut by [data vendor] on January 1st, 2017. The study included administrative claims from Jan 1st 2005 to Dec 31st 2015.

A.5 Type of data
Description: The domains of information available in the source data, e.g. administrative, electronic health records, inpatient versus outpatient capture, primary vs secondary care, pharmacy, lab, registry.
Example: The administrative claims data include enrollment information, inpatient and outpatient diagnosis (ICD9/10) and procedure (ICD9/10, CPT, HCPCS) codes as well as outpatient dispensations (NDC codes) for 60 million lives covered by Insurance X. The electronic health records data include diagnosis and procedure codes from billing records, problem list entries, vital signs, prescription and laboratory orders, laboratory results, inpatient medication dispensation, as well as unstructured text found in clinical notes and reports for 100,000 patients with encounters at ABC integrated healthcare system.

A.6 Data linkage, other supplemental data
Description: Data linkage or supplemental data, such as chart reviews or survey data, not typically available with license for the healthcare database.
Example: We used Surveillance, Epidemiology, and End Results (SEER) data on prostate cancer cases from 1990 through 2013 linked to Medicare, and a 5% sample of Medicare enrollees living in the same regions as the identified cases of prostate cancer over the same period of time. The linkage was created through a collaborative effort from the National Cancer Institute (NCI) and the Centers for Medicare and Medicaid Services (CMS).

A.7 Data cleaning
Description: Transformations to the data fields to handle missing, out of range values or logical inconsistencies. This may be at the data source level, or the decisions can be made on a project specific basis.
Example: Global cleaning: The data source was cleaned to exclude all individuals who had more than one gender reported. All dispensing claims that were missing days' supply or had 0 days' supply were removed from the source data tables. Project specific cleaning: When calculating duration of exposure for our study population, we ignored dispensation claims that were missing or had 0 days' supply. We used the most recently reported birth date if there was more than one birth date reported.

A.8 Data model conversion
Description: Format of the data, including description of decisions used to convert data to fit a Common Data Model (CDM).
Example: The source data were converted to fit the Sentinel Common Data Model (CDM) version 5.0. Data conversion decisions can be found on our website (http://ourwebsite). Observations with missing or out of range values were not removed from the CDM tables.

B. Reporting on overall design should include:

B.1 Design diagram
Description: A figure that contains 1st and 2nd order temporal anchors and depicts their relation to each other.
Example: See example Figure 2.

C. Reporting on inclusion/exclusion criteria should include:

C.1 Study entry date (SED)
Description: The date(s) when subjects enter the cohort.
Synonyms: Index date, cohort entry date, outcome date, case date, qualifying event date, sentinel event

C.2 Person or episode level study entry
Description: The type of entry to the cohort. For example, at the individual level (1x entry only) or at the episode level (multiple entries, each time inclusion/exclusion criteria met).
Synonyms: Single vs multiple entry, treatment episodes, drug eras
Example (C.1-C.2): We identified the first SED for each patient. Patients were included if all other inclusion/exclusion criteria were met at the first SED. We identified all SED for each patient. Patients entered the cohort only once, at the first SED where all other inclusion/exclusion criteria were met. We identified all SED for each patient. Patients entered the cohort at every SED where all other inclusion/exclusion criteria were met.

C.3 Sequencing of exclusions
Description: The order in which exclusion criteria are applied, specifically whether they are applied before or after the selection of the SED(s).
Synonyms: Attrition table, flow diagram, CONSORT diagram

C.4 Enrollment window (EW)
Description: The time window prior to SED in which an individual was required to be contributing to the data source.
Synonyms: Observation window

C.5 Enrollment gap
Description: The algorithm for evaluating enrollment prior to SED, including whether gaps were allowed.
Example (C.4-C.5): Patients entered the cohort on the date of their first dispensation for Drug X or Drug Y after at least 180 days of continuous enrollment (30 day gaps allowed) without dispensings for either Drug X or Drug Y.

C.6 Inclusion/Exclusion definition window
Description: The time window(s) over which inclusion/exclusion criteria are defined.

C.7 Codes
Description: The exact drug, diagnosis, procedure, lab or other codes used to define inclusion/exclusion criteria.
Synonyms: Concepts, vocabulary, class, domain

C.8 Frequency and temporality of codes
Description: The temporal relation of codes in relation to each other as well as the SED. When defining temporality, be clear whether or not the SED is included in assessment windows (e.g. occurred on the same day, 2 codes for A occurred within 7 days of each other during the 30 days prior to and including the SED).
Example (C.6-C.8): Exclude from cohort if ICD-9 codes for deep vein thrombosis (451.1x, 451.2x, 451.81, 451.9x, 453.1x, 453.2x, 453.8x, 453.9x, 453.40, 453.41, 453.42, where x represents presence of a numeric digit 0-9 or no additional digits) were recorded in the primary diagnosis position during an inpatient stay within the 30 days prior to and including the SED. Invalid ICD-9 codes that matched the wildcard criteria were excluded.

C.9 Diagnosis position (if relevant/available)
Description: The restrictions on codes to certain positions, e.g. primary vs. secondary diagnoses.

C.10 Care setting
Description: The restrictions on codes to those identified from certain settings, e.g. inpatient, emergency department, nursing home.
Synonyms (C.9-C.10): Care site, place of service, point of service, provider type

C.11 Washout for exposure
Description: The period used to assess whether exposure at the end of the period represents new exposure.
Example: New initiation was defined as the first dispensation for Drug X after at least 180 days without dispensation for Drug X, Y, and Z.
Synonyms: Lookback for exposure, event free period

C.12 Washout for outcome
Description: The period prior to SED or ED to assess whether an outcome is incident.
Example: Patients were excluded if they had a stroke within 180 days prior to and including the cohort entry date. Cases of stroke were excluded if there was a recorded stroke within 180 days prior.
Synonyms: Lookback for outcome, event free period

D. Reporting on exposure definition should include:

D.1 Type of exposure
Description: The type of exposure that is captured or measured, e.g. drug versus procedure, new use, incident, prevalent, cumulative, time-varying.

D.2 Exposure risk window (ERW)
Description: The ERW is specific to an exposure and the outcome under investigation. For drug exposures, it is equivalent to the time between the minimum and maximum hypothesized induction time following ingestion of the molecule.
Synonyms: Drug era, risk window

D.2a Induction period
Description: Days on or following study entry date during which an outcome would not be counted as "exposed time" or "comparator time".
Synonyms: Blackout period

D.2b Stockpiling
Description: The algorithm applied to handle leftover days supply if there are early refills.

D.2c Bridging exposure episodes
Description: The algorithm applied to handle gaps that are longer than expected if there was perfect adherence (e.g. non-overlapping dispensation + day's supply).
Synonyms: Episode gap, grace period, persistence window, gap days

D.2d Exposure extension
Description: The algorithm applied to extend exposure past the days supply for the last observed dispensation in a treatment episode.
Synonyms: Event extension

D.3 Switching/add on
Description: The algorithm applied to determine whether exposure should continue if another exposure begins.
Synonyms: Treatment episode truncation indicator
Example (D.1-D.3): We evaluated risk of outcome Z following incident exposure to drug X or drug Y. Incident exposure was defined as beginning on the day of the first dispensation for one of these drugs after at least 180 days without dispensations for either (SED). Patients with incident exposure to both drug X and drug Y on the same SED were excluded. The exposure risk window for patients with Drug X and Drug Y began 10 days after incident exposure and continued until 14 days past the last days supply, including refills. If a patient refilled early, the date of the early refill and subsequent refills were adjusted so that the full days supply from the initial dispensation was counted before the days supply from the next dispensation was tallied. Gaps of less than or equal to 14 days in between one dispensation plus days supply and the next dispensation for the same drug were bridged (i.e. the time was counted as continuously exposed). If patients exposed to Drug X were dispensed Drug Y or vice versa, exposure was censored. NDC codes used to define incident exposure to drug X and drug Y can be found in the appendix. Drug X was defined by NDC codes listed in the appendix. Brand and generic versions were used to define Drug X. Non pill or tablet formulations and combination pills were excluded.

D.4 Codes, frequency and temporality of codes, diagnosis position, care setting
Description: Description in Section C.
Synonyms: Concepts, vocabulary, class, domain, care site, place of service, point of service, provider type

D.5 Exposure Assessment Window (EAW)
Description: A time window during which the exposure status is assessed. Exposure is defined at the end of the period. If the occurrence of exposure defines cohort entry, e.g. new initiator, then the EAW may be a point in time rather than a period. If the EAW is after cohort entry, the FW must begin after the EAW.
Example: We evaluated the effect of treatment intensification vs no intensification following hospitalization on disease progression. Study entry was defined by the discharge date from the hospital. The exposure assessment window started from the day after study entry and continued for 30 days. During this period, we identified whether or not treatment intensified for each patient. Intensification during this 30 day period determined exposure status during follow up. Follow up for disease progression began 31 days following study entry and continued until the first censoring criterion was met.

E. Reporting on follow-up time should include:

E.1 Follow-up window (FW)
Description: The time following cohort entry during which patients are at risk to develop the outcome due to the exposure. The FW is based on a biologic exposure risk window defined by minimum and maximum induction times. However, the FW also accounts for censoring mechanisms.

E.2 Censoring criteria
Description: The criteria that censor follow up.
Example (E.1-E.2): Follow up began on the SED and continued until the earliest of discontinuation of study exposure, switching/adding comparator exposure, entry to nursing home, death, or end of study period. We included a biologically plausible induction period; therefore, follow up began 60 days after the SED and continued until the earliest of discontinuation of study exposure, switching/adding comparator exposure, entry to nursing home, death, or end of study period.

F. Reporting on outcome definition should include:

F.1 Event date (ED)
Description: The date of an event occurrence.
Example: The ED was defined as the date of first inpatient admission with primary diagnosis 410.x1 after the SED and occurring within the follow up window.
Synonyms: Case date, measure date, observation date

F.2 Codes, frequency and temporality of codes, diagnosis position, care setting
Description: Description in Section C.
Synonyms: Concepts, vocabulary, class, domain, care site, place of service, point of service, provider type

F.3 Validation
Description: The performance characteristics of the outcome algorithm, if previously validated.
Example: The outcome algorithm was validated via chart review in a population of diabetics from data source D (citation). The positive predictive value of the algorithm was 94%.

G. Reporting on covariate definitions should include:
Synonyms: Event measures, observations

G.1 Covariate assessment window (CW)
Description: The time over which patient covariates are assessed.
Example: We assessed covariates during the 180 days prior to but not including the SED.
Synonyms: Baseline period

G.2 Comorbidity/risk score
Description: The components and weights used in calculation of a risk score.
Example: See appendix for example. Note that codes, temporality, diagnosis position and care setting should be specified for each component when applicable.

G.3 Healthcare utilization metrics
Description: The counts of encounters or orders over a specified time period, sometimes stratified by care setting, or type of encounter/order.
Example: We counted the number of generics dispensed for each patient in the CAP. We counted the number of dispensations for each patient in the CAP. We counted the number of outpatient encounters recorded in the CAP. We counted the number of days with outpatient encounters recorded in the CAP. We counted the number of inpatient hospitalizations in the CAP; if admission and discharge dates for different encounters overlapped, these were "rolled up" and counted as 1 hospitalization.

G.4 Codes, frequency and temporality of codes, diagnosis position, care setting
Description: Description in Section C.
Example: Baseline covariates were defined by codes from claims with service dates within 180 days prior to and including the SED. Major upper gastrointestinal bleeding was defined as inpatient hospitalization with at least one of the following ICD-9 diagnoses: 531.0x, 531.2x, 531.4x, 531.6x, 532.0x, 532.2x, 532.4x, 532.6x, 533.0x, 533.2x, 533.4x, 533.6x, 534.0x, 534.2x, 534.4x, 534.6x, 578.0 - OR - an ICD-9 procedure code of 44.43 - OR - a CPT code 43255.
Synonyms: Concepts, vocabulary, class, domain, care site, place of service, point of service, provider type

H. Reporting on control sampling should include:

H.1 Sampling strategy
Description: The strategy applied to sample controls for identified cases (patients with ED meeting all inclusion/exclusion criteria).

H.2 Matching factors
Description: The characteristics used to match controls to cases.

H.3 Matching ratio
Description: The number of controls matched to cases (fixed or variable ratio).
Example (H.1-H.3): We used risk set sampling without replacement to identify controls from our cohort of patients with diagnosed diabetes (inpatient or outpatient ICD-9 diagnoses of 250.xx in any position). Up to 4 controls were randomly matched to each case on length of time since SED (in months), year of birth and gender. The random seed and sampling code can be found in the online appendix.

I. Reporting on statistical software should include:

I.1 Statistical software program used
Description: The software package, version, settings, packages or analytic procedures.
Example: We used: SAS 9.4 PROC LOGISTIC; Cran R v3.2.1 survival package; Sentinel's Routine Querying System version 2.1.1 CIDA+PSM1 tool; Aetion Platform release 2.1.2 Cohort Safety.

*Parameters in bold are key temporal anchors.
If the raw source data is pre-processed, with cleaning up of messy fields or missing data, before an analytic cohort is created, the decisions in this process should be described (A7). For example, if the raw data is converted to a common data model (CDM) prior to creation of an analytic cohort, the CDM version should be referenced (e.g. Sentinel Common Data Model version 5.0.1,61 Observational Medical Outcomes Partnership Common Data Model version 5.062) (A8). Or if individuals with inconsistent dates of birth or gender were unilaterally dropped from all relational data tables, this should be documented in meta-data about the data source. If the data is periodically refreshed with more recent data, the date of the refresh should be reported as well as any changes in assumptions applied during the data transformation.31,32 If cleaning decisions are made on a project specific basis rather than at a global data level, these should also be reported.

3.2 | Design

In addition to stating the study design, researchers should provide a design diagram that provides a visual depiction of first/second order temporal anchors (B1, Table 3) and their relationship to each other. This diagram will provide clarity about how and when patients enter the cohort and baseline characteristics are defined, as well as when follow up begins and ends. Because the terminology for similar concepts varies across research groups and software systems, visual depiction of timelines can reduce the risk of misinterpretation. We provide one example of a design diagram that depicts these temporal anchors (Figure 2). In this figure, the study entry date is day 0. A required period of enrollment is defined during the 183 days prior to but not including the study entry date. There is also washout for exposure and outcome in the 183 days prior to but not including the study entry date. There are two windows during which covariates are assessed: covariates 1-5 are defined in the 90 days prior to but not including the study index date, whereas covariates 6-25 are defined in the 183 days prior to but not including the index date. There is an induction period following study entry, so follow up for the outcome begins on day 30 and continues until a censoring mechanism is met.

3.3 | Exposure, outcome, follow up, covariates and various cohort entry criteria

A great level of detail is necessary to fully define exposure, outcome, inclusion/exclusion criteria and covariates. As others have noted, reporting the specific codes used to define these measures is critical for transparency and reproducibility,47,63 especially in databases where there can be substantial ambiguity in code choice.

The study entry dates (C1) will depend on how they are selected (one entry per person versus multiple entries) (C2) and whether inclusion/exclusion criteria are applied before or after selection of study entry date(s) for each individual (C3). Reporting should include a clear description of the sequence in which criteria were applied to identify the study population, ideally in an attrition table or flow diagram, and a description of whether patients were allowed to enter multiple times. If more than one exposure is evaluated, researchers should be explicit about how to handle situations where an individual meets inclusion/exclusion criteria to enter the study population as part of more than one exposure group.
TABLE 3 Key temporal anchors in design of a database study1

Base anchors (calendar time):
Data Extraction Date (DED): The date when the data were extracted from the dynamic raw transactional data stream.
Source Data Range (SDR): The calendar time range of data used for the study. Note that the implemented study may use only a subset of the available data.

First order anchors (event time):
Study Entry Date (SED): The dates when subjects enter the study.

Second order anchors (event time):
Enrollment Window (EW): The time window prior to SED in which an individual was required to be contributing to the data source.
Covariate Assessment Window (CW): The time during which all patient covariates are assessed. Baseline covariate assessment should precede cohort entry in order to avoid adjusting for causal intermediates.
Follow-Up Window (FW): The time following cohort entry during which patients are at risk to develop the outcome due to the exposure.
Exposure Assessment Window (EAW): The time window during which the exposure status is assessed. Exposure is defined at the end of the period. If the occurrence of exposure defines cohort entry, e.g. new initiator, then the exposure assessment may be a point in time rather than a window. If exposure assessment is after cohort entry, follow up must begin after exposure assessment.
Event Date (ED): The date of an event occurrence following cohort entry.
Washout for Exposure (WE): The time prior to cohort entry during which there should be no exposure (or comparator).
Washout for Outcome (WO): The time prior to cohort entry during which the outcome of interest should not occur.

1 Anchor dates are key dates; baseline anchors identify the available source data; first order anchor dates define entry to the analytic dataset, and second order anchors are relative to the first order anchor.
FIGURE 2 Example design diagram.
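The anchors in Table 3 and the example design diagram translate directly into date arithmetic. The following is a minimal sketch, assuming hypothetical function and key names and using the window lengths from the Figure 2 example (183-day enrollment and washout windows, covariate assessment windows of 90 and 183 days, and a 30-day induction period); it illustrates how reported temporal anchors become reproducible window boundaries rather than prescribing any particular implementation.

```python
from datetime import date, timedelta

def design_windows(study_entry_date: date) -> dict:
    """Illustrative temporal anchors relative to the study entry date (day 0),
    using the window lengths from the example design diagram (Figure 2).
    Whether windows include day 0 is itself a reportable decision; here the
    enrollment, washout and covariate windows exclude the study entry date."""
    sed = study_entry_date
    return {
        # enrollment and washout windows: days -183 through -1
        "enrollment_window":     (sed - timedelta(days=183), sed - timedelta(days=1)),
        "washout_exposure":      (sed - timedelta(days=183), sed - timedelta(days=1)),
        "washout_outcome":       (sed - timedelta(days=183), sed - timedelta(days=1)),
        # two covariate assessment windows, also excluding day 0
        "covariate_window_1_5":  (sed - timedelta(days=90),  sed - timedelta(days=1)),
        "covariate_window_6_25": (sed - timedelta(days=183), sed - timedelta(days=1)),
        # 30-day induction period: follow up for the outcome begins on day 30
        "follow_up_start":       sed + timedelta(days=30),
    }

windows = design_windows(date(2015, 6, 1))
print(windows["covariate_window_1_5"])  # 2015-03-03 through 2015-05-31
```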
Also critical are other key investigator decisions, including 1) criteria for ensuring that healthcare encounters would be captured in the data (e.g. continuous enrollment for a period of time, with or without allowable gaps) (C4, C5), 2) specific codes used, the frequency and temporality of codes in relation to each other and the study entry date (C6-C8), and 3) diagnosis position (C9) and care settings (C10) (e.g. primary diagnosis in an inpatient setting). Whenever defining temporal anchors, whether or not time windows are inclusive of the study entry date should be articulated. Some studies use multiple coding systems when defining parameters, for example, studies that span the transition from ICD-9 to ICD-10 in the United States or studies that involve data from multiple countries or delivery systems. If coding algorithms are mapped from one coding system to another, details about how the codes were mapped should be reported.
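As one illustration of point 1), a continuous enrollment check with allowable gaps might be specified as in the sketch below. The function and field names are hypothetical, and the 180-day lookback with 30-day allowable gaps mirrors the example in Table 2 (C.4-C.5); a real study would report whichever thresholds and gap rules were actually applied.

```python
from datetime import date, timedelta

def continuously_enrolled(enrollment_spans, sed, lookback_days=180, allowable_gap_days=30):
    """Return True if the patient was enrolled throughout the lookback window before
    the study entry date (SED), allowing coverage gaps of up to `allowable_gap_days`.
    `enrollment_spans` is a list of (start_date, end_date) coverage periods."""
    window_start = sed - timedelta(days=lookback_days)
    # keep only coverage periods that touch the lookback window, sorted by start date
    spans = sorted((s, e) for s, e in enrollment_spans if e >= window_start and s <= sed)
    if not spans:
        return False
    # coverage must begin within an allowable gap of the window start
    if (spans[0][0] - window_start).days > allowable_gap_days:
        return False
    covered_until = spans[0][1]
    for start, end in spans[1:]:
        # gap measured as days between the end of coverage and the next start
        if (start - covered_until).days > allowable_gap_days:
            return False
        covered_until = max(covered_until, end)
    # coverage must extend to (or within an allowable gap of) the SED
    return (sed - covered_until).days <= allowable_gap_days

print(continuously_enrolled([(date(2014, 1, 1), date(2014, 12, 20)),
                             (date(2015, 1, 5), date(2015, 8, 1))],
                            sed=date(2015, 6, 1)))  # True: the 16-day gap is allowed
```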
(C10), and whether the wildcard “x” includes only digits 0‐9 or also
transition from ICD‐9 to ICD 10 in the United States or studies that
includes the case of no additional digits recorded. Furthermore, when
involve data from multiple countries or delivery systems. If coding
wildcards are used, it should be clear whether invalid codes found with
algorithms are mapped from one coding system to another, details
a wildcard match in the relevant digit were excluded (e.g. 410.&1 is not
about how the codes were mapped should be reported.
a valid code but matches 410.x1).
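A minimal sketch of how such a wildcard definition could be made explicit is shown below. The pattern, the list of valid codes, and the helper name are all assumptions introduced for illustration; the choice to require membership in an enumerated valid-code list (i.e. to exclude invalid codes that happen to match the pattern) is exactly the kind of decision that should be stated.

```python
import re

# Wildcard "410.x1": here "x" is taken to mean a single digit 0-9 in that position.
# Whether codes with no additional digit also qualify, and whether invalid codes
# matching the pattern are excluded, are reporting decisions (C7-C8).
WILDCARD_PATTERN = re.compile(r"^410\.[0-9]1$")

# Hypothetical reference set of ICD-9-CM codes that actually exist in the coding
# system; in practice this would come from a maintained code table.
VALID_ICD9_CODES = {"410.01", "410.11", "410.21", "410.31", "410.41",
                    "410.51", "410.61", "410.71", "410.81", "410.91"}

def expand_wildcard(observed_codes):
    """Return observed codes that match the wildcard AND are valid codes."""
    return sorted(c for c in observed_codes
                  if WILDCARD_PATTERN.match(c) and c in VALID_ICD9_CODES)

print(expand_wildcard({"410.11", "410.&1", "410.91", "411.1"}))
# ['410.11', '410.91'] -- "410.&1" matches neither the pattern nor the valid list
```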
It is important to report on who can be included in a study. Reporting should include specification of what type of exposure measurement is under investigation, for example prevalent versus incident exposure (D1).64 If the latter, the criteria used to define incidence, including the washout window, should be clearly specified (C11), for example, incidence with respect to the exposure of interest only, the entire drug class, exposure and comparator, etc. When relevant, the place of service used to define exposure should also be specified (e.g. inpatient versus outpatient).

Type of exposure (D1), when exposure is assessed, and duration of exposure influence who is selected into the study and how long they are followed. When defining drug exposures, investigators make decisions regarding the intended length of prescriptions as well as the hypothesized duration of exposure effect. Operationally, these definitions may involve induction periods, algorithms for stockpiling of re-filled drugs, creating treatment episodes by allowing gaps in exposure of up to X days to be bridged, extending the risk window beyond the end of days' supply, or other algorithms (D2, D3). The purpose of applying such algorithms to the data captured in healthcare databases is to more accurately measure the hypothesized biologic exposure risk window (ERW). The ERW is specific to an exposure and the outcome under investigation. For drug exposures, it is equivalent to the difference between the minimum and maximum induction time following ingestion of a molecule.65,66 Similar decisions are necessary to define the timing and duration of hypothesized biologic effect for non-drug exposures. These decisions are necessary to define days at risk while exposed and should be explicitly stated. There may be data missing for elements such as days' supply or number of tablets. Decisions about how to handle missingness should be articulated. When describing the study population, reporting on the average starting or daily dose can facilitate understanding of variation in findings between similar studies conducted in different databases where dosing patterns may differ. Specific codes, formulations, temporality, diagnosis position and care settings should be reported when relevant (D4).
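To make these exposure decisions concrete, the sketch below builds treatment episodes from dispensation records under one plausible set of choices: a 14-day grace period for bridging, simple stockpiling of early refills, and a 14-day exposure extension, echoing the Table 2 example. The function and field names are hypothetical; a real study would need to report whichever variants were actually implemented.

```python
from datetime import date, timedelta

def build_treatment_episodes(dispensings, grace_days=14, extension_days=14):
    """Collapse (dispensing_date, days_supply) records into continuous treatment
    episodes. Early refills are 'stockpiled' (the new supply starts when the prior
    supply runs out); gaps of up to `grace_days` are bridged; each episode is
    extended by `extension_days` past the end of its last days' supply."""
    episodes = []
    start = supply_end = None
    for disp_date, days_supply in sorted(dispensings):
        if start is None:                          # first dispensation opens an episode
            start, supply_end = disp_date, disp_date + timedelta(days=days_supply)
        elif (disp_date - supply_end).days <= grace_days:
            # refill within the grace period: stockpile any leftover supply
            refill_start = max(disp_date, supply_end)
            supply_end = refill_start + timedelta(days=days_supply)
        else:                                      # gap too long: close episode, start anew
            episodes.append((start, supply_end + timedelta(days=extension_days)))
            start, supply_end = disp_date, disp_date + timedelta(days=days_supply)
    if start is not None:
        episodes.append((start, supply_end + timedelta(days=extension_days)))
    return episodes

disp = [(date(2015, 1, 1), 30), (date(2015, 1, 25), 30), (date(2015, 5, 1), 30)]
print(build_treatment_episodes(disp))
# two episodes: the early refill on Jan 25 is stockpiled; the May 1 fill starts a new episode
```

Even in this toy form, the grace period, stockpiling rule, and extension length visibly change the exposed person-time, which is why each should be reported explicitly.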
For some studies, exposure is assessed after study entry (D5). For example, a study evaluating the effect of treatment intensification versus no intensification on disease progression after a hospitalization could define study entry as the date of discharge and follow up for outcomes after an exposure assessment window (EAW) during which treatment intensification status is defined. The ERW and follow up for an outcome should not begin until after the EAW has concluded.67 The timing of the EAW relative to study entry and follow up should be clearly reported when relevant.

The analytic follow-up window (FW) covers the interval during which outcome occurrence could be influenced by exposure (E1). The analytic follow up is based on the biologic exposure risk, but the actual time at risk included may also be defined by censoring mechanisms. These censoring mechanisms should be enumerated in time to event analyses (E2). Reasons for censoring may include events such as occurrence of the outcome of interest, end of exposure, death, disenrollment, switching/adding medication, entering a nursing home, or use of a fixed follow-up window (e.g. intention to treat).

Outcome surveillance decisions can strongly affect study results. In defining the outcome of interest, investigators should specify whether a washout period prior to the study entry date was applied to capture incident events (C12). If a washout period was applied, it should be clear whether the washout included or excluded the study entry date. The timing of the event date (F1) relative to the specific codes used and restrictions to certain care settings or diagnosis position should be reported if they are part of the outcome definition (F2). If the algorithm used to define the outcome was previously validated, a citation and performance characteristics such as positive predictive value should be reported (F3).

The same considerations outlined above for outcome definition apply to covariates (G1, G4). If a comorbidity score is defined for the study population, there should be a clear description of the score components, when and how they were measured, and the weights applied (G2, Appendix C). Citations often link to papers which evaluate multiple versions of a score, and it can be unclear which one was applied in the study. When medical utilization metrics are reported, there should be details about how each metric is calculated as part of the report (G3). For example, in counts of medical utilization, one must be clear whether counts of healthcare visits are unique by day or unique by encounter identifier and whether they include all encounters or only those from specific places of service. Hospitalizations are sometimes "rolled up" and counted only once if the admission and discharge dates are contiguous or overlapping. Patients may have encounters in multiple care settings on the same date. All encounters may be counted, or an algorithm applied to determine which ones are included in utilization metrics. Different investigator choices will result in different counts.
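The counting choices just described can be stated unambiguously in a few lines. The sketch below rolls up overlapping or contiguous inpatient stays into single hospitalizations; the function name is hypothetical, and defining "contiguous" as a discharge-to-admission gap of at most one day is itself a choice that would need to be reported.

```python
from datetime import date

def count_hospitalizations(stays, max_gap_days=1):
    """Count inpatient hospitalizations, rolling up stays whose admission and
    discharge dates overlap or are contiguous (gap <= max_gap_days) into one."""
    count = 0
    current_discharge = None
    for admit, discharge in sorted(stays):
        if current_discharge is None or (admit - current_discharge).days > max_gap_days:
            count += 1                      # a new, distinct hospitalization
            current_discharge = discharge
        else:                               # overlapping or contiguous stay: roll up
            current_discharge = max(current_discharge, discharge)
    return count

stays = [(date(2015, 2, 1), date(2015, 2, 5)),   # transfer on the day of discharge
         (date(2015, 2, 5), date(2015, 2, 10)),
         (date(2015, 7, 1), date(2015, 7, 3))]
print(count_hospitalizations(stays))  # 2
```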
If sampling controls for a case-control study, how and when controls are sampled should be clearly specified. Reporting should include the sampling strategy (H1), whether it is base case, risk set, or survivor sampling. If matching factors are used, these should be listed and the algorithms for defining them made available (H2). The number and ratio of controls should be reported, including whether the ratio is fixed or variable and whether sampling is with or without replacement (H3). If multiple potential matches are available, the decision rules for which to select should be stated.
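For control sampling, a sketch of risk set sampling with a fixed seed and up to four controls per case (loosely mirroring the Table 2, H.1-H.3 example) might look like the following. The record structure and the restriction of matching to year of birth and gender are simplifications introduced for illustration; matching on time since study entry is approximated here by requiring controls to be at risk on the case's event date.

```python
import random

def risk_set_sample(cohort, n_controls=4, seed=42):
    """Risk set sampling without replacement within each risk set: for every case,
    sample up to `n_controls` cohort members who are still under follow up and
    event-free on the case's event date and who match on year of birth and gender.
    Each record: {"id", "birth_year", "gender", "entry", "exit", "event_date"},
    with event_date=None for subjects who never have the event."""
    rng = random.Random(seed)                     # fixed seed makes the draw replicable
    cases = [p for p in cohort if p["event_date"] is not None]
    matched = []
    for case in cases:
        t = case["event_date"]
        risk_set = [p for p in cohort
                    if p["id"] != case["id"]
                    and p["entry"] <= t <= p["exit"]                      # under follow up at t
                    and (p["event_date"] is None or p["event_date"] > t)  # event-free at t
                    and p["birth_year"] == case["birth_year"]
                    and p["gender"] == case["gender"]]
        controls = rng.sample(risk_set, min(n_controls, len(risk_set)))
        matched.append({"case_id": case["id"], "control_ids": [c["id"] for c in controls]})
    return matched
```

Reporting the seed, the risk-set definition, and whether sampling is with or without replacement is what allows an independent investigator to reproduce the matched set exactly.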
In addition, the statistical software program or platform used to create the study population and run the analysis should be detailed, including the specific software version, settings, procedures or packages (I1).

The catalogue of items in Table 2 is important to report in detail in order to achieve transparent scientific decisions defining study populations and replicable creation of analytic datasets from longitudinal healthcare databases. We have highlighted in Table 3 the key temporal anchors that are essential to report in the methods section of a paper, ideally accompanied by a design diagram (Figure 2). Other items from Table 2 should be included with peer reviewed papers or other public reports, but may be reported in online appendices or as referenced web pages.

After creating an analytic dataset from raw longitudinal data streams, there are numerous potential ways to analyze the created analytic dataset and address confounding. Some of the most common methods used in healthcare database research include multivariable regression and summary score methods (propensity score or disease risk score matching, weighting, stratification).68,69 Other methods include instrumental variable analysis, standardization and stratification. Each of these methods comes with its own set of assumptions and details of implementation, which must be reported to assess
adequacy of those methods and obtain reproducible results. In the appendix, we highlight important descriptive or comparative results to report for several commonly used analytic methods (Appendix D).
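As an illustration of the level of operational detail such methods require, the sketch below estimates a propensity score and performs greedy 1:1 nearest-neighbor matching within a caliper. scikit-learn is used here purely for convenience; the caliper width, matching ratio, matching order, and random seed are exactly the kinds of implementation choices that should be reported alongside the descriptive and comparative results listed in Appendix D.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ps_match(X, treated, caliper=0.05, seed=42):
    """Estimate a propensity score with logistic regression and perform greedy 1:1
    nearest-neighbor matching on the score within a fixed caliper. Returns a list
    of (treated_index, control_index) pairs. X: covariate matrix; treated: 0/1 array."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    rng = np.random.default_rng(seed)
    t_idx = rng.permutation(np.where(treated == 1)[0])   # treated matched in random order
    c_idx = list(np.where(treated == 0)[0])              # available controls
    pairs = []
    for t in t_idx:
        if not c_idx:
            break
        dists = np.abs(ps[c_idx] - ps[t])
        best = int(np.argmin(dists))
        if dists[best] <= caliper:                        # only accept matches within caliper
            pairs.append((int(t), int(c_idx.pop(best))))
    return pairs

# toy usage with simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treated = (rng.random(500) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
print(len(ps_match(X, treated)))
```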
researchers to prepare appendices that report in detail 1) data source
decisions
were
routinely
reported.
We
encourage
provenance including data extraction date or version and years covered, 2) key temporal anchors (ideally with a design diagram), 3)
detailed algorithms to define patient characteristics, inclusion or exclu-
4
DISCUSSION
|
sion criteria, and 4) attrition table with baseline characteristics of the
study population before applying methods to deal with confounding.
Evidence generated from large healthcare databases is increasingly being sought by decision-makers around the world. However, publication of database study results is often accompanied by study design reported at a highly conceptual level, without enough information for readers to understand the temporality of how patients entered the study or how exposure, outcome, and covariates were operationally defined in relation to study entry. Only after decision-makers and peer reviewers are reasonably confident that they know the actual steps implemented by the original researchers can they assess whether or not they agree with the validity of those choices or evaluate the reproducibility and rigor of the original study findings.

Stakeholders involved in healthcare are increasingly interested in evaluating additional streams of evidence beyond randomized clinical trials and are turning their attention toward real-world evidence from large healthcare database studies. This interest has led public and private stakeholders to develop groundbreaking infrastructure and software that scale up the capacity to generate database evidence. The United States FDA's Sentinel System is one example of a large-scale effort to create an open-source analytic infrastructure. Supported by FDA to achieve its public health surveillance mission, the tools and infrastructure are also available to the research community through the Reagan-Udall Foundation's IMEDS system. Sentinel has committed itself to transparency through online posting of study protocols, final reports, and study specifications, including temporal anchors, how data are processed into a common data model, and study design details. Similarly, the Canadian government, the European Medicines Agency (EMA), and several countries in Asia have developed consortia to facilitate transparent evidence generation from healthcare databases, including the Canadian Network for Observational Drug Effect Studies (CNODES),8 the Innovative Medicines Initiative (IMI), ENCePP,70 and others.9

These efforts have made great strides in improving capacity for transparent evidence generation from large healthcare databases; however, many involve closed systems that do not influence research conducted outside of the respective networks. Currently, there is no clear roadmap for how the field should proceed. This is reflected in policies around the world. In the US, the recently passed 21st Century Cures Act and Prescription Drug User Fee Act (PDUFA VI) include sections on evaluating when and how to make greater use of real-world evidence to support regulatory decisions. In the EU, there is exploration of adaptive pathways to bring drugs to market more quickly by using healthcare database evidence to make approval decisions,11 along with active work on harmonizing policies on the use of real-world evidence from databases to inform health technology assessment decisions.12

Regardless of whether a study is conducted with software tools or de novo code, as part of a network or independently, a substantial improvement in the transparency of design and implementation of healthcare database research could be achieved if specific design and operational decisions were routinely reported. We encourage researchers to prepare appendices that report in detail 1) data source provenance, including data extraction date or version and years covered; 2) key temporal anchors (ideally with a design diagram); 3) detailed algorithms to define patient characteristics and inclusion or exclusion criteria; and 4) an attrition table with baseline characteristics of the study population before applying methods to deal with confounding.
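As a minimal illustration of items 3 and 4 above, the sketch below applies inclusion criteria sequentially and records the resulting attrition table. It is illustrative only: the criteria, thresholds, and column names are hypothetical, and a pandas-based implementation is just one of many reasonable choices.

import pandas as pd

# Illustrative only: apply (label, mask-function) criteria in order and record
# the number of patients remaining after each step, yielding an attrition table.
def build_attrition_table(cohort: pd.DataFrame, criteria: list) -> tuple:
    rows = [("Initial cohort", len(cohort))]
    for label, criterion in criteria:
        cohort = cohort[criterion(cohort)]
        rows.append((label, len(cohort)))
    attrition = pd.DataFrame(rows, columns=["Step", "Patients remaining"])
    return cohort, attrition

# Toy example with hypothetical criteria and column names
cohort = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [45, 17, 60, 72],
    "enrollment_days_before_entry": [400, 500, 100, 365],
})
criteria = [
    ("Age 18 or older at cohort entry", lambda df: df["age"] >= 18),
    ("At least 365 days of continuous enrollment", lambda df: df["enrollment_days_before_entry"] >= 365),
]
final_cohort, attrition = build_attrition_table(cohort, criteria)
print(attrition)

An appendix that pairs such a table with the exact criteria definitions lets readers see how each scientific decision changed the study population.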
The ultimate measure of transparency is whether a study could be directly replicated by a qualified independent investigator based on publicly reported information. While sharing data and code should be encouraged whenever data use agreements and intellectual property permit, in many cases this is not possible. Even if data and code are shared, a clear natural-language description would be necessary for transparency and for the ability to evaluate the validity of scientific decisions.

In many cases, attempts by an independent investigator to directly replicate a study will be hampered by data use agreements that prohibit public sharing of source data tables and by differences in source data tables accessed from the same data holder at different times. Nevertheless, understanding how closely findings can be replicated by an independent investigator using the same data source over the same time period would be valuable and informative. Similarly, evaluating the variation in findings from attempts to conceptually replicate an original study using different source data or plausible alternative parameter choices can provide substantial insights. Our ability to understand observed differences in findings after either direct or conceptual replication relies on the clarity and transparency of the scientific decisions originally implemented.

This paper provides a catalogue of specific items to report in order to improve reproducibility and facilitate assessment of the validity of healthcare database analyses. We expect that it will grow and change over time with input from additional stakeholders. This catalogue could be used to support parallel efforts to improve the transparency and reproducibility of evidence from database research. For example, we noted that the terminology used by different research groups to describe similar concepts varied; a next step could include development of shared terminology and structured reporting templates. We also had consensus within our task force that a limited number of parameters are absolutely necessary to recreate a study population; however, there was disagreement about which ones. Empirical evaluation of how often these specific operational parameters go unreported, and of the impact of such gaps on the replicability of published database studies, would be a valuable next step. Empirical data could inform future policies and guidelines for reporting on database studies for journals, regulators, health technology assessment bodies, and other healthcare decision-makers, where greater priority could be placed on reporting specific parameters with a high demonstrated influence on replicability. It could also help stakeholders create policies that triage limited resources by focusing on database evidence where reporting is transparent enough that the validity and relevance of scientific choices can be assessed. By aligning the incentives of major stakeholders, the conduct and reporting of database research will change for the better. This will increase the confidence of decision-makers in real-world evidence from large healthcare databases.
ETHICS STATEMENT
The authors state that no ethical approval was needed.

ACKNOWLEDGMENTS
Provided comments during ISPE and ISPOR membership review period:

Shohei Akita PhD – Pharmaceuticals and Medical Devices Agency
René Allard PhD – Grünenthal GmbH
Dorothee B Bartels MSc, PhD – Head of Global Epidemiology, Boehringer Ingelheim GmbH
Tânia Maria Beume MBA – Instituto Nacional de Cancer
Lance Brannman PhD – AstraZeneca
Michael J Cangelosi MA MPH – Boston Scientific
Gillian Hall PhD – Grimsdyke House
Kenneth Hornbuckle DVM, PhD, MPH – Eli Lilly
Hanna Gyllensten PhD – Karolinska Institute
Kris Kahler SM, PhD – Novartis
YH Kao – Institute of Biopharmaceutical Science, College of Medicine, National Cheng Kung University
Hugh Kawabata PhD – Bristol-Myers Squibb
James Kaye MD, DrPH – RTI Health Solutions
Lincy Lai PhD, PharmD – Optum
Sinead Langan FRCP MSc PhD – London School of Hygiene and Tropical Medicine
Tamar Lasky PhD – MIE Resources
Junjie Ma, MS PhD candidate – University of Utah, Pharmacotherapy Outcomes Research Center
Kim McGuigan PhD MBA – Teva Pharmaceuticals
Montserrat Miret MD, MPH – Nestle Health Science
Brigitta Monz MD MPH MA – Roche, Pharmaceuticals Division
Dominic Muston MSc – Bayer
Melvin Olsen PhD – Novartis
Eberechukwu Onukwugha, MS, PhD – University of Maryland, School of Pharmacy
Chris L Pashos PhD (and colleagues) – Takeda
Smitri Raichand – University of New South Wales, Australia
Libby Roughead PhD, M.App.Sc. – School of Pharmacy and Medical Sciences, University of South Australia
Martijn Schuemie PhD – Janssen
Lisa Shea – MPG
Emilie Taymore MSc – MSD
David L Van Brunt PhD – AbbVie, Inc.
Amir Viyanchi MA, RN, PhD student – University of Shahid Beheshti Medical Sciences, Hamedan, Iran
Xuanqian Xie MSc – Health Quality Ontario

ORCID
Shirley V. Wang http://orcid.org/0000-0001-7761-7090
Joshua J. Gagne http://orcid.org/0000-0001-5428-9733
REFERENCES
1. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58(4):323-337.
2. Califf RM, Robb MA, Bindman AB, et al. Transforming Evidence Generation to Support Health and Health Care Decisions. N Engl J Med. 2016;375(24):2395-2400.
3. Psaty BM, Breckenridge AM. Mini-Sentinel and regulatory science--big data rendered fit and functional. N Engl J Med. 2014;370(23):2165-2167.
4. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc. 2014;21(4):578-582.
5. Oliveira JL, Lopes P, Nunes T, et al. The EU-ADR Web Platform: delivering advanced pharmacovigilance tools. Pharmacoepidemiol Drug Saf. 2013;22(5):459-467.
6. AsPEN collaborators, Andersen M, Bergman U, et al. The Asian Pharmacoepidemiology Network (AsPEN): promoting multi-national collaboration for pharmacoepidemiologic research in Asia. Pharmacoepidemiol Drug Saf. 2013;22(7):700-704.
7. Behrman RE, Benner JS, Brown JS, McClellan M, Woodcock J, Platt R. Developing the Sentinel System--a national resource for evidence development. N Engl J Med. 2011;364(6):498-499.
8. Suissa S, Henry D, Caetano P, et al. CNODES: the Canadian Network for Observational Drug Effect Studies. Open Medicine. 2012;6(4):e134-e140.
9. Trifiro G, Coloma PM, Rijnbeek PR, et al. Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how? J Intern Med. 2014;275(6):551-561.
10. Engel P, Almas MF, De Bruin ML, Starzyk K, Blackburn S, Dreyer NA. Lessons learned on the design and the conduct of Post-Authorization Safety Studies: review of 3 years of PRAC oversight. Br J Clin Pharmacol. 2017;83(4):884-893.
11. Eichler H-G, Hurts H, Broich K, Rasi G. Drug Regulation and Pricing - Can Regulators Influence Affordability? N Engl J Med. 2016;374(19):1807-1809.
12. Makady A, Ham RT, de Boer A, et al. Policies for Use of Real-World Data in Health Technology Assessment (HTA): A Comparative Study of Six HTA Agencies. Value Health. 2017;20(4):520-532.
13. Benchimol EI, Manuel DG, To T, Griffiths AM, Rabeneck L, Guttmann A. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol. 2011;64(8):821-829.
14. Nicholls SG, Quach P, von Elm E, et al. The REporting of Studies Conducted Using Observational Routinely-Collected Health Data (RECORD) Statement: Methods for Arriving at Consensus and Developing Reporting Guidelines. PLoS One. 2015;10(5):e0125620.
15. Collins GS, Omar O, Shanyinde M, Yu LM. A systematic review finds prediction models for chronic kidney disease were poorly reported and often developed using inappropriate methods. J Clin Epidemiol. 2013;66(3):268-277.
16. Bouwmeester W, Zuithoff NP, Mallett S, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 2012;9(5):1-12.
17. de Vries F, de Vries C, Cooper C, Leufkens B, van Staa T-P. Reanalysis of two studies with contrasting results on the association between statin use and fracture risk: the General Practice Research Database. Int J Epidemiol. 2006;35(5):1301-1308.
18. Jick H, Kaye JA, Vasilakis-Scaramozza C, Jick SS. Risk of venous thromboembolism among users of third generation oral contraceptives compared with users of oral contraceptives with levonorgestrel before and after 1995: cohort and case-control analysis. BMJ. 2000;321(7270):1190-1195.
19. Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012;483(7391):531-533.
20. Kaiser J. The cancer test. Science. 2015;348(6242):1411-1413.
21. Nosek BA, Alter G, Banks GC, et al. Promoting an open research culture. Science. 2015;348(6242):1422-1425.
22. Collins FS, Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014;505(7485):612-613.
23. Goodman SK, Krumholz HM. Open Science: PCORI's Efforts to Make Study Results and Data More Widely Available. 2015.
24. Center for Open Science. Transparency and Openness Promotion (TOP) Guidelines. http://centerforopenscience.org/top/. Accessed July 20, 2015.
25. Meta-Research Innovation Center at Stanford. http://metrics.stanford.edu/. 2016.
26. clinicalstudydatarequest.com. http://clinicalstudydatarequest.com/Default.aspx.
27. Iqbal SA, Wallach JD, Khoury MJ, Schully SD, Ioannidis JP. Reproducible Research Practices and Transparency across the Biomedical Literature. PLoS Biol. 2016;14(1):e1002333.
28. Zika Open-Research Portal. https://zika.labkey.com/project/home/begin.view?
29. Hall GC, Sauer B, Bourke A, Brown JS, Reynolds MW, LoCasale R. Guidelines for good database selection and use in pharmacoepidemiology research. Pharmacoepidemiol Drug Saf. 2012;21(1):1-10.
30. Lanes S, Brown JS, Haynes K, Pollack MF, Walker AM. Identifying health outcomes in healthcare databases. Pharmacoepidemiol Drug Saf. 2015;24(10):1009-1016.
31. Brown JS, Kahn M, Toh S. Data quality assessment for comparative effectiveness research in distributed data networks. Med Care. 2013;51(8 Suppl 3):S22-S29.
32. Kahn MG, Brown JS, Chun AT, et al. Transparent reporting of data quality in distributed data networks. Egems. 2015;3(1):1052.
33. EMA. ENCePP Guide on Methodological Standards in Pharmacoepidemiology. 2014.
34. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007;4(10):e296.
35. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3:160018.
36. Gini R, Schuemie M, Brown J, et al. Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies. Egems. 2016;4(1):1189.
37. Berger ML, Sox H, Willke R, et al. Good Practices for Real-World Data Studies of Treatment and/or Comparative Effectiveness: Recommendations from the Joint ISPOR-ISPE Special Task Force on Real-World Evidence in Healthcare Decision-Making. Pharmacoepidemiol Drug Saf. 2017;26(9):1033-1039.
38. Nosek BA, Errington TM. Making sense of replications. Elife. 2017;6:e23383.
39. Benchimol EI, Smeeth L, Guttmann A, et al. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med. 2015;12(10):e1001885.
40. PCORI Methodology Committee. The PCORI Methodology Report. 2013.
41. Gagne JW, Wang SV, Schneeweiss S. FDA Mini-Sentinel Prospective Routine Observational Monitoring Program Tool: Cohort Matching. Technical Users' Guide version 1.0. January 2014.
42. Zhou X, Murugesan S, Bhullar H, et al. An evaluation of the THIN database in the OMOP Common Data Model for active drug safety
surveillance. Drug Saf. Feb 2013;36(2):119‐134.
43. Collins FS, Hudson KL, Briggs JP, Lauer MS. PCORnet: turning a dream
into reality. J Am Med Inform Assoc Jul‐Aug. 2014;21(4):576‐577.
44. Schuemie MJ, Gini R, Coloma PM, et al. Replication of the OMOP
experiment in Europe: evaluating methods for risk identification in
electronic health record databases. Drug Saf Oct. 2013;36(Suppl 1):
S159‐S169.
45. Schneeweiss S, Shrank WH, Ruhl M, Maclure M. Decision‐Making
Aligned with Rapid‐Cycle Evaluation in Health Care. Int J Technol Assess
Health Care. Jan 2015;31(4):214‐222.
46. Public Policy Committee, International Society for Pharmacoepidemiology. Guidelines for good pharmacoepidemiology practice
(GPP). Pharmacoepidemiol Drug Saf. 2016;25(1):2‐10.
47. Wang SV, Verpillat P, Rassen JA, Patrick A, Garry EM, Bartels DB.
Transparency and Reproducibility of Observational Cohort Studies
Using Large Healthcare Databases. Clin Pharmacol Ther Mar. 2016;
99(3):325‐332.
48. Strom BL, Buyse M, Hughes J, Knoppers BM. Data sharing, year 1‐‐
access to data from industry‐sponsored clinical trials. N Engl J Med
Nov 27. 2014;371(22):2052‐2054.
49. Center for Open Science. Open Science Framework now a recommended repository for the Nature Publishing Group's data journal, Scientific Data. https://cos.io/pr/2015-08-12/. 2016.
50. Observational Medical Outcomes Partnership. http://omop.org/, 2016.
51. Innovation in Medical Evidence Development and Surveillance
(IMEDS). http://imeds.reaganudall.org/AboutIMEDS, 2016.
52. Mini‐Sentinel Prospective Routine Observational Monitoring Program
Tools (PROMPT): Cohort Matching, SAS Code Package, Version 1.0.
June 2014.
53. Essential Evidence for Healthcare. 2016; http://www.aetion.com/evidence/.
54. Mini-Sentinel Coordinating Center. Routine Querying Tools (Modular Programs). 2014. http://mini-sentinel.org/data_activities/modular_programs/details.aspx?ID=166. Accessed June 12, 2017.
55. EMIF. EMIF Platform. http://www.emif.eu/about/emif‐platform.
56. PROTECT. http://www.imi‐protect.eu/.
57. Aetion, Inc.'s Comparative Effectiveness and Safety platforms version
2.2 https://www.aetion.com/, July 7, 2017.
58. Observational Health Data Sciences and Informatics Atlas version 2.0.0
https://ohdsi.org/analytic‐tools/) Accessed July 7, 2017.
59. Observational Health Data Sciences and Informatics CohortMethod
version 2.2.2 https://github.com/OHDSI/CohortMethod. Accessed
July 7, 2017.
60. Innovation in Medical Evidence in Development and Surveillance
(IMEDs) Regularized Identification of Cohorts (RICO) version 1.2
http://imeds.reaganudall.org/RICO. Accessed July 7, 2017.
61. Mini‐Sentinel I. Mini‐Sentinel: Overview and Description of the
Common Data Model v5.0.1. http://www.mini‐sentinel.org/work_
products/Data_Activities/Mini‐Sentinel_Common‐Data‐Model.pdf.
Accessed 2016.
62. OMOP Common Data Model. http://omop.org/CDM. Accessed March
21, 2016.
63. Langan SM, Benchimol EI, Guttmann A, et al. Setting the RECORD
straight: developing a guideline for the REporting of studies Conducted
using Observational Routinely collected Data. Clin Epidemiol. 2013;5:
29‐31.
64. Ray WA. Evaluating Medication Effects Outside of Clinical Trials: New-User Designs. Am J Epidemiol. 2003;158(9):915-920.
65. Rothman KJ. Induction and latent periods. Am J Epidemiol. 1981;
114(2):253‐259.
66. Maclure M. The case‐crossover design: a method for studying transient
effects on the risk of acute events. Am J Epidemiol. Jan 15. 1991;133(2):
144‐153.
67. Suissa S. Immortal time bias in pharmaco‐epidemiology. Am J Epidemiol.
Feb 15. 2008;167(4):492‐499.
68. Brookhart MA, Sturmer T, Glynn RJ, Rassen J, Schneeweiss S.
Confounding control in healthcare database research: challenges and
potential approaches. Med Care Jun. 2010;48(6 Suppl):S114‐S120.
69. Glynn RJ, Schneeweiss S, Sturmer T. Indications for propensity scores
and review of their use in pharmacoepidemiology. Basic Clin Pharmacol
Toxicol Mar. 2006;98(3):253‐259.
70. European Medicines Agency. The European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP). 2014. http://encepp.eu/. Accessed December 13, 2014.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.

APPENDIX
Contributors to the joint ISPE-ISPOR Special Task Force on Real World Evidence in Health Care Decision Making

Co-Lead
Shirley V. Wang PhD, ScM – Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School
Sebastian Schneeweiss MD, ScD – Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School

Writing group
Marc L. Berger MD – Pfizer
Jeffrey Brown PhD – Department of Population Medicine, Harvard Medical School
Frank de Vries PharmD, PhD – Dept. Clinical Pharmacy, Maastricht UMC+, The Netherlands
Ian Douglas PhD – London School of Hygiene and Tropical Medicine
Joshua J. Gagne PharmD, ScD – Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School
Rosa Gini PhD, MSc – Agenzia regionale di sanità della Toscana, Florence, Italy
Olaf Klungel PhD – Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht University
C. Daniel Mullins PhD – Pharmaceutical Health Services Research Department, University of Maryland School of Pharmacy
Michael D. Nguyen MD – FDA Center for Drug Evaluation and Research
Jeremy A. Rassen ScD – Aetion, Inc.
Liam Smeeth MSc, PhD – London School of Hygiene and Tropical Medicine
Miriam Sturkenboom PhD, MSc – Erasmus University Medical Center Rotterdam

Participated in small group discussion and/or provided substantial feedback prior to ISPE/ISPOR membership review
Andrew Bate PhD – Pfizer
Alison Bourke MSC, MRPharm.S – QuintilesIMS
Suzanne M. Cadarette PhD – Leslie Dan Faculty of Pharmacy, University of Toronto
Tobias Gerhard BSPharm, PhD – Department of Pharmacy Practice and Administration and Institute for Health, Health Care Policy and Aging Research
Robert Glynn ScD – Division of Pharmacoepidemiology, Brigham & Women's Hospital, Department of Medicine, Harvard Medical School
Krista Huybrechts MS, PhD – Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School
Kiyoshi Kubota MD, PhD – NPO Drug Safety Research Unit Japan
Amr Makady – Zorginstituut Nederland
Fredrik Nyberg MD, PhD – AstraZeneca
Mary E Ritchey PhD – RTI Health Solutions
Ken Rothman DrPH – RTI Health Solutions
Sengwee Toh ScD – Department of Population Medicine, Harvard Medical School

How to cite this article: Wang SV, Schneeweiss S, Berger ML, et al. Reporting to Improve Reproducibility and Facilitate Validity Assessment for Healthcare Database Studies V1.0. Pharmacoepidemiol Drug Saf. 2017;26:1018-1032. https://doi.org/10.1002/pds.4295