Attachment 7_Statistical Design Plan

Health Center Patient Survey (HCPS)

OMB: 0915-0368

February 2018

2019 Health Center Patient
Survey

Deliverable 9: Statistical Design Plan

Draft: December 29, 2017
Revision #1: January 22, 2018
Revision #2: February 2, 2018
Final: February 13, 2018

Prepared for
Bonnie Ohri
Health Resources and Services Administration
Bureau of Primary Health Care
Parklawn Building, 5600 Fishers Lane
Rockville, MD 20857

Prepared by
Patrick Chen
Shampa Saha
Kathleen Considine
RTI International
3040 Cornwallis Road
Research Triangle Park, NC 27709

RTI Project Number 0214097.001.100.001.005

_________________________________
RTI International is a registered trademark and a trade name of Research Triangle Institute.

Contents

1. Introduction
2. Target Population
3. Overview of Sample Design
4. Grantee Sample Selection
   4.1 Sampling Frame Construction
   4.2 Stratification
   4.3 Grantee Sample Allocation
   4.4 Select Stratified PPS Sample of Grantees
   4.5 An Illustrative Stratified Grantee Sample
   4.6 Grantee Selection Probability
5. Site Sample Selection
   5.1 Determine Eligible Sites within Participating Grantees
   5.2 Evaluate Distances between Eligible Sites
   5.3 Oversampling Sites with Concentrated Patients in Oversampling Subgroups
   5.4 Site Selection and Selection Probability
6. Patient Sample Selection
   6.1 Patient Interview Allocation to Grantee
   6.2 Patient Interview Allocation to Sites within Grantee
   6.3 Patient Screening and Selection
   6.4 Patient Selection Probability
   6.5 Patient’s Probability of Inclusion in the Study
7. Sample Sizes and Statistical Power
8. Sample Weights
   8.1 Grantee Sample Selection Weights
   8.2 Site Sample Selection Weights
   8.3 Patient Sample Selection Weights
   8.4 Nonresponse and Poststratification Weight Adjustments
9. Data Collection
   9.1 Schedule
   9.2 Costs
10. Strengths and Limitations of Study Design
   10.1 Strengths
   10.2 Limitations
References
Tables

1-1 Target Sample Sizes for the 2019 HCPS
3-1 Summary of Features and Benefits of the Sample Design
4-1 Grantee Characteristics in the Sampling Frame (2016 UDS)
4-2 Distribution of Patients Served in 2016
4-3 Patient Population and Target Patient Sample Distribution
4-4 Expected Grantee and Patient Yields from Unstratified Random Sampling
4-5 Definition of First-Level Stratification
4-6 Grantee Sample Final Stratification
4-7 Optimum Grantee Sample Allocation
4-8 Grantee Sample Allocation and Sampling Rates in Final Grantee Strata
4-9 Expected Yield of the Grantee Funding Type and Patients of a Stratified Disproportionate Sampling
4-10 Expected Grantee and Patient Sample Distribution by Region, Urban/Rural Area, and Grantee Size
4-11 Patient Coverage Rates of 210 Grantees in Race/Ethnicity
6-1 Oversampling and Non-Oversampling Patient Subgroups
7-1 Detecting Differences in Percentage Estimates between the 2019 HCPS and the 2016 NHIS (Full Sample)
7-2 Detecting Differences in Percentage Estimates between the 2019 HCPS and the 2016 NHIS (Subsample Who Had <200% FPL)
7-3 Detecting Differences in Percentage Estimates between the 2019 HCPS and the 2016 NHIS (Subsample Who Visited Clinics or Health Centers)
8-1 Description and Data Source of Terms in Formulas Calculating Sample Weights

1. Introduction
The 2019 Health Center Patient Survey (HCPS), sponsored by the Health Resources and
Services Administration (HRSA), aims to collect data on patients who use health centers
funded under Section 330 of the Public Health Service Act. Results from the study will guide
and support the Bureau of Primary Health Care (BPHC) in its mission to improve the health
of the nation’s underserved communities and vulnerable populations by assuring access to
comprehensive, culturally competent, quality primary health care service. The 2019 HCPS
will collect data from the patients of health centers funded through four BPHC grant
programs: the Community Health Center (CHC) program, the Migrant Health Center (MHC)
program, the Health Care for the Homeless (HCH) program, and the Public Housing Primary
Care (PHPC) program.
Our goal is to recruit 210 grantees and complete 9,000 interviews, among them 5,100
interviews for the CHC funding program, 1,480 interviews for the MHC funding program,
1,660 interviews for the HCH funding program, and 760 interviews for the PHPC funding
program. Patients from PHPC, MHC, and HCH funding programs will be oversampled. In
addition, to meet BPHC’s research interests in race/ethnicity groups, patients of American
Indian/Alaska Native (AIAN), Native Hawaiian/Pacific Islanders (NHPI), and Asian categories
will be oversampled. Patients aged 65 and older will also be oversampled. The target
sample sizes in three design domains, namely funding program, race/ethnicity, and age
group, are shown in Table 1-1. BPHC is also interested in increasing the veteran patient
sample in the 2019 HCPS; however, the target sample size for veteran patients is not
specified.
Table 1-1. Target Sample Sizes for the 2019 HCPS

Funding Program   Target Sample Size   Race/Ethnicity        Target Sample Size   Age Group        Target Sample Size
CHC               5,100                Hispanic              3,170                17 and younger   2,130
MHC               1,480                Non-Hispanic White    2,250                18–64            5,770
HCH               1,660                Non-Hispanic Black    1,920                65 and older     1,100
PHPC              760                  Non-Hispanic Asian    650
                                       Non-Hispanic AIAN     670
                                       Non-Hispanic NHPI     200
                                       Non-Hispanic Others   140

In this report, we define the target population of the 2019 HCPS in Section 2. An overview
of sample design is presented in Section 3, and a detailed discussion of the proposed
three-stage sample design is presented in Sections 4 through 6. An illustrative example of
the grantee sample using BPHC’s 2016 Uniform Data System (UDS) data is also
presented. In Section 7, we discuss sample sizes and power calculation in the context of
the illustrative example. Section 8 details the procedure for calculating sample weights.
Data collection schedules and costs are presented in Section 9. In Section 10, we list
some strengths and limitations of the statistical design.


2. Target Population
The target population for the 2019 HCPS is composed of persons who meet the definition of
a health center patient used in the BPHC’s UDS. These persons receive face-to-face services
from a CHC, MHC, HCH, or PHPC grantee clinical staff member who exercises independent
judgment in the provision of services.1 Patients from grantees located within the 50 United
States and the District of Columbia are included, while patients from grantees within U.S.
territories and possessions are excluded.
Only persons who received services through one of these grantees at least once in the year
prior to the current visit are considered eligible for the survey. This eligibility criterion will be
used because many of the questions in the survey ask about services received in the past
year; individuals without previous visits will not be able to answer these questions and,
therefore, are not considered eligible. This eligibility criterion was also implemented in the
BPHC’s 2014 HCPS, 2009 Primary Health Care Patient Surveys (PHCPS), the 2002
Community Health Center Survey, and the 2003 Healthcare for Homeless Survey.

1 To meet the criterion for “independent judgment,” the provider must be acting on his/her own when
serving the patient and not assisting another provider.

3. Overview of Sample Design
In the 2019 HCPS, the primary analytic units are patients who receive services from health
center sites2 in funded grantees. The patients are clustered within sites and the sites are
clustered within grantees. RTI International3 will use a stratified three-stage sample design.
The grantees are the first stage of selection units, also known as the primary sampling units
(PSUs). Sites within selected grantees are the second stage of selection units, and patients
within selected sites comprise the third stage of selection units. We expect to achieve the
target sample sizes for race/ethnicity by oversampling grantees and site(s) with patients
concentrated in one of the three race/ethnicity categories (AIAN, Asian, NHPI) at the first
and second stages, and oversampling patients in three race/ethnicity categories at the third
stage. We will identify and oversample sites with patients concentrated in the 65 and older
age group or veteran patients at the second stage, and oversample those patients at the
third stage as well.
At the first stage, grantees will be selected using the stratified probability proportional to
size (PPS) sampling method (Kish, 1995). Grantees participating in PHPC, MHC, and HCH
funding programs and grantees with AIAN-, Asian-, and NHPI-concentrated patients will be
oversampled. The oversampling is achieved by stratification and application of different
selection probability among strata. The explicit stratification is based on the type of funding
a grantee receives; the stratum of grantees receiving CHC funding only is further stratified
according to the proportions of patients in one of the three oversampling race/ethnicity
categories. In addition, sorting the grantee frame by region, urbanicity, and grantee size
(large, medium, or small4) before selecting the grantee sample serves as implicit
stratification and ensures that the grantee sample has good coverage of regions, urban and
rural areas, and grantee sizes. Because of the high costs associated with recruiting a
grantee and hiring a field interviewer (FI) to perform the data collection, we will select an
independent site and patient sample from each funding program for grantees receiving
multiple funding programs.
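The first-stage approach described above (sort the frame so that the ordering acts as implicit stratification, then select with probability proportional to size) is often implemented as systematic PPS sampling. The sketch below assumes that variant; this section does not state which PPS algorithm will be used. The function name, the record fields, and the twelve-grantee frame are hypothetical and serve only as an illustration.

```python
import random

def pps_systematic(frame, n, size_key="patients", sort_key=None, seed=None):
    """Select n units with probability proportional to size (systematic PPS).

    Assumes every unit's size is below the sampling interval; larger
    ("certainty") units would need separate handling.
    """
    rng = random.Random(seed)
    # Sorting the frame before selection acts as implicit stratification.
    units = sorted(frame, key=sort_key) if sort_key else list(frame)
    interval = sum(u[size_key] for u in units) / n
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]

    selected, cumulative, t = [], 0.0, 0
    for unit in units:
        cumulative += unit[size_key]
        # A unit is hit when a selection point falls inside its size range.
        while t < n and targets[t] < cumulative:
            selected.append(unit)
            t += 1
    return selected

# Hypothetical frame: made-up regions and patient counts, not UDS data.
frame = [
    {"id": i, "region": region, "patients": patients}
    for i, (region, patients) in enumerate([
        ("South", 5000), ("West", 12000), ("South", 8000), ("Midwest", 3000),
        ("Northeast", 9000), ("West", 4000), ("South", 15000), ("Midwest", 7000),
        ("Northeast", 6000), ("West", 11000), ("South", 2000), ("Midwest", 10000),
    ])
]
sample = pps_systematic(
    frame, n=3, sort_key=lambda u: (u["region"], u["patients"]), seed=2019
)
print([u["id"] for u in sample])
```

Under this scheme a unit's selection probability is n times its size divided by the total size in the frame, which is why larger grantees or sites are more likely to be drawn.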
At the second stage, sites will be selected within participating grantees, and a maximum of
three sites per funding program is allowed in each grantee. If a grantee has three or fewer
sites in a funding program, all eligible sites will be selected, assuming they are in
reasonable proximity for an FI. A grantee with more than three sites in a funding program
will have three sites selected using PPS sampling, based on the number of patients served.
When all sites for a funding program in a grantee have small patient volumes, more than
three sites could be selected to alleviate difficulties of completing assigned interviews
because of low patient volumes. Again, to ensure successful oversampling of AIAN, Asian,
and NHPI patients, patients aged 65 and older, and veteran patients, sites with patients
concentrated in those subgroups will be identified and oversampled.

2 We refer to “health center sites” as “sites” throughout this document.
3 RTI International is a trade name of Research Triangle Institute.
4 Eligible grantees are sorted by the patient volume in each grantee, and then the top third of
grantees are classified as large, the middle third as medium, and the bottom third as small.

At the third stage, patients will be selected as they enter the site and register with the
receptionist. Patients in three oversampling race/ethnicity categories, patients aged 65 and
older, and veteran patients will be identified and oversampled; that is, they will have a
higher probability of selection than patients who are not in the oversampling subgroups. The
receptionist will refer the first eligible patients who are not in the oversampling subgroups to
the FI when the FI indicates he/she is ready for the next interview. The receptionist will
refer patients in oversampling subgroups to the FI more frequently. We are considering
developing a computer-based system for receptionists to screen and refer patients. For each
funding program, the same number of patient interviews will be completed from each
grantee to reduce unequal weighting effects (UWE) and maintain a balanced workload
across grantees. However, we may increase the number of patient interviews for grantees
with patients concentrated in oversampling race/ethnicity subgroups. The total number of
patient interviews within a grantee will be divided among multiple sites if more than one site
is selected for a funding program.
In our design, we take every measure to meet the design goals and reduce the design effect
(Deff 5) caused by clustering and oversampling. In summary, we present key elements of
the sample design and the associated benefits in Table 3-1.
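The design effect formula given in footnote 5, Deff = UWE × (1 + (m − 1) × ICC), can be evaluated directly. The sketch below is illustrative only: the UWE and ICC values are made-up placeholders rather than estimates from this study, and the function names are hypothetical.

```python
def design_effect(uwe, m, icc):
    """Deff = UWE * (1 + (m - 1) * ICC), as defined in footnote 5."""
    return uwe * (1 + (m - 1) * icc)

def effective_sample_size(n, deff):
    """Size of a simple random sample with the same precision as n interviews."""
    return n / deff

# Placeholder inputs: UWE = 1.5 and ICC = 0.02 are assumptions for illustration;
# m = 26 interviews per grantee and n = 9,000 interviews are used as example sizes.
deff = design_effect(uwe=1.5, m=26, icc=0.02)
n_eff = effective_sample_size(9000, deff)
print(round(deff, 2), round(n_eff))
```

With these placeholder inputs the design effect is about 2.25, so 9,000 clustered interviews would carry roughly the precision of 4,000 simple-random-sample interviews. Lowering either UWE or the cluster size m reduces Deff.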

5 The design effect (Deff) is a measure of the precision gained or lost by using the more complex
design instead of a simple random sample. For a multistage cluster sample like the 2019 Health
Center Patient Survey, Deff is a function of the clustering effect and the unequal weighting effect
(UWE) and can be defined as Deff = UWE × (1 + (m − 1) × ICC), where m is the number of patient
interviews within a grantee, ICC is the intracluster correlation coefficient that measures the degree of
similarity among elements within a cluster, and UWE measures variation in the sample weight. Deff
can be reduced by reducing either UWE or the clustering effect or both.

Table 3-1. Summary of Features and Benefits of the Sample Design

Each entry lists a key design feature followed by its pros, cons, and comments.

First Stage: Grantee Sample Selection (recruit 210 grantees)

▪ Stratification
  PROS: Ensures a representative grantee sample and enough grantees are selected for each funding
  program; ensures the selected grantees have good coverage of patients in oversampling
  race/ethnicity subgroups.

▪ Oversample PHPC, MHC, and HCH grantees; grantees with a high proportion of patients in
  three oversampling race/ethnicity categories.
  PROS: Achieves oversampling goals in funding type, and ensures selecting grantees with patients
  concentrated in three oversampling race/ethnicity subgroups.
  CONS: Disproportionate sampling increases UWE.
  COMMENTS: Selecting a PPS grantee sample from each stratum can reduce UWE. Grantee
  sample allocation is determined by minimizing UWE.

▪ Select independent sample for each funding program if grantee received grants from
  multiple programs.
  PROS: Reduces data collection costs and helps to reduce clustering effect.

Second Stage: Site Sample Selection (up to three sites per funding program)

▪ Select multiple sites if a grantee has more than one site. Up to three sites will be selected
  for each funding program per grantee. More than three sites could be selected in special
  situations (e.g., all sites for a funding program have low patient volume).
  PROS: Reduces clustering effect. For the funding program with more than three sites, PPS selection
  of sites reduces UWE, too.
  CONS: Site selection process is tedious. Managing data collection from multiple sites can be costly.
  COMMENTS: Select sites within reasonable proximity for an FI.

▪ Oversample sites with patients concentrated in three oversampling race/ethnicity categories,
  patients aged 65 and older, and veteran patients.
  PROS: Achieves oversampling goals.
  CONS: Disproportionate sampling increases UWE.

Third Stage: Patient Sample Selection (complete 9,000 interviews)

▪ Within each funding program, allocate the same number of interviews to each grantee.
  For grantees with patients concentrated in oversampling race/ethnicity subgroups, the
  number of interviews may be increased.
  PROS: Creates even workload for FIs and reduces clustering effect.

▪ Select random sample as patients enter site and are registered.
  PROS: Is suitable for the mobile nature of some of the target population.

▪ Allocate interviews evenly to sites that are selected through PPS.
  PROS: Maintains roughly equal weights within a stratum, thus reducing UWE; creates even
  workload for FIs.

▪ Allocate interviews to sites proportional to patient size at sites (for grantees with two or
  three sites).
  PROS: Reduces UWE.

▪ Oversample patients in three oversampling race/ethnicity categories, patients aged 65 and
  older, and veteran patients.
  PROS: Achieves oversampling goals.
  CONS: Disproportionate sampling increases UWE.
  COMMENTS: Screen patients and divide them into oversampling or non-oversampling subgroups.

4. Grantee Sample Selection
This section discusses the first stage of sample selection: the selection of grantees. It
covers sample frame construction, stratification, sample allocation, and selection of
stratified PPS grantee samples. An illustrative grantee sample is also presented, and the
calculation of grantee selection probability is discussed.

4.1 Sampling Frame Construction

BPHC UDS grantee-level data from the most recent available year will be used to construct
a sampling frame for the first stage of selection. The UDS is compiled each year from annual
data submissions by each Section 330-funded grantee. The UDS contains data on the
number of patients served; grantee characteristics, such as the type(s) of grant funding
received; state; urbanicity; and number of sites. The grantee characteristics will be used in
stratification. In this report, we use data from the 2016 UDS to illustrate the statistical
design plan. Once the Office of Management and Budget (OMB) approval has been received,
the final sample will be drawn using the most current UDS data.
The 2016 UDS data were collected from 1,367 grantees. Of these, 42 grantees are excluded
from the sampling frame, including:
▪ thirty grantees located in U.S. territories or possessions (i.e., those in Puerto
  Rico, the Virgin Islands, and the Pacific Basin);
▪ three grantees with fewer than 300 patients;
▪ nine grantees that received MHC funding only and that served clients through a
  voucher program; and
▪ any grantee that has exited or will soon be exiting the Section 330 Program.

No grantee in the 2016 UDS operated only in schools. The grantee
sampling frame includes 1,325 eligible grantees that reported in 2016. We show the
distribution of key grantee characteristics in Tables 4-1, 4-2, and 4-3. Table 4-1 breaks
down the grantees by funding program, region, urban/rural status, and number of sites
within a grantee. In the grantee sampling frame, 933 grantees had a single funding
program, while 392 grantees received funding from multiple programs. A total of 1,258
grantees (94.9%) received CHC funding, either solely or in combination with other funding
programs; 288 grantees (21.7%) received HCH funding, either solely or in combination with
other funding programs; 158 grantees (11.9%) received MHC funding, either solely or in
combination with other funding programs; and only 97 grantees (7.3%) received PHPC
funding, either solely or in combination with other funding programs. Roughly 66.0% of
grantees received CHC funding solely.
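The counts quoted above can be reproduced from the funding-combination counts reported in Table 4-1. The short check below is a sketch for the reader; the dictionary and helper names are illustrative and not part of the UDS.

```python
# Funding-combination counts from Table 4-1 (2016 UDS sampling frame).
frame_counts = {
    "C": 874, "H": 52, "M": 1, "P": 6,
    "CH": 159, "CM": 117, "CP": 34, "MH": 1, "PH": 7,
    "CMH": 24, "CMP": 5, "CPH": 35, "CMPH": 10,
}

total = sum(frame_counts.values())
single = sum(n for code, n in frame_counts.items() if len(code) == 1)
multiple = total - single

def receiving(program):
    """Grantees funded by a program, solely or in combination with others."""
    return sum(n for code, n in frame_counts.items() if program in code)

print(total, single, multiple)
for program in "CHMP":
    count = receiving(program)
    print(program, count, f"{100 * count / total:.1f}%")
```

The tallies agree with the text: 933 single-program and 392 multi-program grantees, with 1,258, 288, 158, and 97 grantees receiving CHC, HCH, MHC, and PHPC funding, respectively.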

Table 4-1. Grantee Characteristics in the Sampling Frame (2016 UDS)

Domain Category             Number of Grantees   Percent Distribution
Funding Program Received
  C                         874                  65.96
  H                         52                   3.92
  M                         1                    0.08
  P                         6                    0.45
  CH                        159                  12.00
  CM                        117                  8.83
  CP                        34                   2.57
  MH                        1                    0.08
  PH                        7                    0.53
  CMH                       24                   1.81
  CMP                       5                    0.38
  CPH                       35                   2.64
  CMPH                      10                   0.75
  Total                     1,325                100%
Region (a)
  Northeast                 233                  17.58
  Midwest                   259                  19.55
  South                     445                  33.58
  West                      388                  29.28
  Total                     1,325                100%
Urban/Rural Location
  Urban                     751                  56.68
  Rural                     574                  43.32
  Total                     1,325                100%
Number of Sites
  1                         134                  10.11
  2                         181                  13.66
  3                         150                  11.32
  4–9                       526                  39.70
  10–14                     166                  12.53
  15–19                     69                   5.21
  ≥ 20                      99                   7.47
  Total                     1,325                100%

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant
Health Center program; P = Public Housing Primary Care program; multiple acronyms used together
indicate that funding was received from multiple programs (e.g., CMH = a grantee received CHC,
MHC, and HCH funding; CMP = a grantee received CHC, MHC, and PHPC funding).
(a) “Region” refers to the census region.

Table 4-2. Distribution of Patients Served in 2016

Patient Distribution                            Number of Patients
Range of Number of Patients
  Minimum                                       341
  25th percentile (Q1)                          5,535
  Median                                        11,722
  75th percentile (Q3)                          23,589
  Maximum                                       203,922
Mean Number of Patients per Grantee             19,145
Total Number of Patients Across All Grantees    25,367,510

The number of sites within a grantee ranged from 1 to 89, and 1,010 grantees had at least
3 sites, with an average of about 7.7 sites per grantee. The South had 445 grantees, while
the West had 388 grantees. The Northeast and Midwest had roughly the same number of
grantees each: 233 and 259, respectively. More grantees were in urban areas than were in
rural areas.
Another important grantee characteristic is the number of patients served in 2016 (Table
4-2). Among the 1,325 eligible grantees in the grantee sampling frame, the number of
patients receiving at least one face-to-face encounter for services during 2016 varied
among the grantees, ranging from 341 to 203,992 and averaging 19,145. The total number
of patients was approximately 25.4 million. Table 4-3 displays the patient distributions of
race/ethnicity, age group, and veteran status. It shows that patients in AIAN, Asian, and
NHPI race/ethnicity categories; patients aged 65 and older; and veteran patients are underrepresented. They need to be oversampled to achieve the target sample sizes.
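One way to see which subgroups require oversampling is to compare each group's share of the target sample with its share of the patient population, using the figures in Table 4-3. The ratio below is an illustrative measure introduced for this sketch, not a quantity defined in this plan; a value above 1 means the group's target share exceeds its population share.

```python
TOTAL_PATIENTS = 25_367_510   # patients served in 2016 (Table 4-3)
TARGET_INTERVIEWS = 9_000     # total target sample size

# (patients served in 2016, target sample size) from Table 4-3.
groups = {
    "Hispanic": (8_467_989, 3_170),
    "Non-Hispanic White": (9_216_856, 2_250),
    "Non-Hispanic Black": (4_791_854, 1_920),
    "Non-Hispanic NHPI": (129_149, 200),
    "Non-Hispanic AIAN": (245_522, 670),
    "Non-Hispanic Asian": (861_583, 650),
    "Non-Hispanic Others": (797_946, 140),
    "Aged 65 and older": (2_101_462, 1_100),
}

def oversampling_ratio(patients, target):
    """Target-sample share divided by population share."""
    return (target / TARGET_INTERVIEWS) / (patients / TOTAL_PATIENTS)

ratios = {name: oversampling_ratio(*values) for name, values in groups.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {ratio:5.2f}")
```

The AIAN, NHPI, and Asian categories and the 65-and-older group all have ratios well above 1, consistent with the oversampling described in the text.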
Table 4-3. Patient Population and Target Patient Sample Distribution

Domain Category         Number of Patients   Patient Population   Target        Target Sample
                        Served in 2016       Distribution         Sample Size   Distribution
Race/Ethnicity
  Hispanic              8,467,989            33.38%               3,170         35.22%
  Non-Hispanic White    9,216,856            36.33%               2,250         25.00%
  Non-Hispanic Black    4,791,854            18.89%               1,920         21.33%
  Non-Hispanic NHPI     129,149              0.51%                200           2.22%
  Non-Hispanic AIAN     245,522              0.97%                670           7.44%
  Non-Hispanic Asian    861,583              3.40%                650           7.22%
  Non-Hispanic Others   797,946              3.15%                140           1.56%
  Unreported            856,611              3.38%                n/a           n/a
  Total                 25,367,510           100%                 9,000         100%
Age Group
  0–17                  7,853,690            30.96%               2,130         23.67%
  18–64                 15,412,358           60.76%               5,770         64.11%
  65+                   2,101,462            8.28%                1,100         12.22%
  Total                 25,367,510           100%                 9,000         100%
Veteran Status
  Veteran               328,162              1.29%                n/a           n/a
  Non-veteran           25,039,348           98.71%               n/a           n/a
  Total                 25,367,510           100%                 9,000         100%

4.2 Stratification

As shown in Table 4-1, the majority of grantees receive CHC funding, while relatively few
grantees receive PHPC, MHC, or HCH funding. A random selection of grantees without any
stratification would result in very small grantee sample sizes for PHPC, MHC, and HCH
funding programs. We selected 210 grantees using the simple random sampling method,
and repeated it 100 times. Table 4-4 displays the expected number of grantees6 yielded for
each funding program from unstratified random grantee samples.
Table 4-4. Expected Grantee and Patient Yields from Unstratified Random Sampling

Grantee Funding Type   Number of Grantees   Target Number of Complete   Number of Patients
                       Selected             Patient Interviews          Required per Grantee
C                      199                  5,100                       25.6
H                      45                   1,480                       32.9
M                      26                   1,660                       63.9
P                      16                   760                         47.5
Total                  286                  9,000                       42.9

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant
Health Center program; P = Public Housing Primary Care program.

6 For a selected grantee participating in multiple funding programs, we take an independent sample
for each funding program. For example, if a grantee receiving both CHC and MHC funding is recruited,
this grantee would be counted as a CHC grantee, and an MHC grantee as well.


The unstratified random samples yield 199 CHC grantees, 45 HCH grantees, 26 MHC
grantees, and only 16 PHPC grantees. To meet the target of completed interviews for each
funding program, we have to complete a large number of interviews for the PHPC and MHC
funding programs, which has two implications: (1) the difficulty in recruiting many patients
from PHPC and MHC grantees within a short period of data collection because of the low
patient volumes in PHPC or MHC grantees; and (2) the clustering effect is inflated as the
number of completed interviews per grantee increases, and consequently the estimates will
have low precision and the statistical power of comparison will be reduced.
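The grantee yields above come from 100 repeated simple random draws, but they can also be approximated analytically: under simple random sampling without stratification, the expected number of selected grantees in a funding program is the sample size times that program's share of the frame. The sketch below uses the program totals quoted in Section 4.1; it is an approximation for illustration and does not reproduce the simulation itself.

```python
FRAME_SIZE = 1_325   # eligible grantees in the sampling frame
SAMPLE_SIZE = 210    # grantees to be selected

# Grantees receiving each program, solely or in combination (Section 4.1).
program_totals = {"C": 1_258, "H": 288, "M": 158, "P": 97}

# Expected count under simple random sampling: n * N_program / N.
expected = {
    program: SAMPLE_SIZE * count / FRAME_SIZE
    for program, count in program_totals.items()
}
for program, value in expected.items():
    print(f"{program}: {value:.1f}")
```

The analytic expectations are close to the simulated yields of 199, 45, 26, and 16 grantees, and they make the underlying problem visible: without stratification only about 15 PHPC grantees and 25 MHC grantees would be expected in a sample of 210.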
Stratification is needed to achieve target sample sizes for four funding programs with
relatively small cluster sizes.7 We will group grantees into four exclusive strata according to
the types of funding they receive. These four groups will serve as the first-level strata and
are defined in Table 4-5.
Table 4-5. Definition of First-Level Stratification

First-Stage Strata                                       Grantee Funding Type        Number of Grantees
                                                                                     in Sampling Frame
Stratum 1: Grantees received PHPC funding solely or in   P; CP; PH; CMP; CPH; CMPH   97
combination with other programs.
Stratum 2: Grantees received MHC funding solely or in    M; CM; MH; CMH              143
combination with other programs.
Stratum 3: Grantees received HCH funding solely or in    H; CH                       211
combination with other programs.
Stratum 4: Grantees received CHC funding solely.         C                           874
Total                                                                                1,325

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant
Health Center program; P = Public Housing Primary Care program. Multiple acronyms used together
indicate that funding was received from multiple programs (e.g., CMH = a grantee received CHC,
MHC, and HCH funding; CMP = a grantee received CHC, MHC, and PHPC funding).

AIAN, Asian, and NHPI patients are not evenly distributed among all grantees. They tend to
be clustered in a few grantees: 1,032 grantees had fewer than 100 AIAN patients, 1,134
grantees had fewer than 100 NHPI patients, and 678 grantees had fewer than 100 Asian
patients. The 20 grantees with the highest proportion of AIAN patients account for 34.0% of
total AIAN patients in all 1,325 grantees; the 20 grantees with the highest proportion of NHPI
patients account for 41.9% of total NHPI patients; and the 20 grantees with the highest proportion
of Asian patients account for 29.9% of total Asian patients. Thus, to achieve target sample
sizes in three race/ethnicity categories, grantees with patients concentrated in those three
race/ethnicity categories must be identified and selected at the first-stage selection.
7 Cluster size is measured as the number of completed interviews within a grantee for a funding
program.


Grantees with more than 20% of patients in one of the three race/ethnicity categories are
considered patient-concentrated grantees. Stratum 4 (CHC funding solely) has over 86% of
such grantees, and very few such grantees are from Strata 1, 2, and 3. Therefore, to
effectively select grantees with concentrated patients in three race/ethnicity categories,
Stratum 4 is further divided into four second-level strata according to whether a grantee
has patients concentrated (over 20%) in one of the three race/ethnicity categories. The
result is a total of seven final grantee strata, shown in Table 4-6.
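The stratum definitions in Tables 4-5 and 4-6 amount to a priority rule on funding type followed by a concentration check for CHC-only grantees. The function below is a minimal sketch of that rule. The function name and arguments are hypothetical, and the order in which the three race/ethnicity checks are applied when a grantee exceeds 20% in more than one category is an assumption, since the text does not address that case.

```python
def final_stratum(funding, pct_aian=0.0, pct_asian=0.0, pct_nhpi=0.0,
                  threshold=20.0):
    """Return the final grantee stratum (1-7) following Tables 4-5 and 4-6.

    funding: combination code such as "C", "CM", or "CMPH".
    pct_*:   percentage of the grantee's patients in each category.
    """
    programs = set(funding.upper())
    if not programs or not programs <= set("CMHP"):
        raise ValueError(f"unrecognized funding code: {funding!r}")
    if "P" in programs:   # PHPC, solely or in combination
        return 1
    if "M" in programs:   # MHC without PHPC
        return 2
    if "H" in programs:   # HCH without PHPC or MHC
        return 3
    # CHC funding only: split by patient concentration (over 20%).
    # Assumption: checks are applied in the order AIAN, Asian, NHPI.
    if pct_aian > threshold:
        return 4
    if pct_asian > threshold:
        return 5
    if pct_nhpi > threshold:
        return 6
    return 7

print(final_stratum("CMPH"), final_stratum("CH"),
      final_stratum("C", pct_asian=25.0), final_stratum("C"))
```

Because the funding checks run in a fixed order, each grantee lands in exactly one stratum even when it receives several grants, which is what makes the strata mutually exclusive.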
Although some grantees have a high proportion of patients aged 65 and older, these older
patients are distributed more evenly than the patients in three race/ethnicity categories.
The 20 grantees with the highest proportion of patients aged 65 and older only account for
2.69% of total patients aged 65 and older. As a result, oversampling grantees with
concentrated patients aged 65 and older at the first stage of selection will not be as
effective as oversampling grantees with patients concentrated in the three race/ethnicity
categories. Thus, we decide not to oversample grantees with patients concentrated in the
65 and older group.
There are no grantees with patients concentrated in the veteran category; the highest
proportion of veteran patients was 16.1%. Thus, oversampling grantees with patients
concentrated in the veteran category will not be considered.
Table 4-6. Grantee Sample Final Stratification

| First-Stage and Second-Stage Strata | Grantee Funding Type | Final Stratum | Number of Grantees in Sampling Frame |
|---|---|---|---|
| Stratum 1: Grantees received PHPC funding solely or in combination with other programs. | P; CP; PH; CMP; CPH; CMPH | 1 | 97 |
| Stratum 2: Grantees received MHC funding solely or in combination with other programs. | M; CM; MH; CMH | 2 | 143 |
| Stratum 3: Grantees received HCH funding solely or in combination with other programs. | H; CH | 3 | 211 |
| Stratum 4: Grantees received CHC funding solely. | C | | |
| Stratum 4.1: Grantees with more than 20% of AIAN patients. | C | 4 | 34 |
| Stratum 4.2: Grantees with more than 20% of Asian patients. | C | 5 | 33 |
| Stratum 4.3: Grantees with more than 20% of NHPI patients. | C | 6 | 8 |
| Stratum 4.4: All remaining grantees in Stratum 4. | C | 7 | 799 |
| Total | | | 1,325 |

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant
Health Center program; P = Public Housing Primary Care program. Multiple acronyms used together
indicate that funding was received from multiple programs (e.g., CMH = a grantee received CHC,
MHC, and HCH funding; CMP = a grantee received CHC, MHC, and PHPC funding).


4.3 Grantee Sample Allocation

Before selecting a grantee sample from each final stratum, we need to determine the
grantee sample allocation for each final stratum. To minimize the variation in sample
weights introduced by oversampling grantees who received funding from PHPC, MHC, or
HCH programs, we allocate the grantee sample such that the UWE is minimized. We
used the SAS OPTMODEL procedure,8 a nonlinear optimization procedure, to minimize the
UWE subject to the following constraints:
▪ Select 210 grantees.
▪ Complete 9,000 interviews.
▪ Complete 5,100 CHC interviews, 1,480 MHC interviews, 1,660 HCH interviews,
  and 760 PHPC interviews.
▪ Complete interviews per grantee: 26 for CHC, 25 for MHC, 25 for HCH, and 17 for
  PHPC.
▪ Select at least one grantee from each grantee type.9

The optimum sample allocation to each grantee type is presented in Table 4-7. After
aggregating grantee allocations to the seven final strata and selecting all grantees in Strata
4, 5, and 6, the grantee sample allocation to the seven strata along with the sampling rates
in each stratum are presented in Table 4-8. The sampling rates for Strata 1, 2, 4, 5, and 6
are much higher than the overall sampling rate (21.1%), indicating that we oversample
grantees in these strata.
Table 4-7. Optimum Grantee Sample Allocation

| Domain Category (Funding Program Received) | Number of Grantees | Grantee Sample Allocation |
|---|---|---|
| C | 874 | 94 |
| H | 52 | 3 |
| M | 1 | 1 |
| P | 6 | 1 |
| CH | 159 | 23 |
| CM | 117 | 30 |
| CP | 34 | 12 |
| MH | 1 | 1 |
| PH | 7 | 7 |
| CMH | 24 | 13 |
| CMP | 5 | 5 |
| CPH | 35 | 10 |
| CMPH | 10 | 10 |
| Total | 1,325 | 210 |

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant
Health Center program; P = Public Housing Primary Care program; multiple acronyms used together
indicate that funding was received from multiple programs (e.g., CMH = a grantee received CHC,
MHC, and HCH funding; CMP = a grantee received CHC, MHC, and PHPC funding).

8 http://support.sas.com/documentation/cdl/en/ormpug/59679/HTML/default/viewer.htm#optmodel.htm
9 Grantee type is defined according to the funding program(s) a grantee participated in or received
funding from; there are 13 grantee types, as shown in Table 4-7.

Table 4-8. Grantee Sample Allocation and Sampling Rates in Final Grantee Strata

| First-Stage and Second-Stage Strata | Final Stratum | Number of Grantees in Sampling Frame | Grantee Sample Allocation | Grantee Selected (Assuming 75% Recruitment Rate) | Sampling Rate |
|---|---|---|---|---|---|
| Stratum 1: Grantees received PHPC funding solely or in combination with other programs. | 1 | 97 | 45 | 60 | 61.9% |
| Stratum 2: Grantees received MHC funding solely or in combination with other programs. | 2 | 143 | 45 | 60 | 42.0% |
| Stratum 3: Grantees received HCH funding solely or in combination with other programs. | 3 | 211 | 26 | 35 | 16.6% |
| Stratum 4: Grantees received CHC funding solely. | | | | | |
| Stratum 4.1: Grantees with more than 20% of AIAN patients. | 4 | 34 | 26 | 34 | 100.0% |
| Stratum 4.2: Grantees with more than 20% of Asian patients. | 5 | 33 | 25 | 33 | 100.0% |
| Stratum 4.3: Grantees with more than 20% of NHPI patients. | 6 | 8 | 6 | 8 | 100.0% |
| Stratum 4.4: All remaining grantees in Stratum 4. | 7 | 799 | 37 | 50 | 6.1% |
| Total | | 1,325 | 210 | 280 | 21.1% |


4.4 Select Stratified PPS Sample of Grantees

As mentioned in Section 4.1, the grantees differ widely in the number of patients served.
PPS sampling is a commonly used method of unequal probability sampling to handle the
large variation in patients served among grantees. In this method, the probability of a
grantee being sampled is proportional to a size measure. The size measure will be the
number of patients who visited the grantee for services from the 2016 UDS file. We will use
PPS sampling to select the grantee sample from each final stratum.
A PPS grantee sample will be selected using the SAS SURVEYSELECT procedure10 with the
predetermined sample allocation in Table 4-8 for each final stratum. During the selection, in
addition to the seven strata for grantee sample selection discussed previously, we will sort
the sampling frame by region (Northeast, Midwest, South, and West), urban/rural location,
and the grantee size (large, medium, small) when applying Chromy’s (1981) probability
minimal replacement sequential PPS selection procedure. Sorting the sampling frame by
these key grantee characteristics and then applying the PPS sequential procedure induces
implicit stratification according to the order of the units in a stratum. Therefore, the selected
grantee samples will be distributed among various regions, urban/rural locations, and
various grantee sizes to ensure a representative grantee sample is selected.
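The effect of sorting the frame before a sequential PPS draw can be illustrated with a minimal Python sketch. It uses plain systematic PPS sampling, a simpler technique swapped in here for illustration only; it is not Chromy's probability minimal replacement procedure and not the SAS SURVEYSELECT implementation. The frame records, field names, and seed are hypothetical.

```python
import random

def systematic_pps(frame, n, rng):
    """Select n units from `frame` (dicts with a 'size' key) by systematic
    PPS sampling in the frame's current order.
    Assumes n * size / total <= 1 for every unit, so no unit is hit twice."""
    total = sum(unit["size"] for unit in frame)
    step = total / n                     # sampling interval on the size scale
    start = rng.random() * step          # random start in [0, step)
    targets = [start + k * step for k in range(n)]
    selected, cum, t = [], 0, 0
    for unit in frame:
        cum += unit["size"]              # cumulative size through this unit
        while t < n and targets[t] < cum:
            selected.append(unit)
            t += 1
    return selected

# Hypothetical stratum frame: region, urban flag, and patient count.
frame = [
    {"id": "A", "region": "West", "urban": True, "size": 120},
    {"id": "B", "region": "South", "urban": False, "size": 80},
    {"id": "C", "region": "West", "urban": False, "size": 60},
    {"id": "D", "region": "Midwest", "urban": True, "size": 40},
    {"id": "E", "region": "South", "urban": True, "size": 30},
    {"id": "F", "region": "Northeast", "urban": True, "size": 30},
    {"id": "G", "region": "Midwest", "urban": False, "size": 20},
    {"id": "H", "region": "Northeast", "urban": False, "size": 20},
]

# Sorting by the key characteristics before selection is what spreads the
# sample across regions and urban/rural locations (implicit stratification).
frame.sort(key=lambda u: (u["region"], not u["urban"], -u["size"]))

sample = systematic_pps(frame, n=3, rng=random.Random(2019))
total_size = sum(u["size"] for u in frame)
probs = {u["id"]: 3 * u["size"] / total_size for u in frame}
print([u["id"] for u in sample])
```

Because the selection points are evenly spaced along the cumulated size measure, units that sit next to each other in the sort order are unlikely to be selected together, which is the sense in which sorting acts as implicit stratification.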

4.5 An Illustrative Stratified Grantee Sample

In this section, we present an illustrative example of a stratified grantee sample based on a
simulation study where 100 independent grantee samples are selected, and the results are
averaged over the 100 samples.
In this example, 210 grantees were selected with the sample allocation for the final seven
strata specified in Table 4-8. The PPS sequential method was used to select the grantees
from each of the seven strata, and this process was repeated 100 times. As stated in
Section 4.2, an independent sample was selected for each funding program, if a selected
grantee participated in multiple funding programs. This process yielded 369 grantees for
four funding programs: 206 CHC grantees, 61 HCH grantees, 57 MHC grantees, and 45
PHPC grantees, as shown in Table 4-9. To achieve the interview targets for each funding
program, the expected number of complete interviews per grantee for each funding type
was calculated, as displayed in Table 4-9.11

10 http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#surveyselect_toc.htm
11 Note that during the sampling plan implementation, the sample realization may yield a slightly
different distribution of grantees for each funding type.


Table 4-9. Expected Yield of the Grantee Funding Type and Patients of a Stratified Disproportionate Sampling

| Funding Program | Number of Grantees for Each Funding Program | Average Number of Patients per Grantee | Number of Completed Interviews for Each Funding Program |
|---|---|---|---|
| C | 206 | 24.8 | 5,100 |
| H | 61 | 27.2 | 1,660 |
| M | 57 | 26.0 | 1,480 |
| P | 45 | 16.9 | 760 |
| Total | 369 | | 9,000 |

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant
Health Center program; P = Public Housing Primary Care program.

Table 4-10 displays the grantee sampling frame and expected sample distribution by
region, urban/rural area, and grantee size from the illustrative example. In the distribution
of regions, the West has a higher proportion in the grantee sample, while the proportions of
the other three regions in the grantee sample are lower compared to the grantee sampling
frame. This difference is mainly because of oversampling grantees with AIAN- and
NHPI-concentrated patients, and most of these grantees are in the West region (Alaska and
Hawaii). The grantee sample has higher proportions in urban areas compared to the grantee
sampling frame; the reason for this difference is that we oversample PHPC grantees and
they are mainly in urban areas. The grantee sample has lower proportions of small- and
medium-size grantees compared to the grantee sampling frame. This disparity occurs
because of the PPS sampling method employed in grantee sample selection, which gives
grantees with large patient volume a better chance of being selected than grantees with a
small patient volume. A best practice is to select more large grantees to lower data
collection costs; a large patient volume ensures that the quota per grantee (as shown in
Table 4-9) can be easily met within the data collection time period.
In general, our proposed grantee sample selection algorithm generates grantee samples
that represent different regions, urban/rural areas, and grantee sizes very well.


Table 4-10. Expected Grantee and Patient Sample Distribution by Region, Urban/Rural Area, and Grantee Size

| Domains | Grantee Frame N | Grantee Frame % | Expected Grantee Sample n | Expected Grantee Sample % |
|---|---|---|---|---|
| Region | 1,325 | 100.00 | 210 | 100.00 |
| Northeast | 233 | 17.58 | 33 | 15.67 |
| Midwest | 259 | 19.55 | 38 | 18.02 |
| South | 445 | 33.58 | 48 | 22.72 |
| West | 388 | 29.28 | 92 | 43.59 |
| Urban/Rural | 1,325 | 100.00 | 210 | 100.00 |
| Urban | 751 | 56.68 | 140 | 66.76 |
| Rural | 574 | 43.32 | 70 | 33.24 |
| Grantee Size | 1,325 | 100.00 | 210 | 100.00 |
| Large | 451 | 34.04 | 151 | 71.70 |
| Medium | 437 | 32.98 | 36 | 17.10 |
| Small | 437 | 32.98 | 24 | 11.21 |

NOTE: The grantee sample sizes and proportions are the average from the 100 repeated samples. The
sample sizes may not add up to 210, and proportions may not be exactly the sample sizes divided
by 210 because of rounding.

To evaluate the effectiveness of oversampling grantees with patients concentrated in the
oversampling race/ethnicity categories (AIAN, NHPI, and Asians), we calculated the
coverage rates12 of the three race/ethnicity categories from the sampled 210 grantees (see
Table 4-11). The 210 selected grantees cover 30.8% of patient population from all 1,325
grantees. The coverage rate for AIAN patients is 48.1%, 46.7% for NHPI patients, and
52.6% for Asian patients, while the coverage rate for races other than AIAN, NHPI and
Asian is 27.4%. With the high coverage rates from the selected grantees, additional
oversampling of sites with patients concentrated in the selected categories at the second
selection stage, and oversampling of patients in the three race/ethnicity categories at the
third selection stage, we are confident that we can achieve the oversampling goals in the
three race/ethnicity categories. The oversampling procedure at the second and third stages
of selection is discussed in Sections 5 and 6.

12 Coverage rate is the ratio of (number of patients in the selected 210 grantees/number of patients in
all 1,325 grantees).


Table 4-11. Patient Coverage Rates of 210 Grantees in Race/Ethnicity

| Race/Ethnicity | Patient Coverage Rate |
|---|---|
| American Indian/Alaska Native | 48.1% |
| Asian | 52.6% |
| Native Hawaiian/Pacific Islander | 46.7% |
| Races Other Than AIAN, NHPI and Asian | 27.4% |
| Overall | 30.8% |

4.6 Grantee Selection Probability

The selection probability for the ith grantee within the hth stratum can be calculated as

\[
G_{hi} = \frac{n_h S_{hi}}{\sum_{i} S_{hi}}, \tag{1}
\]

where h stands for the strata (h = 1, 2, …, 7, corresponding to 7 final strata); i is the index
for grantees on the frame within each stratum; nh is the number of grantees to select in the
hth stratum; and Shi is the size measure, which is the number of patients served by each
grantee. Note that we assume a 75% participation rate among grantees based on the
results of the 2014 HCPS. As a result, nh will be inflated to account for nonresponse among
sampled grantees.
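As a rough illustration of Formula (1) and of the inflation for a 75% participation rate, the following Python sketch may help. Rounding up and capping at the stratum frame count are assumptions of this sketch rather than rules stated in the plan, although with the Table 4-8 allocation they give the same counts as the "Grantee Selected" column. The function names and the five patient counts are hypothetical.

```python
from fractions import Fraction
import math

def inflate_allocation(n_target, frame_count, participation=Fraction(3, 4)):
    """Grantees to select so that about n_target participate.
    Assumption: round up, then cap at the number of grantees on the frame."""
    return min(math.ceil(n_target / participation), frame_count)

def grantee_selection_probs(sizes, n_h):
    """Formula (1): G_hi = n_h * S_hi / sum_i S_hi within one stratum."""
    total = sum(sizes)
    return [n_h * s / total for s in sizes]

# Sample allocation and frame counts for the seven final strata (Table 4-8).
allocation = [45, 45, 26, 26, 25, 6, 37]
frame_counts = [97, 143, 211, 34, 33, 8, 799]
selected = [inflate_allocation(n, N) for n, N in zip(allocation, frame_counts)]
print(selected, sum(selected))

# Hypothetical patient counts for a five-grantee stratum with n_h = 2.
probs = grantee_selection_probs([5000, 3000, 1000, 600, 400], n_h=2)
print([round(p, 3) for p in probs])
```

Within a stratum the probabilities sum to n_h, so the size measure only redistributes a fixed expected sample size toward larger grantees.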
We are aware that applying different sampling rates for each stratum and oversampling at
the second stage and the third stage will cause deviations from a self-weighting design. As a
result, the variations in sample weights will be increased and variances in survey estimates
will be inflated, thereby reducing precision or statistical power in data analysis. To maintain
a near self-weighting design within each stratum, we will select sites within grantees using
PPS sampling at the second stage of selection and select the same number of patients per
grantee in the third stage.


5. Site Sample Selection
As discussed previously, more than 75% of grantees have three or more sites. In general,
grantees with more sites tend to have more patients. At the first stage of selection,
grantees are selected with the PPS method, which means that grantees with large numbers
of patients have a higher probability of being selected in the sample. As a result, we expect
a fair number of recruited grantees to have more than three sites. We will spread the
sample of patients across multiple sites to reduce the within-grantee clustering effect and
increase the precision of the analysis. We will select up to three sites for each funding
program within a grantee for the 2019 HCPS. When all sites for one funding program in a
grantee have low patient volumes, more than three sites could be selected. This section
discusses the second stage of selection: the selection of sites from participating grantees
that have multiple sites.

5.1 Determine Eligible Sites within Participating Grantees

Once a grantee is recruited and agrees to conduct the study in its sites, our recruiters will
work with the grantee’s administration to identify eligible sites. The following eligibility
criteria will be used, and we will consult with the BPHC Contracting Officer Representative
(COR) to determine the site eligibility on a case-by-case basis whenever it is necessary.

▪ The site should participate in at least one of the four specific funding programs and
  must have been operating under the grantee for at least 1 year.
▪ The site is not a school-based health center.
▪ The site is not a specialized clinic, except clinics providing OB/GYN services.
▪ The site does not provide services only through the migrant and seasonal
  farmworker voucher screening program.
▪ A site serves at least 100 patients.

After eligible sites are identified, we will collect from or verify with each participating
grantee the following information:

▪ number of eligible sites serving each patient type (i.e., migrant and seasonal
  farmworkers, homeless, public housing, and general patients);
▪ address and contact information for each eligible site;
▪ number of patients served in each eligible site, overall and by type of patient (CHC,
  MHC, HCH, and PHPC); and
▪ sites with patients concentrated in one of the three race/ethnicity categories (AIAN,
  Asian, or NHPI), patients aged 65 and older, or veteran patients.


5.2 Evaluate Distances between Eligible Sites

In most cases, one FI will be hired to collect data for each participating grantee. Therefore,
selected sites must be within manageable distances for the FI(s). The grantees tend to
operate sites in relatively localized areas. Our sampling staff will evaluate distances between
the administrative office/central site and the associated sites. For a specific funding
program, the site with the largest patient volume could be used as the central site.
Typically, sites will be excluded if they are located more than 100 miles from the central
site. However, we will consult with the BPHC COR to determine whether special data
collection arrangements should be made for remote sites.

5.3 Oversampling Sites with Patients Concentrated in Oversampling Subgroups

To achieve our target sample sizes of AIAN, Asian, and NHPI patients, we will not only
oversample grantees with patients concentrated in these three race/ethnicity groups at the
first stage of selection, but we will also identify sites with patients concentrated in at least
one of the three targeted race/ethnicity categories. Sites with patients concentrated in the
65 and older group or veteran patients will also be identified. These sites will be selected
with higher probabilities than sites without patients concentrated in these categories.

5.4 Site Selection and Selection Probability

If there are three or fewer sites for a patient type (i.e., migrant and seasonal farmworkers,
homeless, public housing, and general patients) and they are within a manageable distance
for one FI, all the sites will be included in the study. If one site is far from the other sites
and the other sites are close to one another, the two sites that are close to each other will
be selected. However, if all three sites are far from one another, we will select the site with
the largest patient volume. Similarly, when two sites for a specific funding program are far
from each other, the one with the largest number of patients will be selected. Again, these
special cases will be reviewed with the COR.
For grantees with more than three sites for a patient type, we will use a PPS sampling
method similar to the one for grantees discussed in Section 4.4 to select three sites from
the sites within a manageable distance. The number of patients served by each site under a
specific funding program will serve as the size measure in the PPS sampling. For the
grantees that participate in multiple funding programs, an independent PPS selection of
sites will be conducted for each funding program, if needed. When sites within a grantee
have low patient volume for a funding program, we may allow selecting more than three
sites so that it is easier to meet the patient interview quota for that grantee.


The selection probability for the jth site within the ith grantee for funding program f is given
by

\[
C_{fij} =
\begin{cases}
1, & \text{if 3 or fewer sites are all selected, or} \\[4pt]
\dfrac{3\, s_{fij}}{\sum_{j} s_{fij}}, & \text{if 3 sites are selected through PPS sampling,}
\end{cases}
\tag{2}
\]

where sfij is the number of patients in site j within grantee i for funding program f. Based on
our experience with the 2014 HCPS, we expect nearly all selected sites within participating
grantees to participate in the 2019 HCPS.
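A small Python sketch of Formula (2) follows, assuming the list passed in already contains only the eligible sites within a manageable distance for one funding program. The function name and the site sizes are hypothetical, and the sketch does not handle the special cases (distant sites, low-volume sites) that are reviewed with the COR.

```python
def site_selection_probs(site_sizes, max_sites=3):
    """Formula (2): probability 1 for every site when max_sites or fewer
    sites exist; otherwise max_sites * s_fij / sum_j s_fij under PPS."""
    if len(site_sizes) <= max_sites:
        return [1.0] * len(site_sizes)
    total = sum(site_sizes)
    # A value above 1 would mean one site dominates the grantee; the formula
    # as written does not cover that case, so it is not handled here.
    return [max_sites * s / total for s in site_sizes]

few_sites = site_selection_probs([900, 300])                 # all sites taken
many_sites = site_selection_probs([500, 400, 400, 400, 300]) # PPS case
print(few_sites)
print(many_sites)
```

In the PPS case the probabilities sum to the number of sites selected, mirroring the grantee-level Formula (1).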


6. Patient Sample Selection
Because some of the target populations of this study are quite mobile, a random sample of
patients will be selected for interview as they enter the site and register with the
receptionist for services. An FI will visit a selected site for a predetermined number of days
and time slots in the sampling period to conduct interviews. This section of the report
presents the methodology and specifications for selecting patients from participating sites.

6.1 Patient Interview Allocation to Grantee

To achieve the near self-weighting sample of patient interviews within each grantee
stratum, the same number of patients will be interviewed from the grantees in each funding
program. As shown in Table 4-9 in Section 4.5 from the illustrative grantee sample
example, 206 CHC grantees, 57 MHC grantees, 61 HCH grantees, and 45 PHPC grantees are
to be recruited. To achieve 5,100 completed interviews for CHC, we will need to complete
24–25 patient interviews per CHC grantee. We will need 25–26 completed interviews per
MHC grantee to achieve 1,480 interviews for MHC; 27–28 completed patient interviews per
HCH grantee to yield a total of 1,660 interviews for HCH (1,660/61 ≈ 27.2, as in Table 4-9);
and 16–17 completed interviews per PHPC grantee to yield a total of 760 interviews for PHPC.
We may increase the patient interview quota for some grantees with patients concentrated
in the oversampling race/ethnicity categories to achieve the target sample sizes if
necessary.
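The per-grantee quotas follow from dividing each program's interview target by its expected number of grantees. The Python sketch below works through that arithmetic using the targets and grantee counts from Table 4-9; splitting the remainder so that some grantees receive one extra interview is an assumption of this sketch, not a rule stated in the plan.

```python
def quota_split(target, n_grantees):
    """Return (low, n_low, high, n_high): whole-number quotas such that
    low * n_low + high * n_high == target."""
    low, extra = divmod(target, n_grantees)
    return low, n_grantees - extra, low + 1, extra

targets = {"CHC": 5100, "MHC": 1480, "HCH": 1660, "PHPC": 760}
grantees = {"CHC": 206, "MHC": 57, "HCH": 61, "PHPC": 45}

splits = {}
for program, target in targets.items():
    splits[program] = quota_split(target, grantees[program])
    average = target / grantees[program]
    low, n_low, high, n_high = splits[program]
    print(f"{program}: average {average:.1f}; "
          f"{n_low} grantees at {low}, {n_high} grantees at {high}")
```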

6.2 Patient Interview Allocation to Sites within Grantee

Within each grantee, we will use different methods to allocate patient interviews to multiple
sites for grantees with three or fewer sites in a funding program and grantees with more
than three sites in a funding program. For grantees with three or fewer sites, the number of
patient interviews within that grantee will be allocated proportionally to the patient size of
the sites. That is,

\[
n_{fij} = n_{fi} \, \frac{s_{fij}}{\sum_{j} s_{fij}},
\]

where nfi is the number of patients selected from a grantee for funding program f. For
grantees with more than three sites that are selected through PPS, the number of selected
patients will be divided equally among three selected sites. Doing so will help to reduce the
UWE.
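The two allocation rules can be sketched in Python as follows. The function name and the site sizes are hypothetical, and the sketch returns fractional targets because the plan does not say how they are rounded to whole interviews.

```python
def allocate_interviews(n_fi, site_sizes, pps_selected):
    """Split n_fi interviews across a grantee's selected sites.
    pps_selected=False: proportional to site size (three or fewer sites).
    pps_selected=True: equal split across the PPS-selected sites."""
    if pps_selected:
        return [n_fi / len(site_sizes)] * len(site_sizes)
    total = sum(site_sizes)
    return [n_fi * s / total for s in site_sizes]

# Three or fewer sites, all taken with certainty: proportional allocation.
proportional = allocate_interviews(25, [1200, 600, 200], pps_selected=False)

# More than three sites on the frame, three drawn by PPS: equal allocation.
frame_sizes = [500, 400, 400, 400, 300]
drawn = [500, 400, 300]
equal = allocate_interviews(25, drawn, pps_selected=True)

# Site probability (3 * s / total) times within-site rate (interviews / s).
total = sum(frame_sizes)
rates = [(3 * s / total) * (n / s) for s, n in zip(drawn, equal)]
print(proportional, equal, rates)
```

Ignoring the time-window factor, both rules give every patient in the grantee the same overall chance of selection, n_fi divided by the grantee's total patient count: a certainty site contributes 1 times (n_fi s / Σs) / s, and a PPS site contributes (3s / Σs) times (n_fi / 3) / s. That equality is why the equal split after PPS selection helps keep the UWE low.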

6.3 Patient Screening and Selection

RTI will design a screening sheet that the receptionist can use to screen and select patients
when a patient enters the site and registers for service. A patient will be considered eligible
if the patient received service through one of the grantees supported by BPHC funding
programs at least once in the past 12 months prior to the current visit. The receptionist will
ask eligible patients questions about their race/ethnicity, age, and veteran status to
determine whether they belong to the oversampling subgroups. If a patient belongs to a
subgroup that will not be oversampled, the receptionist will refer the first eligible patient
registered after the FI has informed the receptionist that he/she is ready for the next
interview. We are considering developing a computer-based system to track patient
eligibility and referral status wherever feasible. If a patient belongs to one of the
oversampling subgroups, the receptionist will always refer the patient. The receptionist will
first read a brief script about the study to the referred patient and direct the patient to the
FI for questions or participation. Table 6-1 shows the oversampling and non-oversampling
subgroups.
Table 6-1. Oversampling and Non-Oversampling Patient Subgroups

| Patient Subgroup | Patients Aged 64 and Younger | Patients Aged 65 and Older |
|---|---|---|
| Race/Ethnicity | | |
| AIAN | Yes [a] | Yes |
| Asian | Yes | Yes |
| NHPI | Yes | Yes |
| Races Other Than AIAN, Asian, and NHPI | No [b] | Yes |
| Veteran Status | | |
| Veteran | Yes | Yes |
| Non-Veteran | No | Yes |

[a] Yes – oversampling.
[b] No – non-oversampling.

The receptionist will be asked to keep track of the number of patients who enter the site,
the number of patients who are eligible, and number of patients selected while the FI is at
the site to conduct data collection for each patient subgroup, as shown in Table 6-1. The
receptionist will either use tally marks to count patients as they enter or complete a table
based on the sign-in sheet or appointment list before the FI leaves the site. The patient
count sheets for each FI data collection visit will be sent to RTI for data entry, and counts
will be used to calculate the analysis weights for the study. For sites that have more than
one receptionist, all receptionists must track number of visited, eligible, and selected
patients even though we may only recruit patients using one receptionist. As mentioned
above, if a computer-based system is developed, it will be used to replace this process in
capturing patient eligibility and referral information.
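If a computer-based tracking system were built, the referral rule and the counts described in this section could be encoded along the following lines. This is only a sketch: the function names, race labels, and patient records are hypothetical, and the counts are kept in aggregate here rather than by patient subgroup.

```python
from collections import Counter

OVERSAMPLED_RACES = {"AIAN", "Asian", "NHPI"}

def is_oversampled(race, age, veteran):
    """Table 6-1: a patient is oversampled if in a target race/ethnicity
    group, aged 65 or older, or a veteran."""
    return race in OVERSAMPLED_RACES or age >= 65 or veteran

def should_refer(eligible, oversampled, fi_ready):
    """Eligible oversampled patients are always referred; other eligible
    patients are referred only when the FI is ready for the next interview."""
    if not eligible:
        return False
    return oversampled or fi_ready

# Hypothetical arrivals: (race, age, veteran, eligible).
arrivals = [
    ("Asian", 40, False, True),
    ("White", 30, False, True),
    ("White", 70, False, True),
    ("Black", 50, True, False),
]

tally = Counter()
fi_ready = False  # the FI is busy for this whole illustrative window
for race, age, veteran, eligible in arrivals:
    tally["entered"] += 1
    tally["eligible"] += int(eligible)
    if should_refer(eligible, is_oversampled(race, age, veteran), fi_ready):
        tally["selected"] += 1
print(dict(tally))
```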


If a site is chosen for data collection in multiple funding programs, the FI will screen
participating patients to determine patient population type (i.e., homeless, migrant and
seasonal farmworkers, public housing, or low income) and will use the appropriate
questionnaire to conduct the patient interview.
We will closely monitor the data collection and adjust the sampling rate if necessary to
ensure that target sample sizes in the three race/ethnicity categories and for patients aged 65
and older are met, and that the sample size for veteran patients is reasonably increased.

6.4 Patient Selection Probability

The selection probability of patient k from grantee i, site j for funding program f is given by

\[
P_{fijk} = \frac{m_{fij}}{M_{fij}} \cdot \frac{\text{weeks}}{52}, \tag{3}
\]

where Mfij is the number of eligible patients in the site during the sampling window (number
of weeks) and where mfij is the target number of selected patients inflated for nonresponse.
We may have to estimate the proportion of patients from different funding programs if the
site is selected in data collection for more than one funding program. The proportion of
patients from different funding programs for the grantee or other sites within the grantee
can be used as an approximation. Note that the patient selection probability will be
calculated separately for each patient group as shown in Table 6-1.

6.5 Patient’s Probability of Inclusion in the Study

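The probability of a patient being included in the study is the product of Ghi, Cfij, and Pfijk in
Formulas (1), (2), and (3), respectively. That is, when three sites are selected through PPS sampling,

\[
\pi_{hfijk} = \frac{n_h S_{hi}}{\sum_{i} S_{hi}} \cdot \frac{3\, s_{fij}}{\sum_{j} s_{fij}} \cdot \frac{m_{fij}}{M_{fij}} \cdot \frac{\text{weeks}}{52}. \tag{4}
\]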

The design is supposed to achieve near self-weighting within each grantee stratum if no
oversampling is conducted when selecting sites at the second-stage selection, and no
oversampling of patients is conducted at the third-stage selection. The oversampling at the
second and third stages causes the deviation from a near self-weighting design, meaning
probabilities in Formula (4) will not be equal within the same grantee stratum. As a result,
the UWE will be inflated.
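The chain of probabilities in Formulas (1) through (4) can be traced with a short Python sketch. All input values are hypothetical. The final step, taking the inverse of the inclusion probability as a base weight, is the standard design-weight convention and is an assumption here; this section does not describe the weighting procedure.

```python
def patient_selection_prob(m_fij, M_fij, weeks):
    """Formula (3): (m_fij / M_fij) * (weeks / 52)."""
    return (m_fij / M_fij) * (weeks / 52)

def inclusion_prob(G_hi, C_fij, P_fijk):
    """Formula (4): product of the grantee, site, and patient probabilities."""
    return G_hi * C_fij * P_fijk

# Hypothetical stage values for one patient.
G_hi = 0.6                                  # grantee, from Formula (1)
C_fij = 0.75                                # site, from Formula (2)
P_fijk = patient_selection_prob(m_fij=10, M_fij=400, weeks=4)

pi = inclusion_prob(G_hi, C_fij, P_fijk)
base_weight = 1 / pi                        # assumed design-weight convention
print(P_fijk, pi, base_weight)
```

Patients whose stage probabilities multiply to different values receive different weights, which is how oversampling at the second and third stages feeds into the UWE.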


7. Sample Sizes and Statistical Power
Statistical tests use data from samples to determine whether a difference exists in a
population or between two populations. An example of a statistical test is testing the null
hypothesis that the proportion of patients with serious mental illness does not differ between
the population of the 2019 HCPS and the general population for the 2016 National Health
Interview Survey (NHIS). The power of the test is the probability that the test will find a
statistically significant difference between two populations given that there is a true
difference between those two populations. There is always a chance that the samples will
appear to show a difference when none exists; that risk is quantified as the statistical
significance level. We use a significance level of 0.05 to calculate statistical power
in this document.
To reduce data collection costs and meet the target sample sizes for four funding programs
and for race/ethnicity and age groups, we propose a stratified three-stage clustering design
and oversampling of certain subgroups. Large variations in sample weights caused by
oversampling and the intra-class correlation among patients from the same grantee because
of clustering can increase sampling error, thereby reducing statistical power and precision of
survey estimates. The design effect (Deff) can be used to measure the loss of precision and
statistical power caused by oversampling and clustering. Deff is a function of the clustering
effect and the UWE and can be defined as Deff = UWE * (1 + (m−1) * ICC), where m is the
number of patient interviews within a grantee, ICC is the intracluster correlation coefficient
that measures the degree of similarity among elements within a cluster, and UWE measures
variation in the sample weight. Deff can be reduced by reducing either UWE or the
clustering effect or both. The effective sample size is the target sample size divided by Deff.
Table 7-1 displays the power calculation for proportion estimates between the 2019 HCPS
and 2016 NHIS, showing the minimum differences that can be detected with 80% statistical
power at the 0.05 significance level for various domains. In the calculation, we used a
proportion of 0.5 (p = 0.5); the statistical power is the smallest for proportion estimates
when the proportion is in the middle range (0.4–0.6) because the variance is the largest. The
detectable differences will be smaller if the proportion estimate is outside the middle range.
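The pieces of this calculation can be sketched in Python. The Deff formula is the one given above. The UWE function assumes the common Kish form, n Σw² / (Σw)², which this section does not spell out. The detectable-difference function uses a textbook normal approximation for two independent proportions with a common p; it is not confirmed to be the method used for the tables, so its results should be expected to land near the tabulated values rather than reproduce every row. The example weights, m, and ICC are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def kish_uwe(weights):
    """Unequal weighting effect, assuming Kish's form n * sum(w^2) / sum(w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

def design_effect(uwe, m, icc):
    """Deff = UWE * (1 + (m - 1) * ICC), as defined in the text."""
    return uwe * (1 + (m - 1) * icc)

def detectable_difference(n1, deff1, n2, deff2, p=0.5, alpha=0.05, power=0.80):
    """Minimum detectable difference between two proportions
    (normal approximation, two-sided test, common proportion p)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    eff1, eff2 = n1 / deff1, n2 / deff2     # effective sample sizes
    return z * sqrt(p * (1 - p) * (1 / eff1 + 1 / eff2))

uwe = kish_uwe([1, 1, 2, 4])                # hypothetical weights
deff = design_effect(uwe=2.0, m=25, icc=0.04)

# Overall row inputs: HCPS n = 9,000 with Deff 4.0; NHIS n = 44,135 with Deff 2.0.
d = detectable_difference(9000, 4.0, 44135, 2.0)
print(f"UWE {uwe:.3f}, Deff {deff:.2f}, detectable difference {100 * d:.1f}%")
```

With the Overall row inputs the approximation gives about 3.1 percentage points.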


Table 7-1. Detecting Differences in Percentage Estimates between the 2019 HCPS and the 2016 NHIS (Full Sample)

| Domain | HCPS Expected Sample Size | HCPS Estimated Deff [b] | HCPS Effective Sample Size | NHIS [a] Sample Size | NHIS Estimated Deff | NHIS Effective Sample Size | Detectable Difference [c] % |
|---|---|---|---|---|---|---|---|
| Overall | 9,000 | 4.0 | 2,250 | 44,135 | 2.0 | 22,068 | 3.1 |
| Age Group | | | | | | | |
| 17 and younger | 2,130 | 4.0 | 533 | 11,107 | 2.0 | 5,554 | 6.4 |
| 18-64 | 5,770 | 4.0 | 1,443 | 24,126 | 2.0 | 12,063 | 3.9 |
| 65 and older | 1,100 | 4.0 | 275 | 8,902 | 2.0 | 4,451 | 8.6 |
| Race/Ethnicity | | | | | | | |
| Hispanic | 3,170 | 4.0 | 793 | 6,212 | 2.0 | 3,106 | 5.5 |
| NH-White | 2,250 | 4.0 | 563 | 29,209 | 2.0 | 14,605 | 6.0 |
| NH-Black | 1,920 | 4.0 | 480 | 4,830 | 2.0 | 2,415 | 6.8 |
| NH-Asian | 650 | 4.0 | 163 | 2,180 | 2.0 | 1,090 | 11.5 |
| NH-AIAN | 670 | 4.0 | 168 | 424 | 2.0 | 212 | 14.2 |
| NH-NHPI [d] | 200 | 4.0 | 50 | — | 2.0 | — | — |
| Others [d] | 140 | 4.0 | 35 | 1,280 | 2.0 | 640 | — |
| Health and Chronic Conditions | | | | | | | |
| Serious Mental Illness | 1,003 | 4.0 | 251 | 914 | 2.0 | 457 | 10.9 |
| Tobacco Use | 2,312 | 4.0 | 578 | 5,340 | 2.0 | 2,670 | 6.4 |
| Substance Use [e] | 1,190 | 4.0 | 298 | 5,837 | 2.0 | 2,919 | 8.4 |
| Adult Obesity (18 and older) | 3,880 | 4.0 | 970 | 21,889 | 2.0 | 10,945 | 4.6 |
| Child Obesity (17 and younger) [f] | 575 | 4.0 | 144 | 2,996 | 2.0 | 1,498 | 12.0 |
| Diabetes | 1,648 | 4.0 | 412 | 3,540 | 2.0 | 1,770 | 7.6 |
| Hypertension | 3,299 | 4.0 | 825 | 11,664 | 2.0 | 5,832 | 5.2 |
| Cardiovascular Disease | 812 | 4.0 | 203 | 3,358 | 2.0 | 1,679 | 10.2 |

[a] Based on the 2016 NHIS full sample.
[b] Deff (design effect) measures the loss of efficiency resulting from the use of cluster sampling instead of simple random sampling.
[c] Difference in percentage estimates will be detected with 80% power at the 0.05 level of significance.
[d] Projected sample size was too small for detecting differences with acceptable power.
[e] Excluding tobacco and alcohol use. The NHIS sample size was estimated using the same substance use prevalence rate as in the 2014 HCPS.
[f] Defined as obesity when BMI>=25. The NHIS sample size was estimated using the same child obesity prevalence rate as in the 2014 HCPS.


Table 7-2. Detecting Differences in Percentage Estimates between the 2019 HCPS and the 2016 NHIS (Subsample Who Had <200% FPL)

| Domain | HCPS Expected Sample Size | HCPS Estimated Deff [b] | HCPS Effective Sample Size | NHIS [a] Sample Size | NHIS Estimated Deff | NHIS Effective Sample Size | Detectable Difference [c] % |
|---|---|---|---|---|---|---|---|
| Overall | 9,000 | 4.0 | 2,250 | 14,506 | 2.0 | 7,253 | 3.4 |
| Age Group | | | | | | | |
| 17 and younger | 2,130 | 4.0 | 533 | 4,063 | 2.0 | 2,032 | 6.8 |
| 18-64 | 5,770 | 4.0 | 1,443 | 7,846 | 2.0 | 3,923 | 4.3 |
| 65 and older | 1,100 | 4.0 | 275 | 2,597 | 2.0 | 1,299 | 9.2 |
| Race/Ethnicity | | | | | | | |
| Hispanic | 3,170 | 4.0 | 793 | 3,162 | 2.0 | 1,581 | 6.1 |
| NH-White | 2,250 | 4.0 | 563 | 7,525 | 2.0 | 3,763 | 6.3 |
| NH-Black | 1,920 | 4.0 | 480 | 2,476 | 2.0 | 1,238 | 7.5 |
| NH-Asian | 650 | 4.0 | 163 | 596 | 2.0 | 298 | 13.5 |
| NH-AIAN | 670 | 4.0 | 168 | 228 | 2.0 | 114 | 17.5 |
| NH-NHPI [d] | 200 | 4.0 | 50 | — | 2.0 | — | — |
| Others [d] | 140 | 4.0 | 35 | 519 | 2.0 | 260 | — |
| Health and Chronic Conditions | | | | | | | |
| Serious Mental Illness | 1,003 | 4.0 | 251 | 544 | 2.0 | 272 | 12.1 |
| Tobacco Use | 2,312 | 4.0 | 578 | 2,476 | 2.0 | 1,238 | 7.0 |
| Substance Use [e] | 1,190 | 4.0 | 298 | 1,918 | 2.0 | 959 | 9.2 |
| Adult Obesity (18 and older) | 3,880 | 4.0 | 970 | 6,834 | 2.0 | 3,417 | 5.1 |
| Child Obesity (17 and younger) [f] | 575 | 4.0 | 144 | 1,097 | 2.0 | 549 | 12.9 |
| Diabetes | 1,648 | 4.0 | 412 | 1,406 | 2.0 | 703 | 8.6 |
| Hypertension | 3,299 | 4.0 | 825 | 3,907 | 2.0 | 1,954 | 5.8 |
| Cardiovascular Disease | 812 | 4.0 | 203 | 1,372 | 2.0 | 686 | 11.0 |

[a] Based on the 2016 NHIS who had less than 200% FPL.
[b] Deff (design effect) measures the loss of efficiency resulting from the use of cluster sampling instead of simple random sampling.
[c] Difference in percentage estimates will be detected with 80% power at the 0.05 level of significance.
[d] Projected sample size was too small for detecting differences with acceptable power.
[e] Excluding tobacco and alcohol use. The NHIS sample size was estimated using the same substance use prevalence rate as in the 2014 HCPS.
[f] Defined as obesity when BMI>=25. The NHIS sample size was estimated using the same child obesity prevalence rate as in the 2014 HCPS.


Table 7-3. Detecting Differences in Percentage Estimates between the 2019 HCPS and the 2016 NHIS (Subsample Who Visited Clinics or Health Centers)

| Domain | HCPS Expected Sample Size | HCPS Estimated Deff [b] | HCPS Effective Sample Size | NHIS [a] Sample Size | NHIS Estimated Deff | NHIS Effective Sample Size | Detectable Difference [c] % |
|---|---|---|---|---|---|---|---|
| Overall | 9,000 | 4.0 | 2,250 | 10,171 | 2.0 | 5,086 | 3.5 |
| Age Group | | | | | | | |
| 17 and younger | 2,130 | 4.0 | 533 | 2,736 | 2.0 | 1,368 | 7.1 |
| 18-64 | 5,770 | 4.0 | 1,443 | 5,641 | 2.0 | 2,821 | 4.5 |
| 65 and older | 1,100 | 4.0 | 275 | 1,794 | 2.0 | 897 | 9.5 |
| Race/Ethnicity | | | | | | | |
| Hispanic | 3,170 | 4.0 | 793 | 1,995 | 2.0 | 998 | 6.6 |
| NH-White | 2,250 | 4.0 | 563 | 6,091 | 2.0 | 3,046 | 6.4 |
| NH-Black | 1,920 | 4.0 | 480 | 1,059 | 2.0 | 530 | 8.7 |
| NH-Asian | 650 | 4.0 | 163 | 441 | 2.0 | 221 | 14.2 |
| NH-AIAN | 670 | 4.0 | 168 | 263 | 2.0 | 132 | 16.0 |
| NH-NHPI [d] | 200 | 4.0 | 50 | — | 2.0 | — | — |
| Others [d] | 140 | 4.0 | 35 | 322 | 2.0 | 161 | — |
| Health and Chronic Conditions | | | | | | | |
| Serious Mental Illness | 1,003 | 4.0 | 251 | 248 | 2.0 | 124 | 15.1 |
| Tobacco Use | 2,312 | 4.0 | 578 | 1,286 | 2.0 | 643 | 7.9 |
| Substance Use [e] | 1,190 | 4.0 | 298 | 1,345 | 2.0 | 673 | 9.6 |
| Adult Obesity (18 and older) | 3,880 | 4.0 | 970 | 5,002 | 2.0 | 2,501 | 5.3 |
| Child Obesity (17 and younger) [f] | 575 | 4.0 | 144 | 739 | 2.0 | 370 | 13.5 |
| Diabetes | 1,648 | 4.0 | 412 | 856 | 2.0 | 428 | 9.6 |
| Hypertension | 3,299 | 4.0 | 825 | 2,653 | 2.0 | 1,327 | 6.2 |
| Cardiovascular Disease | 812 | 4.0 | 203 | 742 | 2.0 | 371 | 12.1 |

[a] Based on the 2016 NHIS who answered ‘Clinic or health center’ to the question ‘What kind of place do you go to most often.’
[b] Deff (design effect) measures the loss of efficiency resulting from the use of cluster sampling instead of simple random sampling.
[c] Difference in percentage estimates will be detected with 80% power at the 0.05 level of significance.
[d] Projected sample size was too small for detecting differences with acceptable power.
[e] Excluding tobacco and alcohol use. The NHIS sample size was estimated using the same substance use prevalence rate as in the 2014 HCPS.
[f] Defined as obesity when BMI>=25. The NHIS sample size was estimated using the same child obesity prevalence rate as in the 2014 HCPS.


The power analysis estimates in Table 7-1 show that the detectable differences between the
2019 HCPS and the 2016 NHIS are well below 8% for the age group, race/ethnicity, and
health and chronic condition domains, except for Non-Hispanic Asian, Non-Hispanic
American Indian/Alaska Native, serious mental illness, substance use, child obesity, and
cardiovascular disease. Tables 7-2 and 7-3 show the detectable differences between the 2019
HCPS and two subsamples of the 2016 NHIS. Table 7-2 includes 2016 NHIS respondents with
incomes below 200% of the FPL, and Table 7-3 includes 2016 NHIS respondents who answered
‘Clinic or Health Center’ to the question ‘What kind of place do you go to most often.’
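The arithmetic behind the effective sample sizes and detectable differences can be sketched as follows. This is an illustration only: the report does not state the exact power formula it used, so the code assumes a two-sided z-test for two independent proportions, both taken to be near 0.5 (the most conservative case). The function names are hypothetical. Under these assumptions the Overall row of Table 7-3 comes out at roughly 3.5 percentage points; other rows may differ from the tabled values by a few tenths of a point.

```python
from math import sqrt
from statistics import NormalDist


def effective_n(n: float, deff: float) -> float:
    """Effective sample size: nominal sample size divided by the design effect."""
    return n / deff


def detectable_difference(n1_eff: float, n2_eff: float, p: float = 0.5,
                          alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest detectable difference, in percentage points, between two
    independent proportions (assumed normal approximation, both near p)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    se = sqrt(p * (1 - p) * (1 / n1_eff + 1 / n2_eff))
    return 100 * z * se


# Overall row of Table 7-3: HCPS n = 9,000 (Deff 4.0), NHIS n = 10,171 (Deff 2.0).
hcps_eff = effective_n(9000, 4.0)
nhis_eff = effective_n(10171, 2.0)
overall_dd = detectable_difference(hcps_eff, nhis_eff)
print(hcps_eff, nhis_eff, round(overall_dd, 1))
```

Because the design effect divides the nominal sample size, the HCPS Deff of 4.0 means 9,000 interviews carry about the same information as 2,250 interviews from a simple random sample.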


8. Sample Weights
Patients, the primary analytic units for the 2019 HCPS, are selected through a three-stage
sample design, as discussed in Sections 4–6. Disproportionate sample selection is used at
all three stages; therefore, the patient samples are not self-weighting. To make inferences
about the target population or any subdomains of the target population, sample weights are
needed. We will calculate a base weight for each respondent that reflects the respondent’s
probability of inclusion in the study. To account for nonresponse, we will apply a nonresponse
adjustment to the base weights. A poststratification adjustment will also be applied to
adjust for coverage bias and reduce variance.

8.1 Grantee Sample Selection Weights

The first-stage sampling weight for each grantee will be the inverse of the probability of
selection as calculated in Formula (1) in Section 4.6. Therefore, the grantee sample
selection weight for grantee i within the hth stratum is given by

$$w^{(1)}_{hi} = 1 / G_{hi}. \tag{6}$$

8.2 Site Sample Selection Weights

For grantees that have more than three sites for a specific funding program, a
subsample of three sites will be selected, as discussed in Section 5.4. Thus, the site sample
selection weight for the jth site within the ith grantee for funding program f is given by

$$w^{(2)}_{fij} = 1 / C_{fij}, \tag{7}$$

where $C_{fij}$ is calculated in Formula (2).

8.3 Patient Sample Selection Weights

From the patient recruitment logs, the number of eligible patients, the number of patients
who were selected by a receptionist and sent to an FI, and the number of patients who
agreed to participate during the patient recruitment time periods will be determined. The
number of patients selected at each site for a specific funding program within a participating
grantee, summed across the days in which the sampling for that site took place, will be
divided by the total number of patients the site served in the year prior to the survey year,
to obtain the probability of selection for each patient as discussed in Section 6.4. Thus, the
patient sample selection weight for the kth patient at the jth site within the ith grantee for
funding program f is given by

$$w^{(3)}_{fijk} = 1 / P_{fijk}, \tag{8}$$

where $P_{fijk}$ is calculated in Formula (3).
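As a rough illustration of Formulas (6) through (8), the sketch below inverts three selection probabilities and multiplies the resulting stage weights to obtain one patient's overall design weight. The class name, function names, and the probabilities themselves are hypothetical and chosen only to keep the arithmetic easy to follow; they are not values from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionProbabilities:
    """Hypothetical container for one patient's stage-level probabilities."""
    grantee: float  # G_hi, grantee selection probability
    site: float     # C_fij, site selection probability (1.0 if every site is taken)
    patient: float  # P_fijk, patient selection probability


def stage_weights(p: SelectionProbabilities) -> tuple[float, float, float]:
    """Return (w1, w2, w3): the inverse of each stage's selection probability."""
    for name in ("grantee", "site", "patient"):
        value = getattr(p, name)
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} probability must be in (0, 1], got {value}")
    return 1.0 / p.grantee, 1.0 / p.site, 1.0 / p.patient


def design_weight(p: SelectionProbabilities) -> float:
    """Overall design weight: the product of the three stage weights."""
    w1, w2, w3 = stage_weights(p)
    return w1 * w2 * w3


# Made-up probabilities: 1-in-4 grantees, 1-in-2 sites, 1-in-50 patients.
example = SelectionProbabilities(grantee=0.25, site=0.5, patient=0.02)
example_weight = design_weight(example)
print(stage_weights(example), example_weight)
```

With these invented inputs, the patient stands for roughly 400 patients in the target population, before any nonresponse or poststratification adjustment.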

The product of three weight components discussed above forms the design-based weights
for each patient. That is,

$$w_{fijk} = w^{(1)}_{hi} \times w^{(2)}_{fij} \times w^{(3)}_{fijk}. \tag{9}$$

8.4 Nonresponse and Poststratification Weight Adjustments

To reduce nonresponse bias in the estimates, the design-based weight $w_{fijk}$ will be
adjusted for nonresponse. A nonresponse adjustment will be calculated separately for each
funding program. Because receptionists collect age and race information for both respondents
and nonrespondents, weighting classes will be formed by age group and race/ethnicity, and a
ratio adjustment will be calculated within each class. The adjustment within each class is
calculated as

$$Adj_{nr} = \sum_{s} w_{fijk} \Big/ \sum_{r} w_{fijk}, \tag{10}$$

where the sum over s covers all selected patients in the class and the sum over r covers the
respondents in the class.
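The weighting-class ratio in Formula (10) can be sketched in a few lines. The record layout (keys `program`, `cls`, `weight`, `responded`), the function name, and the sample records are assumptions made for illustration; the report does not prescribe a data structure.

```python
from collections import defaultdict


def nonresponse_adjustments(patients):
    """Return {(program, class): Adj_nr}, where Adj_nr is the sum of design
    weights over all selected patients in the class divided by the sum over
    its respondents. Classes with no respondents are skipped here; in practice
    they would need to be collapsed with a neighboring class."""
    selected = defaultdict(float)
    responded = defaultdict(float)
    for p in patients:
        key = (p["program"], p["cls"])
        selected[key] += p["weight"]
        if p["responded"]:
            responded[key] += p["weight"]
    return {k: selected[k] / responded[k] for k in selected if responded[k] > 0}


# Hypothetical selected patients in two weighting classes of one funding program.
sample = [
    {"program": "CHC", "cls": "18-64/Hispanic", "weight": 100.0, "responded": True},
    {"program": "CHC", "cls": "18-64/Hispanic", "weight": 150.0, "responded": True},
    {"program": "CHC", "cls": "18-64/Hispanic", "weight": 250.0, "responded": False},
    {"program": "CHC", "cls": "65+/NH-White", "weight": 200.0, "responded": True},
    {"program": "CHC", "cls": "65+/NH-White", "weight": 100.0, "responded": False},
]
adjustments = nonresponse_adjustments(sample)
print(adjustments)
```

Each respondent's weight is then multiplied by the factor for that respondent's class, so respondents carry the weight of the nonrespondents who share their age group and race/ethnicity.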
Poststratification is expected to reduce the coverage bias and variance of survey
estimates, and it will be implemented using RTI’s generalized exponential model (GEM;
Folsom & Singh, 2000). Coverage bias can occur when the composition of a sample does not
match that of the target population. For example, if young patients make up a larger share
of the sample than of the target population, estimates based on the sample may be biased if
young patients respond to survey questions differently from patients in other age groups.
Poststratification corrects this over-representation by adjusting the weights for young
patients downward. GEM can use more predictors in the model than conventional weighting
class methods. The predictors will be limited to data available from the UDS, including age,
race/ethnicity, gender, and poverty level. A separate poststratification adjustment will be
conducted for each funding program so that the sum of the final analysis weights from all
respondents in a funding program matches the total number of patients served by that
funding program. The poststratification adjustment factor is denoted $Adj_{ps}$.
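To make the idea concrete, the sketch below uses simple cell-based poststratification. This is a deliberately simplified stand-in and not the GEM procedure named above, which fits a model with several predictors at once. The function name, the two age cells, and all weights and control totals are hypothetical.

```python
def poststratification_factors(weights, cells, control_totals):
    """Return one Adj_ps factor per respondent: the control total for the
    respondent's cell divided by the sum of current weights in that cell."""
    cell_sums = {}
    for w, c in zip(weights, cells):
        cell_sums[c] = cell_sums.get(c, 0.0) + w
    return [control_totals[c] / cell_sums[c] for c in cells]


# Hypothetical nonresponse-adjusted weights for four respondents in one program.
weights = [400.0, 600.0, 500.0, 500.0]
cells = ["17 and younger", "17 and younger", "18 and older", "18 and older"]
# Hypothetical control totals (patients served), standing in for UDS counts.
control_totals = {"17 and younger": 800.0, "18 and older": 1500.0}

factors = poststratification_factors(weights, cells, control_totals)
final_weights = [w * f for w, f in zip(weights, factors)]
print(factors, sum(final_weights))
```

In this made-up case the younger cell is over-represented (weights sum to 1,000 against a control total of 800), so its factor is below 1 and those weights shrink, while the older cell is scaled up. The adjusted weights sum to the combined control total.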

The final analysis weights for 2019 HCPS are the product of the design-based weights and
two adjustment factors. That is,

$$ANALWT_{fijk} = w_{fijk} \times Adj_{nr} \times Adj_{ps}. \tag{11}$$

Table 8-1 describes the terms in the formulas from this section and from
Sections 4 through 6 and identifies the data source for each term.

Table 8-1. Description and Data Source of Terms in Formulas Calculating Sample Weights

| Formula | Term | Description | Data Source |
|---|---|---|---|
| $G_{hi} = n_h S_{hi} / \sum_i S_{hi}$ | $G_{hi}$ | Selection probability for the i th grantee within h th stratum | Output from PROC SURVEYSELECT in SAS |
| | $n_h$ | Prespecified number of grantees selected for the study in h th stratum | RTI calculates the sampling rates and allocates grantee samples into each stratum (see example in Table 4-8) |
| | $S_{hi}$ | Number of patients served in the year prior to the survey year in i th grantee within h th stratum | BPHC’s UDS |
| | $\sum_i S_{hi}$ | Total number of patients the grantees served in the year prior to the survey year in h th stratum | BPHC’s UDS |
| $C_{fij} = 1$, or $3 S_{fij} / \sum_j S_{fij}$ | $C_{fij}$ | Selection probability for j th site within i th grantee for funding program f; equal to 1 if three or fewer sites are selected, or is calculated if three sites are selected using PPS | Output from PROC SURVEYSELECT in SAS, or equals to 1 |
| | $S_{fij}$ | Number of patients served in the year prior to the survey year from j th site within i th grantee for funding program f | RTI recruiters collect this information from the grantee or site in recruiting process |
| | $\sum_j S_{fij}$ | Total number of patients served in the year prior to the survey year from all sites within i th grantee for funding program f | Sum of $S_{fij}$ within the grantee for a specific funding program |
| $P_{fijk} = (m_{fij} / M_{fij}) \times (\text{weeks} / 52)$ | $P_{fijk}$ | Selection probability of patient k from site j of grantee i for funding program f | Calculate from the formula |
| | $m_{fij}$ | Number of selected patients to yield $n_{fij}$ complete interviews from grantee i, site j for funding program f | FI keeps track of the number of selected patients sent by a receptionist for each funding program |
| | $M_{fij}$ | Number of patients entered in the site during the sampling window (number of weeks) | RTI collects data from receptionists’ tally sheets or computer-based system |
| $w^{(1)}_{hi} = 1 / G_{hi}$ | $w^{(1)}_{hi}$ | Design weight corresponding to grantee selection | Inverse of $G_{hi}$ |
| $w^{(2)}_{fij} = 1 / C_{fij}$ | $w^{(2)}_{fij}$ | Design weight corresponding to site selection | Inverse of $C_{fij}$ |
| $w^{(3)}_{fijk} = 1 / P_{fijk}$ | $w^{(3)}_{fijk}$ | Design weight corresponding to patient selection | Inverse of $P_{fijk}$ |
| $w_{fijk} = w^{(1)}_{hi} \times w^{(2)}_{fij} \times w^{(3)}_{fijk}$ | $w_{fijk}$ | Design weights for each selected patient | Product of three design-based weight components corresponding to three selection stages |
| $Adj_{nr} = \sum_s w_{fijk} / \sum_r w_{fijk}$ | $Adj_{nr}$ | A weighting class nonresponse adjustment | Calculate the nonresponse adjustment within each weighting class separately for each funding program |
| | $\sum_s w_{fijk}$ | Sum of the design weights of all selected patients for a specific funding program | Sum of $w_{fijk}$ of all selected patients within a weighting class |
| | $\sum_r w_{fijk}$ | Sum of the design weights of completed interviews for a specific funding program | Sum of $w_{fijk}$ of completed interviews within a weighting class |
| $Adj_{ps}$ | $Adj_{ps}$ | Poststratification adjustment done by each funding program; adjusts weights to BPHC’s UDS total number of patients for various demographic domains | Generalized Exponential Model developed at RTI; control totals are from BPHC’s UDS |
| $ANALWT_{fijk} = w_{fijk} \times Adj_{nr} \times Adj_{ps}$ | $ANALWT_{fijk}$ | Final analysis weight | Product of design weight, nonresponse, and poststratification adjustments |

9. Data Collection
9.1 Schedule

The 2019 HCPS data will be collected over 5.5 months, from mid-July to December 2019.
Typically, a work day will be divided into morning and afternoon time slots. We will send an FI
to a site on predetermined days and time slots. An FI will normally work in multiple sites
from one grantee or multiple grantees. We will determine the FI’s time slots for each site by
considering the production goal of a site, estimated patient volume in a site, the FI’s
working schedule, and the site’s operating schedule. The production goal, which is the
number of completed interviews, varies for each site; it can be as low as five or six
interviews when three sites are selected for a PHPC grantee (16–17 interviews for PHPC per
grantee) or it can be as high as 91–95 interviews when a site is the only site selected for
data collection for all four funding programs (although that scenario rarely happens).
Achieving the production goal at each site should not be difficult in a 5.5-month data
collection window. However, for some sites, because of unexpected low patient volume or
an unusual operating schedule, the production goal could potentially be missed. We will
closely watch the data collection process, and if a delay occurs, we will send an FI more
often to the site. We may have to reduce the production goal for a site and allocate more
interviews to other sites if meeting the production goal proves to be extremely difficult.

9.2 Costs

The three primary field costs are FI labor, mileage incurred by FIs, and incentives paid to
respondents. We estimate that we need 4.8 hours on average to obtain one interview for
the CHC. MHC, PHPC, and HCH patients will require 6.7 hours for interviews done in an
Asian language and 6.8 hours per interview for patients aged 65 and older. These hours
include time for driving to and from a facility, waiting to be approached by eligible patients,
screening potential participants, administering informed consent, administering an
interview, updating field status codes and completing other administrative paperwork,
shipping materials back to RTI, and participating in regular conference calls with the FI’s
field supervisor. We also assume that FIs will require reimbursement for an average of 40
miles per completed interview. Finally, we have budgeted for $25 in incentives for each
survey respondent.
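The mileage and incentive assumptions translate into study-level totals with simple multiplication. The sketch below assumes all 9,000 targeted interviews are completed and reports miles and incentive dollars only. FI labor is left out because it depends on the mix of interview types and on labor rates that are not given here. The function name is hypothetical.

```python
TARGET_INTERVIEWS = 9_000       # target number of completed interviews
MILES_PER_INTERVIEW = 40        # average reimbursable miles per completed interview
INCENTIVE_PER_RESPONDENT = 25   # incentive in dollars per survey respondent


def field_totals(interviews: int, miles_each: float, incentive_each: float) -> dict:
    """Total reimbursable miles and total incentive dollars for the study."""
    return {
        "miles": interviews * miles_each,
        "incentives_usd": interviews * incentive_each,
    }


totals = field_totals(TARGET_INTERVIEWS, MILES_PER_INTERVIEW, INCENTIVE_PER_RESPONDENT)
print(totals)
```

Under these assumptions the study would reimburse about 360,000 miles and pay about $225,000 in incentives. The mileage cost in dollars would additionally depend on the reimbursement rate per mile.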


10. Strengths and Limitations of Study Design
10.1 Strengths
The three-stage sample design will produce a nationally representative sample of grantees,
sites, and patients across the United States, across urban/rural locations, and across
various grantee sizes.
We will create seven grantee strata according to funding program(s) in which a grantee
participated and whether a grantee has patients concentrated in one of the three
race/ethnicity categories (AIAN, Asian, and NHPI). We will oversample grantees receiving
PHPC, MHC, and/or HCH funding and grantees with patients concentrated in one of three
race/ethnicity categories. The stratified disproportionate sample at the grantee selection
stage will yield a grantee sample with more grantees participating in PHPC, MHC, and/or
HCH funding programs and grantees with a large number of patients in three race/ethnicity
categories. These aspects of the design are key so that the target sample sizes for funding
programs and race/ethnicity groups can be met. The optimum grantee sample allocation
procedure reduces UWE. Independent site and patient samples will be selected for each
funding program if a grantee participated in multiple funding programs. This step reduces
data collection cost and increases sampling efficiency because of the large costs of
recruiting a grantee.
Oversampling sites with high concentrations of patients in one of the three race/ethnicity
categories, patients aged 65 and older, or veteran patients will further help achieve the
target sample sizes in the oversampled subgroups. Allocating interviews per funding
program in a grantee to up to three sites when possible will help to reduce the clustering
effect, thus reducing sampling error and improving precision on survey estimates. We will
allow selecting more than three sites for a funding program with low patient volume so that
the grantee patient interview quota can be met more easily.
We will oversample patients at the third selection stage for patients aged 65 and older,
patients in race/ethnicity categories (AIAN, Asian, and NHPI), and veteran patients. We will
closely monitor the data collection on a weekly basis, and adjust the sampling rates and
frequency of an FI on a site to ensure target sample sizes in each subgroup will be met
within the 4-month sampling window.
When the target sample for each funding program is met, BPHC can compare survey
estimates among funding programs. The combined sample of patients from the four funding
programs will be sufficient for comparative analyses with national estimates of U.S.
residents from the NHIS on various survey outcomes at the national level and some
subgroups, such as race/ethnicity, age group, health condition, etc.


10.2 Limitations
The sample size has increased from 6,600 in the 2014 study to 9,000 for the 2019 study so
the precision of survey estimates should improve in the 2019 study. However, oversampling
grantees, sites, and patients at all three stages can cause large variation in sample weights,
thereby increasing variances associated with survey estimates and reducing statistical
power in data analysis. This design efficiency loss caused by oversampling could partially
offset the gain of the increased sample sizes.
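One common way to quantify this efficiency loss is Kish's approximation for the unequal weighting effect (UWE), n times the sum of squared weights divided by the squared sum of weights, which equals 1 plus the squared coefficient of variation of the weights. The sketch below applies that standard formula to hypothetical weights. It is an assumption that this matches the UWE calculation used elsewhere in the plan.

```python
def unequal_weighting_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2. Equals 1.0 when all
    weights are equal and grows as the weights become more variable."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Nominal sample size deflated by the unequal weighting effect."""
    return len(weights) / unequal_weighting_effect(weights)


equal = [10.0, 10.0, 10.0, 10.0]   # self-weighting sample
unequal = [1.0, 1.0, 1.0, 5.0]     # one heavily weighted case
uwe_equal = unequal_weighting_effect(equal)
uwe_unequal = unequal_weighting_effect(unequal)
print(uwe_equal, uwe_unequal, effective_sample_size(unequal))
```

In the second made-up sample, a single large weight pushes the UWE to 1.75, so four interviews behave like roughly 2.3 equally weighted ones. The same mechanism can offset part of the gain from a larger nominal sample.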
An additional limitation is that the study cannot capture seasonal variation in health care
needs and service utilization. The time constraints for completing the study within the
contract period limit data collection to 5.5 months rather than a full year; thus, the study
will not be able to address seasonal fluctuations in the types of services provided to health
center patients. The short data collection period may also miss groups of seasonal
farmworkers who move from one part of the country to another during the year. After grantee
samples are selected, we will evaluate the migrant farmworker situation based on the most
current National Agricultural Workers Survey results and plan data collection in MHC
grantees accordingly.


References
Chromy, J. R. (1981). Variance estimations for a sequential sample selection procedure. In
D. Krewski, R. Platek, & J.N.K. Rao, eds. Current Topics in Survey Sampling. New
York: Academic Press, Inc.
Folsom, R. E., & Singh, A. C. (2000). The generalized exponential model for sampling
weight calibration for extreme values, nonresponse, and poststratification.
Proceedings of the American Statistical Association Section on Survey Research
Methods, 598–603.
Kish, L. (1995). Survey Sampling (Wiley Classics Library ed., pp. 217–246). New York:
Wiley.


