A sampling strategy for longitudinal and cross-sectional analyses using a large national claims database

Importance: The United States (US) Medicare claims files are valuable sources of national healthcare utilization data with over 45 million beneficiaries each year. Due to their massive sizes and costs involved in obtaining the data, a method of randomly drawing a representative sample for retrospectiv...


Bibliographic Details
Main Authors: Timothy L. McMurry, Jennifer M. Lobo, Soyoun Kim, Hyojung Kang, Min-Woong Sohn
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-02-01
Series: Frontiers in Public Health
Subjects: diabetes; Medicare claims; sample; longitudinal analysis; cross-sectional design
Online Access: https://www.frontiersin.org/articles/10.3389/fpubh.2024.1257163/full
collection DOAJ
description Importance: The United States (US) Medicare claims files are valuable sources of national healthcare utilization data with over 45 million beneficiaries each year. Due to their massive sizes and costs involved in obtaining the data, a method of randomly drawing a representative sample for retrospective cohort studies with multi-year follow-up is not well-documented.
Objective: To present a method to construct longitudinal patient samples from Medicare claims files that are representative of Medicare populations each year.
Design: Retrospective cohort and cross-sectional designs.
Participants: US Medicare beneficiaries with diabetes over a 10-year period.
Methods: Medicare Master Beneficiary Summary Files were used to identify eligible patients for each year over a 10-year period. We targeted a sample of ~900,000 patients per year. The first year's sample was stratified by county and race/ethnicity (white vs. minority) and targeted at least 250 patients in each stratum, with the remaining sample allocated proportional to county population size with oversampling of minorities. Patients who were alive, did not move between counties, and stayed enrolled in Medicare fee-for-service (FFS) were retained in the sample for subsequent years. Non-retained patients (those who died or were dropped from Medicare) were replaced with a sample of patients in their first year of Medicare FFS eligibility or patients who moved into a sampled county during the previous year.
Results: The resulting sample contains an average of 899,266 ± 408 patients each year over the 10-year study period and closely matches population demographics and chronic conditions. For all years in the sample, the weighted average sample age and the population average age differ by <0.01 years; the proportion white is within 0.01%; and the proportion female is within 0.08%. Rates of 21 comorbidities estimated from the samples for all 10 years were within 0.12% of the population rates. Longitudinal cohorts based on samples also closely resembled the cohorts based on populations remaining after 5- and 10-year follow-up.
Conclusions and relevance: This sampling strategy can be easily adapted to other projects that require random samples of Medicare beneficiaries or other national claims files for longitudinal follow-up with possible oversampling of sub-populations.
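The Methods summary above can be illustrated with a rough sketch. This is not the authors' code: the function names, the `minority_factor` oversampling multiplier, the toy county data, and the choice to split the remainder by stratum size (rather than county size) are all assumptions made for illustration. Only the 250-patient floor, the county by race/ethnicity strata, and the retain-then-replace rule come from the abstract.

```python
import random

FLOOR = 250  # minimum patients per stratum, as stated in the abstract


def allocate(strata, target, floor=FLOOR, minority_factor=2.0):
    """Map (county, group) -> sample size.

    strata maps (county, group) -> number of eligible patients.
    minority_factor is a hypothetical oversampling multiplier; the
    abstract does not report the value actually used.
    """
    # Step 1: give every stratum the floor (or everyone, if it is smaller).
    base = {s: min(floor, n) for s, n in strata.items()}
    remaining = max(target - sum(base.values()), 0)
    # Step 2: split the remainder proportionally to population size,
    # inflating minority strata (assumption: stratum size is used here).
    score = {s: n * (minority_factor if s[1] == "minority" else 1.0)
             for s, n in strata.items()}
    total_score = sum(score.values())
    alloc = {}
    for s, n in strata.items():
        extra = int(remaining * score[s] / total_score) if total_score else 0
        alloc[s] = min(n, base[s] + extra)  # never exceed the stratum size
    return alloc


def draw(ids_by_stratum, alloc, rng):
    """Simple random sample without replacement inside each stratum."""
    return {s: rng.sample(ids, alloc[s]) for s, ids in ids_by_stratum.items()}


def design_weights(strata, alloc):
    """Inverse sampling fraction per stratum, for weighted estimates."""
    return {s: strata[s] / alloc[s] for s in strata if alloc[s] > 0}


def refresh(sample, still_eligible, newcomers, alloc, rng):
    """Next-year sample: keep retained patients, top up from newcomers.

    still_eligible is the set of IDs that survived, stayed in the county,
    and stayed in FFS; newcomers maps stratum -> list of newly eligible
    or newly arrived patient IDs.
    """
    out = {}
    for s, ids in sample.items():
        kept = [i for i in ids if i in still_eligible]
        pool = newcomers.get(s, [])
        need = max(alloc[s] - len(kept), 0)
        out[s] = kept + rng.sample(pool, min(need, len(pool)))
    return out


# Toy data (hypothetical): two counties, two groups each.
strata = {("A", "white"): 5000, ("A", "minority"): 1000,
          ("B", "white"): 300, ("B", "minority"): 100}
ids_by_stratum = {s: [(s, i) for i in range(n)] for s, n in strata.items()}

rng = random.Random(0)
alloc = allocate(strata, target=2000)
year1 = draw(ids_by_stratum, alloc, rng)
weights = design_weights(strata, alloc)

# Pretend the first 10 sampled patients in every stratum were not retained.
lost = {i for ids in year1.values() for i in ids[:10]}
still_eligible = {i for ids in year1.values() for i in ids} - lost
newcomers = {s: [(s, "new", k) for k in range(20)] for s in strata}
year2 = refresh(year1, still_eligible, newcomers, alloc, rng)
```

Weighting each sampled patient by the inverse of the stratum sampling fraction is one standard way to recover population-level estimates from an oversampled design; whether the paper computes its weighted averages exactly this way is not stated in the abstract.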
id doaj.art-496b9e34872c41fc8e1798840ae5e2db
institution Directory Open Access Journal
issn 2296-2565
volume 12
doi 10.3389/fpubh.2024.1257163
affiliation Timothy L. McMurry: Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
Jennifer M. Lobo: Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
Soyoun Kim: Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States; Department of Social Welfare, Ewha Womans University, Seoul, Republic of Korea
Hyojung Kang: Department of Kinesiology and Community Health, University of Illinois, Champaign, IL, United States
Min-Woong Sohn: Department of Health Management and Policy, University of Kentucky, Lexington, KY, United States
topic diabetes
Medicare claims
sample
longitudinal analysis
cross-sectional design
url https://www.frontiersin.org/articles/10.3389/fpubh.2024.1257163/full