Unfortunately, this kind of local-level information is difficult to come by.
This is partly because the documents (database tables) that contain cost information may not have the appropriate geographical location tags in them.
To match healthcare costs to hospitals in your region, you’d need to merge and combine different sources of information that weren’t originally designed to be together.
This is precisely what I did for this analysis.
My goal was to separate New York State (NYS) healthcare cost information by geographical region and report my findings as a range of prices for the most common medical conditions.
To accomplish this task, I used the cost transparency dataset from the NYS Department of Health and incorporated geographical location information at the county and regional level to make the appropriate groupings.
I’m familiar with this dataset since I worked through it previously when I charted the trends in patient volumes, hospital costs, and pricing markups for the most common medical conditions observed in NYS.
Through this analysis I was able to discover the high variation in hospital costs within and between Western, Upstate and Downstate New York regions.
Prices can vary by thousands, tens of thousands, and sometimes hundreds of thousands of dollars for certain procedures, such as knee joint replacement surgery.
Moreover, hospitals in Western and Upstate New York regions charge less, relative to the average in the state, while hospitals in Long Island, Lower Hudson Valley and New York City charge 50% and sometimes 100% more than the average in the state.
This article explains my findings.
How was New York State partitioned into regions?
To accomplish my goal I incorporated geographical location information to the cost transparency dataset in order to group hospitals by their geographical regions in NYS.
I divided NYS into 9 regions based on their county and region designations from the Department of Environmental Conservation.
Each hospital was assigned to the county it resided in based on its zip code.
Each county was then matched to its corresponding region.
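The two-step lookup described here (zip code to county, then county to region) can be sketched in a few lines. This is only an illustration using the Python standard library, not the code from the original analysis, which used pandas. The hospital names, the tiny lookup tables and the `locate` function are all made up for the example; the three-area grouping follows the one used in this article.

```python
# Illustrative lookup tables: a tiny hand-written subset. The real analysis
# used the full zip code/county and county/region tables.
ZIP_TO_COUNTY = {"14203": "Erie", "12208": "Albany", "11501": "Nassau"}
COUNTY_TO_REGION = {
    "Erie": "Western New York",
    "Albany": "Capital Region",
    "Nassau": "Long Island",
}

# Grouping of the 9 regions into 3 broader areas, as used in this article.
REGION_TO_AREA = {
    "Central New York": "Western New York",
    "Finger Lakes": "Western New York",
    "Western New York": "Western New York",
    "Capital Region": "Upstate New York",
    "Eastern Adirondacks": "Upstate New York",
    "Western Adirondacks": "Upstate New York",
    "Long Island": "Downstate New York",
    "New York City": "Downstate New York",
    "Lower Hudson Valley": "Downstate New York",
}


def locate(hospitals):
    """Attach county, region and area to (name, zip_code) pairs.

    Zip codes missing from the lookup table yield None values, so
    unmatched hospitals are easy to spot and fix by hand.
    """
    located = []
    for name, zip_code in hospitals:
        county = ZIP_TO_COUNTY.get(zip_code)
        region = COUNTY_TO_REGION.get(county)
        area = REGION_TO_AREA.get(region)
        located.append(
            {"hospital": name, "county": county, "region": region, "area": area}
        )
    return located


# Hypothetical hospitals; the last zip code is deliberately unknown.
hospitals = [("Hospital A", "14203"), ("Hospital B", "11501"), ("Hospital C", "99999")]
located = locate(hospitals)
```

Keeping unmatched hospitals in the output with empty fields, rather than dropping them, makes it easier to check how many records fall through the join.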
The regions were:
Western New York: Central New York, Finger Lakes, Western New York
Upstate New York: Capital Region, Eastern Adirondacks, Western Adirondacks
Downstate New York: Long Island, New York City, Lower Hudson Valley

Which medical conditions were investigated?
Originally, the summary cost transparency database table contained 979,862 rows of entries with multiple pieces of information per entry, one of which was an All Patients Refined Diagnosis Related Group (APR DRG) classification.
Broadly speaking, an APR DRG classification is the medical condition the patient was tagged as being hospitalized for.
Example APR DRG classifications are knee joint replacement surgery, chronic obstructive pulmonary disease and heart failure among others.
According to the cost transparency dataset, in 2016 patients were hospitalized for 315 distinct APR DRG classifications.
In other words, each of those nearly 1M entries fell into one of 315 distinct categories.
For this analysis to make sense, I selected the APR DRG classifications that would be most representative across NYS.
Thus, I chose the classifications which:
had the highest patient volumes and thus were treated most often in NYS,
were observed in the most hospitals and thus were treated throughout NYS.
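One way to apply both criteria is to filter classifications by how many hospitals report them and then rank the survivors by volume. The sketch below is an assumption about how such a selection could be done, not the procedure from the original analysis: the rows, the `select_conditions` function, and the `min_coverage` and `top_n` parameters are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (hospital, APR DRG description, discharges).
ROWS = [
    ("Hospital A", "Vaginal delivery", 900),
    ("Hospital B", "Vaginal delivery", 700),
    ("Hospital C", "Vaginal delivery", 400),
    ("Hospital A", "Heart failure", 500),
    ("Hospital B", "Heart failure", 300),
    ("Hospital C", "Heart failure", 200),
    ("Hospital A", "Rare condition", 5000),  # high volume, but only one hospital
]


def select_conditions(rows, min_coverage, top_n):
    """Keep classifications seen in at least `min_coverage` of hospitals,
    then return the `top_n` of those with the highest total discharges."""
    volume = defaultdict(int)
    hospitals_by_drg = defaultdict(set)
    all_hospitals = set()
    for hospital, drg, discharges in rows:
        volume[drg] += discharges
        hospitals_by_drg[drg].add(hospital)
        all_hospitals.add(hospital)

    widespread = [
        drg
        for drg in volume
        if len(hospitals_by_drg[drg]) / len(all_hospitals) >= min_coverage
    ]
    return sorted(widespread, key=lambda drg: volume[drg], reverse=True)[:top_n]


selected = select_conditions(ROWS, min_coverage=0.88, top_n=2)
```

Filtering on coverage before ranking by volume keeps a condition treated at a single very busy hospital from crowding out conditions that are treated throughout the state.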
In the end, 15 conditions were chosen that were seen in >88% of hospitals and all together represented 33% of the entire inpatient population volume in 2016.
Grouped loosely by their physiological system in no particular order they were:
Reproduction-related: 1) vaginal delivery, 2) cesarean delivery, 3) normal newborn or newborn with condition
Respiratory: 4) chronic obstructive pulmonary disease (COPD), 5) pulmonary edema & respiratory failure, 6) pneumonia
Infections: 7) septicemia & disseminated infections, 8) cellulitis & other bacterial skin infections
Musculoskeletal: 9) other musculoskeletal system (MSK) & connective tissue diagnoses, 10) knee joint replacement
Cardiovascular: 11) heart failure, 12) cardiac arrhythmia & conduction disorders
Psychological: 13) schizophrenia
Gastrointestinal: 14) major gastrointestinal (GI) & peritoneal infections
Renal: 15) kidney & urinary tract infections (UTI)

How do I interpret the results?
Medical costs represented as individual values are of limited use.
No two patients are the same so their bills won’t be either.
Rather, knowing a range (in the colloquial sense of the word, not the statistical one) of potential price tags for a procedure can be more helpful for our financial planning.
For example, knowing that the median cost of a procedure is $15,000 is informative.
But if, in addition, you knew the distribution of price tags and that the middle 50% of patients were charged between $9,000 and $27,000, you’d have a better sense of how much your coinsurance, deductible, co-pays and other healthcare-related expenses might be.
In my example, the phrase ‘middle 50% of patients’ refers to the center of a distribution of healthcare costs.
This is called the interquartile range (IQR).
In descriptive statistics, the IQR is used because it is robust to outliers; in other words, it is largely unaffected by them.
In a dataset of healthcare prices, outliers can represent the very expensive or inexpensive cases.
The IQR is plotted using box and whisker plots.
A box and whisker plot has three components: a box, a line inside the box and two whiskers.
The box represents the IQR, the line inside the box marks the median and, by my choice, the two whiskers extend to the 10th and 90th percentiles.
In effect my box and whisker plots capture 80% of all healthcare costs.
The cheapest 10% of cases and most expensive 10% are ignored.
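The five numbers behind each box and whisker plot can be computed directly. Below is a minimal sketch using Python's standard `statistics` module; the original analysis used numpy and pandas, so this is a stand-in, not the author's code. The charges are made up and were chosen so the quartiles reproduce the $9,000 / $15,000 / $27,000 example above, and `box_summary` is a hypothetical helper name.

```python
import statistics

# Made-up charges (in dollars) for one hypothetical condition. The single
# very expensive case at the end plays the role of an outlier.
charges = [5000, 7000, 8000, 10000, 13000, 15000,
           19000, 23000, 31000, 45000, 150000]


def box_summary(values):
    """Return the numbers drawn in the plots: whisker ends (10th and 90th
    percentiles), box edges (25th and 75th percentiles) and the median."""
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "p10": deciles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p90": deciles[-1],
        "iqr_width": q3 - q1,
    }


summary = box_summary(charges)
```

Replacing the $150,000 case with an even larger value leaves every number in `summary` unchanged, which is the sense in which the IQR and these whiskers are robust to outliers.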
If you need a more detailed explanation on interpreting box and whisker plots, I included a small section at the end of this article for additional clarification.
There are many resources online to better understand percentiles, quartiles and box and whisker plots.
Here is one.
Now, onto my findings.
Healthcare costs vary greatly between the three Downstate New York regions and the rest of the state
Western New York
Figure 1: Box and whisker plots showing the IQR, median (red line), 10th and 90th percentiles (whiskers) hospital charges for each of the 15 most common medical conditions (APR DRG classifications) of hospitals within Central NY, the Finger Lakes and Western NY.
All three regions in Western NY had roughly similar healthcare costs, with a few notable exceptions.
Almost all of the 15 APR DRG classifications explored had medians tightly constrained between $9,000 and $17,000.
The three exceptions were neonatal care, vaginal deliveries and knee replacement surgery.
The median cost for neonatal care at hospitals hovered near $3,000.
The median cost for vaginal deliveries was closer to $7,000.
Since cesarean deliveries are surgical procedures, it makes sense that their costs would be significantly greater, with median costs of $12,000, almost double the cost of vaginal deliveries.
Surprisingly, knee joint replacement surgeries had the highest median, greatest IQR and widest spread, clearly observed with the whiskers being longest.
According to my analysis, knee replacement surgeries had values of:
Central NY: IQR ranged from $29,000 to $48,000 ($37,000 median). When considering 10th and 90th percentiles hospital costs extended to $27,000 and $80,000 respectively.
Finger Lakes: IQR ranged from $30,000 to $42,000 ($36,000 median). When considering 10th and 90th percentiles hospital costs extended to $23,000 and $63,000 respectively.
Western NY: IQR ranged from $26,000 to $36,000 ($29,000 median). When considering 10th and 90th percentiles hospital costs extended to $24,000 and $43,000 respectively.
To place this into perspective, the median cost for a knee replacement surgery was higher than the 90th percentile cost for all of the remaining APR DRG classifications except schizophrenia.
The only place where this trend was not observed was Western NY.
Upstate New York
Figure 2: Box and whisker plots showing the IQR, median (red line), 10th and 90th percentiles (whiskers) hospital charges for each of the 15 most common medical conditions (APR DRG classifications) of hospitals within the Capital, Eastern and Western Adirondack regions.
Similar trends observed in the Western NY regions were also observed in Upstate NY regions.
Almost all of the 15 APR DRG classifications explored had medians ranging from $11,000 to $21,000.
Again, median costs for neonatal infant care at hospitals in Upstate NY ranged between $3,000 and $5,000.
The median costs for vaginal deliveries ranged between $8,000 and $12,000, while the median costs for cesarean deliveries ranged between $15,000 and $18,000.
The IQR and spread of healthcare costs were narrower in the Capital Region and Western Adirondacks than in the Eastern Adirondacks.
This was the case for septicemia, edema & respiratory failure, pneumonia and major gastrointestinal infections.
As an example, edema & respiratory failure in the Eastern Adirondacks region had an IQR of $22,000 (it ranged from $15,000 to $37,000; $27,000 median).
In the Capital Region it had an IQR of $13,000 (it ranged from $18,000 to $31,000; $21,000 median), while the Western Adirondacks region had an IQR of $15,000 (it ranged from $12,000 to $27,000; $17,000 median).
A higher IQR indicates healthcare costs are more spread out.
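The IQR widths quoted for edema & respiratory failure are simply the 75th percentile minus the 25th percentile. As a quick check of that arithmetic, using only the bounds given in the text (the variable names are just for illustration):

```python
# (25th percentile, 75th percentile) in dollars, as quoted in the text for
# edema & respiratory failure in the three Upstate NY regions.
iqr_bounds = {
    "Eastern Adirondacks": (15_000, 37_000),
    "Capital Region": (18_000, 31_000),
    "Western Adirondacks": (12_000, 27_000),
}

# Width of the box for each region: upper bound minus lower bound.
iqr_width = {region: high - low for region, (low, high) in iqr_bounds.items()}

# Region whose middle 50% of charges is most spread out.
widest = max(iqr_width, key=iqr_width.get)
```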
As in Western NY, knee joint replacement surgeries had the highest median, greatest IQR and widest spread in hospital charges.
Capital Region: IQR ranged from $27,000 to $43,000 ($33,000 median). When considering 10th and 90th percentiles hospital costs extended to $22,000 and $59,000 respectively.
Eastern Adirondacks: IQR ranged from $40,000 to $51,000 ($42,000 median). When considering 10th and 90th percentiles hospital costs extended to $32,000 and $60,000 respectively.
Western Adirondacks: IQR ranged from $30,000 to $51,000 ($39,000 median). When considering 10th and 90th percentiles hospital costs extended to $26,000 and $52,000 respectively.
Downstate New York
Figure 3: Box and whisker plots showing the IQR, median (red line), 10th and 90th percentiles (whiskers) hospital charges for each of the 15 most common medical conditions (APR DRG classifications) of hospitals within Long Island, New York City and Lower Hudson Valley regions.
Note the changes in scale and intervals for healthcare costs.
The maximum value on the x-axis now reaches $200,000 and the intervals are in $20,000 increments.
Healthcare prices are in a different league in the three regions comprising Downstate NY.
Note the difference in scale and intervals between these graphs and Western/Upstate NY.
The maximum cost in Western and Upstate NY combined did not exceed $80,000.
In Downstate NY, the maximum cost nears $180,000.
To accommodate these changes the intervals are now expressed in $20,000 increments instead of $10,000.
Everything is much more expensive in Downstate NY.
Continuing with our analysis of neonatal infant care, median costs for hospitals in Downstate NY ranged between $7,000 and $12,000.
The median costs for vaginal deliveries ranged between $18,000 and $20,000, while the median costs for cesarean deliveries ranged between $26,000 and $34,000.
Neonatal care and vaginal deliveries represented the two cheapest median costs in Downstate NY.
All the other APR DRG classifications had median costs between $24,000 and $53,000.
In comparison, in Western and Upstate NY, the majority of all median costs of all other APR DRG classifications did not break $20,000.
The IQRs in Downstate NY were also significantly larger, the smallest being $6,000 and the largest $59,000.
Again, this indicates that the middle 50% of healthcare costs were spread across tens of thousands of dollars.
However, the most startling costs again came from knee joint replacement surgeries.
In Downstate NY, the costs were as follows:
Long Island: IQR ranged from $65,000 to $116,000 ($80,000 median). When considering 10th and 90th percentiles hospital costs extended to $52,000 and $156,000 respectively.
New York City: IQR ranged from $48,000 to $107,000 ($72,000 median). When considering 10th and 90th percentiles hospital costs extended to $34,000 and $182,000 respectively.
Lower Hudson Valley: IQR ranged from $50,000 to $100,000 ($63,000 median). When considering 10th and 90th percentiles hospital costs extended to $37,000 and $130,000 respectively.
I previously reported that knee replacement surgeries were one of the few APR DRG classifications that had actually seen a significant spike in patient discharge volumes and had the greatest increase in costs from 2009 to 2016.
Which hospitals are overcharging?
Next, I wanted to determine which hospitals were charging more than their counterparts and where they were located.
To compare between hospitals I defined a new term called relative price for each of the 211 hospitals in NYS.
A hospital’s relative price is calculated by averaging all the APR DRG charge ratios.
An APR DRG charge ratio is the amount a hospital charged for a particular APR DRG classification divided by the average charge for the same APR DRG classification across all hospitals in NYS.
In other words, the APR DRG charge ratio gives information on whether a hospital charges more or less than all hospitals for that particular condition.
A hospital’s relative price ratio is an indicator of how much a hospital is charging on average compared to all the other hospitals in the state.
A relative price ratio < 1.0 indicates the hospital charges below the average, a relative price ratio > 1.0 indicates a hospital charges above the average, and a relative price ratio = 1.0 indicates a hospital charges the same as the average for the state.
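The relative price calculation can be sketched as follows, using the standard library only. This illustrates the definition and is not the code from the original analysis. The charges and hospital names are made up, `relative_prices` is a hypothetical function name, and the state average is assumed to be a simple unweighted mean across the hospitals that report each classification.

```python
from statistics import mean

# Hypothetical charges (in dollars) per hospital and APR DRG classification.
charges = {
    "Hospital A": {"Knee joint replacement": 30000, "Pneumonia": 10000},
    "Hospital B": {"Knee joint replacement": 60000, "Pneumonia": 20000},
    "Hospital C": {"Knee joint replacement": 90000, "Pneumonia": 30000},
}


def relative_prices(charges_by_hospital):
    """Average each hospital's APR DRG charge ratios into one relative price."""
    # Collect every hospital's charge for each classification.
    by_drg = {}
    for per_drg in charges_by_hospital.values():
        for drg, amount in per_drg.items():
            by_drg.setdefault(drg, []).append(amount)

    # Assumption: the state average is an unweighted mean across hospitals.
    state_avg = {drg: mean(amounts) for drg, amounts in by_drg.items()}

    # Charge ratio = hospital charge / state average; relative price = mean ratio.
    return {
        hospital: mean(amount / state_avg[drg] for drg, amount in per_drg.items())
        for hospital, per_drg in charges_by_hospital.items()
    }


relative = relative_prices(charges)
```

In this toy data Hospital A charges half the average for both conditions and Hospital C charges one and a half times the average, so their relative prices land below and above the 1.0 threshold respectively.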
Do hospitals with greater market share charge more?
I first separated hospitals by their discharge market share.
A hospital’s discharge market share was its number of inpatient discharges divided by the total inpatient discharges in NYS.
There was a weak positive correlation (Pearson’s correlation coefficient = 0.33) between relative price and discharge market share, indicating that hospitals with greater market share tended, weakly, to charge above the state average.
Figure 4: Relative Price of hospitals (each dot represents a hospital) plotted against its discharge market share.
Not many hospitals had high market shares to begin with (as expected since there are 200+ hospitals in NYS).
But there were many hospitals which charged more than the NYS average.
Those hospitals are above the 1.0 threshold mark for relative price.
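Discharge market share and its correlation with relative price can be computed as in the sketch below. It uses only the standard library, with Pearson's coefficient written out by hand. The three hospitals and their numbers are invented for illustration and are unrelated to the 0.33 reported above; `market_share` and `pearson_r` are hypothetical helper names.

```python
import math


def market_share(discharges):
    """Each hospital's inpatient discharges divided by the state total."""
    total = sum(discharges.values())
    return {hospital: count / total for hospital, count in discharges.items()}


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented inpatient discharges and relative prices for three hospitals.
discharges = {"Hospital A": 1000, "Hospital B": 3000, "Hospital C": 6000}
relative_price = {"Hospital A": 0.8, "Hospital B": 1.3, "Hospital C": 1.1}

shares = market_share(discharges)
names = sorted(discharges)
r = pearson_r([shares[h] for h in names], [relative_price[h] for h in names])
```

A coefficient near 0 means market share says little about pricing, while values near 1 or -1 would indicate a strong linear relationship.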
Now that we know that hospitals charge more, where are they located?
Which region in NYS contained the hospitals with high relative prices?
I took all the hospitals and their relative price ratios and separated them by the region they belonged to.
I found that all the hospitals in Upstate and Western NY had relative price ratios < 1.0, indicating they are not charging above the average for NYS.
A relative price ratio > 1 indicates a hospital charges above the state average.
A relative price ratio < 1 indicates a hospital charges below the state average.
Interestingly, plenty of the hospitals in the Lower Hudson Valley and New York City were also charging less than the state average.
But the Lower Hudson Valley region did have a few hospitals charging 50% more than the average (relative price ratio = 1.5) and, in some cases, close to 100% more than the average (relative price ratio > 2.0).
New York City hospitals ran the full spectrum of price ratios.
A significant portion had relative price ratios < 1.0, meaning their charges were comparable to those of hospitals in the Upstate and Western NY regions.
But NYC also contained a large portion of hospitals with relative price ratios > 1.0, and some even > 2.0.
Long Island was interesting in this regard since all except one of its hospitals had relative price ratios > 1.0.
Summary
Smart data analytics combined with clean visualizations can lead to a greater understanding of healthcare costs in NYS.
Some of the key findings I made were:
Inpatient hospitalizations for knee joint replacement surgeries were consistently the most expensive medical condition treated.
Long Island hospitals had the highest median cost, at $80,000.
The range of costs was incredibly varied between medical conditions.
For some conditions, like neonatal care, provider charges spanned several thousand dollars.
For others, like knee replacement surgery, provider charges spanned close to $150,000.
Overall, the costs for healthcare procedures at Western NY and Upstate NY hospitals were very comparable.
Hospitals in Downstate NY (Long Island, NYC and Lower Hudson Valley regions) had median charges between $24,000 and $53,000 for 13 of the 15 medical conditions looked at, while median charges at hospitals in Western NY and Upstate NY did not go above $21,000.
A weak positive correlation existed between a hospital’s relative price and its discharge market share.
All the hospitals in Western NY and Upstate NY charged less than the average for NYS.
A strong majority of Long Island hospitals charge more than the average for the state.
Meanwhile, the New York City and Lower Hudson Valley regions had hospitals that charged less than, equal to, and much more than the state average based on their relative prices.
Thanks for reading!
Code:
This data analysis was performed in Python using the pandas, numpy and matplotlib libraries.
The code is found in my GitHub repository.
Attached are the database tables I used in the analysis.
Code: Github
Health Facility General Information
NYS Cost Transparency 2009–2016
NYS zip code and county table
NYS county and region

Primer on IQRs and Box and Whisker Plots
Interquartile ranges (IQRs) are typically displayed as box and whisker plots.
There are too many resources online explaining percentiles and box and whisker plots to link them all here, but I attached one of my favorites for its simplicity.
To understand percentiles, imagine a set of numbers sorted in increasing order and divided into 4 equally sized groups.
The three boundaries between those groups are the 25th, 50th and 75th percentiles.
Each of the groups is called a quartile, for the division into quarters you performed.
The number in the exact middle of your set, the 50th percentile, is called the median.
The values between the 25th and 75th percentiles (the boundaries of the first and third quartiles) make up the middle half of your set.
The span between those two boundaries is known as the interquartile range (IQR).
In essence, the IQR tells you information on the center of your dataset.
The graph used to relay this information is known as a box and whisker plot.
Example box and whisker plot showing the IQR, median, 10th and 90th percentiles labelled.
A histogram of the data is charted above it, with shaded regions highlighting the IQR in blue, the 10th to 25th percentile and 75th to 90th percentile.
Clearly, some data points are excluded from the box and whisker plot.
A box and whisker plot has, as its name implies, a box and whiskers.
The box represents the IQR we discussed above.
The whiskers typically extend to the minimum and maximum values of the data (effectively showing the range of the dataset).
However, in my analysis, I defined the whiskers to extend up to the 10th and 90th percentiles instead since I wanted to avoid outliers.
On the left is a box and whisker plot below a histogram.
The histogram plots the entire distribution of healthcare costs for this example APR DRG classification.
Notice how the area shaded in blue in the histogram has the highest counts (high frequency) of the dataset.
If you extend the shaded region from a histogram into a box and whisker plot it would highlight the box.
A smaller (or larger) IQR shrinks (or lengthens) the box.
Inside the box is a red line which is the median, the center point of the distribution.
In this example the left whisker extends to the 10th percentile, while the right to the 90th percentile, each shaded in grey.
Notice how there are data points in the histogram not covered by the shaded grey region.
They reside beyond the 10th and 90th percentiles which are not captured by the whiskers of the box and whisker plot.
By limiting the whiskers to the 10th and 90th percentiles, this information is ignored.
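To make concrete which cases the whiskers leave out, the sketch below finds the 10th and 90th percentiles of a small made-up set of charges and lists the values that fall beyond them. It uses Python's standard `statistics` module, not the plotting code from the original analysis, and `whisker_limits` is a hypothetical helper name.

```python
import statistics


def whisker_limits(values):
    """Return the 10th and 90th percentiles, i.e. where the whiskers end."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


# Made-up charges in thousands of dollars; the last value mimics a rare,
# very expensive case at the tail of the distribution.
charges_k = [8, 9, 10, 11, 12, 13, 14, 15, 18, 25, 120]

low, high = whisker_limits(charges_k)

# Cases beyond the whiskers: present in the data, but not drawn.
not_drawn = [c for c in charges_k if c < low or c > high]
```

Here the single extreme case would stretch a minimum-to-maximum whisker far to the right, while the 10th to 90th percentile whiskers stay close to where most charges sit.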
My goal in using box and whisker plots to chart healthcare prices was to convey the range of prices most individuals would fall into.
Those incredibly expensive cases are at the tail end of the distribution (notice how their frequency count is 1 on the histogram) and not the norm.
Reporting them would distract from the main message I was trying to convey.