# Relationships validated between population health chronic indicators

This is where visualizing the data helps paint a picture of the overall relationships.

Originally, I started out using the pivot_table from Table 3 but the axis labels were too long and difficult to read.

Backing up slightly, we’re going to create a different pivot table (df_new_qloc2) based on a new column, QuestionAbbr, which takes the first 37 characters of Question:

    df_new['QuestionAbbr'] = df_new['Question'].str[:37]
    df_new_qloc2 = df_new.pivot_table(
        values='DataValueAlt',
        index=['Topic', 'QuestionID', 'QuestionAbbr'],
        columns='LocationAbbr',
        aggfunc='mean',
        dropna=True,
    ).round(2)

By transposing the df_new_qloc2 pivot table, we’re ready to apply the correlation method .corr() and visualize the data.
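To make the transpose step concrete, here is a minimal sketch on made-up numbers (the toy frame and its values are assumptions for illustration, not the real dataset). Since .corr() computes pairwise correlations between columns, the indicators have to sit in the columns, which is why the pivot table gets transposed first.

```python
import pandas as pd

# Toy long-format data; the values are invented purely for illustration.
toy = pd.DataFrame({
    "QuestionID":   ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2", "Q3", "Q3", "Q3"],
    "LocationAbbr": ["CA", "NY", "TX"] * 3,
    "DataValueAlt": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
})

# Rows are indicators and columns are locations, like df_new_qloc2.
toy_pivot = toy.pivot_table(
    values="DataValueAlt",
    index="QuestionID",
    columns="LocationAbbr",
    aggfunc="mean",
)

# After transposing, each indicator is a column, so .corr() compares
# indicators with each other across locations.
toy_corr = toy_pivot.transpose().corr()
print(toy_corr.round(2))
```

In this toy case Q1 and Q2 move together across locations while Q1 and Q3 move in opposite directions, so the matrix shows a strong positive and a strong negative pair.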

Using the code from the seaborn documentation, we can plot a correlation matrix heatmap.

This visualizes each pair of indicators to understand where the positive correlation pairs reside.

Since a correlation matrix yields duplicate views above and below the diagonal, we’ll mask the upper half of the heatmap for simplicity by creating an array of zeros using np.zeros_like() and returning the indices of the region on and above the diagonal using np.triu_indices_from().
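To see what that mask looks like on its own, here is a small sketch using a made-up 3x3 correlation matrix (the numbers are assumptions for illustration). Cells where the mask is True are the ones the heatmap would hide.

```python
import numpy as np

# Made-up symmetric correlation matrix for three indicators.
corr = np.array([
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.5],
    [-0.2, 0.5, 1.0],
])

# Start with an all-False boolean array of the same shape.
mask = np.zeros_like(corr, dtype=bool)

# Flag the diagonal and everything above it.
mask[np.triu_indices_from(mask)] = True
print(mask)

# The cells that stay visible are the ones strictly below the diagonal.
visible = corr[~mask]
print(visible)
```

With the default offset, np.triu_indices_from() includes the diagonal itself, so the trivial self-correlations of 1.0 get hidden along with the duplicate upper half.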

Using matplotlib, we set a large figure size to allow zooming in within the Jupyter notebook.

The cmap argument sets the colormap for the figure.

Lastly, sns.heatmap() renders the heatmap with the mask.

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Table 4 – Using Table 3 with question indicators as columns
    new_qloc2_corr = df_new_qloc2.transpose().corr()

    # Generate a mask for the upper triangle
    mask = np.zeros_like(new_qloc2_corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True

    # Set up the matplotlib figure
    f, ax = plt.subplots(figsize=(150, 150))
    f.suptitle('All Chronic Indicators Correlation Heatmap', x=0.4, y=0.85, fontsize=150)
    ax.tick_params(labelsize=50)

    # Generate a custom diverging colormap
    cmap = sns.diverging_palette(190, 10, as_cmap=True)

    # Draw the heatmap with the mask and correct aspect ratio
    sns.heatmap(new_qloc2_corr, mask=mask, cmap=cmap, vmax=.3, center=0,
                square=True, linewidths=2, cbar_kws={"shrink": .3})

Even with some tweaking, the information is dense and could be more user-friendly to read (we’ll address this in the next post).

Still, the heatmap axes show the topic, QuestionID, and the abbreviated question QuestionAbbr.

We can see some areas with more positive correlation in pink and more negative in green.

Visually, there are areas within Cardiovascular Disease (CVD) and Cancer (CAN) that appear to have higher correlation.

It’s helpful to see this presentation, but ultimately we will want a more granular look by creating a list of the top correlations.

Table 5. Top Correlation Table of Indicator Pairs

Taking only the QuestionID, LocationAbbr, and DataValueAlt columns out of df_new, I created the new_qloc2_1_corr table.

Using linear algebra, we can vectorize the operation by multiplying the new_qloc2_1_corr correlation dataframe element-wise by the strictly lower triangle of an array of ones with the same shape, which zeroes out the diagonal and the duplicate upper half.

The .stack() call reshapes the result into a long list of pairs, and .sort_values() sorts that list in descending order.

top_corr consists of about 36k items.

    top_corr = (
        new_qloc2_1_corr * np.tril(np.ones(new_qloc2_1_corr.shape), -1)
    ).stack().sort_values(by=['DataValueAlt'], ascending=False)

With a few additional steps to reset the indexes, change the column names, and add a few new columns based on the QuestionID, we have a dataframe with the top correlation pairs.
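Here is a minimal sketch of the same triangle trick on a made-up 3x3 correlation matrix (labels and values are assumptions for illustration). One difference from the real data: this toy frame has plain single-level columns, so .stack() returns a Series and .sort_values() needs no by argument.

```python
import numpy as np
import pandas as pd

labels = ["Q1", "Q2", "Q3"]  # hypothetical indicator IDs
corr = pd.DataFrame(
    [[1.0, 0.8, -0.2],
     [0.8, 1.0, 0.5],
     [-0.2, 0.5, 1.0]],
    index=labels,
    columns=labels,
)

# Ones strictly below the diagonal, zeros everywhere else.
lower = np.tril(np.ones(corr.shape), -1)

# Element-wise product: the diagonal and upper triangle become 0.
masked = corr * lower

# Long format: one entry per (row indicator, column indicator) cell.
pairs = masked.stack().sort_values(ascending=False)

# The zeroed cells are still in the list, so drop them to keep each pair once.
# Caveat: a genuine correlation of exactly 0 would be dropped as well.
unique_pairs = pairs[pairs != 0]
print(unique_pairs)
```

Multiplying by the triangle keeps everything vectorized, but note that it leaves zeros behind rather than removing cells, which is worth keeping in mind when counting items in the resulting list.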

Since there are many pairs within the same topic, I’m going to de-duplicate by excluding pairs whose two indicators share the same topic category, which results in about 33k items:

    top_corr_pair_dd = []
    for row in range(len(top_corr_pair)):
        # Keep only pairs whose indicators come from different topics
        if top_corr_pair.loc[row]['Topic1'] != top_corr_pair.loc[row]['Topic2']:
            top_corr_pair_dd.append(top_corr_pair.loc[row])

Table 5. Top Correlation by Topic Descending

At this point, we see themes emerging from the indicators.
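Going back to the de-duplication loop for a moment: the same filter can also be written as a single boolean comparison, which tends to be faster on tens of thousands of rows and returns a dataframe instead of a list of rows. This is only a sketch; the QuestionID values and the Corr column name below are made up for illustration.

```python
import pandas as pd

# Toy stand-in for top_corr_pair; IDs and values are invented.
top_corr_pair = pd.DataFrame({
    "QID1":   ["CKD1_0", "CVD1_1", "DIA2_0", "CVD3_1"],
    "QID2":   ["CVD1_1", "CVD2_0", "CKD1_0", "DIA2_0"],
    "Topic1": ["CKD", "CVD", "DIA", "CVD"],
    "Topic2": ["CVD", "CVD", "CKD", "DIA"],
    "Corr":   [0.95, 0.93, 0.91, 0.90],  # hypothetical column name
})

# Keep only the rows where the two indicators come from different topics.
different_topic = top_corr_pair["Topic1"] != top_corr_pair["Topic2"]
top_corr_pair_dd = top_corr_pair[different_topic].reset_index(drop=True)
print(top_corr_pair_dd)
```

In this toy frame, only the CVD/CVD pair is dropped.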

Table 5 shows us that there are certain patterns based on the topic categories.

For example, the top pairs include QuestionIDs (the QID1 and QID2 columns) in Chronic Kidney Disease (CKD) and Cardiovascular Disease (CVD); in CKD and Diabetes (DIA); in CVD and DIA; in Overarching Conditions (OVC) and the prior indicators; and finally, in Chronic Obstructive Pulmonary Disease (COP) with DIA and CVD.

Does it make sense overall that we are seeing relationships among indicators such as CVD, CKD, and DIA?

Cardiovascular disease includes a number of heart-related diseases, including arrhythmia (irregular heartbeat), high blood pressure (high force against artery walls), cardiac arrest (sudden loss of function), coronary artery disease (damage to major blood vessels), and congestive heart failure (a chronic condition where the heart doesn’t pump well).

It’s a very common disease that stems from blocked blood vessels and impacts 1 in 4 Americans.

According to the CDC, risk factors that increase this disease include diet, obesity, diabetes, excessive alcohol use and physical inactivity.

Chronic kidney disease is a long-term condition in which the kidneys gradually lose function, which can lead to kidney failure, also known as end-stage renal disease (ESRD).

It’s also a common disease that affects 1 in 7 Americans.

According to the Mayo Clinic, risk factors that increase this disease include diabetes, high blood pressure, smoking, cardiovascular disease, older age, and certain ethnic backgrounds.

In patients with kidney disease, cardiovascular disease tends to be “underdiagnosed and undertreated” (Said S, Hernandez GT. The link between chronic kidney disease and cardiovascular disease. J Nephropathol. 2014;3(3):99–104).

According to the American Kidney Fund, when the kidneys don’t function optimally, the heart and cardiovascular system have to work harder.

Diabetes mellitus relates to the dysregulation of glucose in the body, and includes a few diseases such as Type 1, Type 2, and gestational diabetes.

The exact cause of Type 1 is unknown, though factors such as family background and environment may play a role, while gestational diabetes usually comes and goes with pregnancy.

Type 2 diabetes has risk factors similar to those of the diseases above, including physical inactivity, age, family background, high blood pressure, weight, polycystic ovary syndrome, and abnormal cholesterol / triglyceride levels (Source: Mayo Clinic).

Having diabetes leads to various complications including cardiovascular and chronic kidney disease.

The answer is yes! This dataset is consistent with the literature showing that there is a high correlation among cardiovascular disease, chronic kidney disease, and diabetes.

In the next blog post, I’d like to build on this insight to better understand the following:

- What are the Overarching Conditions?
- What are the relationships with Chronic Obstructive Pulmonary Disease?
- What are additional highlights on the specific top indicators?
- What trends can we see by year and by stratification?

Hope you’ve enjoyed reading to this point, and feel free to drop me some ideas and suggestions!