
Conversation

@Makhsuda
Collaborator

No description provided.

@Makhsuda requested a review from chendaniely June 25, 2020 00:02
@chendaniely
Member

I can get the plot to work, but I think it's best to change the code so that we use a placeholder for the date. The dashboard can handle the animation instead of Plotly doing it directly. See the code comments for the changes I made to make the iteration process faster.

Member

@chendaniely left a comment

If you change your code to this, it should at least plot faster.

# reversed_viridis = color_map.reversed()


fig = px.choropleth(molten_df,
Member

I created a dataframe containing only the rows for a single date, and then used that subset to plot the figure.

plot_data = molten_df[molten_df.date_iso == '2020-02-01']
fig = px.choropleth(plot_data,
                    geojson=counties,
                    locations=plot_data.fips_str,
                    color='value',
                    #animation_frame='date',
                    hover_data=['State', 'value'],
                    color_continuous_scale='viridis_r',
                    range_color=(0, 300),
                    scope="usa",
                    title='Confirmed cases',
                    labels={'value': 'confirmed cases'}
                    )
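To turn the hard-coded date into a placeholder, the subsetting step can be wrapped in a small function that the dashboard calls with whichever date the user picks. This is a minimal sketch, assuming `date_iso` holds datetimes from `pd.to_datetime` as in this PR; the helper name `subset_by_date` and the toy rows are hypothetical.

```python
import pandas as pd


def subset_by_date(df: pd.DataFrame, date: str) -> pd.DataFrame:
    """Return only the rows of `df` whose 'date_iso' equals `date`.

    Hypothetical helper: the dashboard would pass in the selected date
    instead of the hard-coded '2020-02-01'.
    """
    target = pd.Timestamp(date)  # parse the placeholder string into a Timestamp
    return df[df["date_iso"] == target]


# Toy stand-in for molten_df (illustrative values only).
molten_df = pd.DataFrame({
    "fips_str": ["01001", "01001", "01003"],
    "date_iso": pd.to_datetime(["2020-02-01", "2020-02-02", "2020-02-01"]),
    "value": [0, 1, 2],
})

plot_data = subset_by_date(molten_df, "2020-02-01")
print(plot_data["fips_str"].tolist())  # ['01001', '01003']
```

With this in place, only one day's rows reach `px.choropleth`, and the animation can be driven by the dashboard's date control rather than by `animation_frame`.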

@@ -34,7 +34,10 @@
molten_df['date_iso'] = pd.to_datetime(molten_df['date'], format="%m/%d/%y") # change date to ISO8601 standard format

fips = molten_df['fips_str'].tolist()
Member

Because of the changes below, you don't need this line anymore: you're passing the column of values directly into the plotting function.

confirmed_df = pd.read_csv('https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/'
'csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
loc_df = pd.read_excel(here('./data/db/original/maps/State_FIPS.xlsx'))
pop_df = pd.read_excel(here('./data/db/original/maps/PopulationEstimates.xls')) # population dataset for 2019
Member

Where did this dataset come from?

Comment on lines 19 to 21
'csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
loc_df = pd.read_excel(here('./data/db/original/maps/State_FIPS.xlsx'))
pop_df = pd.read_excel(here('./data/db/original/maps/PopulationEstimates.xls')) # population dataset for 2019
Member

You should provide a download link to where you got these datasets.


molten_pop_df = pd.merge(molten_df, pop_df, on='fips_str') # add population per county
grouped_by = molten_pop_df.groupby(['fips_str', 'date_iso', 'Admin2', 'POP_ESTIMATE_2019'])['value'].sum().reset_index()
grouped_by['value'] = grouped_by['value']/grouped_by['POP_ESTIMATE_2019'] # get per capita value
Member

Don't overwrite the original 'value' column. Instead, make a new column (in this case something like 'total_per_cap') and assign the per capita value to it.
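A minimal sketch of that suggestion, using the column names from this PR; the two toy rows stand in for `grouped_by` and their numbers are illustrative only.

```python
import pandas as pd

# Toy stand-in for grouped_by (illustrative values only).
grouped_by = pd.DataFrame({
    "fips_str": ["01001", "01003"],
    "Admin2": ["County A", "County B"],
    "POP_ESTIMATE_2019": [50_000, 200_000],
    "value": [10, 20],
})

# Keep the raw counts in 'value' and store the rate in a separate column,
# so both can still be plotted later.
grouped_by["total_per_cap"] = grouped_by["value"] / grouped_by["POP_ESTIMATE_2019"]

print(grouped_by[["Admin2", "value", "total_per_cap"]])
```

Keeping both columns means the raw-count map and the per-capita map can be drawn from the same dataframe.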

- color_continuous_scale="Viridis",
- range_color=(0, 300),
+ color_continuous_scale='viridis_r',
+ range_color=(0, 500),
Member

Why did you choose 500? Can we set this to something like max(per_cap) and use a variable instead of hard-coding a value?

Collaborator Author

Yes, you're right, and I'm working on that. I was thinking of using the third quartile (the 75th percentile) instead, because the max value, which is for New York, is much higher than in the other states, so it skews the coloring. I tried to use the quantile function, but range_color didn't accept my input. The same happens in the per capita case, but there a different state shows the highest case count, which is very strange, so I suspect my calculations might be wrong.
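A minimal sketch of the quartile idea, assuming `range_color` is given a plain pair of numbers: take the quantile of the plotted column, convert it to a Python float, and pass the pair as `range_color=color_range(plot_data['value'])`. The helper name `color_range` is hypothetical, and the toy series only mimics one outlier dominating the maximum.

```python
import pandas as pd


def color_range(values: pd.Series, q: float = 0.75) -> tuple:
    """Return a (low, high) pair for Plotly's range_color.

    The upper bound is the q-th quantile of `values` (q=0.75 is the third
    quartile; q=1.0 gives the maximum). float() turns the NumPy scalar
    returned by quantile() into a plain Python number.
    """
    return (0.0, float(values.quantile(q)))


# Toy values: one outlier dominates the maximum.
cases = pd.Series([1, 2, 3, 4, 100])
print(color_range(cases))         # (0.0, 4.0)
print(color_range(cases, q=1.0))  # (0.0, 100.0)
```

Counties above the upper bound are drawn in the top color of the scale, so the outlier no longer washes out the differences among the rest.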

Comment on lines +38 to +46
'''
# ax = sns.lineplot(x="date_iso", y="value", hue='Province_State', data=grouped_counts) # show cases per state monthly
# ax = sns.stripplot(x="date_iso", y="value", hue='Province_State', data=grouped_counts)
# ax = sns.violinplot(x='date_iso', y='value', hue='Province_State', data=grouped_counts, palette="Set2", split=True,
# scale="count", inner="quartile")
# ax = sns.countplot(x="date_iso", hue='Province_State', data=grouped_counts) # works better if there are certain dates
# plt.tight_layout()
# plt.show()
'''
Member

Why did you comment these out? We could also add these general values to the dashboard.
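If the summary plots come back, they need per-state daily totals in long format. A minimal sketch of building such a table with pandas, assuming the 'Province_State', 'date_iso', and 'value' columns referenced in the commented-out seaborn calls; how `grouped_counts` was actually built in this PR isn't shown, so this construction is an assumption.

```python
import pandas as pd

# Toy county-level rows (illustrative values only).
molten_df = pd.DataFrame({
    "Province_State": ["Alabama", "Alabama", "Alaska"],
    "date_iso": pd.to_datetime(["2020-03-01"] * 3),
    "value": [1, 2, 5],
})

# Sum county counts into one row per state per date: the long format that
# seaborn's lineplot(x="date_iso", y="value", hue="Province_State", ...) takes.
grouped_counts = (
    molten_df.groupby(["Province_State", "date_iso"], as_index=False)["value"].sum()
)
print(grouped_counts)
```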

# animation_frame='date',
hover_data=['Admin2', 'value', 'POP_ESTIMATE_2019'],
color_continuous_scale='viridis_r',
range_color=(0, plot_data['value'].max()),
Member

When you switch to the new column name, make sure you change this as well.




''' No newline at end of file
Member

Files should end with a newline.

Also, it might be worth having both the raw count and the per-capita count, with a toggle between the two maps.
Since the only real difference between the two versions of the plotting code is which column is plotted, we can write a function that takes a dataframe and the column to plot as input and returns the figure.

We could then use that function to build both plots and feed them into the dashboard.


confirmed_df = pd.read_csv('https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/'
'csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
loc_df = pd.read_excel(here('./data/db/original/maps/State_FIPS.xlsx'))
Member

Link to where you got the data from.

3 participants