0. Loading Libraries
import pandas as pd
from dfply import *
from plotnine import *
import ssl
1. Data Loading
## Added the ssl line below because reading the CSVs raised an SSL error on my local machine
ssl._create_default_https_context = ssl._create_unverified_context
groundhogs = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-01-30/groundhogs.csv')
predictions = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-01-30/predictions.csv')
2. Data Transformation
merged_df = pd.merge(predictions, groundhogs, on="id", how="left")
predictions_groundhog = (merged_df >>
    select(~X.description, ~X.details, ~X.source, ~X.current_prediction,
           ~X.image, ~X.slug, ~X.name, ~X.predictions_count, ~X.active) >>
    filter_by(X.is_groundhog == True))
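For readers who do not have dfply installed, the same merge, column drop, and filter can be sketched in plain pandas. This is a minimal illustration and not the code used above: the two small frames (`groundhogs_toy`, `predictions_toy`) are hypothetical stand-ins for the real CSV files, and the `validate` argument is an added assumption that each `id` appears only once in the groundhogs table.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two CSV files (not real data).
groundhogs_toy = pd.DataFrame({
    "id": [1, 2],
    "shortname": ["A", "B"],
    "is_groundhog": [True, False],
    "description": ["x", "y"],
})
predictions_toy = pd.DataFrame({
    "id": [1, 1, 2],
    "year": [2022, 2023, 2023],
    "shadow": [True, False, None],
})

# A left join keeps every prediction row; validate raises an error
# if an id is duplicated on the groundhogs side.
merged_toy = predictions_toy.merge(
    groundhogs_toy, on="id", how="left", validate="many_to_one"
)

# Drop an unneeded column, then keep only rows for real groundhogs.
result = (
    merged_toy
    .drop(columns=["description"])
    .loc[lambda df: df["is_groundhog"]]
)
print(result)
```

The `drop(columns=...)` call plays the role of the negated `select(~X...)` and the boolean `.loc` plays the role of `filter_by`.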
3. Plotting
3.1. Consensus of Groundhogs YoY
Code for Density Plot of Consensus
plot = (predictions_groundhog >>
    filter_by(X.shadow.notna()) >>
    ggplot(aes(x='year', color='shadow')) +
    geom_density() +
    scale_color_manual(name='Prediction',
                       labels=['Early Spring', 'Long Winter'],
                       values=['#8B4513', '#0000CD']) +
    theme_minimal(base_size=14) +
    theme(legend_position=(0.5, 0.5)) +
    theme(figure_size=(6,4)) +
    labs(title='Consensus of Groundhogs YoY'))
print(plot)
3.2. Count of Groundhogs YoY
Code for Line Plot of Groundhogs
plot = (predictions_groundhog >>
    filter_by(X.shadow.notna()) >>
    group_by(X.year) >>
    summarize(n = n(X.year)) >>
    ggplot(aes(x='year', y='n')) +
    geom_line() +
    theme_minimal() +
    labs(x='year', y='number of groundhogs', title='Count of Groundhogs by Year'))
print(plot)
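The yearly count behind the line plot can also be computed in plain pandas, which makes it easy to inspect the numbers directly. The sketch below uses a hypothetical toy frame, not the real data. It reports both the number of prediction rows per year (what `n(X.year)` counts) and the number of distinct groundhog ids per year; the two agree only if each groundhog has at most one recorded prediction per year, which is an assumption worth checking on the real data.

```python
import pandas as pd

# Hypothetical toy predictions (not real data); None marks a missing shadow value.
toy = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "year": [2022, 2022, 2023, 2023, 2023],
    "shadow": [True, None, False, True, True],
})

counts = (
    toy.dropna(subset=["shadow"])            # same role as filter_by(X.shadow.notna())
    .groupby("year")
    .agg(n=("id", "count"),                  # prediction rows per year
         n_groundhogs=("id", "nunique"))     # distinct groundhogs per year
    .reset_index()
)
print(counts)
```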
3.3. Consensus of Groundhogs YoY by ShortName
Code for Consensus Plot of Groundhogs by ShortName
plot = (predictions_groundhog >>
    filter_by(X.shadow.notna()) >>
    ggplot(aes(x='year', color='shadow')) +
    geom_histogram() +
    labs(title='Consensus of Groundhogs YoY') +
    facet_wrap('~shortname') +
    scale_color_manual(name='Prediction',
                       labels=['Early Spring', 'Long Winter'],
                       values=['#8B4513', '#0000CD']) +
    theme_minimal(base_size=14) +
    theme(figure_size=(20,12))
)
print(plot)
I believe this is a simplified representation of the consensus, useful for gaining initial insights.
Across the groundhogs included in this observational data, the majority predicted an early spring.
However, since we currently lack weather data, we cannot cross-verify which predictions were accurate.
With the exception of Phil, most groundhogs forecast an early spring.
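One way to put numbers behind these observations is to compute, per groundhog, the share of predictions that called for an early spring. The sketch below is only an illustration on a hypothetical toy frame (the names and values are made up), and it assumes that `shadow == True` means a long winter and `shadow == False` means an early spring, matching the legend labels used in the plots.

```python
import pandas as pd

# Hypothetical toy predictions (not real data).
toy = pd.DataFrame({
    "shortname": ["Phil", "Phil", "Other", "Other", "Other"],
    "year": [2022, 2023, 2022, 2023, 2024],
    "shadow": [True, True, False, False, True],
})

# Assumption: shadow == True means "long winter", False means "early spring".
early_share = (
    toy.dropna(subset=["shadow"])
    .assign(early_spring=lambda df: ~df["shadow"].astype(bool))
    .groupby("shortname")["early_spring"]
    .mean()                                  # fraction of early-spring calls
)
print(early_share)
```

A share above 0.5 for a given groundhog means it predicted an early spring more often than a long winter.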