0. Loading Libraries
import pandas as pd
from dfply import *
from plotnine import *
import ssl
1. Data Loading and Data Manipulation
# Work around an SSL certificate error on my local machine by disabling HTTPS certificate verification
ssl._create_default_https_context = ssl._create_unverified_context
data = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-03-12/fiscal_sponsor_directory.csv')
data = ( data >>
    # Years of sponsorship as of 2022, plus the first percentage-like token from the fee description
    mutate(fiscal_sponsor_since = 2022 - X.year_fiscal_sponsor,
           fee = X['fiscal_sponsorship_fee_description'].str.extract(r'(\d[-+]?%?.?\d?%)')
    ) >>
    # Keep only the leading number of that token (possessive quantifiers need Python 3.11+)
    mutate(fee = X['fee'].str.extract(r'(^\d?+[\.]?\d?+)'))
)
data['fee'] = data['fee'].astype(float)
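To see what the two-step fee extraction does, the regular expressions can be traced on a few strings with the standard `re` module. This is only a sketch: the sample descriptions are made up rather than taken from the dataset, `extract_fee` is a hypothetical helper, and the second pattern is rewritten with plain greedy quantifiers (an assumption that it matches the same prefix as the possessive version) so that it also runs on Python versions before 3.11.

```python
import re

# Same first-stage pattern as in the notebook: a percentage-like token
# such as "10%", "7.5%" or "5-10%".
FEE_TOKEN = re.compile(r'(\d[-+]?%?.?\d?%)')
# Greedy stand-in (an assumption) for the notebook's possessive pattern
# r'(^\d?+[\.]?\d?+)'.
LEADING_NUMBER = re.compile(r'^(\d?\.?\d?)')


def extract_fee(description):
    """Return the leading number of the first percentage-like token, or None."""
    token = FEE_TOKEN.search(description)
    if token is None:
        return None
    number = LEADING_NUMBER.match(token.group(1)).group(1)
    return float(number) if number else None


# Hypothetical descriptions, not rows from the dataset.
samples = [
    "Our fee is 10% of revenue",
    "7.5% of all income",
    "5-10% depending on budget",
    "Sliding scale",
]
for text in samples:
    print(text, "->", extract_fee(text))
```

Note that a range such as "5-10%" resolves to its lower bound, so the `fee` column is best read as the starting fee.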
2. Plotting
2.1. Total Sponsorships Duration and Projects
Total Sponsorships Duration and Projects Plot Code
(data >>
 ggplot(aes(x="fiscal_sponsor_since", y="n_sponsored", color="year_fiscal_sponsor")) +
 geom_point() +
 theme_minimal() +
 labs(x = "Years since Fiscal Sponsorship",
      y = "Number of Sponsored Projects",
      title = "Relationship between Total Sponsorships and Duration of Sponsorship",
      subtitle = "Legend of colors is on Fiscal Sponsorship year",
      caption = "Axis is zoomed to have a better visualization"
 ) +
 guides(color=guide_legend(title="")) + # Use an empty string to remove the legend title
 theme(legend_position=(0.5, 0.85),
       panel_grid_minor_x = element_blank(),
       panel_grid_minor_y = element_blank()
 ) +
 coord_cartesian(ylim = (0, 400),
                 xlim = (0, 75)
 ) +
 annotate("text",
          x = 60,
          y = 200,
          label = "Older Sponsors doesn't mean \n more sponsored projects ",
          color = "brown",
          size = 8
 ) )
Sponsors that began sponsoring in recent years tend to have fewer sponsored projects, which matches intuition. However, sponsors that started fifty years ago have a number of projects comparable to that of more recent sponsors. Within the last twenty years, the number of projects appears to grow roughly linearly with the number of years since sponsorship began.
2.2. Histogram of Total Sponsored Projects
Histogram of Total Sponsored Projects Plot Code
(data >>
 ggplot(aes(x="n_sponsored")) +
 geom_histogram(position="dodge", fill = "#4169E1") +
 theme_minimal() +
 labs(x = "Number of Sponsored Projects",
      y = "",
      title = "Histogram of total Sponsored projects"
 ) +
 theme(panel_grid_major_y = element_blank(),
       panel_grid_minor_x = element_blank(),
       panel_grid_minor_y = element_blank()
 ) +
 annotate("text",
          x = 200,
          y = 80,
          label = "Most of the sponsors \n sponsored only one/two project ",
          color = "brown",
          size = 8
 ) )
Most sponsors have one or two projects, while a few have a significantly higher number, indicating a long tail of active involvement.
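A long tail usually shows up as a mean that sits far above the median. The following sketch illustrates that check with the standard library; `tail_summary` is a hypothetical helper and the project counts are invented for illustration, not taken from the `n_sponsored` column.

```python
from collections import Counter
from statistics import mean, median


def tail_summary(counts):
    """Summarise a count variable: two most frequent values, median and mean."""
    freq = Counter(counts)
    return {
        "most_common": freq.most_common(2),
        "median": median(counts),
        "mean": mean(counts),
    }


# Invented project counts: mostly one or two, with two very active sponsors.
toy_counts = [1, 1, 2, 1, 2, 3, 40, 250]
summary = tail_summary(toy_counts)
print(summary)
```

Here the median stays at the typical value while the mean is pulled upward by the two large sponsors, which is the signature the histogram hints at.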
2.3. Histogram of Fiscal Sponsorship Fee
Histogram of Fiscal Sponsorship Fee Plot Code
(data >>
 ggplot(aes(x="fee")) +
 geom_histogram(position="dodge", fill = "#708090") +
 theme_minimal() +
 labs(x = "Sponsorship Fee%",
      y = "",
      title = "Histogram of fees associated with fiscal sponsorship",
      caption = "Axis is zoomed to have a better visualization"
 ) +
 theme(panel_grid_major_y = element_blank(),
       panel_grid_minor_x = element_blank(),
       panel_grid_minor_y = element_blank()
 ) +
 coord_cartesian(xlim = (0, 15),
 ) +
 annotate("text",
          x = 13,
          y = 50,
          label = "Median range of sponsorship \n fee is b/w 5% & 10% ",
          color = "brown",
          size = 8
 ) )
As annotated in the plot, most fiscal sponsorship fees start between 5% and 10%.
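The 5% to 10% reading can be backed by two numbers: the median fee and the share of sponsors whose fee falls in that band. This is a minimal sketch with made-up fees; `fee_summary` is a hypothetical helper, and in the notebook the input would presumably be the numeric `fee` column with missing values dropped.

```python
from statistics import median


def fee_summary(fees, low=5.0, high=10.0):
    """Return the median fee and the share of known fees within [low, high]."""
    known = [f for f in fees if f is not None]
    in_range = sum(low <= f <= high for f in known)
    return median(known), in_range / len(known)


# Made-up fee percentages; None stands for a description with no parsable fee.
toy_fees = [5.0, 5.0, 7.5, 10.0, 10.0, 3.0, 15.0, 8.0, None]
median_fee, share_in_band = fee_summary(toy_fees)
print(f"median={median_fee}, share in 5-10% band={share_in_band:.0%}")
```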