Lecture 20 In-Class Exercise Solution#
Download data#
!mkdir -p data
!wget -O data/electricity.zip -nc "https://api.worldbank.org/v2/country/all/indicator/EG.ELC.ACCS.ZS?source=2&date=2002:2022&downloadformat=csv"
File ‘data/electricity.zip’ already there; not retrieving.
!rm -rf data/electricity
!unzip data/electricity.zip -d data/electricity
Archive: data/electricity.zip
inflating: data/electricity/Metadata_Indicator_API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv
inflating: data/electricity/API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv
inflating: data/electricity/Metadata_Country_API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv
Load data#
import pandas as pd
electricity = pd.read_csv("./data/electricity/API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv", header=2)
electricity
Country Name | Country Code | Indicator Name | Indicator Code | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | Unnamed: 25 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | ... | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 99.900000 | NaN |
1 | Africa Eastern and Southern | AFE | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 21.580282 | 22.516962 | 23.753953 | 23.487376 | 25.198527 | 26.807288 | ... | 31.860474 | 33.903800 | 38.854624 | 40.199898 | 43.017148 | 44.381259 | 46.264875 | 48.100862 | 48.711995 | NaN |
2 | Afghanistan | AFG | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 14.100000 | 19.000000 | 23.800000 | 28.700000 | 33.500000 | 38.400000 | ... | 89.500000 | 71.500000 | 97.700000 | 97.700000 | 93.400000 | 97.700000 | 97.700000 | 97.700000 | 85.300000 | NaN |
3 | Africa Western and Central | AFW | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 35.700600 | 39.821304 | 37.492583 | 37.784220 | 39.830830 | 40.743670 | ... | 47.662536 | 46.758739 | 50.906115 | 48.789457 | 51.211055 | 51.168083 | 51.730899 | 54.224724 | 55.437577 | NaN |
4 | Angola | AGO | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 26.300000 | 27.400000 | 28.400000 | 29.400000 | 30.500000 | 37.500000 | ... | 32.000000 | 42.000000 | 41.800000 | 42.900000 | 45.300000 | 45.600000 | 47.000000 | 48.200000 | 48.500000 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
261 | Kosovo | XKX | Access to electricity (% of population) | EG.ELC.ACCS.ZS | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
262 | Yemen, Rep. | YEM | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 51.500000 | 52.600000 | 49.600000 | 54.800000 | 55.800000 | 57.200000 | ... | 66.100000 | 67.400000 | 68.900000 | 79.200000 | 62.000000 | 72.800000 | 73.900000 | 74.900000 | 76.000000 | NaN |
263 | South Africa | ZAF | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 76.700000 | 78.800000 | 80.600000 | 80.800000 | 80.700000 | 82.000000 | ... | 85.900000 | 85.300000 | 83.900000 | 84.400000 | 84.700000 | 85.000000 | 90.000000 | 89.300000 | 86.500000 | NaN |
264 | Zambia | ZMB | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 17.400000 | 18.500000 | 20.300000 | 23.300000 | 24.100000 | 18.500000 | ... | 27.900000 | 31.100000 | 35.400000 | 40.300000 | 40.200000 | 43.000000 | 44.600000 | 46.700000 | 47.800000 | NaN |
265 | Zimbabwe | ZWE | Access to electricity (% of population) | EG.ELC.ACCS.ZS | 34.200000 | 35.100000 | 35.600000 | 36.100000 | 37.200000 | 37.200000 | ... | 32.300000 | 33.700000 | 42.500000 | 44.000000 | 45.400000 | 46.700000 | 52.700000 | 49.000000 | 50.100000 | NaN |
266 rows × 26 columns
Reshape#
electricity_by_country_by_year = (
electricity.drop(columns=["Indicator Name", "Indicator Code"])
.melt(
id_vars=["Country Name", "Country Code"],
var_name="Year",
value_name="Access to electricity (% of population)",
)
.dropna()
)
electricity_by_country_by_year
Country Name | Country Code | Year | Access to electricity (% of population) | |
---|---|---|---|---|
0 | Aruba | ABW | 2002 | 100.000000 |
1 | Africa Eastern and Southern | AFE | 2002 | 21.580282 |
2 | Afghanistan | AFG | 2002 | 14.100000 |
3 | Africa Western and Central | AFW | 2002 | 35.700600 |
4 | Angola | AGO | 2002 | 26.300000 |
... | ... | ... | ... | ... |
5580 | Samoa | WSM | 2022 | 98.300000 |
5582 | Yemen, Rep. | YEM | 2022 | 76.000000 |
5583 | South Africa | ZAF | 2022 | 86.500000 |
5584 | Zambia | ZMB | 2022 | 47.800000 |
5585 | Zimbabwe | ZWE | 2022 | 50.100000 |
5506 rows × 4 columns
Plot#
import plotly.express as px
fig = px.line(
electricity_by_country_by_year,
x="Year",
y="Access to electricity (% of population)",
color="Country Name",
title="Access to electricity over time",
)
fig.show()
Plotly showing that many values on an axis (the Year
in this case) indicates that it’s a string.
Fix year dtype#
electricity_by_country_by_year.dtypes
Country Name object
Country Code object
Year object
Access to electricity (% of population) float64
dtype: object
electricity_by_country_by_year["Year"] = electricity_by_country_by_year["Year"].astype(int)
Filter#
avg_electricity = electricity_by_country_by_year.groupby("Country Name")["Access to electricity (% of population)"].mean()
avg_electricity
Country Name
Afghanistan 61.876190
Africa Eastern and Southern 32.878831
Africa Western and Central 45.321271
Albania 99.757143
Algeria 99.142857
...
West Bank and Gaza 99.742857
World 85.355292
Yemen, Rep. 63.404762
Zambia 30.523810
Zimbabwe 40.633333
Name: Access to electricity (% of population), Length: 263, dtype: float64
bottom_cutoff = avg_electricity.quantile(0.05)
bottom_cutoff
np.float64(21.837142857142858)
bottom_countries = avg_electricity[avg_electricity < bottom_cutoff]
bottom_countries
Country Name
Burkina Faso 14.676190
Burundi 6.547619
Central African Republic 10.990476
Chad 7.566667
Congo, Dem. Rep. 14.438095
Guinea-Bissau 16.771429
Liberia 14.768750
Madagascar 21.823810
Malawi 9.547619
Mozambique 20.528571
Niger 14.133333
Papua New Guinea 16.957143
Sierra Leone 17.604762
South Sudan 4.475000
Name: Access to electricity (% of population), dtype: float64
bottom_countries_electricity = electricity_by_country_by_year[electricity_by_country_by_year["Country Name"].isin(bottom_countries.index)]
fig = px.line(
bottom_countries_electricity,
x="Year",
y="Access to electricity (% of population)",
color="Country Name",
title="Access to electricity over time by country, 5th percentile",
)
fig.show()
World Happiness Report data#
Requires xlrd
package. How to install.
happiness = pd.read_excel("https://happiness-report.s3.amazonaws.com/2024/DataForTable2.1.xls")
happiness
Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.350416 | 0.450662 | 50.500000 | 0.718114 | 0.164055 | 0.881686 | 0.414297 | 0.258195 |
1 | Afghanistan | 2009 | 4.401778 | 7.508646 | 0.552308 | 50.799999 | 0.678896 | 0.187297 | 0.850035 | 0.481421 | 0.237092 |
2 | Afghanistan | 2010 | 4.758381 | 7.613900 | 0.539075 | 51.099998 | 0.600127 | 0.117861 | 0.706766 | 0.516907 | 0.275324 |
3 | Afghanistan | 2011 | 3.831719 | 7.581259 | 0.521104 | 51.400002 | 0.495901 | 0.160098 | 0.731109 | 0.479835 | 0.267175 |
4 | Afghanistan | 2012 | 3.782938 | 7.660506 | 0.520637 | 51.700001 | 0.530935 | 0.234157 | 0.775620 | 0.613513 | 0.267919 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2358 | Zimbabwe | 2019 | 2.693523 | 7.697755 | 0.759162 | 53.099998 | 0.631908 | -0.050874 | 0.830652 | 0.658434 | 0.235354 |
2359 | Zimbabwe | 2020 | 3.159802 | 7.596050 | 0.717243 | 53.575001 | 0.643303 | 0.002848 | 0.788523 | 0.660658 | 0.345736 |
2360 | Zimbabwe | 2021 | 3.154578 | 7.656878 | 0.685151 | 54.049999 | 0.667636 | -0.079007 | 0.756945 | 0.609917 | 0.241682 |
2361 | Zimbabwe | 2022 | 3.296220 | 7.670073 | 0.666172 | 54.525002 | 0.651987 | -0.072935 | 0.752632 | 0.640609 | 0.191350 |
2362 | Zimbabwe | 2023 | 3.572386 | 7.678590 | 0.693817 | 55.000000 | 0.734613 | -0.068837 | 0.757494 | 0.609752 | 0.178953 |
2363 rows × 11 columns
fig = px.line(happiness, x="year", y="Life Ladder", color="Country name", title="World Happiness")
fig = fig.show()
happiness.dtypes
Country name object
year int64
Life Ladder float64
Log GDP per capita float64
Social support float64
Healthy life expectancy at birth float64
Freedom to make life choices float64
Generosity float64
Perceptions of corruption float64
Positive affect float64
Negative affect float64
dtype: object
Single country#
country_name = "Madagascar"
madagascar_electricity = electricity_by_country_by_year[electricity_by_country_by_year["Country Name"] == country_name]
madagascar_electricity
Country Name | Country Code | Year | Access to electricity (% of population) | |
---|---|---|---|---|
151 | Madagascar | MDG | 2002 | 14.0 |
417 | Madagascar | MDG | 2003 | 20.3 |
683 | Madagascar | MDG | 2004 | 15.3 |
949 | Madagascar | MDG | 2005 | 16.0 |
1215 | Madagascar | MDG | 2006 | 16.8 |
1481 | Madagascar | MDG | 2007 | 17.5 |
1747 | Madagascar | MDG | 2008 | 18.3 |
2013 | Madagascar | MDG | 2009 | 17.4 |
2279 | Madagascar | MDG | 2010 | 12.3 |
2545 | Madagascar | MDG | 2011 | 14.3 |
2811 | Madagascar | MDG | 2012 | 18.7 |
3077 | Madagascar | MDG | 2013 | 12.9 |
3343 | Madagascar | MDG | 2014 | 23.0 |
3609 | Madagascar | MDG | 2015 | 23.8 |
3875 | Madagascar | MDG | 2016 | 22.9 |
4141 | Madagascar | MDG | 2017 | 24.1 |
4407 | Madagascar | MDG | 2018 | 36.5 |
4673 | Madagascar | MDG | 2019 | 31.0 |
4939 | Madagascar | MDG | 2020 | 32.0 |
5205 | Madagascar | MDG | 2021 | 35.1 |
5471 | Madagascar | MDG | 2022 | 36.1 |
madagascar_happiness = happiness[happiness["Country name"] == country_name]
madagascar_happiness
Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect | |
---|---|---|---|---|---|---|---|---|---|---|---|
1250 | Madagascar | 2006 | 3.979751 | 7.351137 | 0.711135 | 54.139999 | NaN | -0.041855 | NaN | 0.562594 | 0.161333 |
1251 | Madagascar | 2008 | 4.640079 | 7.413261 | 0.775689 | 54.619999 | 0.332436 | -0.102579 | 0.773067 | 0.582537 | 0.214525 |
1252 | Madagascar | 2011 | 4.381415 | 7.308839 | 0.818403 | 55.340000 | 0.545556 | -0.065021 | 0.897100 | 0.516020 | 0.234826 |
1253 | Madagascar | 2012 | 3.550610 | 7.311225 | 0.673088 | 55.580002 | 0.487008 | -0.058006 | 0.853590 | 0.587817 | 0.193977 |
1254 | Madagascar | 2013 | 3.815607 | 7.307252 | 0.672547 | 55.820000 | 0.479550 | -0.021556 | 0.867708 | 0.600077 | 0.241231 |
1255 | Madagascar | 2014 | 3.675627 | 7.313828 | 0.655214 | 56.060001 | 0.528805 | -0.026291 | 0.791056 | 0.641384 | 0.192182 |
1256 | Madagascar | 2015 | 3.592514 | 7.318788 | 0.646717 | 56.299999 | 0.544754 | -0.044008 | 0.860953 | 0.674233 | 0.226243 |
1257 | Madagascar | 2016 | 3.663086 | 7.332083 | 0.746497 | 56.549999 | 0.569645 | -0.072243 | 0.864171 | 0.669859 | 0.204255 |
1258 | Madagascar | 2017 | 4.078620 | 7.344820 | 0.626332 | 56.799999 | 0.570348 | -0.036631 | 0.847261 | 0.701162 | 0.374838 |
1259 | Madagascar | 2018 | 4.070587 | 7.350723 | 0.665513 | 57.049999 | 0.551473 | 0.000118 | 0.889146 | 0.723372 | 0.362014 |
1260 | Madagascar | 2019 | 4.339087 | 7.368637 | 0.700610 | 57.299999 | 0.549535 | -0.014780 | 0.719983 | 0.698720 | 0.303960 |
1261 | Madagascar | 2022 | 4.019134 | 7.314874 | 0.642429 | 58.049999 | 0.523493 | 0.070040 | 0.740403 | 0.686895 | 0.345107 |
1262 | Madagascar | 2023 | 4.433001 | 7.332643 | 0.692405 | 58.299999 | 0.503716 | 0.095865 | 0.783531 | 0.718467 | 0.389158 |
Two Y Axes#
Based on Plotly example.
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])
# Add traces
fig.add_trace(
go.Scatter(
x=madagascar_electricity["Year"],
y=madagascar_electricity["Access to electricity (% of population)"],
name="Access to electricity",
),
secondary_y=False,
)
fig.add_trace(
go.Scatter(
x=madagascar_happiness["year"],
y=madagascar_happiness["Life Ladder"],
name="Life Ladder",
),
secondary_y=True,
)
# Add figure title
fig.update_layout(title_text=f"Access to electricity vs. Happiness, {country_name}")
# Set x-axis title
fig.update_xaxes(title_text="Year")
# Set y-axes titles
fig.update_yaxes(title_text="% of population", secondary_y=False)
fig.update_yaxes(title_text="Happiness score", secondary_y=True)
fig.show()