Lecture 20 In-Class Exercise Solution#

Download data#

More about the URL below.

!mkdir -p data
!wget -O data/electricity.zip -nc "https://api.worldbank.org/v2/country/all/indicator/EG.ELC.ACCS.ZS?source=2&date=2002:2022&downloadformat=csv"
File ‘data/electricity.zip’ already there; not retrieving.
!rm -rf data/electricity
!unzip data/electricity.zip -d data/electricity
Archive:  data/electricity.zip
  inflating: data/electricity/Metadata_Indicator_API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv  
  inflating: data/electricity/API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv  
  inflating: data/electricity/Metadata_Country_API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv  

Load data#

import pandas as pd

electricity = pd.read_csv("./data/electricity/API_EG.ELC.ACCS.ZS_DS2_EN_csv_v2_12260.csv", header=2)
electricity
Country Name Country Code Indicator Name Indicator Code 2002 2003 2004 2005 2006 2007 ... 2014 2015 2016 2017 2018 2019 2020 2021 2022 Unnamed: 25
0 Aruba ABW Access to electricity (% of population) EG.ELC.ACCS.ZS 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 ... 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 99.900000 NaN
1 Africa Eastern and Southern AFE Access to electricity (% of population) EG.ELC.ACCS.ZS 21.580282 22.516962 23.753953 23.487376 25.198527 26.807288 ... 31.860474 33.903800 38.854624 40.199898 43.017148 44.381259 46.264875 48.100862 48.711995 NaN
2 Afghanistan AFG Access to electricity (% of population) EG.ELC.ACCS.ZS 14.100000 19.000000 23.800000 28.700000 33.500000 38.400000 ... 89.500000 71.500000 97.700000 97.700000 93.400000 97.700000 97.700000 97.700000 85.300000 NaN
3 Africa Western and Central AFW Access to electricity (% of population) EG.ELC.ACCS.ZS 35.700600 39.821304 37.492583 37.784220 39.830830 40.743670 ... 47.662536 46.758739 50.906115 48.789457 51.211055 51.168083 51.730899 54.224724 55.437577 NaN
4 Angola AGO Access to electricity (% of population) EG.ELC.ACCS.ZS 26.300000 27.400000 28.400000 29.400000 30.500000 37.500000 ... 32.000000 42.000000 41.800000 42.900000 45.300000 45.600000 47.000000 48.200000 48.500000 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 Kosovo XKX Access to electricity (% of population) EG.ELC.ACCS.ZS NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
262 Yemen, Rep. YEM Access to electricity (% of population) EG.ELC.ACCS.ZS 51.500000 52.600000 49.600000 54.800000 55.800000 57.200000 ... 66.100000 67.400000 68.900000 79.200000 62.000000 72.800000 73.900000 74.900000 76.000000 NaN
263 South Africa ZAF Access to electricity (% of population) EG.ELC.ACCS.ZS 76.700000 78.800000 80.600000 80.800000 80.700000 82.000000 ... 85.900000 85.300000 83.900000 84.400000 84.700000 85.000000 90.000000 89.300000 86.500000 NaN
264 Zambia ZMB Access to electricity (% of population) EG.ELC.ACCS.ZS 17.400000 18.500000 20.300000 23.300000 24.100000 18.500000 ... 27.900000 31.100000 35.400000 40.300000 40.200000 43.000000 44.600000 46.700000 47.800000 NaN
265 Zimbabwe ZWE Access to electricity (% of population) EG.ELC.ACCS.ZS 34.200000 35.100000 35.600000 36.100000 37.200000 37.200000 ... 32.300000 33.700000 42.500000 44.000000 45.400000 46.700000 52.700000 49.000000 50.100000 NaN

266 rows × 26 columns

Reshape#

electricity_by_country_by_year = (
    electricity.drop(columns=["Indicator Name", "Indicator Code"])
    .melt(
        id_vars=["Country Name", "Country Code"],
        var_name="Year",
        value_name="Access to electricity (% of population)",
    )
    .dropna()
)

electricity_by_country_by_year
Country Name Country Code Year Access to electricity (% of population)
0 Aruba ABW 2002 100.000000
1 Africa Eastern and Southern AFE 2002 21.580282
2 Afghanistan AFG 2002 14.100000
3 Africa Western and Central AFW 2002 35.700600
4 Angola AGO 2002 26.300000
... ... ... ... ...
5580 Samoa WSM 2022 98.300000
5582 Yemen, Rep. YEM 2022 76.000000
5583 South Africa ZAF 2022 86.500000
5584 Zambia ZMB 2022 47.800000
5585 Zimbabwe ZWE 2022 50.100000

5506 rows × 4 columns

Plot#

import plotly.express as px

fig = px.line(
    electricity_by_country_by_year,
    x="Year",
    y="Access to electricity (% of population)",
    color="Country Name",
    title="Access to electricity over time",
)
fig.show()

Plotly showing that many values on an axis (the Year in this case) indicates that it’s a string.

Fix year dtype#

electricity_by_country_by_year.dtypes
Country Name                                object
Country Code                                object
Year                                        object
Access to electricity (% of population)    float64
dtype: object
electricity_by_country_by_year["Year"] = electricity_by_country_by_year["Year"].astype(int)

Filter#

avg_electricity = electricity_by_country_by_year.groupby("Country Name")["Access to electricity (% of population)"].mean()
avg_electricity
Country Name
Afghanistan                    61.876190
Africa Eastern and Southern    32.878831
Africa Western and Central     45.321271
Albania                        99.757143
Algeria                        99.142857
                                 ...    
West Bank and Gaza             99.742857
World                          85.355292
Yemen, Rep.                    63.404762
Zambia                         30.523810
Zimbabwe                       40.633333
Name: Access to electricity (% of population), Length: 263, dtype: float64
bottom_cutoff = avg_electricity.quantile(0.05)
bottom_cutoff
np.float64(21.837142857142858)
bottom_countries = avg_electricity[avg_electricity < bottom_cutoff]
bottom_countries
Country Name
Burkina Faso                14.676190
Burundi                      6.547619
Central African Republic    10.990476
Chad                         7.566667
Congo, Dem. Rep.            14.438095
Guinea-Bissau               16.771429
Liberia                     14.768750
Madagascar                  21.823810
Malawi                       9.547619
Mozambique                  20.528571
Niger                       14.133333
Papua New Guinea            16.957143
Sierra Leone                17.604762
South Sudan                  4.475000
Name: Access to electricity (% of population), dtype: float64
bottom_countries_electricity = electricity_by_country_by_year[electricity_by_country_by_year["Country Name"].isin(bottom_countries.index)]

fig = px.line(
    bottom_countries_electricity,
    x="Year",
    y="Access to electricity (% of population)",
    color="Country Name",
    title="Access to electricity over time by country, 5th percentile",
)
fig.show()

World Happiness Report data#

Requires xlrd package. How to install.

happiness = pd.read_excel("https://happiness-report.s3.amazonaws.com/2024/DataForTable2.1.xls")
happiness
Country name year Life Ladder Log GDP per capita Social support Healthy life expectancy at birth Freedom to make life choices Generosity Perceptions of corruption Positive affect Negative affect
0 Afghanistan 2008 3.723590 7.350416 0.450662 50.500000 0.718114 0.164055 0.881686 0.414297 0.258195
1 Afghanistan 2009 4.401778 7.508646 0.552308 50.799999 0.678896 0.187297 0.850035 0.481421 0.237092
2 Afghanistan 2010 4.758381 7.613900 0.539075 51.099998 0.600127 0.117861 0.706766 0.516907 0.275324
3 Afghanistan 2011 3.831719 7.581259 0.521104 51.400002 0.495901 0.160098 0.731109 0.479835 0.267175
4 Afghanistan 2012 3.782938 7.660506 0.520637 51.700001 0.530935 0.234157 0.775620 0.613513 0.267919
... ... ... ... ... ... ... ... ... ... ... ...
2358 Zimbabwe 2019 2.693523 7.697755 0.759162 53.099998 0.631908 -0.050874 0.830652 0.658434 0.235354
2359 Zimbabwe 2020 3.159802 7.596050 0.717243 53.575001 0.643303 0.002848 0.788523 0.660658 0.345736
2360 Zimbabwe 2021 3.154578 7.656878 0.685151 54.049999 0.667636 -0.079007 0.756945 0.609917 0.241682
2361 Zimbabwe 2022 3.296220 7.670073 0.666172 54.525002 0.651987 -0.072935 0.752632 0.640609 0.191350
2362 Zimbabwe 2023 3.572386 7.678590 0.693817 55.000000 0.734613 -0.068837 0.757494 0.609752 0.178953

2363 rows × 11 columns

fig = px.line(happiness, x="year", y="Life Ladder", color="Country name", title="World Happiness")
fig = fig.show()
happiness.dtypes
Country name                         object
year                                  int64
Life Ladder                         float64
Log GDP per capita                  float64
Social support                      float64
Healthy life expectancy at birth    float64
Freedom to make life choices        float64
Generosity                          float64
Perceptions of corruption           float64
Positive affect                     float64
Negative affect                     float64
dtype: object

Single country#

country_name = "Madagascar"

madagascar_electricity = electricity_by_country_by_year[electricity_by_country_by_year["Country Name"] == country_name]
madagascar_electricity
Country Name Country Code Year Access to electricity (% of population)
151 Madagascar MDG 2002 14.0
417 Madagascar MDG 2003 20.3
683 Madagascar MDG 2004 15.3
949 Madagascar MDG 2005 16.0
1215 Madagascar MDG 2006 16.8
1481 Madagascar MDG 2007 17.5
1747 Madagascar MDG 2008 18.3
2013 Madagascar MDG 2009 17.4
2279 Madagascar MDG 2010 12.3
2545 Madagascar MDG 2011 14.3
2811 Madagascar MDG 2012 18.7
3077 Madagascar MDG 2013 12.9
3343 Madagascar MDG 2014 23.0
3609 Madagascar MDG 2015 23.8
3875 Madagascar MDG 2016 22.9
4141 Madagascar MDG 2017 24.1
4407 Madagascar MDG 2018 36.5
4673 Madagascar MDG 2019 31.0
4939 Madagascar MDG 2020 32.0
5205 Madagascar MDG 2021 35.1
5471 Madagascar MDG 2022 36.1
madagascar_happiness = happiness[happiness["Country name"] == country_name]
madagascar_happiness
Country name year Life Ladder Log GDP per capita Social support Healthy life expectancy at birth Freedom to make life choices Generosity Perceptions of corruption Positive affect Negative affect
1250 Madagascar 2006 3.979751 7.351137 0.711135 54.139999 NaN -0.041855 NaN 0.562594 0.161333
1251 Madagascar 2008 4.640079 7.413261 0.775689 54.619999 0.332436 -0.102579 0.773067 0.582537 0.214525
1252 Madagascar 2011 4.381415 7.308839 0.818403 55.340000 0.545556 -0.065021 0.897100 0.516020 0.234826
1253 Madagascar 2012 3.550610 7.311225 0.673088 55.580002 0.487008 -0.058006 0.853590 0.587817 0.193977
1254 Madagascar 2013 3.815607 7.307252 0.672547 55.820000 0.479550 -0.021556 0.867708 0.600077 0.241231
1255 Madagascar 2014 3.675627 7.313828 0.655214 56.060001 0.528805 -0.026291 0.791056 0.641384 0.192182
1256 Madagascar 2015 3.592514 7.318788 0.646717 56.299999 0.544754 -0.044008 0.860953 0.674233 0.226243
1257 Madagascar 2016 3.663086 7.332083 0.746497 56.549999 0.569645 -0.072243 0.864171 0.669859 0.204255
1258 Madagascar 2017 4.078620 7.344820 0.626332 56.799999 0.570348 -0.036631 0.847261 0.701162 0.374838
1259 Madagascar 2018 4.070587 7.350723 0.665513 57.049999 0.551473 0.000118 0.889146 0.723372 0.362014
1260 Madagascar 2019 4.339087 7.368637 0.700610 57.299999 0.549535 -0.014780 0.719983 0.698720 0.303960
1261 Madagascar 2022 4.019134 7.314874 0.642429 58.049999 0.523493 0.070040 0.740403 0.686895 0.345107
1262 Madagascar 2023 4.433001 7.332643 0.692405 58.299999 0.503716 0.095865 0.783531 0.718467 0.389158

Two Y Axes#

Based on Plotly example.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(
        x=madagascar_electricity["Year"],
        y=madagascar_electricity["Access to electricity (% of population)"],
        name="Access to electricity",
    ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(
        x=madagascar_happiness["year"],
        y=madagascar_happiness["Life Ladder"],
        name="Life Ladder",
    ),
    secondary_y=True,
)

# Add figure title
fig.update_layout(title_text=f"Access to electricity vs. Happiness, {country_name}")

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="% of population", secondary_y=False)
fig.update_yaxes(title_text="Happiness score", secondary_y=True)

fig.show()