Lecture 20 exercise - section 1#
Live-coded solution.
Electricity data#
import pandas as pd
electricity = pd.read_csv("data/P_Data_Extract_From_World_Development_Indicators/0a6eb0d7-9c99-4d18-bf17-7839ff3ad1ed_Data.csv")
electricity
Series Name | Series Code | Country Name | Country Code | 1990 [YR1990] | 2000 [YR2000] | 2014 [YR2014] | 2015 [YR2015] | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] | 2019 [YR2019] | 2020 [YR2020] | 2021 [YR2021] | 2022 [YR2022] | 2023 [YR2023] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Afghanistan | AFG | .. | 4.4 | 89.5 | 71.5 | 97.7 | 97.7 | 93.4 | 97.7 | 97.7 | 97.7 | 85.3 | .. |
1 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Albania | ALB | 100 | 99.4 | 100 | 100 | 99.9 | 99.9 | 100 | 100 | 100 | 100 | 100 | .. |
2 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Algeria | DZA | .. | 98.6 | 99.3 | 99.4 | 99.4 | 99.5 | 99.6 | 99.5 | 99.7 | 99.8 | 100 | .. |
3 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | American Samoa | ASM | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
4 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Andorra | AND | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | .. |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
266 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
267 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
268 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
269 | Data from database: World Development Indicators | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
270 | Last Updated: 10/24/2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
271 rows × 16 columns
Clean#
electricity.fillna(0)
Series Name | Series Code | Country Name | Country Code | 1990 [YR1990] | 2000 [YR2000] | 2014 [YR2014] | 2015 [YR2015] | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] | 2019 [YR2019] | 2020 [YR2020] | 2021 [YR2021] | 2022 [YR2022] | 2023 [YR2023] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Afghanistan | AFG | .. | 4.4 | 89.5 | 71.5 | 97.7 | 97.7 | 93.4 | 97.7 | 97.7 | 97.7 | 85.3 | .. |
1 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Albania | ALB | 100 | 99.4 | 100 | 100 | 99.9 | 99.9 | 100 | 100 | 100 | 100 | 100 | .. |
2 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Algeria | DZA | .. | 98.6 | 99.3 | 99.4 | 99.4 | 99.5 | 99.6 | 99.5 | 99.7 | 99.8 | 100 | .. |
3 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | American Samoa | ASM | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
4 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Andorra | AND | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | .. |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
266 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
267 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
268 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
269 | Data from database: World Development Indicators | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
270 | Last Updated: 10/24/2024 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
271 rows × 16 columns
electricity[electricity["Country Code"] == "ASM"]
Series Name | Series Code | Country Name | Country Code | 1990 [YR1990] | 2000 [YR2000] | 2014 [YR2014] | 2015 [YR2015] | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] | 2019 [YR2019] | 2020 [YR2020] | 2021 [YR2021] | 2022 [YR2022] | 2023 [YR2023] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | American Samoa | ASM | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
electricity["1990 [YR1990]"].value_counts(dropna=False)
1990 [YR1990]
.. 159
100 78
NaN 5
99.8443069458008 2
92.1548004150391 1
27.3 1
99.9943911362469 1
99.546800179424 1
99.8425200685198 1
13.9 1
88.7 1
32.8 1
91.3068161010742 1
99.9834289550781 1
70.19 1
96.6253509521484 1
99.8532638549805 1
98.9990844726563 1
99.9413986206055 1
99.0415649414063 1
95.7630462646484 1
99.8828430175781 1
96.8 1
70.334586 1
89.9 1
92.257427 1
99.846923828125 1
94.606559753418 1
87.475116 1
99.9842529296875 1
99.8492813110352 1
Name: count, dtype: int64
electricity["1990 [YR1990]"].mean()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 electricity["1990 [YR1990]"].mean()
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/series.py:6549, in Series.mean(self, axis, skipna, numeric_only, **kwargs)
6541 @doc(make_doc("mean", ndim=1))
6542 def mean(
6543 self,
(...)
6547 **kwargs,
6548 ):
-> 6549 return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/generic.py:12420, in NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
12413 def mean(
12414 self,
12415 axis: Axis | None = 0,
(...)
12418 **kwargs,
12419 ) -> Series | float:
> 12420 return self._stat_function(
12421 "mean", nanops.nanmean, axis, skipna, numeric_only, **kwargs
12422 )
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/generic.py:12377, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
12373 nv.validate_func(name, (), kwargs)
12375 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
> 12377 return self._reduce(
12378 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
12379 )
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/series.py:6457, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
6452 # GH#47500 - change to TypeError to match other methods
6453 raise TypeError(
6454 f"Series.{name} does not allow {kwd_name}={numeric_only} "
6455 "with non-numeric dtypes."
6456 )
-> 6457 return op(delegate, skipna=skipna, **kwds)
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/nanops.py:147, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
145 result = alt(values, axis=axis, skipna=skipna, **kwds)
146 else:
--> 147 result = alt(values, axis=axis, skipna=skipna, **kwds)
149 return result
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/nanops.py:404, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
401 if datetimelike and mask is None:
402 mask = isna(values)
--> 404 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
406 if datetimelike:
407 result = _wrap_results(result, orig_values.dtype, fill_value=iNaT)
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/nanops.py:719, in nanmean(values, axis, skipna, mask)
716 dtype_count = dtype
718 count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
--> 719 the_sum = values.sum(axis, dtype=dtype_sum)
720 the_sum = _ensure_numeric(the_sum)
722 if axis is not None and getattr(the_sum, "ndim", False):
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/numpy/_core/_methods.py:53, in _sum(a, axis, dtype, out, keepdims, initial, where)
51 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
52 initial=_NoValue, where=True):
---> 53 return umr_sum(a, axis, dtype, out, keepdims, initial, where)
TypeError: can only concatenate str (not "int") to str
electricity["1990 [YR1990]"].unique()
array(['..', '100', '95.7630462646484', '92.1548004150391',
'99.9413986206055', '99.9842529296875', '87.475116',
'94.606559753418', '99.846923828125', '92.257427', '89.9',
'99.8443069458008', '70.334586', '96.8', '99.8828430175781',
'99.8492813110352', '99.0415649414063', '98.9990844726563', '27.3',
'99.8532638549805', '96.6253509521484', '70.19',
'99.9834289550781', '91.3068161010742', '32.8', '88.7', '13.9',
'99.8425200685198', '99.546800179424', '99.9943911362469', nan],
dtype=object)
electricity["1990 [YR1990]"].astype(int)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[7], line 1
----> 1 electricity["1990 [YR1990]"].astype(int)
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/generic.py:6643, in NDFrame.astype(self, dtype, copy, errors)
6637 results = [
6638 ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
6639 ]
6641 else:
6642 # else, only a single dtype is given
-> 6643 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6644 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6645 return res.__finalize__(self, method="astype")
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/internals/managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
427 elif using_copy_on_write():
428 copy = False
--> 430 return self.apply(
431 "astype",
432 dtype=dtype,
433 copy=copy,
434 errors=errors,
435 using_cow=using_copy_on_write(),
436 )
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
361 applied = b.apply(f, **kwargs)
362 else:
--> 363 applied = getattr(b, f)(**kwargs)
364 result_blocks = extend_blocks(applied, result_blocks)
366 out = type(self).from_blocks(result_blocks, self.axes)
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/internals/blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
755 raise ValueError("Can not squeeze with more than one column.")
756 values = values[0, :] # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
760 new_values = maybe_coerce_values(new_values)
762 refs = None
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/dtypes/astype.py:237, in astype_array_safe(values, dtype, copy, errors)
234 dtype = dtype.numpy_dtype
236 try:
--> 237 new_values = astype_array(values, dtype, copy=copy)
238 except (ValueError, TypeError):
239 # e.g. _astype_nansafe can fail on object-dtype of strings
240 # trying to convert to float
241 if errors == "ignore":
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/dtypes/astype.py:182, in astype_array(values, dtype, copy)
179 values = values.astype(dtype, copy=copy)
181 else:
--> 182 values = _astype_nansafe(values, dtype, copy=copy)
184 # in pandas we don't store numpy str dtypes, so convert to object
185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/dtypes/astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
129 raise ValueError(msg)
131 if copy or arr.dtype == object or dtype == object:
132 # Explicit copy, or required since NumPy can't view from / to object.
--> 133 return arr.astype(dtype, copy=True)
135 return arr.astype(dtype, copy=copy)
ValueError: invalid literal for int() with base 10: '..'
import numpy as np
electricity["1990 [YR1990]"].replace({"..": np.nan}).value_counts()
1990 [YR1990]
100 78
99.8443069458008 2
99.0415649414063 1
99.546800179424 1
99.8425200685198 1
13.9 1
88.7 1
32.8 1
91.3068161010742 1
99.9834289550781 1
70.19 1
96.6253509521484 1
99.8532638549805 1
27.3 1
98.9990844726563 1
99.8492813110352 1
95.7630462646484 1
99.8828430175781 1
96.8 1
70.334586 1
89.9 1
92.257427 1
99.846923828125 1
94.606559753418 1
87.475116 1
99.9842529296875 1
99.9413986206055 1
92.1548004150391 1
99.9943911362469 1
Name: count, dtype: int64
electricity["1990 [YR1990]"].replace({"..": np.nan}).astype(float)
0 NaN
1 100.0
2 NaN
3 NaN
4 100.0
...
266 NaN
267 NaN
268 NaN
269 NaN
270 NaN
Name: 1990 [YR1990], Length: 271, dtype: float64
electricity_clean = electricity.replace({"..": np.nan})
electricity_clean
/var/folders/kr/nx0m1j811kz5vy8c87ffchzr0000gn/T/ipykernel_70664/1911216467.py:1: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
electricity_clean = electricity.replace({
Series Name | Series Code | Country Name | Country Code | 1990 [YR1990] | 2000 [YR2000] | 2014 [YR2014] | 2015 [YR2015] | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] | 2019 [YR2019] | 2020 [YR2020] | 2021 [YR2021] | 2022 [YR2022] | 2023 [YR2023] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Afghanistan | AFG | NaN | 4.4 | 89.5 | 71.5 | 97.7 | 97.7 | 93.4 | 97.7 | 97.7 | 97.7 | 85.3 | NaN |
1 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Albania | ALB | 100 | 99.4 | 100 | 100 | 99.9 | 99.9 | 100 | 100 | 100 | 100 | 100 | NaN |
2 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Algeria | DZA | NaN | 98.6 | 99.3 | 99.4 | 99.4 | 99.5 | 99.6 | 99.5 | 99.7 | 99.8 | 100 | NaN |
3 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | American Samoa | ASM | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Andorra | AND | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
266 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
267 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
268 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
269 | Data from database: World Development Indicators | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
270 | Last Updated: 10/24/2024 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
271 rows × 16 columns
Reshape#
electricity_per_year = electricity_clean.melt(
id_vars=["Series Name", "Series Code", "Country Name", "Country Code"],
var_name="Year",
value_name="Access to electricity (% pop)",
)
electricity_per_year
Series Name | Series Code | Country Name | Country Code | Year | Access to electricity (% pop) | |
---|---|---|---|---|---|---|
0 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Afghanistan | AFG | 1990 [YR1990] | NaN |
1 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Albania | ALB | 1990 [YR1990] | 100 |
2 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Algeria | DZA | 1990 [YR1990] | NaN |
3 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | American Samoa | ASM | 1990 [YR1990] | NaN |
4 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Andorra | AND | 1990 [YR1990] | 100 |
... | ... | ... | ... | ... | ... | ... |
3247 | NaN | NaN | NaN | NaN | 2023 [YR2023] | NaN |
3248 | NaN | NaN | NaN | NaN | 2023 [YR2023] | NaN |
3249 | NaN | NaN | NaN | NaN | 2023 [YR2023] | NaN |
3250 | Data from database: World Development Indicators | NaN | NaN | NaN | 2023 [YR2023] | NaN |
3251 | Last Updated: 10/24/2024 | NaN | NaN | NaN | 2023 [YR2023] | NaN |
3252 rows × 6 columns
Plot#
electricity_per_year["Access to electricity (% pop)"] = electricity_per_year["Access to electricity (% pop)"].astype(float)
import plotly.express as px
px.line(
electricity_per_year,
x="Year",
y="Access to electricity (% pop)",
color="Country Name",
title="Access to electricity for each country by year",
)
Happiness data#
happiness = pd.read_excel("data/DataForTable2.1.xls")
happiness
Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.350416 | 0.450662 | 50.500000 | 0.718114 | 0.164055 | 0.881686 | 0.414297 | 0.258195 |
1 | Afghanistan | 2009 | 4.401778 | 7.508646 | 0.552308 | 50.799999 | 0.678896 | 0.187297 | 0.850035 | 0.481421 | 0.237092 |
2 | Afghanistan | 2010 | 4.758381 | 7.613900 | 0.539075 | 51.099998 | 0.600127 | 0.117861 | 0.706766 | 0.516907 | 0.275324 |
3 | Afghanistan | 2011 | 3.831719 | 7.581259 | 0.521104 | 51.400002 | 0.495901 | 0.160098 | 0.731109 | 0.479835 | 0.267175 |
4 | Afghanistan | 2012 | 3.782938 | 7.660506 | 0.520637 | 51.700001 | 0.530935 | 0.234157 | 0.775620 | 0.613513 | 0.267919 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2358 | Zimbabwe | 2019 | 2.693523 | 7.697755 | 0.759162 | 53.099998 | 0.631908 | -0.050874 | 0.830652 | 0.658434 | 0.235354 |
2359 | Zimbabwe | 2020 | 3.159802 | 7.596050 | 0.717243 | 53.575001 | 0.643303 | 0.002848 | 0.788523 | 0.660658 | 0.345736 |
2360 | Zimbabwe | 2021 | 3.154578 | 7.656878 | 0.685151 | 54.049999 | 0.667636 | -0.079007 | 0.756945 | 0.609917 | 0.241682 |
2361 | Zimbabwe | 2022 | 3.296220 | 7.670073 | 0.666172 | 54.525002 | 0.651987 | -0.072935 | 0.752632 | 0.640609 | 0.191350 |
2362 | Zimbabwe | 2023 | 3.572386 | 7.678590 | 0.693817 | 55.000000 | 0.734613 | -0.068837 | 0.757494 | 0.609752 | 0.178953 |
2363 rows × 11 columns
Merge#
pd.merge(
electricity_per_year,
happiness,
left_on=["Country Code", "Year"],
right_on=["Country name", "year"],
)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[15], line 1
----> 1 pd.merge(
2 electricity_per_year,
3 happiness,
4 left_on=["Country Code", "Year"],
5 right_on=["Country name", "year"]
6 )
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/reshape/merge.py:170, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
155 return _cross_merge(
156 left_df,
157 right_df,
(...)
167 copy=copy,
168 )
169 else:
--> 170 op = _MergeOperation(
171 left_df,
172 right_df,
173 how=how,
174 on=on,
175 left_on=left_on,
176 right_on=right_on,
177 left_index=left_index,
178 right_index=right_index,
179 sort=sort,
180 suffixes=suffixes,
181 indicator=indicator,
182 validate=validate,
183 )
184 return op.get_result(copy=copy)
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/reshape/merge.py:807, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, indicator, validate)
803 self._validate_tolerance(self.left_join_keys)
805 # validate the merge keys dtypes. We may need to coerce
806 # to avoid incompatible dtypes
--> 807 self._maybe_coerce_merge_keys()
809 # If argument passed to validate,
810 # check if columns specified as unique
811 # are in fact unique.
812 if validate is not None:
File /opt/homebrew/Caskroom/mambaforge/base/envs/computing-in-context/lib/python3.12/site-packages/pandas/core/reshape/merge.py:1508, in _MergeOperation._maybe_coerce_merge_keys(self)
1502 # unless we are merging non-string-like with string-like
1503 elif (
1504 inferred_left in string_types and inferred_right not in string_types
1505 ) or (
1506 inferred_right in string_types and inferred_left not in string_types
1507 ):
-> 1508 raise ValueError(msg)
1510 # datetimelikes must match exactly
1511 elif needs_i8_conversion(lk.dtype) and not needs_i8_conversion(rk.dtype):
ValueError: You are trying to merge on object and int64 columns for key 'Year'. If you wish to proceed you should use pd.concat
Clean#
electricity_per_year["Year"].unique()
array(['1990 [YR1990]', '2000 [YR2000]', '2014 [YR2014]', '2015 [YR2015]',
'2016 [YR2016]', '2017 [YR2017]', '2018 [YR2018]', '2019 [YR2019]',
'2020 [YR2020]', '2021 [YR2021]', '2022 [YR2022]', '2023 [YR2023]'],
dtype=object)
electricity_per_year["Year"] = electricity_per_year["Year"].str.replace(r" \[YR\d{4}\]", "", regex=True).astype(int)
electricity_per_year
Series Name | Series Code | Country Name | Country Code | Year | Access to electricity (% pop) | |
---|---|---|---|---|---|---|
0 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Afghanistan | AFG | 1990 | NaN |
1 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Albania | ALB | 1990 | 100.0 |
2 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Algeria | DZA | 1990 | NaN |
3 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | American Samoa | ASM | 1990 | NaN |
4 | Access to electricity (% of population) | EG.ELC.ACCS.ZS | Andorra | AND | 1990 | 100.0 |
... | ... | ... | ... | ... | ... | ... |
3247 | NaN | NaN | NaN | NaN | 2023 | NaN |
3248 | NaN | NaN | NaN | NaN | 2023 | NaN |
3249 | NaN | NaN | NaN | NaN | 2023 | NaN |
3250 | Data from database: World Development Indicators | NaN | NaN | NaN | 2023 | NaN |
3251 | Last Updated: 10/24/2024 | NaN | NaN | NaN | 2023 | NaN |
3252 rows × 6 columns
pd.merge(
electricity_per_year,
happiness,
left_on=["Country Code", "Year"],
right_on=["Country name", "year"],
)
Series Name | Series Code | Country Name | Country Code | Year | Access to electricity (% pop) | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
---|
Jump over to the end of official solution.