Lab 11 solutions
Buses
Load data
import pandas as pd
ridership = pd.read_csv("https://data.ny.gov/api/views/vxuj-8kew/rows.csv?accessType=DOWNLOAD")
ridership
(index) | Date | Subways: Total Estimated Ridership | Subways: % of Comparable Pre-Pandemic Day | Buses: Total Estimated Ridership | Buses: % of Comparable Pre-Pandemic Day | LIRR: Total Estimated Ridership | LIRR: % of Comparable Pre-Pandemic Day | Metro-North: Total Estimated Ridership | Metro-North: % of Comparable Pre-Pandemic Day | Access-A-Ride: Total Scheduled Trips | Access-A-Ride: % of Comparable Pre-Pandemic Day | Bridges and Tunnels: Total Traffic | Bridges and Tunnels: % of Comparable Pre-Pandemic Day | Staten Island Railway: Total Estimated Ridership | Staten Island Railway: % of Comparable Pre-Pandemic Day |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01/01/2021 | 613692 | 0.29 | 378288 | 0.41 | 28977 | 0.35 | 14988 | 0.17 | 5960 | 0.44 | 445950 | 0.65 | 805 | 0.29 |
1 | 01/01/2022 | 1027918 | 0.38 | 350845 | 0.29 | 33980 | 0.35 | 30341 | 0.23 | 4904 | 0.34 | 498515 | 0.65 | 1262 | 0.31 |
2 | 01/01/2023 | 1675507 | 0.80 | 475226 | 0.52 | 67722 | 0.82 | 66309 | 0.73 | 11476 | 0.85 | 737533 | 1.08 | 1771 | 0.65 |
3 | 01/01/2024 | 1648734 | 0.79 | 455965 | 0.50 | 82811 | 1.00 | 73957 | 0.82 | 9165 | 0.68 | 730489 | 1.07 | 2018 | 0.74 |
4 | 01/02/2021 | 988418 | 0.37 | 608686 | 0.51 | 28879 | 0.30 | 23139 | 0.18 | 10450 | 0.72 | 624765 | 0.82 | 1194 | 0.29 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1714 | 12/30/2023 | 2440211 | 0.74 | 720564 | 0.57 | 121688 | 0.95 | 121647 | 0.77 | 16983 | 0.99 | 833538 | 0.94 | 2780 | 0.56 |
1715 | 12/31/2020 | 1274984 | 0.24 | 792993 | 0.40 | 75157 | 0.24 | 39947 | 0.14 | 17167 | 0.59 | 704297 | 0.79 | 2162 | 0.14 |
1716 | 12/31/2021 | 1627589 | 0.64 | 699749 | 0.71 | 96699 | 0.91 | 85232 | 0.79 | 11498 | 0.64 | 628305 | 0.79 | 2270 | 0.67 |
1717 | 12/31/2022 | 1927101 | 0.58 | 651474 | 0.51 | 81590 | 0.64 | 81198 | 0.52 | 14090 | 0.82 | 686434 | 0.78 | 1847 | 0.37 |
1718 | 12/31/2023 | 1934651 | 0.76 | 548344 | 0.56 | 99817 | 0.94 | 89554 | 0.83 | 20005 | 1.12 | 671873 | 0.84 | 2029 | 0.60 |
1719 rows × 15 columns
ridership.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1719 entries, 0 to 1718
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1719 non-null object
1 Subways: Total Estimated Ridership 1719 non-null int64
2 Subways: % of Comparable Pre-Pandemic Day 1719 non-null float64
3 Buses: Total Estimated Ridership 1719 non-null int64
4 Buses: % of Comparable Pre-Pandemic Day 1719 non-null float64
5 LIRR: Total Estimated Ridership 1719 non-null int64
6 LIRR: % of Comparable Pre-Pandemic Day 1719 non-null float64
7 Metro-North: Total Estimated Ridership 1719 non-null int64
8 Metro-North: % of Comparable Pre-Pandemic Day 1719 non-null float64
9 Access-A-Ride: Total Scheduled Trips 1719 non-null int64
10 Access-A-Ride: % of Comparable Pre-Pandemic Day 1719 non-null float64
11 Bridges and Tunnels: Total Traffic 1719 non-null int64
12 Bridges and Tunnels: % of Comparable Pre-Pandemic Day 1719 non-null float64
13 Staten Island Railway: Total Estimated Ridership 1719 non-null int64
14 Staten Island Railway: % of Comparable Pre-Pandemic Day 1719 non-null float64
dtypes: float64(7), int64(7), object(1)
memory usage: 201.6+ KB
Re-format table
ridership["Date"] = pd.to_datetime(ridership["Date"], format="%m/%d/%Y")
buses = ridership[["Date", "Buses: Total Estimated Ridership"]]
buses
(index) | Date | Buses: Total Estimated Ridership |
---|---|---|
0 | 2021-01-01 | 378288 |
1 | 2022-01-01 | 350845 |
2 | 2023-01-01 | 475226 |
3 | 2024-01-01 | 455965 |
4 | 2021-01-02 | 608686 |
... | ... | ... |
1714 | 2023-12-30 | 720564 |
1715 | 2020-12-31 | 792993 |
1716 | 2021-12-31 | 699749 |
1717 | 2022-12-31 | 651474 |
1718 | 2023-12-31 | 548344 |
1719 rows × 2 columns
buses = buses.sort_values("Date")
buses
(index) | Date | Buses: Total Estimated Ridership |
---|---|---|
237 | 2020-03-01 | 984908 |
242 | 2020-03-02 | 2209066 |
247 | 2020-03-03 | 2228608 |
252 | 2020-03-04 | 2177165 |
257 | 2020-03-05 | 2244515 |
... | ... | ... |
1506 | 2024-11-09 | 917222 |
1511 | 2024-11-10 | 711123 |
1516 | 2024-11-11 | 1166004 |
1521 | 2024-11-12 | 1463667 |
1526 | 2024-11-13 | 1401832 |
1719 rows × 2 columns
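The raw file is ordered by the Date text, which is why 01/01/2021 is followed by 01/01/2022 rather than 01/02/2021. A minimal sketch, using made-up rows and a hypothetical `riders` column, of why the conversion to datetime has to happen before sorting:

```python
import pandas as pd

# Made-up sample rows in the same MM/DD/YYYY text format as the source file.
sample = pd.DataFrame(
    {
        "Date": ["01/01/2021", "01/01/2022", "12/31/2020", "03/01/2020"],
        "riders": [10, 20, 30, 40],
    }
)

# Sorting the raw strings compares character by character, so the month
# is compared before the year and the result is not chronological.
as_text = sample.sort_values("Date")["Date"].tolist()

# Parsing first makes the sort chronological.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")
as_dates = sample.sort_values("Date")["Date"].dt.strftime("%Y-%m-%d").tolist()

print(as_text)
print(as_dates)
```

In the string-sorted list, the 2020 dates land after the 2021 and 2022 ones; in the parsed version they come first.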
Ridership by day
import plotly.express as px
px.line(
    buses,
    x="Date",
    y="Buses: Total Estimated Ridership",
    title="Estimated MTA bus ridership per day",
    labels={
        "Buses: Total Estimated Ridership": "Num riders",
    },
)
Ridership by month
ridership_by_month = buses.resample("MS", on="Date").sum()
ridership_by_month
Date | Buses: Total Estimated Ridership |
---|---|
2020-03-01 | 31871784 |
2020-04-01 | 471515 |
2020-05-01 | 410704 |
2020-06-01 | 658374 |
2020-07-01 | 828108 |
2020-08-01 | 1609110 |
2020-09-01 | 28067094 |
2020-10-01 | 30114260 |
2020-11-01 | 27049641 |
2020-12-01 | 26307109 |
2021-01-01 | 25035873 |
2021-02-01 | 22875301 |
2021-03-01 | 29700545 |
2021-04-01 | 30573447 |
2021-05-01 | 31797784 |
2021-06-01 | 32745118 |
2021-07-01 | 32814466 |
2021-08-01 | 32707261 |
2021-09-01 | 35736273 |
2021-10-01 | 38137454 |
2021-11-01 | 35751726 |
2021-12-01 | 33762618 |
2022-01-01 | 28820239 |
2022-02-01 | 30658017 |
2022-03-01 | 38160474 |
2022-04-01 | 35717326 |
2022-05-01 | 37640891 |
2022-06-01 | 36705213 |
2022-07-01 | 34610861 |
2022-08-01 | 35776375 |
2022-09-01 | 37419219 |
2022-10-01 | 38174050 |
2022-11-01 | 36040114 |
2022-12-01 | 34224045 |
2023-01-01 | 35224849 |
2023-02-01 | 33215878 |
2023-03-01 | 39859431 |
2023-04-01 | 35671477 |
2023-05-01 | 39582710 |
2023-06-01 | 36208100 |
2023-07-01 | 34367040 |
2023-08-01 | 35479692 |
2023-09-01 | 34880433 |
2023-10-01 | 36771898 |
2023-11-01 | 33290217 |
2023-12-01 | 30988074 |
2024-01-01 | 31810480 |
2024-02-01 | 31355350 |
2024-03-01 | 34122949 |
2024-04-01 | 33317334 |
2024-05-01 | 35068922 |
2024-06-01 | 32526588 |
2024-07-01 | 32503359 |
2024-08-01 | 31340557 |
2024-09-01 | 37040344 |
2024-10-01 | 40362017 |
2024-11-01 | 15705476 |
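The final row (2024-11-01) is much lower than its neighbors because the daily data ends on 2024-11-13, so that month is only partly covered. A minimal sketch, using made-up numbers, of how "MS" labels each bin with the first day of its month, plus one possible way to flag incomplete months. The names `daily`, `monthly`, `days_in_data`, and `complete` are hypothetical and not part of the lab:

```python
import pandas as pd

# Made-up daily data: all of January 2024 plus the first 13 days of February.
daily = pd.DataFrame(
    {
        "Date": pd.date_range("2024-01-01", "2024-02-13", freq="D"),
        "riders": 100,
    }
)

# "MS" (month start) groups rows by calendar month and labels each group
# with the first day of that month.
grouped = daily.resample("MS", on="Date")["riders"]
monthly = grouped.sum().to_frame("riders")

# Count the days actually present in each month and compare that with the
# calendar length of the month to spot partial months.
monthly["days_in_data"] = grouped.count()
monthly["complete"] = (
    monthly["days_in_data"].to_numpy() == monthly.index.days_in_month.to_numpy()
)

print(monthly)
```

Rows where `complete` is False could be dropped or annotated before plotting, so that a partial month is not read as a drop in ridership.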
px.line(
    ridership_by_month,
    y="Buses: Total Estimated Ridership",
    title="Estimated MTA bus ridership per month",
    labels={
        "Buses: Total Estimated Ridership": "Num riders",
    },
)
We didn’t need to specify x, since the date is the index.