Lab 11 solutions

Buses

Load data

import pandas as pd

ridership = pd.read_csv("https://data.ny.gov/api/views/vxuj-8kew/rows.csv?accessType=DOWNLOAD")
ridership
Date Subways: Total Estimated Ridership Subways: % of Comparable Pre-Pandemic Day Buses: Total Estimated Ridership Buses: % of Comparable Pre-Pandemic Day LIRR: Total Estimated Ridership LIRR: % of Comparable Pre-Pandemic Day Metro-North: Total Estimated Ridership Metro-North: % of Comparable Pre-Pandemic Day Access-A-Ride: Total Scheduled Trips Access-A-Ride: % of Comparable Pre-Pandemic Day Bridges and Tunnels: Total Traffic Bridges and Tunnels: % of Comparable Pre-Pandemic Day Staten Island Railway: Total Estimated Ridership Staten Island Railway: % of Comparable Pre-Pandemic Day
0 01/01/2021 613692 0.29 378288 0.41 28977 0.35 14988 0.17 5960 0.44 445950 0.65 805 0.29
1 01/01/2022 1027918 0.38 350845 0.29 33980 0.35 30341 0.23 4904 0.34 498515 0.65 1262 0.31
2 01/01/2023 1675507 0.80 475226 0.52 67722 0.82 66309 0.73 11476 0.85 737533 1.08 1771 0.65
3 01/01/2024 1648734 0.79 455965 0.50 82811 1.00 73957 0.82 9165 0.68 730489 1.07 2018 0.74
4 01/02/2021 988418 0.37 608686 0.51 28879 0.30 23139 0.18 10450 0.72 624765 0.82 1194 0.29
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1714 12/30/2023 2440211 0.74 720564 0.57 121688 0.95 121647 0.77 16983 0.99 833538 0.94 2780 0.56
1715 12/31/2020 1274984 0.24 792993 0.40 75157 0.24 39947 0.14 17167 0.59 704297 0.79 2162 0.14
1716 12/31/2021 1627589 0.64 699749 0.71 96699 0.91 85232 0.79 11498 0.64 628305 0.79 2270 0.67
1717 12/31/2022 1927101 0.58 651474 0.51 81590 0.64 81198 0.52 14090 0.82 686434 0.78 1847 0.37
1718 12/31/2023 1934651 0.76 548344 0.56 99817 0.94 89554 0.83 20005 1.12 671873 0.84 2029 0.60

1719 rows × 15 columns

ridership.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1719 entries, 0 to 1718
Data columns (total 15 columns):
 #   Column                                                   Non-Null Count  Dtype  
---  ------                                                   --------------  -----  
 0   Date                                                     1719 non-null   object 
 1   Subways: Total Estimated Ridership                       1719 non-null   int64  
 2   Subways: % of Comparable Pre-Pandemic Day                1719 non-null   float64
 3   Buses: Total Estimated Ridership                         1719 non-null   int64  
 4   Buses: % of Comparable Pre-Pandemic Day                  1719 non-null   float64
 5   LIRR: Total Estimated Ridership                          1719 non-null   int64  
 6   LIRR: % of Comparable Pre-Pandemic Day                   1719 non-null   float64
 7   Metro-North: Total Estimated Ridership                   1719 non-null   int64  
 8   Metro-North: % of Comparable Pre-Pandemic Day            1719 non-null   float64
 9   Access-A-Ride: Total Scheduled Trips                     1719 non-null   int64  
 10  Access-A-Ride: % of Comparable Pre-Pandemic Day          1719 non-null   float64
 11  Bridges and Tunnels: Total Traffic                       1719 non-null   int64  
 12  Bridges and Tunnels: % of Comparable Pre-Pandemic Day    1719 non-null   float64
 13  Staten Island Railway: Total Estimated Ridership         1719 non-null   int64  
 14  Staten Island Railway: % of Comparable Pre-Pandemic Day  1719 non-null   float64
dtypes: float64(7), int64(7), object(1)
memory usage: 201.6+ KB

Re-format table

ridership["Date"] = pd.to_datetime(ridership["Date"], format="%m/%d/%Y")
buses = ridership[["Date", "Buses: Total Estimated Ridership"]]
buses
Date Buses: Total Estimated Ridership
0 2021-01-01 378288
1 2022-01-01 350845
2 2023-01-01 475226
3 2024-01-01 455965
4 2021-01-02 608686
... ... ...
1714 2023-12-30 720564
1715 2020-12-31 792993
1716 2021-12-31 699749
1717 2022-12-31 651474
1718 2023-12-31 548344

1719 rows × 2 columns
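
As an aside, the conversion can also happen while the file is being read. The sketch below is only an illustration: it uses a made-up two-row stand-in for the CSV (the numbers are not real MTA figures) and assumes pandas can infer the MM/DD/YYYY format on its own, which may not hold for every file.

```python
import io

import pandas as pd

# Made-up two-row stand-in for the downloaded CSV (not real MTA figures).
csv_text = """Date,Buses: Total Estimated Ridership
01/02/2021,200
12/31/2020,100
"""

# parse_dates asks read_csv to convert the column while loading;
# here pandas infers the MM/DD/YYYY format from the values.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(sample.dtypes)
```

The explicit pd.to_datetime(..., format="%m/%d/%Y") call used above states the expected format outright, which is the safer choice when day and month could be confused.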

buses = buses.sort_values("Date")
buses
Date Buses: Total Estimated Ridership
237 2020-03-01 984908
242 2020-03-02 2209066
247 2020-03-03 2228608
252 2020-03-04 2177165
257 2020-03-05 2244515
... ... ...
1506 2024-11-09 917222
1511 2024-11-10 711123
1516 2024-11-11 1166004
1521 2024-11-12 1463667
1526 2024-11-13 1401832

1719 rows × 2 columns
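
The raw file appears to be ordered by the text of the Date column, which would explain why its first rows jump between years. The sketch below uses a small made-up sample (not real ridership data) to show why the dates need to be parsed before sorting.

```python
import pandas as pd

# Tiny made-up sample in the file's MM/DD/YYYY text format.
sample = pd.DataFrame(
    {"Date": ["01/01/2021", "12/31/2020", "01/01/2022", "01/02/2021"]}
)

# Sorting the raw strings compares characters left to right,
# so the month and day outrank the year.
sorted_as_text = sample["Date"].sort_values().tolist()

# Parsing first gives a true chronological order.
parsed = pd.to_datetime(sample["Date"], format="%m/%d/%Y")
sorted_as_dates = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()

print(sorted_as_text)
print(sorted_as_dates)
```

In the text ordering, 12/31/2020 lands last even though it is the earliest date in the sample.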

Ridership by day

import plotly.express as px

px.line(
    buses,
    x="Date",
    y="Buses: Total Estimated Ridership",
    title="Estimated MTA bus ridership per day",
    labels={
        "Buses: Total Estimated Ridership": "Num riders",
    },
)

Ridership by month

ridership_by_month = buses.resample("MS", on="Date").sum()
ridership_by_month
Buses: Total Estimated Ridership
Date
2020-03-01 31871784
2020-04-01 471515
2020-05-01 410704
2020-06-01 658374
2020-07-01 828108
2020-08-01 1609110
2020-09-01 28067094
2020-10-01 30114260
2020-11-01 27049641
2020-12-01 26307109
2021-01-01 25035873
2021-02-01 22875301
2021-03-01 29700545
2021-04-01 30573447
2021-05-01 31797784
2021-06-01 32745118
2021-07-01 32814466
2021-08-01 32707261
2021-09-01 35736273
2021-10-01 38137454
2021-11-01 35751726
2021-12-01 33762618
2022-01-01 28820239
2022-02-01 30658017
2022-03-01 38160474
2022-04-01 35717326
2022-05-01 37640891
2022-06-01 36705213
2022-07-01 34610861
2022-08-01 35776375
2022-09-01 37419219
2022-10-01 38174050
2022-11-01 36040114
2022-12-01 34224045
2023-01-01 35224849
2023-02-01 33215878
2023-03-01 39859431
2023-04-01 35671477
2023-05-01 39582710
2023-06-01 36208100
2023-07-01 34367040
2023-08-01 35479692
2023-09-01 34880433
2023-10-01 36771898
2023-11-01 33290217
2023-12-01 30988074
2024-01-01 31810480
2024-02-01 31355350
2024-03-01 34122949
2024-04-01 33317334
2024-05-01 35068922
2024-06-01 32526588
2024-07-01 32503359
2024-08-01 31340557
2024-09-01 37040344
2024-10-01 40362017
2024-11-01 15705476
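
The last row of the table is much smaller than its neighbours because the daily data stops at 2024-11-13, so November 2024 is only a partial month. One possible way to flag such months, sketched here on made-up daily counts (not real MTA figures), is to compare the number of reported days with the length of each month. The column names count, days_in_month and complete are illustrative choices.

```python
import pandas as pd

col = "Buses: Total Estimated Ridership"

# Made-up daily counts: all of October 2024, then 13 days of November.
daily = pd.DataFrame({"Date": pd.date_range("2024-10-01", "2024-11-13", freq="D")})
daily[col] = 1_000_000

# Monthly total plus the number of days that actually reported data.
monthly = daily.resample("MS", on="Date")[col].agg(["sum", "count"])

# A month is complete when every calendar day has a row.
monthly["days_in_month"] = monthly.index.days_in_month
monthly["complete"] = monthly["count"] == monthly["days_in_month"]

complete_months = monthly[monthly["complete"]]
print(monthly)
```

Plotting only complete_months, or plotting a daily average instead of a sum, would avoid the apparent drop at the end of the monthly chart.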

px.line(
    ridership_by_month,
    y="Buses: Total Estimated Ridership",
    title="Estimated MTA bus ridership per month",
    labels={
        "Buses: Total Estimated Ridership": "Num riders",
    },
)

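There was no need to specify x here: after resample(), Date is the index of ridership_by_month, and Plotly Express plots against the DataFrame index when x is omitted.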
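
If an explicit x is preferred, the index can be turned back into a regular column first. This is a minimal sketch on made-up monthly totals (not real MTA figures); the name monthly_with_date_column is illustrative.

```python
import pandas as pd

# Made-up monthly totals with Date as the index,
# mirroring the shape of ridership_by_month.
monthly = pd.DataFrame(
    {"Buses: Total Estimated Ridership": [100, 120, 90]},
    index=pd.date_range("2024-01-01", periods=3, freq="MS", name="Date"),
)

# reset_index() moves Date back into an ordinary column.
monthly_with_date_column = monthly.reset_index()
print(monthly_with_date_column.columns.tolist())
```

Passing monthly_with_date_column to px.line with x="Date" should then draw the same line as the index-based call above.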