Lecture 20: Data visualization, continued
How was Project 1?
Start by importing necessary packages
import pandas as pd
import plotly.express as px
Load the 311 requests per capita data from last class
districts = pd.read_csv("https://storage.googleapis.com/python-public-policy2/data/311_community_districts.csv.zip")
districts.head()
| | boro_cd | Borough | CD Name | 2010 Population | num_311_requests | requests_per_capita |
|---|---|---|---|---|---|---|
| 0 | 112 | Manhattan | Washington Heights, Inwood | 190020 | 14110 | 0.074255 |
| 1 | 405 | Queens | Ridgewood, Glendale, Maspeth | 169190 | 12487 | 0.073805 |
| 2 | 412 | Queens | Jamaica, St. Albans, Hollis | 225919 | 12228 | 0.054126 |
| 3 | 301 | Brooklyn | Williamsburg, Greenpoint | 173083 | 11863 | 0.068539 |
| 4 | 303 | Brooklyn | Bedford Stuyvesant | 152985 | 11615 | 0.075922 |
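The requests_per_capita column appears to be num_311_requests divided by 2010 Population. That relationship is inferred from the rows shown here rather than stated, so treat the following as a quick sanity check in plain Python using the first two rows, not as the original derivation:

```python
# Values copied from the first two rows of the table above.
rows = [
    {"boro_cd": 112, "population": 190020, "num_311_requests": 14110},
    {"boro_cd": 405, "population": 169190, "num_311_requests": 12487},
]

# Assumption: per-capita rate = request count / 2010 population.
per_capita = {
    row["boro_cd"]: row["num_311_requests"] / row["population"] for row in rows
}

# Print each rate rounded to six decimals for comparison with the table.
for boro_cd, rate in per_capita.items():
    print(boro_cd, round(rate, 6))
```

If the printed values agree with the requests_per_capita column, the assumption holds for these rows.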
Map complaint counts by CD
We’ll follow this example, using community district GIS data.
Jump ahead to the map, work backwards
First, let’s take a look at the GeoJSON data. We’re looking for a property that our boro_cd column can be matched against. One way to inspect it:
Open Chrome
Install JSON Viewer
Open https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson
Load the GeoJSON data using the requests package (nothing to do with 311 requests):
import requests
response = requests.get("https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson")
shapes = response.json()
print("loaded")
# intentionally not outputting the data here since it's large
loaded
This is equivalent to the use of urlopen() and json.load() in the Plotly examples.
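For comparison, here is a minimal sketch of the standard-library route. The function name load_json_with_urlopen is made up for illustration, and the Plotly examples may structure the call slightly differently:

```python
import json
from urllib.request import urlopen


def load_json_with_urlopen(url):
    """Fetch a URL and parse its body as JSON using only the standard library.

    Hypothetical helper name; it mirrors what requests.get(url).json() returns.
    """
    with urlopen(url) as response:
        # json.load() reads from the file-like response object and parses it.
        return json.load(response)


# Usage (makes a network request, so it is left commented out here):
# shapes = load_json_with_urlopen("https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson")
```

Either approach should produce the same nested Python objects. The requests version is mostly a matter of convenience.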
Notes:
boro_cd is the property we’re looking for. We’ll specify this as the featureidkey.
response.json() turns JSON data into nested Python objects: shapes is a dictionary, features is a list beneath it, etc.
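To make the nesting concrete, here is a small sketch that walks a GeoJSON-like structure in Python instead of a browser. The sample_shapes data is a made-up miniature stand-in for shapes (the real file is much larger, and its value types may differ), and get_by_path is a hypothetical helper that only illustrates how a dotted key such as "properties.boro_cd" points into each feature. It is not Plotly's implementation.

```python
# Hypothetical miniature stand-in for `shapes`; the geometries are invented.
sample_shapes = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            # Assumption: boro_cd is stored as a string; the real file may differ.
            "properties": {"boro_cd": "112"},
            "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]},
        },
        {
            "type": "Feature",
            "properties": {"boro_cd": "405"},
            "geometry": {"type": "Polygon", "coordinates": [[[1, 1], [1, 2], [2, 2], [1, 1]]]},
        },
    ],
}


def get_by_path(obj, dotted_path):
    """Follow a dotted key path (e.g. "properties.boro_cd") through nested dicts."""
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj


# The top level is a dictionary and "features" is a list beneath it.
print(type(sample_shapes).__name__, type(sample_shapes["features"]).__name__)

# The property names on the first feature are the candidates for matching.
print(sorted(sample_shapes["features"][0]["properties"]))

# The IDs that a featureidkey of "properties.boro_cd" would refer to.
feature_ids = [get_by_path(f, "properties.boro_cd") for f in sample_shapes["features"]]
print(feature_ids)
```

Running the same inspection on the real shapes object is a quick way to confirm the property name and to see whether its values have the same type as the boro_cd column in the DataFrame.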
This code requires plotly.py v5.24.0+, the release that bundles Plotly.js v2.35.0 and adds px.choropleth_map(). You may need to upgrade Plotly.
def plot_nyc(df):
    """This function makes a choropleth map of NYC, using a DataFrame with a boro_cd and a requests_per_capita column."""
    fig = px.choropleth_map(
        df,
        locations="boro_cd",  # column name to match on
        color="requests_per_capita",  # column name for values
        geojson=shapes,
        featureidkey="properties.boro_cd",  # GeoJSON property to match on
        center={"lat": 40.71, "lon": -73.98},
        zoom=9,
        height=600,
        title="Requests per capita across Community Districts",
    )
    fig.show()
We wrap this Plotly code in a function to:
Save space on subsequent slides
Make the code reusable for plotting different DataFrames
plot_nyc(districts)