Lecture 20: Data visualization, continued

How was Project 1?

Start by importing necessary packages

import pandas as pd
import plotly.express as px

Load the 311 requests per capita data from last class

districts = pd.read_csv("https://storage.googleapis.com/python-public-policy2/data/311_community_districts.csv.zip")
districts.head()
   boro_cd    Borough                       CD Name  2010 Population  num_311_requests  requests_per_capita
0      112  Manhattan    Washington Heights, Inwood           190020             14110             0.074255
1      405     Queens  Ridgewood, Glendale, Maspeth           169190             12487             0.073805
2      412     Queens   Jamaica, St. Albans, Hollis           225919             12228             0.054126
3      301   Brooklyn      Williamsburg, Greenpoint           173083             11863             0.068539
4      303   Brooklyn            Bedford Stuyvesant           152985             11615             0.075922

Map complaint counts by CD

We’ll follow this example, using community district GIS data.

Jump ahead to the map, work backwards

First, let’s take a look at the GeoJSON data. We’re looking for what we can match our boro_cd column up to. One way to inspect it:

  1. Open Chrome

  2. Install JSON Viewer

  3. Open https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson

Load the GeoJSON data using the requests package (nothing to do with 311 requests):

import requests

response = requests.get("https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson")
shapes = response.json()
print("loaded")

# intentionally not outputting the data here since it's large
loaded

This is equivalent to the use of urlopen() and json.load() in the Plotly examples.
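For comparison, here is a minimal sketch of the standard-library approach, assuming the urlopen() and json.load() pattern mentioned above. The helper name load_json is hypothetical and only used for illustration.

```python
import json
from urllib.request import urlopen


def load_json(url):
    """Fetch a URL and parse the response body as JSON (hypothetical helper)."""
    with urlopen(url) as response:
        # json.load() reads from the file-like response object
        return json.load(response)


# Usage (makes a network request, so it is left commented out here):
# shapes = load_json("https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson")
```

Either way, the result is the same kind of nested Python object; requests just saves a couple of lines.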

Notes:

  • boro_cd is the property we’re looking for. We’ll specify this as the featureidkey.

  • response.json() turns JSON data into nested Python objects: shapes is a dictionary, features is a list beneath it, etc.
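To make the nesting concrete, here is a small sketch that walks a hand-made stand-in with the same overall layout as the real data. The name sample_shapes, the empty geometries, and the boro_cd values being strings are assumptions for illustration; swap in shapes to inspect the real file.

```python
# A tiny stand-in with the same overall layout as the real GeoJSON.
# The geometry is left empty because only the structure matters here.
sample_shapes = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"boro_cd": "112"}, "geometry": None},
        {"type": "Feature", "properties": {"boro_cd": "405"}, "geometry": None},
    ],
}

print(type(sample_shapes))  # the top level is a dictionary
print(type(sample_shapes["features"]))  # "features" is a list

# Property names available for matching against a DataFrame column
first_feature = sample_shapes["features"][0]
print(list(first_feature["properties"].keys()))

# Collect the ID of every feature, following the path in featureidkey="properties.boro_cd"
feature_ids = [feature["properties"]["boro_cd"] for feature in sample_shapes["features"]]
print(feature_ids)
```

Listing the IDs this way is a quick check that the GeoJSON property and the DataFrame column describe the same set of districts.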

This code requires version 5.24.0 or later of the Plotly Python package, the release that added px.choropleth_map(). You may need to upgrade Plotly.

def plot_nyc(df):
    """This function makes a choropleth map of NYC, using a DataFrame with a boro_cd and a requests_per_capita column."""

    fig = px.choropleth_map(
        df,
        locations="boro_cd",  # column name to match on
        color="requests_per_capita",  # column name for values
        geojson=shapes,
        featureidkey="properties.boro_cd",  # GeoJSON property to match on
        center={"lat": 40.71, "lon": -73.98},
        zoom=9,
        height=600,
        title="Requests per capita across Community Districts",
    )

    fig.show()

We wrap this Plotly code in a function to:

  • Save space on subsequent slides

  • Make the code reusable for plotting different DataFrames

plot_nyc(districts)
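Because plot_nyc() accepts any DataFrame with boro_cd and requests_per_capita columns, a filtered subset can be mapped the same way. A minimal sketch, using the five rows shown by districts.head() re-entered by hand; the names sample and brooklyn are just for illustration.

```python
import pandas as pd

# The five rows shown by districts.head() above, re-entered by hand for illustration
sample = pd.DataFrame(
    {
        "boro_cd": [112, 405, 412, 301, 303],
        "Borough": ["Manhattan", "Queens", "Queens", "Brooklyn", "Brooklyn"],
        "requests_per_capita": [0.074255, 0.073805, 0.054126, 0.068539, 0.075922],
    }
)

# Keep only the Brooklyn community districts
brooklyn = sample[sample["Borough"] == "Brooklyn"]
print(brooklyn)

# With shapes loaded and plot_nyc() defined as above, the subset can be mapped the same way:
# plot_nyc(brooklyn)
```

Districts that are filtered out simply have no color value to draw.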