Lecture 15: Introduction to the Policy context#

Aidan Feldman

Computing in Context (SIPA)

Structure for today#

  1. Intro

  2. Going over course info like the syllabus, tools, etc.

  3. Rewind: Programming languages, data, and Jupyter

About me#

  • Coding since 2005 🖥

  • Government since 2014 🦅

  • Teaching since 2011 🎓

  • Also a modern dancer 💃 and cyclist 🚲

Day jobs#

Currently freelancing with the Colorado Behavioral Health Administration. In the past, I have worked for…

Government#

Non-profits#

Tech companies#

Intros#

  • Name

  • Pronouns

  • Why you’re taking this class / what you want to do with it

    • The more specific, the better.

Access the course site#

computing-in-context.afeld.me

You can also get there through CourseWorks.

Class structure#

Class materials walkthrough#

New context-specific stuff:

Disclaimers#

Me#

  • Here to teach you to:

    • Do a lot with just a little code

    • Troubleshoot

    • Google stuff

  • Not a statistician

You#

  • Are not going to understand everything the first time

  • Will want to throw your computer out a window at one or many points in the class

    • Celebrate the little victories

  • Will get out of it what you put into it

Politics/protests/war#

⏪ Restart#

Spreadsheets vs. programming languages#

What do you like about spreadsheets?

Why spreadsheets#

  • The easy stuff is easy

  • Lots of people know how to use them

  • Mostly just have to point, click, and scroll

  • Data and logic live together as one

Why programming languages#

  • Data and logic don’t live together

    • Why might this matter?

  • More powerful, flexible, and expressive than spreadsheet formulas; don’t have to cram into a single line

    =SUM(INDEX(C3:E9,MATCH(B13,C3:C9,0),MATCH(B14,C3:E3,0)))
    
  • Better at working with large data

    • Google Sheets and Excel have hard limits at 1-5 million rows, but get slow long before that

  • Reusable code (packages)

  • Automation
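As a rough sketch of the "don't have to cram into a single line" point, the two-way lookup in the formula above could be written in Python with pandas roughly as follows. The table contents and the names `sales`, `person`, and `quarter` are made up for illustration; they stand in for the range C3:E9 and the cells B13 and B14.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet range C3:E9.
sales = pd.DataFrame(
    {"Q1": [100, 150, 90], "Q2": [120, 130, 95]},
    index=["Alice", "Bob", "Carla"],
)

# Hypothetical lookup values standing in for cells B13 and B14.
person = "Bob"
quarter = "Q2"

# Find the value where the chosen row and column intersect.
value = sales.loc[person, quarter]
print(value)
```

Each step gets its own line and its own name, so a wrong result can be traced to a specific step.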

Side-by-side¹#

| Task | Spreadsheets | Programming Languages |
| --- | --- | --- |
| Loading data | Easy | Medium |
| Viewing data | Easy | Medium |
| Filtering data | Easy | Medium |
| Manipulating data | Medium | Medium |
| Joining data | Hard | Medium |
| Complicated transforms | Impossible² | Medium |
| Automation | Impossible² | Medium |
| Making reusable | Limited³ | Medium |
| Large datasets | Impossible | Hard |

  1. These ratings are obviously subjective

  2. Not counting scripting features, such as Excel’s new Python+pandas support

  3. Google Sheets supports named functions
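To make the "Joining data" comparison concrete, here is a minimal sketch of a join in pandas. The two tables and their column names are invented for illustration and do not come from the course data.

```python
import pandas as pd

# Hypothetical tables, invented for this example.
agencies = pd.DataFrame(
    {"agency_id": [1, 2, 3], "agency": ["Parks", "Transit", "Health"]}
)
budgets = pd.DataFrame(
    {"agency_id": [1, 2, 4], "budget_millions": [50, 200, 75]}
)

# Keep every agency, matching budgets where the IDs line up.
# Agencies without a matching budget get a missing value (NaN).
merged = agencies.merge(budgets, on="agency_id", how="left")
print(merged)
```

In a spreadsheet, the same result would typically need a lookup formula copied down every row.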

Python vs. other languages#

  • Good for general-purpose and data stuff

  • Widely used in both industry and academia

  • Relatively easy to learn

  • Open source


Where to Python#

Python can be run in:

Each can be on your computer (“local”), or in the cloud somewhere. All call Python under the hood, more or less.

Packages#

  • a.k.a. “libraries”

  • Developers create them to make code/functionality reusable and easily shareable

  • Software plugins that you import

  • Main packages we’ll use:

    • pandas

    • plotly

A module is a file containing Python definitions and statements.

https://docs.python.org/3/tutorial/modules.html

A module can be your own code, part of the standard library, or part of a package.
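A small sketch of what importing looks like in practice, using only standard library modules so it runs without installing anything. The `scores` list is made up. Third-party packages such as pandas are imported the same way, but have to be installed first.

```python
# A whole module from the standard library: ships with Python.
import statistics

# A single function imported from another standard library module.
from math import sqrt

scores = [70, 85, 90]  # made-up numbers for illustration

average = statistics.mean(scores)
root = sqrt(16)
print(average, root)
```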

Pandas#

Review from Lab 7

  • A Python package (bundled up code that you can reuse)

  • Very common for data science in Python

  • A lot like R

    • Both organize around “data frames”

Jupyter#

  • Web-based programming environment

  • Supports Python by default, and other languages with added kernels

  • Nicely displays output of your code so you can check and share the results

  • Avoids using the command line

We’ll be using JupyterLab through the Anaconda Distribution.

Command line vs. Jupyter#

Command line vs. Jupyter output

Jupyter basics#

A “cell” can be either code or Markdown (text). Raw Markdown looks like this:

## A heading

Plain text

[A link](https://somewhere.com)

Running#

  • You “run” a cell by either:

    • Pressing the ▶️ button

    • Pressing Control+Enter on your keyboard

  • Cells only run when you tell them to, and in the order you run them

    • Generally, you want to do so from the top every time you open a notebook

Output#

  • The last thing in a code cell is what gets displayed when it’s run

  • The output gets saved as part of the notebook

  • Existing output under a cell doesn’t mean that cell has been run during this session
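A minimal sketch of the display rule, meant to be pasted into a single code cell. The variable name `total` is arbitrary.

```python
total = 2 + 3  # an assignment on its own displays nothing

print(total)  # print() shows its output wherever it appears in the cell

total * 10  # in Jupyter, a bare expression on the last line gets displayed
```

Run as a plain script outside Jupyter, only the `print()` line would produce output.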

Some pandas/Jupyter best practices#

  • Make variable names descriptive

    • Ignore that all examples use requests

  • Only do one thing per line

    • Makes troubleshooting easier

  • Make notebooks idempotent

    • Makes your work reproducible

    • Use Restart and run all (⏩ button in toolbar)
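The practices above might look something like this. The `complaints` table and its columns are invented for illustration. The cell builds its data from scratch and never modifies it in place, so re-running it gives the same result.

```python
import pandas as pd

# Hypothetical data, invented for this example.
complaints = pd.DataFrame(
    {
        "borough": ["Bronx", "Bronx", "Queens", "Queens"],
        "days_to_close": [3, 10, 8, 12],
    }
)

# Harder to troubleshoot: everything crammed into one line.
# complaints[complaints["days_to_close"] > 5].groupby("borough")["days_to_close"].mean()

# Easier: one step per line, each with a descriptive name.
is_slow = complaints["days_to_close"] > 5
slow_complaints = complaints[is_slow]
slow_by_borough = slow_complaints.groupby("borough")
average_days = slow_by_borough["days_to_close"].mean()
print(average_days)
```

If the result looks wrong, each intermediate variable can be inspected in its own cell.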