Lab 12#
You will be given a bunch of instructions without (fully) understanding what they do yet. That’s ok.
Setup#
Scraping#
Pull Wikipedia’s list of countries by area into a DataFrame using `read_html()`.
# your code here
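A minimal sketch: `read_html()` returns a list of DataFrames, one per HTML table on the page. The assumption that the relevant table is the first one may not hold; inspect the list to find the right index.

```python
import pandas as pd

# read_html() parses every <table> on the page into a DataFrame
tables = pd.read_html(
    "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area"
)
# assumption: the table we want is the first one; the index may differ
countries = tables[0]
```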
FEC data#
We’ll make an API call in the browser.
Reload the page.
In the Network tab’s request list:
Filter to Fetch/XHR/AJAX (terminology will differ by browser)
Right-click the API call row.
Click “Open in New Tab”. You will see an error.
In the URL bar, replace the `api_key` value with `DEMO_KEY`. The URL should then contain `api_key=DEMO_KEY`.
You should see a big wall of JSON data.
Querying#
Retrieve candidates who have raised funds. API documentation.
# your code here
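One way to start, as a sketch: the `/candidates/` endpoint and its `has_raised_funds` parameter are assumptions to check against the API documentation, and `DEMO_KEY` is the shared demo key from the previous step.

```python
import requests

# assumptions (verify against the FEC API docs): the /candidates/
# endpoint and the has_raised_funds filter parameter
response = requests.get(
    "https://api.open.fec.gov/v1/candidates/",
    params={"api_key": "DEMO_KEY", "has_raised_funds": "true"},
)
results = response.json()
```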
Turn those results into a DataFrame.
# your code here
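A sketch using a stand-in payload: the FEC API nests the records of interest under a `"results"` key, so that list can be handed straight to the DataFrame constructor. The field names below are illustrative, not the API’s actual schema.

```python
import pandas as pd

# stand-in for response.json() from the previous step; the real payload
# nests the candidate records under the "results" key
payload = {
    "results": [
        {"name": "DOE, JANE", "party_full": "INDEPENDENT"},
        {"name": "ROE, RICHARD", "party_full": "INDEPENDENT"},
    ]
}
candidates = pd.DataFrame(payload["results"])
```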
Pagination#
Get all NYC film permits through the API. Documentation on paging.
Hints
You’ll probably want to create DataFrames for each page, then “concatenate” them. Here’s a structure you can start with:
# in a loop
# get the first/next page of data
# combine with the data that's already been retrieved
# if there are fewer than the default number of records returned, stop the loop
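The commented structure above could be fleshed out like this. The dataset ID (`tg4x-b46p`) and the 1,000-record default page size are assumptions to verify against the paging documentation.

```python
import pandas as pd
import requests

# assumptions (check the docs): the film permits dataset ID and the
# default page size of 1,000 records
URL = "https://data.cityofnewyork.us/resource/tg4x-b46p.json"
PAGE_SIZE = 1000

pages = []
offset = 0
while True:
    # get the first/next page of data
    records = requests.get(
        URL, params={"$limit": PAGE_SIZE, "$offset": offset}
    ).json()
    # combine with the data that's already been retrieved
    pages.append(pd.DataFrame(records))
    # if there are fewer than the default number of records returned, stop
    if len(records) < PAGE_SIZE:
        break
    offset += PAGE_SIZE

permits = pd.concat(pages, ignore_index=True)
```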
GitHub#
If you miss a step, don’t worry - some are more important than others, and it’s possible to do them out of order. Ask for help if you need.
Open the folder/repository from Lecture 23 in VSCode.
Click “Publish Branch”.
Allow signing in with GitHub, if prompted.
Click “Publish to GitHub public repository”.
Have VSCode periodically fetch, if asked.
Visit the repository on GitHub.
Click into the files.
Make a change in the repository in VSCode (locally).
Commit the change.
Push (a.k.a. “sync”) the change to GitHub.
Open the repository on GitHub, which should look something like this:
JupyterBook#
Used to build the course site
Setting up for Project 3, which we’ll kick off in Lecture 24
From here on, we recommend following these instructions on the web rather than through a downloaded notebook, which:
Makes copying-and-pasting easier
Ensures you’re seeing the latest version of the instructions
Install#
Install `jupyter-book` via Anaconda. Make sure you’ve done the conda-forge step.
The install might take a while; you can continue up until building the site in the meantime.
Config#
Using VSCode (you could use JupyterLab), create a minimal `_config.yml` file containing the following:
title: Computing in Context - NAME
author: NAME
only_build_toc_files: true
execute:
  execute_notebooks: "off"
sphinx:
  config:
    # https://myst-parser.readthedocs.io/en/latest/syntax/cross-referencing.html#implicit-targets
    myst_heading_anchors: 4
    # https://jupyterbook.org/en/stable/interactive/interactive.html#plotly
    html_js_files:
      - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js
    suppress_warnings: ["mystnb.unknown_mime_type"]
Replace the NAME with your name. More about YAML, which rhymes with “mammal”.
Table of contents#
Move/copy the Lab 12 notebook to this folder, if it’s not there already.
Create a `_toc.yml` containing the following:

format: jb-book
root: lab_12
chapters:
  - file: project_3
Build the site#
Open the integrated terminal and run:
jupyter-book build --all .
This converts your notebooks to HTML.
Troubleshooting#
If you get an error like jupyter-book: command not found
:
Double-check you’ve done the install.
Windows: Confirm you’re using Git BASH, not Command Prompt or PowerShell.
Run `conda activate base` to activate the environment.
View the site (locally)#
It will output “Your book’s HTML pages are here … paste this line directly into your browser bar”.
Copy that `file://` URL into your web browser. You should see your notebook as a JupyterBook site.
Commit changes#
Ignore generated files: Create a `.gitignore` file containing the following:

.DS_Store
.ipynb_checkpoints/
_build/
View the diff again
Commit
Push
The GitHub repository should then look like this: