Lab 12#

General notebook information

You will be given a bunch of instructions without (fully) understanding what they do yet. That’s ok.

Setup#

Install the lxml and requests packages.
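One way to confirm the install worked is to check that both packages are importable from the notebook's environment. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, not part of either package.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


# An empty list means both packages from the setup step are available.
print(missing_packages(["lxml", "requests"]))
```

If either name is printed, repeat the install in the same environment the notebook is running in.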

Scraping#

Common tools:

Pull Wikipedia’s list of countries by area into a DataFrame using read_html().

# your code here

FEC data#

We’ll make an API call in the browser.

  1. Visit https://www.fec.gov/data/candidates/

  2. Open Developer Tools.

  3. Reload the page.

  4. In the Network tab’s request list:

    1. Filter to Fetch/XHR/AJAX (terminology will differ by browser)

    2. Right-click the API call row.

  5. Click Open in New Tab. You will see an error.

  6. In the URL bar, replace the api_key value with DEMO_KEY. The URL should therefore contain api_key=DEMO_KEY.

You should see a big wall of JSON data.
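The key swap in the last step can also be done in code rather than by hand in the URL bar. The sketch below uses the standard library's URL helpers. `with_api_key` is a hypothetical helper, and the example URL (host, path, and parameters) is an assumption for illustration, so copy the real one from the Network tab.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url, key="DEMO_KEY"):
    """Return url with its api_key query parameter set to key."""
    parts = urlsplit(url)
    # Keep every other parameter; drop whatever api_key was there before.
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "api_key"
    ]
    params.append(("api_key", key))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Illustrative URL only; the host and path are assumptions.
example = "https://api.open.fec.gov/v1/candidates/?page=1&api_key=PLACEHOLDER"
print(with_api_key(example))
```

Parsing the query string instead of using text replacement keeps the other parameters intact and also handles a URL that has no `api_key` yet.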

Querying#

Retrieve candidates who have raised funds. See the API documentation.

# your code here
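A request for this exercise boils down to a base URL plus query parameters. The sketch below only builds the URL and does not send it. The endpoint and the `has_raised_funds` parameter name are assumptions to verify against the API documentation, and `candidates_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm it against the API documentation.
BASE_URL = "https://api.open.fec.gov/v1/candidates/"


def candidates_url(api_key="DEMO_KEY", **filters):
    """Build a candidates query URL from keyword filters."""
    params = {"api_key": api_key, **filters}
    return f"{BASE_URL}?{urlencode(params)}"


# Pass booleans as lowercase strings, since Python's True would encode as "True".
url = candidates_url(has_raised_funds="true")
print(url)
```

With the requests package, passing the same dictionary as `params=` to `requests.get()` assembles an equivalent query string and sends the request.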

Turn those results into a DataFrame.

# your code here
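API responses like this one usually wrap the records in an envelope, so the list of records has to be pulled out before building a table. Here is a sketch with made-up sample data, assuming the records sit under a `results` key (compare with the JSON from the browser step).

```python
import json

# Made-up sample shaped like a typical paginated API response.
sample = """
{
  "pagination": {"page": 1, "pages": 1, "count": 2},
  "results": [
    {"name": "CANDIDATE A", "office": "H"},
    {"name": "CANDIDATE B", "office": "S"}
  ]
}
"""

payload = json.loads(sample)
records = payload["results"]  # list of dicts, one per candidate

# Column names come from the union of keys across all records.
columns = sorted({key for record in records for key in record})
print(columns)
print(len(records))
```

A list of dictionaries like `records` can be passed straight to `pandas.DataFrame()`, which produces one row per record and one column per key.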

Pagination#

Get all NYC film permits through the API. See the documentation on paging.

Hints

You’ll probably want to create DataFrames for each page, then “concatenate” them. Here’s a structure you can start with:

# in a loop
#     get the first/next page of data
#     combine with the data that's already been retrieved
#     if there are fewer than the default number of records returned, stop the loop
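The loop structure above can be sketched without touching the network. In this sketch `fetch_page` is a hypothetical stand-in for the real API call, the page size of 1000 is an assumption to check against the paging documentation, and plain lists take the place of DataFrames (with pandas, each page would be a DataFrame and the pages would be combined with `pd.concat`).

```python
def fetch_all(fetch_page, page_size):
    """Collect records page by page until a short page signals the end."""
    records = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return records


# Stand-in for the real API: 2,500 fake rows served in slices.
FAKE_ROWS = [{"id": i} for i in range(2500)]
calls = []


def fake_fetch_page(offset, limit):
    calls.append(offset)
    return FAKE_ROWS[offset:offset + limit]


all_rows = fetch_all(fake_fetch_page, page_size=1000)
print(len(all_rows), calls)
```

Because the stopping test runs after the page is added, a total that is an exact multiple of the page size costs one extra request that returns an empty page, and then the loop ends.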

GitHub#

If you miss a step, don’t worry - some are more important than others, and it’s possible to do them out of order. Ask for help if you need it.

  1. Sign up.

  2. Open the folder/repository from Lecture 23 in VSCode.

  3. Click Publish Branch.

  4. Allow signing in with GitHub, if prompted.

  5. Click Publish to GitHub public repository.

    1. Public vs. private

  6. Have VSCode periodically fetch, if asked.

  7. Visit the repository on GitHub.

    1. Click into the files.

  8. Make a change in the repository in VSCode (locally).

  9. Commit the change.

  10. Push (a.k.a. “sync”) the change to GitHub.

  11. Open the repository on GitHub, which should look something like this:

    repository file list

JupyterBook#

Install#

Install jupyter-book via Anaconda. Make sure you’ve done the conda-forge step.

The install might take a while; in the meantime, you can continue with the steps below, up to building the site.

Config#

Using VSCode (you could use JupyterLab), create a minimal _config.yml file containing the following:

title: Computing in Context - NAME
author: NAME

only_build_toc_files: true

execute:
  execute_notebooks: "off"

sphinx:
  config:
    # https://myst-parser.readthedocs.io/en/latest/syntax/cross-referencing.html#implicit-targets
    myst_heading_anchors: 4
    # https://jupyterbook.org/en/stable/interactive/interactive.html#plotly
    html_js_files:
      - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js
    suppress_warnings: ["mystnb.unknown_mime_type"]

Replace the NAME with your name. More about YAML, which rhymes with “mammal”.

Table of contents#

  1. Move/copy the Lab 12 notebook to this folder, if it’s not there already.

  2. Create a _toc.yml containing the following:

    format: jb-book
    root: lab_12
    chapters:
      - file: project_3
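A build commonly fails when a name in `_toc.yml` has no matching file in the folder. The sketch below checks this by hand, assuming entries are written without extensions and refer to `.ipynb` or `.md` files. `missing_toc_files` is a hypothetical helper, not part of jupyter-book.

```python
from pathlib import Path


def missing_toc_files(folder, names, suffixes=(".ipynb", ".md")):
    """Return TOC entries that have no matching file in folder."""
    folder = Path(folder)
    return [
        name
        for name in names
        if not any((folder / f"{name}{suffix}").exists() for suffix in suffixes)
    ]


# Entries copied by hand from the _toc.yml contents.
print(missing_toc_files(".", ["lab_12", "project_3"]))
```

An empty list means every entry has a file. Any name printed needs its notebook moved into the folder or its entry corrected.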
    

Build the site#

Open the integrated terminal and run:

jupyter-book build --all .

This converts your notebooks to HTML.

Troubleshooting#

If you get an error like jupyter-book: command not found:

  1. Double-check you’ve done the install.

  2. Windows: Confirm you’re using Git Bash, not Command Prompt or PowerShell.

  3. Run conda activate base to activate the environment.
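A "command not found" error means the shell could not locate the program on its `PATH`. The same lookup can be run from Python, as in this sketch. `find_command` is a hypothetical wrapper around the standard library's `shutil.which`, and the result reflects the environment Python itself runs in, which may differ from the terminal's.

```python
import shutil


def find_command(name):
    """Return the full path of a command on PATH, or None if it is absent."""
    return shutil.which(name)


path = find_command("jupyter-book")
if path is None:
    print("jupyter-book is not on PATH; check the install and the active environment")
else:
    print(f"jupyter-book found at {path}")
```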

View the site (locally)#

  1. The build will print “Your book’s HTML pages are here … paste this line directly into your browser bar”.

  2. Copy that file:// URL into your web browser.

  3. You should see your notebook as a JupyterBook site.

    JupyterBook
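If the build output has scrolled away, the `file://` URL can be reconstructed. This is a minimal sketch, assuming the site was built into the default `_build/html` folder inside the book folder; `book_index_url` is a hypothetical helper.

```python
from pathlib import Path


def book_index_url(book_folder="."):
    """Return a file:// URL for the built site's index page."""
    # Assumes the default output location, _build/html, inside the book folder.
    index = Path(book_folder).resolve() / "_build" / "html" / "index.html"
    return index.as_uri()


print(book_index_url())
```

Passing the result to `webbrowser.open()` from the standard library opens the page in the default browser.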

Commit changes#

  1. View the diff

  2. Ignore generated files: Create a .gitignore file containing the following:

    .DS_Store
    .ipynb_checkpoints/
    _build/
    
  3. View the diff again

  4. Commit

  5. Push

  6. The GitHub repository should then look like this:

    repository with JupyterBook files


Submit.