
In this project, you will:

Steps

  1. Read the general Project information.

  2. Find a dataset.

    • It must have:

      • At least one numeric column

      • Between one thousand and one million rows

        • If it’s larger than that, you can filter it down.

        • The alumni employment data is smaller than that; that’s ok.

    • Don’t spend too long on this step.

  3. If there’s more than one numeric column, pick one.

  4. Create a new notebook.

  5. Using pandas:

    1. Read in the data.

    2. Compute the following (or -5 points each):

      • The mean

      • The median

      • The mode

  6. Repeat the previous step the hard way: compute the same three statistics using only the Python standard library (or -10 points each, and -10 points for using anything outside the standard library).

  7. Create a data visualization, following the instructions below.

  8. Submit.
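As a rough illustration of the pandas step, here is a minimal sketch. It assumes a CSV source with a numeric column; the inline sample data, the column names, and the `summarize` helper are all made up for illustration, so substitute your own file and column.

```python
import io

import pandas as pd

# Made-up sample standing in for a real dataset file.
# The column names "year" and "count" are hypothetical.
SAMPLE_CSV = """year,count
2014,3
2015,5
2016,4
2017,5
2018,2
"""


def summarize(csv_source, column):
    """Return the mean, median, and mode(s) of one numeric column."""
    df = pd.read_csv(csv_source)
    values = df[column]
    return {
        "mean": values.mean(),
        "median": values.median(),
        # mode() returns a Series because there can be ties.
        "mode": values.mode().tolist(),
    }


# With a real dataset, pass a file path instead of the StringIO object.
stats = summarize(io.StringIO(SAMPLE_CSV), "count")
print(stats)
```

Note that `Series.mode()` can return more than one value when several values tie for most frequent, which is why the sketch keeps it as a list.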

The hard way
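One possible shape for this, sketched under assumptions: the data is a CSV with a header row, and the three statistics are computed by hand with `csv` and `collections.Counter`. The sample data, column names, and function names are hypothetical. The standard library also includes a `statistics` module; this sketch doesn't use it because these instructions don't say whether it counts.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a real dataset file.
# The column names "year" and "count" are hypothetical.
SAMPLE_CSV = """year,count
2014,3
2015,5
2016,4
2017,5
2018,2
"""


def read_column(csv_file, column):
    """Read one column as floats, skipping blank cells."""
    reader = csv.DictReader(csv_file)
    return [float(row[column]) for row in reader if row[column]]


def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even number of values: average the two middle ones.
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Return every value tied for most frequent, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# With a real dataset, use open("your_file.csv", newline="") instead.
values = read_column(io.StringIO(SAMPLE_CSV), "count")
print(mean(values), median(values), modes(values))
```

Comparing these results against the pandas output for the same column is a quick sanity check.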

Data visualization

Requirements:

We’ll talk about data visualization in more detail in week 10, but you don’t need any of that knowledge to complete this project.

Example

Data that looks like this:

Rat sightings

Year  Count
2014  3,162
2015  4,985
2016  4,091

could be turned into a sparkline that looks like this:

Rat sightings, in thousands

2014: ***
2015: *****
2016: ****

Please don’t print thousands of asterisks (*) 😉
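The example above could be produced with something like the following sketch. It scales each count down before printing, here one asterisk per thousand, rounded to the nearest whole asterisk. The `text_bars` function and its parameter names are hypothetical; choose a scale that suits your own data.

```python
def text_bars(counts, per_star=1000, symbol="*"):
    """Build one text bar per label, with one symbol per `per_star` units."""
    lines = []
    for label, count in counts.items():
        stars = round(count / per_star)
        lines.append(f"{label}: {symbol * stars}")
    return lines


# The rat sightings counts from the example table.
rat_sightings = {2014: 3162, 2015: 4985, 2016: 4091}

chart = text_bars(rat_sightings)
print("Rat sightings, in thousands")
print("\n".join(chart))
```

Dividing by a scale factor keeps the bars short no matter how large the raw counts are.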

Tips