Project 1
In this project, you will:
- Work with a new dataset
- Practice working with data in:
  - pandas
Steps
1. Read the general Project information.
2. Find a dataset.
   - It must have:
     - At least one numeric column
     - Between one thousand and one million rows
       - If it's larger than that, you can filter it down.
   - Don't spend too long on this step.
   - If there's more than one numeric column, pick one.
3. Create a new notebook.
4. Using pandas:
   1. Read in the data.
   2. Compute:
      - The mean
      - The median
      - The mode
5. Repeat the previous step using only the Python standard library, a.k.a. the hard way.
6. Create a data visualization, following the instructions below.
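Here is a minimal sketch of the pandas step. The data is built inline so the snippet runs on its own. In your notebook you would read your own file instead. The file name `data.csv` and the column name `value` are placeholders, not part of the assignment.

```python
import pandas as pd

# Placeholder data; in the project, something like:
# df = pd.read_csv("data.csv")  # hypothetical file name
df = pd.DataFrame({"value": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]})

# One way to filter a too-large dataset down to a million rows:
# df = df.head(1_000_000)

values = df["value"]  # hypothetical column name

mean = values.mean()
median = values.median()
# Series.mode() returns a Series because there can be ties; take the first one
mode = values.mode().iloc[0]

print(f"mean={mean}, median={median}, mode={mode}")
```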
The hard way
- You may not use pandas, the statistics module, a spreadsheet program, etc.
- You should use the same dataset as in the pandas step, but without accessing the DataFrame/Series.
  - In other words, if you put the code for this step in a totally separate notebook, it should still work.
- You should calculate the mean, median, and mode yourself, not call functions with those names (or equivalents).
  - Hint: Use a dictionary to keep track of value counts.
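One possible sketch of the hard way follows. It assumes your data is a CSV file with a header row. The column is read with the standard library's `csv` module rather than pandas, and the mode follows the hint by counting values in a dictionary. To keep the example runnable on its own, the file contents are inlined with `io.StringIO`. In practice you would pass `open("data.csv", newline="")` instead. `read_column`, `data.csv`, and the `value` column are illustrative names only.

```python
import csv
import io

# Stand-in for a real file; "data.csv" and the "value" column are hypothetical
CSV_TEXT = "value\n3\n1\n4\n1\n5\n9\n2\n6\n5\n3\n5\n"


def read_column(file, column):
    """Read one numeric column from a CSV file object, skipping blank cells."""
    reader = csv.DictReader(file)
    return [float(row[column]) for row in reader if row[column]]


def mean(nums):
    total = 0
    for n in nums:
        total += n
    return total / len(nums)


def median(nums):
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(nums):
    # Count how many times each value appears
    counts = {}
    for n in nums:
        counts[n] = counts.get(n, 0) + 1
    # Keep the value with the highest count (the first one seen, if tied)
    best = None
    for value, count in counts.items():
        if best is None or count > counts[best]:
            best = value
    return best


values = read_column(io.StringIO(CSV_TEXT), "value")
print(mean(values), median(values), mode(values))
```

This `mode()` returns a single value. If your data has several equally common values, you might choose to return all of them instead.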
Data visualization
Requirements:
- The data/calculations can come through pandas, but the drawing code should use only the Python standard library.
- The visualization should be visual, using shape, size, symbols, etc. to represent the values. Printing the numbers as-is isn't sufficient.

We'll talk about data visualization in more detail in week 10, but you aren't expected to know any of that to complete this.
Example
Data that looks like this:
Rat sightings

| Year | Count |
|---|---|
| 2014 | 3,162 |
| 2015 | 4,985 |
| 2016 | 4,091 |

could be turned into a sparkline that looks like this:
Rat sightings, in thousands
2014: ***
2015: *****
2016: ****
Please don't print 3,162 asterisks (*) 😉
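The following is a minimal standard-library sketch that reproduces the sparkline above. It divides each count by 1,000, rounds, and repeats `*` that many times using string multiplication. The function name `sparkline` and its `unit`/`symbol` parameters are illustrative. If your counts come from pandas, you can convert them to a plain `dict` first (for example with `Series.to_dict()`), which keeps the drawing code free of pandas.

```python
# Rat sightings per year, copied from the example table
sightings = {2014: 3162, 2015: 4985, 2016: 4091}


def sparkline(counts, unit=1000, symbol="*"):
    """Return one text line per key, with one symbol per `unit` (rounded)."""
    lines = []
    for key, count in counts.items():
        # String multiplication repeats the symbol, e.g. "*" * 3 == "***"
        lines.append(f"{key}: {symbol * round(count / unit)}")
    return lines


print("Rat sightings, in thousands")
for line in sparkline(sightings):
    print(line)
```

Python's `round()` rounds exact halves to the nearest even number, so a count of 2,500 would get two asterisks rather than three. That is usually fine for a sketch like this.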
Tips
- Start simple.
  - Start with the example above, get that working, then go from there.
  - Use only one or two columns of your dataset.
- print()ing strings will probably be easiest, but you can get fancy and generate HTML if you want.
- Making your chart vertical (one data point per line) will probably be easier than doing something horizontal.
- Techniques that may be helpful:
  - Python strings can contain Unicode, including emoji 📈✨
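As one example of using Unicode, the sketch below draws bars with the full-block character `█` (U+2588). It scales the bars so the longest one is a fixed width, which means you don't have to pick a unit by hand. It reuses the rat-sighting numbers from the example. `bar_chart`, `width`, and `block` are illustrative names, not anything the assignment requires.

```python
# Rat sightings per year, from the example table
sightings = {2014: 3162, 2015: 4985, 2016: 4091}


def bar_chart(data, width=20, block="█"):  # "█" is U+2588 FULL BLOCK
    """Scale values so the largest one gets `width` blocks.

    Assumes `data` is non-empty and its largest value is positive.
    """
    biggest = max(data.values())
    # Pad labels to the same length so the bars line up
    label_width = max(len(str(label)) for label in data)
    lines = []
    for label, value in data.items():
        length = round(value / biggest * width)
        lines.append(f"{str(label).ljust(label_width)} {block * length} {value:,}")
    return lines


for line in bar_chart(sightings):
    print(line)
```

You could pass an emoji as `block` instead. However, many terminals draw emoji two columns wide, so the bars may look longer than expected.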
Rubric
- 15% pandas steps: 5% × 3
- 30% Python standard library steps: 10% × 3