Lecture 17: Manipulating data#
Questions?
How was the lab? 👍/👎
Project due dates adjusted
Will introduce Project 1 at the end
Today’s goal#
Which 311 complaints are most common?
Which agencies are responsible for handling them?
Read our cleaned 311 Service Requests dataset#
import pandas as pd
url = "https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample_clean.csv.zip"
requests = pd.read_csv(url)
/var/folders/kr/nx0m1j811kz5vy8c87ffchzr0000gn/T/ipykernel_7524/1515339211.py:4: DtypeWarning: Columns (8,20,31,34) have mixed types. Specify dtype option on import or set low_memory=False.
requests = pd.read_csv(url)
Data cleaning#
Data Cleansing is a process of removing or fixing incorrect, malformed, incomplete, duplicate, or corrupted data
https://hevodata.com/learn/data-cleansing-a-simplified-guide/
Where have you had to do data cleaning?
Things to check for#
From my workshop on data cleaning:
Missing data
Empty values
Bad (junk) values
Duplicates
Mismatched types/formatting
Categorical data
Unique values (cardinality)
Value counts
Continuous values
Ranges
Spread (distribution)
Notes:
“Values” in this case can be a single cell (in the spreadsheet sense) or a whole row
“Missing” or “duplicates” can be columns (Series), tables (DataFrames), rows, or cells
“Categorical data” have a fixed set of values
This isn’t everything you can check for, but should cover most things
Data cleaning mnemonic#
Empty
Bad
Unique
Spread
Dealing with dtypes#
DtypeWarning: Columns (8,20,31,34) have mixed types.
requests.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 499958 entries, 0 to 499957
Data columns (total 41 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 499958 non-null int64
1 Created Date 499958 non-null object
2 Closed Date 476140 non-null object
3 Agency 499958 non-null object
4 Agency Name 499958 non-null object
5 Complaint Type 499958 non-null object
6 Descriptor 492496 non-null object
7 Location Type 392573 non-null object
8 Incident Zip 480394 non-null object
9 Incident Address 434529 non-null object
10 Street Name 434504 non-null object
11 Cross Street 1 300825 non-null object
12 Cross Street 2 299624 non-null object
13 Intersection Street 1 107377 non-null object
14 Intersection Street 2 107042 non-null object
15 Address Type 451006 non-null object
16 City 476632 non-null object
17 Landmark 32516 non-null object
18 Facility Type 134918 non-null object
19 Status 499958 non-null object
20 Due Date 171534 non-null object
21 Resolution Description 457354 non-null object
22 Resolution Action Updated Date 488788 non-null object
23 Community Board 499958 non-null object
24 BBL 407338 non-null float64
25 Borough 499958 non-null object
26 X Coordinate (State Plane) 470815 non-null float64
27 Y Coordinate (State Plane) 470815 non-null float64
28 Open Data Channel Type 499958 non-null object
29 Park Facility Name 499931 non-null object
30 Park Borough 499958 non-null object
31 Vehicle Type 37 non-null object
32 Taxi Company Borough 403 non-null object
33 Taxi Pick Up Location 4474 non-null object
34 Bridge Highway Name 696 non-null object
35 Bridge Highway Direction 765 non-null object
36 Road Ramp 759 non-null object
37 Bridge Highway Segment 1007 non-null object
38 Latitude 470815 non-null float64
39 Longitude 470815 non-null float64
40 Location 470815 non-null object
dtypes: float64(5), int64(1), object(35)
memory usage: 156.4+ MB
list(requests["Incident Zip"].unique())
['11235',
'11221',
'11693',
'11216',
'10465',
'11367',
'10459',
'11101',
'11362',
'10014',
'11234',
'11436',
'10305',
'10467',
'11208',
'10451',
'11419',
'11237',
'11220',
'10469',
'11385',
'10470',
'11694',
'10036',
nan,
'10473',
'11435',
'10040',
'10472',
'11225',
'10019',
'11434',
'11226',
'10010',
'11211',
'11421',
'10026',
'10013',
'11423',
'10002',
'10453',
'11213',
'11104',
'11249',
'11361',
'11233',
'11224',
'11374',
'10025',
'10022',
'11214',
'11209',
'11366',
'10304',
'10027',
'11378',
'11206',
'10021',
'11364',
'10065',
'10456',
'10314',
'10312',
'11212',
'11379',
'10462',
'11231',
'10460',
'11416',
'10001',
'11357',
'11413',
'11210',
'11217',
'11223',
'11417',
'11418',
'11218',
'11230',
'11207',
'11691',
'10468',
'10007',
'10310',
'10306',
'11103',
'11105',
'11433',
'11203',
'10307',
'11229',
'11372',
'10032',
'11420',
'10017',
'10301',
'11368',
'11201',
'11365',
'11422',
'10452',
'11377',
'10029',
'10003',
'10075',
'11222',
'10128',
'11415',
'11204',
'10030',
'11432',
'10308',
'11102',
'10016',
'10463',
'11412',
'10011',
'11106',
'10457',
'11375',
'11356',
'11228',
'11369',
'10458',
'11411',
'11215',
'10309',
'11358',
'11355',
'11219',
'10031',
'10303',
'11232',
'10302',
'11238',
'10005',
'11429',
'11205',
'10023',
'11373',
'10033',
'10039',
'10028',
'11363',
'11354',
'11692',
'10455',
'11370',
'10035',
'10012',
'10024',
'10009',
'10034',
'11001',
'10466',
'11427',
'11004',
'10454',
'11414',
'11360',
'10461',
'10018',
'10006',
'11236',
'11428',
'10474',
'10471',
'10037',
'10475',
'11430',
'10119',
'10038',
'11426',
'11239',
'10151',
'10120',
'10112',
'10168',
'10004',
'10464',
'18773-9640',
'10282',
'11109',
'10280',
'07114',
'11040',
'10000',
'07090',
'10020',
'00083',
'10044',
'10069',
'10118',
'07102',
'11359',
'11005',
'10169',
'11697',
'10115',
11226.0,
11236.0,
10466.0,
11210.0,
10452.0,
10457.0,
10304.0,
11416.0,
11205.0,
11214.0,
11692.0,
11101.0,
11201.0,
10470.0,
11375.0,
10305.0,
11420.0,
11421.0,
11419.0,
11412.0,
10309.0,
11385.0,
11379.0,
11356.0,
10021.0,
11417.0,
11377.0,
11423.0,
10001.0,
10469.0,
10463.0,
11372.0,
11422.0,
10017.0,
11368.0,
11213.0,
11220.0,
10312.0,
10025.0,
11357.0,
11230.0,
10456.0,
11104.0,
11232.0,
11208.0,
11206.0,
11229.0,
10465.0,
10472.0,
10039.0,
10016.0,
10009.0,
11355.0,
11211.0,
10040.0,
11366.0,
11234.0,
10013.0,
11207.0,
10032.0,
10036.0,
10029.0,
11204.0,
11435.0,
11373.0,
10007.0,
11216.0,
10306.0,
11378.0,
11414.0,
10314.0,
10128.0,
10027.0,
11221.0,
10024.0,
11364.0,
10468.0,
11233.0,
10014.0,
11238.0,
11374.0,
11105.0,
11219.0,
11106.0,
10003.0,
10026.0,
11411.0,
11223.0,
11693.0,
11365.0,
11209.0,
11103.0,
11361.0,
11427.0,
10467.0,
11203.0,
11102.0,
10455.0,
10454.0,
11225.0,
11212.0,
11434.0,
10461.0,
11369.0,
11432.0,
10031.0,
11217.0,
11235.0,
10458.0,
10453.0,
11215.0,
11367.0,
11231.0,
11413.0,
11358.0,
11694.0,
10462.0,
11218.0,
11430.0,
10473.0,
10011.0,
10460.0,
10019.0,
10310.0,
10037.0,
10451.0,
10301.0,
10010.0,
11354.0,
11222.0,
11691.0,
10034.0,
10308.0,
10012.0,
11237.0,
10801.0,
10030.0,
11428.0,
11418.0,
10033.0,
10459.0,
11429.0,
10065.0,
10022.0,
10005.0,
10002.0,
10035.0,
11370.0,
10004.0,
11415.0,
11426.0,
11362.0,
10006.0,
11360.0,
10038.0,
11249.0,
11001.0,
11224.0,
10075.0,
10018.0,
10302.0,
11436.0,
11433.0,
10474.0,
10023.0,
11004.0,
10028.0,
11228.0,
10475.0,
10069.0,
11363.0,
10303.0,
10282.0,
10307.0,
10103.0,
10280.0,
10471.0,
10271.0,
10000.0,
10464.0,
10107.0,
7078.0,
10044.0,
10020.0,
56303.0,
10119.0,
11040.0,
11386.0,
10952.0,
6811.0,
11239.0,
10121.0,
37214.0,
10538.0,
10112.0,
10279.0,
11109.0,
11371.0,
18017.0,
7115.0,
77036.0,
7114.0,
'10123',
'11030',
'10801',
'92626-1902',
'07302',
'18773',
'12222',
'10710',
'10103',
'07057',
'10162',
'11582',
'10281',
'10271',
'10107',
'HARRISBURG',
'11735',
'07305',
'N5X3A6',
'11746',
'11371',
'23450',
11580.0,
10591.0,
11005.0,
19034.0,
11596.0,
11779.0,
7621.0,
11021.0,
11241.0,
10169.0,
100000.0,
11242.0,
10151.0,
11697.0,
7086.0,
10177.0,
10118.0,
10105.0,
10152.0,
10168.0,
7093.0,
10917.0,
10110.0,
10153.0,
10178.0,
11570.0,
10601.0,
10704.0,
7424.0,
10281.0,
10158.0,
'10158',
'10172',
'10179',
'IDK',
'11801',
'10601',
'11590',
'10155',
'11202',
'1801',
'11581',
11359.0,
11758.0,
10278.0,
43017.0,
10154.0,
11553.0,
10162.0,
11695.0,
10041.0,
11741.0,
98335.0,
14814.0,
10111.0,
'12345',
'11572',
'11520',
'14614-195',
'10121',
'10105',
'10701',
0.0,
17106.0,
979113.0,
10120.0,
12345.0,
11030.0,
11797.0,
100.0,
11710.0,
33624.0,
8682.0,
11747.0,
1757.0,
11561.0,
7304.0,
6851.0,
11590.0,
94267.0,
10167.0,
11749.0,
11756.0,
10174.0,
10550.0,
89119.0,
14068.0,
11722.0,
11520.0,
6460.0,
32255.0,
10173.0,
10165.0,
11946.0,
'29616-0759',
'07032',
'10278',
'11575',
11963.0,
10106.0,
83.0,
11566.0,
6870.0,
7001.0,
10710.0,
11735.0,
11572.0,
'10165',
'10279',
'11251',
'NJ 07114',
'10106',
'07666',
'11516',
'10177',
'10170',
'43215-1441',
'00000',
'08081',
'10803',
11507.0,
11701.0,
11563.0,
7047.0,
3108.0,
'11021',
'07003',
'10152',
'07029',
'10041',
'31093',
'11735-0230',
89118.0,
11803.0,
11559.0,
11565.0,
7080.0,
12601.0,
10155.0,
10171.0,
7208.0,
11757.0,
11042.0,
'1101',
'10111',
'10173',
'10096',
'07087',
'DID N',
'10956',
10048.0,
10123.0,
10122.0,
11251.0]
ZIP codes look numeric, but aren’t really.
Read the ZIP codes in as strings.
requests = pd.read_csv(url, dtype={"Incident Zip": "string"})
/var/folders/kr/nx0m1j811kz5vy8c87ffchzr0000gn/T/ipykernel_7524/3312645178.py:1: DtypeWarning: Columns (20,31,34) have mixed types. Specify dtype option on import or set low_memory=False.
requests = pd.read_csv(url, dtype={"Incident Zip": "string"})
We fixed the dtype warning for column 8 (Incident Zip
).
list(requests["Incident Zip"].unique())
['11235',
'11221',
'11693',
'11216',
'10465',
'11367',
'10459',
'11101',
'11362',
'10014',
'11234',
'11436',
'10305',
'10467',
'11208',
'10451',
'11419',
'11237',
'11220',
'10469',
'11385',
'10470',
'11694',
'10036',
<NA>,
'10473',
'11435',
'10040',
'10472',
'11225',
'10019',
'11434',
'11226',
'10010',
'11211',
'11421',
'10026',
'10013',
'11423',
'10002',
'10453',
'11213',
'11104',
'11249',
'11361',
'11233',
'11224',
'11374',
'10025',
'10022',
'11214',
'11209',
'11366',
'10304',
'10027',
'11378',
'11206',
'10021',
'11364',
'10065',
'10456',
'10314',
'10312',
'11212',
'11379',
'10462',
'11231',
'10460',
'11416',
'10001',
'11357',
'11413',
'11210',
'11217',
'11223',
'11417',
'11418',
'11218',
'11230',
'11207',
'11691',
'10468',
'10007',
'10310',
'10306',
'11103',
'11105',
'11433',
'11203',
'10307',
'11229',
'11372',
'10032',
'11420',
'10017',
'10301',
'11368',
'11201',
'11365',
'11422',
'10452',
'11377',
'10029',
'10003',
'10075',
'11222',
'10128',
'11415',
'11204',
'10030',
'11432',
'10308',
'11102',
'10016',
'10463',
'11412',
'10011',
'11106',
'10457',
'11375',
'11356',
'11228',
'11369',
'10458',
'11411',
'11215',
'10309',
'11358',
'11355',
'11219',
'10031',
'10303',
'11232',
'10302',
'11238',
'10005',
'11429',
'11205',
'10023',
'11373',
'10033',
'10039',
'10028',
'11363',
'11354',
'11692',
'10455',
'11370',
'10035',
'10012',
'10024',
'10009',
'10034',
'11001',
'10466',
'11427',
'11004',
'10454',
'11414',
'11360',
'10461',
'10018',
'10006',
'11236',
'11428',
'10474',
'10471',
'10037',
'10475',
'11430',
'10119',
'10038',
'11426',
'11239',
'10151',
'10120',
'10112',
'10168',
'10004',
'10464',
'18773-9640',
'10282',
'11109',
'10280',
'07114',
'11040',
'10000',
'07090',
'10020',
'00083',
'10044',
'10069',
'10118',
'07102',
'11359',
'11005',
'10169',
'11697',
'10115',
'10801',
'10103',
'10271',
'10107',
'07078',
'56303',
'11386',
'10952',
'06811',
'10121',
'37214',
'10538',
'10279',
'11371',
'18017',
'07115',
'77036',
'10123',
'11030',
'92626-1902',
'07302',
'18773',
'12222',
'10710',
'07057',
'10162',
'11582',
'10281',
'HARRISBURG',
'11735',
'07305',
'N5X3A6',
'11746',
'23450',
'11580',
'10591',
'19034',
'11596',
'11779',
'07621',
'11021',
'11241',
'100000',
'11242',
'07086',
'10177',
'10105',
'10152',
'07093',
'10917',
'10110',
'10153',
'10178',
'11570',
'10601',
'10704',
'07424',
'10158',
'10172',
'10179',
'IDK',
'11801',
'11590',
'10155',
'11202',
'1801',
'11581',
'11758',
'10278',
'43017',
'10154',
'11553',
'11695',
'10041',
'11741',
'98335',
'14814',
'10111',
'12345',
'11572',
'11520',
'14614-195',
'10701',
'00000',
'17106',
'979113',
'11797',
'100',
'11710',
'33624',
'8682',
'11747',
'01757',
'11561',
'07304',
'000000',
'06851',
'94267',
'10167',
'11749',
'11756',
'10174',
'10550',
'89119',
'14068',
'11722',
'06460',
'32255',
'10173',
'10165',
'11946',
'29616-0759',
'07032',
'11575',
'11963',
'10106',
'11566',
'06870',
'07001',
'11251',
'NJ 07114',
'07666',
'11516',
'10170',
'43215-1441',
'08081',
'10803',
'11507',
'11701',
'11563',
'07047',
'03108',
'07003',
'07029',
'31093',
'11735-0230',
'89118',
'11803',
'11559',
'11565',
'07080',
'12601',
'10171',
'07208',
'11757',
'11042',
'1101',
'10096',
'07087',
'DID N',
'10956',
'10048',
'10122']
Find invalid ZIP codes#
Use a regular expression (regex) to find strings that match a pattern:
^\d{5}(?:-\d{4})?$
│ │ │ │ │└─ end of string
│ │ │ │ └─ optional
│ │ │ └─ capture group
│ │ └─ count
│ └─ numeric/digit character
└─ start of string
regex101 is useful for testing them.
# find valid ZIP codes
valid_zips = requests["Incident Zip"].str.contains(r"^\d{5}(?:-\d{4})?$")
# filter the DataFrame to only invalid ZIP codes
invalid_zips = ~valid_zips
requests_with_invalid_zips = requests[invalid_zips]
requests_with_invalid_zips["Incident Zip"]
55017 HARRISBURG
58100 N5X3A6
80798 100000
120304 IDK
123304 1801
173518 14614-195
192034 979113
201463 100
207158 8682
216745 000000
325071 NJ 07114
425985 1101
441166 DID N
Name: Incident Zip, dtype: string
Clear any invalid ZIP codes:
requests.loc[invalid_zips, "Incident Zip"] = None
Additonal data cleaning tips:
Hard part is finding what needs to be done
Will be specific to your use case
Document what you did, since it will affect your results
Which 311 complaints are most common and which agencies are responsible for handling them?#
Which complaints are the most common?#
requests.groupby("Complaint Type").size().to_frame(name="count")
count | |
---|---|
Complaint Type | |
APPLIANCE | 2539 |
Abandoned Vehicle | 1655 |
Advocate - Other | 26 |
Advocate-Co-opCondo Abatement | 4 |
Advocate-Personal Exemptions | 24 |
... | ... |
WATER LEAK | 6641 |
Water Conservation | 853 |
Water Quality | 332 |
Water System | 12949 |
Window Guard | 6 |
213 rows × 1 columns
.to_frame(name='count')
isn’t necessary, but it’s helpful to include because it allows us to name the new column that contains the count of rows.
Which agencies are responsible for handling these complaint categories?#
requests.groupby(["Agency", "Complaint Type"]).size().to_frame(name="count")
count | ||
---|---|---|
Agency | Complaint Type | |
ACS | Forms | 56 |
COIB | Forms | 1 |
DCA | Consumer Complaint | 2892 |
DCA / DOH New License Application Request | 186 | |
DCAS | Comments | 13 |
... | ... | ... |
TLC | Lost Property | 472 |
Taxi Complaint | 2416 | |
Taxi Compliment | 41 | |
Taxi Licensee Complaint | 5 | |
Taxi Report | 318 |
221 rows × 1 columns
Which agencies receive the most total 311 requests?#
requests.groupby("Agency").size().to_frame(name="count")
count | |
---|---|
Agency | |
ACS | 56 |
COIB | 1 |
DCA | 3078 |
DCAS | 149 |
DCP | 4 |
DEP | 36283 |
DFTA | 1182 |
DHS | 3670 |
DOB | 26024 |
DOE | 442 |
DOF | 7223 |
DOHMH | 12690 |
DOITT | 115 |
DOT | 52001 |
DPR | 21086 |
DSNY | 73227 |
EDC | 303 |
HPD | 105634 |
HRA | 1649 |
NYCEM | 25 |
NYPD | 148884 |
TAT | 1 |
TAX | 37 |
TLC | 6194 |
What is the most frequent request per agency?#
First, create a dataframe that contains the count of complaints per Agency
per Complaint Type
.
agency_counts = requests.groupby(["Agency", "Complaint Type"]).size().reset_index(name="count")
agency_counts.head(20)
Agency | Complaint Type | count | |
---|---|---|---|
0 | ACS | Forms | 56 |
1 | COIB | Forms | 1 |
2 | DCA | Consumer Complaint | 2892 |
3 | DCA | DCA / DOH New License Application Request | 186 |
4 | DCAS | Comments | 13 |
5 | DCAS | Question | 136 |
6 | DCP | Research Questions | 4 |
7 | DEP | Air Quality | 1457 |
8 | DEP | Asbestos | 316 |
9 | DEP | FATF | 23 |
10 | DEP | Hazardous Materials | 549 |
11 | DEP | Industrial Waste | 263 |
12 | DEP | Lead | 1825 |
13 | DEP | Noise | 10937 |
14 | DEP | Plant | 22 |
15 | DEP | Sewer | 6757 |
16 | DEP | Water Conservation | 853 |
17 | DEP | Water Quality | 332 |
18 | DEP | Water System | 12949 |
19 | DFTA | Alzheimer's Care | 25 |
Use drop_duplicates()
to keep the row with the highest value per Agency
.
sorted_agency_counts = agency_counts.sort_values("count", ascending=False)
sorted_agency_counts
Agency | Complaint Type | count | |
---|---|---|---|
198 | NYPD | Noise - Residential | 41311 |
169 | HPD | HEAT/HOT WATER | 39095 |
194 | NYPD | Illegal Parking | 34297 |
154 | DSNY | Request Large Bulky Item Collection | 30939 |
186 | NYPD | Blocked Driveway | 25530 |
... | ... | ... | ... |
119 | DOT | Highway Sign - Dangling | 3 |
94 | DOHMH | Radioactive Material | 3 |
139 | DPR | Unsanitary Condition | 1 |
207 | TAT | Question | 1 |
1 | COIB | Forms | 1 |
221 rows × 3 columns
top_agency_counts = sorted_agency_counts.drop_duplicates("Agency")
top_agency_counts
Agency | Complaint Type | count | |
---|---|---|---|
198 | NYPD | Noise - Residential | 41311 |
169 | HPD | HEAT/HOT WATER | 39095 |
154 | DSNY | Request Large Bulky Item Collection | 30939 |
124 | DOT | Street Condition | 16895 |
18 | DEP | Water System | 12949 |
42 | DOB | General Construction/Plumbing | 9537 |
95 | DOHMH | Rodent | 6253 |
132 | DPR | Damaged Tree | 6065 |
29 | DHS | Homeless Person Assistance | 3670 |
2 | DCA | Consumer Complaint | 2892 |
212 | TLC | For Hire Vehicle Complaint | 2789 |
67 | DOF | DOF Property - Reduction Issue | 1846 |
180 | HRA | Benefit Card Replacement | 1649 |
25 | DFTA | Housing - Low Income Senior | 768 |
53 | DOE | School Maintenance | 442 |
161 | EDC | Noise - Helicopter | 303 |
5 | DCAS | Question | 136 |
106 | DOITT | Public Payphone Complaint | 75 |
0 | ACS | Forms | 56 |
209 | TAX | Question | 34 |
181 | NYCEM | OEM Literature Request | 25 |
6 | DCP | Research Questions | 4 |
207 | TAT | Question | 1 |
1 | COIB | Forms | 1 |
For nicer display, order alphabetically:
top_agency_counts.sort_values("Agency")
Agency | Complaint Type | count | |
---|---|---|---|
0 | ACS | Forms | 56 |
1 | COIB | Forms | 1 |
2 | DCA | Consumer Complaint | 2892 |
5 | DCAS | Question | 136 |
6 | DCP | Research Questions | 4 |
18 | DEP | Water System | 12949 |
25 | DFTA | Housing - Low Income Senior | 768 |
29 | DHS | Homeless Person Assistance | 3670 |
42 | DOB | General Construction/Plumbing | 9537 |
53 | DOE | School Maintenance | 442 |
67 | DOF | DOF Property - Reduction Issue | 1846 |
95 | DOHMH | Rodent | 6253 |
106 | DOITT | Public Payphone Complaint | 75 |
124 | DOT | Street Condition | 16895 |
132 | DPR | Damaged Tree | 6065 |
154 | DSNY | Request Large Bulky Item Collection | 30939 |
161 | EDC | Noise - Helicopter | 303 |
169 | HPD | HEAT/HOT WATER | 39095 |
180 | HRA | Benefit Card Replacement | 1649 |
181 | NYCEM | OEM Literature Request | 25 |
198 | NYPD | Noise - Residential | 41311 |
207 | TAT | Question | 1 |
209 | TAX | Question | 34 |
212 | TLC | For Hire Vehicle Complaint | 2789 |
Another way, only sorting it once:
agency_counts.sort_values(["Agency", "count"], ascending=False)
Agency | Complaint Type | count | |
---|---|---|---|
212 | TLC | For Hire Vehicle Complaint | 2789 |
217 | TLC | Taxi Complaint | 2416 |
216 | TLC | Lost Property | 472 |
220 | TLC | Taxi Report | 318 |
213 | TLC | For Hire Vehicle Report | 112 |
... | ... | ... | ... |
4 | DCAS | Comments | 13 |
2 | DCA | Consumer Complaint | 2892 |
3 | DCA | DCA / DOH New License Application Request | 186 |
1 | COIB | Forms | 1 |
0 | ACS | Forms | 56 |
221 rows × 3 columns
Projects#
In real/ideal world, start with specific question and find data to answer it:
Source: Big Data and Social Science
Data needed often doesn’t exist or is hard (or impossible) to find/access
This will apply to all our Projects.