David Eads @eads | NPR Visuals @nprviz
Based on an NPR Visuals blog post and a talk by Geoff Hing
NICAR 2015, ATL
Slides: http://recoveredfactory.net/cleaner-data-nicar15
Source: http://github.com/eads/data-workflow
Made with Tarbell
city_state
CCHICAGO, IL
CDHICAGO
CHHICAGO IL
CHICAGO CH
CHICAHO IL
CHCAGO IL
CHCAGO IL
CHCAGO IL
CHCAIGO IL
CHCIACO IL
CHCIAGO
...
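One possible way to triage variants like these is to fuzzy-match each value against a list of known city names and flag anything that is not close enough for a human to review. This is a minimal sketch using Python's standard difflib, not necessarily the method used in the talk. The KNOWN_CITIES list, the helper names and the 0.8 cutoff are all hypothetical choices, and dropping a trailing two-letter token assumes it is a state abbreviation.

```python
import difflib
import re

# Hypothetical reference list; a real one might come from a gazetteer
# or a hand-checked lookup table.
KNOWN_CITIES = ["CHICAGO", "CICERO", "EVANSTON", "OAK PARK"]


def city_part(raw):
    """Uppercase, turn commas/periods into spaces, and drop a trailing
    two-letter token, assumed to be a state abbreviation such as IL."""
    tokens = re.sub(r"[,.]", " ", raw.upper()).split()
    if len(tokens) > 1 and len(tokens[-1]) == 2:
        tokens = tokens[:-1]
    return " ".join(tokens)


def canonical_city(raw, known=KNOWN_CITIES, cutoff=0.8):
    """Return the closest known city, or None so the value can be reviewed by hand."""
    matches = difflib.get_close_matches(city_part(raw), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With these settings "CCHICAGO, IL", "CDHICAGO" and "CHICAGO CH" all map to CHICAGO, while a heavier scramble such as "CHCIACO IL" returns None at the default cutoff and only matches if the cutoff is lowered to about 0.7. Lowering the cutoff catches more typos but also raises the risk of merging genuinely different places, so flagged values are worth checking by hand.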
csvstat -c statute data/Criminal_Convictions_ALLCOOK_05-09.csv
14. statute
<type 'unicode'>
Nulls: True
Unique values: 1616
5 most frequent values:
720-570/402(c):29290
720-5/19-1(a):14697
720-5/16A-3(a):13613
720-570/401(c)(2):13415
720-570/401(d)(i):10959
Max length: 27
Row count: 321590
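A profile like the csvstat output above can be approximated in a few lines of Python, which is handy when the same check has to run inside a processing script. This sketch is loosely modeled on `csvstat -c`: it treats empty cells as nulls and does no type inference, so its numbers are not guaranteed to match csvstat exactly. The function name column_profile is hypothetical.

```python
import csv
from collections import Counter


def column_profile(lines, column, top=5):
    """Summarize one CSV column, loosely modeled on `csvstat -c COLUMN`.

    `lines` is any iterable of CSV text lines (an open file works too).
    Empty or missing cells are counted as nulls. No type inference is done.
    """
    values = [(row.get(column) or "").strip() for row in csv.DictReader(lines)]
    non_null = [v for v in values if v]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "has_nulls": len(non_null) < len(values),
        "unique_values": len(counts),
        "most_common": counts.most_common(top),
        "max_length": max((len(v) for v in non_null), default=0),
    }
```

Usage against the file from the slide: `with open("data/Criminal_Convictions_ALLCOOK_05-09.csv", newline="") as f: print(column_profile(f, "statute"))`.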
MRAPs And Bayonets: What We Know About The Pentagon's 1033 Program
by Arezou Rezvani, Jessica Pupovac, David Eads and Tyler Fisher
#!/bin/bash
# Stop at the first failing step so later steps never run on partial data
set -e
echo 'IMPORT DATA'
echo '-----------'
./import.sh
echo 'CREATE SUMMARY FILES'
echo '--------------------'
./summarize.sh
echo 'EXPORT PROCESSED DATA'
echo '---------------------'
./export.sh
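# NSN parsing example
# (header, cell_value and row_dict are assumed to come from the
# surrounding loop over each row's cells)
if header == 'nsn':
    nsn_parts = cell_value.split('-')
    # First four digits: federal supply class; first two of those: group
    row_dict['federal_supply_class'] = nsn_parts[0]
    row_dict['federal_supply_group'] = nsn_parts[0][:2]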
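A National Stock Number is normally written in a 4-2-3-4 dashed layout: the first four digits are the federal supply class (whose first two digits are the federal supply group), the next two are the NATO country code, and those two plus the remaining seven digits form the nine-digit NIIN. The sketch below is a hypothetical stricter variant of the parsing step that rejects values not in that layout instead of silently slicing them; the function name and the sample NSN are illustrative only.

```python
def parse_nsn(nsn):
    """Split a dashed NSN such as '1005-01-123-4567' into its parts.

    Assumes the 4-2-3-4 dashed layout; raises ValueError otherwise so that
    malformed values can be logged and reviewed rather than mis-parsed.
    """
    parts = nsn.strip().split("-")
    if [len(p) for p in parts] != [4, 2, 3, 4]:
        raise ValueError(f"unexpected NSN format: {nsn!r}")
    fsc, country, *item = parts
    return {
        "federal_supply_group": fsc[:2],
        "federal_supply_class": fsc,
        "nato_country_code": country,
        "niin": country + "".join(item),
    }
```

Raising instead of guessing keeps bad source values visible; a caller can catch ValueError, record the row, and move on.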
# Example import statement
echo "Import FIPS crosswalk"
psql leso -c "CREATE TABLE fips (
county varchar,
state varchar,
fips varchar
);"
psql leso -c "COPY fips FROM '`pwd`/src/fips_crosswalk.csv' DELIMITER ',' CSV HEADER;"
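The psql step runs a server-side COPY, which reads the file on the database server and needs the matching privileges; psql's `\copy` reads it on the client instead. To show the same create-then-bulk-load pattern without a running PostgreSQL server, here is a sketch that swaps in Python's built-in sqlite3. The function name load_fips is hypothetical, and columns are read by position, the way COPY ... CSV HEADER maps them while skipping the header line.

```python
import csv
import sqlite3


def load_fips(conn, lines):
    """Create the fips table and load a crosswalk CSV, skipping its header row.

    `lines` is any iterable of CSV text lines (an open file works too).
    Columns are matched by position, not by header name.
    """
    conn.execute("CREATE TABLE fips (county TEXT, state TEXT, fips TEXT)")
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    conn.executemany("INSERT INTO fips VALUES (?, ?, ?)", reader)
    conn.commit()
    return conn.execute("SELECT count(*) FROM fips").fetchone()[0]
```

Keeping the fips column as text, as the original varchar does, preserves leading zeros in codes such as 01001; a numeric column would turn them into 1001.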
select
c.name,
sum(quantity * acquisition_cost) as total_cost,
extract(year from ship_date) as year
from data as d
join codes as c on d.federal_supply_category = c.code
group by c.name, year
order by year desc
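To see what the summary query produces, the sketch below runs an equivalent of it against a few hypothetical toy rows in an in-memory SQLite database. SQLite has no `extract(year from ...)`, so strftime is used instead, and the grouping uses that expression directly (PostgreSQL also accepts the output alias in GROUP BY, as the original does). The c.name tie-breaker in ORDER BY is an addition to make the output order deterministic. Table layouts follow the column names in the query; the data values are invented for illustration.

```python
import sqlite3

# Hypothetical toy rows; the real tables come from the import step.
CODES = [("10", "Weapons"), ("23", "Motor vehicles")]
DATA = [
    ("10", 2, 100.0, "2013-05-01"),
    ("10", 1, 50.0, "2013-07-01"),
    ("23", 1, 500000.0, "2014-01-15"),
    ("10", 3, 100.0, "2014-02-01"),
]

QUERY = """
SELECT c.name,
       sum(d.quantity * d.acquisition_cost) AS total_cost,
       CAST(strftime('%Y', d.ship_date) AS INTEGER) AS year
FROM data AS d
JOIN codes AS c ON d.federal_supply_category = c.code
GROUP BY c.name, strftime('%Y', d.ship_date)
ORDER BY year DESC, c.name
"""


def cost_by_category_and_year(codes, data):
    """Load rows into an in-memory SQLite database and run the summary query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE codes (code TEXT, name TEXT)")
    conn.execute(
        "CREATE TABLE data (federal_supply_category TEXT, quantity INTEGER,"
        " acquisition_cost REAL, ship_date TEXT)"
    )
    conn.executemany("INSERT INTO codes VALUES (?, ?)", codes)
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", data)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

With the toy rows, the result lists 2014 first (Motor vehicles at 500000.0, then Weapons at 300.0) followed by 2013 (Weapons at 250.0), since each total is quantity times unit cost summed per category and year.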
Rand Paul: "Mr. Estevez, in the NPR investigation of 1033 program they list that 12,000 bayonets have been given out. What purpose are bayonets being given out for?"