Part 3 - Use Cases
You have read parts 1 and 2 of this series and wonder whether you will actually be able to apply all those new and complicated commands to anything practical. Wonder no more: in this third and last post I have compiled a list of real-world use cases to nudge your inspiration.
Automation of continuous reports
Problem: A revenue report for the team needs to be generated and emailed every week.
Solution: Use mysql, mail and a cron job
$ vim revenue_cron.sh    # create the bash script the cron job will call

#!/bin/bash
# pull data from mysql and name the tsv <current_date>_rev_report.tsv
echo "select * from RevenueStats_TeamFoo;" | mysql some_database -B > ~/"$(date +%y%m%d)"_rev_report.tsv
# move the report into the Google Drive folder, where it will be uploaded automatically
mv ~/"$(date +%y%m%d)"_rev_report.tsv ~/"jvdh/Google Drive/rev_report/"
# notify the team
echo "Hi Team, The latest revenue reporting tsv is ready for download at Google Drive. Greetings, jvdh" | mail -s "marketview report is ready" team@foobar.com

# exit vim, then install the cron job
$ crontab -e
0 3 * * 4 ~/code/revenue_cron.sh    # every Thursday at 3am, assuming you saved the script as ~/code/revenue_cron.sh
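One optional refinement to the crontab entry: cron swallows output by default, so redirecting it to a log makes silent failures visible. This is a sketch; the log path is illustrative.

```
# Illustrative crontab entry: append stdout and stderr to a log file
0 3 * * 4 ~/code/revenue_cron.sh >> ~/revenue_cron.log 2>&1
```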
Client Reportings
Problem: We need to quickly generate an opportunity report for every client in our portfolio.
Solution: Use a for loop and copy the mysql output to the Google Drive folder
$ vim opp_packs.sh

#!/bin/bash
for i in 'client a' 'client b' 'client c' 'client ...'
do
  client=$(echo $i | sed 's/\s//g')    # drive file name must not have spaces
  echo $client
  echo "select division_name as client, customer_id, account_name, campaign_name, sum(rev_potential_usd/1.2696) as revenue_potential_eur from Rev_Opportunities where division_name = '${i}' group by 1,2;" | mysql some_database -B > ~/"jvdh/Google Drive/rev_opps/${client}.csv"
done

$ chmod 740 opp_packs.sh    # set permissions for the file
$ ./opp_packs.sh            # execute the script
Data Cleaning
Problem: We need to clean a csv file (consolidated.csv) with keywords (> 2 million lines).
The csv looks like this:
kw, primary-category, secondary-category
foobar, /Business & Industrial/Business Management/Technology Consulting, /Apparel
Haftbefehl, /Arts & Entertainment/Music & Audio, /Law & Government
pottery, /Hobbies & Leisure/Antiques & Collectibles/Collectible Bric-A-Brac, /Hobbies & Leisure/Antiques & Collectibles/Antique Ceramics & Pottery
...
Solution: Use a combination of sed, cut and split.
$ sed '/Adult/d' consolidated.csv > clean.csv    # remove the adult category
$ cut -f1,5 clean.csv > pre_final.csv            # keep only the keyword and the primary category
$ grep $'\t' pre_final.csv > final.csv           # remove keywords without a category; the $'...' syntax makes bash interpret \t as a tab
$ sort -t$'\t' -k2 final.csv | less              # sort the csv by category
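The solution above names split, which with a file this large is useful for breaking the cleaned csv into chunks that fit into memory or can be processed in parallel. A minimal sketch; the demo file, chunk size and prefix are illustrative, not from the original workflow.

```shell
# Stand-in for the real multi-million-line file: 10 sample lines
seq 1 10 > demo.csv

# Break the file into chunks of at most 4 lines each.
# split names the pieces chunk_aa, chunk_ab, chunk_ac, ...
split -l 4 demo.csv chunk_

ls chunk_*    # chunk_aa chunk_ab chunk_ac
```

For the real file something like `split -l 500000 final.csv part_` would produce manageable half-million-line pieces.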
Those are only a few useful applications of command-line scripting for data analysis. Due to my current job, the examples are probably quite reporting- and automation-heavy, but rest assured that there are also many applications for much deeper analytical undertakings: see some examples in the excellent blog posts of Greg Rega or Bernd Zuther.