Spreadsheets are terrific - we've all used them. I particularly like Google spreadsheets - I use them all the time to collect data, usually from students.
- Go to Google Drive
- Make a form
- Send the form out to the students
All the data gets dumped into a Google spreadsheet. The trouble is, what to do with it once it's in the spreadsheet.
The other day, I wrote on a few basic stats for our upcoming SHIP program. The data I reported on was all collected in a spreadsheet. I also collected participant and parent emails in the spreadsheet.
So, here's the task, compute some simple numbers form the spreadsheet and also extract and use the email addresses.
I'm sure one could use some fancy spreadsheet magic to get the job done, but I'm a command line wonk – here's how I take care of tasks like these.
First, I downloaded the spreadsheet as a csv (comma separated value) file. Each line looked something like this:
First question, how many applicants did we have:
cat cstuy.csv | wc -l
The |, or pipe means take the output of the first command and send it to the next one. Cat just outputs the original file and wc -l counts all the lines in the file.
Next, how many young ladies:
cat cstuy.csv | grep female | wc -l
How many schools? Well, that's a little trickier. Here, I use a few extra commands:
- cut - this will cut out one column from the csv file - in this case the school column (the -d, says use a comma as delimiter and -f7 for field 7).
- sort - takes the lines and sorts them.
- uniq - eliminates duplicate lines in a sorted file
Putting it all together:
cat cstuy.csv | cut -d, -f7 | sort | uniq | wc -l
Thirty different schools.
Finally I needed the emails - here I wanted to be able to paste them into Gmail's bcc field. I could have just used cat and cut and then used the mouse, but instead:
cat cstuy.csv | cut -d, -f3 | xclip -sel clip
Then I can just do a Ctrl-v in Gmail and I'm good to go.
The cool thing is that the tools here - cut, sort, uniq, grep - are all general purpose tools that do simple text manipulations. Once you know them and a few others, you can really quickly and efficiently do all sorts of data processing without even thinking about it. I still go to the spreadsheet for data collection ad also for when I need more hardcore formulas but for day to day manipulations, I'll take the command line.