19.1 CSV Basics
20220112
A verb is required to let mlr know what action to undertake. There are many, but some common verbs include
verb | action |
---|---|
cat | output the supplied input |
cut -f name,pid | retain selected columns |
filter ‘\(id == "u1234567"'|retain rows that meet some conditions| |put '\)mark = round($total)’ | construct new variables |
rename ‘Patient ID’,pid | rename variables |
sort -f mark | sort rows on a particular field |
then | to construct a pipeline |
To get the column names for the csv file:
A verb is required to specify an action. Here we simply cat the file:
$ mlr cat example.csv
1=color,2=shape,3=flag,4=index,5=quantity,6=rate
1=yellow,2=triangle,3=1,4=11,5=43.6498,6=9.8870
1=red,2=square,3=1,4=15,5=79.2778,6=0.0130
1=red,2=circle,3=1,4=16,5=13.8103,6=2.9010
1=red,2=square,3=0,4=48,5=77.5542,6=7.4670
1=purple,2=triangle,3=0,4=51,5=81.2290,6=8.5910
1=red,2=square,3=0,4=64,5=77.1991,6=9.5310
1=purple,2=triangle,3=0,4=65,5=80.1405,6=5.8240
1=yellow,2=circle,3=1,4=73,5=63.9785,6=4.2370
1=yellow,2=circle,3=1,4=87,5=63.5058,6=8.3350
1=purple,2=square,3=0,4=91,5=72.3735,6=8.2430
Specify that the input is a csv file using --icsv
:
$ mlr --icsv cat example.csv
color=yellow,shape=triangle,flag=1,index=11,quantity=43.6498,rate=9.8870
color=red,shape=square,flag=1,index=15,quantity=79.2778,rate=0.0130
color=red,shape=circle,flag=1,index=16,quantity=13.8103,rate=2.9010
color=red,shape=square,flag=0,index=48,quantity=77.5542,rate=7.4670
color=purple,shape=triangle,flag=0,index=51,quantity=81.2290,rate=8.5910
color=red,shape=square,flag=0,index=64,quantity=77.1991,rate=9.5310
color=purple,shape=triangle,flag=0,index=65,quantity=80.1405,rate=5.8240
color=yellow,shape=circle,flag=1,index=73,quantity=63.9785,rate=4.2370
color=yellow,shape=circle,flag=1,index=87,quantity=63.5058,rate=8.3350
color=purple,shape=square,flag=0,index=91,quantity=72.3735,rate=8.2430
And that both input and output are csv with --csv
:
$ mlr --csv cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
Your donation will support ongoing availability and give you access to the PDF version of this book. Desktop Survival Guides include Data Science, GNU/Linux, and MLHub. Books available on Amazon include Data Mining with Rattle and Essentials of Data Science. Popular open source software includes rattle, wajig, and mlhub. Hosted by Togaware, a pioneer of free and open source software since 1984. Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0