Generally, I won’t.
But I just wanted to tell you about the cat command.
When you want only the top or bottom n lines of your data, you will generally use the head and tail commands. You can use them as below:
head -n 3 Salaries.csv
tail -n 3 Salaries.csv
Notice the structure of the shell command here:
CommandName [-arg1name] [arg1value] [-arg2name] [arg2value] filename
It is the command name, followed by a couple of argument names and values, and finally the filename.
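To try these without the real dataset, here is a minimal sketch that builds a tiny, made-up stand-in for Salaries.csv. The column layout (yearID, teamID, lgID, playerID, salary) is an assumption inferred from the commands in this post (year in column 1, team in column 2, salary in column 5); the rows and player IDs are invented. The last line shows tail -n +2, which prints from line 2 onward and is a handy way to drop a header row.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)" || exit 1

# Made-up stand-in for Salaries.csv: assumed column layout, invented rows.
printf '%s\n' 'yearID,teamID,lgID,playerID,salary' \
  '2000,BAL,AL,player01,500000' '2000,BAL,AL,player02,7000000' \
  '2000,NYA,AL,player03,12000000' '2001,BAL,AL,player01,650000' \
  '2001,BOS,AL,player04,3000000' '1999,BAL,AL,player05,1500000' > Salaries.csv

head -n 3 Salaries.csv   # header line plus the first two data rows
tail -n 3 Salaries.csv   # the last three rows
tail -n +2 Salaries.csv  # every line from line 2 onward, i.e. the file without its header
```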
Shell commands generally take a lot of arguments. You can see the full list of arguments a command supports by using the man command (for example, man head). You can think of man as help.
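wc: Count the Lines
wc is a reasonably useful shell utility that lets us count the number of lines (-l), words (-w) or bytes (-c) in a given file. For plain ASCII text, bytes and characters are the same thing; use -m if you need a true character count for a file with multibyte characters.
wc -l Salaries.csv
wc -w Salaries.csv
wc -c Salaries.csv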
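A small sketch of wc on the same made-up stand-in for Salaries.csv (invented rows, assumed column layout). When wc reads from standard input via <, it has no file name to print, so you get the bare number, which is convenient inside scripts.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up stand-in for Salaries.csv: assumed column layout, invented rows.
printf '%s\n' 'yearID,teamID,lgID,playerID,salary' \
  '2000,BAL,AL,player01,500000' '2000,BAL,AL,player02,7000000' \
  '2000,NYA,AL,player03,12000000' '2001,BAL,AL,player01,650000' \
  '2001,BOS,AL,player04,3000000' '1999,BAL,AL,player05,1500000' > Salaries.csv

wc -l Salaries.csv     # the line count followed by the file name
wc -l < Salaries.csv   # only the number: reading stdin, wc has no file name to print
rows=$(( $(wc -l < Salaries.csv) - 1 ))   # data rows, not counting the header line
echo "data rows: $rows"
```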
grep:
Sometimes you might want to look for a particular line in the file. Or you may wish to print all the lines in your file that contain a specific word. Or you might like to see the salaries for the team BAL in 2000. grep is your friend. Here we print all the lines in the file that contain “2000,BAL” and pass them to head to see only the first few:
grep "2000,BAL" Salaries.csv | head
You could also use regular expressions with grep.
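As a quick illustration of the regular-expression point, here is a sketch on the made-up stand-in file (assumed layout, invented rows). grep -E turns on extended regular expressions; anchoring with ^ ties the match to the first column, and -c prints the number of matching lines instead of the lines themselves.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up stand-in for Salaries.csv: assumed column layout, invented rows.
printf '%s\n' 'yearID,teamID,lgID,playerID,salary' \
  '2000,BAL,AL,player01,500000' '2000,BAL,AL,player02,7000000' \
  '2000,NYA,AL,player03,12000000' '2001,BAL,AL,player01,650000' \
  '2001,BOS,AL,player04,3000000' '1999,BAL,AL,player05,1500000' > Salaries.csv

grep "2000,BAL" Salaries.csv                  # plain substring match
grep -E '^(1999|2000),BAL,' Salaries.csv      # year 1999 or 2000 in column 1, team BAL in column 2
grep -c -E '^(1999|2000),BAL,' Salaries.csv   # -c prints how many lines matched
```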
Piping: This makes the shell useful
Since we now know the basic shell commands, I can talk about one of the essential concepts of shell usage: piping. You won't be able to use the full power the shell provides without this concept. And the idea is simple.
Remember how we used the head command earlier to see the top few lines of a file. You could also have written the head command as below:
cat Salaries.csv | head
My advice: just read the “|” in the command as “pass the data on to”. So I would read the above command as: cat (print) the whole data to the stream, then pass the data on to head so that it gives me only the first few lines.
So, did you understand what piping did? It gives us a way to use our basic commands consecutively. There are a lot of relatively basic commands, and piping lets us use them in sequence to do some reasonably non-trivial things.
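A minimal sketch of piping on the made-up stand-in for Salaries.csv (invented rows, assumed layout), using only the commands covered so far. The first two commands print the same thing; the last two chain two commands to answer slightly bigger questions.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up stand-in for Salaries.csv: assumed column layout, invented rows.
printf '%s\n' 'yearID,teamID,lgID,playerID,salary' \
  '2000,BAL,AL,player01,500000' '2000,BAL,AL,player02,7000000' \
  '2000,NYA,AL,player03,12000000' '2001,BAL,AL,player01,650000' \
  '2001,BOS,AL,player04,3000000' '1999,BAL,AL,player05,1500000' > Salaries.csv

head -n 3 Salaries.csv
cat Salaries.csv | head -n 3          # same output: cat streams the file, head keeps 3 lines
grep ",BAL," Salaries.csv | wc -l     # how many rows are for team BAL
tail -n +2 Salaries.csv | head -n 2   # skip the header, then keep the first two data rows
```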
Now let me tell you about a couple more not-so-basic commands before I show you how to chain them to do reasonably advanced tasks.
Some Intermediate Commands
sort:
You may want to sort your dataset on a particular column. sort is your friend. Say you want to find the top 10 salaries given to any player in your dataset. We can use sort as follows:
sort -t "," -k 5 -r -n Salaries.csv | head -n 10
There are indeed a lot of options in this command, so let's go through them one by one:
-t: Which delimiter to use? “,”
-k: Which column to sort on? 5
-n: Sort numerically. Leave this option out if you want lexicographical sorting.
-r: Sort in descending order (the default is ascending).
And then, obviously, pipe (or pass the data on) to the head command.
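Here is a sketch of the same idea on the made-up stand-in file (invented rows, assumed layout). Two details worth knowing: -k 5 actually means “from column 5 to the end of the line”, while -k 5,5 restricts the key to column 5 alone (identical here because salary is assumed to be the last column, but safer in general); and tail -n +2 removes the header row before sorting. The last command leaves out -n to show why it matters: compared as text, 7000000 beats 12000000 because the first characters differ, 7 versus 1.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up stand-in for Salaries.csv: assumed column layout, invented rows.
printf '%s\n' 'yearID,teamID,lgID,playerID,salary' \
  '2000,BAL,AL,player01,500000' '2000,BAL,AL,player02,7000000' \
  '2000,NYA,AL,player03,12000000' '2001,BAL,AL,player01,650000' \
  '2001,BOS,AL,player04,3000000' '1999,BAL,AL,player05,1500000' > Salaries.csv

# Top 3 salaries, highest first: drop the header, sort column 5 numerically, descending.
tail -n +2 Salaries.csv | sort -t "," -k 5,5 -n -r | head -n 3

# Same, but without -n: column 5 is compared as text, so the 7000000 row comes out on top.
tail -n +2 Salaries.csv | sort -t "," -k 5,5 -r | head -n 1
```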
cut:
This command lets you select specific columns from your data. Sometimes you may want to look at just some of the columns, for example only the year, team and salary. cut is the command to use:
cut -d "," -f 1,2,5 Salaries.csv | head
The options are:
-d: Which delimiter to use? “,”
-f: Which column or columns to cut? 1,2,5
And then, obviously, pipe (or pass the data on) to the head command.
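A sketch of cut on the made-up stand-in file (invented rows, assumed layout). -f also accepts ranges such as 2-4, and cut always emits fields in the order they appear in the file, so -f 5,1 gives you column 1 followed by column 5, not the other way round.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up stand-in for Salaries.csv: assumed column layout, invented rows.
printf '%s\n' 'yearID,teamID,lgID,playerID,salary' \
  '2000,BAL,AL,player01,500000' '2000,BAL,AL,player02,7000000' \
  '2000,NYA,AL,player03,12000000' '2001,BAL,AL,player01,650000' \
  '2001,BOS,AL,player04,3000000' '1999,BAL,AL,player05,1500000' > Salaries.csv

cut -d "," -f 1,2,5 Salaries.csv | head -n 3   # year, team and salary
cut -d "," -f 2-4 Salaries.csv | head -n 3     # a range: columns 2 through 4
cut -d "," -f 5,1 Salaries.csv | head -n 1     # still prints column 1 before column 5
```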
uniq:
uniq is a little tricky, in that you will usually want to use it in sequence with sort. It removes sequential (adjacent) duplicates: for example, 1,1,2 becomes 1,2, but 1,2,1 stays as it is. So, in conjunction with sort, it can be used to get the distinct values in the data.
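To see the “sequential duplicates” behaviour for yourself, here is a minimal sketch on a hypothetical, unsorted team column (the file teams.txt and its values are made up). sort -u is a shorthand for sort | uniq, and adding sort -rn after uniq -c ranks the values by how often they occur.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' BAL BAL NYA BAL BOS BAL > teams.txt   # hypothetical, unsorted team column

uniq teams.txt                  # prints BAL, NYA, BAL, BOS, BAL: only adjacent repeats collapse
sort teams.txt | uniq           # prints BAL, BOS, NYA: sorting makes all repeats adjacent
sort -u teams.txt               # same result in one command
sort teams.txt | uniq -c        # each distinct value prefixed with its count
sort teams.txt | uniq -c | sort -rn | head -n 1   # the most frequent value (4 BAL)
```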
For example, if I wanted to see the first ten distinct teams in the data, I would use:
cat Salaries.csv | cut -d "," -f 2 | sort | uniq | head
uniq also takes a -c argument that counts the occurrences of each distinct value, something akin to a GROUP BY with COUNT in SQL:
cat Salaries.csv | cut -d "," -f 2 | sort | uniq -c | head
Some Other Utility Commands
Here are some other command line tools you could use. I won't go into the specifics, as they get pretty involved.
Just bookmark this post.
Change delimiter in a file:
You might sometimes need to change the delimiter in a file, as a certain application might need a particular delimiter to work. For example, Excel needs “,” as the delimiter.
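Here is a minimal sketch of changing a delimiter with tr, assuming a hypothetical pipe-delimited file called data_pipe.txt. tr only reads standard input, so the file is fed in with <. Note that a blind character swap is only safe when the data itself contains no commas; quoted CSV fields would need a real CSV parser.

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical pipe-delimited file.
printf '%s\n' 'yearID|teamID|salary' '2000|BAL|500000' > data_pipe.txt

tr '|' ',' < data_pipe.txt > data.csv     # swap every "|" for ","
cat data.csv

printf 'yearID\tteamID\n' | tr '\t' ','   # tab-separated to comma-separated
```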
Find and Replace Magic:
You may want to replace certain characters in the file with something else using the tr command.
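A small sketch of tr as find-and-replace, on a hypothetical file win.txt with a Windows-style line ending. tr maps characters one for one (or deletes them with -d); it does not replace whole words, as the last line shows: B, A and L are each swapped individually. For word-level replacement, sed 's/old/new/g' is the usual tool.

```shell
cd "$(mktemp -d)" || exit 1
printf 'Hello World\r\n' > win.txt   # hypothetical file with a Windows (CRLF) line ending

tr '[:lower:]' '[:upper:]' < win.txt   # map each lowercase letter to uppercase
tr -d '\r' < win.txt > unix.txt         # -d deletes characters: strip the carriage return
echo "BAL,BOS" | tr 'BAL' 'XYZ'         # prints XYZ,XOS: B->X, A->Y, L->Z, one character at a time
```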
Sum of a column in a file:
Using the awk command, you can find the sum of a column in a file. Divide it by the number of data rows (wc -l, minus one if the file has a header line), and you get the mean. awk is a powerful command that is a whole language in itself. Do see the wiki page for awk for a lot of good use cases.
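A minimal sketch of the column sum and mean with awk, on the made-up stand-in for Salaries.csv (invented rows, assumed layout with salary in column 5). NR > 1 skips the header, which also takes care of the off-by-one you would otherwise have to subtract from wc -l. Counting the rows inside awk (n++) gets the mean in a single pass instead of a separate wc -l.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up stand-in for Salaries.csv: assumed column layout, invented rows.
printf '%s\n' 'yearID,teamID,lgID,playerID,salary' \
  '2000,BAL,AL,player01,500000' '2000,BAL,AL,player02,7000000' \
  '2000,NYA,AL,player03,12000000' '2001,BAL,AL,player01,650000' \
  '2001,BOS,AL,player04,3000000' '1999,BAL,AL,player05,1500000' > Salaries.csv

# Sum of column 5: -F "," splits on commas, NR > 1 skips the header line.
awk -F "," 'NR > 1 { sum += $5 } END { print sum }' Salaries.csv

# Mean in one pass: n counts the data rows that were added up.
awk -F "," 'NR > 1 { sum += $5; n++ } END { if (n > 0) printf "%.2f\n", sum / n }' Salaries.csv
```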
Find the files in a directory that satisfy a specific condition:
Sometimes you will need to find a file in a directory that contains a lot of files. You can do this with the find command. Let's say you want to find all the .txt files in the current working directory that start with A. To find all .txt files starting with A or B, we could use a regex.
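This is a sketch of how these two searches might look, run against a throwaway directory of hypothetical files. -maxdepth 1 keeps find in the current directory (without it, find recurses into subdirectories); -maxdepth and -regex are supported by both GNU and BSD find but are not part of POSIX. -regex matches against the whole path, which is why the pattern starts with \./. A plain glob, -name '[AB]*.txt', does the same job here.

```shell
cd "$(mktemp -d)" || exit 1
touch Apple.txt Avocado.txt Banana.txt Cherry.txt Alpha.csv   # hypothetical files
mkdir sub && touch sub/Apricot.txt                           # one more, a level down

find . -maxdepth 1 -name 'A*.txt'                # .txt files starting with A, this directory only
find . -maxdepth 1 -regex '\./[AB][^/]*\.txt'    # regex over the whole path: starts with A or B
find . -maxdepth 1 -name '[AB]*.txt'             # the same A-or-B match with a glob
find . -name 'A*.txt'                            # no -maxdepth: also finds sub/Apricot.txt
```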
Lastly, > and >>
Sometimes you want the data you got from some command line utility (shell commands, Python scripts) not to be shown on stdout but stored in a text file. You can use the “>” operator for that. For example, you could have stored the file after replacing the delimiters in the previous example into another file called newdata.txt as follows:
cat data.txt | tr ',' '|' > newdata.txt
I got confused between “|” (piping) and “>” (to file) operations a lot in the beginning. One way to remember is that you should only use “>” when you want to write something to a file. “|” on its own cannot write to a file; it only passes the data on to another command.
Another operation you should know about is “>>”. It is analogous to “>”, but it appends to an existing file rather than replacing the file and writing over it.
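A minimal sketch of the difference between > and >>, using a hypothetical data.txt. The last line shows tee, which is how you write to a file in the middle of a pipe: it saves a copy of the stream and passes it on unchanged.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'a,b' 'c,d' > data.txt        # hypothetical comma-separated input

cat data.txt | tr ',' '|' > newdata.txt     # > creates newdata.txt (or empties it first)
cat data.txt | tr ',' '|' > newdata.txt     # so running it twice still leaves 2 lines
echo 'e|f' >> newdata.txt                   # >> appends: now 3 lines
cat newdata.txt
cat data.txt | tee copy.txt | wc -l         # tee saves the stream to copy.txt and passes it on
```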
Conclusion
This is just the tip of the iceberg. Although I am not an expert in shell usage, these commands reduced my workload to a large extent. Try incorporating them into your workflow. I usually end up using them in the Jupyter notebook itself (you can run shell commands in a Jupyter code cell if you start the command with !).
Always remember:
“The mechanic that would perfect his work must first sharpen his tools.” (Confucius)
So go impress some folks now.
If you would like to know more about the command line, which I guess you would, there is The UNIX workbench course on Coursera which you can try out.
I am going to be writing more of such posts in the future too.
Let me know what you think about the series.
Follow me on Medium or subscribe to my blog to be informed about them.
As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.