Unix Pipeline

This exercise teaches you how to use Unix command pipelines and stream redirections to analyze and manipulate text files efficiently.

You can do this exercise locally (in the macOS Terminal or the WSL on Windows) or on your cloud server through SSH. Both will work.

Legend

Parts of this exercise are annotated with the following icons:

A task you MUST perform to complete the exercise
An optional step that you may perform to make sure that everything is working correctly, or to set up additional tools that are not required but can help you
The end of the exercise
The architecture of the software you ran or deployed during this exercise.
Troubleshooting tips: how to fix common problems you might encounter

Setup

Download the exercise file to your computer with the following command:

$> curl -L https://git.io/fAjRa > rainbow.txt

Display the file:

$> cat rainbow.txt
Somewhere over the rainbow
...

The exercise

Use command pipelines and stream redirections to:

Count the number of lines and characters in the text

Solution

$> cat rainbow.txt | wc -l
50
$> cat rainbow.txt | wc -m
1284

Counting the number of characters excluding new lines:

$> cat rainbow.txt | fold -w 1 | wc -l
1242
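Note that fold still emits one (empty) output line for each blank input line, so the count above includes one unit per blank line. If you want to exclude every newline, one possible alternative is to delete the newlines with tr before counting. The sketch below uses a small made-up sample (the `sample` function is hypothetical, not part of the exercise) rather than rainbow.txt:

```shell
# Hypothetical sample: two text lines separated by one blank line.
sample() { printf 'ab\n\ncd\n'; }

# fold prints one character per line, but the blank line still
# produces one empty output line, which wc -l counts as well.
with_fold=$(sample | fold -w 1 | wc -l | tr -d ' ')

# Deleting the newlines first leaves only the other characters for wc -m.
without_newlines=$(sample | tr -d '\n' | wc -m | tr -d ' ')

echo "fold: $with_fold, tr: $without_newlines"
```

The trailing `tr -d ' '` only strips the padding that some versions of wc add around the number.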

Print the lines of the text containing the word rainbow

Solution

$> cat rainbow.txt | grep rainbow
Somewhere over the rainbow
Somewhere over the rainbow
Somewhere over the rainbow
The colors of the rainbow so pretty in the sky
Oh, somewhere over the rainbow

Note that this looks for occurrences of the word “rainbow” exactly as written, in lowercase. To make the search case-insensitive, use the grep command’s -i or --ignore-case option.
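To see the difference the -i option makes, here is a minimal sketch on a made-up three-line sample (the `sample` function and its text are assumptions for illustration, not the contents of rainbow.txt):

```shell
# Hypothetical sample with "rainbow" in two different cases.
sample() { printf 'Somewhere over the rainbow\nRainbow colors\nBlue skies\n'; }

# Case-sensitive: only the line with lowercase "rainbow" matches.
exact=$(sample | grep rainbow)

# Case-insensitive: the line starting with "Rainbow" matches too.
any_case=$(sample | grep -i rainbow)

printf 'exact:\n%s\n\nany case:\n%s\n' "$exact" "$any_case"
```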

Do the same but without any duplicates

Solution

$> cat rainbow.txt | grep rainbow | sort | uniq
Oh, somewhere over the rainbow
Somewhere over the rainbow
The colors of the rainbow so pretty in the sky
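The sort step matters because uniq only removes repeated lines that are adjacent. The following sketch illustrates this on a made-up sample (the `sample` function is hypothetical), and also shows `sort -u`, which should give the same result as `sort | uniq` here:

```shell
# Hypothetical sample in which the duplicate lines are not adjacent.
sample() { printf 'b\na\nb\n'; }

# Without sorting, uniq sees no adjacent repeats and removes nothing.
unsorted=$(sample | uniq | wc -l | tr -d ' ')

# Sorting first brings the duplicates together so uniq can drop them.
sorted=$(sample | sort | uniq | wc -l | tr -d ' ')

# sort -u sorts and removes duplicates in a single step.
sort_u=$(sample | sort -u | wc -l | tr -d ' ')

echo "uniq only: $unsorted, sort | uniq: $sorted, sort -u: $sort_u"
```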

Print the second word of each line in the text

Solution

$> cat rainbow.txt | cut -d ' ' -f 2
over
up
the
in

over
fly
...
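The empty line in the output above comes from a blank line in the text: cut prints any line that does not contain the delimiter unchanged, so blank lines and one-word lines pass straight through. If that is not what you want, cut's -s option suppresses such lines. A minimal sketch on a made-up sample (the `sample` function is hypothetical):

```shell
# Hypothetical sample: a blank line and a one-word line among normal lines.
sample() { printf 'Somewhere over the rainbow\n\nWay up high\nOh\n'; }

# Lines without a space are printed as-is (here: the blank line and "Oh").
all_lines=$(sample | cut -d ' ' -f 2 | wc -l | tr -d ' ')

# With -s, lines that do not contain the delimiter are skipped.
second_words=$(sample | cut -s -d ' ' -f 2)

echo "lines without -s: $all_lines"
printf '%s\n' "$second_words"
```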

Compress the text and save it to rainbow.txt.gz

Solution

$> cat rainbow.txt | gzip -c > rainbow.txt.gz
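If you want to check that the compressed file really contains the original text, one option is to decompress it to standard output with `gzip -cd` and compare the result with the original. The sketch below works in a temporary directory on a made-up file (`sample.txt` is an assumption for illustration, standing in for rainbow.txt):

```shell
# Work in a throwaway directory with a hypothetical sample file.
tmp=$(mktemp -d)
printf 'Somewhere over the rainbow\nWay up high\n' > "$tmp/sample.txt"

# Same pipeline as in the solution: compress stdin, redirect stdout to a file.
cat "$tmp/sample.txt" | gzip -c > "$tmp/sample.txt.gz"

# Decompress to stdout and compare it byte by byte with the original.
if gzip -cd "$tmp/sample.txt.gz" | cmp -s - "$tmp/sample.txt"; then
  roundtrip=identical
else
  roundtrip=different
fi

echo "round trip: $roundtrip"
rm -rf "$tmp"
```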

Count the number of times the letter e is used (case-insensitive)

Solution

$> cat rainbow.txt | fold -w 1 | grep -i e | wc -l
131

Count the number of times the word the is used (case-insensitive)

Solution

$> cat rainbow.txt | \
     tr '[:upper:]' '[:lower:]' | \
     tr -s '[[:punct:][:space:]]' '\n' | \
     grep -i '^the$' | \
     wc -l

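Instead of using a regular expression (^the$) with grep, you could also use its -w (word regexp) option, which does the same thing in this case: grep -i -w the. The -i option is redundant here because the text has already been converted to lowercase, but it does no harm.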
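To illustrate why the match must cover the whole word, here is a sketch of the same pipeline on a made-up sample that also contains "other" and "Then" (the `sample` and `words` functions are assumptions for illustration):

```shell
# Hypothetical sample: "the" appears 3 times; "other" and "Then" must not count.
sample() { printf 'The cat and the other\nThen the end.\n'; }

# Lowercase everything, then put each word on its own line.
words() {
  sample | tr '[:upper:]' '[:lower:]' | tr -s '[[:punct:][:space:]]' '\n'
}

# Anchored regular expression: the line must be exactly "the".
anchored=$(words | grep '^the$' | wc -l | tr -d ' ')

# Word match: "the" must appear as a whole word.
whole_word=$(words | grep -w the | wc -l | tr -d ' ')

# A plain substring match also counts "other" and "then".
substring=$(words | grep the | wc -l | tr -d ' ')

echo "anchored: $anchored, -w: $whole_word, substring: $substring"
```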

Challenge

Answer the question: what are the five most used words in the text (case-insensitive) and how many times are they used?

Solution

$> cat rainbow.txt | \
     tr '[:upper:]' '[:lower:]' | \
     tr -s '[[:punct:][:space:]]' '\n' | \
     sort | \
     uniq -c | \
     sort -r | \
     head -n 5

By luck, simply sorting alphabetically works here because uniq -c pads the counts with leading spaces, so the numbers are right-aligned. For a more robust solution, add the -b or --ignore-leading-blanks option and the -n or --numeric-sort option to the last sort command: sort -bnr.
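The difference shows up as soon as the counts are not padded to the same width. The sketch below uses made-up, unpadded counts (the `counts` function and its values are hypothetical, not results from rainbow.txt):

```shell
# Hypothetical unpadded counts, as another tool might produce them.
counts() { printf '9 over\n10 the\n2 rainbow\n'; }

# Alphabetical reverse sort compares character by character: "9" > "2" > "1",
# so "10 the" wrongly ends up last.
alphabetical=$(counts | sort -r | head -n 1)

# Numeric reverse sort compares the values: 10 > 9 > 2.
numeric=$(counts | sort -bnr | head -n 1)

echo "sort -r puts first:   $alphabetical"
echo "sort -bnr puts first: $numeric"
```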

Example

For example, the following command counts the number of words in the text:

$> cat rainbow.txt | wc -w
255

Your tools

Here are a few commands you might find useful for the exercise. They all operate on the data received from their standard input stream, and print the result on their standard output stream, so they can be piped into each other:

Command	Description
`cut -d ' ' -f <n>`	Select word in column `<n>` of each line (using one space as the delimiter)
`fold -w 1`	Print one character per line
`grep [-i] <letterOrWord>`	Select only lines that contain a given letter or word, e.g. `grep foo` (`-i` to ignore case)
`grep "^<text>$"`	Select only lines that consist of exactly this text (e.g. `grep "^foo$"`)
`gzip -c`	Compress data
`sort [-bnr]`	Sort lines alphabetically (`-b` to ignore leading blanks, `-n` to sort numerically, `-r` to reverse the order)
`tr '[:upper:]' '[:lower:]'`	Convert all uppercase characters to lowercase
`tr -s '[[:punct:][:space:]]' '\n'`	Split by word
`uniq [-c]`	Filter out repeated lines (`-c` also counts them)
`wc [-l] [-w] [-m]`	Count lines, words or characters

Tip

Remember that if you want to know more about any of these commands or their options, all you have to do is type man <command>, e.g. man cut.

What have I done?

You have seen that text can be passed through several programs and transformed at each step to obtain the final result you want.

In essence, you have constructed complex programs by piping simpler programs together, combining them into a more powerful whole. You have applied the Unix philosophy.

ArchiDep

Architecture & Deployment
