More cat, less trouble, some redirection

Output, look at, and redirect text


Photo of cat sleeping on a chair in a thrift shop

Tech writers deal with lots of plain text, lines and lines, and more lines, of it. You’re in luck: text files are the command line’s specialty. But the key to successfully slicing, dicing, and otherwise processing text on the command line is to change your mindset about it. So this lesson is partly about learning a couple of command line utilities. It’s mostly about learning how to use your imagination to combine tools on the command line.

What you’ll learn

By the end of this lesson you’ll have these in your bag of skills:

You’ll learn a few techniques for outputting and looking at text with cat and less. Each technique has a purpose.

Redirection is a hugely useful way of working with files on the command line. The idea of it is something you don’t really see in a GUI. It’s such a big idea that it might take you some mental gymnastics to wrap your head around it.

Let me apologize now for the horrible regret you’ll feel after the elation of “getting” redirection. You’ll wish you could have taken advantage of it sooner in your life.

Text is everywhere

The command line world depends a lot on plain text organized into lines. It’s the fundamental data format for most of the world’s IT infrastructure.

Besides just, y’know, plain text composed of words, sentences, and paragraphs, there’s also CSV, Markdown, JSON, HTML, CSS, SVG, JavaScript, and Python. In fact, the source code for every programming language is plain text.

And the command line tools you learn here can process all of them.

Step 0: Before you start

Well, technically you’ve already started this lesson. But you should do and know a few things before you continue.

Do this: Navigate directories in the command line.

Step 1: Get the files

You’ve been asked to review lists of diseases that affect shakshouka. You’ve been given a set of CSV files, one file per type of disease.

Let’s download some example text files to work with.

Do this: In the shell, enter wget -qO- https://egopontem.com/lessons/cat-less.tgz | tar -xz.

techwriter:~$ wget -qO- https://egopontem.com/lessons/cat-less.tgz | tar -xz
techwriter:~$ 

What happened: You downloaded some files to work with and put them in a directory named cat-less. Let’s dive in to see what you downloaded.
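If that one-liner looks like magic, here is a rough sketch of what it does, simulated locally so it runs without a network connection. The stand-in archive and its contents are made up for illustration; only the last, commented-out line is the lesson's real command. It leans on standard out and the | symbol, which are covered later in this lesson.

```shell
# Build a small stand-in for the lesson's archive (hypothetical contents).
work=$(mktemp -d)
mkdir -p "$work/src/cat-less" "$work/dest"
echo "demo file" > "$work/src/cat-less/README.txt"

# Left side: tar -c creates an archive, -z compresses it with gzip, and
# "-f -" writes the archive to standard out instead of to a file.
# Right side: tar -x extracts, and "-f -" reads the archive from standard in.
# The -C option tells tar which directory to work in.
tar -czf - -C "$work/src" cat-less | tar -xzf - -C "$work/dest"

# The extracted directory now exists on the receiving side.
ls "$work/dest/cat-less"

# The lesson's command has the same shape. wget's -q keeps it quiet and -O-
# writes the download to standard out instead of saving it as a file.
# wget -qO- https://egopontem.com/lessons/cat-less.tgz | tar -xz
```

The download never lands on disk as a .tgz file. It flows straight from wget into tar, which unpacks it.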

Do this: Enter cd cat-less.

techwriter:~$ cd cat-less
techwriter:~/cat-less$ 

Do this: Enter ls.

techwriter:~/cat-less$ ls
bacteria.csv  fungus.csv  insect.csv  nematode.csv
README.txt  virus.csv
techwriter:~/cat-less$ 

What happened: You’re now in the directory of files that you just downloaded and there are some CSV files and a file named README.txt. You’ll be working with the CSV files.

Step 2: Cat a file

Just what’s in these CSV files? There are a few ways to look at a file’s contents. There’s the direct route, which is just to output all of a file’s contents to the terminal.

To take this direct route we’ll use cat. The cat command does something so specific and simple that it’s almost not a command and yet you’ll use it a lot. The cat command outputs its input. That’s it, nothing else.

Let’s see which nematodes ruin our domatosalata.

Do this: Enter cat nematode.csv.

techwriter:~/cat-less$ cat nematode.csv
common,scientific
root-knot,Meloidogyne
sting,Belonolaimus longicaudatus
stubby-root,Paratrichodorus
techwriter:~/cat-less$ 

What happened: The cat command outputs the nematode.csv file, as promised.

Ok, now let’s see what all of our bruschetta buzz kills are.

Do this: Enter cat *.csv.

What happened: All of the contents of all of the files that end with .csv get dumped into the terminal. But it’s too fast and too much for you to read.
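It helps to know that the * in *.csv is handled by the shell, not by cat. One way to see this, sketched here in a scratch directory with made-up file names, is to hand the same pattern to echo, which just prints its arguments:

```shell
# Scratch directory with hypothetical files, so nothing in cat-less is touched.
scratch=$(mktemp -d)
touch "$scratch/a.csv" "$scratch/b.csv" "$scratch/notes.txt"

# The parentheses run these commands in a subshell, so your own
# working directory stays where it is.
(
  cd "$scratch"
  # The shell replaces *.csv with the matching file names before echo runs.
  echo *.csv
)
# prints: a.csv b.csv
```

So cat never sees the *. It receives a list of file names, exactly as if you had typed them out yourself.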

So what’s the point of using cat if it spews its output like a firehose? Turns out there’s a good reason for using cat.

The cat command is short for “concatenate”, which means “to join together”. In other words, you can use cat to glue together all of its input into a single stream of output.

For example, imagine how you could use cat to merge chapter files into a single book file.
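Here’s a minimal sketch of that idea. The chapter file names and their contents are invented for the example, and the two setup lines use >, which the next step explains.

```shell
# Create two tiny, hypothetical chapter files in a scratch directory.
book=$(mktemp -d)
printf 'Chapter 1: Seeds\n' > "$book/ch1.txt"
printf 'Chapter 2: Sprouts\n' > "$book/ch2.txt"

# cat outputs the files in the order you list them, one after the other.
cat "$book/ch1.txt" "$book/ch2.txt"
# prints:
# Chapter 1: Seeds
# Chapter 2: Sprouts
```

Swap the order of the file names and the chapters come out in the other order. cat adds nothing of its own between them.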

Step 3: Combine all the diseases

Now let’s put all diseases together into a single file. This way we can create a handy reference of everything that harms our precious pa amb tomàquet.

I bet you know where this is going. We just learned how to combine multiple input files into a single output to the terminal. What we need next is a way to capture the output from cat into a file. Let’s name this file tomeases.csv.

Let me guess what you’re thinking: “There must be an option in cat that lets me capture its output into a file instead of the terminal.” Good for you for starting to “get” the command line!

Your thinking is pretty close to how this is done. What we’re actually going to do is redirect the output from cat. Redirection isn’t unique to cat. It’s universal, available for any command. Redirection is a command-line superpower.

Enough with the hype, let’s try it out.

Do this: Enter cat *.csv > tomeases.csv.

techwriter:~/cat-less$ cat *.csv > tomeases.csv
techwriter:~/cat-less$ 

What happened: You got a prompt.

Before we look at what’s in tomeases.csv, let’s take a look at what we added to our previous cat command. This addition has something new to you, a > and tomeases.csv.

In the shell, the > tomeases.csv means “redirect standard out to a file named tomeases.csv”. Let’s unpack this.

By default on the command line, output from commands goes to the terminal for you to see. Here’s the thing: commands don’t send their output to the terminal, at least not directly. What a command actually does is write its output to standard out.

Diagram of standard out

And what is standard out itself? Think of standard out by itself as just a channel that connects a command’s output to something that receives that output.

And what does standard out connect to? The terminal. By default, standard out points to the terminal when you’re working on the command line.

Diagram of standard out pointing to the terminal

For example, when you enter ls, the ls command sends its directory listing to standard out, which ends up in the Terminal app for you to see it.

The operating system provides each command with its own standard out. Every major operating system provides standard out, including Windows, macOS, and Linux. If you’re used to working only in a GUI, you’ve probably never seen standard out. But it’s always been there, waiting for you to take advantage of it.

Here’s the amazing thing you can do with standard out. You can redirect it to something else besides the terminal. And that’s just what you’ve asked the shell to do with > tomeases.csv. Think of > as an arrow pointing standard out to a file that you specify.

Diagram of standard out redirected to a file
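To watch the arrow at work on something smaller, here’s a sketch that uses echo in a scratch directory. The file name notes.txt and the text are made up for the example. It also shows one thing worth knowing before you rely on >: it replaces whatever was in the file, while its sibling >> adds to the end.

```shell
demo=$(mktemp -d)

# Without redirection, echo writes to standard out, which is the terminal.
echo "early blight"

# With >, the shell points standard out at the file, so nothing shows up
# in the terminal. cat confirms that the text landed in the file.
echo "early blight" > "$demo/notes.txt"
cat "$demo/notes.txt"

# > starts the file over each time. >> appends to the end instead.
echo "late blight" > "$demo/notes.txt"
echo "leaf mold" >> "$demo/notes.txt"
cat "$demo/notes.txt"
# the last cat prints:
# late blight
# leaf mold
```

Notice that “early blight” is gone from the file at the end. The second > wiped it out before writing “late blight”.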

Step 4: Less is more for people

Now let’s see what’s in tomeases.csv. Considering that tomeases.csv probably has lots of text, and cat is bad at showing us big files, let’s use less instead.

Do this: Enter less tomeases.csv.

What happened: A screenful of text appears in the terminal. It shows the beginning of the tomeases.csv file. The less command is like cat, but for people.

The less command is meant only for the terminal. You use it to get a GUI-like experience for looking at files. It’s especially useful for looking at files that are longer than your terminal window, which cat is horrible at.

For example, cat doesn’t let you scroll around a long file like less does.

Do this: Press Page Down and Page Up to see next and previous parts of tomeases.csv.

Do this: When you’re done, press q to quit less and return to the shell prompt.

I know that earlier I explained that commands send their output to standard out. The less command is a special case. Because it needs to give you an interactive view of a file, less expects to be connected to the terminal. That way it can control what you see. Besides, it doesn’t make sense to redirect output from less; it’s not meant to be used that way.
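A command can find out whether its standard out is connected to a terminal. As a rough illustration (this is not how less itself is written), the shell’s built-in test offers that check as -t 1, where 1 is the number the operating system gives to standard out. The function name where_is_stdout is made up for this sketch.

```shell
# Hypothetical helper: report where standard out is pointing.
where_is_stdout() {
  if [ -t 1 ]; then
    echo "standard out is a terminal"
  else
    echo "standard out is not a terminal"
  fi
}

# Run directly at a prompt, this reports a terminal.
where_is_stdout

# Redirected to a file, the same function notices the difference.
report=$(mktemp)
where_is_stdout > "$report"
cat "$report"
# the cat prints: standard out is not a terminal
```

Interactive programs use this kind of check to decide whether to behave like a full-screen viewer or to simply pass text along.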

Step 5: A redirection detour

You can redirect standard in from a file, too. In other words, > has a counterpart in the shell, and that’s <.

Wait, standard in? Yep, standard in. This is standard out’s counterpart. Just like there’s a conventional place for a command to send its output, the operating system also provides a conventional place for the command to get its input.

Diagram of standard in

Standard in connects to the terminal by default.

Diagram of standard in pointing to the terminal

For example, less can also take its input from standard in rather than files.

Do this: Enter less < tomeases.csv.

What happened: The less command shows the contents of standard in, which happens to be from tomeases.csv. We redirected the input for less from the terminal to the tomeases.csv file.

Do this: Press q to quit less.

Like less, a lot of commands behave this way, giving you the choice of where to get input: from a file that you name as an argument, or from standard in.
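You can see the difference between the two input routes with wc -l, a command that counts lines (it isn’t covered in this lesson, so treat this as a side sketch). The sample file is a scratch file that holds two lines borrowed from nematode.csv.

```shell
# A small scratch file to count.
list=$(mktemp)
printf 'common,scientific\nroot-knot,Meloidogyne\n' > "$list"

# Given a file name, wc opens the file itself, so it knows the name
# and prints it after the count.
wc -l "$list"

# Given only standard in, wc counts the same lines but has no name to print.
wc -l < "$list"
```

Both commands count two lines. Only the first one can tell you which file they came from, because in the second case the shell opened the file and wc just saw a stream of text.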

The same applies to standard out. Some commands have an option to write their output to a file that you name, so you can choose between that option and redirecting standard out with >.
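The sort command is one example of a command with both routes. It isn’t part of this lesson, so consider this a quick sketch with made-up file names: sort has a -o option that names an output file, and it also writes to standard out like everything else.

```shell
dir=$(mktemp -d)
printf 'sting\nroot-knot\nstubby-root\n' > "$dir/names.txt"

# Route 1: sort writes to standard out, and > captures it in a file.
sort "$dir/names.txt" > "$dir/redirected.txt"

# Route 2: sort's own -o option names the output file directly.
sort -o "$dir/option.txt" "$dir/names.txt"

# cmp compares two files and stays silent when they are identical.
cmp "$dir/redirected.txt" "$dir/option.txt"
```

Both files end up with the same sorted lines. Redirection works with any command, while an option like -o exists only where a command’s author chose to provide it.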

Step 6: Pipes don’t need a redirection detour

To collect all of the horrible fates that could happen to our pico de gallo, we redirected to a file named tomeases.csv with the > symbol in the shell. And that’s great, there are plenty of uses for that.

Sometimes we need to capture output only once, temporarily, so that we can feed it as input into another command. That’s what we did with tomeases.csv, and it’s the same as doing this with a throwaway file:

techwriter:~/cat-less$ cat *.csv > temp.csv
techwriter:~/cat-less$ less temp.csv

Instead of redirecting standard out to a file and then using that file as input for another command, we can pipe directly to another command.

Do this: Enter cat *.csv | less.

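What happened: The less command shows the output from cat *.csv, the same way it showed tomeases.csv earlier. One difference: tomeases.csv now matches *.csv too, so its lines appear again alongside the lines from the original files.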

Do this: When you’re done, press q to quit less.

Let’s go over what we added to cat *.csv. We added a |, the vertical bar. This is redirection again, but this time we skipped the detour via a file.

In the shell world, we have a name for this redirection with |: pipe. It means “pipe standard out from the preceding command to the standard in of the following command.”
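Here’s about the smallest pipe you can build, as a sketch. It uses printf to produce three lines and sort to receive them. Neither command is part of this lesson; they’re stand-ins for “a command that writes” and “a command that reads”.

```shell
# printf writes three lines to standard out. The pipe hands them to sort,
# which reads them from standard in and outputs them in alphabetical order.
printf 'virus\nfungus\nbacteria\n' | sort
# prints:
# bacteria
# fungus
# virus
```

Neither command knows about the other. printf just writes to standard out, sort just reads from standard in, and the shell does the plumbing.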

So instead of doing this, which takes a couple of steps:

Diagram of output and input redirection with a file

You can use a pipe in one shot, and avoid having to create a file just to handle the output and input:

Diagram of standard out redirected to a file

A pipe doesn’t create a file; it directly connects standard out to standard in. Not only does it save you time in the shell, it’s also efficient in speed and memory.
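To convince yourself that the two routes give the same result, here’s a sketch that counts lines both ways with wc -l (a line counter that this lesson doesn’t cover). The files part1.csv, part2.csv, and temp.txt are made up for the example and live in a scratch directory.

```shell
pipes=$(mktemp -d)
printf 'common,scientific\nroot-knot,Meloidogyne\n' > "$pipes/part1.csv"
printf 'sting,Belonolaimus longicaudatus\nstubby-root,Paratrichodorus\n' > "$pipes/part2.csv"

# Pipe: count the combined lines without creating a file.
piped=$(cat "$pipes/part1.csv" "$pipes/part2.csv" | wc -l)

# Detour: the same count by way of a temporary file, which then
# has to be cleaned up.
cat "$pipes/part1.csv" "$pipes/part2.csv" > "$pipes/temp.txt"
detour=$(wc -l < "$pipes/temp.txt")
rm "$pipes/temp.txt"

# Both routes report the same number of lines.
echo "pipe:" $piped "detour:" $detour
```

The pipe version is one line, leaves nothing behind, and never needs a name for the in-between data.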

You did it!

Congratulations, you know how to do some pretty powerful things now.