Tech writers can code too

You can solve a lot of problems with a few skills

Staying in touch. Kyoto, Japan. August 2023.

Tech writers sometimes need to get their hands dirty. Sometimes they want to but can’t. How would you convert a list of regional data centres in CSV to an unordered list in markdown? CSV and markdown are both text formats, so you could use an editor to search and replace and copy and paste. That’s especially tedious if you have to do all that work again when the CSV file changes.

Or you could ask a developer to rig up a script for you. You’d have to bribe the developer’s manager for it and wait a few weeks. But then you could run the script yourself whenever you need to.

Or you could write your own damned script. It’s not that hard and once you learn some basics you’ll be surprised how useful your new skills are in other situations.

You’ll need a command line

First, you’ll need access to a computer that has a command line and related tools.

You get a command line by running a terminal application. For example, on macOS and Windows it’s called, well, Terminal. On Linux and Unix-y systems, run whatever terminal thing it has.

Do this: Install Tech Writer Tools.

Tech Writer Tools comes with a bunch of tools. Here are the ones you’ll be using:

The Tech Writer Tools sandbox

For this lesson, don’t worry about breaking things because there’s nothing to break. By default, Tech Writer Tools acts like a sandbox. Anything you do in it doesn’t affect the rest of your computer.

Be careful, though. This also means that whatever you do in Tech Writer Tools gets wiped clean when you stop Docker Desktop.

There’s an easy way to save your work automatically, but we’ll keep Tech Writer Tools as a sandbox for now.

Step 1: Make a directory for your project

Do this: Start Tech Writer Tools.

When the command line is ready for your next command, the shell shows a prompt.

The prompt ends with a dollar sign, $. When you’re inside Tech Writer Tools, your username is techwriter and your prompt will look like this:

techwriter:~/$ 

For the rest of this lesson, we’ll just show the $ prompt, not the full prompt.

For your first command, let’s make a folder for your project. You’ll put the files you’ll work on in there.

Do this: Type mkdir datacenters then press Enter.

$ mkdir datacenters
$ 

The mkdir command makes a folder with the name you specify. As you can see, it doesn’t give any indication about its work unless there’s an error. Since there was no problem making the new folder, you just get another prompt. You can see the result of its work by seeing what’s in the current directory.

Do this: Enter ls. (From now on, when you’re asked to “Enter” a command, type the command then press Enter.)

$ ls
datacenters welcome.txt
$ 

The ls command outputs the names of the files and directories in the current directory. The other name, welcome.txt in this case, is another file in the same folder as your new datacenters folder.

Let’s go to the new directory, which makes it the new, current directory.

Do this: Enter cd datacenters.

$ cd datacenters

Do this: Enter pwd.

$ pwd
/home/techwriter/datacenters
$ 

The cd command changes the current directory. The pwd command outputs the current directory’s full path, which is /home/techwriter/datacenters.
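
If you’d like to practice these commands without touching your datacenters folder, here’s a rough sketch. The folder name practice is made up for this example, and the -p option and the .. shortcut are extras that this lesson doesn’t otherwise cover.

```shell
# "practice" is a hypothetical folder name, chosen so that the real
# datacenters folder is left alone.
mkdir -p practice   # -p: don't complain if the folder already exists
ls                  # practice now shows up in the listing
cd practice         # make it the current directory
pwd                 # outputs the full path, which ends in /practice
cd ..               # .. means the parent directory, so this takes you back
```

After the last command you’re back where you started, with a new practice folder next to whatever was already there.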

Step 2: Get the CSV file

Now let’s get some CSV to work with. Your spreadsheet might look like this:

Screenshot of a spreadsheet with data centres

The eventual result we want from this spreadsheet is a file named output.md:

There's a data centre ready to serve storage,
compute, and fresh donuts for our customers
around the globe:

- Catania
- Geneva
- Kyoto
- La Plata
- Montreal

Please contact our Sales department for more info.

I’ve exported this spreadsheet as a CSV file:

city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902

Do this: Copy the CSV text above into the clipboard.

Do this: Enter nano input.csv.

This starts the nano editor and opens a file named input.csv.

Use the terminal app to paste the clipboard into nano.

Do this: Press Control-S. In other words, hold down the Control key, press the S key, then let go of both keys.

You’ve just saved your CSV data in the input.csv file.

Do this: Press Control-X to leave nano.

You’ll be back at the prompt again.

$ 

You can check to see that your CSV file is in your project directory and has the correct contents.

Do this: Enter cat input.csv.

$ cat input.csv
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902
$ 

The cat tool outputs the contents of the file you specify. You could have also used the less tool. It shows a file’s contents one screenful at a time. You can go up or down a screenful with the Page Up and Page Down keys. You can return to the prompt by pressing the q key (lowercase), for “quit”.
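
If pasting into nano gives you trouble, one possible alternative is to create the file straight from the command line. This sketch uses a here-document (the lines between the two EOF markers) and the > redirect, which is explained in Step 10. It also uses wc, a line-counting tool that isn’t part of this lesson. Be aware that > replaces input.csv if it already exists.

```shell
# Everything between <<'EOF' and the closing EOF is fed to cat,
# and > writes what cat outputs into input.csv.
cat > input.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902
EOF

cat input.csv     # show the whole file
wc -l input.csv   # count its lines: 1 header line plus 5 data centres
```

If wc reports 6 lines, the header and all five data centres made it into the file.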

Step 3: Let’s get awking

We’ll write our CSV-to-markdown script incrementally. This is a natural way to do it because the command line makes it easy to interact and iterate.

Let’s create the simplest awk program, an empty file.

Do this: Enter touch datacenters.awk.

What happened? Nothing, except that the touch command created a new, empty file named datacenters.awk.

Do this: Enter ls -l.

$ ls -l
total 4
-rw-r--r--  1 techwriter  techwriter    0 Nov 17 10:51 datacenters.awk
-rw-r--r--  1 techwriter  techwriter  259 Nov 17 10:51 input.csv
$ 

The -l, a hyphen followed by a lowercase L, in ls -l is an option. This option tells the ls tool to list files in long format. You can ignore most of this output, but take a look at the column with 0 and 259. This is the column for file sizes. Notice how datacenters.awk has 0 bytes, so it is indeed empty.

Now let’s see this script in action.

Do this: Enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
$ 

Excellent. Nothing happened. That’s what we expected, after all, because our awk script is empty.

Let’s take a closer look at the gawk command you entered. You can probably figure out what it means:

A couple of things to keep in mind:

Step 4: The identity script

Now we’ll edit our awk script to make it do something, more or less. Well, more less than more. We’ll create an identity script.

In mathematics, the identity function returns the value that you give it. In other words, it doesn’t do anything more than repeat what you tell it. In the command line, an identity script outputs its input. How is that useful? It isn’t immediately useful, but it’s a good starting point to build on.

Do this: Enter nano datacenters.awk then copy and paste the following. It’s just a single line. Make sure you end it by pressing Enter.

{ print; }

Do this: Press Control-S then Control-X to save datacenters.awk and quit nano.

It’s a simple script, right? Let’s unpack it.

Awk works in a straightforward way. It reads its input one line at a time. For each line, it checks to see if there’s anything to do. If there is, awk does it.

How does awk know what to do? That’s what an awk script is for. An awk script is pretty straightforward. It’s organized into pattern-action pairs. For each input line, awk checks the script for any patterns that match the input line. For each pattern that is true for the line, awk performs the pattern’s action.

What we’ve done is create a single pattern and action in our script.

{ print; }

Actually, you can’t see the pattern because we’re relying on the default pattern, also called the empty pattern. The empty pattern is always true for every line.

An action is wrapped in { and }. An action with nothing between the braces does nothing. (If you leave out the action altogether, braces and all, awk prints the line instead.) But we want our action to repeat the line that we’re currently processing. That’s what the print statement does. The default for the print statement is to print the matching line. The statement ends with a semi-colon (;). We use semi-colons to separate statements in an action. This is optional when there’s only one statement in an action, but we put it here as a good habit.

So our simple datacenters.awk script has a single pattern-action that matches all lines and outputs them.
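
To make the pattern-action idea a bit more concrete, here’s a small sketch with three ways of writing the identity script. A few things here are assumptions for the sake of the example: sample.csv is a made-up scratch file, the awk program is typed inline between single quotes instead of being loaded with -f, and $0 is awk’s name for the whole current line.

```shell
# sample.csv is a hypothetical scratch file, so input.csv is left alone.
cat > sample.csv <<'EOF'
city,country
Kyoto,JP
Geneva,US
EOF

# Each of these three commands outputs every line of sample.csv unchanged.
awk '{ print; }' sample.csv      # empty pattern, explicit action
awk '{ print $0; }' sample.csv   # the same, naming the whole line as $0
awk '1' sample.csv               # a pattern that is always true, action left out
```

The last form works because awk falls back to printing the line when a pattern has no action at all.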

Now let’s run our command again to see if it really is the identity script.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902
$ 

There you go. Our identity script does what we expect.

Step 5: Pick a specific column

We want our script to only output contents of the city column, which is the first column. To do this, we give the print statement an argument that specifies this column.

Do this: Enter nano datacenters.awk, make the following change, then save and quit nano:

{ print $1; }

The $1 argument for the print statement specifies the first column, our city column.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
city
Catania
Geneva
Kyoto
La Plata
Montreal
$ 
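
Other columns work the same way as $1. Here’s a quick sketch you can try on the side. It uses plain awk with -F, (comma as the column separator) as a stand-in for gawk --csv, which is an assumption that only holds because this data has no quoted values or commas inside values. The file sample.csv is a made-up scratch file, and NF is awk’s built-in count of columns in the current line.

```shell
# sample.csv is a hypothetical scratch file with a few rows from input.csv.
cat > sample.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
EOF

# -F, tells awk that a comma separates the columns.
awk -F, '{ print $1; }' sample.csv   # first column: city
awk -F, '{ print $3; }' sample.csv   # third column: country
awk -F, '{ print NF; }' sample.csv   # number of columns in each line
```

Every line in this file has 7 columns, so the last command outputs 7 three times.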

Step 6: Format for markdown

We’re getting closer! Let’s format our output as an unordered list in markdown. Each list item in an unordered list starts with a hyphen, followed by a space, then the text for the item.

Do this: Open datacenters.awk in nano, make the following change, then save and quit.

{ print "- " $1;}

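We’ve given print a single value built from 2 parts. The first part is a string containing the beginning of a list item in markdown, a hyphen and space. Notice that we wrapped the string in double quotes. The second part is the value of our first column. When awk sees two values side by side like this, it joins them into one string.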

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
- city
- Catania
- Geneva
- Kyoto
- La Plata
- Montreal
$ 

Look at that, you’ve converted CSV to markdown! Now we can put some finishing touches to get the final output we’re after.
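
A side note on how print puts its pieces together. In awk, values written side by side are joined into one string, while values separated by a comma are printed with a single space between them. The sketch below shows both. As assumptions for this example, it uses plain awk with -F, in place of gawk --csv (fine here because the data has no quoted values) and a made-up scratch file named sample.csv.

```shell
# sample.csv is a hypothetical scratch file.
cat > sample.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
EOF

# Side by side: "- " and $1 are joined into one string.
awk -F, '{ print "- " $1; }' sample.csv

# With a comma: print outputs "-" and $1 with one space between them.
awk -F, '{ print "-", $1; }' sample.csv
```

Both commands produce the same list here. The first form gives you exact control over the spacing, which is why this lesson uses it.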

Step 7: Ignore the first line

You’ve probably been annoyed by it by now: the column name, city, is in the first line of our output. What we want to do is ignore this first line in the input so it doesn’t show up in the output. We can do this with a new pattern-action.

Do this: Edit datacenters.awk with the following, then save and exit nano:

NR == 1 { next; }
{ print "- " $1; }

You already know what the 2nd line in our script does. Let’s take a look at the new, first line. Unlike the 2nd line in our script, this new pattern-action has an explicit pattern, NR == 1. It uses awk’s built-in variable named NR. Its value is the number of the input line that awk is currently processing. For the first line of input, NR’s value is 1. So that’s what we check for. NR == 1 means “Is NR’s value equal to 1?” When this pattern is true, awk does its action.

The action for this pattern is the next statement, which tells awk to stop looking for more matching patterns for this line and move on to the next line. Notice that we put this pattern-action at the beginning of our script. We don’t want awk to process any other actions when NR is 1.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
- Catania
- Geneva
- Kyoto
- La Plata
- Montreal
$ 

There. Our markdown output shows just the cities without the first line.
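
If you’re curious about what NR looks like as awk works through the file, this sketch prints it next to each city. It also shows one other way to skip the header, with a single NR > 1 pattern. Treat it as an illustration rather than a replacement for our script. It assumes plain awk with -F, in place of gawk --csv (fine for this data, which has no quoted values) and a made-up scratch file named sample.csv.

```shell
# sample.csv is a hypothetical scratch file.
cat > sample.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
EOF

# Show the line number awk has reached, followed by the first column.
awk -F, '{ print NR ": " $1; }' sample.csv

# Another way to skip the header: only act when NR is greater than 1.
awk -F, 'NR > 1 { print "- " $1; }' sample.csv
```

The first command shows that the header is line 1, which is why NR == 1 picks it out.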

Step 8: Beginning and end

We want our output to have text before and after the list of cities. For that we add a couple of new pattern-action pairs. Take your time with this one because it’s our biggest change to our script so far.

Do this: Edit datacenters.awk in nano with the following, then save and quit.

# Convert datacenter csv to markdown

BEGIN {
        print "There's a data centre ready to serve storage,";
        print "compute, and fresh donuts for our customers";
        print "around the globe:";
        print "";
}

END {
        print "";
        print "Please contact our Sales department for more info.";
}

NR == 1 {
        next;
}

{
        print "- " $1;
}

There are a few new things going on here:

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
There's a data centre ready to serve storage,
compute, and fresh donuts for our customers
around the globe:

- Catania
- Geneva
- Kyoto
- La Plata
- Montreal

Please contact our Sales department for more info.
$ 
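
Here’s a smaller sketch of the same ideas that you can poke at separately: a # comment, a BEGIN pattern that runs before any input is read, an END pattern that runs after the last line, and actions spread over several lines. The names summary.awk, sample.csv, and count are all made up for this example, and plain awk with -F, stands in for gawk --csv on the assumption that the data has no quoted values.

```shell
# sample.csv and summary.awk are hypothetical scratch files.
cat > sample.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
EOF

cat > summary.awk <<'EOF'
# Lines that start with # are comments. Awk ignores them.

BEGIN {
        print "Cities:";         # runs once, before any input is read
}

END {
        print count " listed";   # runs once, after the last line
}

NR > 1 {
        print "- " $1;           # runs for each line after the header
        count++;                 # add 1 to our running total
}
EOF

awk -F, -f summary.awk sample.csv
```

Even though END appears in the middle of the script, its output comes last. BEGIN and END are tied to the start and end of the input, not to where they sit in the file.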

Someone changed their mind

The Product Manager’s barber’s plumber’s cousin wants datacenters listed only if they offer fresh donuts. You’ll have to update the markdown.

Guess what, you can do that easily. Let’s take a look at our CSV data to figure this out.

Step 9: Filter for donuts

Do this: Enter head -1 input.csv.

$ head -1 input.csv
city,state-prov,country,storage,compute,donut,id
$ 

The head tool outputs the first lines of its input. In this case, we use the option -1 (that’s a hyphen with a number 1) to specify just the first line, which contains the names of the columns.

The donut column is the 6th column. We’ll use this to update our script with a new pattern for outputting markdown list items.

Do this: Edit datacenters.awk in nano with the following, then save and quit nano:

# Convert datacenter csv to markdown

BEGIN {
        print "There's a data centre ready to serve donuts";
        print "for our customers around the globe:";
        print "";
}

END {
        print "";
        print "Please contact our Sales department for more info.";
}

NR == 1 {
        next;
}

$6 == "TRUE" {
        print "- " $1;
}

Here’s what we did:

Let’s see if we get what we expect.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
There's a data centre ready to serve donuts
for our customers around the globe:

- Catania
- Geneva
- Kyoto
- La Plata

Please contact our Sales department for more info.
$ 
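
Patterns can also combine more than one test. As a sketch of where this could go next, suppose someone asks for data centres that offer both storage and donuts. That request is invented for this example, as are the file names sample.csv and storage-donuts.awk. The && operator means “and”. Plain awk with -F, stands in for gawk --csv here, on the assumption that the data has no quoted values.

```shell
# sample.csv and storage-donuts.awk are hypothetical scratch files.
cat > sample.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902
EOF

cat > storage-donuts.awk <<'EOF'
# Skip the header line.
NR == 1 {
        next;
}

# Column 4 is storage and column 6 is donut. Both must be TRUE.
$4 == "TRUE" && $6 == "TRUE" {
        print "- " $1;
}
EOF

awk -F, -f storage-donuts.awk sample.csv
```

Only Catania and La Plata have TRUE in both columns, so those are the two list items you get.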

Step 10: Generate an output file

So far we’ve seen our output show up on the terminal. That’s handy because we can see immediately if our script is doing what we want it to. You can redirect this output to a file instead, ready to copy or send to whoever or whatever needs it.

Do this: Enter gawk --csv -f datacenters.awk input.csv > output.md.

$ gawk --csv -f datacenters.awk input.csv > output.md
$ 

Notice the > output.md we’ve added to the end of our command. The greater-than sign (>) tells the command line to redirect output from the terminal to a file named output.md.

Do this: I’ll leave it to you to figure out if output.md contains what you expect it to.

Actually, > is redirecting from standard output. Standard output is the name for, well, the usual output of a command-line program. By default, standard output goes to the terminal. But you can use > to redirect it to a file.

Surprise! There’s also standard input. Standard input is the usual input to a command-line program. Typically, that’s you at the keyboard. Some commands, like awk, use standard input if you don’t specify an explicit input file. You can tell the command line to redirect standard input from a file with <.

Do this: Enter gawk --csv -f datacenters.awk < input.csv > output.md.

$ gawk --csv -f datacenters.awk < input.csv > output.md
$ 

Notice how we don’t tell gawk which file to use to get its input. Instead, we’ve redirected standard input from input.csv. Since awk has no explicit input file to work with, it uses standard input, which in this case comes from input.csv.
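
If you want to convince yourself that the two forms give the same result, here’s one way to check. This sketch uses cmp, a tool this lesson doesn’t otherwise cover, which compares two files and stays silent when they match. The file names are made up for this example, and plain awk with -F, stands in for gawk --csv on the assumption that the data has no quoted values. Keep in mind that > replaces a file that already exists.

```shell
# sample.csv, cities.awk, from-file.md and from-stdin.md are
# hypothetical scratch files.
cat > sample.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
EOF

cat > cities.awk <<'EOF'
NR > 1 { print "- " $1; }
EOF

# Input file named on the command line.
awk -F, -f cities.awk sample.csv > from-file.md

# Input redirected from standard input.
awk -F, -f cities.awk < sample.csv > from-stdin.md

# cmp says nothing when the files match, so echo confirms it for us.
cmp from-file.md from-stdin.md && echo "identical"
cat from-file.md
```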

In practice, you’re right to think that this doesn’t make any difference to our input or output in this case. But redirection is a command-line superhero-level power for lots of things that are beyond the scope of this little page.

Where to go from here

Here’s what you can do now:

Just these skills can solve quite a few problems. Awk alone is quite the Swiss army knife for tabular data.

The Linux and Unix world has a lot of other text-processing tools besides awk. You’ve already used cat and head. There are many others, including tail, sort, and uniq.
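
To give you a taste of how these tools combine, here’s a rough sketch. It uses the pipe, |, which this lesson hasn’t covered: it sends one command’s standard output to the next command’s standard input. The file name sample.csv is made up, and plain awk with -F, stands in for gawk --csv on the assumption that the data has no quoted values.

```shell
# sample.csv is a hypothetical scratch file.
cat > sample.csv <<'EOF'
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902
EOF

# tail -n +2 outputs the file from line 2 onward, which drops the header.
# awk picks the country column, then sort puts the codes in order.
tail -n +2 sample.csv | awk -F, '{ print $3; }' | sort

# uniq -c collapses repeated neighbouring lines and counts them.
# Sorting first puts equal values next to each other.
# Here it counts how many data centres have TRUE or FALSE for donuts.
tail -n +2 sample.csv | awk -F, '{ print $6; }' | sort | uniq -c
```

The second pipeline reports 1 FALSE and 4 TRUE, which matches the four cities in our donut list.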

Go ahead, explore with your new skills. Try these exercises: