CSV to markdown

You can make a script that converts CSV to markdown

Staying in touch. Kyoto, Japan. August 2023.

Tech writers sometimes need to get their hands dirty. Sometimes they want to but can’t. How would you a convert a list of regional data centres in CSV to an unordered list in markdown? CSV and markdown are both text formats, so you could use an editor to search and replace and copy and paste. That’s especially tedious if you have to do all that work again when the CSV file changes.

Or you could write your own damned script. It’s not that hard and once you learn some basics you’ll be surprised how useful your new skills are in other situations.

You’ll need a command line

First, you’ll need access to a computer that has a command line and related tools.

Do this: Install Tech Writer Tools.

Is this your first time?

Is this your first time on the command line? Take a guided tour first.

You won’t break anything, I promise

For this lesson, don’t worry about breaking things because there’s nothing to break. By default, Tech Writer Tools acts like a sandbox. Anything you do in it doesn’t affect the rest of your computer.

That also means that whatever you do in Tech Writer Tools gets wiped clean when you stop Docker Desktop. There’s an easy way to save your work automatically, but we’ll keep Tech Writer Tools as a sandbox for now.

Step 1: Make a directory for your project

What we call a folder in a GUI is called a directory in the command line. Folder and directory are synonymous, so we’ll stick with directory.

Let’s make a directory for your project. You’ll put the files you’ll work on in there.

Do this: Start Tech Writer Tools.

What happened: The shell shows a prompt to let you know that it’s ready for your next command:

techwriter:~$ 

The prompt shows your username (which is techwriter in this sandbox), the current directory (which we’ll get into in a moment), and ends with a dollar sign, $.

The shell’s shorthand notation for your home direcory is a tilde (~). According to the prompt, then, your current directory (~) is your home directory.

What exactly is your home directory? Conceptually, it’s the directory where you keep your documents, photos, and other files. Your computer has other directories, like the directory for your computer’s operating system. It’s always a good idea to keep your personal and professional files in your home directory.

Where exactly is your home directory? Let’s find out.

Do this: Type pwd then press Enter.

techwriter:~$ pwd
/home/techwriter
techwriter:~$ 

What happened: The pwd command tells the shell to output the full path of your present working directory, also known as the current directory.

In this case, you’re in your home directory, /home/techwriter. Note that this is your home directory in Tech Writer Tools, not your regular home directory on your computer.

Now let’s create a new directory in your home directory.

Do this: Type mkdir datacenters then press Enter.

techwriter:~/$ mkdir datacenters
techwriter:~/$ 

What happened: The mkdir command makes a directory with the name you specify. As you can see, it doesn’t give any indication about its work unless there’s an error. Since there was no problem making the new directory, you just get another prompt.

Also notice that, according to your prompt, your current directory is still your home directory. The mkdir command creates directories but doesn’t change the current directory.

Let’s see the what’s in the current directory.

Do this: Enter ls. (From now on, when you’re asked to “Enter” a command, type the command then press Enter.)

techwriter:~/$ ls
datacenters welcome.txt
techwriter:~/$ 

What happened: By itself, the ls command outputs the names of the files and directories in the current directory. The other name, welcome.txt in this case, is another other file in the same directory as your new datacenters directory.

Let’s go to the new directory, which makes it the new, current directory.

Do this: Enter cd datacenters.

techwriter:~/$ cd datacenters
techwriter:~/datacenters$ 

What happened: Your current directory is now datacenters, as shown by the prompt. You don’t have to, but you can confirm that with pwd. Let’s do it anyway.

Do this: Enter pwd.

techwriter:~/datacenters$ pwd
/home/techwriter/datacenters
techwriter:~/datacenters$ 

For the rest of this lesson, we’ll just show the $ prompt, not the full prompt.

Step 2: Get the CSV file

Now lets get some CSV to work with. Your spreadsheet might look like this:

Screenshot of a spreadsheet with data centres

The eventual result we want from this spreadsheet is a file named output.md:

There's a data centre ready to serve storage,
compute, or fresh donuts to our customers
around the globe:

- Catania
- Geneva
- Kyoto
- La Plata
- Montreal

Please contact our Sales department for more info.

I’ve exported this spreadsheet as a CSV file:

city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902

Do this: Copy the CSV text above into the clipboard.

Do this: Enter nano input.csv.

This starts the nano editor and opens a file named input.csv.

Do this: Use the Terminal app to paste the clipboard into nano.

What happened: Your Terminal app’s window should look a lot like this:

Screenshot of nano editor with csv

Do this: Press Ctrl + S. In other words, hold down the Control key, press the S key, then let go of both keys.

What happened: You’ve just saved your CSV data in the input.csv file.

Do this: Press Ctrl + X to leave nano.

You’ll be back at the prompt again.

$ 

You can check to see that your CSV file is in your project directory and has the correct contents.

Do this: Enter cat input.csv.

$ cat input.csv
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902
$ 

What happened: The cat tool outputs the contents of the file you specify.

Step 3: Let’s get awking

We’ll write our csv-to-markdown script incrementally. By “incrementally” I mean that we’ll start at the simplest possible place then gradually build up a solution that solves our problem. This is a natural way to do it, the command line makes it easy to interact and iterate.

Let’s create the simplest awk program, an empty file.

Do this: Enter touch datacenters.awk.

What happened: Nothing, except that the touch command created a new, empty file named datacenters.awk. Don’t believe me? Look for yourself.

Do this: Enter ls -l.

$ ls -l
total 4
-rw-r--r--  1 techwriter  techwriter    0 Nov 17 10:51 datacenters.awk
-rw-r--r--  1 techwriter  techwriter  259 Nov 17 10:51 input.csv
$ 

What happened: The -l, a hyphen followed by a lowercase L in ls -l, is an option. This option tells the ls tool to list files in long format. You can ignore most of this output, but take a look at the column with 0 and 259. This is the column for file sizes. Notice how datacenters.awk has 0 bytes, it is indeed empty.

Now let’s see this script in action.

Do this: Enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
$ 

What happened: Excellent. Nothing happened. That’s what we expected, after all, because our awk script is empty.

Let’s take a closer look at the gawk command you entered. You can probably figure out what it means:

A couple of things to keep in mind:

Step 4: The identity script

Now we’ll edit our awk script to make it do something, more or less. Well, more less than more. We’ll create an identity script.

In mathematics, the identity function returns the value that you give it. In other words, it doesn’t do anything more than repeat what you tell it. In the command line, an identity script outputs its input. How is that useful? It isn’t immediately useful, but it’s a good starting point to build on.

Do this: Enter nano datacenters.awk then copy and paste the following. It’s just a single line. Make sure you end it by pressing Enter.

{ print; }

Do this: Press Ctrl + S then Ctrl + X to save datacenters.awk and quit nano.

It’s a simple script, right? Let’s unpack it.

Awk works in a straightforward way. It reads its input one line at a time. For each line, it checks to see if there’s anything to do. If there is, awk does it.

How does awk know what to do? That’s what an awk script is for. An awk script is pretty straightfoward. It’s organized into pattern-action pairs:

What we’ve done is create a single pattern and action in our script.

{ print; }

Actually, you can’t see the pattern because we’re relying on the default pattern, also called the empty pattern. The empty pattern is always true for every line.

An action is wrapped in { and }. The default action is to do nothing. But we want our action to repeat the line that we’re currently processing. That’s what the print statement does. The default for the print statement is to print the matching line. The statement ends with a semi-colon (;). We use semi-colons to separate statements in an action. This is optional when there’s only one statement in an action, but we put it here as a good habit.

So our simple datacenters.awk script has a single pattern-action that matches all lines and outputs them.

Now let’s run our command again to see if it really is the identity script.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
city,state-prov,country,storage,compute,donut,id
Catania,Sicily,IT,TRUE,FALSE,TRUE,32848
Geneva,New York,US,FALSE,FALSE,TRUE,28342
Kyoto,Kyoto,JP,FALSE,TRUE,TRUE,81283
La Plata,Buenos Aires,AR,TRUE,TRUE,TRUE,90123
Montreal,Quebec,CA,TRUE,FALSE,FALSE,17902
$ 

What happened: There you go. Our identity script does what we expect.

Step 5: Pick a specific column

We want our script to only output contents of the city column, which is the first column. To do this, we give the print statement an argument that specifies this column.

Do this: Enter nano datacenters.awk, make the following change, then save and quit nano:

{ print $1; }

The $1 argument is a dollar sign ($) followed immediately by the number 1. This argument for the print statement specifies the first column, our city column.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv.

$ gawk --csv -f datacenters.awk input.csv
city
Catania
Geneva
Kyoto
La Plata
Montreal
$ 

Step 6: Format for markdown

We’re getting closer! Let’s format our output as an unordered list in markdown. In markdown, each list item in an unordered list starts with a hyphen, followed by a space, then the text for the item.

Do this: Open datacenters.awk in nano, make the following change, then save and quit.

{ print "- " $1; }

We kept the $1 but we’ve preceded it by a new argument, "- ". This new argument is a text string, often called just a string. A string is a piece of text that should be used literally. In awk, we specify a string by wrapping it in straight quotes.

In this case, we want each line of output to start with a hyphen and a space followed by the value of the first column in the input line.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
- city
- Catania
- Geneva
- Kyoto
- La Plata
- Montreal
$ 

Look at that, you’ve converted CSV to markdown! Now we can put some finishing touches to get the final output we’re after.

Step 7: Ignore the first line

You’ve probably been annoyed by it by now, the column name, city, is in the first line of our output. Want we want to do is ignore this first line in the input so it doesn’t show up in the output. We can do this with a new pattern-action.

Do this: Edit datacenters.awk with the following, then save and exit nano:

NR == 1 { next; }
{ print "- " $1; }

You already know that the 2nd line in our script does. Let’s take a look at the new, first line.

Unlike the 2nd line in our script, this new pattern-action has an explicit pattern, NR == 1. It uses awk’s built-in variable named NR. Its value is the number of the input line that awk is currently processing. For the first line of input, NR’s value is 1. So that’s what we check for. NR == 1 means “Is NR’s value equal to 1?” When this pattern is true, awk does its action.

Yes, that’s a double equal sign (==), which is a test for equality. A single equal sign means something else entirely, which we won’t get into here.

The action for this pattern is the next statement, which tells awk to stop looking for more matching patterns for this line and move on to the next line. Notice that we put this pattern-action at the beginning of our script. We don’t want awk to process any other actions when NR is 1.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
- Catania
- Geneva
- Kyoto
- La Plata
- Montreal
$ 

There. Our markdown output shows just the cities without the first line.

Step 8: Beginning and end

We want our output to have text before and after the list of cities. For that we add a couple of new pattern-action pairs. Take your time with this one, it’s our biggest change to our script so far.

Do this: Edit datacenters.awk in nano with the following, then save and quit.

# Convert datacenter csv to markdown

BEGIN {
        print "There's a data centre ready to serve storage,"
        print "compute, or fresh donuts to our customers"
        print "around the globe:";
        print "";
}

END {
        print "";
        print "Please contact our Sales department for more info.";
}

NR == 1 {
        next;
}

{
        print "- " $1;
}

There are a few new things going on here:

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
There's a data centre ready to serve storage,
compute, or fresh donuts to our customers
around the globe:

- Catania
- Geneva
- Kyoto
- La Plata
- Montreal

Please contact our Sales department for more info.
$ 

Step 9: Filter for donuts

The Product Manager’s barber’s plumber’s cousin wants datacenters listed only if they offer fresh donuts. You’ll have to update the markdown.

Guess what, you can do that easily. Let’s take look at our CSV data to figure this out.

Do this: Enter head -1 input.csv.

$ head -1 input.csv
city,state-prov,country,storage,compute,donut,id
$ 

What happened: The head tool outputs the first lines of its input. In this case, we use the option -1 (that’s a hyphen with a number 1) to specify just the first line, which contains the names of the columns.

The donut column is the 6th column. We’ll use this to update our script with a new pattern for outputting markdown list items.

Do this: Edit datacenters.awk in nano with the following, then save and quit nano:

# Convert datacenter csv to markdown

BEGIN {
        print "There's a data centre ready to serve donuts";
        print "to our customers around the globe:";
        print "";
}

END {
        print "";
        print "Please contact our Sales department for more info.";
}

NR == 1 {
        next;
}

$6 == "TRUE" {
        print "- " $1;
}

Here’s what we did:

Let’s see if we get what we expect.

Do this: In the command line, enter gawk --csv -f datacenters.awk input.csv

$ gawk --csv -f datacenters.awk input.csv
There's a data centre ready to serve donuts
to our customers around the globe:

- Catania
- Geneva
- Kyoto
- La Plata

Please contact our Sales department for more info.
$ 

You did it!

Congratulations, you wrote a script that generates markdown from CSV.

What you learned