Why and How I Write Scientific Documents in Plain Text

As a scientist, I spend much of my time writing. Over the years I’ve experimented with a panoply of writing tools. This week I thought I’d share what I’ve learned.

While you can ultimately write on any device that records your words, a good tool makes writing less painful. For a lot of people, the writing tool is Microsoft Word (or some similar word processor). If it works for you, great. But for my scientific writing, I’ve found that Word falls short.

Outside of Word, there’s a growing ecosystem of tools that use plain text as their input. That’s text with no formatting — the stuff you see when you open a text editor (like Notepad below).

Notepad — a Windows text editor.

At first, it seems bizarre that you’d want to give up the rich text formatting of Word for the plain text of Notepad. But there are actually many advantages. Let’s dive in.

WYSIWYG vs. WYSIWYM

The difference between Word and plain text is fundamentally about two different philosophies of how to format a document. Word adopts the WYSIWYG philosophy — ‘what you see is what you get’. What you see on the page is what your final document will look like. To achieve the formatting, you use a variety of point-and-click menus. The result looks like this:

This is italic. This is bold.

When you write in plain text, however, you use the WYSIWYM philosophy: ‘what you see is what you mean’. To format your text, you use a markup language, a syntax that tells your computer what to do. To format the above text using the markup language Markdown, you’d write:

*This is italic.* **This is bold.**

To many people, it seems odd that you’d choose to type your formatting semantically when you could just see it. It’s like leaving your Tesla in the garage and choosing to drive a Model T.

I know the feeling, because I remember when I first tried a WYSIWYG office suite. It was in the mid-1990s. My family had just replaced our Apple IIe with a fancy new Macintosh LC 550. The Mac came with ClarisWorks, an office suite (now defunct) that blew my mind. As I typed my essays for school, I spent hours playing with fonts and formatting. I remember submitting essays to my Grade 7 teacher in a torturous-to-read cursive font … just because I could. (I’m part of the reason all essay assignments now come with the boilerplate request to ‘use Times New Roman font’.)

Fast forward three decades. Today, I rarely open an office suite. Instead, I do almost all of my writing in a simple text editor. Why have I given up the WYSIWYG formatting that I once found so revelatory? Here’s a list of reasons.

Why I abandoned Word

Reason 1: Word has a cluttered interface

One of the reasons that I now dislike word processors is a reason that I initially loved them — you can see all the formatting options in menu bars. These menus make formatting easy. But they make the interface cluttered. Below, for instance, is a screenshot of Word. I count 5 rows of buttons (including the ruler) before you get to your actual text.

Microsoft Word — too much clutter for me.

I find this clutter aesthetically displeasing. But more than that, I find that it distracts from my writing. To write something in Word, I have to actively not look at all of the formatting options. Think about it this way. Because we read from the top to the bottom of a page, our eyes naturally gravitate to the top. But in Word, the top quarter of the screen consists of buttons that, while writing, you have to ignore. To me, that seems like unneeded mental effort.

Today, I favor a distraction-free interface, something you can’t really get using a word processor. (WYSIWYG formatting requires point-and-click menus.) To get distraction-free writing, I use an old-fashioned text editor. There are hundreds of different editors, but I use gedit. Like that of a modern web browser, gedit’s top panel is bare-bones. The lack of distracting buttons puts the focus on your writing.

The text editor gedit has a minimalist interface.

Reason 2: In Word, it’s difficult to update figures and tables

The usefulness of a tool depends on the task at hand. The tool you’d use to do something a few times is not the tool you’d use to do the same thing many times. Here’s an example. If someone asked you to move one box across a room, you’d move it by hand. But if someone asked you to move a thousand boxes … you’d use a conveyor belt.

Computers are like conveyor belts. Their raison d’être is to do repetitive tasks with blazing speed. Your word processor, however, gives you the illusion of using a hand-held tool. To put an image into your document, your word processor allows you to ‘cut and paste’ it … just like you would with real paper.

If you’re inserting a single image once, this illusion makes the job easier. I remember discovering cut and paste (and drag and drop) on our first Mac. It was miraculous. Suddenly adding images to my essays was easy.

The problem comes when you want to update the image(s) as you write. Then the illusion (created by your word processor) that your computer is a hand-held tool becomes a liability. Every time you want to update an image, you have to manually insert the new one, format it to the desired size and position, and so on. This repetition is a needless make-work project because it’s not letting your computer do what it does best — be a conveyor belt.

In contrast, writing in plain text lets the computer do the repetitive work. Rather than drag and drop an image, you tell the computer where to find it. Suppose that you wanted to insert an image called image.png. To do so using HTML markup, you’d write:

<img src="image.png">

The cost here is that it’s more difficult to write the above code than to drag and drop the image file. You have to learn what the code does, and remember the syntax when you want to insert an image. This initial investment of time, however, pays dividends when you start updating images. No matter how many times you change the file image.png, the newest version will be automatically included in your document. As you start to do more complicated analysis that’s updated many times, this feature is essential.

Reason 3: Word is clunky for large documents

In a much-lambasted study published in PLOS ONE, Markus Knauff and Jelica Nejasmic compared Microsoft Word to the markup language LaTeX. They claimed that Word was objectively ‘superior’. But as many critics have observed, Knauff and Nejasmic designed the study so that their results were a foregone conclusion.[1]

To compare Word to LaTeX, Knauff and Nejasmic asked volunteers to reproduce the text and formatting of a short document. The problem is that this is exactly the task where you’d want to use Word: formatting a short document once. Remember that Word turns your computer into a hand-held tool. When you want to do something once (move one box across the room), that’s the tool you want.

The markup language LaTeX, in contrast, is a conveyor belt. To format a new document from scratch, you have to write a lot of code. In other words, you have to build the conveyor belt. You’d never do that to move a single box. The payoff comes only when you move thousands of boxes. For writing, that happens when you want to format a long, complex document. Unfortunately, Knauff and Nejasmic didn’t test such a task.

I have … at least informally.

I consider my Master’s thesis an n = 1 study of the perils of using a word processor to format a long, complex document. It was a nightmare. The thesis had dozens of figures, some added midway through the writing process. Each time I added a new figure, it would wreck the formatting in half the document. And references were a disaster. I used the Zotero plugin for LibreOffice, which was clearly not designed with volume in mind. My friend James McMahon had a similar experience while writing his PhD thesis. As he puts it, ‘adding a new reference meant getting a cup of coffee’. Even reading my thesis was tedious. The screen would hang at every figure.

The second half of my n = 1 study came during my PhD. In the summer of my first year, I turned much of the content of my Master’s thesis into the book Rethinking Economic Growth Theory from a Biophysical Perspective. (A big thanks to Charlie Hall for inviting me to write this book.) I wrote the book using LaTeX. The process was so much easier than writing my Master’s thesis that I vowed never to go back to Word. Updating or adding new figures was easy. So were citations and cross-references. All the tedious stuff that I had to do by hand in Word just happened automatically. (It helped that the publisher Springer provided a LaTeX template, so I didn’t have to build one from scratch.)

Yes, I spent many hours on Stack Exchange learning how to use LaTeX. But the ease of the resulting typesetting was worth it. (And I never have to spend this time again.) Since the torture of my Master’s thesis, I’ve used LaTeX to write all of my scientific documents. If you’re interested, here is my LaTeX thesis template.

Reason 4: In plain text, open source is king

I’m an avid supporter of open source software, and that’s a big reason why I switched to writing in plain text. Yes, there are open source office suites like LibreOffice (which is quite good). Still, this open source software is a bit player in an industry dominated by the proprietary Microsoft Word.

This fact annoys me. It means that open source word processors are forever playing catch-up to maintain compatibility with Word documents. They never manage to do it perfectly. Just the other day I had to fill out a form sent to me in Microsoft’s docx format. When opened in LibreOffice, the form was unusable. I had to open the file on my wife’s computer (which has Word) to use it.[2]

Yes, this was a minor inconvenience. But it’s one that never happens in the plain text world. There are two reasons why. First, every operating system comes with a text editor that can display plain text files without compatibility issues. That’s because plain text is the basis for computer programming. Second, the dominant tools for rendering plain text markup into a final document are all open source.

When you switch to plain text, the paywalls come down. There are no fees, and no vendor lock-in.

Reason 5: Format lock-in

Speaking of lock-in, that’s another reason I use plain text. I increasingly treat my writing as source code that will eventually be rendered in multiple formats (PDF, HTML, EPUB). When you write in plain text, there’s a growing ecosystem that makes this ethos a reality. (I’ll discuss it shortly.) And because plain text can be read by any programming language, you can easily write code to do custom conversions.

In contrast, when you write in Word you’re mostly stuck in Word. Yes, you can export to other formats. But the conversion is rarely seamless. And if you want to do the conversion well, you’ll probably need proprietary software.
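
To give a flavor of what a custom conversion on plain text can look like, here is a minimal Python sketch. It relies on the fact that Markdown marks bold text with double asterisks, and it rewrites those spans as HTML or LaTeX. The function names are made up for illustration, and the regular expression only handles the simple, un-nested case. A real converter has to deal with far more syntax than this.

```python
import re

# Matches the simplest form of Markdown bold: **some text**
BOLD = re.compile(r"\*\*(.+?)\*\*")

def bold_to_html(text):
    """Rewrite Markdown **bold** spans as HTML <b> tags (illustrative helper)."""
    return BOLD.sub(lambda m: "<b>" + m.group(1) + "</b>", text)

def bold_to_latex(text):
    """Rewrite Markdown **bold** spans as LaTeX textbf commands (illustrative helper)."""
    return BOLD.sub(lambda m: "\\textbf{" + m.group(1) + "}", text)

source = "This is **bold**."
print(bold_to_html(source))
print(bold_to_latex(source))
```

The point is less the conversion itself than the fact that the source file is ordinary text, so a dozen lines of code in almost any language can read it and reshape it.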

The plain-text landscape

Okay, enough about the problems with word processors. Let’s talk about plain text.

We’ll start with the most banal of things … file extensions. Unlike Word documents, which mostly come as .docx files, plain text files can have thousands of different extensions. In all cases, the file can be read by a simple text editor. What the extension does is tell you about the purpose of the file.

Let’s start with the most basic form of plain text — the .txt file. This format is common for software README notes. It’s designed for humans to read and write, nothing more. I use txt files for taking notes and making lists. It’s quick and easy.

Beyond the .txt file, there’s a huge range of plain text extensions. Most of them relate to computer programming. There’s the .py extension for Python files, the .R extension for R files, the .cpp extension for C++ files, and so on. The extension doesn’t change the fact that the file is in plain text. Rather, it tells the computer what software to use when interpreting the file.

That brings us to markup languages. Other than for taking notes, your goal when writing in plain text is to have the file rendered into some final document. To do that, there are three main file extensions, shown below:

Language    File extension    Use
LaTeX       .tex              PDF
HTML        .html             Web document
Markdown    .md               Any format

Let’s talk about what each one does.

LaTeX

When it comes to markup languages, LaTeX is the grandfather. It was created by Leslie Lamport in 1984. However, the basis of LaTeX, the TeX typesetting system created by Donald Knuth, dates back even earlier, to 1978.

Back then, the only reason to typeset something was to print it. For that reason, LaTeX is geared for making PDF documents. If that’s all you want, LaTeX is the tool for the job. It typesets printable documents better than any other tool. And because it’s been around for so long, LaTeX comes with a huge array of packages to meet your needs. There’s also a great help community on Stack Exchange.

Because it’s old, however, LaTeX is burdened by a rather ugly syntax. To write This is bold, you’d type:

\textbf{This is bold}

Newer languages like Markdown have a much cleaner syntax. What’s exciting is that you can now get the best of both worlds. You can type with the simplicity of Markdown while rendering your document using the power of LaTeX. More on this shortly.

Back to using LaTeX. Because it’s plain text, you can write a LaTeX document in any text editor. You then render it to a PDF from the command line. If that doesn’t appeal to you, there are also dedicated apps for writing LaTeX. TeXstudio (below) is one of the best.

TeXstudio, one of the best LaTeX editors.

What I like about editors like TeXstudio is that they give you separate windows for writing and reading. Your markup text is on the left, while your final document is on the right. I like this visual separation because the tasks of writing and reading are conceptually distinct. When you write, you should let your thoughts flow freely. When you read your work, you should use an analytic lens. It’s nice to have these different tasks separated on the screen.

That said, LaTeX editors like TeXstudio have a default interface that is as clunky as Microsoft Word. (In TeXstudio, you can customize the interface to get rid of most of the clutter.) Yes, the many menus make typesetting easier. But these menus distract from your actual text. Because of that, today I write LaTeX in a simple text editor. Actually, I mostly write in Markdown and then, if I want a PDF, have my computer create a LaTeX document. More on that shortly.

HTML

HTML — short for ‘hypertext markup language’ — was developed by physicist Tim Berners-Lee to allow scientists at CERN to share their work. Today, HTML does something far more expansive — it runs the web.

If you want to see HTML code, it’s not hard. Go to any web page, right-click, and choose ‘View page source’. You’ll see the HTML used to render the site. It will look something like this:

An example of HTML — the source code for a draft of this post.

In all likelihood, the code will look god-awful. That’s because modern websites are really complicated. (And because your browser won’t wrap the lines of writing.) Most of the code is boilerplate stuff that tells your computer how to make the site look pretty.

Despite the complexity of modern websites, basic HTML syntax is quite simple. To write This is bold, you’d type:

<b>This is bold</b>

For the first year of this blog, I wrote it in HTML. It was fun … for a while. One of the biggest annoyances of HTML is that you need to surround paragraphs with <p></p>, like this:

<p>This is a paragraph.</p>

That gets old really fast. And so from the very beginning of the internet, people have looked for ways to generate HTML without actually writing it. Today, every blogging platform has a WYSIWYG editor that allows you to write like you would in Word. It then generates HTML under the hood. So you can write prolifically for the web without knowing anything about HTML.
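
The idea of generating HTML rather than typing it can be sketched in a few lines of Python. The example below is an illustration only, not how any particular blogging platform works. It treats blank lines as paragraph breaks and wraps each paragraph in <p></p> tags. The function name is hypothetical.

```python
import html
import re

def paragraphs_to_html(text):
    """Wrap each blank-line-separated paragraph in <p>...</p> tags."""
    # Split the text wherever one or more blank lines appear.
    blocks = re.split(r"\n\s*\n", text.strip())
    lines = []
    for block in blocks:
        # Join hard-wrapped lines within a paragraph into one line.
        body = " ".join(block.split())
        if body:
            # Escape &, < and > so the text cannot be mistaken for markup.
            lines.append("<p>" + html.escape(body, quote=False) + "</p>")
    return "\n".join(lines)

draft = "First paragraph.\n\nSecond one,\nstill second."
print(paragraphs_to_html(draft))
```

You write paragraphs the way you would in an email, and the tags are the computer’s problem.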

That said, I think it’s useful to know the basics of HTML syntax. That’s because it now forms the basis not just of the web, but of all documents that have dynamic text (text that flows with the page size). Ebooks, for instance, are based on XHTML, an XML-based version of HTML.

If you want to have your writing on the web or in an ebook, it’s worth learning basic HTML syntax.

Markdown

Now we’re getting to the most exciting part of the plain-text movement — writing in Markdown. Created in 2004 by blogger John Gruber and internet activist Aaron Swartz, Markdown was designed to be easy to read and write. The goal was to make blogging more pleasant. You’d write your text in Markdown, and then have your computer generate HTML.

In creating simple syntax, Gruber and Swartz certainly succeeded. To write This is bold in Markdown, you’d type:

**This is bold**

That’s much simpler than either LaTeX or HTML:

LaTeX            HTML           Markdown
\textbf{bold}    <b>bold</b>    **bold**

Initially, Markdown was designed for a limited purpose: to generate HTML. But today, Markdown forms the basis for a growing ecosystem dedicated to writing documents in plain text. You can write in Markdown, and then use the power of LaTeX and HTML to render your document in any form you want.

The Markdown ecosystem

At the center of the Markdown ecosystem is a document converter called Pandoc. Created by philosopher John MacFarlane, Pandoc is a command-line tool for converting between different document formats (examples here).

While it can do almost any conversion, Pandoc is designed with Markdown in mind. That means you can write in Markdown, and then use Pandoc to render your file in any format. If you want a web document, it will generate flawless HTML. If you want a print document, Pandoc can generate a PDF.

Here’s where things get interesting. To generate a PDF, Pandoc uses LaTeX. That means you can take advantage of the simplicity of Markdown, but use the power of LaTeX to render your document.
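
As a rough sketch of what this looks like in practice, the Python snippet below builds and runs a basic Pandoc command. The file names (paper.md, paper.html, paper.pdf) are placeholders, and the helper functions are invented for this example. The underlying command is simply pandoc, the input file, and -o followed by the output file. Pandoc infers the output format from the output file’s extension. Producing a PDF also assumes a LaTeX installation is present.

```python
import shutil
import subprocess

def pandoc_command(source, target):
    """Build a basic Pandoc call: pandoc SOURCE -o TARGET."""
    return ["pandoc", source, "-o", target]

def render(source, target):
    """Run Pandoc if it is installed; return True when a conversion ran."""
    if shutil.which("pandoc") is None:
        # Pandoc is not on the PATH, so there is nothing to run.
        return False
    subprocess.run(pandoc_command(source, target), check=True)
    return True

# Placeholder file names: one Markdown source, two output formats.
for target in ("paper.html", "paper.pdf"):
    print(" ".join(pandoc_command("paper.md", target)))
```

The same Markdown file feeds both outputs, which is the whole appeal. You can of course type the printed commands straight into a terminal instead of going through Python.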

But wait, there’s more. You can now use Markdown to integrate your writing with your analysis. How? By using R Markdown, which allows you to embed R code in your document. When you render your document, the computer will automatically run your code and output the result in your document.

Yihui Xie, a software engineer at RStudio, has upped the ante. He’s created an entire publishing toolbox based on Markdown. For writing blogs, there’s the Blogdown package. For writing books, there’s the Bookdown package. I recently used Bookdown to render Nitzan and Bichler’s seminal book Capital as Power into an ebook and online book. It was an enjoyable experience … so much so that I’ve vowed to write my next book in Bookdown.

In short, Markdown has transformed the landscape of plain-text writing. It allows you to use simple syntax to write complex documents. And unlike older tools like LaTeX, your Markdown document can be rendered seamlessly in any format you want. It’s a writer’s dream.

If you want to learn how to write scientific documents in Markdown, I’ve listed resources at the end of the post.

Join the plain text movement

There are many reasons for writing in plain text. But for me, the most important is that it integrates my writing with the rest of my scientific toolbox (i.e. code), all of which is in plain text. Yes, the publishing industry is built around Microsoft Word. But that doesn’t mean science must be.


Resources for scientific writing in Markdown

If you’re new to writing in plain text, I recommend reading The Plain Person’s Guide to Plain Text Social Science by Kieran Healy. He takes you through the philosophy of plain-text writing and then dives into the specifics of writing in Markdown.

Some resources to learn basic Markdown syntax:

Try out Markdown with this web app:

When you’re ready to start writing technical documents, here are some short primers:

Finally, here are book-length manuals for writing with R Markdown:

Notes

  1. Some critiques of Knauff and Nejasmic’s study:

  2. I don’t have Word on my computer for three reasons. First, I don’t want to pay for it. Second, I run Linux … and Microsoft does not make a Linux version of Office. To install Office on Linux, you have to use the WINE compatibility layer. Third, I don’t want proprietary software on my computer, which is otherwise completely open source.
