The origin and virtues of semicolons in programming languages
Monday, April 15, 2024
While working on the grammar for my programming language, Lilac, I was exploring different choices for statement terminators. A period (.) is very appealing, or an exclamation mark (!). Ultimately, I might make the "boring" choice of using either a semicolon (;) or significant whitespace.
But that had me asking: why is it that so many languages use semicolons for their statement terminators[1]? I found some good reading about why we have statement terminators at all, but little discussion on the specific merits of semicolons over other choices.
To get to the origin of semicolons in our programming languages, I turned to history. There were very few programming languages in the early days, so it's relatively easy to trace forward and look at all the early languages. If we do this, we find the first language that included semicolons as a statement separator: ALGOL 58.
Before ALGOL, languages typically used whitespace to mark statements, with each being on its own line (or punch card). ALGOL introduced a statement separator, which gave the programmer more flexibility to put multiple statements on one line, or spread one statement across multiple lines. Unfortunately, when we dig into why the semicolon was used, there's not much of an answer! The original papers just state that the semicolon is the statement separator, but not why.
And where does that leave us? To good old-fashioned speculation!
Speculation time
There are a few reasons why we would have picked up the semicolon, or why it wound up somewhere in our languages. This is all speculation, but the reasoning is sound.
It's available. Early computers had very limited character sets, and the semicolon was often available. Some early input devices were adapted from Remington keyboards, and those (based on the pictures I can find) did include a semicolon and colon. This makes sense, because if you want to enter English text you may run into semicolons occasionally! It's not the most oft-used punctuation, but it's useful[2]. Since it was there, it was bound to wind up somewhere in a language when there were so few characters to choose from.
It's convenient. The semicolon is on the home row without shift on modern keyboards, which I suspect is part of why it continues to be used a lot. (That, and momentum.) Being on the home row makes it super easy to type, so in contrast to something like !, which requires two keystrokes and a stretch, you can get a ; with just your right pinky. Speaking of which, isn't it odd that the semicolon is the main one and the more-used colon requires a shift?
The usage is similar to English. One of the jobs of the semicolon in English is to delimit independent clauses; these are parts of a sentence which could stand alone but are closely related. This is very similar to what a statement separator does. More similar would be a period (.), as each statement could be thought of as a sentence, but that brings us to another reason to prefer semicolons.
It's unlikely to conflict. If you use a period, the humble ., you can run into difficulties in parsing if you're not careful. As my friend put it to me recently, the period is such a high-value symbol that you have to choose wisely what you use it for. In modern languages, we use it for accessing fields and methods, and for defining floating point literals, and it's in range operators and spread operators. In contrast, the semicolon is... nowhere else, except occasionally when used to start comments.
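To make the conflict concrete, here is a minimal Python sketch. It is not how Lilac (or any real parser) works: split_statements is a hypothetical helper that naively splits source text on a single terminator character, and the sample statements are made up for illustration. It only shows why a symbol that also appears inside expressions needs more care than one that does not.

```python
def split_statements(source: str, terminator: str) -> list[str]:
    """Naively split source on a terminator character, dropping empty pieces.

    Hypothetical helper for illustration only; a real lexer would
    tokenize numbers and field accesses before looking for terminators.
    """
    return [piece.strip() for piece in source.split(terminator) if piece.strip()]


# The same three made-up statements, written with each candidate terminator.
with_semicolons = "x = 1.5; y = point.x; z = x + y;"
with_periods = "x = 1.5. y = point.x. z = x + y."

print(split_statements(with_semicolons, ";"))
# ['x = 1.5', 'y = point.x', 'z = x + y']

print(split_statements(with_periods, "."))
# ['x = 1', '5', 'y = point', 'x', 'z = x + y']
```

With ; the naive split recovers the three statements. With . the float literal and the field access are cut in half, so a language that ends statements with a period has to resolve those cases in its lexer or grammar instead.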
These are all pretty compelling reasons together to choose a semicolon for a statement separator!
What could you choose instead? Running through all the candidates, !@#$%^&*,./;:|-_, I can't think of one of these that's a clearly better choice! My personal preference is probably for . if you can resolve the parsing issues, and ! can be really fun if you want a very excited language, but the humble ; seems to have stuck around for being a solidly good decision instead of just continuity.
As for what I'm doing in my programming language, Lilac? I'm not entirely sure yet! The semicolon is the safe choice, but other choices (or not having one at all) have aesthetic appeal. I'd love to hear what you would choose in your dream world!
Thank you to Mary for the feedback on this post! You said it doesn't have enough semicolons in it. Here they are: ;;;;;;;;;;;;;;;;;;;;.
[1] In some languages, like Pascal, these are statement separators. I'm just going to say "terminators" here for ease, but this post applies to both.
[2] On one paper in high school, my chemistry teacher told me I was using too many commas. Truly, I had far too many, averaging maybe five per sentence. Joke was on him, though: I used fewer in the next paper by using semicolons instead (entirely grammatically correctly).