The Semantics of Emptiness

Empty space, often realized as indentation, is a common way of signifying meaning in programming languages. Python is famous for this, and Haml uses indentation to represent the structure of HTML. I, for one, am not a fan.

In human language, information and meaning, once codified, take the shape of syntax and semantics, with syntax being the framework within which the semantics can be parsed. I like to think of syntax as a protocol: the agreement we make to enable the exchange of meaning. English syntax enforces word order, whereas Latin requires declension and conjugation to provide a framework within which we can communicate. Syntax itself informs and modifies semantics: subtle shifts in word order and conjugation change the meaning of a sentence. A set of words that violates the syntax of English is almost meaningless beyond the words themselves. If I were to take all the words in this paragraph, sort them  randomly, and remove the spaces and all punctuation, the result would be gibberish. Likewise, syntactic elements separated from the words they encompass are without meaning. All of the spaces, the punctuation, and the order of the parts of speech in this paragraph would, on their own, contain very little information at all.

With programming languages, much the same holds true: the syntax provides the framework through which the meaning of a program can be understood. My problem with treating whitespace as meaningful with regard to syntax and semantics is that in almost every language, including human language, whitespace has one purpose: token separation. It is only important because of what it is not. Let me try to provide a concrete example.

Above this line there are four spaces. This sentence, on the other hand, has eight spaces. The spaces above are meaningless, because empty space is almost purely a syntactic element. It has no semantic value of its own. Beyond that, it’s almost impossible for you to visually discern exactly how many spaces there are, because the element we typically use to separate tokens is the very token you are trying to count.
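
The token-separation point can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sample words and variable names are made up, and `str.split()` stands in for a real tokenizer. It shows that once whitespace has done its job of separating tokens, the amount of it carries no further information.

```python
# Two strings that differ only in how many spaces separate the words.
# The words and the space counts are arbitrary, chosen for illustration.
one_space = "the quick brown fox"
many_spaces = "the" + " " * 4 + "quick" + " " * 2 + "brown" + " " * 6 + "fox"

# With no arguments, str.split() treats any run of whitespace as one separator.
tokens_a = one_space.split()
tokens_b = many_spaces.split()

# Same tokens either way: the extra spaces added nothing.
print(tokens_a == tokens_b)

# Counting the spaces is trivial for a program and tedious for the eye.
print(many_spaces.count(" "))
```

Both strings yield the same four tokens, even though one of them contains four times as many spaces as the other.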

Haml has many graces: it makes it very easy to whip up totally valid HTML in a flash, and templating is a joy. However, its use of whitespace indentation to indicate HTML element nesting levels is maddening, precisely because it has radically overloaded the concept of emptiness and indentation. To give you an example of exactly how maddening it is: in one of the sentences above, between two words, there is a double space. Where is it? Can you see it by quickly scanning the document?
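
The same overloading is easy to show in Python, the other language mentioned at the top. The sketch below is a hypothetical example, with function names invented for illustration: the two functions are identical character for character except for the four spaces in front of one line, and they return different results.

```python
def total_inside(values):
    total = 0
    for v in values:
        total += v
        total *= 2  # indented: runs on every pass through the loop
    return total


def total_outside(values):
    total = 0
    for v in values:
        total += v
    total *= 2  # dedented: runs once, after the loop has finished
    return total


print(total_inside([1, 2, 3]))   # 22
print(total_outside([1, 2, 3]))  # 12
```

Nothing visible distinguishes the two versions except the emptiness at the start of one line, which is exactly the thing that is hard to see when scanning.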

I’ve never been able to tinker in Python, at least not in the way that Ruby lets me throw together things that just work (and that I can clean up later, naturally) anywhere I’ve got a text editor. I literally find myself cursing the emptiness. If programming is about expressiveness and power, I for one would not like to wield that ability with a void.
