In Defense of Word Clouds

Word clouds have been under heavy critique in data visualization and digital humanities circles. Writing in 2011, the New York Times’s Jacob Harris laments that they enable “only the crudest sorts of textual analysis,” “confuse signifiers with what they signify” and abandon context. If this is true, it seems damning for the prospect of using word clouds for serious textual analysis.

Yet digital historian Adam Crymble offers a devastating critique of Harris’s objections: “However, an expert in the source material can, with reasonable accuracy, reconstruct some of the more basic details of what’s going on.” And indeed, take a look at this word cloud of Harris’s article:

Harris Word Cloud

We can very easily reconstruct that it concerns “word clouds” (or, perhaps, “words cloud,” but that’s a distinction without a difference). So, too, we can deduce that he’s investigating how this technique for “data” “visualization” enables a “reader” to “understand” “every” “narrative.” We even see a few key details that the article itself omits: the “reader” is “named” “York,” for instance. All this seems right. We miss only that Harris is opposed to word clouds.

But a more sophisticated word cloud methodology will allow us to correct for that, too, while preserving most of the insights demonstrated by our computational techniques. If you’re not interested in technical discussion, you can skip down to the next image. In this revised image, I have completed a low-level significance transform, simply by adding the word “bad” to the source material a few dozen times.
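For readers who did not skip ahead, here is a minimal sketch of what such a transform might look like. It assumes a word cloud sizes each word by its raw count; the function names, the stopword list, the stand-in text, and the choice of 36 repetitions for “a few dozen” are all my own illustrative assumptions, not the tool actually used for these figures.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that", "for"}


def word_weights(text, top_n=10):
    """Return the top_n (word, count) pairs a word cloud might size words by."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


def significance_transform(text, word="bad", times=36):
    """Append `word` to the text `times` times (36 is an assumed 'few dozen')."""
    return text + " " + " ".join([word] * times)


# Stand-in text; the actual article is not reproduced here.
sample = "Word clouds are a visualization of words. Word clouds abandon context."

before = word_weights(sample)
after = word_weights(significance_transform(sample))

print(before)
print(after)
```

Comparing the two lists shows the whole trick: the padded word jumps to the top of the ranking, and every other count stays exactly where it was.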

Bad Harris Word Cloud

With this slight change, the image reveals something central about Harris’s argument: he thinks word clouds are bad. At the same time, careful study of this revised figure reveals a contradiction in Harris’s claims that less attentive readings may have missed. Harris claims word clouds neglect context, and yet we see the word “context” very clearly, right below the crucial word “visualization.” So much for confusing signifiers and signified!

But we can go deeper still. Notice, inside the “b” of “bad” (or the “q” of “peq,” if you turn your head the opposite direction), the words “conclusions” “inside.” In a stunning visual pun, the graph reminds us that we can find conclusions inside the seeming badness of the word cloud.

Now let’s turn up the badness filter a bit higher. In the following figure, rather than including the word “bad,” I have included the lyrics to Michael Jackson’s “Bad” alongside Harris’s piece.

Jackson-Harris Word Cloud

Once again, the image proves our foregone conclusions.