The Reddit post earned thousands of upvotes and comments. Strangers pointed me to it; friends and family pointed me to it. The data was right there in a graph, the trend unmistakable. And yet the story wasn’t true. The baby name Karen did not just plummet into obscurity, pushed off a cliff by the force of a million mocking memes.
Join me for a cautionary tale of names, charts, and seeing what you expect to see, even when the truth would have been more interesting.
The Original Post
It all started with a scatterplot in Reddit’s “Data is Beautiful” forum that looked something like this.
It’s hard to miss the downward swoop of that curve—and easy to draw conclusions from it, especially since the name in question is Karen.
Karen is the most maligned name of our era. As Namerology chronicled when we crowned Karen the 2019 Name of the Year, the name has turned into an all-purpose insult for perceived failings of middle-aged white women. Who could be surprised that such a flood of derision should send a previously popular name into freefall?
But it didn’t happen. Not the recent popularity, and not the dramatic plunge in the past few years. Take a look at the actual usage history of the baby name Karen:
That graph tells an entirely different tale. As you can see, Karen blasted off from obscurity during the 1930s–’50s, then fell back to Earth in the 1970s. It held on at a modest level in the 1990s–2000s era of -n names like Kaitlyn and Megan, then fell again along with those names starting about a decade ago.
How can two charts give such wildly different accounts of the same events? The clue lies in something prosaic but essential missing from the original Reddit graph: axis labels.
The Root of the Problem
It turns out that the Redditor was plotting popularity rankings. Egad, danger! Readers, don’t try this at home! Seriously, don’t. It’s a terrible idea that produces wildly distorted charts.
Plotting name popularity rankings in that linear way breaks a promise to the viewer. The promise is, “The squares on my graph paper are all equivalent. The distance from 1 to 5 is the same as the distance from 901 to 905.” That promise holds true if you plot name frequency. It’s a lie when you plot rankings.
Baby name popularity is severely top-loaded. At the top of the popularity charts, moving down one rank can mean a drop of hundreds or even thousands of babies. Lower down, the difference between one rank and the next becomes negligible. For instance, last year the #1 girls’ name, Olivia, was given to 4,459 more babies than the #5 name, Sophia. The popularity difference between #901 Aspen and #905 Valery, meanwhile, was…zero. There was a 5-way tie, and the official rankings simply listed the tied names in alphabetical order. Yet on the Reddit ranking graph, a gap of 4,459 babies and a gap of zero babies would look identical: 4 ranking spots apart.
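To make the tie problem concrete, here is a minimal Python sketch. The names and counts are hypothetical, chosen only to be top-loaded the way real name data is. The ranking rule (more babies ranks higher, tied names listed alphabetically but still given distinct ranks) mimics the tie handling described above; it is an illustration, not the official ranking procedure.

```python
# Hypothetical birth counts (NOT real data), top-loaded like real name lists:
# a few big names at the top, then a long tail of small, often tied counts.
counts = {
    "Ada": 9000, "Bea": 7000, "Cleo": 5500, "Dot": 4700, "Eve": 4000,
    "Fay": 260, "Gwen": 260, "Hana": 260, "Isla": 260, "June": 260,
}


def rank_names(counts):
    """Assign ranks 1, 2, 3, ... by count, highest first.

    Tied names still get distinct ranks; ties are broken alphabetically,
    mimicking the tie handling described in the text.
    """
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


ranks = rank_names(counts)
name_at = {rank: name for name, rank in ranks.items()}


def count_gap(higher_rank, lower_rank):
    """How many more babies the higher-ranked name received."""
    return counts[name_at[higher_rank]] - counts[name_at[lower_rank]]


print(count_gap(1, 5))   # 4 ranks apart: 5000 babies
print(count_gap(6, 10))  # 4 ranks apart: 0 babies
```

Both pairs sit exactly four rank spots apart, so a rank-based chart draws them the same size, even though one gap is thousands of babies and the other is nothing at all.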
To get a sense of what popularity rankings actually represent, here’s a plot of this year’s top 1000 girls’ names, with rankings on the x axis and frequency (number of babies) on the y axis.
The big swoop downward in the Karen rank graph turns out to say more about rankings than about Karen. Once a name descends to the lower rungs of the rankings, where many names are crowded together at similar counts, it starts moving down faster and faster, even if its usage declines at a steady pace. And each step down in rank represents fewer and fewer actual babies, so the rankings movement means less and less.
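The acceleration can be reproduced with a small simulation. This is a sketch under loudly stated assumptions: the competing names follow a hypothetical Zipf-like curve (the name at rank r gets about 20,000 / r babies), and our hypothetical name loses a steady 15% of its usage every year. Neither number comes from real name data; they only mimic the top-loaded shape.

```python
# Hypothetical population of competing names (NOT real data): a Zipf-like
# curve where the name at rank r receives about 20,000 / r babies.
TOP_COUNT = 20_000
others = [TOP_COUNT / r for r in range(1, 10_001)]


def rank_of(count, population):
    """Rank = 1 + number of competing names with strictly more babies."""
    return 1 + sum(1 for c in population if c > count)


# Hypothetical name: starts at 400 babies and declines a steady 15% per year.
count = 400.0
counts, ranks = [], []
for year in range(15):
    counts.append(count)
    ranks.append(rank_of(count, others))
    count *= 0.85

# Compare the yearly loss in babies with the yearly drop in rank.
for year in range(1, len(ranks)):
    lost_babies = counts[year - 1] - counts[year]
    lost_places = ranks[year] - ranks[year - 1]
    print(f"year {year:2d}: {lost_babies:5.1f} fewer babies, "
          f"down {lost_places:3d} places")
```

In this toy run the name loses 60 babies in the first year and only about 7 in the last, yet its yearly rank drop grows from under 10 places to more than 70. The usage decline is steady (it even shrinks in absolute terms), while the rank plunge accelerates.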
This may all seem technical and fiddly, but it completely determines the conclusions the viewer draws, in this case to disastrous effect. In fact, I’d argue that the viral poster got the story backwards. Think of it this way: despite four years of unrelenting insults and derision, Karen is still a top-1000 U.S. baby name. It’s even more popular than most other ’50s favorites, like Susan, Deborah, Patricia and Donna. Isn’t that surprising? But by making the wrong kind of chart, the poster turned a surprising true story into a predictable-looking false one. Name data, and Karen, deserve better.