Here at Floatingsheep, we've spent the last several years trying to demonstrate the potentials, as well as the pitfalls, of using user-generated internet content for geographic research [1]. A key focus has been how the online world of social media at times reflects, and at other times distorts, our understandings of the offline, material world.
One way we've shown this previously is through mapping linguistic differences in online content. We've looked at everything from the way references to beer and soccer in different languages match up neatly with national borders, to differences in the spatial patterns of certain terms in English and Thai in Bangkok, and inter-regional differences in language in places like France, Belgium, Canada and Israel/Palestine.
With all the recent hoopla around the geographies of language, we wanted to return to this topic, using a relatively straightforward example: the geography of y'all. No, not the geography of each and every one of you, the geography of the word "y'all" (see definition below).
Rather than conducting a survey to measure the term's usage, we decided (after careful thought and rigorous debate) to do something new and use geotagged tweets [2]. Searching all of the geotagged tweets in the United States from July 2012 through March 2014 for variations of "yall" (the most commonly used y'all, as well as ya'll and yall to capture typos or alternative spellings), we found a total of 1,870,687 tweets using this folksy second-person plural pronoun, more than enough to draw some definitive conclusions (or at least make some maps).
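For the curious, here is a minimal sketch of how one might flag tweets containing any of those three spellings. It is only an illustration, not the actual search we ran: the pattern, the function name `mentions_yall` and the sample tweets are all hypothetical, and accepting a curly apostrophe is an extra assumption.

```python
import re

# Hypothetical pattern: matches "y'all", "ya'll" and "yall" as whole words,
# ignoring case. Accepting the curly apostrophe (’) is an assumption.
YALL_PATTERN = re.compile(r"\b(?:y['’]all|ya['’]ll|yall)\b", re.IGNORECASE)


def mentions_yall(text: str) -> bool:
    """Return True if the text contains any spelling variant of y'all."""
    return YALL_PATTERN.search(text) is not None


# Invented sample tweets, purely for illustration.
sample_tweets = [
    "Y'all come back now",
    "where ya'll at",
    "yall are wild",
    "I bought a yallow shirt",  # no match: "yall" is not a whole word here
    "you guys ready?",
]

matches = [t for t in sample_tweets if mentions_yall(t)]
print(len(matches), "of", len(sample_tweets), "sample tweets mention y'all")
```

The word boundaries keep the pattern from firing on longer strings that merely contain "yall".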
To begin, looking only at the absolute number of tweets referencing y'all, it's clear that this is a geographically specific phenomenon. While some places are extremely saturated with references (we'll get to these in just a sec!), there are 250 counties in the United States with no y'all tweets whatsoever, and approximately 60% of the country's 3,143 counties had fewer than 100 y'all tweets in the twenty-one-month period from which our data originate.
Still using only these absolute numbers of tweets referencing y'all, Texas, Georgia, Florida, North Carolina and California make up the Top 5 states, while the cities of Dallas, Houston, Chicago, Philadelphia and Los Angeles make up the Top 5 metro areas. And while these rankings don't exactly mimic population distribution, there is something clearly suspect about believing that folks in Los Angeles, Chicago and Philadelphia say y'all more than good old-fashioned Southerners do. So, to make the map below, we instead normalized the county-level data by the total number of tweets originating in those counties during the same time period.
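The normalization described above boils down to dividing each county's y'all tweets by all of its tweets. A rough sketch follows, with the caveat that the county names and counts are invented for illustration and that returning zero for a county with no tweets at all is just one assumed way of handling that case.

```python
# Hypothetical counts per county: (y'all tweets, all geotagged tweets).
# These numbers are invented for illustration, not taken from the dataset.
county_counts = {
    "Big Metro County": (12_000, 4_000_000),
    "Small Southern County": (900, 30_000),
    "Quiet County": (0, 5_000),
    "Empty County": (0, 0),
}


def yall_share(yall_tweets: int, total_tweets: int) -> float:
    """Share of a county's tweets that reference y'all (0.0 if it has no tweets)."""
    if total_tweets == 0:
        return 0.0
    return yall_tweets / total_tweets


normalized = {
    name: yall_share(yall, total) for name, (yall, total) in county_counts.items()
}

# Rank counties two ways: by raw y'all count, and by normalized share.
by_raw_count = sorted(county_counts, key=lambda name: county_counts[name][0], reverse=True)
by_share = sorted(normalized, key=normalized.get, reverse=True)

print("Top by raw count:", by_raw_count[0])
print("Top by share:", by_share[0])
```

In this toy example the big metro wins on raw counts while the small county wins on share, which is the same kind of reshuffling that normalization produces on the map. Keep in mind that ratios like this can swing wildly for counties with very few tweets.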
Geotagged Tweets Referencing Y'all, July 2012 - March 2014
On the broadest level, all suspicions and previous research on the matter are confirmed using our normalized tweet dataset: y'all is much more likely to be uttered (or tweeted) in the South than in any other part of the United States... or even the world, for that matter, as there are approximately sixteen times more references to the term in the USA than in the rest of the world combined [3]! But even still, there are some interesting anomalies worth commenting on...
Using these normalized values, we can see a new hierarchy emerge at the state level, with Louisiana, Alabama and Georgia having the highest relative number of tweets, much more in line with what one would expect. At the county level, 97 of the top 100 normalized values are located within the South (by practically any definition). The only three counties outside of this region in the top 100 are Boundary County, Idaho; Dawson County, Montana; and Goshen County, Wyoming, the first two of which surprisingly rank #2 and #3 overall in these normalized rankings, led only by Talbot County, Georgia, the epicenter of y'all-related tweeting [4]. But even the South isn't homogeneous when it comes to the usage of y'all, as the central Appalachian region of eastern Kentucky, West Virginia and southwest Virginia remains relatively untouched by Twitter references to y'all, despite being more-or-less surrounded by them. Indeed, Kentucky (spiritual homeland of Floatingsheep) is relatively sparse in references to y'all, despite selling these extremely expensive sweatshirts that attempt to capitalize on the state's southern charm.
Apart from some of these slight anomalies, much of this should come as no surprise to anyone who has spent much time in -- or even knows somebody from -- the South. So we thought it might be interesting to compare our own map to a handful of similar maps that have been circulating around the internet recently.
Some Other Maps of Y'all
The first map shows a stark north/south divide between the places that say "you guys" and those that say "y'all" (and, well, Pennsylvania, the western portion of which is also known for its use of "yinz"). The second map, taken from the New York Times interactive dialect quiz, developed by Joshua Katz, largely resembles our own map, but seems to place the epicenter of y'all much further west than our own, in southeast Louisiana, bleeding over somewhat into Mississippi.
So while there is some general agreement that Louisiana, Mississippi, Alabama, Georgia and Texas form the territorial heart of y'all, our work, along with the data from the Times' dialect survey, disputes the cut-and-dried story told by the first map. While that map shows significant portions, if not all, of Missouri, Oklahoma, Arkansas, Kentucky, West Virginia and Tennessee, among others, firmly in y'all country, the dividing line in our data appears to be both quite a bit further south, and quite a bit more squiggly [5] in nature. While some conventionally Southern states have only relatively confined pockets of references to y'all in our dataset (as well as in the Times' data), it's equally important to recognize that there are pockets of y'all densely concentrated in some more far-flung areas of the country as well.
But ultimately as long as you have a group of friends worth using a second-person plural pronoun -- contracted or otherwise -- in reference to, we imagine you're doing just fine.
Y'all come back now, y'hear?!
-----
[1] Wait, wait, wait... there are pitfalls to this?!?!?!?!
[2] This was also the most convenient data to use, since we had them lying around.
[3] The Bahamas and South Africa come in at #2 and #3 globally in references to y'all.
[4] We suspect that Talbot County, Georgia is the epicenter of exactly nothing else, although we fully expect that someone from there will angrily correct us very shortly.
[5] That's a technical cartographic term.