Thanks to everyone (well, almost everyone) for their comments and constructive critiques on our Geography of Hate map. In light of all of the different directions these comments have come from, we wanted to respond to some of the more common questions and misunderstandings all at once. Before commenting or emailing about the map, please keep the following in mind...
1. First, read our original post. Second, read through this FAQ. Third, read the "Details about this map" section included in the interactive map itself. We specifically spent time on these things in order to explain our approach, and they go into some detail about the methods we used. Nearly all of the critiques of our map are already addressed in one of these venues. We're happy to engage, and we're confident in our methodology (not that any approach is perfect), but please, use the skills your first teacher gave you and take the time to read.
2. If you are offended by these words, and we sincerely hope that you are, remember that they are the object of a research project. As such, we felt compelled to reproduce the words in full in order to be as clear as possible about our project. While we agree that the use of these slurs can be hurtful to some, especially the groups that they are targeted at, we believe that there is a difference between including them as the object of our study and using them as they are 'meant' to be used.
3. The map is based solely on geocoded data from Twitter, and does not reflect our personal attitudes about a given place. The map represents real tweets sent by real people, and is evidence that the feeling of anonymity provided by Twitter can manifest itself in an ugly way. If you feel that the place you live is more or less racist than somewhere else and this isn't reflected in the map, please start a conversation with your community about these issues.
4. In order to produce this map, we took the number of geotagged hateful tweets, aggregated them to the county level, and then normalized this count by the overall number of tweets in that county. This means that the spatial distributions you see for the different variables are decidedly NOT showing population density. As we mentioned above, this is clearly stated in all of the previously written material accompanying the map. And because we are specifically looking at the geographic patterns of Twitter activity, it makes more sense to normalize by overall levels of Twitter activity than by population; a rough sketch of the calculation follows below.
Were that not enough, however, the fact that there is so little activity on the map in California - home to an eighth of the entire US population, including the cities of Los Angeles, San Francisco and San Diego - should be a clue that something else besides population is at work in explaining these distributions. While we share with the infamous xkcd cartoon a distaste for non-normalized data, just because you thought for a second that maybe it was relevant in this case doesn't make it so. There are many possible explanations for some of the distributions that you can see, and we don't pretend to have all of the explanations. But population just isn't one of them.
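To make #4 concrete, here is a minimal sketch of that calculation in Python (using pandas). The data frame, column names, and county FIPS codes are hypothetical stand-ins for illustration, not our actual data schema.

```python
import pandas as pd

# Hypothetical input: one row per geotagged tweet, with the county it was
# sent from and a flag set by the human coders (see #7) marking whether the
# tweet used a slur in a derogatory way.
tweets = pd.DataFrame({
    "county_fips": ["06037", "06037", "06037", "21067", "21067"],
    "is_hateful":  [False,   False,   True,    True,    False],
})

by_county = tweets.groupby("county_fips")["is_hateful"].agg(
    hateful_tweets="sum",   # number of hateful tweets in the county
    all_tweets="count",     # overall Twitter activity in the county
)

# Normalize by overall tweet volume, NOT by population: a county's score is
# the share of its tweets that are hateful.
by_county["hate_rate"] = by_county["hateful_tweets"] / by_county["all_tweets"]
print(by_county)
```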
5. This map includes ALL geotagged tweets containing each of these words that were determined to be negative. This is not a sample of tweets containing these words, but rather the entire population that meets our criteria. That being said, only around 1.5% of all tweets are geotagged, as it requires opting in to Twitter's location services. To be sure, that subset might be biased in a multitude of ways when compared with the entire body of tweets or even with the general population. But that does not mean that the spatial patterns we discover based on geotagged tweets should automatically be discarded - see, for example, some of our earlier posts on earthquakes and flooding.
6. 150,000 is in no way a "small" number. Yes, it is less than the total population of earth. Yes, it is less than the number of atoms in the universe. But no, it is not a small number, especially as it is the total population of the phenomenon rather than a sample (see #5). And were one to extrapolate from the fact that these 150,000 geotagged hateful tweets represent only around 1.5% of all hateful tweets, the actual number of tweets (both geotagged and not) containing such hateful words is quite a bit larger - on the order of 10 million. Regardless, we think that 150,000 is a sufficiently large number to be quite depressed about the state of bigotry in our country.
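For those who want the back-of-the-envelope version, the extrapolation in #6 is a single division, assuming (as a rough simplification) that hateful tweets are geotagged at the same ~1.5% rate as tweets in general:

```python
geotagged_hateful = 150_000   # hand-coded hateful tweets that were geotagged
geotag_rate = 0.015           # roughly 1.5% of all tweets are geotagged

# If hateful tweets are geotagged at the same rate as tweets overall, the
# geotagged set is only a ~1.5% slice of all hateful tweets.
estimated_total_hateful = geotagged_hateful / geotag_rate
print(f"{estimated_total_hateful:,.0f}")  # 10,000,000
```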
7. Furthermore, given that each and every geotagged tweet including the words listed was read and manually coded by actual human beings (if you consider undergraduates to be human beings!), rather than automatically by a piece of software, 150,000 isn't an especially small number. Reading just these 150,000 tweets took our students approximately 150 hours of labor, a rate of roughly 1,000 tweets per hour. This isn't insignificant.
8. For those asking about our word selection, here is the full list of terms we searched for:

bitch
nigger
fag*
homo*
queer
dyke
darky OR darkey OR darkie
gook*
gringo
honky OR honkey OR honkie
injun OR indian
monkey
towel head
wigger OR whigger OR wigga
wet back OR wetback
cripple
cracker
honkey
fairy
fudge packer
tranny
A * indicates that a list of lexeme variations was used, which accounts for alternate spellings of words. For example, "fag" was not just "fag," but also "fags", "faggot", "faggie", and "fagging", among other things. All geotagged tweets containing these terms were examined. All tweets in which the terms were not used in a derogatory manner were discarded during coding, and as a result some words no longer met the minimum count to be displayed on the map. For example, honky/honkey/honkie was discarded, as most of the tweets were positive references to honky-tonk music rather than slurs aimed at white people.
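To illustrate how a starred term expands in practice, here is a simplified sketch of this kind of whole-word matching. The variant list and function are our own reconstruction for illustration, not the project's actual code.

```python
import re

# Illustrative variant list for a starred term; the real lexeme lists were
# longer and curated by hand.
FAG_VARIANTS = ["fag", "fags", "faggot", "faggots", "faggie", "fagging"]

def contains_term(tweet_text: str, variants: list[str]) -> bool:
    """Return True if the tweet contains any variant as a whole word."""
    pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
    return re.search(pattern, tweet_text.lower()) is not None

# A match only flags a tweet for manual coding; human coders then decide
# whether the term was actually used in a derogatory way.
print(contains_term("an example tweet with no slurs in it", FAG_VARIANTS))  # False
```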
In the end we were also constrained by which words could feasibly be manually coded. For instance, the 5.5 million tweets with reference to "bitch" were excluded from the list. Students were paid roughly $10 per 1,000 coded tweets, and therefore including the word "bitch" alone would have cost roughly $55,000 to manually check for sentiment. Tranny/tranney, by contrast, would have cost under $200. While we're obviously interested in including a wider range of hateful terms in our analysis, our research funds, and thus the scope of this project, are extremely limited. It's not like we have billions of dollars in funding lying around. If you feel strongly, feel free to donate at http://humboldt.edu/giving and enter "The Geography of Hate Project" in your comments.
9. If you are a disgruntled white male who feels that the persistence of hatred towards minority groups is a license to complain about how discrimination against you is being ignored, just stop. You can refer to all of our previous commentary on this issue from November. Though we have typically refrained from deleting asinine comments to this effect - those who choose to make these comments do more to prove themselves to be fools than we ever could - we fully reserve the right to delete any and all comments we believe to be unnecessary.