February 25, 2011

Progress on the Autocomplete Data

We want to pass along a big FloatingSheep thanks to those who have posted their autocomplete search data. (See the instructions below if you'd like to contribute). Thanks to...

  • Carpiediem-Hong Kong Island
  • Katharine-Cardiff, UK
  • Daniel-Montreal, QC
  • Sarah-Karlsruhe, Germany
  • Megan-Ithaca, NY
Your participation has been a great help and has allowed us to take a first look at some of the spatial differences in autocomplete results. Click on the image below to get a bigger view of some of the results so far. The cells in yellow are ones that stand out as different or interesting.

  • The U.S. remains very focused on the 50-year-old events of the Cuban missile crisis while other places clearly are not. We're particularly amused that Cuba Gooding Jr. is the top result in Hong Kong.
  • While we feel a bit bad for the country of Israel we can't help but love the fact that Israel Kamakawiwo'ole seems to be the top ranked result for the keyword "Israel" the world over.
  • We also suspect that Macau is likely not thrilled with its association with Macaulay Culkin, although the results for Hong Kong (ferry) show that proximity can win out over celebrity.
  • The connection between Kazakhstan and adoption in the U.S. and Canada is interesting, as we've previously discussed.
  • The India, Canada, Nicaragua and Hong Kong results are all good examples of localization in search results.
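The yellow highlighting can also be done mechanically once the spreadsheet is exported. Here is a minimal sketch in Python, assuming the results are held in a dictionary keyed by search term. The function name `differing_terms` and the sample rows (paraphrased from the observations above, with the exact suggestion strings guessed) are illustrative assumptions, not the actual spreadsheet contents.

```python
def differing_terms(results):
    """Return the search terms whose top suggestion is not the same everywhere.

    `results` maps a search term to a dict of {location: top suggestion}.
    The comparison ignores case and surrounding whitespace.
    """
    flagged = []
    for term, by_location in results.items():
        distinct = {s.strip().lower() for s in by_location.values()}
        if len(distinct) > 1:
            flagged.append(term)
    return sorted(flagged)


# Illustrative rows only: paraphrased from the post, not copied from the data.
sample = {
    "Cuba": {"U.S.": "cuban missile crisis", "Hong Kong": "cuba gooding jr"},
    "Israel": {"U.S.": "israel kamakawiwo'ole", "Hong Kong": "Israel Kamakawiwo'ole"},
    "Macau": {"U.S.": "macaulay culkin", "Hong Kong": "macau ferry"},
}

print(differing_terms(sample))  # ['Cuba', 'Macau']
```

Israel is not flagged because both locations agree once capitalization is ignored, which is the sort of normalization worth doing before deciding that a cell "stands out".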


We encourage all of you to help us continue to flesh out our Autocomplete map of the world by contributing your own search results using Autocomplete. If you follow the link you will find an open spreadsheet to record the results. But before you do, please follow the steps below to make sure that your results are not being affected by your own search patterns.

It's important that your searches not be tailored to you as an individual, but be reflective of your general location, the time at which you searched and Google's suggested search results. In order to control for these factors (and a whole host of others), please follow these directions if you want to contribute:

  • Be sure to be signed out of your Google Account while you search on Google. If you are signed in to your Google Account, your search experience may be customized based on your own personal past searches (which would no doubt be fascinating to your friends and family) but is NOT what we're after. Learn how to turn off these customizations.
  • Remove particular searches from your Web History at www.google.com/history, or by clicking the "Remove" links that appear beside personalized predictions.
  • Remove Web History from your Google Account.
  • Type the country name without a space at the end. If you're curious, compare how "Australia" and "Australia " (notice the space) give different results.
  • Copy the FIRST search term that comes up into the online spreadsheet.
  • There are 250 or so country names in total. If you can't do all of those, we've prioritized the top 150 for you to complete.
  • We're primarily interested in people outside of the US and the UK but we'll take anyone's help, no matter where they are. Also we're limiting ourselves to the English name of countries for now.
Good luck and thanks!

February 18, 2011

The Ephemerality of Search

Google announced yesterday that search was becoming more social. We won’t go into the technical details in this post (the NYT provides a useful overview), but the basic point behind the tweaking of their interface was to allow results to incorporate information that your friends and contacts find relevant and share on platforms like Twitter, LinkedIn and Facebook.


This seems to be Google’s final move to ensure the ephemerality of the search experience. Google has already made search a highly personalised experience in both space and time.

Search results have always been temporally unfixed (a search for the same topic last month, yesterday, today, and tomorrow all can yield different results). However, this trend is speeding up to the point that Google will maintain a real-time index of the Web. What is important here is that both the algorithms used and the information that they harvest and rank are constantly changing.

More recently the geography of results has also become unfixed. Our work analysing Google’s autocomplete in different locations tries to highlight some of these differences. The same search at the same time from two different locations can yield dissimilar results.

The search experience has also been personalised, not only through the memory of links that we highlight or star, but by triangulating results with other personal information that Google knows about us. The happy birthday doodle below is just one example of how this sort of personalisation is enacted.



And now, not only are results individually, temporally and geographically targeted, but also socially specific. My results are now no longer just dependent on my positionality in time and space, but also the time and space positionalities of my entire social network.

This is important due to the powerful links between representation and repetition. We are served information, we act on it, and we thus reproduce and reinforce those representations. This cycle opens up possibilities for a path-dependence of the powerful to be enacted and re-enacted.

Google has received (often warranted) criticism over the ways that it represents, ranks, structures and sorts. Yet despite its general opacity, it had a knowable presence of sorts. Its actions could be observed, and thus criticised and challenged.

However, it is now increasingly difficult to know how Google is “organizing the world’s information.” How do we map and measure, study and critique this increasingly ephemeral tool that so many of us rely on for our informational needs? This will be an increasingly central question for those of us concerned about representations, rankings, and our ability to recreate and challenge them.


See also:

- Ethan Zuckerman on "Listening to Global Voices"
- Zook and Graham on "Google and the Privatization of Cyberspace and DigiPlace."
- Thanks to Monica Stephens for the link to the story.

February 17, 2011

Autocomplete Part III: The Automatic Completion of Place(names)

The Google autocomplete results gathered in Lexington, KY, USA and Oxford, United Kingdom (and hopefully other locations submitted by Floatingsheep readers) show how different Google's autocomplete suggestions are from place to place. Having highlighted examples in an earlier post, we now want to go a bit further and think about what these results mean relative to some of the broader processes we've outlined in our research.

One of the fundamental questions (one that we have noted and that has come up in various places across the web) is why, when attempting to replicate the searches in Dorothy Gambrell's map of the United States, the results come back different for different people.

Some have argued that this is a flaw in the mapping, but this difference is actually inherent in the function of Autocomplete. Moreover, it is potentially a means of getting a better understanding of (1) how search varies over space and (2) how Google's search algorithm works (at least until they tweak it again).

In the interest of not reinventing the wheel, it's best to simply let Google explain the idea of Autocomplete themselves:
“As you type, Google's algorithm predicts and displays search queries based on other users' search activities. These searches are algorithmically determined based on a number of purely objective factors (including popularity of search terms) without human intervention. All of the predicted queries shown have been typed previously by Google users. The autocomplete dataset is updated frequently to offer fresh and rising search queries. In addition, if you're signed in to your Google Account and have Web History enabled, you may see search queries from relevant searches that you've done in the past.”
Although Google's Autocomplete feature isn't inherently spatial, the parallels to our concept of "DigiPlace" are significant. The three central characteristics of DigiPlace, as outlined in the Zook and Graham 2007 article in Geoforum are...
  1. DigiPlace is automatically produced.
  2. DigiPlace is highly individualized.
  3. DigiPlace is dynamic.
Although it's not entirely necessary to rehash the arguments Matt and Mark make in that article, the implications for an analysis of Autocomplete are plentiful. Using DigiPlace as something of a framework for thinking about Autocomplete, the differences between Dorothy's original map and the many attempts to reconstruct it, including our own, are not bugs, but features. It is the 'automatic production' of these results - that is, the fact they are generated by a complex software algorithm - that makes this process hard for many to understand. The differentially produced search results based on a combination of a user's location, time of search and search history are intentional. They are meant to provide for a highly individualized series of search results based on what Google knows, or thinks they know, about you.

In some sense, however, the idea of Autocomplete runs counter to the individualization of experience online. Indeed, the entire idea is to suggest things that you may be searching for based on what others have already searched for. So Autocomplete is simultaneously guiding users along a particular search path that has been made by others, but one that is also constructed based on the individual's own interests. The fact that this path can be continually redrawn over time, however, further complicates the process. Sudden surges of interest in a certain topic, perhaps based on recent events or news items, may cause Autocomplete to generate an entirely different set of suggested searches than was previously available. However, Autocomplete doesn't allow one to go back in time to view what searches were suggested prior to the present circumstances.

To summarize, the extent to which Google's Autocomplete can be explained by DigiPlace is probably unknown. Regardless, we think DigiPlace provides a pretty good heuristic for thinking about the social implications of new web applications like Autocomplete and how the spatio-temporal context of internet activity is very much important in producing our experience of these technologies.

Editor's Note: Taylor apologizes for the lame attempt at a play on words using the title of a seminal Thrift and French paper, "The Automatic Production of Space". He has been appropriately shamed and has been tasked with compiling the autocomplete results for the entire Oxford Unabridged Dictionary in an effort to keep him out of trouble.

February 14, 2011

Autocomplete Part II: Crowdsourcing the Geography of Autocomplete

We encourage all of you to help us continue to flesh out our Autocomplete map of the world by contributing your own search results using Autocomplete. If you follow the link you will find an open spreadsheet to record the results. But before you do, please follow the steps below to make sure that your results are not being affected by your own search patterns.

It's important that your searches not be tailored to you as an individual, but be reflective of your general location, the time at which you searched and Google's suggested search results. In order to control for these factors (and a whole host of others), please follow these directions if you want to contribute:

  • Be sure to be signed out of your Google Account while you search on Google. If you are signed in to your Google Account, your search experience may be customized based on your own personal past searches (which would no doubt be fascinating to your friends and family) but is NOT what we're after. Learn how to turn off these customizations.
  • Remove particular searches from your Web History at www.google.com/history, or by clicking the "Remove" links that appear beside personalized predictions.
  • Remove Web History from your Google Account.
  • Type the country name without a space at the end. If you're curious, compare how "Australia" and "Australia " (notice the space) give different results.
  • Copy the FIRST search term that comes up into the online spreadsheet.
  • There are 250 or so country names in total. If you can't do all of those, we've prioritized the top 150 for you to complete.
  • We're primarily interested in people outside of the US and the UK but we'll take anyone's help, no matter where they are. Also we're limiting ourselves to the English name of countries for now.
Good luck!

February 09, 2011

Autocomplete Part I: Mapping the World of Autocomplete

Building on the recent fascination with the United States of Autocomplete map, we thought we'd expand its premise to look at the entire world. In short, we'd type the name of every country into Google and record the top ranked autocomplete, i.e., Google's guess on what you are looking for. Once we started working, it quickly became apparent that the results we were getting in the U.S. sometimes differed dramatically from the results we found in the United Kingdom.

Suddenly what had been a simple mapping exercise became an exciting means of better understanding the geographic differences in search patterns. Cool! You gotta love it when stuff like that happens.

Because it's hard to fit so much data in a static map, we've created a mashup that you can download as a KMZ file and view in Google Earth. (By the way, we hope you like the iconography. We've been looking for a good excuse to use it.) As the map is a bit complicated, a few words of explanation are in order.
  • We used a list of countries maintained by the CIA World Factbook. Obviously this exercise can be replicated with any other list of place names.
  • We conducted the searches in January 2011.
  • The icons are generally centered over the capital city of a country.
  • The blue icons represent the autocomplete results obtained in the U.S. (specifically Lexington, KY) and the red icons (offset a bit for readability) represent the results from Oxford, UK.
  • The label for each icon contains the search term, the location of the search and the top ranked autocomplete result. For example, the label "India (UK): indian visa" indicates that the first autocomplete entry in Oxford was "Indian visa".
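The label format described above is simple enough to generate mechanically. Here is a minimal sketch in Python, assuming each search is stored as a (term, location, suggestion) record. The `Result` type and `make_label` helper are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical record type: one row per search term and search location.
Result = namedtuple("Result", ["term", "location", "suggestion"])


def make_label(result):
    """Format a record as 'Term (Location): top suggestion'."""
    return f"{result.term} ({result.location}): {result.suggestion}"


# Example rows taken from the India results reported in this post.
rows = [
    Result("India", "UK", "indian visa"),
    Result("India", "US", "Indianapolis Colts"),
]

labels = [make_label(row) for row in rows]
for label in labels:
    print(label)
```

Keeping the term, location and suggestion as separate fields (rather than storing only the finished label) makes it easy to regroup the same rows by country or by search location later.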



Take it out for a spin and see what you find. What we've noticed from this exercise is that the location of the searcher clearly matters. While we're not exactly sure how Google decides what other searches to include in its autocomplete (nor do we think they will tell us), the differences in our results provide some clues.

  1. Google autocomplete is incorporating geocoded data. The best example of this is that in Lexington, searches on the terms China and Nicaragua return "China Star Lexington KY" and "Nicaraguan Grill Lexington KY", two local restaurants in the city (by the way, the Nicaraguan Grill makes a great Nacatamale). This same geocoded effect does not show up in Oxford, but the Lexington results show that there is a blending of regular search and spatial search.
  2. Second, the autocomplete suggestions appear to be shaped (in part) by the makeup of other user searches in geographic proximity. The example of the restaurants above supports this idea, as do the results for India in the U.K. and U.S. Whereas "indian visa" is the first suggestion in the U.K. (reflecting the long colonial and migration connections), the first suggestion in Lexington is "Indianapolis Colts", a football team based only a few hundred miles away. Likewise, a search for Panama in Lexington results in "Panama City Beach" (located in Florida) rather than "canal" as found in the U.K.
  3. Third, and perhaps most intriguing, is the way these differences illuminate the varying ways in which countries are conceived of (at least in terms of search queries) in separate locations. For example, in Lexington, both Kazakhstan and Bulgaria generate the suggestion of "adoption" (decidedly different than the U.K. results), perhaps linking these countries in the minds of near-Lexington based searchers with international adoption. While these countries are not the largest source of adopted children (China and Russia are 1 and 2), Bulgaria and Kazakhstan (in particular) are connected to the U.S. via adoption and moreover are less likely to have other competing searches. Hence adoption is the first suggestion. In a similar vein, a search for "British Indian Ocean Territory" in Lexington suggests "flag" while in Oxford "holiday" is the top result.
  4. There is also a clear element of temporal closeness. The search for North Korea results in "bombs South Korea" which was an important news story during our searches.
  5. It is also clear that correctly interpreting a user's intent based on limited input remains a challenge. A search for Turkey results in the suggestions of "brine" and "cooking time".
  6. Finally, it seems that autocomplete suggestions are susceptible to spamming efforts and to the strong presence of commercial/business representations online. For example, "tractor parts" is the top result for a search on the term Belarus in Oxford, most likely because the domain Belarus.com is for a tractor manufacturer. Again, the low level of Belarusian references online is likely also contributing to this.

While these results are really enlightening, getting a larger sample of searches from a range of locations is important to help explore this phenomenon. And this is where you, dear reader, come in. Stay tuned for the next post when we work on crowdsourcing the geography of autocomplete.

February 03, 2011

Wikipedia Demographics

We've written a fair amount about the geographic and linguistic clusters of Wikipedia authors but were reminded today (via the New York Times "Room for Debate" forum) that there are plenty of other clusters along social and economic dimensions. Last year a survey of Wikipedia users was conducted which highlights some interesting fissures within the user group.



One of the most provocative findings (and the one highlighted by the New York Times forum) is that less than 15 percent of the regular contributors to Wikipedia are women. This really grabs one's attention but a closer look at the data report (see also here and here) makes us wonder if this figure accurately reflects the Wikipedia community. Some of the questions are:

  • What was the sampling method used? Nothing is listed in the reports.
  • What is the bias in the sample? For example, Russia and Russian speakers are the largest language and country groups represented in the survey even though the Russian section of Wikipedia is only the 8th largest linguistic group. (English, German, French, Italian, Polish, Japanese and Spanish are all larger).
  • Did women have a lower participation rate than men in the survey? There were three times as many male respondents as female respondents. Does this accurately reflect the makeup of the Wikipedia audience? Given the unexpected results for language and country, it is not clear if there might be gender bias as well.
All this said, we find the question of an imbalance in gender participation very intriguing and important. We just don't know if the survey methods used are such that we can be confident in the magnitude of the highlighted differences. Anyone who can shed some light on this would be more than welcome to comment.

February 01, 2011

Problem Points on new UK Police Maps

Today's launch of police.uk by the Home Office provides the highest resolution mapping of crime data available in the UK to date. The website supports searches at the level of unit postcode (similar to a zip code) and returns results mapped at the street level. Previous UK crime maps have typically focused on area aggregations using administrative or census geography (e.g., the London MET Police website). However, this new website appears to place points on maps at the locations where crimes have occurred... or does it? I will not argue here for the general merits of releasing crime data to the public in , or what does or should constitute a “crime”, nor those problems with how these events are recorded and georeferenced. Far better treatment of these issues is given by my PhD student Paul Richards over on his blog.

However, there appear to be some serious representational issues in this new mapping system which are not clearly documented and could be very misleading for the ill-informed. Very generally, crimes will typically happen at a specific location: for example, a house could be burgled, or a person mugged. Ideally, this location would be represented as a point on a map where the event was recorded as happening.

In a US equivalent system (e.g. http://chicago.everyblock.com/) it is entirely possible to map these very precise locations as there are different privacy laws related to the disclosure of these sensitive data. However, in the UK, law requires more aggregate representations to be used, such as areas, most typically represented as choropleths. For example, you could show the frequency of burglaries or muggings that have occurred in a specific area.

Although the documentation on launch was scant, it appears that the locations of crimes have been linked and aggregated by their nearest road segment, and that these have then been displayed as a point on the map. It is unclear whether this point is randomly chosen along the road or whether it is the centroid of the street segment. Either way, it is a very poor representation of the data. Outside of issues related to how you appropriately position a point for a very long road, if the street is going to be the aggregating unit for the data, then this should also be used for the visualization. For example, roads could have been variably colored for different rates of crime (rates not counts... this is another representation issue entirely!!). Systems are not a limitation here: using the combination of OpenStreetMap, Mapnik and OpenLayers it is entirely possible to build customized and bespoke online cartography. We do not have to rely on putting points on maps any more as our only representational option.
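To make the "aggregate by street, then show rates not counts" idea concrete, here is a minimal sketch in Python. It is not how police.uk works (that is undocumented); it assumes planar coordinates in metres, straight road segments, and a rate expressed per 100 m of road, which is one possible denominator chosen purely for illustration. The names `distance_to_segment` and `crimes_per_100m` and the sample data are hypothetical.

```python
import math
from collections import Counter


def distance_to_segment(point, start, end):
    """Shortest planar distance from `point` to the segment start-end."""
    px, py = point
    ax, ay = start
    bx, by = end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both ends coincide
        return math.hypot(px - ax, py - ay)
    # Project the point onto the line, then clamp to the segment's extent.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def crimes_per_100m(crimes, segments):
    """Assign each crime to its nearest segment and return a rate per segment.

    `crimes` is a list of (x, y) points; `segments` maps a segment id to a
    (start, end) pair of (x, y) points. Zero-length segments are skipped.
    """
    counts = Counter()
    for crime in crimes:
        nearest = min(
            segments, key=lambda sid: distance_to_segment(crime, *segments[sid])
        )
        counts[nearest] += 1
    rates = {}
    for sid, (start, end) in segments.items():
        length = math.hypot(end[0] - start[0], end[1] - start[1])
        if length > 0:
            rates[sid] = 100.0 * counts[sid] / length
    return rates


# Made-up example: a 400 m street with three crimes, a 100 m lane with one.
segments = {"High St": ((0, 0), (400, 0)), "Mill Lane": ((0, 50), (0, 150))}
crimes = [(10, 5), (200, -3), (390, 2), (2, 100)]

rates = crimes_per_100m(crimes, segments)
print(rates)  # {'High St': 0.75, 'Mill Lane': 1.0}
```

In this made-up case the longer street has three times the count but the lower rate, which is exactly the distinction a single point per street hides. The resulting rates could then be classed and used to color each road line in whatever renderer is being used.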

The problem with this website as it stands is that crimes are easily misinterpreted as happening at very specific locations. If your house happens to be located next to one of these points, it may suddenly appear to an uninformed user that there is a lot of crime in this specific area. For example, perhaps public order offences related to a pub on a street are returned as occurring at a residential location. How might this affect a house price? Would household insurance rise?

These basic representational issues are typically covered in an undergraduate syllabus with a GIS component. To me at least, this perfectly illustrates why Geography and GIS training is as important as raw technical skills when developing online mapping portals. This type of issue will not go away as these types of website become more prevalent with the growth of the open data movement, and as, more typically, they are built without consultation with Geographers.
---
This guest post is written by Alex Singleton

Celebrating the One Year Anniversary of America's Beer Belly

Today is an important day, but you probably don't know why. In the lore of Floatingsheep, February 1st is a very important day...

One year ago today, the wonders of America's Beer Belly, as discovered by the Floatingsheep Collective, were announced to the world-at-large. By far the most popular single post in our relatively short history, the Beer Belly of America was eventually featured everywhere from The New York Times and Andrew Sullivan's blog on The Atlantic to Strange Maps, FlowingData and the Consumerist.

The Beer Belly of America
Our extrapolation that the prevalence of bars as compared to grocery stores in the American upper Midwest (using directory listings from Google Maps) was indicative of some cultural characteristic may or may not have been especially daring. But one thing is clear: in addition to the official statistics from the Census Bureau, the innumerable comments generated on this blog and many others served as corroboration for our claims.

Whether it took us 'discovering' it, or just giving it a name, we now know that Wisconsin, Illinois and much of the Great Plains are the true Beer Belly of America.