
November 17, 2009

Mapping Wikipedia

The following maps are the first of a series that will be made in order to map out the distinct geographies of Wikipedia. Many Wikipedia articles (about half a million) are either about a place or an event that occurred within a place, and most of these geographic articles handily contain a set of coordinates that can be imported into mapping software.

The map below displays the total number of Wikipedia articles tagged to each country. The country with the most articles is the United States (almost 90,000 articles), while most small island nations and city states have fewer than 100 articles. However, it is not just microstates that are characterised by extremely low levels of wiki representation. Almost all of Africa is poorly represented in Wikipedia. Remarkably, there are more Wikipedia articles written about Antarctica than about all but one of the fifty-three countries in Africa. Perhaps even more amazingly, there are more Wikipedia articles written about the fictional places of Middle Earth and Discworld than about many countries in Africa, the Americas and Asia.


When examining the data normalised by area, an entirely different pattern is evident. Central and Western Europe, Japan and Israel have the most articles per square kilometre, while large countries like Russia and Canada have low ratios of Wikipedia articles per area.


Finally, the data were also normalised by population. Here countries with small populations and large landmasses rise to the top of the rankings. Canada, Australia and Greenland all have extremely high numbers of articles per 100,000 people. Smaller nations with many noteworthy features or geotaggable events also appear high in the rankings (e.g. Pitcairn or Iceland).
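For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of the two normalisations described above (articles per square kilometre and articles per 100,000 people). It is not the code used to make these maps. The `Place` class, the function names and all of the figures in `sample` are hypothetical placeholders chosen only to show how the rankings can flip, not real country data.

```python
from dataclasses import dataclass


@dataclass
class Place:
    """Hypothetical record: one row per country or territory."""
    name: str
    articles: int      # number of geotagged articles
    area_km2: float    # land area in square kilometres
    population: int    # resident population


def articles_per_km2(p: Place) -> float:
    """Article count normalised by area."""
    return p.articles / p.area_km2


def articles_per_100k(p: Place) -> float:
    """Article count normalised by population (per 100,000 people)."""
    return p.articles * 100_000 / p.population


def rank(places, key):
    """Return place names ordered from highest to lowest value of `key`."""
    return [p.name for p in sorted(places, key=key, reverse=True)]


# Placeholder values for illustration only (NOT real statistics).
sample = [
    Place("Big populous", articles=90_000, area_km2=9_500_000, population=300_000_000),
    Place("Large sparse", articles=5_000, area_km2=9_000_000, population=30_000_000),
    Place("Small dense", articles=4_000, area_km2=20_000, population=8_000_000),
    Place("Tiny island", articles=50, area_km2=50, population=1_000),
]

by_total = rank(sample, key=lambda p: p.articles)
by_area = rank(sample, key=articles_per_km2)
by_population = rank(sample, key=articles_per_100k)

print("Total articles:     ", by_total)
print("Articles per km2:   ", by_area)
print("Articles per 100k:  ", by_population)
```

With these made-up numbers, the place with the most articles overall drops down the list once area or population is taken into account, which is the same kind of reshuffling seen between the three maps.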

Presences and absences play a fundamental role in shaping how we interpret and interact with the world. The fact that the geographies of Wikipedia content are so uneven therefore leads to worrying conclusions. As we increasingly rely on peer-produced information, large parts of the world remain a digital 'terra incognita' (much as many of those same places were represented on European maps before the 19th Century).

More maps, examining the distribution of content in specific languages and looking in more detail at specific regions, will be uploaded soon.

9 comments:

  1. Check out this map too:
    http://commons.wikimedia.org/wiki/File:Imageworld-artphp3.png

  2. One thing that's slightly confounding for the results is that there's a bigger bias in *tagging* than in actual coverage: almost all U.S. and European articles about a specific location are geotagged, while many for the rest of the world don't have coordinates yet.

  3. I agree that not all articles about a location (or an event that took place in a location) are geotagged, but I haven't yet seen any evidence to support the idea that there is a systemic bias towards the tagging of North American and European articles. Many article stubs in the rest of the world seem to have geotags, even when they contain very little text.

  4. Yeah, it'd need an actual survey to determine one way or another. My impression is mainly from working on articles about architecture, where most of the US, Canadian, and European buildings (and some Japanese) seem to be tagged with a geotag that shows their exact location, while very few of the South American ones have a geotag. Geotagging of towns/cities seems to be more uniform.

  5. Another bias that skews these maps (although only slightly, I'll admit) is that some geographical units do not readily lend themselves to geotagging. Assigning a latitude & longitude to an object works better the smaller it is: I have written articles on the 500-odd local administrative units in Ethiopia called woredas, & had these been geotagged the values for that African country would have been higher on at least one of the maps presented here.

    Geoff

  6. Thanks Geoff. That is a very good point. I notice that counties in the UK are all geotagged, but that Irish provinces (generally being at a different scale) are not. Just to confuse things, some US states are geotagged despite being much larger than UK counties or Irish provinces. This is definitely something I'll look into in more detail.

  7. You relate tags to population size to neutralise the effect of country size on the probability of a Wikipedia article. To me, the Internet population seems a better standard to compare against. There will be no large differences in the results for high-diffusion countries, but quite a lot for African or other low-diffusion countries.

    But I guess that the visibility of locations in Wikipedia is strongly influenced by the attention geography of tourists. So relating the number of geo-referenced Wikipedia articles to the number of visitors might well help to explain the visibility of a place in Wikipedia.

    And there is of course the question of language. Did you work with place tags in the English-language Wikipedia or in other language Wikipedias? If it was the English version of Wikipedia only, the blind spot in French-language Africa is not surprising.

  8. @ft: I agree about the need to compare against Internet population. I'm preparing a file containing exactly that data and hope to make the maps soon.

    These maps display the number of tags in all language versions of Wikipedia combined.

  9. Very interesting maps. I have actually proposed them for analysis in a cartographic exam! Just one thing: colors can be misleading when representing absolute numbers. For my part, I usually use proportional circles. Good job though!
