Yearly Archives: 2011

Paper “Wikipedia research and tools: Review and comments”

The draft paper “Wikipedia research and tools: Review and comments” by Finn Arup Nielsen (dated March 17, 2011) is a very useful 56-page resource highlighting key areas of research on Wikipedia (with citations to relevant work already published). The key areas identified are listed below. It cites 236 papers (with annotations!). Even though this is a draft paper, it is a super valuable resource! Check the pdf file.

Identified key research areas: Quality, Factual errors, Coverage and bias, Actuality, Sources, Accessibility, Size across languages, Network analysis, matrix factorizations and other operations, Genre, Article feedback, Vandalism reversion, Biased editing, Use of Wikipedia in court, User contributions, User characteristics, Organization, Popularity, Why do people edit?, Why do people leave?, Why does it work?, Serving content, Using categories, Thesaurus construction, Translation, Trend spotting and prediction, Searching with Wikipedia, Databasing the structured content, Geography, Extending Wikipedia, Quality assessment, certification and rating, Automatic creation of content, Tables and databases, Semantic wikis, Form-based editing, Markup, Extended Authoring, Geographical extension, Extending browsing, Graphic extensions, Video extensions, Real-time editing, Distributed and disconnected Wikipedia, Wiki and programming, Using Wikipedia and other wikis in research and education, Attitude towards Wikipedia, Use of Wikipedia, Citing Wikipedia, Special wikis, Censorship, Carl Hewitt vs. Wikipedia, Wikipedia and wikis as a teach- project, Wikiversity, serves the purpose of building a resource for teaching and learning tool, Using wikis for course communication, Textbooks, Future.

Abstract: I here give an overview of Wikipedia and wiki research and tools. Well over 1,000 reports have been published in the field and there exist dedicated scientific meetings for Wikipedia research. It is not possible to give a complete review of all material published. This overview serves to describe some key areas of research.

Credits: Image by XKCD released under a Creative Commons Attribution-NonCommercial 2.5 License.

The Joy of Stats by Hans Rosling: 4 minutes to show evolution of 200 countries over 200 years.

“I kid you not, statistics is now the sexiest subject on the planet” – Hans Rosling
In this spectacular section of ‘The Joy of Stats’ by BBC, using augmented reality animation, Rosling tells the story of the world in 200 countries over 200 years using 120,000 numbers – in just four minutes. Plotting life expectancy against income for every country since 1810, Hans shows how the world we live in is radically different from the world most of us imagine.
More incredibly amazing videos by Rosling at GapMinder.

Video of evolution in time of the Wikipedia page about London bombings

History unfolding from phauly on Vimeo.

7 July 2005
08.50 London is struck by three bombs.
09.18 (just 28 minutes later) on Wikipedia, the user Morwen creates the page “7 July 2005 London bombings”.
10.38 Already, 76 different Wikipedians have made 250 edits to this page, trying to make sense of reality in real time …
By the end of the day the Wikipedia page “7 July 2005 London bombings” has been edited 2581 times!

The video “History unfolding” shows the evolution in time of the Wikipedia page “7 July 2005 London bombings”. Technically, I extracted all the revisions of the Wikipedia page from the API and took a screenshot of each of them using Firefox with the Page Saver extension running on an X virtual framebuffer (I tried khtml2png but I was unable to install it). Then I put all the screenshots together with mencoder and added the audio.
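For readers who want to try the first step, here is a minimal sketch of how the revisions of a page might be listed through the MediaWiki API. It is not the script used for the video: the endpoint `API_URL`, the helper names `revision_query`, `iter_revisions` and `permalink`, and the User-Agent string are all assumptions made for illustration. Each revision's permalink could then be loaded in a browser for the screenshot step.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia endpoint; other wikis use their own host.
API_URL = "https://en.wikipedia.org/w/api.php"
INDEX_URL = "https://en.wikipedia.org/w/index.php"


def revision_query(title, continue_params=None):
    """Build the URL that asks the API for one batch of revisions of a page."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",
        "rvlimit": "max",    # as many revisions per request as the API allows
        "rvdir": "newer",    # oldest revision first
    }
    if continue_params:
        # The API returns a "continue" object to pass back for the next batch.
        params.update(continue_params)
    return API_URL + "?" + urlencode(params)


def iter_revisions(title):
    """Yield every revision of a page, following API continuation (needs network)."""
    cont = None
    while True:
        request = Request(
            revision_query(title, cont),
            headers={"User-Agent": "history-unfolding-sketch/0.1"},  # hypothetical
        )
        with urlopen(request) as response:
            data = json.load(response)
        for page in data["query"]["pages"].values():
            yield from page.get("revisions", [])
        cont = data.get("continue")
        if not cont:
            break


def permalink(revid):
    """URL showing the page exactly as it looked at a given revision."""
    return INDEX_URL + "?" + urlencode({"oldid": revid})
```

Something like `for rev in iter_revisions("7 July 2005 London bombings"): print(rev["timestamp"], permalink(rev["revid"]))` would then print one URL per revision, in chronological order.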
Wikipedia pages are released under the Creative Commons Attribution-ShareAlike License. The soundtrack I added is Unfinished History by Johaness Gilther, released on Jamendo as Creative Commons Attribution-NoDerivs. So my video is released under Creative Commons Attribution-ShareAlike License. Enjoy!

The video is just one example of history unfolding before your eyes as it develops, of how people create their collective memories in real time.
We can now investigate how we, as a society, create our world, our perceptions of the past.
Now we can research past, present and future! And control it together!

“Who controls the past, controls the future; who controls the present, controls the past.”
Nineteen Eighty-Four – George Orwell

Wikipedia mentioned in books in 1975

UPDATE: Dami, in a comment to this post, says “if a word appears in a newer edition of an older work (e.g. in the introduction section of cheap reprints of public domain books) Google will count it as an appearance at the time the original work was published.” I checked and this is true, thanks Dami!

I was playing with Google Books Ngram Viewer, which allows you to check how frequently certain phrases occurred in books published from 1950 to 2008.
Curiously, the following graph reports that some books (only 0.0000011% but greater than zero anyway!) contained the word “wikipedia” (and “wiki”) already in 1950 and in 1975. Maybe there is a small bug even in the mighty Google services?

The following graph instead shows the increase (as expected) in mentions of “wikipedia” and “wiki” in books since 2003.

Percentage of men and women on different social networking sites (Facebook, Twitter, Linkedin, …)

Lots of debate arose around the claim that almost 87% of Wikipedia editors are male. This is not necessarily true since the survey on which this “fact” is based has some biases (for example, respondents were self-selected).
However, a query run on the Wikipedia database showed that more than 83% self-identified as male.
While these numbers are not 100% representative of reality, it is probably true that most editors are male. This is also acknowledged on a Wikipedia page about the systemic bias of Wikipedia (yes, I know this very page has been written by people whose bias we are trying to interpret but, going to the extremes, it’s turtles all the way down ;)

So the question could be: what is the male/female ratio on other social networking sites?

Just for comparison (and a bit for fun too), I compiled the following table based on the Social Network Analysis Report by Ignite Social Media. The table is sorted so that the first rows are sites with relatively more females than males. I’m not familiar with all the sites but it seems that the sites more populated by women are the very social and playful ones (such as Habbo, Bebo, Myspace, Xanga, Facebook). On the other side of the spectrum there are sites populated mostly by males: sites showing what’s interesting right now thanks to social bookmarking, such as Reddit, Digg, Identi.ca, and “professional” network sites such as Linkedin and Plaxo.
This table is not “scientific” in any way either (for instance, percentages in the report are gathered from Google Ad Planner and Google Insights for Search).
Consider the following table just as more food for thought. Does it confirm your intuitions? Or should I say prejudices? ;)

  Social network site Percentage of females
Habbo 66%
Bebo 62%
Myspace 62%
Xanga 62%
Facebook 55%
Ning 55%
Hi5 52%
Meetup 52%
Tribe 52%
Twitter 52%
Yelp 52%
Flixster 50%
Foursquare 50%
Friendster 50%
Flickr 48%
Last.fm 48%
Livejournal 48%
Metafilter 48%
Multiply 48%
Plaxo 45%
Stumbleupon 45%
Badoo 43%
Mixx 43%
Linkedin 40%
Netlog 40%
Newsvine 40%
Plurk 40%
Identi.ca 34%
Digg 32%
Indianpad 24%
Reddit 24%

Credits: Icons by socialshift, elegantthemes and WpZoom.

Percentage of men and women on different Wikipedias

A few days ago there was an interesting article in the NYTimes about the small percentage of women on Wikipedia.
Today on the gendergap mailing list at Wikipedia there is a very interesting ongoing discussion. Some preliminary statistics from the discussion are:

Wikipedia in specific language | URL | Number of registered users | Percentage of users who specified gender in preferences | How many men | How many women | Percentage of women
--- | --- | --- | --- | --- | --- | ---
English | http://en.wikipedia.org | 13959842 | 2.01% | 233312 | 46973 | 16.76%
German | http://de.wikipedia.org | 1167708 | 3.47% | 35726 | 4800 | 11.84%
French | http://fr.wikipedia.org | 998668 | 2.16% | 18556 | 3054 | 14.13%
Serbian | http://sr.wikipedia.org | 78180 | 2.66% | 1666 | 414 | 19.90%
Russian | http://ru.wikipedia.org | 620393 | 16.80% | 80491 | 23750 | 22.78%
Polish | http://pl.wikipedia.org | 414511 | 3.64% | 12106 | 2999 | 19.85%
Dutch | http://nl.wikipedia.org | 368815 | 2.92% | 8977 | 1781 | 16.56%
Commons | http://commons.wikimedia.org | 1464442 | 2.26% | 27980 | 5070 | 15.34%
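The two percentage columns can be recomputed from the raw counts, which is a handy sanity check. The sketch below assumes that the first number in each row is the total number of registered users (the arithmetic only works out under that reading) and that the percentage of women is taken over the users who specified a gender. The function name `gender_stats` is made up for this example.

```python
# (registered users, men, women) per wiki, copied from the table above.
COUNTS = {
    "English": (13959842, 233312, 46973),
    "German": (1167708, 35726, 4800),
    "French": (998668, 18556, 3054),
    "Serbian": (78180, 1666, 414),
    "Russian": (620393, 80491, 23750),
    "Polish": (414511, 12106, 2999),
    "Dutch": (368815, 8977, 1781),
    "Commons": (1464442, 27980, 5070),
}


def gender_stats(registered, men, women):
    """Return (% of users who specified a gender, % of women among them)."""
    specified = men + women
    pct_specified = round(100 * specified / registered, 2)
    pct_women = round(100 * women / specified, 2)
    return pct_specified, pct_women


for wiki, (registered, men, women) in COUNTS.items():
    pct_specified, pct_women = gender_stats(registered, men, women)
    print(f"{wiki}: {pct_specified:.2f}% specified gender, {pct_women:.2f}% women")
```

Note that the percentage of women is relative to the small group that filled in the gender preference, not to all registered users.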

It is interesting to note that on the Russian Wikipedia users tend to state their gender much more often (16.80%!). Do you have any idea whether (1) this is a cultural issue specific to Russians, (2) it depends on the practices of the Russian Wikipedia or (3) it depends on the user interface? For example, it might be that when you register you are redirected to an HTML page in which you can also specify your gender.
Also interesting is the fact that this Wikipedia has the highest percentage of women (22.78%). Probably the reason is that where stating one’s gender is more common, it is more normal for women to state it as well. Where gender is rarely stated, it is generally unwise for women to explicitly say “Hey, I’m female!”, since that tends to attract (additional) unwanted messages. Or, put in other terms, OMG Girlz Don’t Exist on teh Intarweb!!!!1.


Img by nojhan, under Creative Commons

Professor: What is an encyclopedia? Student: Is it something like Wikipedia?

I was viewing the presentation by Steven Walling titled “Why Wikipedians are the Weirdest People on the Internet” (embedded below) and the second slide was a tweet by alisonclement that says:

Yesterday I asked one of my students if she knew what an encyclopedia is,
and she said, Is it something like Wikipedia?

Amazing! Changing times indeed. I remember when I was a kid and one of the most valuable things in our house was a 20-something-volume encyclopedia, admiringly and respectfully placed at the center of our best cupboard … ;)

Larry Wall talk in Povo

On February 17, 2011 at 11:00, Larry Wall, creator of the Perl programming language, will give a talk in Povo (where I work), organized by CoSBI (The Microsoft Research – University of Trento Centre for Computational and Systems Biology).
The title is “That Goes Without Saying (or Does It)” and the abstract is:
Linguist Roman Jakobson famously said, ‘Languages differ essentially in what they must convey and not in what they may convey.’ Contrary to the Whorf-Sapir hypothesis, your language of choice does not generally prevent you from thinking certain thoughts, but your language can certainly make it easier or harder to express those thoughts. Lately I’ve enjoyed playing with various Perl examples on rosettacode.org, and have noticed this principle in action. In this talk we’ll look at some of the ways a language can make your life more miserable than it needs to be.
The seminar is free of charge but for logistical reasons you need to confirm your attendance on the CoSBI site.