Tag Archives: Wikipedia

Amazing visualizations of activity on Wikis

Warning: this webpage loads many processor-intensive animations. It might freeze your browser, and you will probably have to close the browser window (tab) after use.

The first visualization is made by Erik Zachte and available at stats.wikimedia.org.
The animation (embedded below) shows 4 aspects of the development of different Wikipedias in different languages (en, it, fr, …): X-axis: Age of a project, Y-axis: Number of articles per project, Circle size: Number of editors per project, Color: Maturity of content (blue=mostly stubs, violet=mostly larger articles)

Interactive version, all projects (requires Firefox 3+, Safari 4+ or Chrome)

Static version, Wikipedia only (8 Mb Flash)

The other 3 visualisations were made by Matt Ryal with JavaScript (Processing.js and RaphaëlJs). They show activity on the wiki and blogs of Atlassian’s Extranet.
I embed them here, but you can check Matt’s post for more details and a better view of the visualisations.

Activity — a rippling visualisation of comment activity on the wiki. Based loosely on the Apple Arabesque screensaver.

Comments — a falling bar-graph visualisation of comments by blogpost. Based very much on a Flash visualisation by Digg, but reimplemented in JS (this is about blog and not wiki).

Contributors — a tree graph visualisation linking commenters and blog post authors. (this is about blog and not wiki)

Google helps Wikipedia helping the world … maybe.

In 2008, Google opened a project competing with Wikipedia: Knol. By January 2009 the project had grown to 100,000 articles, which is hard to call a success.
Since then it seems the attitude of Google towards Wikipedia has changed a bit, more like “Ok, you (Wikipedia) can become the de facto monopolist in the user-generated creation of knowledge, we have other and more challenging competitors to defeat now, we will incorporate you later on down the way”.
Two examples of this new attitude (in my view, of course) are the Kiswahili Wikipedia Challenge and the Health Speaks Wikipedia pilot project.

The Kiswahili Wikipedia Challenge was launched in November 2009 by Google. The task was to translate English Wikipedia articles into Kiswahili or to write Wikipedia articles from scratch. Participants received prizes such as laptops, mobile phones, prepaid internet access modems and Google T-shirts. Google’s stated goal: “We hope to make the online experience richer and more relevant for 100 million African users who speak Kiswahili.”

The results might not be that great. The Wikipedia Signpost of 2010-07-26 quotes from the blog post “What happened on the Google Challenge @ the Swahili Wikipedia”:

Nearly all of them are gone now and left a lot of articles which often are not really state of the art formally and also linguistically … they don’t care because they were there for laptops and other prizes (no need to be rude, but it hurts me pretty bad).

An article in the New York Times is similarly unenthusiastic. Its last paragraphs comment on Google-generated content in the Wikipedias in languages of India.

However, the surge in content created by Google’s project to improve these sites still needs work, according some local site administrators. For example the Wikipedia in Tamil – one of the underrepresented South Asian languages – the entries covered “too many American pop stars and Hindi movies, which Tamils may not need as a priority.” There was also sloppiness in language and coding.

Despite these concerns, Tamil Wikipedia plans on working with Google to continue the additions. The Bengali Wikipedia, however, took greater umbrage and simply deleted the Google-generated content. The Bengali Wikipedians explained that the material simply did not meet their standards.

The Health Speaks Wikipedia pilot project was announced yesterday and is focused on increasing the quantity and quality of online health information in languages spoken in developing countries. They started a pilot project to support community-based, crowd-sourced translation of health information from English Wikipedias into Arabic, Hindi and Swahili Wikipedias.
They have chosen hundreds of good quality English language health articles from Wikipedia that they hope will be translated with the assistance of Google Translator Toolkit, made locally relevant, reviewed and then published to the corresponding local language Wikipedia site. They have also funded the professional translation of a small subset of these articles. And they are additionally providing a donation incentive to encourage community translators to participate. For the first 60 days, they will donate 3 cents (US) for each English word translated to the Children’s Cancer Hospital Egypt 57357, the Public Health Foundation of India and the African Medical and Research Foundation (AMREF) for the pilots in Arabic, Hindi and Swahili, respectively, up to $50,000 each. This means that community translators will help their friends and neighbors access quality health information in a local language, while also supporting a local non-profit organization working in health or health education.
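The donation rule above (3 US cents per translated English word, capped at $50,000 per language pilot) is easy to work through. Below is a minimal sketch, assuming the cap applies to the running total for one pilot; the function and constant names are mine, not Google’s.

```python
# Sketch of the donation incentive: 3 US cents per translated English word,
# capped at $50,000 per pilot. Amounts are kept in integer cents to avoid
# floating-point rounding. All names here are illustrative.
RATE_CENTS_PER_WORD = 3
CAP_CENTS = 50_000 * 100


def donation_cents(words_translated: int) -> int:
    """Donation triggered by a given number of translated words, in cents."""
    return min(words_translated * RATE_CENTS_PER_WORD, CAP_CENTS)


def words_to_reach_cap() -> int:
    """Smallest number of translated words that exhausts the cap."""
    # Ceiling division on integers.
    return -(-CAP_CENTS // RATE_CENTS_PER_WORD)


if __name__ == "__main__":
    print(donation_cents(1_000) / 100, "USD for 1,000 words")
    print(words_to_reach_cap(), "words to reach the cap")
```

In other words, each pilot stops generating donations after roughly 1.67 million translated words.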

74 Errors in the Encyclopædia Britannica that have been corrected in Wikipedia

While reading “Can History be Open Source? Wikipedia and the Future of the Past” (review soon!) by Roy Rosenzweig, founder and ex-director of the Center for History and New Media (which also created Zotero and Omeka!), I came across a mention of the list of 74 Errors in the Encyclopædia Britannica that have been corrected in Wikipedia.
Lovely! ;)

Wikipedia power structure: Anarchy, Bureaucracy, Despotism, Democracy, Meritocracy, Plutocracy, Technocracy … and everything in between

There is an interesting essay over at meta.wikimedia about Wikipedia power structure: Wikimedia’s present power structure is a mix of anarchic, despotic, democratic, republican, meritocratic, plutocratic, technocratic, and bureaucratic elements.
Wow! The depth of self-reflection in the Wikipedia community is amazing and the topic is very interesting.
Personally, I find it interesting to ask how much these policies and this ethos are created by the community (the humans) and how much by the socio-technical system (the MediaWiki software). My impression is that the software has a large influence and that the same community would behave very differently under different software. It is often said that wikis work because it is easier to fix things than to destroy them, but this is a feature of the software and of the buttons and functions (such as rollback) that it gives to users.
Many of these points have resonated with me since I read the glorious book by Lawrence Lessig, Code and Other Laws of Cyberspace, but now I’m in a position to test them … at least in Wikipedia! I guess I would be classified as a technocrat ;)

The essay is released under the Creative Commons Attribution/Share-Alike License, so, just because I can, I copy and paste the original HTML after the jump (and most links are of course broken). Enjoy!

Larry Sanger on max quality of a Wikipedia article

Larry Sanger in the paper “The Fate of Expertise after Wikipedia”:

Over the long term, the quality of a given Wikipedia article will do a random walk around the highest level of quality permitted by the most persistent and aggressive people who follow an article.

Larry Sanger is co-founder of Wikipedia but left years ago. You can read the hyper-interesting account of his involvement with Wikipedia in “The Early History of Nupedia and Wikipedia: A Memoir” (part 1, part 2).

Review of “What motivates Wikipedians?” Main motivation = Fun!

Paper by Oded Nov, published in Communications of the ACM (November 2007)

A random sample of 370 Wikipedians were emailed a request to participate in a Web-based survey.
A total of 151 valid responses were received (40.8% response rate), of which 140 (92.7%) were from males (first “gosh”!).
The respondents’ mean age was 30.9, and on average they had been contributing content to Wikipedia for 2.3 years.
The average level of contribution was 8.27 hours per week.
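
The percentages above follow directly from the raw counts. A quick check in Python, using only the figures quoted from the paper (the variable names are mine):

```python
# Recompute the survey percentages from the counts reported in the paper.
invited = 370          # Wikipedians emailed
valid_responses = 151  # valid responses received
male_responses = 140   # valid responses from males

response_rate = round(100 * valid_responses / invited, 1)
male_share = round(100 * male_responses / valid_responses, 1)

print(f"response rate: {response_rate}%")
print(f"male share of respondents: {male_share}%")
```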

The Wikipedians were asked to state how strongly they agree or disagree on a scale of 1 to 7 with items.
Items were related to 8 different types of motivations: Protective, Values, Career, Social, Understanding, Enhancement (typical measures about volunteering motivations) and Fun, Ideology (added by authors since relevant for Wikipedia).

Overall, the top motivations were found to be Fun and Ideology. Agreement with Fun was on average 6.10 (on a scale of 1 to 7!). Ideology was 5.59. The other motivations scored below 4.

Each of the six motivations positively correlated with contribution level.

The Ideology case is particularly interesting (…): while people state that ideology is high on their list of reasons to contribute, being more ideologically motivated does not translate into increased contribution.

It would make sense for organizers of user-generated content outlets to focus marketing, recruitment, and retention efforts by highlighting the fun aspects of contributing.
Credit for image: nojhan released under Creative Commons

Review of “Taking up the mop: identifying future wikipedia administrators”

Paper by Moira Burke and Robert Kraut of Carnegie Mellon University, presented at CHI ’08, Conference on Human Factors in Computing Systems.

This paper presents a model of editors who have successfully passed the peer review process to become admins. The lightweight model is based on behavioral metadata and comments, and does not require any page text. It demonstrates that the Wikipedia community has shifted in the last two years to prioritizing policymaking and organization experience over simple article-level coordination, and mere edit count does not lead to adminship.

In short, the authors compute many statistics for every single user and then regress them against the binary outcome “election successful, i.e. X became admin”. They analyze Requests for Adminship from before 2006 and from after 2006 separately.

The stats they compute are:
Strong edit history
* Article edits ‡
* Months since first edit
Varied experience
* Wikipedia (policy) edits ‡
* WikiProject edits ‡
* Diversity score
* User page edits ‡
User interaction
* Article talk edits ‡
* User talk edits ‡
* Wikipedia talk edits
* Arb/mediation/wikiquette edits
* Newcomer welcomes
* “Please” in comments
* “Thanks” in comments
Helping with chores
* “Revert” in comments ‡
* Vandal-fighting (AIV) edits
* Requests for protection
* “POV” in comments
* Admin attention/noticeboard edits
* X for deletion/review edits ‡
* Minor edits (%)
Observing consensus
* Other RfAs
* Village pump
* Votes
Edit summaries / comments
* Commented (%)
* Avg. comment length (log2 chars)
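
To make the approach concrete, here is a minimal sketch of regressing a binary “became admin” outcome on per-user statistics. It assumes a plain logistic regression fitted by gradient descent as a stand-in; the paper’s own model may differ. The two features, the toy rows and the names `fit_logistic` and `predict` are invented for illustration and are not the paper’s data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x) -> float:
    """Estimated probability that a candidate with features x succeeds."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy data: (policy edits in hundreds, article edits in thousands).
rows = [(0.1, 2.0), (0.2, 5.0), (1.5, 1.0), (2.0, 3.0),
        (0.3, 0.5), (1.8, 0.8), (0.4, 4.0), (2.5, 2.0)]
labels = [0, 0, 1, 1, 0, 1, 0, 1]  # 1 = request for adminship succeeded

weights, bias = fit_logistic(rows, labels)
print("weights:", weights, "bias:", bias)
print("p(success) with many policy edits:", predict(weights, bias, (2.0, 1.0)))
print("p(success) with few policy edits: ", predict(weights, bias, (0.2, 1.0)))
```

In this made-up sample the policy-edit feature separates the two groups, so its fitted weight comes out positive. With real data, the sign and size of each weight is what would indicate which behaviors predict a successful request.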
Conclusions
Merely performing a lot of production work is insufficient for “promotion” in Wikipedia. Candidates’ article edits were weak predictors of success. They also have to demonstrate more managerial behavior. Diverse experience and contributions to the development of policies and Wiki Projects were stronger predictors of RfA success. This is consistent with findings that Wikipedia is a bureaucracy [1] and that coordination work has increased substantially [8][13].

However, future work is needed to examine more closely what the admins are doing. Future admins also use article talk pages and comments for coordination and negotiation more often than unsuccessful nominees, and tend to escalate disputes less often.

Although this research has shown that judges pay attention to candidates’ job-relevant behavior and especially behavior that suggests the candidate will be a good manager and not just a good worker, it is silent about whether other factors identified in the organizational literature [9] (social networks, irrelevant attributes, or strategic self-presentation) also play a role.

Indeed, recent evidence that Wikipedia admins use a secret mailing list to coordinate their actions toward others suggest that sponsorship may also play a role in promotion.

Future research in Wikipedia using techniques like those in the current paper can be used to test theories in organizational behavior about criteria for promotion. An important limitation of the current model is that it does not take the quality of contribution into account. We plan to improve the model by examining measures of length, persistence, and pageviews of edits, which are already being used in more processor intensive models of existing admin behavior [7] and impact of edits [10].

Criteria for admins have changed modestly over time. Success rates were much higher (75.5%) prior to 2006, and collaboration via article talk pages helped more in the past (+15% for every 1000 article talk edits, compared to +6.3% today). The diversity score performs similarly prior to 2006 (+3.7% then, +2.8% now). However, participation in Wikipedia policy and WikiProjects was not predictive of adminship prior to 2006, suggesting the community as a whole is beginning to prioritize policymaking and organization experience over simple article-level coordination.

If you want to read the details, you can read the PDF of the paper.
Credit: Picture by inju released under Creative Commons.

Philosophies of Wikipedia: Inclusionists, Deletionists but also Gnomes, Fairies and Trouts

Wow, over time Wikipedia has developed a set of internal editing philosophies, and users can express their agreement with a certain philosophy simply by adding a specific template to their user page.
So I built the following pie chart from the Wikipedians by Wikipedia editing philosophy page. (Update: as HaeB says in a comment, “categories are not disjoint (…) a pie chart might not be the best visualization”. A bar chart might be better…)

The main ideological dichotomy is between Inclusionists and Deletionists. Inclusionists favor keeping and amending problematic articles over deleting them; Deletionists favor removing articles that are not encyclopedic. Currently there are 1123 self-declared Inclusionists and 261 Deletionists.
As is typical of Wikipedia, fun enters the stage and a new philosophy emerges: AWWDMBJAWGCAWAIFDSPBATDMTD, an acronym for “Association of Wikipedians Who Dislike Making Broad Judgments About the Worthiness of a General Category of Article, and Who Are in Favor of the Deletion of Some Particularly Bad Articles, but That Doesn’t Mean They Are Deletionists”. It is currently the 3rd most frequent philosophy, with 434 adherents, which shows how much Wikipedians like to have fun ;)
In fact, the most frequent philosophy, with 2543 adherents, is WikiGnome (makes useful incremental edits without clamouring for attention, works behind the scenes of a wiki, tying up little loose ends and making things run more smoothly, fixing things like typos, poor grammar, and broken links), and there are also 265 WikiFairies (beautifies Wikipedia by organizing messy articles, improving style, or adding color and graphics).

Myself, I think I’m a Darwikinist or maybe not … ;)

Below are the complete table and the same pie chart in 3D.
Well, there’s a lot of Philosophy(ies) in Wikipedia! ;)

| Editing philosophy (category) | Wikipedians |
|---|---|
| Wikipedian WikiGnomes | 2543 |
| Inclusionist Wikipedians | 1123 |
| Wikipedians in the AWWDMBJAWGCAWAIFDSPBATDMTD | 434 |
| Wikipedian WikiFairies | 265 |
| Deletionist Wikipedians | 261 |
| Wikipedians open to trout slapping | 245 |
| Wikipedians against notability | 228 |
| Eventualist Wikipedians | 222 |
| Mergist Wikipedians | 184 |
| Exopedianist Wikipedians | 111 |
| Darwikinist Wikipedians | 109 |
| Wikipedia users who oppose Flagged Revisions | 94 |
| Structurist Wikipedians | 86 |
| Incrementalist Wikipedians | 85 |
| Exclusionist Wikipedians | 77 |
| Wikipedian WikiElves | 74 |
| Metapedianist Wikipedians | 62 |
| Immediatist Wikipedians | 53 |
| Wikipedia users who support Flagged Revisions | 51 |
| Precisionist Wikipedians | 39 |
| Delusionist Wikipedians | 35 |
| Eguor Wikipedians | 34 |
| Categorist Wikipedians | 31 |
| Hyphen Luddites | 19 |
| Redlinking Wikipedians | 18 |
| Redirectionist Wikipedians | 11 |
| Wikidemocratism Wikipedians | 11 |
| Separatist Wikipedians | 4 |
| Wikipedians open to whale squishing | 3 |
| Transwikist Wikipedians | 2 |
| Unsourced BLP Rescuers | 2 |
| TOTAL | 6516 |
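
Since the categories overlap, a ranked bar chart is probably a fairer view than a pie. Here is a small sketch that rebuilds one as text from the counts in the table. Note that the percentages are shares of category memberships, not of distinct people; the helper name `bar_line` and the layout are my own choices.

```python
# Category counts copied from the table above.
counts = {
    "Wikipedian WikiGnomes": 2543,
    "Inclusionist Wikipedians": 1123,
    "Wikipedians in the AWWDMBJAWGCAWAIFDSPBATDMTD": 434,
    "Wikipedian WikiFairies": 265,
    "Deletionist Wikipedians": 261,
    "Wikipedians open to trout slapping": 245,
    "Wikipedians against notability": 228,
    "Eventualist Wikipedians": 222,
    "Mergist Wikipedians": 184,
    "Exopedianist Wikipedians": 111,
    "Darwikinist Wikipedians": 109,
    "Wikipedia users who oppose Flagged Revisions": 94,
    "Structurist Wikipedians": 86,
    "Incrementalist Wikipedians": 85,
    "Exclusionist Wikipedians": 77,
    "Wikipedian WikiElves": 74,
    "Metapedianist Wikipedians": 62,
    "Immediatist Wikipedians": 53,
    "Wikipedia users who support Flagged Revisions": 51,
    "Precisionist Wikipedians": 39,
    "Delusionist Wikipedians": 35,
    "Eguor Wikipedians": 34,
    "Categorist Wikipedians": 31,
    "Hyphen Luddites": 19,
    "Redlinking Wikipedians": 18,
    "Redirectionist Wikipedians": 11,
    "Wikidemocratism Wikipedians": 11,
    "Separatist Wikipedians": 4,
    "Wikipedians open to whale squishing": 3,
    "Transwikist Wikipedians": 2,
    "Unsourced BLP Rescuers": 2,
}

total_memberships = sum(counts.values())
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)


def bar_line(name: str, count: int, widest: int, width: int = 40) -> str:
    """One text bar, scaled so the largest category fills `width` characters."""
    bar = "#" * max(1, round(width * count / widest))
    share = 100 * count / total_memberships
    return f"{name[:45]:<45} {bar} {count} ({share:.1f}%)"


for name, count in ranked[:10]:
    print(bar_line(name, count, ranked[0][1]))
print("total memberships:", total_memberships)
```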

Tidbits from the Wikipedia presentation at WikiSym by Andrew Lih, “What Hath Wikipedia Wrought: Crowds Remaking the News”

The presentation (embedded below) consists of 148 slides. Below I selected a few interesting ones.

Slide 42:
• Wikitravel: only 5% of those who press “edit” actually save
• Wikipedia: 1/5 to 2/5
• WikiHow: 30% with guided editing
• Wikia: WYSIWYG editor >> 50%
Sources: Jack Herrick, WikiHow; Erik Zachte, Wikimedia Foundation

Slide 91:
An experiment by The Guardian on crowdsourcing journalism.
The Guardian obtained two million pages of explosive documents that outed your country’s biggest political scandal of the decade. They’ve had a team of professional journalists on the job for a month, slamming out a string of blockbuster stories as they find them in their huge stack of secrets.
How do you catch up? If you’re the Guardian of London, you wait for the associated public-records dump, shovel it all on your Web site next to a simple feedback interface and enlist more than 20,000 volunteers to help you find the needles in the haystack.
Your cost for the operation? One full week from a software developer, a few days’ help from others in his department, and £50 to rent temporary servers.

Differences in Wikipedia pages about “Vietnam War” (English vs Vietnamese)

Just a quick experiment: below I embedded the page about the Vietnam War from the English Wikipedia and an English translation of the page about the Vietnam War from the Vietnamese Wikipedia (click here to open just the page embedding the 2 pages).

It would be interesting to automatically check the differences in how different communities (in this case defined by language) represent the same concepts.
For example the beginning of the article from the Vietnamese wikipedia (automatically translated) says: In Vietnam, newspapers still use the name of resistance against American for just this war, [9] as well as to distinguish it from other wars that happened in Vietnam when anti- French , anti- Japanese , anti- Mongolia , against China. Some people [10] feels not name the U.S. invasions of neutrality by the war also reflects elements of a civil war; [10] that some other name for the Vietnam War reflected the views of West rather than the people living in Vietnam. [10] The name of this war is still a matter of controversy. But now scholars in and outside Vietnam have gradually accepted the name “Vietnam War” because of its international nature.
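
As a very rough idea of what such an automatic check could look like, here is a minimal sketch that compares two English texts by vocabulary overlap (Jaccard similarity of word sets) and lists the words only one of them uses. This is just one simple technique among many, the function names are mine, and the English-side sentence is a made-up placeholder and not actual article text.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased set of alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Vocabulary overlap of two texts: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def distinctive_terms(a: str, b: str) -> list[str]:
    """Words that appear in text a but not in text b."""
    return sorted(tokens(a) - tokens(b))


# Snippet taken from the machine-translated Vietnamese article quoted above.
vi_snippet = ("newspapers still use the name of resistance against "
              "American for just this war")
# Hypothetical placeholder standing in for the English article.
en_snippet = "the war is known in the West as the Vietnam War"

print("overlap:", round(jaccard(vi_snippet, en_snippet), 2))
print("only in the Vietnamese version:", distinctive_terms(vi_snippet, en_snippet))
print("only in the English version:", distinctive_terms(en_snippet, vi_snippet))
```

A real comparison would need to fetch both articles, translate one of them, and probably ignore very common words, but even this crude overlap hints at which terms one community uses and the other does not.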

from English Wikipedia

http://en.wikipedia.org
from Vietnamese Wikipedia (translated in English with Google)

http://vi.wikipedia.org