Tag Archives: Wikipedia

Papers about Wikipedia at CSCW 2010

In February, David Karger reported on a few papers about Wikipedia presented at the CSCW conference, on the Haystack Blog (MIT CSAIL Research).
The papers briefly reviewed are:
* Socialization Tactics in Wikipedia and their Effects, by Choi, Alexander, Kraut and Levine: studied how participants’ early experiences of Wikipedia (whether they were invited or began editing on their own; whether their work was ignored, admired, or critiqued; what kind of advice they received) affected users’ later participation in and contributions to Wikipedia.
* The work of sustaining order in Wikipedia: The banning of a vandal by Geiger and Ribes
* Readers are Not Free-Riders: Reading as a Form of Participation on Wikipedia, by Antin and Cheshire: the more you know about Wikipedia (measured with a survey), the more you participate.
* Egalitarians at the Gate: One-Sided Gatekeeping Practices in Participatory Social Media, by Keegan and Gergle: which breaking news stories are featured on the front page? They studied whether this decision is made in an egalitarian fashion or whether some individuals have significantly more power. Most interestingly, they found that certain ‘elite users’ who participate in the discussion to an unusually high degree do have inordinate power to “spike” stories, preventing them from appearing, but do not seem to have power to push stories they like into appearance.
* Beyond Wikipedia: Coordination and Conflict in Online Production Groups, by Kittur and Kraut. Interestingly, they studied Wikia.com, a service hosting over 6000 distinct wikis all running on the same MediaWiki platform as Wikipedia. The uniformity of implementation meant that the software could be ruled out as a source of different behaviors in different wikis.

First day at Sunbelt

The first day of Sunbelt is finished: it was very hot … meaning there were some problems with the air conditioning not working ;)
I met some cool people, in particular:
(1) Mathieu Bastian of Gephi, a great open source program for the visualization of networks,
(2) Jure Leskovec of Stanford, who gave hands-down the best talk so far, on “Predicting Positive and Negative Links in Online Social Networks”, a work on the signed social networks of Wikipedia, Slashdot and Epinions (they even cited me in the paper and used the Epinions trust network I made available some time ago on Trustlet.org!),
(3) Filippo Menczer of Indiana University, whose great Scholarometer widget I recently embedded on my blog and who is doing a lot of great work in many different areas.

Some people are using the hashtag #sunbelt on Twitter; you might enjoy the posts tagged #sunbelt as made visible by visibletweets.

Last point: I’m at Sunbelt with my colleagues in the SoNet group, Michela Ferron and Asta Zelenkauskaite. Tomorrow we will present two recent works: one about social networks in Wikipedia, the other about social capital and enterprise2.0 platform usage.

Now back to finish the slides …

My invited talk at Future Networked Technologies event

A few days ago I gave an invited talk at the Future Networked Technologies event in Graz.
It was organized by FIT-IT, the largest Austrian national public funding programme for research in information technology, for the opening of competitive calls for collaborative research projects, in 3 areas: Semantic Systems and Services, Trust in IT Systems and Visual Computing.

It was not an easy task to be inspirational for many different researchers coming from these 3 different backgrounds.
I talked about what I did during my PhD thesis (work on trust metrics and trust-aware recommender systems), about what we are doing in my research group SoNet (research on social networks in Wikipedia and on Enterprise2.0) and a bit about my research institute, FBK. I used the research lines I work(ed) on as motivating examples for what I argued research today should be: interoperable on the open web and aimed at creating services for real users.
The examples I pointed to toward the end (all of them related to the “Semantic Systems and Services” call) were: DBpedia, Microformats, RDFa, LinkedData.

The slides of my talk are embedded below:

The meeting was very interesting. There were around 40 or 50 researchers from Austria. I got a chance to talk with some of them after my talk and got interesting feedback and suggestions. I hope I gave them some food for thought.
Among the projects I discovered (funded in the past by FIT-IT) I particularly liked:
* DYONIPOS – DYnamic ONtology based Integrated Process OptimiSation (which is more impressive than the website would make you imagine, and more importantly it was used and evaluated empirically by the Austrian Ministry of Finance).
* Caleydo, an innovative Visualization Framework for Gene Expression Data in its Biological Context (below a demo of it).

What is the status of Wikinews?

Wikinews is a free-content news source wiki and a project of the Wikimedia Foundation, just like the better-known Wikipedia. The site works through collaborative journalism.
Some people claimed it failed in its attempt but I was not able to find a report about this (evolution over the years, quantity of editors involved, news produced, … and more interestingly health and diversity of the active community).
Ironically, the only relevant information I found is in a Knol, the Google alternative to Wikipedia.

Wikinews has been in existence for several years now, and yet the English-language version has only 15,000 articles. Considering that Wikipedia has already surpassed three million articles, that is a sad testimony to the effort to keep Wikinews alive. Wikinews for the most part merely regurgitates news already covered elsewhere, and no other news outlet, to my knowledge, quotes Wikinews. Wikinews never fulfilled its objective, and should be allowed to die a graceful death.

In addition to that, Wikinews has been allowed to be taken over by a clique of individuals pushing a power play to silence any opposition, either to their own point-of-view or the point-of-view of their e-friends. That is anathema to any free society project. Whenever one group uses power to punish opposition, and that opposition has no actual and effective recourse (there is no appeal process), then the project must be shut down. When a conflict occurs and it is deemed useful to dole out punishment of any sort, the entire conflict must be reviewed and all sides punished in an equitable fashion. Wikipedia learned this rule, only after creating thousands of vandals, some of which are still going strong.

Do you have any experience with Wikinews?

“The Secret Powers of Time”: how to present effectively! And on the absence of future tense in Sicilian dialect…

Besides the content (which is interesting, he has a message), the way of presenting it is fabulous!!! I want to do something like that as well in the future!

An interesting tidbit of information. In the talk, Professor Philip Zimbardo mentions that in the Sicilian dialect (Sicily is in the southern part of Italy) there is no future tense for verbs! I checked quickly and what I got was a discussion in the Sicilian Wikipedia pointing to a web site that is now down. With that warning about the source, below you can find the translation into English. I modified some parts, but overall Google Translate did a great job. Enjoy!
“THE FUTURE. In Sicilian dialect is missing the future tense of verbs and any statement about future action is constructed with present tense and the word becomes preceded by an adverb of time (eg: Duman vegnu, Tomorrow I come). Paul Messina explains: As you can understand (almost philosophically) this anomaly? Is the starting point for a link between language and culture, ways of being and thinking. This is the historical consciousness of Heideggerian being-here to produce a continuous reduction of the future to present, of ‘hic et nunc’ (‘here and now’) and this occurs having full possession of the past definitely conquered now. Sicilians are masters of time or, to put it in Tomasi di Lampedusa word, are Gods. But to be (or to be believed to be) masters of time can mean mentally dominate life and death, to be sure of its inviolability only in the present, one that appropriates the future time to prevent death, unavoidable shadow existence. What counts is the present. Being and becoming, in short, blend or merge themselves in the metaphysics anxiety”.

As droves: are Wikipedia editors leaving, or are new editors joining?

Are Wikipedia editors “leaving in droves”, as the Wall Street Journal wrote, picking up a study by Felipe Ortega?
Or are new editors “joining English Wikipedia in droves”, as Erik Zachte, Data Analyst at the Wikimedia Foundation, replies?
The blog post by Erik is very interesting. Basically you can take it as a warning that, with the amount of data available nowadays thanks to Web2.0 services, you can say almost anything; it really depends on how you define quantities. Just as an example, Felipe counted as an editor every person who made a single update over the years, while Erik (for Wikipedia’s internal statistics) counts as an editor only a person who has 5 or more edits in one month.
The second lesson you can take away is: if you want to get picked up by newspapers (such as the WSJ), condense your huge work (Felipe’s PhD thesis is a PDF of 228 pages) into a few catchy and dramatic headlines such as “Wikipedia editors are leaving in droves”.
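
To make the definitional point concrete, here is a minimal sketch of the two ways of counting editors described above. The edit log, the user names and the function names are all hypothetical, invented only for illustration; neither Felipe nor Erik necessarily computed their numbers this way.

```python
from collections import Counter

# Hypothetical edit log: one (user, "YYYY-MM") pair per edit.
edits = (
    [("alice", "2009-10")] * 7
    + [("bob", "2009-10")]
    + [("carol", "2009-09")] * 2
    + [("carol", "2009-10")] * 3
)

def editors_ever(edits):
    """Everyone who made at least one edit, at any time."""
    return {user for user, _month in edits}

def active_editors(edits, month, min_edits=5):
    """Only users with at least `min_edits` edits in the given month."""
    counts = Counter(user for user, m in edits if m == month)
    return {user for user, n in counts.items() if n >= min_edits}

print(len(editors_ever(edits)))               # the broad definition
print(len(active_editors(edits, "2009-10")))  # the 5-edits-per-month definition
```

On the same log the two definitions give different head counts, which is exactly why the trend you report depends on the definition you pick.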

Review of “Feedback Effects between Similarity and Social Influence in Online Communities”

Today I presented to the other SoNetters a wonderful paper titled “Feedback Effects between Similarity and Social Influence in Online Communities” by David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg and Siddharth Suri of Cornell University, presented at the 2008 KDD conference on Knowledge Discovery and Data Mining. My review is just below the slides I used for the presentation.

Besides the points already presented in the slides, here I add a few points relevant to our research on Wikipedia.

Social influence: People become similar to those they interact with
Interaction → similarity
Selection: People seek out similar people to interact with
Similarity → interaction

They considered registered users of the English Wikipedia who have a user discussion page (~510,000 users as of April 2, 2007). These users are responsible for 61% of edits to the roughly 3.4 million articles. The authors ignore actions by users without discussion pages, who tend to have very few social connections.

User’s activity vector v(t): number of times that he or she has edited each article up to that point in time t.
Similarity(u,v): similarity between the activity vectors of users u and v.
Time of first meeting for two users u and v = time at which one of them first makes a post on the user discussion page of the other.
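
As a rough illustration of these definitions, the sketch below builds activity vectors from an edit log and compares two of them. Note the assumptions: the notes above do not say which similarity measure is used, so cosine similarity is taken here as one plausible choice, and the log format and function names are hypothetical.

```python
import math
from collections import Counter

def activity_vector(edit_log, user, t):
    """Count how many times `user` edited each article up to time t.

    `edit_log` is assumed to be an iterable of (user, article, timestamp).
    """
    return Counter(a for u, a, ts in edit_log if u == user and ts <= t)

def cosine_similarity(v1, v2):
    """Cosine similarity between two sparse count vectors (an assumed measure)."""
    dot = sum(count * v2.get(article, 0) for article, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # a user with no edits yet is similar to nobody
    return dot / (norm1 * norm2)

# Hypothetical toy log.
log = [("u", "A", 1), ("u", "A", 2), ("u", "B", 3), ("v", "A", 1), ("v", "C", 5)]
print(cosine_similarity(activity_vector(log, "u", 5), activity_vector(log, "v", 5)))
```

Evaluating the similarity at a sequence of times t before and after the first meeting gives the kind of curve the paper studies.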

In principle, we could also try to infer social interactions based on posting to the same article’s discussion page. Moreover, we found that using simple heuristics to infer interaction based on posts to article discussion pages produced closely analogous results to what we obtain from analyzing user discussion pages.

They find that there is a sharp increase in the similarity between two editors just before they first interact (selection), with a continuing but slower increase that persists long after this first interaction (social influence).

They also create a model and estimate the unobservable parameters based on maximum-likelihood. The estimates are as follows:
* The probability of communicating rather than editing was 0.058 (i.e. out of every 100 actions, about 6 are talk posts while 94 are page edits). We can cite this figure, and we can even verify it across different Wikipedias and at different time slots.
* When considering article edits as actions, the article is chosen from one’s own interests with probability 0.35, from a neighbor’s interests with probability 0.081, from the overall interests of Wikipedia editors with probability 0.5, and by creating a totally new article with probability 0.069.
* When considering talks as actions, the user to communicate with is chosen randomly from the overall set of users with probability 0.71, and from those who have engaged in a common activity with the complementary probability 0.29.
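
The estimates above can be read as a two-stage random choice: first talk versus edit, then where the target comes from. Here is a minimal sketch of that reading. It only samples the category of each action using the reported probabilities; the category names are my own labels, and the actual model in the paper also has to pick concrete articles and users, which is not reproduced here.

```python
import random

P_TALK = 0.058  # reported probability of communicating rather than editing

# Reported probabilities for where an edited article comes from.
EDIT_SOURCES = {
    "own_interests": 0.35,
    "neighbor_interests": 0.081,
    "overall_interests": 0.5,
    "new_article": 0.069,
}

# Reported probabilities for whom a talk post is addressed to.
TALK_TARGETS = {
    "random_user": 0.71,
    "shared_activity_user": 0.29,
}

def sample_action(rng):
    """Return (kind, category) for one simulated user action."""
    if rng.random() < P_TALK:
        kind, table = "talk", TALK_TARGETS
    else:
        kind, table = "edit", EDIT_SOURCES
    category = rng.choices(list(table), weights=list(table.values()), k=1)[0]
    return kind, category

rng = random.Random(0)
actions = [sample_action(rng) for _ in range(10_000)]
talk_share = sum(kind == "talk" for kind, _ in actions) / len(actions)
print(f"share of talk actions: {talk_share:.3f}")
```

A useful sanity check is that the four edit probabilities sum to 1, as do the two talk probabilities.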

They also do some content analysis (30 instances of two users meeting for the first time. We examined the content of the initial communication and any reply, looking for references to specific articles or other artifacts in Wikipedia. We also compared the edit history of the two users).
Of the 30 messages, 26 referenced a specific article, image, or topic. In 21 cases, the users had both recently worked on the artifact that was the subject of conversation.
The gap between co-activity and communication was usually short, often less than a day, though it stretched back three months in one case.
Informally, communications tended to fall into a few broad categories: offering thanks and praise, making requests for help, or trying to understand the editing behavior of the other person.
This sample of interactions suggests that people most often come to talk to each other in Wikipedia when they become aware of the other person through recent shared activity around an artifact. Awareness then leads to communication, and often coordination.

A really wonderful paper!

Interesting stats on Wikipedia: few females, young, why do they contribute and not.

Interesting statistics based on a web survey with more than 130,000 Wikipedians responding (data from the working draft).

  • The average age is 25.8 years: Wikipedians are pretty young! One question could be “have they already developed the wisdom needed to crystallize all human knowledge?”
  • Less than 13% of contributors are women: this is a pretty big imbalance! Again, does Wikipedia reflect a gender-balanced perspective?
  • Given the relatively young age of contributors, it is interesting to note that 4.59% hold a PhD
  • only 30% of the respondents say they have a partner
  • only 14% of the respondents say they have children

Well in fact it is well acknowledged that Wikipedia suffers some systemic biases: The average Wikipedian on the English Wikipedia is (1) a man, (2) technically inclined, (3) formally educated, (4) an English speaker (native or non-native), (5) white, (6) aged 15–49, (7) from a majority-Christian country, (8) from a developed nation, (9) from the Northern Hemisphere, and (10) likely employed as a white-collar worker or enrolled as a student rather than employed as a labourer (from a previous survey).

Interesting are also the following facts:

  • On average, contributors spend 4.3 hours per week contributing to Wikipedia
  • Regarding their motivations to contribute, respondents mentioned as their top two reasons that (1) they liked the idea of sharing knowledge, and (2) that they had come across an error and wanted to fix it. This suggests that deliberately leaving small errors around could be a way to improve participation. I wonder what the results of such a test would be: the Wikipedia web server serves, to random users who have never contributed, a Wikipedia page with small errors and typos automatically inserted, and records this. Then, by automatically analyzing each user’s history of future contributions, it would be possible to check: Does the not-yet-contributor become a contributor? Does she remain a contributor? Is this percentage significantly higher than for users who received “normal” pages?
  • As reasons for not contributing, “I don’t think I have enough information to contribute” was reported by 52% of respondents and “I am happy just to read it; I don’t need to write” by 49%. “I don’t have time” follows with 31%, and “I don’t know how” with 25%.
  • For non-contributors, the most important factor that would make contribution more likely is “I knew there were specific topic areas that needed my help”. This suggests that a recommender system suggesting, in a personalized way, which Wikipedia pages might benefit from YOUR contribution can be a useful tool (see for example the paper “SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia” by Cosley, D.; Frankowski, D.; Terveen, L.; Riedl, J.). Anyway, I tend to be against this because it would give the machine (the algorithm) a strong influence on where humans should focus their attention, and I think the system is better self-regulated based on the interests of everyone.
  • 42% of the respondents who did not donate to Wikipedia say they don’t know how to do it. Then, for them and for you, http://donate.wikipedia.org
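
For the “pages with planted errors” test imagined above, the last question (is the percentage significantly higher?) could be answered with a standard two-proportion z-test. The sketch below is only that: the function name is mine and the counts in the usage example are made-up numbers, not data from any real experiment.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: new contributors / users shown pages with planted errors.
    conv_b / n_b: new contributors / users shown normal pages.
    Returns (z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so nothing to test
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Made-up example: 60 of 1000 "error page" readers start contributing,
# against 30 of 1000 readers of normal pages.
z, p = two_proportion_z_test(60, 1000, 30, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Whether a user “remains a contributor” would need a second measurement later on, compared between the two groups in the same way.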

(Photo credits: Foofy. Under Creative Commons.)

Clay Shirky on trust, Web, algorithms, authority.

An insightful essay by Clay Shirky on trust, Web, algorithms, authority. Clay Shirky is able to put in a few clear words what I’ve been trying to say for years.

Khotyn is a small town in Moldova. That is a piece of information about Eastern European geography, and one that could be right or could be wrong. You’ve probably never heard of Khotyn, so you have to decide if you’re going to take my word for it. (The “it” you’d be taking my word for is your belief that Khotyn is a town in Moldova.)
Do you trust me? You don’t have much to go on, and you’d probably fall back on social judgement — do other people vouch for my knowledge of European geography and my likelihood to tell the truth? Some of these social judgments might be informal — do other people seem to trust me? — while others might be formal — do I have certification from an institution that will vouch for my knowledge of Eastern Europe? These groups would in turn have to seem trustworthy for you to accept their judgment of me. (It’s turtles all the way down.)

An authoritative source isn’t just a source you trust; it’s a source you and other members of your reference group trust together.

authority is a social agreement, not a culturally independent fact.

Thanks to the post, I also came to know about “it’s turtles all the way down” (from Wikipedia)

A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast collection of stars called our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: “What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise.” The scientist gave a superior smile before replying, “What is the tortoise standing on?” “You’re very clever, young man, very clever”, said the old lady. “But it’s turtles all the way down!”

And you are reading this … because you trust me, I trust Wikipedia, you trust Wikipedia, you trust that if I tell you this comes from Wikipedia, it does come from Wikipedia servers, you trust that Wikipedia servers don’t change the content of their pages randomly or in an ad hoc way, you trust that the link I placed there is a real link to Wikipedia, you trust that what you see on the screen is the result of computers running as they should, you trust that your web browser works the way you think it works in showing you the content from my blog, you trust that the Internet routers along the way did not insert additional information, …