- White House Visitation Social Network: Wow, that’s a transparent government!
Wow! The White House publishes its visitor records online! With a SPARQL query on data.gov you get the data, and then with Gephi you can visualize the social network: visitors and visitees are the nodes, and a directed edge from node A to node B means that visitor A visited visitee B. Wow! (tags: sna, social, network, data.gov, obama, transparency, whitehouse, gephi, sparql, visualization, sonet)
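The graph structure described above can be sketched in a few lines of Python. This is only an illustration, assuming the visit records are already available as (visitor, visitee) pairs; the names below are made-up placeholders, not actual White House records, and the helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (visitor, visitee) pairs; placeholders, not real records.
visits = [
    ("visitor_a", "visitee_x"),
    ("visitor_b", "visitee_x"),
    ("visitor_a", "visitee_y"),
    ("visitor_a", "visitee_x"),  # a repeated visit
]


def build_visit_graph(records):
    """Return an adjacency map {visitor: Counter({visitee: n_visits})}."""
    graph = defaultdict(Counter)
    for visitor, visitee in records:
        graph[visitor][visitee] += 1  # directed edge visitor -> visitee
    return graph


def in_degree(graph):
    """Count how many distinct visitors point at each visitee."""
    counts = Counter()
    for targets in graph.values():
        for visitee in targets:
            counts[visitee] += 1
    return counts


graph = build_visit_graph(visits)
degrees = in_degree(graph)
```

Keeping a visit count on each edge makes it possible to use it later as an edge weight when drawing the network.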
Yearly Archives: 2010
Facebook social network analysis using Gephi
This great presentation tells you:
* how to use Netvizz, a Facebook application for exporting your Facebook social network or the network of a Facebook group in the form of a .gdf file
* and then how to import the .gdf file into Gephi for analyzing and visualizing your network: you can select and tune layout algorithms, change colors and sizes, etc.
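If you want to peek inside a .gdf file before importing it, here is a minimal sketch of a reader. It assumes the usual GDF layout with a `nodedef>` header line followed by node rows and an `edgedef>` header line followed by edge rows; the sample content is made up, and a real export will probably carry more columns than shown here.

```python
import csv
import io

# Made-up GDF content for illustration only.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
3,Carol
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
2,3
"""


def parse_gdf(text):
    """Split a GDF document into a list of node rows and a list of edge rows."""
    nodes, edges = [], []
    columns, target = [], None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        first = row[0]
        if first.startswith("nodedef>") or first.startswith("edgedef>"):
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Column definitions look like "name VARCHAR"; keep only the name.
            columns = [cell.strip().split(" ")[0] for cell in row]
        elif target is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges


nodes, edges = parse_gdf(SAMPLE_GDF)
```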
Amazing!
“Send your data, we will discover the hidden patterns” or Google Machine Learning Prediction API
Wow! The Google Prediction API enables access to Google’s machine learning algorithms to analyze your historic data and predict likely future outcomes. Upload your data to Google Storage for Developers, then use the Prediction API to make real-time decisions in your applications. The Prediction API implements supervised learning algorithms as a RESTful web service to let you leverage patterns in your data, providing more relevant information to your users. Run your predictions on Google’s infrastructure and scale effortlessly as your data grows in size and complexity.
Uses
* Language identification
* Customer sentiment analysis
* Product recommendations & upsell opportunities
* Message routing decisions
* Diagnostics
* Document and email classification
* Suspicious activity identification
* Churn analysis
* And many more…
Science of Generosity
Some time ago we wanted to apply for the Science of Generosity funds with people from Akoha, but in the end it didn’t work out.
Now I see that the projects winning the 1.4 million dollars have been announced. The Science of Generosity initiative also collected many datasets dealing with generosity. Interesting!
Two of the projects will examine how generosity originates and spreads within social settings. James Andreoni, a behavioral economist at the University of California San Diego, was awarded $250,000 to study the relationship between charitable donors and recipients, with a focus on how empathy affects charitable donation. His project challenges economic approaches that tend to see generosity as a function of individual self-interest; he hypothesizes, instead, that generosity emerges from within social situations and must be understood as inherently social. (…)
Harvard University sociologist and physician Nicholas Christakis was awarded $396,447 to explore how generosity spreads beyond the donor/recipient relationship and creates what he calls “cascades” of generosity within social networks.
Percentage of pie charts which resemble Pac Man (as a Google pie chart)
The URL for generating the following pie chart on the fly via Google charts, containing all the needed parameters (that’s why it is so long), is
http://chart.apis.google.com/chart?chxt=x,y&cht=p&chco=FAFAFA,FFFF00,FAFAFA&chs=600x300&chtt=Percentage%20of%20Google%20Chart%20Which%20Resembles%20Pac-man%20Chart%20title&chd=t:10,80,10&chl=Does%20not%20resemble%20Pac-man|Resembles%20Pac-man which produces the chart.
From mattcutts.
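Since the URL is long only because it carries every parameter, it is easier to read when assembled from a dictionary. The sketch below rebuilds the same kind of query string with Python's standard library; the function name `build_chart_url` is my own, and the sketch does not check whether the chart endpoint still responds.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://chart.apis.google.com/chart"

params = {
    "chxt": "x,y",                   # axes
    "cht": "p",                      # chart type: pie
    "chco": "FAFAFA,FFFF00,FAFAFA",  # slice colors
    "chs": "600x300",                # size in pixels, width x height
    "chtt": "Percentage of Google Chart Which Resembles Pac-man Chart title",
    "chd": "t:10,80,10",             # data values
    "chl": "Does not resemble Pac-man|Resembles Pac-man",  # slice labels
}


def build_chart_url(base, query):
    """Join base and query, keeping ',', ':' and '|' readable in the result."""
    return base + "?" + urlencode(query, safe=",:|", quote_via=quote)


url = build_chart_url(BASE_URL, params)
```

Passing `quote_via=quote` encodes spaces as `%20` rather than `+`, which matches the style of the URL in the post.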
Predicting the Future with Social Media (SoNet slides)
During our weekly SoNet internal research meeting, my colleague Napo presented the paper “Predicting the Future With Social Media” by Sitaram Asur and Bernardo A. Huberman, archived on arXiv in March 2010. Using Twitter posts, they are able to forecast box-office revenues for movies, outperforming market-based predictors. They also do sentiment analysis on tweets by asking Mechanical Turk workers to tag a few tweets as positive, neutral or negative, and then they train LingPipe to predict the sentiment of all the other millions of tweets. Read it! Very interesting paper!
Almost 1,000,000,000 edits for Wikipedias!
At the time of this post, 999,607,000 edits! See http://toolserver.org/~emijrp/wikimediacounter/ (which is also iframed above)!
In droves: are Wikipedia editors leaving, or are new editors joining?
Is it true that “Wikipedia editors are leaving in droves”, as the Wall Street Journal wrote, picking up a study by Felipe Ortega?
Or is it rather that “New editors are joining English Wikipedia in droves”, as Erik Zachte, Data Analyst at the Wikimedia Foundation, replies?
The blog post by Erik is very interesting. Basically you can take it as a warning that, with the amount of data available nowadays thanks to Web 2.0 services, you can say almost anything; it really depends on how you define quantities. Just as an example, Felipe counted as an editor every person who made even one update over the years, while Erik (for Wikipedia’s internal statistics) only counts as an editor a person who has 5 or more edits in one month.
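To see how much the definition matters, here is a toy sketch that counts "editors" both ways on the same edit log. The log is entirely made up and the function names are mine; the point is only that the two definitions described above select different sets of people.

```python
from collections import Counter

# Made-up edit log, one (user, "YYYY-MM") entry per edit; not real Wikipedia data.
edits = (
    [("ann", "2009-01")] * 7
    + [("bob", "2009-01")] * 2
    + [("bob", "2009-02")] * 5
    + [("cat", "2009-02")] * 1
)


def editors_any_edit(log):
    """Broad definition: anyone with at least one edit in the whole log."""
    return {user for user, _ in log}


def editors_active_in_month(log, month, threshold=5):
    """Strict definition: users with at least `threshold` edits in `month`."""
    per_user = Counter(user for user, m in log if m == month)
    return {user for user, n in per_user.items() if n >= threshold}


broad = editors_any_edit(edits)
strict_jan = editors_active_in_month(edits, "2009-01")
strict_feb = editors_active_in_month(edits, "2009-02")
```

On this toy log the broad definition finds three editors, while the strict one finds a single editor in each month, so the same data can support very different headlines.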
The second lesson you can take away is: if you want to get picked up by newspapers (such as the WSJ), condense your huge work (the PhD thesis of Felipe is a PDF of 228 pages) into a few catchy and dramatic headlines such as “Wikipedia editors are leaving in droves”.
Library of Congress gives every tweet bibliographic status!
Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress, the largest library in the world.
I’m still totally puzzled by how such a simple service (basically you can post 140 chars of text and nothing more) got so widely used! A typical Matthew effect (the rich get richer)!
See more in the official Library of Congress blog post “How Tweet It Is!: Library Acquires Entire Twitter Archive”.
Review of “Feedback Effects between Similarity and Social Influence in Online Communities”
Today I presented to the other SoNetters a wonderful paper titled “Feedback Effects between Similarity and Social Influence in Online Communities” by David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg and Siddharth Suri of Cornell University, presented at the 2008 KDD conference on Knowledge Discovery and Data Mining. My review is just below the slides I used for the presentation.
Besides the points already presented in the slides, here I add a few points relevant for our research on Wikipedia.
Social influence: People become similar to those they interact with
Interaction → similarity
Selection: People seek out similar people to interact with
Similarity → interaction
They considered registered users of the English Wikipedia who have a user discussion page (~510,000 users as of April 2, 2007). These users are responsible for 61% of edits to the roughly 3.4 million articles. The authors ignore actions by users without discussion pages, who tend to have very few social connections.
User’s activity vector v(t): number of times that he or she has edited each article up to that point in time t.
Similarity(u,v): similarity between activity vectors of user u and v.
Time of first meeting for two users u and v = time at which one of them first makes a post on the user discussion page of the other.
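The definitions above do not say which similarity measure is applied to the activity vectors. As a minimal sketch, assuming cosine similarity (my assumption here, not something stated in this summary), the computation on sparse vectors could look like this; the activity vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse activity vectors.

    Each vector maps an article title to the number of edits by that user.
    """
    dot = sum(count * v.get(article, 0) for article, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no edits is treated as similar to nobody
    return dot / (norm_u * norm_v)


# Made-up activity vectors for three hypothetical users.
user_u = {"Gephi": 3, "SPARQL": 4}
user_v = {"Gephi": 3, "SPARQL": 4}
user_w = {"Twitter": 2}

same = cosine_similarity(user_u, user_v)
disjoint = cosine_similarity(user_u, user_w)
```

Two users who edited the same articles in the same proportions get a similarity of 1, and two users with no article in common get 0.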
In principle, we could also try to infer social interactions based on posting to the same article’s discussion page. Moreover, we found that using simple heuristics to infer interaction based on posts to article discussion pages produced closely analogous results to what we obtain from analyzing user discussion pages.
They find that there is a sharp increase in the similarity between two editors just before they first interact (selection), with a continuing but slower increase that persists long after this first interaction (social influence).
They also create a model and estimate the unobservable parameters based on maximum-likelihood. The estimates are as follows:
* The probability of communicating versus editing was 0.058 (i.e. out of every 100 actions, about 6 are talk posts while 94 are page edits). We can cite it, and we can even verify it across different Wikipedias and at different time slots.
* When an action is an article edit, the article is chosen from one’s own interests with probability 0.35, from a neighbor’s interests with probability 0.081, from the overall interests of Wikipedia editors with probability 0.5, and by creating a totally new article with probability 0.069.
* When an action is a talk post, the user to communicate with is chosen randomly from the overall set of users with probability 0.71, and from those who have engaged in a common activity with the complementary probability 0.29.
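The estimates listed above describe a two-level mixture: first talk versus edit, then where the target comes from. The sketch below only samples from that mixture with the quoted numbers; it is not the authors' full model, and all names in it are hypothetical.

```python
import random

P_TALK = 0.058  # probability that an action is a communication

# Where the article for an edit comes from, using the estimates quoted above.
EDIT_SOURCES = {
    "own_interests": 0.35,
    "neighbor_interests": 0.081,
    "overall_interests": 0.5,
    "new_article": 0.069,
}

# How the partner for a talk action is chosen.
TALK_TARGETS = {
    "random_user": 0.71,
    "shared_activity_user": 0.29,
}


def sample_action(rng):
    """Draw one (kind, source) pair from the two-level mixture."""
    if rng.random() < P_TALK:
        table, kind = TALK_TARGETS, "talk"
    else:
        table, kind = EDIT_SOURCES, "edit"
    choice = rng.choices(list(table), weights=list(table.values()), k=1)[0]
    return kind, choice


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [sample_action(rng) for _ in range(10_000)]
talk_share = sum(kind == "talk" for kind, _ in sample) / len(sample)
```

With enough samples, `talk_share` should land close to 0.058, which is a quick sanity check that the mixture is wired as described.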
They also do some content analysis (30 instances of two users meeting for the first time. We examined the content of the initial communication and any reply, looking for references to specific articles or other artifacts in Wikipedia. We also compared the edit history of the two users).
Of the 30 messages, 26 referenced a specific article, image, or topic. In 21 cases, the users had both recently worked on the artifact that was the subject of conversation.
The gap between co-activity and communication was usually short, often less than a day, though it stretched back three months in one case.
Informally, communications tended to fall into a few broad categories: offering thanks and praise, making requests for help, or trying to understand the editing behavior of the other person.
This sample of interactions suggests that people most often come to talk to each other in Wikipedia when they become aware of the other person through recent shared activity around an artifact. Awareness then leads to communication, and often coordination.
A really wonderful paper!