An email from Zbigniew pointed me to a Tribe.net discussion, which pointed me to Personal Web Neighborhood: The Small Web project (a very interesting read indeed), which pointed me to The Ultra Gleeper: A Recommendation Engine for Web Pages (pure gold!).
The Ultra Gleeper paper is the paper I dream of writing but will never be able to. Since the paper is released under a Creative Commons Attribution-ShareAlike licence (and so is my blog), I’m going to copy portions of it, giving credit of course to Leonard Richardson. Oops, I almost forgot: the Ultra Gleeper is free software, so you have the freedom to study and improve it. I will try to play with it really soon!
UltraGleeper homepage
There’s a lot of interesting stuff on the web. Since the beginning, the hard part has been finding it. In the old days the only tools available were random browsing and directory sites like Yahoo!. These days it’s more efficient to subscribe to weblogs that you’ve found are reliable sources of good links. But the web keeps growing; now it’s hard to find the new interesting weblogs, much less all the other interesting pages.
The Ultra Gleeper takes your weblog subscription list and starts from there. It crawls the web for things you haven’t seen and shows you the pages it thinks you’ll like. Your feedback improves its ability to give accurate ratings. With the Ultra Gleeper you can find new pages and new weblogs to read. And if you have your own weblog or use del.icio.us, the links you post there will be automatically turned into ratings.
The Ultra Gleeper solves or avoids the problems that give recommendation engines a bad reputation. It won’t give you a lot of links you’ve already seen, because it knows about your subscriptions and what they’ve posted. It won’t just recap the most popular links of the day, because its indie rock algorithm distrusts excessive popularity. It won’t ask you for a lot of calibration ratings up front: you already gave those ratings by telling it what you subscribe to and pointing it to your weblog and/or bookmark page.
The Ultra Gleeper runs on your server and shows up in your web browser or RSS reader. It’s free software, so you’re free to use and modify it.
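[The quoted text does not say how the "indie rock algorithm" works, so the following is only my own minimal sketch of one way a score could distrust excessive popularity. The function name `indie_score`, its two parameters and the logarithmic damping are all assumptions made for illustration, not the Ultra Gleeper's actual method.]

```python
import math


def indie_score(subscribed_links, total_links):
    """Hypothetical score, NOT the Ultra Gleeper's real formula.

    subscribed_links: how many of the weblogs you read link to the page.
    total_links: how many pages anywhere link to it (overall popularity).
    The score grows with endorsements from your own subscriptions and is
    damped (logarithmically) as the page becomes popular everywhere.
    """
    if subscribed_links <= 0:
        return 0.0
    return subscribed_links / math.log2(2 + total_links)


# Same number of endorsements from feeds you read, very different popularity.
niche = indie_score(subscribed_links=3, total_links=6)
popular = indie_score(subscribed_links=3, total_links=1022)
print(niche > popular)
```

[With equal support from your subscriptions, the lesser-known page ranks above the one everybody already links to, which is the behaviour the homepage describes.]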
The Ultra Gleeper: A Recommendation Engine for Web Pages
Recommendation engines enjoyed a vogue in the mid-90s. They would solve the problem of information overload by matching user preferences against a large universe of data. The ultimate realization of this strategy would be a recommendation engine capable of mining that Northwest territory of data, the World Wide Web.
Recommendation engines were built and ran into trouble. Seemingly insurmountable problems emerged and the flame of hype moved elsewhere. Recommendation engines for web pages were not built or successfully launched. To even attempt one would require development of a web crawler and the associated resources. Today, recommendation engines have something of the reputation of a well-meaning relative who gives you gifts you often already have or don’t quite want. Most useful recommendations come from knowledgeable friends or trusted web sites.
But over the years, as people built these web sites, they came up with models and tools for solving the basic problem of finding and tracking useful web sites. The wide adoption of these strategies has not only brought down the cost of building a web page recommendation engine, it’s removed some of the insurmountable problems that still plague recommendation engines for other domains. It’s now possible for someone with a dedicated server to run a recommendation system for themselves and their friends. I’ve done it and I’ll show you how to do it.
The shoulders of giants
A web page recommendation engine is now possible because a lot of the work is done elsewhere on the web and exposed to the public, because web surfers track new types of information, and because new ideas have taken root. Here are the tools and concepts used by my recommendation engine that didn’t exist in the mid-90s, or weren’t the juggernauts they are today:
* Millions more write for the web today
* Hundreds of thousands more have weblogs
* RSS, RSS aggregators, and OPML: structure for reading weblogs
* Google PageRank
* Publicly accessible search engine APIs, in particular, the Technorati and Google web APIs
* Social bookmark sites like del.icio.us
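[To make the OPML item in the list above concrete: an OPML subscription list is an XML file in which each feed is an `outline` element carrying an `xmlUrl` attribute. This is my own minimal sketch of reading such a list as a starting point for a crawl. The sample document and the helper name `feed_urls` are made up for illustration and are not taken from the Ultra Gleeper code.]

```python
import xml.etree.ElementTree as ET

# A tiny, invented OPML subscription list (feeds may be nested in folders).
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Weblogs">
      <outline text="Example Blog" type="rss"
               xmlUrl="http://example.com/index.rss"
               htmlUrl="http://example.com/"/>
    </outline>
    <outline text="Another Blog" type="rss"
             xmlUrl="http://example.org/feed.xml"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the feed URLs of every outline that has an xmlUrl attribute."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, so folder structure does not matter.
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


for url in feed_urls(SAMPLE_OPML):
    print(url)
```

[Each URL found this way is a weblog you already read, so the links those feeds have posted can serve both as "already seen" pages and as seeds for finding new ones.]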
[OK, I'll stop here: I wanted to copy only the interesting parts and … I would probably have ended up copying every single pure-gold line. ;-)]
As usual, your hints are precious :)
I think this will find its way into my thesis (which is about social software and the semantic web).
I will let you know more ASAP.