Rethinking Search and Retrieval for Blogs
The FuzzyBlog!
Marketing 101. Consulting 101. PHP Consulting. Random geeky stuff. I Blog Therefore I Am.
A friend asked me about helping out on search and retrieval for blogs. He's implemented a system, BlogStreet, which displays "Blog Neighborhoods", collections of blogs that are related by a dynamic analysis of their blogrolls. This is a very cool concept. He's now implemented search and retrieval across their database of 13,000 blog URLs and is finding that it isn't working so well. He initially asked me for a technology recommendation and I pointed him towards mnoGoSearch. Today he came back to me for help and, as a consultant, I immediately asked about two things:
- Could I officially bid on the work?
- What's the budget?
Goal: To Build a Search and Retrieval System Across Weblogs
And what goes along with any goal is a constraint. Here we have a very real one:
Constraint: Don't Break the Bank
If you think about it, indexing blogs is basically the same as indexing web sites, with some important differences:
- Permalink awareness is really needed. Finding an entry on a blog home page is pretty much useless unless you search the same day it's indexed or the blog doesn't change very often.
- Information that is repeated in the website template really shouldn't be indexed if at all possible, since it will dramatically screw up search results for any word or phrase that appears in the template.
- Currency is extremely important. Weblogs change much more frequently than webpages so they need to be indexed more regularly. Much more regularly.
- There isn't real money in it (today). This may well change, but the current economics of blogging being an amateur thing mean that major capital infrastructure investments such as a big data center simply aren't going to happen.
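To make the first two points concrete, here is a minimal sketch in Python (picked only for illustration) of one way an indexer might key entries by permalink and drop template text before indexing. It assumes the entries have already been fetched and reduced to plain text; the function name `strip_template_lines` and the 80% cutoff are hypothetical, and this is not how BlogStreet or mnoGoSearch actually work.

```python
from collections import Counter


def strip_template_lines(entries, max_share=0.8):
    """Drop lines that repeat across most of one blog's entries.

    `entries` maps a permalink URL to that entry's plain text, so every
    search hit points at the entry itself, not the blog home page.
    A line seen on at least `max_share` of the entries is treated as
    template text (blogroll, navigation, footer) and is not indexed.
    The 0.8 default is an arbitrary assumption.
    """
    if len(entries) < 2:
        # With a single entry there is nothing to compare against.
        return dict(entries)

    # Count each distinct line once per entry.
    seen_on = Counter()
    for text in entries.values():
        seen_on.update({ln.strip() for ln in text.splitlines() if ln.strip()})

    cutoff = max_share * len(entries)
    cleaned = {}
    for permalink, text in entries.items():
        kept = [ln.strip() for ln in text.splitlines()
                if ln.strip() and seen_on[ln.strip()] < cutoff]
        cleaned[permalink] = "\n".join(kept)
    return cleaned
```

A line-by-line comparison like this is brute force, but it needs no knowledge of any particular blogging tool's templates, which fits the "don't break the bank" constraint.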
OPML is a wicked cool way to display lightweight hierarchies of information. It's an easy-to-implement (I did it in less than an hour for a FAQ application), XML-based, simple specification. It works and the author should be gosh dang proud of it. Here's the rub: OPML is displayed as XML tags in the browser. Here's what you see in IE:
To me, the view in IE is unacceptable. This makes outlining a geek curiosity rather than a mainstream thing. Yes, in a true outliner, the results will be better, but we need a way for people to view this in HTML. I'd really like people to see my outlines now, but with only Radio users able to get to them, it's a chicken-and-egg situation. Here's my recommendation. And it isn't all that hard.
This is a Distributed Rendering Problem
Here are the issues as I see them:
- Take an OPML URL and generate HTML from it for display. XSLT, DHTML, etc. Who cares? Let's get it done so that "Mom" or "GrandPa" can view it. (No disrespect to highly technical Moms and GrandPas out there; this is a metaphor.) Edit or view, who cares? We have to start somewhere and View is easier.
- Give a link to the actual OPML URL so that if people have a MIME-compliant OPML editor, it can be edited. OPTIONAL: Give people a preferences facility to bookmark outlines and share them.
- Do it without breaking the bank on hardware.
- Write this in a commonly available web language currently installed on over 3,000,000 hosts worldwide that also happens to be network ready, XML capable and really, really easy to get stuff done in. Sure, we'd all love to use Zope or Python or ExoticLangOfTheDayHere. Guess what: PHP's what I recommend. It meets these criteria and more. It's wicked portable, fast enough and has none of the install problems with Perl scripts (flames to sjohnson@fuzzygroup.com).
- Write a renderer in PHP. Make it smart enough to update its rendering params from a server periodically. Make it accept one parameter, the OPML file to render.
- Write this code so it's drop dead simple to install on a server. Make it "ioview.php", no includes. Copy it into a website and go.
- Let people who download it and install it sign up with UserLand as an "OPML Partner". Award "Karma Points" if they do it.
- Let UserLand operate a redirector service which forks IO rendering requests out at random to different servers all over the globe. This could probably be done with one or two Linux boxes. Sure we could make it fancy but let brute force solve it for now. Heck, all UserLand really has to do is own the DNS entries and a little tiny bit of hardware to jumpstart it.
- Ask the Radio community to help out. I have right now 3 boxes I could register. I don't mind giving up a little cpu and bandwidth.
- Do something with the "Karma Points". Have a pot luck supper or something. Who cares. We'll do it because we're a community and we believe. The karma is just an idea.
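The renderer and redirector ideas above can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python rather than the PHP I'd actually recommend for "ioview.php", the function names are hypothetical, and so is the `url` query parameter (the recommendation only says the renderer takes one parameter, the OPML file to render). Fetching the OPML over the network and updating rendering params from a server are left out.

```python
import html
import random
import xml.etree.ElementTree as ET
from urllib.parse import quote


def outline_to_html(node):
    """Turn nested <outline> elements into nested HTML lists."""
    items = []
    for child in node.findall("outline"):
        label = html.escape(child.get("text", ""))
        items.append("<li>" + label + outline_to_html(child) + "</li>")
    return "<ul>" + "".join(items) + "</ul>" if items else ""


def render_opml(opml_text, opml_url):
    """Render an OPML document as a view-only HTML fragment.

    A link back to the OPML URL is appended so that anyone with an
    OPML editor can still open the original outline.
    """
    root = ET.fromstring(opml_text)
    title = html.escape(root.findtext("head/title", default="Outline"))
    body = root.find("body")
    outline = outline_to_html(body) if body is not None else ""
    link = '<p><a href="%s">OPML source</a></p>' % html.escape(opml_url, quote=True)
    return "<h1>%s</h1>%s%s" % (title, outline, link)


def pick_renderer(renderers, opml_url, rng=random):
    """Brute-force redirector: choose one registered renderer at random.

    `renderers` is a list of volunteer renderer URLs; the "url" query
    parameter name is an assumption made for this sketch.
    """
    return rng.choice(renderers) + "?url=" + quote(opml_url, safe="")
```

The redirector side would then just answer each request with an HTTP redirect to whatever `pick_renderer` returns. Nothing fancy, which is the point: brute force first, cleverness later.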
This page was last updated: 10/3/2002; 9:05:07 PM
Copyright 2002 The FuzzyStuff