As I've commented on my Feedster Blog (which is, alas, down right now), Feedster just added Podcast Search. Here are some thoughts on that:

  1. Tap into the public OPML directories.
  2. Expect the public OPML directories to each use a slightly different flavor of OPML.
  3. Regularly revise and test your regexes if you use a regex-based OPML parser. We've never been comfortable with any of the public OPML parsing libraries, so we rolled our own regex parser; we use it regularly while often griping about it. ;-)
  4. Don't expect sites like Odeo to support a public, parseable directory. Understanding their URL syntax, though, will go a long way.
  5. Expect that lots of things won't appear in any directory. I just finished writing a "Deep Discovery" tool that takes our database of 17-million-plus feed URLs and scans it for podcasts. So far it has added another 500 podcasts (which doesn't sound like a lot, but when your base is 38,000 or so, it's actually statistically significant), and it's still running. (It's up to 640 more feeds since I started this post.)
  6. Realize that there are lots of non-US podcasting sites. We've even found podcasting in Russian.
  7. Realize that people may consider themselves podcasters but have never encountered an enclosure tag in their lives. I don't want to name names, but this was very, very surprising to me.
  8. Subscription and metadata standards are confused. Consider the iTunes metadata extensions (example feed using them) versus the iTunes pcast files and the pcast:// pseudo-URLs.
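Items 2 and 3 are easier to picture with a sketch. The post doesn't show our actual parser, so what follows is a hypothetical Python illustration of the regex approach, not Feedster's code: it pulls feed URLs out of `<outline>` elements while tolerating a few variations assumed to show up across directories (attribute-name case, `xmlUrl` versus `url`, single versus double quotes, escaped `&amp;`). The names `extract_feed_urls`, `URL_ATTRS`, and the regexes are made up for this example.

```python
import html
import re
from typing import List

# Each <outline ...> start tag, self-closing or not. The attribute body stops
# at the first ">", so a literal ">" inside a value would break this; that
# brittleness is exactly why regex parsers need regular re-testing.
OUTLINE_RE = re.compile(r"<outline\b([^>]*)>", re.IGNORECASE)

# name="value" or name='value' pairs inside a start tag.
ATTR_RE = re.compile(r"""([A-Za-z_][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

# Assumed attribute names carrying a feed URL across OPML flavors, compared
# case-insensitively (xmlUrl, xmlurl, XMLURL, url, ...), in priority order.
URL_ATTRS = ("xmlurl", "url")


def extract_feed_urls(opml: str) -> List[str]:
    """Return feed URLs from <outline> elements, in document order, deduplicated."""
    seen = set()
    urls = []
    for tag in OUTLINE_RE.finditer(opml):
        attrs = {}
        for name, dq_value, sq_value in ATTR_RE.findall(tag.group(1)):
            # findall returns "" for the quote style that didn't match.
            attrs[name.lower()] = dq_value or sq_value
        for key in URL_ATTRS:
            value = html.unescape(attrs.get(key, "")).strip()
            if value:
                if value not in seen:
                    seen.add(value)
                    urls.append(value)
                break  # use only the highest-priority attribute present
    return urls
```

The weak spot is the `[^>]*` attribute match. Keeping a small file of real samples from each directory and re-running the parser against it whenever a directory changes is a cheap way to follow item 3.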
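The post doesn't say how Deep Discovery decides that a feed is a podcast. One plausible test, sketched here purely as an assumption, is to look for RSS 2.0 `<enclosure>` elements whose `type` starts with `audio/` or whose URL ends in a common audio extension. The function names and the extension list are hypothetical. As item 7 warns, this misses self-described podcasters who never use enclosures, and it also ignores Atom's `<link rel="enclosure">`.

```python
import xml.etree.ElementTree as ET
from typing import List

# File extensions treated as audio when a feed omits or mislabels the
# enclosure "type" attribute (an assumed, non-exhaustive list).
AUDIO_EXTENSIONS = (".mp3", ".m4a", ".aac", ".ogg", ".wav")


def audio_enclosures(feed_xml: str) -> List[str]:
    """Return URLs of RSS 2.0 <enclosure> elements that look like audio."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError:
        return []  # unparseable feed; a real crawler might fall back to regex
    found = []
    for enc in root.iter("enclosure"):  # un-namespaced RSS 2.0 enclosures only
        url = enc.get("url", "").strip()
        mime = enc.get("type", "").lower()
        path = url.lower().split("?", 1)[0]  # ignore query strings
        if url and (mime.startswith("audio/") or path.endswith(AUDIO_EXTENSIONS)):
            found.append(url)
    return found


def is_podcast_feed(feed_xml: str) -> bool:
    """Heuristic: a feed counts as a podcast if it has any audio enclosure."""
    return bool(audio_enclosures(feed_xml))
```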
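On item 8, one small, practical step is normalizing subscription links before deduplicating them against a feed database. This sketch assumes that a `pcast://` link is simply the feed's `http://` URL with the scheme swapped; that is a working assumption, not a documented rule. `feed://` is treated the same way, also as an assumption. `normalize_feed_url` and `PSEUDO_SCHEMES` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from pseudo-URL schemes to the real scheme they are
# assumed to stand for. Not taken from any official specification.
PSEUDO_SCHEMES = {"pcast": "http", "feed": "http"}


def normalize_feed_url(url: str) -> str:
    """Rewrite pcast:// (and similar) pseudo-URLs to fetchable http:// URLs."""
    url = url.strip()
    parts = urlsplit(url)  # urlsplit lowercases the scheme for us
    if parts.scheme in PSEUDO_SCHEMES:
        # Keep host, path, query, and fragment; swap only the scheme.
        return urlunsplit((PSEUDO_SCHEMES[parts.scheme],) + tuple(parts[1:]))
    return url  # already a normal URL (or something we don't recognize)
```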

Right now, implementing podcast search is a lot like implementing blog search was when we started Feedster: confused and emergent. You'll have to do a rat load of work you don't want to do and don't feel should be necessary. But, in the end*, it'll be worth it.


*There can be only one. *Insert Geek Pop Culture Referential Chuckle Here*