- Tap into the public OPML directories.
- Expect the public OPML directories to each use a slightly different flavor of OPML.
- Regularly revise and test your regexes if you use a regex-based OPML parser. We've never been comfortable with any of the public OPML parsing libraries, so we rolled our own regex parser and use it regularly, while often griping about it. ;-)
- Don't expect sites like Odeo to offer a public, parseable directory. Understanding their URL syntax, though, will go a long way.
- Expect that lots of things won't appear in any directory. I just finished writing a "Deep Discovery" tool that takes our database of 17 million-plus feed URLs and scans it for podcasts. Right now it's added another 500 podcasts (which doesn't sound like a lot, but when your base is 38,000 or so, it's actually significant) (oh, and it's still running). (Up to 640 more feeds since I started this post.)
- Realize that there are lots of non-US podcasting sites. We've even found podcasting in Russian.
- Realize that people may consider themselves podcasters but have never encountered an enclosure tag in their lives. I don't want to name names, but this was very, very surprising to me.
- Subscription and metadata standards are confused. Compare, for example, the iTunes metadata extensions (example feed using them) with the iTunes pcast files and the pcast:// pseudo-URLs.
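To make the regex and enclosure points above a bit more concrete, here is a minimal Python sketch. It is for illustration only and is not the Feedster parser: the function names (`extract_feed_urls`, `has_enclosure`) and the list of attribute spellings it tolerates are assumptions. It pulls feed URLs out of OPML that may spell the attribute `xmlUrl`, `xmlurl`, or `url` with either quote style, and it checks whether a feed contains an enclosure tag at all.

```python
import html
import re

# Matches one <outline ...> tag, whether or not it is self-closing.
OUTLINE_TAG = re.compile(r"<outline\b[^>]*>", re.IGNORECASE)

# Matches name="value" or name='value' inside a tag.
ATTRIBUTE = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

# Attribute names assumed to carry the feed URL, preferred first.
# This list is an assumption; extend it as new OPML flavors turn up.
URL_ATTRIBUTES = ("xmlurl", "url")

# Matches an <enclosure> tag that carries a non-empty url attribute.
ENCLOSURE_TAG = re.compile(
    r"""<enclosure\b[^>]*\burl\s*=\s*["'][^"']+["']""", re.IGNORECASE
)


def extract_feed_urls(opml_text):
    """Return the feed URLs found in <outline> tags, in document order."""
    urls = []
    for tag in OUTLINE_TAG.findall(opml_text):
        # Lowercase attribute names so xmlUrl, xmlurl and XMLURL all match.
        attrs = {}
        for name, double_quoted, single_quoted in ATTRIBUTE.findall(tag):
            attrs[name.lower()] = double_quoted or single_quoted
        for key in URL_ATTRIBUTES:
            value = attrs.get(key, "").strip()
            if value:
                # Undo XML escaping such as &amp; in query strings.
                urls.append(html.unescape(value))
                break
    return urls


def has_enclosure(feed_xml):
    """Return True if the feed text contains an enclosure with a URL."""
    return ENCLOSURE_TAG.search(feed_xml) is not None
```

Calling `extract_feed_urls(opml_text)` on a downloaded directory gives a list of candidate feeds, and `has_enclosure(feed_xml)` is a cheap first filter for whether a fetched feed is a podcast at all. Like any regex parser, this one has blind spots (an attribute value containing `>` will trip it up, for instance), which is exactly why the regexes need regular revising and testing.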
Right now implementing podcast search is a lot like implementing blog search was when we started Feedster -- confused and emergent. You'll have to do a rat load of work you don't want to do and don't feel should be necessary. But, in the end*, it'll be worth it.
*there can be only one. *Insert Geek Pop Culture Referential Chuckle Here*