I thought a lot about first projects for learning Python and, well, my first idea was overly ambitious.  So I thought more about computing problems I have every single day and one came to me: Gaim IM logs, well, suck.  I mean, the log format is fine, but the search is deadly slow and I often need access to the logs when I'm not at the machine where they were created.  So why not write a Python app that:

  1. Runs as a service in the background.
  2. Monitors the Gaim log directory structure.
  3. When it finds a new log file, uploads it to Gmail, adding whatever metadata it can find to make it easier to search.
  4. Extracts all embedded URLs from the IM logs (hey, I'm processing them anyway) to a local "IM Catalog", time sorted with the newest at the top.  How often do you need a URL someone sent you when you can't remember exactly who sent it or which account you were on, but you'd know it if you saw it?  This is a feature I had in my late, lamented only by me, Inbox Buddy product and it rocked.
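
To make item 4 a bit more concrete, here is a rough sketch of the URL extraction and the newest-first catalog.  Everything named here (URL_RE, extract_urls, render_catalog, and the (timestamp, sender, url) tuple layout) is my own assumption for illustration, and the regex is deliberately simple rather than a complete URL grammar.

```python
import html
import re
from datetime import datetime

# Assumed pattern: http/https URLs up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(log_text):
    """Return the URLs found in log_text, in order of appearance."""
    urls = []
    for match in URL_RE.findall(log_text):
        # Trim punctuation that usually belongs to the sentence, not the URL.
        urls.append(match.rstrip(".,;:!?)"))
    return urls


def render_catalog(entries):
    """Render (timestamp, sender, url) tuples as an HTML list, newest first."""
    rows = []
    for when, who, url in sorted(entries, key=lambda e: e[0], reverse=True):
        safe_url = html.escape(url, quote=True)
        rows.append(
            f"<li>{when:%Y-%m-%d %H:%M} {html.escape(who)}: "
            f'<a href="{safe_url}">{safe_url}</a></li>'
        )
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Sorting on the timestamp with reverse=True is what puts the most recent link at the top, and html.escape keeps odd characters in a URL or screen name from breaking the page.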

If you think about it, this project covers:

  1. Overall application structure
  2. Application configuration, since you need to know where to find the user's IM logs, their Gmail account name, mail server, etc.
  3. Data parsing
  4. Network IO in the form of mail sending
  5. HTML generation
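
For the mail piece, something like the following might be a starting point.  It only builds the message; build_log_message, the subject format, and the X-IM-Buddy / X-IM-Account headers are hypothetical names I made up to carry the metadata, not anything Gmail or Gaim defines.

```python
from email.message import EmailMessage


def build_log_message(log_text, buddy, account, sender, recipient):
    """Wrap one IM log in an email message with searchable metadata."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Put the buddy and account in the subject so a plain mail search finds them.
    msg["Subject"] = f"IM log: {buddy} ({account})"
    # Hypothetical custom headers carrying the same metadata.
    msg["X-IM-Buddy"] = buddy
    msg["X-IM-Account"] = account
    msg.set_content(log_text)
    return msg
```

Actually sending it would go through smtplib (for example, SMTP.send_message), with the server name and credentials coming from the application configuration.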

That's a lot of the basic primitives I use every single day in PHP.  It'll be interesting to see how Python compares.

Step 1 - a basic recursive directory scanner looking for log files.
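
A minimal sketch of what that scanner could look like, using os.walk from the standard library.  The function name find_log_files and the assumption that logs end in .txt or .html are mine; adjust both to whatever your log directory really contains.

```python
import os


def find_log_files(root, extensions=(".txt", ".html")):
    """Walk root recursively and return paths of files that look like logs."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # str.endswith accepts a tuple, so one call checks every extension.
            if name.lower().endswith(extensions):
                found.append(os.path.join(dirpath, name))
    # Sort so repeated scans return files in a stable order.
    return sorted(found)
```

Calling find_log_files on the log root returns a flat, sorted list of paths, which is easy to compare between scans to spot new files.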