I'm experimenting with crawling some internal applications (specifically our bug tracker, our old bug tracker, and our wiki pages, all of which have the typically poor built-in search engines). One thing I've noticed is that when I search for something like “help link”, I get a huge number of matches, because pretty much every app has the text “Help” on a button.
Anyway, as I think about how to improve the results, the easiest way seems to be to crawl an Atom feed of tickets, pages, etc. rather than the rendered HTML, since a feed carries only the content and not the surrounding buttons and navigation. Many apps can already provide Atom feeds, and Atom is pretty easy to generate anyway (it can be done in the app's native language, if necessary and possible).
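To make the idea concrete, here is a rough sketch of the feed-reading half, using only Python's standard library. It assumes each app exposes a standard Atom 1.0 feed; the `parse_atom_entries` name, the field names in the returned dicts, and the sample feed (including its URL) are made up for illustration and don't come from any of our apps.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 elements live in this XML namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_atom_entries(xml_text):
    """Return one dict per <entry>, holding only the fields worth indexing."""
    root = ET.fromstring(xml_text)
    docs = []
    for entry in root.findall("atom:entry", ATOM_NS):
        # Prefer the full <content>; fall back to <summary> if it is missing.
        body = entry.find("atom:content", ATOM_NS)
        if body is None:
            body = entry.find("atom:summary", ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        docs.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "url": link.get("href", "") if link is not None else "",
            "text": "".join(body.itertext()).strip() if body is not None else "",
        })
    return docs


# Hypothetical feed, standing in for what a bug tracker might emit.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Bug tracker</title>
  <entry>
    <id>urn:example:ticket:101</id>
    <title>Help link is broken on login page</title>
    <updated>2010-01-01T00:00:00Z</updated>
    <link href="http://tracker.example/ticket/101"/>
    <content type="text">Clicking the help link returns a 404.</content>
  </entry>
</feed>"""

if __name__ == "__main__":
    for doc in parse_atom_entries(SAMPLE_FEED):
        print(doc["url"], doc["title"])
```

Each dict could then be handed to whatever indexer is in use. Because only the entry's title and body are indexed, a search for “help link” would match tickets that actually mention it, not every page that happens to have a “Help” button.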
Has anyone developed a crawler that works from Atom feeds yet?