Thanks to Runar, we can use SD to crawl our internal WordPress sites and it works great.
We have one wrinkle I have yet to resolve: preventing the tag cloud words from being indexed. A tag cloud is a standard widget on many WP sites. Each ‘tag’ in the cloud is a link to all pages with that tag, and every page on the site shows the tag cloud. So if you search for any word that appears in the tag cloud, you get every page on your site as a result, which isn’t very helpful.
While I can modify robots.txt to exclude all pages with /tag/ in the URL, what I haven’t been able to resolve is how to keep the tag words that appear within the tag cloud widget on every page out of the index. Fortunately, the ranking tends to put the more relevant page at the top. I’m just trying to reduce the clutter (though it may not be practical or possible to do so).
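For reference, the robots.txt part can be checked before a recrawl. Below is a minimal sketch using Python’s standard `urllib.robotparser`. It assumes the default WordPress tag base, so tag archives live at `/tag/<slug>/` under the site root, and the host `intranet.example.com` and the sample URLs are hypothetical. Note that this only confirms the tag *archive* pages are skipped. The tag words inside the widget on every normal page are still crawled, which is the part still unresolved.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an internal WordPress site. It assumes the
# default WordPress tag base, so tag archives live at /tag/<slug>/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "*") -> bool:
    """Return True if the rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in (
        "https://intranet.example.com/tag/benefits/",             # tag archive
        "https://intranet.example.com/2023/05/open-enrollment/",  # normal post
    ):
        print(url, "->", "crawl" if is_crawlable(rules, url) else "skip")
```

`Disallow: /tag/` is a simple path-prefix rule, so if a WordPress site is installed in a subdirectory (e.g. `/blog/tag/...`), the rule would need that prefix instead.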
We use Squarespace to host our external website, and its custom CMS has similar issues.
Any suggestions?