We are evaluating the VM version of your product to try to create a centralized search of externally hosted (anonymous) sites, secured intranet sites and file shares. Thank you for providing the eval as a VM - very smart move on your part, as it makes it easy for SMBs to test and confirm your solution will work for them.
No problem with indexing the hosted external sites that do not have access restrictions. However, I have been unsuccessful with our WordPress intranet sites.
We have a multi-site WordPress 3.2 server (Linux) on our intranet that uses an AD integration plug-in.
I can index an ‘open’ site (one that does not require authentication), but cannot index those that require AD authentication, even though I am using a ‘resource authentication’ user account that has domain admin privileges.
I’ve tried cloning your intranet connector, modifying the $user variable to look for ‘username’ which is what the AD login dialog uses, but to no avail.
Running the crawl does not result in an error; it shows 2 documents indexed, which are the wp-login.php and wp-signup.php pages (accessible to anonymous users).
Any ideas/suggestions on how I can crawl/index these AD integrated WP sites?
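For reference, a minimal sketch of the form login a crawler would have to perform before it can reach the protected pages. It assumes the AD plug-in leaves the stock wp-login.php form in place, in which case the posted field names are `log` and `pwd` rather than ‘username’. The host name, account and helper function names are hypothetical, and this is not the product's connector code.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_wp_login_request(site_url, username, password):
    """Build (but do not send) the POST that a stock wp-login.php form expects."""
    base = site_url.rstrip("/")
    fields = {
        "log": username,                    # stock WordPress user field
        "pwd": password,                    # stock WordPress password field
        "wp-submit": "Log In",
        "redirect_to": base + "/wp-admin/",
    }
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(base + "/wp-login.php", data=body, method="POST")


def make_session_opener():
    """Return an opener that keeps the login cookies for later page fetches."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


if __name__ == "__main__":
    # Hypothetical intranet site and service account.
    request = build_wp_login_request("http://intranet.example/site1/", "svc-crawl", "secret")
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request with `opener.open(request)` and then fetching pages through the same opener would reuse the session cookies. If the crawl still only sees the login and signup pages after that, the plug-in is probably using a different form or HTTP-level authentication.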
Other comments:
‘Stop Crawl’ does not seem to work. I started an index of an external site, created another collection, and realized I could only run one crawl of each type at a time, so I pressed ‘Stop Crawl’, but it continued to index the site. Same issue with an SMB collection.
Incremental crawls: It appears your crawl data is not available for search until the crawl completes (and presumably the data is then saved in the database). Is this correct?