Future functions - Searchdaimon Open Source Enterprise Search

Introduction

This part of the manual describes functions that are expected to be released soon. Normally these functions will not work on your ES, but we sometimes post proposed documentation here for future reference and feedback.

Add documents with the REST API

Resource	GET	POST	PUT	DELETE
Collection http://example.com/documents/collection				Delete the entire collection
Document http://example.com/documents/collection/item		Add or update a document	Same as POST	Delete the addressed document

Add or update a document - POST

Uploads the local file test.png to the ES as test.png in the httpup collection.

	curl --data-binary @test.png http://example.com/documents/httpup/test.png

Download the addressed document - GET

	curl -XGET http://example.com/documents/httpup/test.png

Delete the addressed document - DELETE

Deletes test.png from the httpup collection.

	curl -XDELETE http://example.com/documents/httpup/test.png

Delete a collection - DELETE

Deletes the httpup collection.

	curl -XDELETE http://example.com/documents/httpup
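
The curl calls above can also be prepared from a program. The following is a minimal Python sketch, assuming the URL layout from the table above (http://server/documents/collection/item). The helper names document_url, add_request, delete_request and send are made up for this illustration and are not part of the ES.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def document_url(server, collection, name=None):
    """Build http://server/documents/collection[/name], percent-encoding each part."""
    url = f"http://{server}/documents/{quote(collection, safe='')}"
    if name is not None:
        url += "/" + quote(name, safe="")
    return url


def add_request(server, collection, name, data):
    """POST request that adds or updates a document with the given bytes."""
    return Request(document_url(server, collection, name), data=data, method="POST")


def delete_request(server, collection, name=None):
    """DELETE request for one document, or for the whole collection if name is None."""
    return Request(document_url(server, collection, name), method="DELETE")


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urlopen(request) as response:
        return response.status
```

Building the request separately from sending it makes it possible to inspect the method and URL before anything is sent to the ES, for example send(delete_request("example.com", "httpup", "test.png")).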

Add delayed: a faster way to add documents in batches

When you add a document by simply posting it, the document gets indexed immediately. Indexing is a costly process that is best done in batches. If you are going to add several documents to the same collection, it is better to add them using the add delayed function.

The add delayed function writes the document to disk, but does not do the expensive indexing until you close the collection.

	curl --data-binary @test.png -X ADDDELAYED http://example.com/documents/httpup/test.png
	curl --data-binary @other.png -X ADDDELAYED http://example.com/documents/httpup/other.png

	curl -XCLOSE http://example.com/documents/httpup
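
The same batch can be sketched in Python. This is only an illustration of the order of calls, assuming the ADDDELAYED and CLOSE methods behave as described above. The function names delayed_batch and send_all are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def delayed_batch(server, collection, documents):
    """Return the requests for a delayed batch: one ADDDELAYED per document, then CLOSE.

    documents maps a document name to its content as bytes.
    """
    base = f"http://{server}/documents/{quote(collection, safe='')}"
    requests = [
        Request(f"{base}/{quote(name, safe='')}", data=data, method="ADDDELAYED")
        for name, data in documents.items()
    ]
    # Closing the collection is what triggers the expensive indexing step.
    requests.append(Request(base, method="CLOSE"))
    return requests


def send_all(requests):
    """Send the requests in order and return the HTTP status codes."""
    statuses = []
    for request in requests:
        with urlopen(request) as response:
            statuses.append(response.status)
    return statuses
```

The CLOSE request is always last, so the collection is indexed once, after every document has been written to disk.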

Example: Add a folder with a Windows bat file using Curl

It is easy to make a Windows bat file that uploads a folder to the ES. Copy the code into a file named pushit.bat and run it from the command line with the desired folder, collection and ES server as arguments:

pushit.bat example.

Usage: pushit.bat folder collection server

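@echo off
rem Delayed expansion is needed so the url variable can change inside the loop
setlocal enableDelayedExpansion

rem List every file, excluding directories, under the folder in argument 1
for /f "usebackq tokens=*" %%f in (`dir /b/s /a:-D %1`) do (
    rem Use the file path as the document name and encode spaces
    set "url=%%f"
    set "url=!url: =%%20!"

    rem Upload to the collection in argument 2 on the server in argument 3
    curl --data-binary "@%%f" "http://%3/documents/%2/!url!"
)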

Requires curl for Windows, which is available for free here (you probably want the "Win32 - Generic" or "Win64 - Generic" version).

Example: Uploading a file using libcurl and C

#define _GNU_SOURCE /* For asprintf */
#include <stdio.h>
#include <stdlib.h> /* For free */
#include <curl/curl.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
  CURL *curl;
  CURLcode res;
  struct stat file_info;
  double speed_upload, total_time;
  FILE *fd;

  char *file;
  char *url;

  if(argc < 4) {
        fprintf(stderr,"Usage: httpput file server collection\n");
        return 1;
  }
  file= argv[1];
  if(asprintf(&url,"http://%s/documents/%s/%s",argv[2],argv[3],argv[1]) < 0) {
    perror("Building url");
    return 1; /* can't continue without a url */
  }


  fd = fopen(file, "rb"); /* open file to upload */
  if(!fd) {
    perror(file);
    return 1; /* can't continue */
  }

  /* to get the file size */
  if(fstat(fileno(fd), &file_info) != 0) {
    perror("fstat");
    return 1; /* can't continue */
  }

  curl = curl_easy_init();
  if(curl) {
    /* upload to this place */
    curl_easy_setopt(curl, CURLOPT_URL,url);

    /* tell it to "upload" to the URL */
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);

    /* set where to read from (on Windows you need to use READFUNCTION too) */
    curl_easy_setopt(curl, CURLOPT_READDATA, fd);

    /* and give the size of the upload (optional) */
    curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE,
                     (curl_off_t)file_info.st_size);

    /* enable verbose for easier tracing */
    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);

    res = curl_easy_perform(curl);
    /* Check for errors */
    if(res != CURLE_OK) {
      fprintf(stderr, "curl_easy_perform() failed: %s\n",
              curl_easy_strerror(res));

    }
    else {
      /* now extract transfer info */
      curl_easy_getinfo(curl, CURLINFO_SPEED_UPLOAD, &speed_upload);
      curl_easy_getinfo(curl, CURLINFO_TOTAL_TIME, &total_time);

      fprintf(stderr, "Speed: %.3f bytes/sec during %.3f seconds\n",
              speed_upload, total_time);

    }
    /* always cleanup */
    curl_easy_cleanup(curl);
  }

  fclose(fd); /* close the uploaded file */
  free(url);
  return 0;
}

Download the source here.

Compiling

Probably something like this:

gcc fileupload.c -o fileupload -lcurl -lssl -lcrypto -ldl

Please refer to the libcurl manual for more info.

Additional add parameters

Basic parameters

title	The document's title
acl_allow	Comma-separated list of users and groups that have access to the document
acl_denied	Comma-separated list of users and groups that shall not have access to the document
documenttype
documentformat

Example:

Uploads the local file test.png to the ES and sets the title to "Test image".

	curl --data-binary @test.png "http://example.com/documents/httpup/test.png?title=Test%20image"
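
A small Python sketch of how such a URL could be assembled from the basic parameters. The function name upload_url and the users "alice" and "sales" are hypothetical. Leaving the commas in acl_allow and acl_denied unencoded is an assumption, since this section does not say how the ES expects those lists to be encoded.

```python
from urllib.parse import quote, urlencode


def upload_url(server, collection, name, title=None, acl_allow=None, acl_denied=None):
    """Build a document URL with optional title and access list parameters."""
    params = {}
    if title is not None:
        params["title"] = title
    if acl_allow:
        params["acl_allow"] = ",".join(acl_allow)
    if acl_denied:
        params["acl_denied"] = ",".join(acl_denied)

    url = f"http://{server}/documents/{quote(collection, safe='')}/{quote(name, safe='')}"
    if params:
        # quote_via=quote encodes spaces as %20 rather than "+".
        # safe="," keeps the list separators literal (an assumption, see above).
        url += "?" + urlencode(params, safe=",", quote_via=quote)
    return url
```

For example, upload_url("example.com", "httpup", "test.png", title="Test image") gives a URL that can be passed to curl in place of a hand-written one.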

Attributes

Attributes are meta-information about a file, for example which email folder a particular email is stored in, or which project a file belongs to. You can use attributes to display meta-information and to filter the search results.

Attributes are key-value pairs. The key and the value are separated with =, and the pairs are separated with commas. The basic format is:

Key=value,Key2=some value2

You then have to URL-encode each pair separately, leaving the commas between the pairs as they are. So the data you give to curl will be:

Key%3Dvalue,Key2%3Dsome%20value2

Example:

Uploads the local file test.png to the ES as test.png in the httpup collection, and sets the attribute “project” to “test” and “author” to “Runar Buvik”.

	curl --data-binary @test.png "http://example.com/documents/httpup/test.png?attributes=project%3Dtest,author%3DRunar%20Buvik"
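
The encoding step can be sketched in Python as follows. The function name encode_attributes is made up for this illustration. Each key=value pair is URL-encoded on its own and the pairs are then joined with plain commas, which matches the example above.

```python
from urllib.parse import quote


def encode_attributes(attributes):
    """Encode a dict of attributes as the value of the attributes parameter."""
    # safe="" makes quote() encode "=" as %3D and spaces as %20.
    return ",".join(
        quote(f"{key}={value}", safe="") for key, value in attributes.items()
    )
```

The result is appended to the document URL as ?attributes= followed by the encoded string.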

Searching with the REST API

The ES comes with a well-designed API for returning search results, so you can design your own user interfaces. There are three search modes, depending on how you want to authenticate your users. All three modes can be used on the same ES at the same time.

Anonymous

No authentication. You will only be able to search data stored in collections set to anonymous. This is the default if you don't have a user system.

Forward authentication

Users supply both a username and a password to your application. Your application forwards these to the ES, and the ES handles authentication. For example, to make an iPhone search app you would prompt the user for a username and password, and then forward them to the ES. The ES then checks them against the user system, for example Microsoft Active Directory, to verify that the username and password are correct.

Pre authenticated

You handle authentication yourself and only send the username to the ES. For example, a CRM system where different users have access to different documents may have its own user system with information about each user. There is no need to send passwords to the ES. Instead you send a special key that tells the ES it can trust that authentication has already been performed. The ES will still handle document-level security based on users and groups.

API URL

In the administrator interface there is a page named “Api info” that can help you generate the correct URL for API calls.

Basic format

Replace query=example with query=word to search for other words.

Anonymous

Url: http://{hostname}/webclient2/api/anonymous/sd/2.1/search?query=example

Forward authentication

Url: http://{hostname}/webclient2/api/sd/2.1/search?query=example

Pre authenticated

Url: http://{hostname}/cgi-bin/dispatcher_allbb?query=example&user=Everyone&bbkey={secret key}
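
As a rough Python sketch, the three URL formats above can be generated like this. The function names and the host es.example.com are hypothetical. How the username and password are passed in forward authentication mode is not covered here, so the sketch only builds the URL.

```python
from urllib.parse import urlencode


def anonymous_url(host, query):
    """Search URL for anonymous mode."""
    return f"http://{host}/webclient2/api/anonymous/sd/2.1/search?" + urlencode({"query": query})


def forward_auth_url(host, query):
    """Search URL for forward authentication mode (credentials not included)."""
    return f"http://{host}/webclient2/api/sd/2.1/search?" + urlencode({"query": query})


def pre_authenticated_url(host, query, user, bbkey):
    """Search URL for pre authenticated mode, using the secret key as bbkey."""
    params = {"query": query, "user": user, "bbkey": bbkey}
    return f"http://{host}/cgi-bin/dispatcher_allbb?" + urlencode(params)
```

urlencode takes care of escaping, so a query with spaces or special characters still produces a valid URL.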

Additional url parameters

navmenucfg

Base64-encoded config.

collection

Limit hits to a specific collection.

outformat

Set to "opensearch" to get XML output in the OpenSearch format, or to "json" to get JSON output.

maxhits

Maximum number of hits to return in a single response. If there are more hits than can be returned, use “page” to get the next set. See also Results paging below.

Set to 0 to only get the result_info header, without any results or navigation menu.

page

See results paging below.

Results paging

Results are returned in pages of up to maxhits results. For example, if a query has 49 hits and you use a maxhits of 20, a basic API call will give you the first 20 hits. You can then set page=2 to get results 21-40, and page=3 to get results 41-49.
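
The paging arithmetic can be written out as a short Python sketch. The function names page_count and page_range are made up for this illustration, and pages are assumed to be numbered from 1 as in the example above.

```python
import math


def page_count(total_hits, maxhits):
    """Number of pages needed to show total_hits with maxhits results per page."""
    return math.ceil(total_hits / maxhits)


def page_range(total_hits, maxhits, page):
    """Return (first, last) result numbers shown on the given page."""
    first = (page - 1) * maxhits + 1
    if page < 1 or first > total_hits:
        raise ValueError("page is outside the result set")
    # The last page may hold fewer than maxhits results.
    last = min(page * maxhits, total_hits)
    return first, last
```

With 49 hits and a maxhits of 20, page_range returns (1, 20), (21, 40) and (41, 49) for pages 1 to 3.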

XML result

The results from an API call are returned as XML.

Basic elements

RESULT_INFO

Info about the results.

Element
TOTAL	Total number of results found
SPELLCHECKEDQUERY	If the query was misspelled, a suggestion may be given here
QUERY	Query as typed by the user. Can be used to show the user what was searched for
TIME	Total time used
FILTERED	Number of results that were removed by filters
SHOWABAL	Number of results returned. May be maxhits or less
CASHE	1 if the result came from the internal cache, 0 otherwise
NROFSEARCHNODES	Number of backend nodes that were involved in answering your query
XMLVERSION	The version number of the xml. This is not the same as the API version

RESULT

A single result.

Element
TITLE	Title of the result
URL	Uniform Resource Locator
URI	Uniform Resource Identifier. Do not use
FULLURI	Uniform Resource Identifier. Do not use
Attributes	List of attributes. See below
VID	Virtual id. A unique identifier
DOCUMENTLANGUAGE	Written languages of the underlying document. Currently not in use
DOCUMENTTYPE	Type of the underlying document
POSISJON	Position in the result set
filetype	Filetype of the underlying document
icon	What icon to display
THUMBNAIL	Link to the thumbnail. You must prefix it with the address of the responding server
THUMBNAILWIDTH	Thumbnail width
THUMBNAILHEIGHT	Thumbnail height
DESCRIPTION_LENGTH	Length of description
DESCRIPTION	Description to show the user. May be plain text or an html table
CRC32	CRC32 of the underlying document
TERMRANK	Dynamic ranking describing how well this query matches this result
POPRANK	Static ranking
ALLRANK	Merge of dynamic and static ranking
NROFHITS	Number of times the query occurred in the result
RESULT_COLLECTION	Name of the collection where the result is stored
TIME_UNIX	UNIX timestamp of last change
TIME_ISO	ISO time of last change
CACHE	Info needed to retrieve a cached version of the document
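
A client could read these elements with a standard XML parser. The Python sketch below is only an illustration. The element names come from the tables above, but the sample document, its SEARCH root element and its nesting are assumptions, because the exact layout of the XML is not shown in this section. The helper field therefore accepts a value stored either as an XML attribute or as a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: only the element names are taken from the tables above.
SAMPLE = """<SEARCH>
  <RESULT_INFO>
    <TOTAL>2</TOTAL>
    <QUERY>example</QUERY>
    <SHOWABAL>2</SHOWABAL>
  </RESULT_INFO>
  <RESULT>
    <TITLE>First hit</TITLE>
    <URL>http://example.com/a</URL>
    <POSISJON>1</POSISJON>
  </RESULT>
  <RESULT>
    <TITLE>Second hit</TITLE>
    <URL>http://example.com/b</URL>
    <POSISJON>2</POSISJON>
  </RESULT>
</SEARCH>"""


def field(element, name):
    """Read a value stored either as an XML attribute or as a child element."""
    value = element.get(name)
    if value is None:
        value = element.findtext(name)
    return value


def parse_results(xml_text):
    """Return (total, hits), where hits is a list of (title, url) tuples."""
    root = ET.fromstring(xml_text)
    info = root.find(".//RESULT_INFO")
    total = int(field(info, "TOTAL")) if info is not None else 0
    hits = [(field(r, "TITLE"), field(r, "URL")) for r in root.iter("RESULT")]
    return total, hits
```

Calling parse_results(SAMPLE) returns the total from RESULT_INFO together with one (title, url) tuple per RESULT element.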