Does Searchdaimon support UTF-8 documents?
How do you correctly index documents that contain special characters?
For example, this character is a “normal” character in our language (Slovenian):
http://www.fileformat.info/info/unicode/char/10c/index.htm
Currently the above character gets replaced by this one:
http://www.fileformat.info/info/unicode/char/e8/index.htm
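One possible explanation, which is only our guess and not confirmed Searchdaimon behaviour: in ISO-8859-2 (Latin-2) the lowercase form č is stored as byte 0xE8, and the same byte read as ISO-8859-1 (Latin-1) is è (U+00E8). The Python sketch below reproduces the substitution under that assumption; the function name is made up for illustration.

```python
def simulate_latin_mixup(text: str) -> str:
    """Lowercase, encode as ISO-8859-2, then misread the bytes as ISO-8859-1.

    Hypothetical helper: it only imitates the symptom we see and says
    nothing about what Searchdaimon actually does internally.
    """
    return text.lower().encode("iso-8859-2").decode("iso-8859-1")


original = "\u010c"  # LATIN CAPITAL LETTER C WITH CARON
garbled = simulate_latin_mixup(original)

# Show the code points before and after the simulated mix-up.
print(f"U+{ord(original):04X} -> U+{ord(garbled):04X}")
```

If this is what happens, the text is being converted to a single-byte encoding somewhere between the crawler and the index instead of staying in UTF-8.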
Here is an example of the HTML document we want to index:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html>
<head>
</head>
<body>
Pri večini izposojenk je prišlo za izposojo v poštev več stoletij, le redke se je dalo časovno umestiti do stoletja natančno. Zelo važno je bilo ločiti pravila ...
</body>
</html>
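To rule out a problem on our side, a check like the following can confirm that a document's bytes really are valid UTF-8 before they are handed to the indexer. This is a minimal Python sketch: the helper name is hypothetical, and the sample bytes are built in the script from a fragment of the text above rather than read from our actual file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A fragment of the sample text, encoded the way a UTF-8 file stores it.
utf8_bytes = "Pri večini izposojenk".encode("utf-8")

# The same fragment in a legacy single-byte encoding, for comparison.
latin2_bytes = "Pri večini izposojenk".encode("iso-8859-2")

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin2_bytes))
```

In UTF-8 the letter č takes two bytes (0xC4 0x8D), while in ISO-8859-2 it is the single byte 0xE8, which is not a valid UTF-8 sequence on its own.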
How can we index this type of document so that it can also be searched?
Thanks.