Windows Live Agents: Matching international characters in XMLs


XML files typically use UTF-8 encoding, so there is no problem storing international characters in them, such as the Spanish word "España".

For example, we might have an XML file with a list of countries written in Spanish, and why not with the first letter capitalized (so we can write a country straight from the XML into the conversation window without uppercasing its first letter ;).

We create a subpattern and load it with the XML data:

subpattern CountriesSubPattern get Name, Name in CountriesTable {score=MACRO_STRONG_SCORE}

When testing our Agent, we find that it doesn't recognize "España" as a valid country.

The problem is that, when loading XML data, we may have set up incorrectly either the index property of the DataTable's parameter or the way the subpattern receives the user input. This setup works:

datatable CountriesTable {expire="in 1 day"}
load Name {index=case-insensitive} from datasource
CountriesXMLFileLoader()

subpattern CountriesSubPattern get Name, Name in CountriesTable {style=raw}

For international characters such as Spanish ones, we need "style=raw"; otherwise the Agent uses the thawed version of the input (in the 4.3 SDK, "España" becomes "espan a").
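To make the "espan a" result less mysterious, here is a small Python sketch (not BuddyScript) that imitates it under one assumption: that thawing decomposes the text into Unicode NFD form, lowercases it, and replaces the combining tilde with a space. The functions `thaw`, `build_index` and `lookup` are hypothetical names for illustration only; they model a case-insensitive index and the difference between raw and thawed input, not the SDK's actual implementation.

```python
import unicodedata


def thaw(text: str) -> str:
    """Hypothetical approximation of the SDK's "thawed" input.

    Assumption: decompose to NFD, lowercase, then turn every combining
    mark (Unicode category Mn) into a space. This reproduces the observed
    "España" -> "espan a", but it is a guess, not the SDK's real code.
    """
    decomposed = unicodedata.normalize("NFD", text).lower()
    return "".join(" " if unicodedata.category(ch) == "Mn" else ch
                   for ch in decomposed)


def build_index(names):
    """Case-insensitive index: casefolded key -> original display name."""
    return {name.casefold(): name for name in names}


def lookup(index, user_input, raw=True):
    """Match user input raw (like style=raw) or after thawing it."""
    key = user_input if raw else thaw(user_input)
    return index.get(key.casefold())


countries = build_index(["España", "Inglaterra"])
print(thaw("España"))                          # espan a
print(lookup(countries, "españa"))             # España
print(lookup(countries, "españa", raw=False))  # None
```

The raw lookup succeeds because the case-insensitive key "españa" still contains the precomposed "ñ", while the thawed key "espan a" no longer matches anything stored in the index.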

Note: We ran into this problem recently with version 4.3 of the SDK, and only when using XML files.
With the 5.0 Beta SDK and a non-XML subpattern like the following, matching works perfectly without adding any properties:

subpattern CountriesSubPattern
+ españa
+ inglaterra

+ COUNTRY=CountriesSubPattern {score=MACRO_STRONG_SCORE}
- COUNTRY is a country

Comments?

Posted by Kartones on 2008-03-10