From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 17 Jun 1994 07:03:02 +0200
Message-Id: <199406170503.AA19315@dkuug.dk>
In-Reply-To: ALB@immedia.ca "(TC304.190) Full-text searching: don't keep it simple and stupid!" (Jun 16, 17:23)
To: ALB@immedia.ca, bealle@torolab6.vnet.ibm.com, cpwg-mail@revcan.ca, paref@vm1.ulaval.ca, umavs@torolab6.vnet.ibm.com
Cc: i18n@dkuug.dk, sc22wg20@dkuug.dk, tc304@dkuug.dk
Subject: Re: (TC304.190) Full-text searching: don't keep it simple and stupid!

ALB@immedia.ca writes:
> Subject : Full-text search: don't keep it simple and stupid
>
> >Keld, my company (which produces a full text search product) is
> >attempting to establish character classes for various European
> >languages. For most such languages, our users prefer that we
> >ignore case and accents.
> >
> >However, Danish seems to have some exceptions to this. An 'O'
> >with a slash is treated as a separate letter. Are there others?
> >For example, would users be upset if a search for "angstrom"
> >ignored the ring, or conversely, would they be upset if a search
> >with the ring did NOT find ones without (and vice-versa)?
> >What is normal practice in Denmark?
>
> Keld answered, legitimately and correctly:
>
> >In Denmark, the letters O WITH STROKE, AE and A WITH RING are genuine
> >letters and people would be very upset if it is not handled as such.
>
> Now I think for French (and perhaps German and other languages too), the answer
> is unfortunately not as simple.

I agree with Alain that a number of parameters should be available, so that different kinds of searches (for example, with regard to precision) are possible.
The point of my comment above was that a cultural requirement is also needed as a parameter, and that is not listed in Alain's model. Or perhaps you could say it is implicitly included, as the comparison is done using a sorting algorithm, which may be culture-dependent, as the different national POSIX locales illustrate.

Keld
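To make the "culture as a search parameter" point concrete, here is a minimal sketch in Python. It is not Alain's model and not a POSIX API. The table GENUINE_LETTERS and the functions fold_for_search and matches are hypothetical names invented for illustration, and the per-culture letter sets are assumptions. The idea: fold case and strip accents by default, but leave alone any letter that the given culture treats as a genuine letter of its own (for Danish, æ, ø and å).

```python
import unicodedata

# Hypothetical per-culture table (an assumption for illustration only):
# letters that must NOT be folded to a base letter when searching.
GENUINE_LETTERS = {
    "da": {"æ", "ø", "å"},  # Danish: distinct letters, as discussed above
    "fr": set(),            # French: accents ignored in this simple sketch
}

def fold_for_search(text: str, culture: str) -> str:
    """Build a case- and accent-insensitive search key for the given culture."""
    protected = GENUINE_LETTERS.get(culture, set())
    out = []
    # Casefold first, then compose so that e.g. "a" + U+030A becomes "å".
    for ch in unicodedata.normalize("NFC", text.casefold()):
        if ch in protected:
            out.append(ch)  # keep culture-specific letters intact
            continue
        # Decompose (e.g. "é" -> "e" + U+0301) and drop the combining marks.
        for part in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(part):
                out.append(part)
    return "".join(out)

def matches(query: str, text: str, culture: str) -> bool:
    """Substring search on folded keys; the culture is an explicit parameter."""
    return fold_for_search(query, culture) in fold_for_search(text, culture)

if __name__ == "__main__":
    print(matches("angstrom", "Ångström", "fr"))  # ring ignored
    print(matches("angstrom", "Ångström", "da"))  # å kept as its own letter
```

With the "fr" setting, "angstrom" finds "Ångström"; with "da" it does not, because å is kept distinct from a. A production system would more likely derive these rules from the locale's collation data (for example the POSIX LC_COLLATE category) rather than from a hand-written table.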