Fix for issues 2813 and 2578: make DN string representations more user-friendly when they contain non-ASCII characters.
This change is a flag day: it changes the DN normalized form, which can make existing databases incompatible.
Currently the DN and RDN implementations are very conservative about the string representations of the DNs they construct: every non-ASCII character is escaped, with each byte of its UTF-8 encoding written as a backslash followed by two hex digits. For example, the DN:
uid=Météo.0,ou=People,dc=example,dc=com
is encoded as:
uid=M\c3\a9t\c3\a9o.0,ou=People,dc=example,dc=com
This is not very readable in LDAP client applications. It is also much less space efficient, which we should consider if we want non-western users of OpenDS, who will be heavy users of multi-byte UTF-8 sequences. For example, a single Chinese character is encoded in UTF-8 as 3 or 4 bytes IIRC; since each escaped byte takes 3 characters (a backslash plus two hex digits), that becomes 9-12 bytes, a 3X increase. This has implications for database performance (substrings) and space efficiency.
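To make the size arithmetic concrete, here is a minimal Java sketch of the old escaping behavior. It is an illustration only, not the actual OpenDS DN/RDN code: the class name DnEscapeSketch and the helper escapeNonAscii are hypothetical, and the sketch assumes that only bytes outside the ASCII range are escaped.

```java
import java.nio.charset.StandardCharsets;

final class DnEscapeSketch {
    // Hypothetical helper (not an OpenDS API): writes every non-ASCII
    // UTF-8 byte as a backslash followed by two lowercase hex digits.
    static String escapeNonAscii(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xFF;
            if (unsigned < 0x80) {
                // Plain ASCII byte: copied through unchanged.
                sb.append((char) unsigned);
            } else {
                // One byte becomes three characters.
                sb.append('\\');
                sb.append(Character.forDigit(unsigned >> 4, 16));
                sb.append(Character.forDigit(unsigned & 0x0F, 16));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dn = "uid=M\u00e9t\u00e9o.0,ou=People,dc=example,dc=com";
        String escaped = escapeNonAscii(dn);
        System.out.println(escaped);
        System.out.println("raw UTF-8 bytes: " + dn.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("escaped bytes:   " + escaped.length());
    }
}
```

Each escaped byte costs three characters, so a 3-byte character such as U+4E2D grows to 9 bytes, while the ASCII parts of the DN are unaffected.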
The change is not without its minor problems, however:
1. LDIF cannot contain non-ASCII characters, so any DNs or attribute
values containing them must be base64 encoded for the LDIF to be
valid. This is not very user-friendly, but it's easier for inquiring
users to decode base64 than to manually decode UTF-8 byte
sequences. A future change could improve this behavior by making
our LDIF generation tools (e.g. ldapsearch, export-ldif) output
comments before each base64 encoded DN / value containing the DN
/ value in the client's native character set. This is something
that OpenLDAP clients do and I think it is a nice usability feature.
2. The dn2id index and any DN / RDN syntax attribute indexes will
potentially be invalid due to the modified DN / RDN normalization
(hence this change is a flag day).
3. DNs returned to LDAPv2 clients will potentially contain non-T.61
characters (LDAPv3 uses UTF-8 and LDAPv2 uses T.61). However, I
don't think this should bother us, because we already break
compatibility with LDAPv2 clients for directory string based
attribute values, which we also return using UTF-8.
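As an illustration of problem 1 and the possible future improvement, the following Java sketch base64 encodes a DN that is not LDIF-safe and precedes it with a readable comment. This is not part of the change: the class LdifDnSketch, its methods, and the "# dn: " comment format are all hypothetical, the safety check is a simplified reading of the LDIF rules, and LDIF line folding is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class LdifDnSketch {
    // Simplified check (assumption): a value may be written in the clear
    // only if it is pure ASCII, contains no NUL/CR/LF, and does not start
    // with a space, colon, or '<'.
    static boolean isSafe(String s) {
        if (s.isEmpty()) {
            return true;
        }
        char first = s.charAt(0);
        if (first == ' ' || first == ':' || first == '<') {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 || c == '\n' || c == '\r' || c > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // True if the value can sit on a single comment line.
    static boolean fitsOnOneLine(String s) {
        return s.indexOf('\n') < 0 && s.indexOf('\r') < 0 && s.indexOf('\0') < 0;
    }

    // Hypothetical formatter: returns the LDIF line(s) for a DN.
    static String dnLines(String dn) {
        if (isSafe(dn)) {
            return "dn: " + dn;
        }
        String encoded = Base64.getEncoder()
                .encodeToString(dn.getBytes(StandardCharsets.UTF_8));
        String line = "dn:: " + encoded;
        // The comment repeats the DN in readable form for human readers.
        return fitsOnOneLine(dn) ? "# dn: " + dn + "\n" + line : line;
    }

    public static void main(String[] args) {
        System.out.println(dnLines("uid=M\u00e9t\u00e9o.0,ou=People,dc=example,dc=com"));
        System.out.println(dnLines("dc=example,dc=com"));
    }
}
```

A real tool would also need to decide which character set to use for the comment, since the comment line itself is no longer pure ASCII.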