Friday, December 26, 2008

You must precede Unicode string constants with the prefix N in SQL Server


When you work with Unicode string constants in SQL Server, you must precede each one with a capital letter N, as documented in the SQL Server Books Online topic "Using Unicode Data". The "N" prefix stands for National Language in the SQL-92 standard, and it must be uppercase. If you do not prefix a Unicode string constant with N, SQL Server converts it to the non-Unicode code page of the current database before it uses the string.
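The following Python sketch mimics that implicit conversion so you can see which characters survive. It assumes that the database's non-Unicode code page is Windows-1252 (`cp1252` in Python). It represents each character that the code page cannot hold as `?`, which is what Python's `replace` error handler produces. SQL Server's own substitution may differ for some characters; for example, it may map them to look-alikes instead. The function names are hypothetical.

```python
# Assumption: the database's non-Unicode code page is Windows-1252.
# SQL Server's real substitution rules may differ for individual characters.
DB_CODE_PAGE = "cp1252"


def to_database_code_page(text: str) -> str:
    """Simulate converting a string constant that lacks the N prefix.

    Characters the code page cannot represent become '?', so the
    information they carried is lost.
    """
    return text.encode(DB_CODE_PAGE, errors="replace").decode(DB_CODE_PAGE)


def literal_value(text: str, n_prefix: bool) -> str:
    """Return the value the server would work with for '...' or N'...'."""
    return text if n_prefix else to_database_code_page(text)


if __name__ == "__main__":
    sample = "Café Ωmega 東京"
    print(literal_value(sample, n_prefix=True))   # Café Ωmega 東京
    print(literal_value(sample, n_prefix=False))  # Café ?mega ??
```

The "é" survives because Windows-1252 contains it. The Greek and Japanese characters do not, so they are replaced.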

This notation is necessary to provide backward compatibility with existing applications. For example, "SELECT 'Hello'" must continue to return a non-Unicode string, because many applications expect the behavior of SQL Server 6.5, which did not support Unicode data. The newer syntax "SELECT N'Hello'" was added so that Unicode strings can be passed to and from SQL Server 7.0.

Any time you pass Unicode data to SQL Server, you must prefix the Unicode string with N. If your application is Unicode-enabled and sends data to SQL Server 7.0 as Unicode string constants without the N prefix, you may lose character data. When SQL Server converts a Unicode string without the N prefix to the database's code page, any characters in the string that do not exist in that code page are lost. Note that this translation is not related to Autotranslation, OemToAnsi, or AutoAnsiToOem conversion, all of which occur on the client at the ODBC, OLE DB, or DB-Library layer.
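If an application assembles SQL text itself, a small helper can make sure every Unicode constant carries the prefix. The sketch below is hypothetical: `to_nvarchar_literal` and `build_insert` are not part of any library, and the `Customers` table and `Name` column are made-up examples. The helper doubles embedded single quotes, which is how T-SQL escapes a quote inside a string constant. Where the client API supports it, passing values as Unicode parameters avoids building literals at all.

```python
def to_nvarchar_literal(value: str) -> str:
    """Build a T-SQL Unicode string constant such as N'O''Brien'.

    Hypothetical helper: the uppercase N marks the constant as Unicode,
    and each embedded single quote is doubled.
    """
    return "N'" + value.replace("'", "''") + "'"


def build_insert(table: str, column: str, value: str) -> str:
    """Assemble an INSERT whose string constant keeps the N prefix.

    The table and column names are assumed to be trusted identifiers.
    """
    return f"INSERT INTO {table} ({column}) VALUES ({to_nvarchar_literal(value)});"


if __name__ == "__main__":
    print(build_insert("Customers", "Name", "O'Brien Ωmega"))
    # INSERT INTO Customers (Name) VALUES (N'O''Brien Ωmega');
```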

If your application does not send Unicode data to SQL Server and the client's ANSI code page matches the SQL Server code page, there is no need to prefix string constants with N, and omitting the prefix will not cause data loss. However, SQL Server 7.0 lets you select a Unicode collation during installation that is distinct from the sort order. In some cases, this can cause operations on strings prefixed with N to return different results from operations on strings without the prefix.

For example, suppose that when you installed SQL Server 7.0 you selected a binary sort order (sort orders are used to compare non-Unicode strings). Suppose you also selected General Unicode as the Unicode collation (the Unicode collation is used to compare Unicode strings). The expression ('ABC' = 'abc'), which compares two non-Unicode strings, would return False, because under a binary sort order a capital "A" is not equivalent to a lowercase "a". In contrast, the expression (N'ABC' = N'abc') would return True. Because both strings are prefixed with N, they are Unicode constants, so the Unicode collation is used to compare them. Unlike the binary sort order, the General Unicode collation is case-insensitive and regards the two strings as equivalent.
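The two comparison rules in this example can be modeled in a few lines of Python. This is only a sketch under stated assumptions:

* The binary sort order is approximated by comparing the encoded bytes, assuming a Windows-1252 code page.
* The case-insensitive General Unicode collation is approximated with `str.casefold()`.
* The real collations also define ordering and other equivalences that this model ignores.

```python
# Simplified models of the two comparison rules.
# Assumption: the non-Unicode code page is Windows-1252.
CODE_PAGE = "cp1252"


def binary_sort_equal(a: str, b: str) -> bool:
    """Binary sort order: strings are equal only if their bytes match."""
    return a.encode(CODE_PAGE, errors="replace") == b.encode(CODE_PAGE, errors="replace")


def general_unicode_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison standing in for General Unicode."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(binary_sort_equal("ABC", "abc"))      # False, like 'ABC' = 'abc'
    print(general_unicode_equal("ABC", "abc"))  # True, like N'ABC' = N'abc'
```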

Note that if one of two string constant operands is prefixed with an N and the other is not, the non-Unicode string will be converted to Unicode and the Unicode collation will apply when comparing them. This behavior is explained in the SQL Server Books Online topic "Comparison Operators".
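The mixed-operand rule can be sketched as a single dispatch: if either constant is Unicode, the Unicode comparison decides the result. Otherwise, the sort order decides. The `Constant` type and all function names are illustrative only, and the model rests on three assumptions:

* The code page is Windows-1252.
* A binary sort order is approximated by comparing bytes.
* A case-insensitive General Unicode collation is approximated with `str.casefold()`.

A non-Unicode constant is first reduced to the code page, as it would be on the server.

```python
from dataclasses import dataclass

CODE_PAGE = "cp1252"  # assumed non-Unicode code page of the database


@dataclass(frozen=True)
class Constant:
    """A string constant and whether it was written with the N prefix."""
    text: str
    unicode: bool


def value_of(c: Constant) -> str:
    """Non-Unicode constants only keep characters the code page can hold."""
    if c.unicode:
        return c.text
    return c.text.encode(CODE_PAGE, errors="replace").decode(CODE_PAGE)


def binary_sort_equal(a: str, b: str) -> bool:
    """Binary sort order: equal only if the encoded bytes match."""
    return a.encode(CODE_PAGE, errors="replace") == b.encode(CODE_PAGE, errors="replace")


def general_unicode_equal(a: str, b: str) -> bool:
    """Case-insensitive stand-in for the General Unicode collation."""
    return a.casefold() == b.casefold()


def constants_equal(left: Constant, right: Constant) -> bool:
    """Compare two constants using the rule for mixed operands."""
    a, b = value_of(left), value_of(right)
    if left.unicode or right.unicode:
        # The non-Unicode operand is promoted to Unicode, so the
        # Unicode collation decides the result.
        return general_unicode_equal(a, b)
    # Both operands are non-Unicode, so the sort order decides.
    return binary_sort_equal(a, b)


if __name__ == "__main__":
    print(constants_equal(Constant("ABC", False), Constant("abc", False)))  # False
    print(constants_equal(Constant("ABC", True), Constant("abc", False)))   # True
```

Promotion cannot restore characters already lost in the code-page conversion. In this model, N'Ω' compared with 'Ω' is False, because the unprefixed constant has already become '?'.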
