Blog Post: Collation: Definition

I had a recent run in with collation problems and it got me started reading about them. As I read I started to realize that for a seemingly simple subject there was a lot to it. So I thought I would go over some of what I found out. First things first of course. What is a collation? Collation is a property assigned to any character field that defines the sort and comparison rules for the field. The information I’m going over is mostly found in BOL under Collation and Unicode Support . It’s something like 10 pages long which just goes to show I wasn’t exaggerating about the depth of the subject. Options There are four options associated with collation. They are specified by appending them to the collation name along with a _. So Collation_CS_AS_KS_WS is case sensitive, accent sensitive, kana sensitive and width sensitive. Case sensitivity (CS/CI) – Will the sort/comparison take case (upper/lower) of the characters into account. If CS (case sensitive) it will and if CI (case insensitive) it will not. Accent sensitivity (AS/AI) – Will the sort/comparison take into account accented characters (a/ấ). If AS (accent sensitive) then it will. If AI (accent insensitive) it will not. Kana sensitivity (KS) – This option is specific to Japanese kana characters. There are two types of kana characters, Hiragana and Katakana. If KS is specified then the sort/comparison will be treat the kana characters differently. If it is not specified then they are treated the same. There is no KI. Width sensitivity (WS) – I find this an interesting option. This determines if the storage (one byte or two) is taken into account in the sort/comparison. There is no WI. If WS is included then the collation is width sensitive. If it isn’t included then the collation is not width sensitive Collation sets SQL Server supports three different collation sets. Windows collations , Binary collations, and SQL Server collations . I honestly found the differences rather confusing so I’ll let you read up on them yourself. Code pages In addition to the sort/comparison options, for non-Unicode data types (char/varchar/text) collation also determines the code page. Because non-Unicode types only have one byte per character they are more restricted in the number of characters available. The code page determines what character set is available. So for example Hebrew collations will use an entirely different set of characters than English ones. Unicode datatypes (nchar/nvarchar/ntext) don’t have the same problem. Because they have the additional byte per character they can hold a large enough number of possible characters that a code page isn’t really necessary. Over the next few weeks I’m going to do a number of different posts around the surprisingly deep subject of collation. Collation: Location Collation: Expression Collation: Correction Collation: Temporary objects & variables: Demonstration Collation: Confusion Filed under: Collation , Microsoft SQL Server , SQLServerPedia Syndication , T-SQL Tagged: code language , collation , language sql , microsoft sql server , T-SQL