MySQL Data Methods | 3 | WebReference

MySQL Data Methods | 3

MySQL Data Methods

Properties of Binary and Non-Binary Strings

String values fall into two general categories, binary and non-binary:

  • A binary string is a sequence of bytes.These bytes are interpreted without respect to any concept of character set. A binary string has no special comparison or sorting properties. Comparisons are done byte by byte based on numeric byte values. Trailing spaces are significant in comparisons.

  • A non-binary string is a sequence of characters. It is associated with a character set, which determines the allowable characters that may be used and how MySQL interprets the string contents. Character sets have one or more collating (sorting) orders.The particular collation used for a string determines the ordering of characters in the character set, which affects comparison operations.Trailing spaces are not significant in comparisons.The default character set and collation are latin1 and latin1_swedish_ci.

Character units vary in their storage requirements. A single-byte character set such as latin1 uses one byte per character, but there also are multi-byte character sets in which some or all characters require more than one byte. For example, both of the Unicode character sets available in MySQL are multi-byte. ucs2 is a double-byte character set in which each character requires two bytes. utf8 is a variable-length multi-byte character set with characters that take from one to three bytes.

To find out which character sets and collations are available in your server as it currently is configured, use these two statements:

As shown by the output from SHOW COLLATION, each collation is specific to a given character set, but a given character set might have several collations. Collation names usually consist of a character set name, a language, and an additional suffix. For example, utf8_icelandic_ci is a collation for the utf8 Unicode character set in which comparisons follow Icelandic sorting rules and characters are compared in case-insensitive fashion. Collation suffixes have the following meanings:

  • _ci indicates a case-insensitive collation.

  • _cs indicates a case-sensitive collation.

  • _bin indicates a binary collation.That is, comparisons are based on character code values without reference to any language. For this reason, _bin collation names do not include any language name. Examples: latin1_bin and utf8_bin.

The sorting properties for binary and non-binary strings differ as follows:

  • Binary strings are processed byte by byte in comparisons based solely on the numeric value of each byte. One implication of this property is that binary values appear to be case sensitive, but that actually is a side effect of the fact that uppercase and lowercase versions of a character have different numeric byte values.There isn't really any notion of lettercase for binary strings. Lettercase is a function of collation, which applies only to character (non-binary) strings.

  • Non-binary strings are processed character by character in comparisons, and the relative value of each character is determined by the collating sequence that is used for the character set. For most collations, uppercase and lowercase versions of a given letter have the same collating value, so non-binary string comparisons typically are not case sensitive. However, that is not true for case-sensitive or binary collations.

Because collations are used for comparison and sorting, they affect many operations:

  • Comparisons operators: , >=, >, and LIKE.

  • Sorting: ORDER BY, MIN(), and MAX().

  • Grouping: GROUP BY and DISTINCT.

To determine the character set or collation of a string, you can use the CHARSET() and COLLATION() functions.

Quoted string literals are interpreted according to the current server settings.The default character set and collation are latin1 and latin1_swedish_ci:

MySQL treats hexadecimal constants as binary strings by default:

Created: March 27, 2003
Revised: April 24, 2006