Understanding Character Sets and Collations in MySQL
Jul 11, 2025, 02:50 AM
The character set determines which characters can be stored in the database, and utf8mb4 is recommended; the collation affects comparison and sorting behavior. Commonly used character sets include latin1, utf8, and utf8mb4, of which only utf8mb4 supports emoji. Common collations include utf8mb4_unicode_ci (case insensitive), utf8mb4_bin (case sensitive), and utf8mb4_0900_ai_ci (introduced in MySQL 8.0, closer to modern language habits). For stored data, the more specific level takes precedence: column > table > database > server default; the connection character set separately governs how text is exchanged between client and server. The configuration methods are: the server level is set in my.cnf or my.ini, the database level is specified when the database is created, the table level is defined when the table is created, and the connection layer uses SET NAMES or mysqli_set_charset. When migrating old data, export it first and then import it. Collations can also be set per column; watch for index behavior changes and make sure the connection character set is configured.
MySQL's character set and collation directly affect data storage and query behavior, especially when dealing with multilingual content. Understanding what each one does and how to set it avoids many problems with garbled text and inaccurate query results.

What is a character set? What are the commonly used ones?
A character set is a set of symbols and encoding rules that the database uses to store text. Simply put, it determines which text can be stored in your database, such as English, Chinese, Japanese, and so on.

MySQL supports a variety of character sets. Common ones include:
- latin1: Suitable for Western European languages, but cannot store Chinese
- utf8mb4: Full Unicode support, including emoji
- utf8: Looks like UTF-8, but is actually a reduced version (an alias for utf8mb3) that supports at most 3 bytes per character and cannot store emoji
utf8mb4 is now recommended because it has the widest coverage and the best compatibility.
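The byte limits above can be checked outside the database. The following is a minimal Python sketch, not MySQL code: Python's utf-8 and latin-1 codecs stand in for the MySQL character sets, and the helper names (utf8_len, fits_utf8mb3, fits_latin1) are hypothetical names chosen for this illustration.

```python
# Illustration only: Python codecs stand in for MySQL character sets.

def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in full UTF-8."""
    return len(ch.encode("utf-8"))

def fits_utf8mb3(ch: str) -> bool:
    """The legacy 3-byte utf8 allows at most 3 bytes per character."""
    return utf8_len(ch) <= 3

def fits_latin1(ch: str) -> bool:
    """True if the character can be encoded in a single-byte Latin-1 codec."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Latin letter, accented letter, Chinese character, emoji
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(ch, utf8_len(ch), fits_utf8mb3(ch), fits_latin1(ch))
```

The emoji is the only sample that needs 4 bytes, which is why it fits utf8mb4 but not the 3-byte utf8.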

What is a collation?
A collation determines how characters behave when they are compared and sorted. For example, when you execute WHERE name = 'Tom' or ORDER BY name, the database decides whether case matters and whether accents affect the comparison based on the collation.
Common collations include:
- utf8mb4_unicode_ci: Based on the Unicode standard collation rules; ci means "case-insensitive"
- utf8mb4_bin: Compares by binary value, so it is both case sensitive and accent sensitive
- utf8mb4_0900_ai_ci: Introduced in MySQL 8.0 and more in line with modern language habits; ai means "accent-insensitive"
If your application needs case-sensitive or accent-sensitive comparisons, choose a collation that provides them (for example one ending in _bin); otherwise, a collation ending in _ci is generally enough.
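The difference between a binary and a case-insensitive comparison can be approximated in a few lines. This is a rough Python analogy under stated assumptions, not MySQL's actual collation algorithm: key_bin and key_ai_ci are hypothetical helpers that mimic the spirit of a _bin collation and of an accent-insensitive, case-insensitive collation.

```python
import unicodedata

def key_bin(s: str) -> bytes:
    """Rough analogue of a _bin collation: compare the encoded bytes."""
    return s.encode("utf-8")

def key_ai_ci(s: str) -> str:
    """Rough analogue of an accent- and case-insensitive comparison."""
    decomposed = unicodedata.normalize("NFD", s)
    # Drop combining marks (accents), then fold case.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

names = ["tom", "Tom", "TOM", "T\u00f3m"]

# WHERE name = 'Tom' under each comparison style
print([n for n in names if key_bin(n) == key_bin("Tom")])
print([n for n in names if key_ai_ci(n) == key_ai_ci("Tom")])

# ORDER BY name under each comparison style
print(sorted(["b", "A", "a", "B"], key=key_bin))
print(sorted(["b", "A", "a", "B"], key=key_ai_ci))
```

With the binary key only the exact string matches and uppercase letters sort before lowercase ones; with the insensitive key all four names match and letters sort together regardless of case.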
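What are the setting levels? How do you configure them?
MySQL's character set and collation can be set at multiple levels. For stored data, the more specific level takes precedence: column > table > database > server default. The connection character set is separate: it controls how the server interprets what the client sends and how results are encoded on the way back.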
- Server level: Edit the my.cnf or my.ini file and add the following to the [mysqld] section:
    character-set-server=utf8mb4
    collation-server=utf8mb4_unicode_ci
- Database level: Specified when creating the database:
    CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
- Table level: Added when creating the table:
    CREATE TABLE users (
      id INT PRIMARY KEY,
      name VARCHAR(100)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
- Connection layer: Ensure that the client also uses the correct character set when connecting, for example in PHP:
    mysqli_set_charset($conn, "utf8mb4");
  Or execute SQL after connecting:
    SET NAMES 'utf8mb4';
If these levels are inconsistent, you may run into situations where Chinese text is clearly present in the table, yet a query cannot find it.
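What a mismatch between levels looks like can be reproduced without a database. The sketch below is a Python illustration under an assumption: the latin-1 codec stands in for a connection or column that interprets UTF-8 bytes as a single-byte character set. It is not how MySQL performs conversion internally.

```python
original = "\u4e2d\u6587"  # the two-character Chinese word for "Chinese"

# The client sends UTF-8 bytes...
wire_bytes = original.encode("utf-8")

# ...but the other side interprets each byte as one single-byte character.
misread = wire_bytes.decode("latin-1")

print(len(original), len(wire_bytes), len(misread))
print(misread == original)  # the garbled form no longer equals the search term

# The bytes themselves are intact, so reading them with the right
# encoding recovers the original text.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == original)
```

Two characters become six garbled ones, so an equality search for the original string finds nothing even though the bytes are still there.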
Frequently Asked Questions and Notes
- Pay attention to conversion when migrating old data: Before migrating latin1 or utf8 data to utf8mb4, it is best to export it first and then import it. Do not modify the table character set directly, otherwise garbled text may result.
- Field-level collation can be set separately: Some fields may need to be case sensitive, and their collation can be set individually:
    ALTER TABLE users MODIFY name VARCHAR(100) COLLATE utf8mb4_bin;
- Pay attention to index behavior changes: Different collations affect how indexes match. For example, in a column with a _ci collation, 'abc' and 'ABC' are considered the same, which may cause unique index conflicts.
- Don't ignore the connection character set: Sometimes a page displays garbled text not because of a problem in the database, but because the connection character set is not set correctly. Make sure that SET NAMES utf8mb4 is executed every time you connect.
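The unique index conflict mentioned above can be simulated with a toy model. This is a hypothetical Python sketch, not MySQL's index implementation: the UniqueIndex class is invented for illustration, and its key function plays the role of the column's collation.

```python
class UniqueIndex:
    """Toy stand-in for a unique index; key_func plays the role of the collation."""

    def __init__(self, key_func):
        self._key_func = key_func
        self._seen = {}

    def insert(self, value: str) -> bool:
        """Return True if stored, False if it collides with an existing key."""
        key = self._key_func(value)
        if key in self._seen:
            return False
        self._seen[key] = value
        return True

# Case-insensitive comparison: 'abc' and 'ABC' map to the same key.
ci_index = UniqueIndex(str.casefold)
print(ci_index.insert("abc"), ci_index.insert("ABC"))

# Binary comparison: the two values stay distinct.
bin_index = UniqueIndex(lambda s: s.encode("utf-8"))
print(bin_index.insert("abc"), bin_index.insert("ABC"))
```

The second insert is rejected only under the case-insensitive key, which mirrors the duplicate-key error a _ci column with a unique index would raise.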
That covers the basics. Once you understand these levels and how to set them, most of the garbled text and inaccurate sorting problems encountered in daily development can be solved.
