PHP strlen vs mb_strlen for UTF-8 characters
Jul 10, 2025 12:59 PM
strlen is not suitable for counting UTF-8 characters because it counts bytes rather than characters: 1. For example, "你好" occupies 6 bytes but is only 2 characters; 2. mb_strlen needs the UTF-8 encoding specified to count correctly; 3. Omitting the encoding, or saving the file in an encoding other than UTF-8, may produce wrong results; 4. Choose strlen or mb_strlen according to what you actually need to measure; 5. Make sure the mbstring extension is loaded and declare the encoding explicitly.
When processing UTF-8-encoded strings, strlen and mb_strlen behave very differently in PHP. If you use only strlen to count Chinese, Japanese, or other multibyte characters, the results may confuse you.

Why is strlen not suitable for counting UTF-8 characters?
strlen is PHP's native string length function. It counts the number of bytes, not the number of characters. In UTF-8, a character takes 1 to 4 bytes. For example:
- English letters (such as "a"): 1 byte
- Chinese characters (such as "中"): 3 bytes
- Emoji and other characters outside the Basic Multilingual Plane: 4 bytes
So when you run:

echo strlen("你好"); // Output 6
The output is 6 because "你" and "好" each occupy 3 bytes, 6 bytes in total, rather than the intuitive count of two characters.
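To make the byte widths concrete, here is a minimal sketch that prints the byte count and character count for one sample character of each width. It assumes PHP 7.0 or later (for the "\u{...}" escapes, which always produce UTF-8 regardless of how the file is saved) and a loaded mbstring extension. The $samples and $counts names are chosen only for illustration.

```php
<?php
// Assumes PHP 7.0+ (for "\u{...}" escapes) and the mbstring extension.
$samples = [
    "U+0061"  => "a",          // ASCII letter
    "U+00E9"  => "\u{00E9}",   // é
    "U+4E2D"  => "\u{4E2D}",   // 中
    "U+1F600" => "\u{1F600}",  // 😀
];

$counts = [];
foreach ($samples as $label => $char) {
    $bytes = strlen($char);             // strlen() counts bytes
    $chars = mb_strlen($char, "UTF-8"); // mb_strlen() counts characters
    $counts[$label] = [$bytes, $chars];
    printf("%s: %d byte(s), %d character(s)\n", $label, $bytes, $chars);
}
// U+0061: 1 byte(s), 1 character(s)
// U+00E9: 2 byte(s), 1 character(s)
// U+4E2D: 3 byte(s), 1 character(s)
// U+1F600: 4 byte(s), 1 character(s)
```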
mb_strlen is the real way to count by character
Use mb_strlen and specify the encoding as UTF-8 to get the correct character count:

echo mb_strlen("你好", "UTF-8"); // Output 2
This function will identify the actual boundary of each character according to UTF-8 encoding rules, thereby accurately counting the number of characters.
A few things to note:
- If no encoding is specified, mb_strlen falls back to the internal encoding (the value of mb_internal_encoding()). In some environments this is not UTF-8 (for example GBK or ISO-8859-1), which produces a wrong count.
- Make sure your script file is saved as UTF-8, so that string literals in the source code are not garbled.
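The effect of the encoding argument can be seen in a short sketch. It assumes PHP 7.0 or later and a loaded mbstring extension; the variable names are illustrative. The same six bytes are counted as 2 characters when decoded as UTF-8 and as 6 characters when treated as a single-byte encoding.

```php
<?php
// Assumes PHP 7.0+ and the mbstring extension.
$text = "\u{4F60}\u{597D}"; // 你好, stored as 6 UTF-8 bytes

$asUtf8   = mb_strlen($text, "UTF-8");      // bytes decoded as UTF-8
$asLatin1 = mb_strlen($text, "ISO-8859-1"); // every byte treated as one character

echo $asUtf8, "\n";   // 2
echo $asLatin1, "\n"; // 6

// What mb_strlen() would use if the second argument were omitted.
// The value depends on how this environment is configured.
echo mb_internal_encoding(), "\n";

// mb_check_encoding() reports whether the bytes form valid UTF-8.
$validWhole     = mb_check_encoding($text, "UTF-8");      // true
$validTruncated = mb_check_encoding("\xE4\xBD", "UTF-8"); // false: incomplete 3-byte sequence
var_dump($validWhole, $validTruncated);
```

Checking the input with mb_check_encoding before counting is one way to catch strings that arrived in an unexpected encoding; whether that check belongs in a given code path is a design choice.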
How to choose in actual use?
- You only care about the number of bytes (such as limiting HTTP request size, database field length, etc.) → Use strlen
- You need to count the characters the user sees (such as limiting a user name to 10 characters, taking the first 5 Chinese characters, etc.) → Use mb_strlen
For example:
$username = "张三abc";
echo mb_strlen($username, "UTF-8"); // Output 5 (张, 三, a, b, c)
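Both limits often apply to the same value: a character limit for what the user types and a byte limit for where it is stored. The following is a rough sketch under those assumptions. The function validate_username, its default limits, and its error messages are hypothetical and not part of PHP or any framework. It assumes PHP 7.0 or later with mbstring loaded.

```php
<?php
// Hypothetical helper for illustration; the limits and messages are assumptions.
function validate_username(string $name, int $maxChars = 10, int $maxBytes = 30): array
{
    // Reject input that is not valid UTF-8 before counting anything.
    if (!mb_check_encoding($name, "UTF-8")) {
        return ["not valid UTF-8"];
    }

    $errors = [];
    if (mb_strlen($name, "UTF-8") > $maxChars) { // what the user sees
        $errors[] = "too many characters";
    }
    if (strlen($name) > $maxBytes) {             // what storage has to hold
        $errors[] = "too many bytes";
    }
    return $errors;
}

$ok      = validate_username("\u{5F20}\u{4E09}abc");       // 张三abc: 5 characters, 9 bytes
$tooLong = validate_username(str_repeat("\u{4E2D}", 11));  // 11 characters, 33 bytes

var_dump($ok);      // empty array: both limits satisfied
var_dump($tooLong); // both error messages
```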
Beware of these pitfalls
- Some PHP environments do not load the mbstring extension by default. In that case, calling mb_strlen fails with an undefined function error.
- Don't rely on the default encoding settings. Pass "UTF-8" explicitly in every call.
- When cutting strings, use mb_substr instead of substr. Otherwise a character can be cut in the middle, producing garbled output.
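The first and last pitfalls can be sketched together: check that mbstring is available, then compare a byte-based cut with a character-based cut. This assumes PHP 7.0 or later; the exception message and variable names are illustrative.

```php
<?php
// Fail early with a clear message if mbstring is missing (message is illustrative).
if (!extension_loaded("mbstring")) {
    throw new RuntimeException("The mbstring extension is not loaded.");
}

$text = "\u{4F60}\u{597D}\u{4E16}\u{754C}"; // 你好世界, 12 bytes in UTF-8

$byteCut = substr($text, 0, 4);             // 4 bytes: 你 plus the first byte of 好
$charCut = mb_substr($text, 0, 2, "UTF-8"); // 2 characters: 你好

var_dump(mb_check_encoding($byteCut, "UTF-8")); // bool(false): cut in the middle of a character
var_dump($charCut === "\u{4F60}\u{597D}");      // bool(true)
```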
Basically that's it.