Getting the character encoding of a text file in Java
Dec 23, 2019, 11:49 AM
1. Understanding character encoding:
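1. Java holds String data in memory as UTF-16. The default charset used when converting between bytes and characters depends on the platform (it is UTF-8 by default since JDK 18) and can be obtained with the following statement: Charset.defaultCharset();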
2. On Windows, the traditional default encoding of text files is ANSI, which means the system code page; on Chinese Windows that is GBK. For example, a new text document created with Notepad has traditionally been saved as ANSI by default.
3. Text documents saved with Notepad have four encoding options: ANSI, Unicode, Unicode big endian, and UTF-8. The two "Unicode" options are UTF-16 in little-endian and big-endian byte order respectively.
4. When reading a txt file we often do not know its encoding in advance, so a program is needed to determine the encoding of the file dynamically.
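ANSI : No signature bytes; on Chinese operating systems it is GBK (a superset of GB2312)
UTF-8 : With a BOM, the first three bytes are 0xEF 0xBB 0xBF. Without a BOM there is no signature. (The test code below also checks for 0xE5 0x9B 0x9E, which is simply the UTF-8 encoding of the character 回, so that check only matches files that begin with this character.)
UTF-16 big-endian (Notepad's "Unicode big endian"): The first two bytes are 0xFE 0xFF
UTF-16 little-endian (Notepad's "Unicode"): The first two bytes are 0xFF 0xFE
For example, a document saved as Unicode in Notepad starts with 0xFF 0xFE, so the program only needs to read the first few bytes and compare them against these signatures.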
5. Correspondence between Java encoding and text-file encoding:
When Java reads a text file with a charset that does not match the file's actual encoding, garbled characters appear, so the correct charset must be specified when reading. A file saved with a BOM records its encoding in its first bytes; a file without one (ANSI, or UTF-8 without a BOM) carries no such marker, and its encoding can only be guessed. The program therefore inspects the file header first and then reads the file with the detected charset, which avoids garbled characters.
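The signatures listed under item 4 are not arbitrary: each is the byte order mark U+FEFF encoded in the corresponding charset. The following is a minimal sketch that prints them; the class name BomSignatures and its helper methods are illustrative names, not part of the original article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSignatures {

    // Formats bytes as space-separated upper-case hex, e.g. "EF BB BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF converts Java's signed byte (-128..127) to 0..255
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes U+FEFF (the byte order mark) in the given charset.
    static String bomFor(Charset charset) {
        return hex("\uFEFF".getBytes(charset));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8    : " + bomFor(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println("UTF-16BE : " + bomFor(StandardCharsets.UTF_16BE)); // FE FF
        System.out.println("UTF-16LE : " + bomFor(StandardCharsets.UTF_16LE)); // FF FE
    }
}
```

Because Java bytes are signed, 0xFF, 0xFE, 0xEF, 0xBB and 0xBF appear as -1, -2, -17, -69 and -65 when compared without the `& 0xFF` mask.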
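To see why a mismatched charset produces garbled characters, here is a small sketch that decodes the same bytes twice. It assumes UTF-8 and ISO-8859-1 as the two charsets because every Java runtime is required to support them; a GBK file read as UTF-8 fails in the same way. The class name CharsetMismatchDemo and the decode helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // The charset Java uses when none is specified explicitly
        System.out.println("Default charset: " + Charset.defaultCharset());

        String original = "\u4e2d\u6587"; // the two characters 中文
        byte[] data = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(data, StandardCharsets.UTF_8);
        String wrong = decode(data, StandardCharsets.ISO_8859_1);

        System.out.println("bytes on disk       : " + data.length);
        System.out.println("matching charset    : " + right.equals(original));
        System.out.println("mismatched charset  : " + wrong.equals(original));
        System.out.println("chars after mismatch: " + wrong.length());
    }
}
```

The bytes never change; only the charset chosen for decoding determines whether the text comes back intact.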
2. For example:
There is a text file: test.txt
Test code:
/**
 * File: CharsetCodeTest.java
 * Purpose: test detection of a text file's character encoding
 */
import java.io.*;

public class CharsetCodeTest {

    public static void main(String[] args) throws Exception {
        String filePath = "test.txt";
        String content = readTxt(filePath);
        System.out.println(content);
    }

    // Reads the whole file as text, decoding it with the detected charset.
    public static String readTxt(String path) {
        StringBuilder content = new StringBuilder();
        try {
            String fileCharsetName = getFileCharsetName(path);
            System.out.println("File encoding: " + fileCharsetName);
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(new FileInputStream(path), fileCharsetName))) {
                String str;
                boolean isFirst = true;
                while (null != (str = br.readLine())) {
                    if (!isFirst)
                        content.append(System.lineSeparator());
                    else
                        isFirst = false;
                    content.append(str);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("Failed to read file: " + path);
        }
        return content.toString();
    }

    // Guesses the charset name from the first three bytes of the file.
    public static String getFileCharsetName(String fileName) throws IOException {
        byte[] head = new byte[3];
        try (InputStream inputStream = new FileInputStream(fileName)) {
            inputStream.read(head); // bytes not read stay 0 and match no signature
        }
        String charsetName = "GBK"; // or GB2312, i.e. ANSI on Chinese Windows
        // Java bytes are signed: 0xFF == -1, 0xFE == -2, 0xEF == -17, ...
        if (head[0] == -1 && head[1] == -2)       // FF FE: UTF-16 little-endian BOM
            charsetName = "UTF-16";               // the UTF-16 decoder picks the byte order from the BOM
        else if (head[0] == -2 && head[1] == -1)  // FE FF: UTF-16 big-endian BOM
            charsetName = "UTF-16";
        else if (head[0] == -27 && head[1] == -101 && head[2] == -98)
            charsetName = "UTF-8"; // E5 9B 9E: UTF-8 without BOM, only if the file starts with 回
        else if (head[0] == -17 && head[1] == -69 && head[2] == -65)
            charsetName = "UTF-8"; // EF BB BF: UTF-8 BOM (Java keeps it as a leading U+FEFF)
        return charsetName;
    }
}
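The test code can only recognise UTF-8 without a BOM when the file happens to start with one particular character. A more general approach, which is not part of the original article, is to check whether the bytes are well-formed UTF-8 and fall back to GBK otherwise. The following is a sketch under that assumption; the class name Utf8Check and the method isValidUtf8 are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {

    // Returns true if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently
        // substituting U+FFFD for malformed input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // The same two characters (中文) as GBK bytes
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

        System.out.println("UTF-8 bytes valid: " + isValidUtf8(utf8)); // true
        System.out.println("GBK bytes valid  : " + isValidUtf8(gbk));  // false
    }
}
```

This remains a heuristic. Pure ASCII content is valid in both encodings, some GBK byte sequences can also happen to be well-formed UTF-8, and the check should be run on the whole file, since a truncated prefix may cut a multi-byte sequence in half.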