国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Table of Contents
Create data script" >Create data script
Start testing " >Start testing
普通分頁(yè)查詢(xún)" >普通分頁(yè)查詢(xún)
相同偏移量,不同數(shù)據(jù)量" >相同偏移量,不同數(shù)據(jù)量
相同數(shù)據(jù)量,不同偏移量" >相同數(shù)據(jù)量,不同偏移量
如何優(yōu)化 " >如何優(yōu)化
優(yōu)化偏移量大問(wèn)題" >優(yōu)化偏移量大問(wèn)題
采用子查詢(xún)方式" >采用子查詢(xún)方式
采用 id 限定方式" >采用 id 限定方式
優(yōu)化數(shù)據(jù)量大問(wèn)題" >優(yōu)化數(shù)據(jù)量大問(wèn)題
Home Java JavaInterview questions Interviewer: How did you query 10 million data?

Interviewer: How did you query 10 million data?

Aug 15, 2023 pm 04:34 PM
java java interview

Recently I have been doing mock interviews and resume optimization for everyone, and I found that many people see what tens of millions of data and the like Interview questions will make you weak.

Maybe some people have never encountered a table with tens of millions of data, and they don’t know what will happen when querying tens of millions of data.

Today I will take you through a practical operation. This time it is based on MySQL 5.7.26 for testing

Preparing data

What should I do if I don’t have 10 million data?

Can’t you create it without data?

Is it difficult to create data?

The code creates 10 million?

That's impossible, it's too slow, and it might take a whole day. You can use database scripts to execute much faster.

Create table
CREATE TABLE `user_operation_log`  (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `ip` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `op_data` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr4` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr5` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr6` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr7` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr8` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr9` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr10` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr11` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `attr12` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
Create data script

Use batch Insert, the efficiency will be much faster, and every 1000 items will be committed. If the amount of data is too large, it will also lead to slow batch insertion efficiency.

DELIMITER ;;
CREATE PROCEDURE batch_insert_log()
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE userId INT DEFAULT 10000000;
 set @execSql = 'INSERT INTO `test`.`user_operation_log`(`user_id`, `ip`, `op_data`, `attr1`, `attr2`, `attr3`, `attr4`, `attr5`, `attr6`, `attr7`, `attr8`, `attr9`, `attr10`, `attr11`, `attr12`) VALUES';
 set @execData = '';
  WHILE i<=10000000 DO
   set @attr = "&#39;測(cè)試很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)很長(zhǎng)的屬性&#39;";
  set @execData = concat(@execData, "(", userId + i, ", &#39;10.0.69.175&#39;, &#39;用戶(hù)登錄操作&#39;", ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ")");
  if i % 1000 = 0
  then
     set @stmtSql = concat(@execSql, @execData,";");
    prepare stmt from @stmtSql;
    execute stmt;
    DEALLOCATE prepare stmt;
    commit;
    set @execData = "";
   else
     set @execData = concat(@execData, ",");
   end if;
  SET i=i+1;
  END WHILE;

END;;
DELIMITER ;

Start testing

My computer configuration is relatively low: win10 standard pressure i5, read and write about 500MB SSD

Due to the low configuration, I only prepared for this test 3148000 pieces of data were obtained, occupying 5G of disk (without indexing), and ran for 38 minutes. Students with good computer configuration can insert multiple points of data for testing

SELECT count(1) FROM `user_operation_log`

Return result: 3148000

The three query times are:

  • 14060 ms
  • 13755 ms
  • 13447 ms

普通分頁(yè)查詢(xún)

MySQL 支持 LIMIT 語(yǔ)句來(lái)選取指定的條數(shù)數(shù)據(jù), Oracle 可以使用 ROWNUM 來(lái)選取。

MySQL分頁(yè)查詢(xún)語(yǔ)法如下:

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
  • 第一個(gè)參數(shù)指定第一個(gè)返回記錄行的偏移量
  • 第二個(gè)參數(shù)指定返回記錄行的最大數(shù)目

下面我們開(kāi)始測(cè)試查詢(xún)結(jié)果:

SELECT * FROM `user_operation_log` LIMIT 10000, 10

查詢(xún)3次時(shí)間分別為:

  • 59 ms
  • 49 ms
  • 50 ms

這樣看起來(lái)速度還行,不過(guò)是本地?cái)?shù)據(jù)庫(kù),速度自然快點(diǎn)。

換個(gè)角度來(lái)測(cè)試

相同偏移量,不同數(shù)據(jù)量

SELECT * FROM `user_operation_log` LIMIT 10000, 10
SELECT * FROM `user_operation_log` LIMIT 10000, 100
SELECT * FROM `user_operation_log` LIMIT 10000, 1000
SELECT * FROM `user_operation_log` LIMIT 10000, 10000
SELECT * FROM `user_operation_log` LIMIT 10000, 100000
SELECT * FROM `user_operation_log` LIMIT 10000, 1000000

查詢(xún)時(shí)間如下:

##1000 items61ms74ms60ms10000 items164ms180ms217ms100000 Articles1609ms1741ms1764ms##1000000 Articles

從上面結(jié)果可以得出結(jié)束:數(shù)據(jù)量越大,花費(fèi)時(shí)間越長(zhǎng)

相同數(shù)據(jù)量,不同偏移量

SELECT * FROM `user_operation_log` LIMIT 100, 100
SELECT * FROM `user_operation_log` LIMIT 1000, 100
SELECT * FROM `user_operation_log` LIMIT 10000, 100
SELECT * FROM `user_operation_log` LIMIT 100000, 100
SELECT * FROM `user_operation_log` LIMIT 1000000, 100
QuantityFirst timeSecond timeThird time
10 items53ms52ms47ms
100 items50ms60ms55ms
16219ms16889ms17081ms
偏移量第一次第二次第三次
10036ms40ms36ms
100031ms38ms32ms
1000053ms48ms51ms
100000622ms576ms627ms
10000004891ms5076ms4856ms

從上面結(jié)果可以得出結(jié)束:偏移量越大,花費(fèi)時(shí)間越長(zhǎng)

SELECT * FROM `user_operation_log` LIMIT 100, 100
SELECT id, attr FROM `user_operation_log` LIMIT 100, 100

如何優(yōu)化

既然我們經(jīng)過(guò)上面一番的折騰,也得出了結(jié)論,針對(duì)上面兩個(gè)問(wèn)題:偏移大、數(shù)據(jù)量大,我們分別著手優(yōu)化

優(yōu)化偏移量大問(wèn)題

采用子查詢(xún)方式

我們可以先定位偏移位置的 id,然后再查詢(xún)數(shù)據(jù)

SELECT * FROM `user_operation_log` LIMIT 1000000, 10

SELECT id FROM `user_operation_log` LIMIT 1000000, 1

SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10

查詢(xún)結(jié)果如下:

sql花費(fèi)時(shí)間
第一條4818ms
第二條(無(wú)索引情況下)4329ms
第二條(有索引情況下)199ms
第三條(無(wú)索引情況下)4319ms
第三條(有索引情況下)201ms

從上面結(jié)果得出結(jié)論:

  • 第一條花費(fèi)的時(shí)間最大,第三條比第一條稍微好點(diǎn)
  • 子查詢(xún)使用索引速度更快

缺點(diǎn):只適用于id遞增的情況

id非遞增的情況可以使用以下寫(xiě)法,但這種缺點(diǎn)是分頁(yè)查詢(xún)只能放在子查詢(xún)里面

注意:某些 mysql 版本不支持在 in 子句中使用 limit,所以采用了多個(gè)嵌套select

SELECT * FROM `user_operation_log` WHERE id IN (SELECT t.id FROM (SELECT id FROM `user_operation_log` LIMIT 1000000, 10) AS t)
采用 id 限定方式

這種方法要求更高些,id必須是連續(xù)遞增,而且還得計(jì)算id的范圍,然后使用 between,sql如下

SELECT * FROM `user_operation_log` WHERE id between 1000000 AND 1000100 LIMIT 100

SELECT * FROM `user_operation_log` WHERE id >= 1000000 LIMIT 100

查詢(xún)結(jié)果如下:

sql花費(fèi)時(shí)間
第一條22ms
第二條21ms

從結(jié)果可以看出這種方式非???/p>

注意:這里的 LIMIT 是限制了條數(shù),沒(méi)有采用偏移量

優(yōu)化數(shù)據(jù)量大問(wèn)題

返回結(jié)果的數(shù)據(jù)量也會(huì)直接影響速度

SELECT * FROM `user_operation_log` LIMIT 1, 1000000

SELECT id FROM `user_operation_log` LIMIT 1, 1000000

SELECT id, user_id, ip, op_data, attr1, attr2, attr3, attr4, attr5, attr6, attr7, attr8, attr9, attr10, attr11, attr12 FROM `user_operation_log` LIMIT 1, 1000000

查詢(xún)結(jié)果如下:

sql 花費(fèi)時(shí)間
第一條 15676ms
第二條 7298ms
第三條 15960ms

It can be seen from the results that by reducing unnecessary columns, the query efficiency can also be significantly improved

The first and third query speeds are almost the same. At this time, you will definitely complain, then I Why write so many fields? Just * and you’re done.

Note that my MySQL server and client are on the same machine, so the query data is similar. Qualified students can test it. Test the client separately from MySQL

SELECT * Isn’t it delicious?

By the way, I would like to add why SELECT * is banned. Isn't it delicious because it's simple and mindless?

Two main points:

  1. Using "SELECT * " the database needs to parse more objects, fields, permissions, attributes and other related content. When the SQL statements are complex and there are many hard parses, it will put a heavy burden on the database.
  2. Increases network overhead, * Sometimes useless and large text fields such as log and IconMD5 are mistakenly added, and the data transmission size will increase geometrically. Especially since MySQL and the application are not on the same machine, this overhead is very obvious.

The above is the detailed content of Interviewer: How did you query 10 million data?. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undress AI Tool

Undress AI Tool

Undress images for free

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Java Optional example Java Optional example Jul 12, 2025 am 02:55 AM

Optional can clearly express intentions and reduce code noise for null judgments. 1. Optional.ofNullable is a common way to deal with null objects. For example, when taking values ??from maps, orElse can be used to provide default values, so that the logic is clearer and concise; 2. Use chain calls maps to achieve nested values ??to safely avoid NPE, and automatically terminate if any link is null and return the default value; 3. Filter can be used for conditional filtering, and subsequent operations will continue to be performed only if the conditions are met, otherwise it will jump directly to orElse, which is suitable for lightweight business judgment; 4. It is not recommended to overuse Optional, such as basic types or simple logic, which will increase complexity, and some scenarios will directly return to nu.

How to iterate over a Map in Java? How to iterate over a Map in Java? Jul 13, 2025 am 02:54 AM

There are three common methods to traverse Map in Java: 1. Use entrySet to obtain keys and values at the same time, which is suitable for most scenarios; 2. Use keySet or values to traverse keys or values respectively; 3. Use Java8's forEach to simplify the code structure. entrySet returns a Set set containing all key-value pairs, and each loop gets the Map.Entry object, suitable for frequent access to keys and values; if only keys or values are required, you can call keySet() or values() respectively, or you can get the value through map.get(key) when traversing the keys; Java 8 can use forEach((key,value)-&gt

How to fix java.io.NotSerializableException? How to fix java.io.NotSerializableException? Jul 12, 2025 am 03:07 AM

The core workaround for encountering java.io.NotSerializableException is to ensure that all classes that need to be serialized implement the Serializable interface and check the serialization support of nested objects. 1. Add implementsSerializable to the main class; 2. Ensure that the corresponding classes of custom fields in the class also implement Serializable; 3. Use transient to mark fields that do not need to be serialized; 4. Check the non-serialized types in collections or nested objects; 5. Check which class does not implement the interface; 6. Consider replacement design for classes that cannot be modified, such as saving key data or using serializable intermediate structures; 7. Consider modifying

Comparable vs Comparator in Java Comparable vs Comparator in Java Jul 13, 2025 am 02:31 AM

In Java, Comparable is used to define default sorting rules internally, and Comparator is used to define multiple sorting logic externally. 1.Comparable is an interface implemented by the class itself. It defines the natural order by rewriting the compareTo() method. It is suitable for classes with fixed and most commonly used sorting methods, such as String or Integer. 2. Comparator is an externally defined functional interface, implemented through the compare() method, suitable for situations where multiple sorting methods are required for the same class, the class source code cannot be modified, or the sorting logic is often changed. The difference between the two is that Comparable can only define a sorting logic and needs to modify the class itself, while Compar

How to parse JSON in Java? How to parse JSON in Java? Jul 11, 2025 am 02:18 AM

There are three common ways to parse JSON in Java: use Jackson, Gson, or org.json. 1. Jackson is suitable for most projects, with good performance and comprehensive functions, and supports conversion and annotation mapping between objects and JSON strings; 2. Gson is more suitable for Android projects or lightweight needs, and is simple to use but slightly inferior in handling complex structures and high-performance scenarios; 3.org.json is suitable for simple tasks or small scripts, and is not recommended for large projects because of its lack of flexibility and type safety. The choice should be decided based on actual needs.

Outlook shortcut for new email Outlook shortcut for new email Jul 11, 2025 am 03:25 AM

How to quickly create new emails in Outlook is as follows: 1. The desktop version uses the shortcut key Ctrl Shift M to directly pop up a new email window; 2. The web version can create new emails in one-click by creating a bookmark containing JavaScript (such as javascript:document.querySelector("divrole='button'").click()); 3. Use browser plug-ins (such as Vimium, CrxMouseGestures) to trigger the "New Mail" button; 4. Windows users can also select "New Mail" by right-clicking the Outlook icon of the taskbar

Java method references explained Java method references explained Jul 12, 2025 am 02:59 AM

Method reference is a way to simplify the writing of Lambda expressions in Java, making the code more concise. It is not a new syntax, but a shortcut to Lambda expressions introduced by Java 8, suitable for the context of functional interfaces. The core is to use existing methods directly as implementations of functional interfaces. For example, System.out::println is equivalent to s->System.out.println(s). There are four main forms of method reference: 1. Static method reference (ClassName::staticMethodName); 2. Instance method reference (binding to a specific object, instance::methodName); 3.

Understanding the Java volatile Keyword Usage Understanding the Java volatile Keyword Usage Jul 12, 2025 am 01:50 AM

The volatile keyword in Java often feels a bit abstract, especially for those who are new to concurrent programming. In fact, its function is very clear: to ensure the visibility of variables between multiple threads. That is to say, when one thread modifies the variable value modified by volatile, other threads can see the change immediately. It is not a master key to solve all concurrency problems, but it is very useful in some scenarios. Let’s take a look at how to use it and where it is suitable for use. When do you need to use volatile? The most typical application scenario is the status flag, such as controlling whether the thread continues to run: privatevolatilebooleanrunning=true;

See all articles