Performance Comparison of DISTINCT and GROUP BY in MySQL

Updated: 2021-02-12 14:00:00  Source: 動力節點

MySQL is one of the most popular relational databases today. A relational database stores data in separate tables instead of putting everything into one big repository, which improves speed and flexibility. In MySQL, DISTINCT removes duplicate rows, and GROUP BY also removes duplicates as a side effect of grouping. So when the goal is deduplication, which of the two keywords is more efficient? In this article we compare the performance of DISTINCT and GROUP BY.

1. Test procedure:

Prepare a test table:

    CREATE TABLE `test_test` (
        `id` int(11) NOT NULL auto_increment,
        `num` int(11) NOT NULL default '0',
        PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

Create a stored procedure that inserts 100,000 rows into the table:

    -- DELIMITER lets the mysql client accept the semicolons inside the body
    DELIMITER $$
    create procedure p_test(pa int)
    begin
        declare max_num int default 100000;  -- cap on the total number of rows
        declare row_cnt int default 0;       -- current number of rows
        declare i int default 0;
        declare rand_num int;
        select count(id) into row_cnt from test_test;
        while i < pa do
            if row_cnt < max_num then
                -- random integer between 0 and 100
                select cast(rand()*100 as unsigned) into rand_num;
                insert into test_test(num) values (rand_num);
                -- count the new row so the cap is checked against the current size
                set row_cnt = row_cnt + 1;
            end if;
            set i = i + 1;
        end while;
    end$$
    DELIMITER ;

Call the stored procedure to insert the data:

call p_test(100000);
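Before timing anything, it may be worth confirming what the procedure actually loaded. A minimal check, assuming the table was empty before `call p_test(100000)`; the column aliases are illustrative only. Because `num` comes from `cast(rand()*100 as unsigned)`, every value should lie between 0 and 100, so there can be at most 101 distinct values among the 100,000 rows, i.e. the data is heavily duplicated.

```sql
-- Sanity check of the generated data (aliases are illustrative)
select count(*)            as total_rows,       -- 100000 if the table started empty
       count(distinct num) as distinct_values,  -- at most 101 (values 0..100)
       min(num)            as min_num,
       max(num)            as max_num
from test_test;
```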

Start testing (without an index):

    select distinct num from test_test;
    select num from test_test group by num;

    [SQL] select distinct num from test_test;
    Affected rows: 0
    Time: 0.078s

    [SQL] select num from test_test group by num;
    Affected rows: 0
    Time: 0.031s

2. Create an index on the num column

ALTER TABLE `test_test` ADD INDEX `num_index` (`num`) ;
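To check whether the optimizer actually uses `num_index` for both statements, `EXPLAIN` can be run on each query. This is a sketch only; the exact plan text depends on the MySQL version and storage engine, so no expected output is given. If both plans list `num_index` in the `key` column, the two queries are being answered from the same index; when MySQL chooses a loose index scan, the `Extra` column may read `Using index for group-by` for both.

```sql
-- Inspect the execution plan of each query after adding num_index
explain select distinct num from test_test;
explain select num from test_test group by num;
```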

Query again:

    select distinct num from test_test;
    select num from test_test group by num;

    [SQL] select distinct num from test_test;
    Affected rows: 0
    Time: 0.000s

    [SQL] select num from test_test group by num;
    Affected rows: 0
    Time: 0.000s

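This time the durations are so small that the display, stuck at 0.000 s, can no longer resolve them.

So we switch to the mysql command-line client and use query profiling: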

    mysql> set profiling=1;
    mysql> select distinct(num) from test_test;
    mysql> select num from test_test group by num;
    mysql> show profiles;
    +----------+------------+----------------------------------------+
    | Query_ID | Duration   | Query                                  |
    +----------+------------+----------------------------------------+
    |        1 | 0.00072550 | select distinct(num) from test_test    |
    |        2 | 0.00071650 | select num from test_test group by num |
    +----------+------------+----------------------------------------+
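`show profiles` only reports a total duration per query. To see where that time is spent, `show profile` can break a single query down by execution stage. A sketch, assuming the query IDs 1 and 2 match the listing above; note that `SHOW PROFILE` and `SHOW PROFILES` are deprecated in recent MySQL versions in favor of the Performance Schema.

```sql
-- Per-stage breakdown for the two profiled queries (IDs from SHOW PROFILES)
show profile for query 1;
show profile for query 2;
-- Optionally add CPU usage per stage
show profile cpu for query 2;
-- Turn profiling off again for this session
set profiling = 0;
```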


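With the index, DISTINCT ran about 107 times faster than without it (0.078 s vs. 0.00072550 s).

With the index, GROUP BY ran about 43 times faster than without it (0.031 s vs. 0.00071650 s).

Because the unindexed times come from the GUI client and the indexed ones from SHOW PROFILES, these ratios are approximate.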
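The speed-up factors are simply the unindexed timings divided by the indexed `SHOW PROFILES` durations:

```latex
\frac{0.078\,\text{s}}{0.00072550\,\text{s}} \approx 107.5
\qquad
\frac{0.031\,\text{s}}{0.00071650\,\text{s}} \approx 43.3
```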

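Now compare DISTINCT and GROUP BY with each other:

With or without the index, GROUP BY was faster than DISTINCT in these runs, so we suggest preferring GROUP BY. With the index, however, the difference is only about 1% (0.71650 ms vs. 0.72550 ms), which is too small for a single run to distinguish reliably.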

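Note that the following explanation applies to Hive, not MySQL: by default, Hive translates DISTINCT into a single global reduce task for deduplication, so its parallelism is 1, whereas GROUP BY is translated into a grouped aggregation processed by several reduce tasks in parallel, each aggregating (deduplicating) the groups it receives. This does not account for the single-server MySQL timings above; the MySQL manual states that in most cases DISTINCT can be considered a special case of GROUP BY.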
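The same reasoning underlies a rewrite often used in Hive, where the single-reducer behavior is most commonly cited for `count(distinct ...)`: the distinct count is replaced by a GROUP BY subquery so that deduplication can be spread across several reducers. A sketch against the article's `test_test` table (the alias `t` is arbitrary); the statements are also valid MySQL, where both forms return the same number.

```sql
-- Single-stage form: in Hive this may run with one reducer
select count(distinct num) from test_test;

-- Two-stage form: GROUP BY deduplicates (in parallel in Hive),
-- then the outer query counts the resulting groups
select count(*) from (select num from test_test group by num) as t;
```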

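From the two experiments above, we draw the following conclusion: on tables with a high proportion of duplicate values (like the test table here, with at most 101 distinct values across 100,000 rows), DISTINCT can effectively improve query efficiency, while on tables with few duplicates it can seriously degrade it. So DISTINCT does not always reduce efficiency, but you should assess how much duplication the data contains beforehand.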
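One simple way to judge the duplication rate in advance is the ratio of distinct values to total rows. A minimal sketch; the alias `distinct_ratio` is illustrative, and where to draw the line between "high" and "low" duplication is left to the reader. For the test table above the ratio is at most about 0.001.

```sql
-- Fraction of rows carrying a distinct num value:
-- close to 0 means heavy duplication, close to 1 means almost no duplicates
select count(distinct num) / count(*) as distinct_ratio
from test_test;
```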
