• ADADADADAD

    MySQL和Oracle中的半连接测试总结(一)[ mysql数据库 ]

    mysql数据库 时间:2024-12-03 12:12:11

    作者:文/会员上传

    简介:

    SQL中的半连接在MySQL和Oracle还是存在一些差距,从测试的情况来看,Oracle的处理要更加全面。
    首先我们来看看在MySQL中怎么测试,对于MySQL方面的测试也参考了不少海翔兄的博客

    以下为本文的正文内容,内容仅供参考!本站为公益性网站,复制本文以及下载DOC文档全部免费。

    SQL中的半连接在MySQL和Oracle还是存在一些差距,从测试的情况来看,Oracle的处理要更加全面。
    首先我们来看看在MySQL中怎么测试,对于MySQL方面的测试也参考了不少海翔兄的博客文章,自己也完整的按照他的测试思路练习了一遍。
    首先创建下面的表:
    create table users(
    userid int(11) unsigned not null,
    user_name varchar(64) default null,
    primary key(userid)
    )engine=innodb default charset=UTF8;

    如果要插入数据,可以使用存储过程的方式。比如先插入20000条定制数据。
    delimiter $$
    drop procedure if exists proc_auto_insertdata$$
    create procedure proc_auto_insertdata()
    begin
    declare
    init_data integer default 1;
    while init_data<=20000 do
    insert into users values(init_data,concat('user' ,init_data));
    set init_data=init_data+1;
    end while;
    end$$
    delimiter ;
    call proc_auto_insertdata();
    初始化的过程会很快,最后一步即插入数据花费了近6秒的时间。
    [test]>source insert_proc.sql
    Query OK, 0 rows affected (0.12 sec)
    Query OK, 0 rows affected (0.00 sec)
    Query OK, 0 rows affected (0.00 sec)
    Query OK, 1 row affected (5.63 sec)

    然后我们使用如下的半连接查询数据,实际上执行了6秒左右。
    select u.userid,u.user_name from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
    1999 rows in set (6.36 sec)
    为了简化测试条件和查询结果,我们使用count的方式来完成对比测试。
    [test]>select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
    +-----------------+
    | count(u.userid) |
    +-----------------+
    | 1999 |
    +-----------------+
    1 row in set (6.38 sec)
    然后使用如下的方式来查看,当然看起来这种结构似乎有些多余,因为userid<-1的数据是不存在的。
    select count(u.userid) from users u
    where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
    +-----------------+
    | count(u.userid) |
    +-----------------+
    | 1999 |
    +-----------------+
    1 row in set (0.06 sec)
    但是效果却好很多。
    当然两种方式的执行计划差别很大。
    第一种效率较差的执行计划如下:
    [test]>explain select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
    +----+--------------+-------------+-------+---------------+---------+---------+------+-------+----------------------------------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+--------------+-------------+-------+---------------+---------+---------+------+-------+----------------------------------------------------+
    | 1 | SIMPLE || ALL | NULL | NULL | NULL | NULL | NULL | NULL |
    | 1 | SIMPLE | u | ALL | NULL | NULL | NULL | NULL | 19762 | Using where; Using join buffer (Block Nested Loop) |
    | 2 | MATERIALIZED | t | range | PRIMARY | PRIMARY | 4 | NULL | 1998 | Using where |
    +----+--------------+-------------+-------+---------------+---------+---------+------+-------+----------------------------------------------------+
    3 rows in set (0.02 sec)
    第二个执行效率较高的执行计划如下:
    [test]>explain select count(u.userid) from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
    +----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------+
    | 1 | PRIMARY | u | ALL | NULL | NULL | NULL | NULL | 19762 | Using where |
    | 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
    | 2 | SUBQUERY | t | range | PRIMARY | PRIMARY | 4 | NULL | 1998 | Using where |
    +----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------+
    3 rows in set (0.00 sec)

    我们在这个测试中先不解释更多的原理,只是对比说明。
    如果想得到更多的执行效率对比情况,可以使用show status 的方式。
    首先flush status
    [test]>flush status;
    Query OK, 0 rows affected (0.02 sec)
    然后执行语句如下:
    [test]>select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
    +-----------------+
    | count(u.userid) |
    +-----------------+
    | 1999 |
    +-----------------+
    1 row in set (6.22 sec)
    查看状态信息,关键词是Handler_read.
    [test]>show status like 'Handler_read%';
    +-----------------------+-------+
    | Variable_name | Value |
    +-----------------------+-------+
    | Handler_read_first | 2 |
    | Handler_read_key | 2 |
    | Handler_read_last | 0 |
    | Handler_read_next | 1999 |
    | Handler_read_prev | 0 |
    | Handler_read_rnd | 0 |
    | Handler_read_rnd_next | 22001 |
    +-----------------------+-------+
    7 rows in set (0.04 sec
    Handler_read_key这个参数的解释是根据键读一行的请求数。如果较高,说明查询和表的索引正确。
    Handler_read_next这个参数的解释是按照键顺序读下一行的请求数。如果用范围约束或如果执行索引扫描来查询索引列,该值增加。
    Handler_read_rnd_next这个参数的解释是在数据文件中读下一行的请求数。如果正进行大量的表扫描,该值较高。通常说明表索引不正确或写入的查询没有利用索引。
    这是一个count的操作,所以Handler_read_rnd_next的指标较高,这是一个范围查询,所以Handler_read_next 的值也是一个范围值。

    然后运行另外一个子查询,可以看到show status的结果如下:

    [test]>show status like 'Handler_read%';
    +-----------------------+-------+
    | Variable_name | Value |
    +-----------------------+-------+
    | Handler_read_first | 2 |
    | Handler_read_key | 20002 |
    | Handler_read_last | 0 |
    | Handler_read_next | 1999 |
    | Handler_read_prev | 0 |
    | Handler_read_rnd | 0 |
    | Handler_read_rnd_next | 20001 |
    +-----------------------+-------+
    7 rows in set (0.00 sec)
    可以和明显看到Handler_read_key这个值很高,根据参数的解释,说明查询和表的索引使用正确。也就意味着这种方式想必于第一种方案要好很多。
    而对于此,MySQL其实也有一些方式方法可以得到更细节的信息。
    一种就是explain extended的方式。
    [test]>explain extended select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
    。。。。
    3 rows in set, 1 warning (0.00 sec)
    然后show warnings就会看到详细的信息。
    [test]>show warnings;
    | Note | 1003 | /* select#1 */ select count(`test`.`u`.`userid`) AS `count(u.userid)` from `test`.`users` `u` semi join (`test`.`users` `t`) where ((`test`.`u`.`user_name` = ``.`user_name`) and (`test`.`t`.`userid` < 2000)) |
    1 row in set (0.00 sec)
    第二个语句的情况如下:
    [test]>explain extended select count(u.userid) from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
    3 rows in set, 1 warning (0.00 sec)

    [test]>show warnings;
    | Note | 1003 | /* select#1 */ select count(`test`.`u`.`userid`) AS `count(u.userid)` from `test`.`users` `u` where ((`test`.`u`.`user_name`,`test`.`u`.`user_name` in ((/* select#2 */ select `test`.`t`.`user_name` from `test`.`users` `t` where (`test`.`t`.`userid` < 2000) ), (`test`.`u`.`user_name` inonwhere ((`test`.`u`.`user_name` = `materialized-subquery`.`user_name`))))) or (`test`.`u`.`user_name`,`test`.`u`.`user_name` in ((/* select#3 */ select `test`.`t`.`user_name` from `test`.`users` `t` where 0 ), (`test`.`u`.`user_name` inonwhere ((`test`.`u`.`user_name` = `materialized-subquery`.`user_name`)))))) |
    1 row in set (0.00 sec)
    还有一种方式就是使用 optimizer_trace,在5.6可用
    set optimizer_trace="enabled=on";
    运行语句后,然后通过下面的查询得到trace信息。
    select *from information_schema.optimizer_trace\G

    当然可以看出半连接的表现其实还不够好,能不能选择性的关闭呢,有一个参数可以控制,即是optimizer_switch,其实我们也可以看看这个参数的情况。
    | optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on |
    关闭半连接的设置
    >set optimizer_switch="semijoin=off";
    Query OK, 0 rows affected (0.00 sec)
    再次运行原本执行时间近6秒的SQL,执行时间大大降低。
    [test]> select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
    +-----------------+
    | count(u.userid) |
    +-----------------+
    | 1999 |
    +-----------------+
    1 row in set (0.05 sec)
    执行第二个语句,情况如下:
    [test]>select count(u.userid) from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
    +-----------------+
    | count(u.userid) |
    +-----------------+
    | 1999 |
    +-----------------+
    1 row in set (0.07 sec)

    参考内容如下:
    http://dbaplus.cn/news-11-133-1.html
    http://blog.chinaunix.net/uid-16909016-id-214888.html

    而在Oracle中表现如何呢。
    创建测试表
    create table users(
    userid number not null,
    user_name varchar2(64) default null,
    primary key(userid)
    );
    初始化数据,其实一句SQL就可以搞定。递归查询可以换种方式来用,效果杠杠的。
    insert into users select level,'user'||level from dual connect by level<=20000;
    收集一下统计信息
    exec dbms_stats.gather_table_stats(ownname=>'CYDBA',tabname=>'USERS',cascade=>true);
    然后执行和MySQL中同样的语句。
    我们使用trace的方式来查看,我们仅列出trace的情况。
    SQL> set autot trace exp stat
    SQL> select u.userid,u.user_name from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
    1999 rows selected.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 771105466
    ---------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ---------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 2003 | 52078 | 21 (5)| 00:00:01 |
    |* 1 | HASH JOIN RIGHT SEMI | | 2003 | 52078 | 21 (5)| 00:00:01 |
    | 2 | TABLE ACCESS BY INDEX ROWID| USERS | 1999 | 25987 | 3 (0)| 00:00:01 |
    |* 3 | INDEX RANGE SCAN | SYS_C0042448 | 1999 | | 2 (0)| 00:00:01 |
    | 4 | TABLE ACCESS FULL | USERS | 20000 | 253K| 17 (0)| 00:00:01 |
    ---------------------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    1 - access("U"."USER_NAME"="T"."USER_NAME")
    3 - access("T"."USERID"<2000)
    Statistics
    ----------------------------------------------------------
    0 recursive calls
    0 db block gets
    205 consistent gets
    0 physical reads
    0 redo size
    52196 bytes sent via SQL*Net to client
    1983 bytes received via SQL*Net from client
    135 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1999 rows processed


    SQL> select u.userid,u.user_name from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
    1999 rows selected.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1012235795
    ------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 2004 | 94188 | 22 (5)| 00:00:01 |
    |* 1 | HASH JOIN | | 2004 | 94188 | 22 (5)| 00:00:01 |
    | 2 | VIEW | VW_NSO_1 | 2000 | 68000 | 4 (0)| 00:00:01 |
    | 3 | HASH UNIQUE | | 2000 | 26000 | 4 (25)| 00:00:01 |
    | 4 | UNION-ALL | | | | | |
    | 5 | TABLE ACCESS BY INDEX ROWID| USERS | 1 | 13 | 1 (0)| 00:00:01 |
    |* 6 | INDEX RANGE SCAN | SYS_C0042448 | 1 | | 1 (0)| 00:00:01 |
    | 7 | TABLE ACCESS BY INDEX ROWID| USERS | 1999 | 25987 | 3 (0)| 00:00:01 |
    |* 8 | INDEX RANGE SCAN | SYS_C0042448 | 1999 | | 2 (0)| 00:00:01 |
    | 9 | TABLE ACCESS FULL | USERS | 20000 | 253K| 17 (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    1 - access("U"."USER_NAME"="USER_NAME")
    6 - access("USERID"<(-1))
    8 - access("T"."USERID"<2000)
    Statistics
    ----------------------------------------------------------
    0 recursive calls
    0 db block gets
    207 consistent gets
    0 physical reads
    0 redo size
    52196 bytes sent via SQL*Net to client
    1983 bytes received via SQL*Net from client
    135 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1999 rows processed
    从Oracle的表现来看,支持的力度要全面很多。当然半连接的玩法还有很多,比如exists,这些限于篇幅暂没有展开。而且对于对比测试中的更多知识点分析,我们后期也会逐步补充。
    MySQL和Oracle中的半连接测试总结(一).docx

    将本文的Word文档下载到电脑

    推荐度:

    下载
    热门标签: mysql连接oracle