接前文
http://blog.itpub.net/29254281/viewspace-2150229/
前文中的算法想了一天半,终于在昨天晚上得出了正确的结果.
在我的环境中,耗时90s ,还有进一步优化的空间.
首选是生成 t1 和 t2的方式.
之前使用create table 方式 导致类型不对,
因为是临时作用的表,所以可以预先创建表结构
CREATE TABLE `t1` (
`roomid` int(11) NOT NULL DEFAULT '0',
`userid` bigint(20) NOT NULL DEFAULT '0',
`s` timestamp ,
`e` timestamp,
primary KEY (`roomid`,`userid`,`s`,`e`),
KEY (`roomid`,`s`,`e`)
) ;
CREATE TABLE `t2` (
`roomid` int(11) NOT NULL DEFAULT '0',
`userid` bigint(20) NOT NULL DEFAULT '0',
`s` timestamp ,
`e` timestamp,
primary KEY (`roomid`,`userid`,`s`,`e`),
KEY (`roomid`,`s`,`e`)
) ;
前文中的第一步可以封装为一个过程
DELIMITER$$CREATEDEFINER=`root`@`localhost`PROCEDURE`p`()BEGINinsertintot1selectdistinctroomid,userid,if(date(s)!=date(e)andid>1,date(s+intervalid-1date(s+intervalid-1date(e),e,date_format(s+intervalid-1'%Y-%m-%d23:59:59'))efrom(SELECTDISTINCTs.roomid,s.userid,s.s,(SELECTMIN(e)FROM(SELECTDISTINCTroomid,userid,roomendASeFROMu_room_logaWHERENOTEXISTS(SELECT*FROMu_room_logbWHEREa.roomid=b.roomidANDa.userid=b.useridANDa.roomend>=b.roomstartANDa.roomend<b.roomend))s2WHEREs2.e>s.sANDs.roomid=s2.roomidANDs.userid=s2.userid)ASeFROM(SELECTDISTINCTroomid,userid,roomstartASsFROMu_room_logaWHERENOTEXISTS(SELECT*FROMu_room_logbWHEREa.roomid=b.roomidANDa.userid=b.useridANDa.roomstart>b.roomstartANDa.roomstart<=b.roomend))s,(SELECTDISTINCTroomid,userid,roomendASeFROMu_room_logaWHERENOTEXISTS(SELECT*FROMu_room_logbWHEREa.roomid=b.roomidANDa.userid=b.useridANDa.roomend>=b.roomstartANDa.roomend<b.roomend))eWHEREs.roomid=e.roomidANDs.userid=e.userid)t1,numswherenums.id<=datediff(e,s)+1;END
函数修改如下
DELIMITER$$CREATEDEFINER=`root`@`localhost`FUNCTION`f`(pTimetimestamp)RETURNSint(11)BEGINdeclarepResultbigint;insertintot2selectdistinctv6.roomid,v6.userid,greatest(s,starttime)s,least(e,endtime)efrom(selectroomid,asDATETIME)starttime,asDATETIME)endtimefrom(select@dasstarttime,@d:=d,v3.roomid,v3.dendtimefrom(selectdistinctroomid,whennums.id=1thenv1swhennums.id=2thenv1ewhennums.id=3thenv2swhennums.id=4thenv2eenddfrom(selectv1.roomid,v1.sv1s,v1.ev1e,v2.sv2s,v2.ev2efromt1v1innerjoint1v2on((v1.sbetweenv2.sandv2.eorv1.ebetweenv2.sandv2.e)andv1.roomid=v2.roomid)wherev2.roomidin(selectdistinctroomidfromt1wheredate(s)=pTime)andv2.s>=pTimeandv2.s<(pTime+interval'1'and(v2.roomid,v2.userid,v2.s,v2.e)!=(v1.roomid,v1.userid,v1.s,v1.e))a,numswherenums.id<=4orderbyroomid,d)v3,(select@d:='')vars)v4wherestarttime!='')v5innerjoint1v6on(v5.starttimebetweenv6.sandv6.eandv5.endtimebetweenv6.sandv6.eandv5.roomid=v6.roomid);selectrow_count()intopResult;RETURNpResult;END
原来是针对每天每个房间处理,经过优化对某天的所有房间进行处理,批量的形式更快
另外在中间过程增加了类型转换,可以更好的利用索引
selectroomid,CAST(starttimeasDATETIME)starttime,CAST(endtimeasDATETIME)endtime
另外第7行 原来没有 distinct 可能导致bug
selectdistinctv6.roomid,v6.userid,greatest(s,starttime)s,least(e,endtime)e
调用时执行:
truncate table t1;
truncate table t2;
call p;
select f(s) from (
select distinct date(s) s from t1
) t
两步的执行时间:
今天优化了一天,从90s优化到25s以内,已经达到了预期。
我觉得在单线程环境,基本上已经达到最优.
如果还想优化到极致,第二步的函数执行,可以通过JAVA程序多线程一起跑,只要服务器CPU核数多,优化效果应该还是很明显的。