接前文:
需求描述和第一版解决方案(执行时间90秒)
http://blog.itpub.net/29254281/viewspace-2150229/
优化和修改bug的版本(执行时间25秒)
http://blog.itpub.net/29254281/viewspace-2150259/
我觉得在集合思维处理方式中,前文已经达到最优了.
如果放弃完全的集合处理思维,实际上还可以更加的优化.
前文的几个问题.
1.引入了过多的表结构.
2.写表本身也花费了时间.
3.前文按天批处理,粒度还是细了.应该一把批量全出最快.
4.前文计算最小间隔范围的部分,因为应用集合化思维,不好理解性能还差.
前文计算最小间隔范围的部分如下
selectroomid,asDATETIME)starttime,asDATETIME)endtimefrom( select@dasstarttime,@d:=d,v3.roomid,v3.dendtimefrom( selectdistinctroomid, whennums.id=1thenv1s whennums.id=2thenv1e whennums.id=3thenv2s whennums.id=4thenv2e enddfrom( selectv1.roomid,v1.sv1s,v1.ev1e,v2.sv2s,v2.ev2e fromt1v1 innerjoint1v2on((v1.sbetweenv2.sandv2.eorv1.ebetweenv2.sandv2.e)andv1.roomid=v2.roomid) wherev2.roomidin(selectdistinctroomidfromt1wheredate(s)=pTime) andv2.s>=pTimeandv2.s<(pTime+interval'1'and(v2.roomid,v2.userid,v2.s,v2.e)!=(v1.roomid,v1.userid,v1.s,v1.e) )a,numswherenums.id<=4 orderbyroomid,d )v3,(select@d:='')vars )v4wherestarttime!=''
该部分使用集合处理方式,不好理解性能还差.
这块可以通过游标写临时表轻易解决。
本质上最小范围就是
每天每个房间每个记录的开始时间和结束时间都扣出来作为一行 排序。
然后找到每个时间最近的下一个时间,作为最小时间范围.
如果使用游标,遍历一遍即可.
DELIMITER$$CREATEDEFINER=`root`@`localhost`PROCEDURE`p`() BEGIN declaredoneintdefault0; declarev_roomidbigint; declarev_starttimestamp; declarev_endtimestamp; declarecur_testCURSORforselectroomid,s,efromt1;DECLARECONTINUEHANDLERFORNOTFOUNDSETdone=1;droptableifexistst1; droptableifexiststmp_time_point; CREATEtemporaryTABLE`t1`( `roomid`int(11)NOTNULLDEFAULT'0', `userid`bigint(20)NOTNULLDEFAULT'0', `s`timestampNOTNULLDEFAULTONUPDATEtimestampNOTNULLDEFAULT'0000-00-0000:00:00', primaryKEY`roomid`(`roomid`,`s`,`e`,`userid`) )ENGINE=InnoDB;createtemporarytabletmp_time_point( roomidbigint, timepointtimestamp, primarykey(roomid,timepoint) )engine=memory; insertintot1 selectdistinct roomid, userid, if(date(s)!=date(e)andid>1,date(s+intervalid-1date(s+intervalid-1date(e),e,date_format(s+intervalid-1'%Y-%m-%d23:59:59'))e from( SELECTDISTINCTs.roomid,s.userid,s.s,( SELECTMIN(e) FROM(SELECTDISTINCTroomid,userid,roomendASe FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomend>=b.roomstart ANDa.roomend<b.roomend) )s2 WHEREs2.e>s.s ANDs.roomid=s2.roomid ANDs.userid=s2.userid )ASe FROM(SELECTDISTINCTroomid,userid,roomstartASs FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomstart>b.roomstart ANDa.roomstart<=b.roomend) )s,(SELECTDISTINCTroomid,userid,roomendASe FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomend>=b.roomstart ANDa.roomend<b.roomend) )e WHEREs.roomid=e.roomid ANDs.userid=e.userid)t1, nums wherenums.id<=datediff(e,s)+1 ;opencur_test; repeat fetchcur_testintov_roomid,v_start,v_end; ifdone!=1then insertignoreintotmp_time_point(roomid,timepoint)values(v_roomid,v_start); insertignoreintotmp_time_point(roomid,timepoint)values(v_roomid,v_end); endif; untildoneendrepeat; closecur_test; selectroomid,date(s)dt,round(second,s,e))/60)ts,max(c)cfrom( selectroomid,s,e,distinctuserid)cfrom( selectdistinctv6.roomid,v6.userid,greatest(s,starttime)s,least(e,endtime)e from( selectdistinctroomid,asDATETIME)starttime,asDATETIME)endtimefrom( select if(@roomid=roomid,@d,'')asstarttime,@d:=timepoint,@roomid:=roomid,p.roomid,p.timepointendtime fromtmp_time_pointp,(select@d:='',@roomid:=-1)vars orderbyroomid,timepoint )v4wherestarttime!=''anddate(starttime)=date(endtime) )v5innerjoint1v6on(v5.starttimebetweenv6.sandv6.eandv5.endtimebetweenv6.sandv6.eandv5.roomid=v6.roomid) )v6groupbyroomid,s,ehavingdistinctuserid)>1 )v7groupbyroomid,date(s); END
都内聚到一个过程之后,不需要创建额外的普通表,直接在过程中创建临时表.实现高内聚,低耦合.
call p
过程返回的结果即为最终结果.
三次测试耗时均低于 10.3秒