此sql语句如何优化。
有表多个同结构表命名为dev_log_table1,dev_log_table2,dev_log_table3,结构详细如下:
+---------------------+------+----------+-----------+-----------+------------+----------+---------+
| time | type | subtype | src_ip | dst_ip | log_repeat | dev_name | dev_pri |
+---------------------+------+----------+-----------+-----------+------------+----------+---------+
| 2009-06-29 16:25:31 | ips | IPSEVENT | 100.1.1.2 | 200.1.1.2 | 1 | mjl.cc | 0 |
| 2009-06-29 16:25:32 | ips | IPSEVENT | 100.1.1.2 | 200.1.4.2 | 4 | mjl.cc | 0 |
| 2009-06-29 16:25:33 | ips | IPSEVENT | 100.1.1.2 | 200.1.1.2 | 1 | mjl.cc | 0 |
| 2009-06-29 16:25:33 | ips | IPSEVENT | 100.3.1.2 | 200.1.0.2 | 4 | mjl.cc | 0 |
| 2009-06-29 16:25:33 | ips | IPSEVENT | 100.1.1.2 | 200.1.1.2 | 1 | mjl.cc | 0 |
+---------------------+------+----------+-----------+-----------+------------+----------+---------+
(字段意思分别为,时间,类型,子类型,源IP,目的IP,事件发生次数,设备名,设备优先级)
需求:求出一段时间内,某个设备,发生的事件总数,事件是由多少个源IP,多少目的IP产生的。
如果用:SELECT distinct src_ip ,distinct dst_ip, sum(log_repeat) from (满足条件后的union多表)
当 union多表得出的数据很大时,C盘马上暴满。所以没有使用。
下面是我写的SQL,在三个表1千万条数据中查需要7,8分钟。显然用户受不了。大家帮忙看一下如何优化,或如果写。
SELECT
(
SELECT COUNT(distinct src_ip) SRCCOUNT
FROM
(
(
SELECT distinct src_ip
FROM dev_log_table3
WHERE type='ips'
AND (time>=' 2009-06-29 00:00:17' AND time<'2009-06-29 23:41:17')
AND subtype='ipsevent'
AND ( (dev_pri = 0 AND dev_name = 'mjl.cc'))
)
UNION ALL
(
SELECT distinct src_ip
FROM dev_log_table2
WHERE type='ips'
AND (time>=' 2009-06-29 00:00:17' AND time<'2009-06-29 23:41:17')
AND subtype='ipsevent'
AND ( (dev_pri = 0 AND dev_name = 'mjl.cc'))
)
UNION ALL
(
SELECT distinct src_ip
FROM dev_log_table1
WHERE type='ips'
AND (time>=' 2009-06-29 00:00:17' AND time<'2009-06-29 23:41:17')
AND subtype='ipsevent'
AND ( (dev_pri = 0 AND dev_name = 'mjl.cc'))
)
) temptb ) SRCCOUNT, COUNT(distinct dst_ip) DSTCOUNT, SUM(log_repeat) SUMEVENTCOUNT, CEIL(SUM(log_repeat) / 23) AVGCOUNT FROM
(
(
SELECT dst_ip, sum(log_repeat) log_repeat
FROM dev_log_table3
WHERE type='ips'
AND (time>=' 2009-06-29 00:00:17' AND time<'2009-06-29 23:41:17')
AND subtype='ipsevent'
AND ( (dev_pri = 0 AND dev_name = 'mjl.cc'))
GROUP BY dst_ip
)
UNION ALL
(
SELECT dst_ip, sum(log_repeat) log_repeat
FROM dev_log_table2
WHERE type='ips'
AND (time>=' 2009-06-29 00:00:17' AND time<'2009-06-29 23:41:17')
AND subtype='ipsevent'
AND ( (dev_pri = 0 AND dev_name = 'mjl.cc'))
GROUP BY dst_ip
)
UNION ALL
(
SELECT dst_ip, sum(log_repeat) log_repeat
FROM dev_log_table1
WHERE type='ips'
AND (time>=' 2009-06-29 00:00:17' AND time<'2009-06-29 23:41:17')
AND subtype='ipsevent'
AND ( (dev_pri = 0 AND dev_name = 'mjl.cc'))
GROUP BY dst_ip
)
) temptb
千万级数据应该考虑自开发了,或者迁移到非关系数据库,鉴于如此简单的结构,用关系数据库很浪费性能。
也许这则新闻能给你和你的公司以启示:http://www.cnbeta.com/articles/88079.htm
或者,如果你的需求比较单一的话,按日期分表,或者直接在文件系统中按日期划分目录存储数据,都会比这样用起来好得多吧。
没太看懂,不过 获取SRCCOUNT的方法好像有优化空间。
还有就是看执行计划,然后分段分析了。