An Anatomy of Common MySQL Production Failures

Types of failures

  1. The application cannot get a connection from the pool
  2. Slow database response
  3. SQL
  4. Server load
  5. SWAP
  6. Tables disappear
  7. MySQL crash
  8. Host hang

Observe your system

MySQL

– Active processes (process list)

– Log files (slow log, alert log, general query log, binlog)

– Status variables (Com_select, Com_insert, etc.)

– InnoDB (physical reads, logical reads, InnoDB status)

– Configuration parameters

– Stack trace (plus source code)
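Status variables such as Com_select and Com_insert are cumulative counters, so a single reading says little; the useful number is the rate between two samples. The sketch below shows that calculation. The function name `counter_rates` and the sample values are illustrative assumptions, not taken from the original slides. Innodb_buffer_pool_reads and Innodb_buffer_pool_read_requests are the InnoDB counters for physical and logical reads respectively.

```python
def counter_rates(before, after, interval_seconds):
    """Return per-second rates for counters present in both snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, new_value in after.items():
        if name in before:
            rates[name] = (new_value - before[name]) / interval_seconds
    return rates


# Hypothetical values, shaped like two SHOW GLOBAL STATUS samples taken 10 s apart.
first = {
    "Com_select": 1000,
    "Com_insert": 200,
    "Innodb_buffer_pool_reads": 50,           # physical reads
    "Innodb_buffer_pool_read_requests": 9000,  # logical reads
}
second = {
    "Com_select": 1500,
    "Com_insert": 230,
    "Innodb_buffer_pool_reads": 90,
    "Innodb_buffer_pool_read_requests": 19000,
}

rates = counter_rates(first, second, 10)
for name, per_second in sorted(rates.items()):
    print(f"{name}: {per_second:.1f}/s")
```

A sudden rise in the physical-read rate while the Com_select rate stays flat would suggest that the same workload has started missing the buffer pool.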

SQL

– Execution plan, EXPLAIN

OS

– Memory, SWAP, /proc/meminfo

– CPU, load, ps

– I/O (disk, network)

• iostat
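For the memory and SWAP check, /proc/meminfo can be read directly. The following is a minimal sketch that parses the file's "Name: value kB" lines and derives swap in use as SwapTotal minus SwapFree. The helper names and the sample text are assumptions for illustration; on a real host the text would come from reading /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap in use, in kB, from the SwapTotal and SwapFree fields."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


# Hypothetical sample in the format of /proc/meminfo.
SAMPLE = """MemTotal:       16384000 kB
MemFree:          204800 kB
SwapTotal:       4096000 kB
SwapFree:        3072000 kB
"""

info = parse_meminfo(SAMPLE)
print("swap used (kB):", swap_used_kb(info))
```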

Profile

– Oprofile

– gprof


Case 1: System XXX reports that its connection pool is full

iostat


orzdba


slowlog

What's in the slow log?


mk-query-digest


mk-query-digest: a comprehensive analysis of the slow log
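The core idea behind a slow log digest is to group statements that differ only in their literal values and rank the groups by total time. The sketch below is a heavily simplified illustration of that idea, not a reimplementation of mk-query-digest: it assumes one statement per line, handles only quoted strings and integer literals, and the log excerpt is hypothetical.

```python
import re
from collections import defaultdict

QUERY_TIME = re.compile(r"^# Query_time: ([0-9.]+)")


def fingerprint(sql):
    """Normalize a statement by replacing literals with '?'."""
    sql = sql.strip().rstrip(";").lower()
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)   # standalone integers
    return re.sub(r"\s+", " ", sql)


def digest(log_text):
    """Return [(fingerprint, [count, total_seconds])], slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    current_time = None
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            current_time = float(match.group(1))
        elif line.startswith("#") or line.upper().startswith(("SET TIMESTAMP", "USE ")):
            continue  # header lines and session bookkeeping
        elif line.strip() and current_time is not None:
            entry = totals[fingerprint(line)]
            entry[0] += 1
            entry[1] += current_time
            current_time = None
    return sorted(totals.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical excerpt in the slow log format.
SAMPLE_LOG = """# Time: 111007 10:00:01
# User@Host: app[app] @  [192.168.0.1]
# Query_time: 12.5  Lock_time: 0.0 Rows_sent: 1  Rows_examined: 500000
SET timestamp=1317952801;
select * from t where id = 1;
# Time: 111007 10:00:05
# User@Host: app[app] @  [192.168.0.1]
# Query_time: 7.5  Lock_time: 0.0 Rows_sent: 1  Rows_examined: 400000
SET timestamp=1317952805;
select * from t where id = 2;
# Time: 111007 10:00:09
# User@Host: app[app] @  [192.168.0.1]
# Query_time: 1.0  Lock_time: 0.0 Rows_sent: 0  Rows_examined: 1
SET timestamp=1317952809;
update t set c = 'x' where id = 3;
"""

report = digest(SAMPLE_LOG)
for pattern, (count, total) in report:
    print(f"{total:8.1f}s  {count:4d}x  {pattern}")
```

Ranking by total time rather than by the single slowest statement is what tends to surface a moderately slow query that runs thousands of times.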


explain

View the execution plan

– A poor index was chosen


Which SQL statements are running?


Slow log

– Set global long_query_time=0

General log

Binlog

– For DML, parse the binlog with mysqlbinlog

Processlist

– If some query is really slow

tcpdump

– tcpdump + mk-query-digest


Case 2: Many MySQL threads are stuck


Processlist

Id: 1842782 User: provide Host: 192.168.0.1:59068 db: provide Command: Query

Time: 2326

State: Waiting for table

Info: update table_xxxx set sold=sold+1, money=money+39800, Gmt_create=now() where xxxx_id=1 and day='2011-10-07 00:00:00


Id: 1657130 User: provide Host: 192.168.0.2:40093 db: provide Command: Query

Time: 184551

State: Sending data

Info: select xxxx_id, sum(sold) as sold from table_xxxx where xxxx_id in (select xxxx_id from table_xxxx where Gmt_create >= '2011-10-05 08:59:00') group by xxxx_id


1044 system user Connect 27406 Flushing tables FLUSH TABLES


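Processlist analysis

– Which is the cause, and which is the effect?

The system user executed FLUSH TABLES

– Who is the system user? The MySQL replication threads (I/O thread, SQL thread)

Binlog

Who executed FLUSH TABLES first?

– A person, manually?

– The application? It does not have the privilege

– A scheduled job, such as a backup

XtraBackup executes FLUSH TABLES WITH READ LOCK, which is not written to the binlog

• In theory mysqldump does not execute FLUSH TABLES, but what if there is a bug?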

( http://bugs.mysql.com/bug.php?id=35157 )
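The cause-and-effect question can be approached by ordering the threads by how long they have been in their current state. FLUSH TABLES has to wait for statements that were already running to finish with their tables, and statements that arrive afterwards wait behind the flush. The sketch below applies that heuristic to the three processlist rows from this case; the `Thread` class and `blocking_chain` function are illustrative assumptions, and a real processlist would also need idle (Sleep) sessions filtered out first.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    thread_id: int
    user: str
    time_seconds: int
    state: str
    info: str


def blocking_chain(threads):
    """Split threads into (blockers, flush, victims) around the oldest FLUSH TABLES.

    Heuristic only: anything running longer than the flush is a suspected
    blocker; anything younger in 'Waiting for table' is a suspected victim.
    """
    flushers = [t for t in threads if t.info.upper().startswith("FLUSH TABLES")]
    if not flushers:
        return None
    flush = max(flushers, key=lambda t: t.time_seconds)
    blockers = [t for t in threads if t.time_seconds > flush.time_seconds]
    victims = [
        t for t in threads
        if t.time_seconds < flush.time_seconds and t.state == "Waiting for table"
    ]
    return blockers, flush, victims


# The three rows shown in the processlist for this case (statements abbreviated).
threads = [
    Thread(1842782, "provide", 2326, "Waiting for table", "update table_xxxx set sold=sold+1 ..."),
    Thread(1657130, "provide", 184551, "Sending data", "select xxxx_id, sum(sold) as sold from table_xxxx ..."),
    Thread(1044, "system user", 27406, "Flushing tables", "FLUSH TABLES"),
]

blockers, flush, victims = blocking_chain(threads)
print("suspected blockers:", [t.thread_id for t in blockers])
print("flush thread:", flush.thread_id)
print("suspected victims:", [t.thread_id for t in victims])
```

Read this way, the long-running SELECT (184551 s) predates the FLUSH TABLES (27406 s), which in turn predates the blocked UPDATE (2326 s).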

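Case 3: High server load

Investigating the problem

– No obvious anomalies at the SQL level

– No business changes, no releases

– No significant change in call volume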


Iostat

– r/s, w/s

– await, svctm

– avgrq-sz
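These iostat columns are all derived from deltas of the cumulative counters in /proc/diskstats. The sketch below shows the arithmetic for one device over one interval. The dictionary keys and sample numbers are assumptions chosen for readability; they stand in for the reads completed, writes completed, sectors, and millisecond fields of /proc/diskstats. Note that later sysstat releases warn that svctm is not trustworthy, so it is best read alongside await rather than on its own.

```python
def iostat_metrics(prev, curr, interval_seconds):
    """Derive iostat-style metrics from two cumulative counter snapshots."""
    d = {key: curr[key] - prev[key] for key in prev}
    ios = d["reads"] + d["writes"]
    return {
        "r/s": d["reads"] / interval_seconds,
        "w/s": d["writes"] / interval_seconds,
        # Average time per request including queueing, in ms.
        "await": (d["read_ms"] + d["write_ms"]) / ios if ios else 0.0,
        # Device busy time divided by requests completed, in ms.
        "svctm": d["io_ms"] / ios if ios else 0.0,
        # Average request size, in sectors.
        "avgrq-sz": (d["read_sectors"] + d["write_sectors"]) / ios if ios else 0.0,
        "%util": 100.0 * d["io_ms"] / (interval_seconds * 1000.0),
    }


# Hypothetical snapshots taken 10 seconds apart.
prev = {"reads": 0, "writes": 0, "read_ms": 0, "write_ms": 0,
        "io_ms": 0, "read_sectors": 0, "write_sectors": 0}
curr = {"reads": 1000, "writes": 1000, "read_ms": 8000, "write_ms": 12000,
        "io_ms": 9000, "read_sectors": 16000, "write_sectors": 48000}

metrics = iostat_metrics(prev, curr, 10)
for name, value in metrics.items():
    print(f"{name:9s} {value:8.1f}")
```

A large gap between await and svctm means requests spend most of their time queued rather than being serviced, which is the kind of symptom that points towards the I/O scheduler.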


Blktrace, btt


I/O scheduler algorithm

– cfq -> deadline
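On Linux the scheduler for a block device is exposed in /sys/block/<device>/queue/scheduler, which lists the available schedulers with the active one in square brackets; writing a scheduler name to that file (as root) switches it. The sketch below only parses the file's contents, to confirm what is active before and after a change. The function names and the sample strings are illustrative assumptions.

```python
import re


def active_scheduler(contents):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", contents)
    return match.group(1) if match else None


def available_schedulers(contents):
    """Return every scheduler name listed, active or not."""
    return [name.strip("[]") for name in contents.split()]


# Hypothetical file contents before and after switching cfq -> deadline.
before = "noop anticipatory deadline [cfq]\n"
after = "noop anticipatory [deadline] cfq\n"

print(active_scheduler(before), "->", active_scheduler(after))
print(available_schedulers(before))
```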


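Case 4: Table lost during DDL

alert.log fills with errors

– After more than ten minutes of this, the table was lost.

• Several hundred threads were blocked in "Opening tables", none of them on the table undergoing the DDL

alert.log at the time the table was lost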


pstack: master thread


pstack: ALTER TABLE





Case 5: MyISAM


Orzdba


vmstat

strace mysqld

Oprofile global


Oprofile mysqld


pstack


Summary



Page updated: 2024-04-21
