Azkaban Notes
Azkaban Usage Notes
https://github.com/azkaban/azkaban
Azkaban: an open-source job scheduler (usage guide)
https://www.jianshu.com/p/484564beda1d
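Misfire policy for Azkaban scheduled jobs
According to the QuartzScheduler source in the azkaban repository on GitHub, the misfire policy used is withMisfireHandlingInstructionFireAndProceed. For a cron trigger this is the same behavior that Quartz's default policy produces: fire once immediately, then continue firing on the cron schedule.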
// TODO kunkun-tang: Need management code to deal with different misfire policy
final Trigger trigger = TriggerBuilder
    .newTrigger()
    .withSchedule(
        CronScheduleBuilder.cronSchedule(cronExpression)
            .withMisfireHandlingInstructionFireAndProceed()
        // .withMisfireHandlingInstructionDoNothing()
        // .withMisfireHandlingInstructionIgnoreMisfires()
    )
    .build();
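To make the difference between the three builder options in the snippet above concrete, here is a minimal model in plain Java. It is an illustration only and not Quartz code: the class MisfireSketch, the enum Policy, and the method immediateRuns are hypothetical names, and the model only counts how many runs happen right away once the scheduler notices that some firings were missed.

```java
/** Hypothetical model of the three cron misfire options; this is not Quartz code. */
final class MisfireSketch {

    enum Policy {
        FIRE_AND_PROCEED,  // models withMisfireHandlingInstructionFireAndProceed()
        DO_NOTHING,        // models withMisfireHandlingInstructionDoNothing()
        IGNORE_MISFIRES    // models withMisfireHandlingInstructionIgnoreMisfires()
    }

    private MisfireSketch() {}

    /**
     * Returns how many catch-up runs start immediately after the scheduler
     * recovers, given how many scheduled firings were missed. In every case
     * the regular cron schedule resumes afterwards.
     */
    static int immediateRuns(Policy policy, int missedFirings) {
        if (missedFirings <= 0) {
            return 0; // nothing was missed, so there is nothing to catch up on
        }
        switch (policy) {
            case FIRE_AND_PROCEED:
                return 1;             // one run now, regardless of how many were missed
            case DO_NOTHING:
                return 0;             // skip the missed firings entirely
            case IGNORE_MISFIRES:
                return missedFirings; // replay every missed firing as soon as possible
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```

For example, if the scheduler was down across three scheduled firings, this model gives one immediate run under the policy Azkaban uses, none under DoNothing, and three under IgnoreMisfires.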
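Related database tables
executors table
The table of executor servers; each id corresponds to one server.
execution_flows table: every time a flow is scheduled to run, a record with a new exec_id is written to this table.
status
30 means running
50 means success
70 means failed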
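When reading rows from execution_flows in a script or tool, it can help to give these numbers names. The following is a minimal sketch, assuming only the three codes listed above; the enum FlowStatusCode and its fromCode method are hypothetical names for this note, and any other status value is simply reported as unknown.

```java
import java.util.Optional;

/**
 * Hypothetical helper that names the execution_flows.status codes from these notes.
 * Only the three codes listed in the notes are covered.
 */
enum FlowStatusCode {
    RUNNING(30),
    SUCCESS(50),
    FAILED(70);

    private final int code;

    FlowStatusCode(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    /** Looks up a status by its numeric code; empty if the code is not one of the three above. */
    static Optional<FlowStatusCode> fromCode(int code) {
        for (FlowStatusCode s : values()) {
            if (s.code == code) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }
}
```

Returning an Optional rather than throwing keeps the helper usable on rows whose status is not covered by these notes.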
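Troubleshooting an Azkaban incident
Symptoms
The Azkaban web UI could not upload a scheduled job's zip package to a project; the upload failed with an error after a long wait. Deleting a project and modifying project properties failed in the same way.
Our Azkaban platform consists of two Executor servers and one web server.
Investigation
I logged in to the web server and checked the logs. Uploading a job threw the exception below: the web server hit a lock wait timeout (Lock wait timeout exceeded) while writing to the Azkaban MySQL tables.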
2018/12/14 16:24:10.706 +0800 ERROR [JdbcProjectImpl] [Azkaban] Error initializing project id: 56 version: 7
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: INSERT INTO project_versions (project_id, version, upload_time, uploader, file_type, file_name, md5, num_chunks, resource_id) values (?,?,?,?,?,?,?,?,?) Parameters: [56, 7, 1544775799795, azkaban, zip, sync-user-identity-1.0.0-SNAPSHOT-sync-user-identity.zip, [-27, 52, 115, 62, -67, 107, 55, 49, 95, -107, -41, 27, -81, 90, 22, 115], 0, null]
at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:403)
at azkaban.db.DatabaseTransOperator.update(DatabaseTransOperator.java:101)
at azkaban.project.JdbcProjectImpl.addProjectToProjectVersions(JdbcProjectImpl.java:365)
at azkaban.project.JdbcProjectImpl.lambda$uploadProjectFile$2(JdbcProjectImpl.java:267)
at azkaban.db.DatabaseOperator.transaction(DatabaseOperator.java:95)
at azkaban.project.JdbcProjectImpl.uploadProjectFile(JdbcProjectImpl.java:280)
at azkaban.storage.DatabaseStorage.put(DatabaseStorage.java:58)
at azkaban.storage.StorageManager.uploadProject(StorageManager.java:106)
at azkaban.project.AzkabanProjectLoader.persistProject(AzkabanProjectLoader.java:197)
at azkaban.project.AzkabanProjectLoader.uploadProject(AzkabanProjectLoader.java:114)
at azkaban.project.ProjectManager.uploadProject(ProjectManager.java:506)
at azkaban.webapp.servlet.ProjectManagerServlet.ajaxHandleUpload(ProjectManagerServlet.java:1738)
at azkaban.webapp.servlet.ProjectManagerServlet.handleUpload(ProjectManagerServlet.java:1821)
at azkaban.webapp.servlet.ProjectManagerServlet.handleMultiformPost(ProjectManagerServlet.java:201)
at azkaban.webapp.servlet.LoginAbstractAzkabanServlet.doPost(LoginAbstractAzkabanServlet.java:311)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:688)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2018/12/14 16:24:10.722 +0800 INFO [ProjectManagerServlet] [Azkaban] Installation Failed.
azkaban.project.ProjectManagerException: Error initializing project id: 56 version: 7
at azkaban.project.JdbcProjectImpl.addProjectToProjectVersions(JdbcProjectImpl.java:371)
at azkaban.project.JdbcProjectImpl.lambda$uploadProjectFile$2(JdbcProjectImpl.java:267)
at azkaban.db.DatabaseOperator.transaction(DatabaseOperator.java:95)
at azkaban.project.JdbcProjectImpl.uploadProjectFile(JdbcProjectImpl.java:280)
at azkaban.storage.DatabaseStorage.put(DatabaseStorage.java:58)
at azkaban.storage.StorageManager.uploadProject(StorageManager.java:106)
at azkaban.project.AzkabanProjectLoader.persistProject(AzkabanProjectLoader.java:197)
at azkaban.project.AzkabanProjectLoader.uploadProject(AzkabanProjectLoader.java:114)
at azkaban.project.ProjectManager.uploadProject(ProjectManager.java:506)
at azkaban.webapp.servlet.ProjectManagerServlet.ajaxHandleUpload(ProjectManagerServlet.java:1738)
at azkaban.webapp.servlet.ProjectManagerServlet.handleUpload(ProjectManagerServlet.java:1821)
at azkaban.webapp.servlet.ProjectManagerServlet.handleMultiformPost(ProjectManagerServlet.java:201)
at azkaban.webapp.servlet.LoginAbstractAzkabanServlet.doPost(LoginAbstractAzkabanServlet.java:311)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:688)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: INSERT INTO project_versions (project_id, version, upload_time, uploader, file_type, file_name, md5, num_chunks, resource_id) values (?,?,?,?,?,?,?,?,?) Parameters: [56, 7, 1544775799795, azkaban, zip, sync-user-identity-1.0.0-SNAPSHOT-sync-user-identity.zip, [-27, 52, 115, 62, -67, 107, 55, 49, 95, -107, -41, 27, -81, 90, 22, 115], 0, null]
at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:403)
at azkaban.db.DatabaseTransOperator.update(DatabaseTransOperator.java:101)
at azkaban.project.JdbcProjectImpl.addProjectToProjectVersions(JdbcProjectImpl.java:365)
... 27 more
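Modifying job properties produced the following exception in the log, again Lock wait timeout exceeded, this time on an UPDATE to execution_flows: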
2018/12/14 20:24:32.577 +0800 ERROR [DatabaseOperator] [Azkaban] update failed
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: UPDATE execution_flows SET executor_id=? where exec_id=? Parameters: [2, 636386]
at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:456)
at azkaban.db.DatabaseOperator.update(DatabaseOperator.java:121)
at azkaban.executor.AssignExecutorDao.assignExecutor(AssignExecutorDao.java:48)
at azkaban.executor.JdbcExecutorLoader.assignExecutor(JdbcExecutorLoader.java:312)
at azkaban.executor.ExecutorManager.dispatch(ExecutorManager.java:1501)
at azkaban.executor.ExecutorManager.access$1500(ExecutorManager.java:78)
at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1871)
at azkaban.executor.ExecutorManager$QueueProcessorThread.handleDispatchExceptionCase(ExecutorManager.java:1959)
at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1878)
at azkaban.executor.ExecutorManager$QueueProcessorThread.processQueuedFlows(ExecutorManager.java:1851)
at azkaban.executor.ExecutorManager$QueueProcessorThread.run(ExecutorManager.java:1789)
2018/12/14 20:24:32.578 +0800 WARN [ExecutorManager] [Azkaban] Executor d-awsbj-uds-uds-azkaban-1536651303:12321 (id: 2) responded with exception for exec: 636386
azkaban.executor.ExecutorManagerException: Error updating executor id 2
at azkaban.executor.AssignExecutorDao.assignExecutor(AssignExecutorDao.java:54)
at azkaban.executor.JdbcExecutorLoader.assignExecutor(JdbcExecutorLoader.java:312)
at azkaban.executor.ExecutorManager.dispatch(ExecutorManager.java:1501)
at azkaban.executor.ExecutorManager.access$1500(ExecutorManager.java:78)
at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1871)
at azkaban.executor.ExecutorManager$QueueProcessorThread.handleDispatchExceptionCase(ExecutorManager.java:1959)
at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1878)
at azkaban.executor.ExecutorManager$QueueProcessorThread.processQueuedFlows(ExecutorManager.java:1851)
at azkaban.executor.ExecutorManager$QueueProcessorThread.run(ExecutorManager.java:1789)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: UPDATE execution_flows SET executor_id=? where exec_id=? Parameters: [2, 636386]
at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:456)
at azkaban.db.DatabaseOperator.update(DatabaseOperator.java:121)
at azkaban.executor.AssignExecutorDao.assignExecutor(AssignExecutorDao.java:48)
... 8 more
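Both traces wrap the same underlying java.sql.SQLException. As a minimal sketch of how such a failure could be recognized in code, the helper below walks the cause chain and checks for MySQL server error code 1205 (ER_LOCK_WAIT_TIMEOUT). The class and method names are hypothetical and not part of Azkaban, and the check assumes the JDBC driver reports the MySQL error number through getErrorCode().

```java
import java.sql.SQLException;

/** Hypothetical helper for spotting MySQL lock wait timeouts; not part of Azkaban. */
final class LockWaitTimeouts {

    /** MySQL server error code for "Lock wait timeout exceeded" (ER_LOCK_WAIT_TIMEOUT). */
    static final int ER_LOCK_WAIT_TIMEOUT = 1205;

    /** Upper bound on how far the cause chain is followed, to guard against cycles. */
    private static final int MAX_DEPTH = 32;

    private LockWaitTimeouts() {}

    /** Returns true if the throwable, or any of its causes, is a lock wait timeout. */
    static boolean isLockWaitTimeout(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_DEPTH; depth++) {
            if (cur instanceof SQLException
                    && ((SQLException) cur).getErrorCode() == ER_LOCK_WAIT_TIMEOUT) {
                return true;
            }
            cur = cur.getCause();
        }
        return false;
    }
}
```

A check like this only classifies the error. It says nothing about which transaction holds the lock, which is what the rest of the investigation had to find out.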
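Restarting the web server did not help.
Cause and fix
I then connected to the Azkaban database and looked at the information_schema.INNODB_TRX table. There were many stuck transactions, for no apparent reason.
Running kill on the trx_mysql_thread_id values from the mysql command line did not seem to help either.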
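The check on INNODB_TRX can also be scripted. The following is a minimal JDBC sketch, assuming a MySQL Connection is already available and the account is allowed to read information_schema; the class StuckTransactions and the minAgeSeconds threshold are hypothetical choices for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists InnoDB transactions open longer than a threshold. */
final class StuckTransactions {

    /** Selects transactions that started more than the given number of seconds ago. */
    static final String LONG_RUNNING_TRX_SQL =
        "SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query "
            + "FROM information_schema.INNODB_TRX "
            + "WHERE trx_started < NOW() - INTERVAL ? SECOND "
            + "ORDER BY trx_started";

    private StuckTransactions() {}

    /** Returns one formatted line per long-running transaction, oldest first. */
    static List<String> findLongRunning(Connection conn, int minAgeSeconds) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(LONG_RUNNING_TRX_SQL)) {
            ps.setInt(1, minAgeSeconds);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(String.format(
                        "trx=%s state=%s started=%s thread=%d query=%s",
                        rs.getString("trx_id"),
                        rs.getString("trx_state"),
                        rs.getTimestamp("trx_started"),
                        rs.getLong("trx_mysql_thread_id"),
                        rs.getString("trx_query")));
                }
            }
        }
        return rows;
    }
}
```

The thread value in each line is the id that the kill command takes.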
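I asked the DBA to take a look. It turned out the transactions were stuck because the disk on the server hosting the Azkaban database was full, so writes could not complete.
The DBA did an emergency disk expansion, and the problem cleared up right away.
The disk filled up for two reasons. First, the database server's disk was only 50 GB. Second, Azkaban writes the logs of all jobs to the execution_logs table, which grows very quickly; that table alone took up 40 GB.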
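To catch this kind of growth earlier, table sizes can be read from information_schema.TABLES. The sketch below assumes a MySQL Connection and a schema name supplied by the caller; the class TableSizes and its methods are hypothetical names. The sizes MySQL reports here are estimates, so treat the numbers as approximate.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports the largest tables in one MySQL schema. */
final class TableSizes {

    /** Data plus index size per table, largest first. */
    static final String LARGEST_TABLES_SQL =
        "SELECT table_name, data_length + index_length AS total_bytes "
            + "FROM information_schema.TABLES "
            + "WHERE table_schema = ? "
            + "ORDER BY total_bytes DESC "
            + "LIMIT ?";

    private TableSizes() {}

    /** Converts a byte count to GiB. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    /** Returns lines such as "execution_logs 40.00 GiB" for the largest tables. */
    static List<String> largestTables(Connection conn, String schema, int limit)
            throws SQLException {
        List<String> lines = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(LARGEST_TABLES_SQL)) {
            ps.setString(1, schema);
            ps.setInt(2, limit);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Columns are read by position: 1 = table name, 2 = total bytes.
                    lines.add(String.format("%s %.2f GiB",
                        rs.getString(1), toGiB(rs.getLong(2))));
                }
            }
        }
        return lines;
    }
}
```

Running this periodically against the Azkaban schema would show execution_logs climbing toward the disk limit well before writes start to block.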