Note: High availability relies on ZooKeeper for failure recovery, so prepare a ZooKeeper cluster first. An independently deployed ZooKeeper cluster is recommended; do not use the single-node ZooKeeper bundled with Flink.
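Before starting Flink it helps to confirm the ZooKeeper quorum is healthy. A minimal sketch, assuming zkServer.sh is on the PATH of each node (the exact ZooKeeper install location is not given in this guide):

# Check the role of each ZooKeeper node; expect one "leader" and two "follower"
for host in flink01 flink02 flink03; do
  ssh $host "zkServer.sh status"
done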
1. Flink version
flink-1.11.2-bin-scala_2.11.tgz
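The archive is unpacked to the same location on every host. A minimal sketch, assuming the tarball is in the current directory and /opt/soft is the install root used throughout this guide:

# Unpack the Flink distribution to /opt/soft on this host
tar -xzf flink-1.11.2-bin-scala_2.11.tgz -C /opt/soft/
cd /opt/soft/flink-1.11.2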
2. Number of machines: 3
master: 2 hosts (JobManager): flink01, flink02
worker: 3 hosts (TaskManager): flink01, flink02, flink03
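start-cluster.sh and the scp distribution in step 6 both need passwordless SSH from flink01 to every host. A minimal sketch, assuming the root user is used as in the shell prompts shown later:

# On flink01: generate a key pair (if one does not exist yet) and copy it to all hosts
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in flink01 flink02 flink03; do
  ssh-copy-id root@$host
done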
3. Configure the flink-conf.yaml file: vim conf/flink-conf.yaml

'Common'
------------------------------------------------------
# JobManager settings
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
jobmanager.memory.process.size: 1600m
# TaskManager settings
taskmanager.heap.size: 1024m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 1
# Default parallelism for jobs
parallelism.default: 1
------------------------------------------------------
'High Availability'
------------------------------------------------------
# HA mode: zookeeper
high-availability: zookeeper
# Directory where the HA cluster stores its metadata
high-availability.storageDir: file:///opt/soft/flink-1.11.2/ha
# ZooKeeper quorum for the HA cluster
high-availability.zookeeper.quorum: flink01:2181,flink02:2181,flink03:2181
# Root znode for Flink in ZooKeeper
high-availability.zookeeper.path.root: /opt/soft/flink-1.11.2/ha/zookeeper
------------------------------------------------------
'Fault tolerance and checkpointing'
------------------------------------------------------
# Where the state backend stores checkpoints
state.backend: filesystem
# Checkpoint directory
state.checkpoints.dir: file:///opt/soft/checkpoints
# With incremental checkpoints, only the diff from the previous checkpoint is stored instead of the full checkpoint state. Some state backends do not support incremental checkpoints and ignore this option.
state.backend.incremental: true
# Task failure recovery: see the official docs (https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/dev/task_failure_recovery.html#restart-all-failover-strategy)
# 1. Restart strategy
# (1) A fixed-delay restart strategy is used here
restart-strategy: fixed-delay
# (2) Number of restart attempts
restart-strategy.fixed-delay.attempts: 3
# (3) Delay between restart attempts
restart-strategy.fixed-delay.delay: 5 s
# 2. Task failover strategy
jobmanager.execution.failover-strategy: region
------------------------------------------------------
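The file:// paths referenced above must be reachable from every node; in a production HA setup high-availability.storageDir and state.checkpoints.dir usually point at a shared filesystem such as HDFS or NFS, since both JobManagers have to read the same HA metadata. A minimal sketch that simply pre-creates the local directories from this config on every host:

# Create the HA metadata and checkpoint directories on each host
for host in flink01 flink02 flink03; do
  ssh $host "mkdir -p /opt/soft/flink-1.11.2/ha /opt/soft/checkpoints"
done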
4. Configure the masters file: vim conf/masters
flink01
flink02
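Each entry in conf/masters names one JobManager host and may optionally carry the web UI port as host:port; 8081 is Flink's default. A variant of the file above with explicit ports, shown only as an illustration:

flink01:8081
flink02:8081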
5. Configure the workers file: vim conf/workers
Note: the name of this file differs between Flink versions; in this version it is workers, in some versions it is slaves.
flink01
flink02
flink03
6. Distribute the configuration files
scp conf/flink-conf.yaml flink02:/opt/soft/flink-1.11.2/conf/
scp conf/flink-conf.yaml flink03:/opt/soft/flink-1.11.2/conf/
scp conf/masters flink02:/opt/soft/flink-1.11.2/conf/
scp conf/masters flink03:/opt/soft/flink-1.11.2/conf/
scp conf/workers flink02:/opt/soft/flink-1.11.2/conf/
scp conf/workers flink03:/opt/soft/flink-1.11.2/conf/
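The same distribution can be written as a short loop; a sketch equivalent to the six commands above, run from /opt/soft/flink-1.11.2 on flink01:

# Copy the three edited config files to the other two hosts
for host in flink02 flink03; do
  scp conf/flink-conf.yaml conf/masters conf/workers $host:/opt/soft/flink-1.11.2/conf/
done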
7. Start the Flink standalone cluster
Run on flink01:
[root@flink01 flink-1.11.2]# bin/start-cluster.sh
Starting HA cluster with 2 masters.
Starting standalonesession daemon on host flink01.
Starting standalonesession daemon on host flink02.
Starting taskexecutor daemon on host flink01.
Starting taskexecutor daemon on host flink02.
Starting taskexecutor daemon on host flink03.
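Once the daemons are up, the web UI of either JobManager should answer on the default REST port 8081 (assuming rest.port was not changed); a quick sanity check from any host:

# Query the cluster overview from the REST API of each JobManager
curl http://flink01:8081/overview
curl http://flink02:8081/overview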
8. Check the processes
[root@flink01 ~]# jps
20582 Jps
1752 Kafka
1481 QuorumPeerMain
19786 StandaloneSessionClusterEntrypoint
20139 TaskManagerRunner
[root@flink02 ~]# jps
9249 Jps
1481 QuorumPeerMain
9113 TaskManagerRunner
1754 Kafka
8783 StandaloneSessionClusterEntrypoint
[root@flink03 ~]# jps
1745 Kafka
1478 QuorumPeerMain
5051 Jps
4925 TaskManagerRunner
At this point, the Flink standalone-mode deployment is complete!
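As an optional check of the HA setup, the current leader JobManager can be stopped and the other one should take over; a sketch using Flink's bundled jobmanager.sh, run from the Flink home directory on the node hosting the leader (flink01 is assumed here):

# Stop the JobManager (StandaloneSessionClusterEntrypoint) on this host
bin/jobmanager.sh stop
# The remaining JobManager should be elected leader via ZooKeeper;
# its web UI on port 8081 should keep serving the cluster overview
curl http://flink02:8081/overview
# Bring the stopped JobManager back as a standby
bin/jobmanager.sh start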