BlueKing (蓝鲸智云) Installation Pitfall Notes

2025-02-16

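I. Deployment method

The official documentation offers two routes to a single-machine deployment: follow the single-machine deployment guide, or follow the standard deployment guide. Either way, the key step is editing the install.config file. I used the standard deployment guide for my single-machine deployment.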

10.xx.xx.111 nginx,rabbitmq,kafka(config),zk(config),es,appt,fta,consul,mysql,beanstalk,mongodb,appo,paas,cmdb,job,gse,license,redis,influxdb,bkdata(databus),bkdata(dataapi),bkdata(monitor)

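Here 10.xx.xx.111 is the address of the first network interface (on a cloud server this is usually the private network address).

II. Deploying cmdb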

./bk_install cmdb

Check the logs:

Check the gse status. Solution: find the process occupying port 48533, kill it, and redeploy.

lsof -i:48533
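As a minimal sketch of the "find the process, then kill it" step: the helper below (pids_from_lsof is a hypothetical name, not part of BlueKing) reads ordinary `lsof -i` output and prints only the unique PIDs, so they can be reviewed before anything is killed. It assumes the standard lsof column layout, with the PID in the second column.

```shell
# Hypothetical helper: read regular `lsof -i` output on stdin and
# print the unique PIDs it mentions, one per line.
pids_from_lsof() {
    # Skip the header line, take column 2 (PID), sort numerically, de-duplicate.
    awk 'NR > 1 { print $2 }' | sort -un
}

# Usage on the BlueKing host (port taken from the note above):
#   lsof -i:48533 | pids_from_lsof
# After reviewing the list (xargs -r is a GNU extension):
#   lsof -i:48533 | pids_from_lsof | xargs -r kill
```

Listing the PIDs first makes it easier to confirm that the process holding the port is a stale one rather than a service that should stay up.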

III. Deploying bkdata

./bk_install bkdata

1. Installing bkdata depends on MySQL 5.7. With MySQL 8 it fails with:

my_config.h: No such file or directory

The fix is to uninstall MySQL 8 and install MySQL 5.7.
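A quick pre-flight check can catch the wrong MySQL version before `./bk_install bkdata` is run. The sketch below is only an illustration: is_mysql_57 is a hypothetical helper, and it assumes the client's `mysql --version` string contains a three-part version number such as 5.7.31 or 8.0.21.

```shell
# Hypothetical helper: succeed only if a `mysql --version` string reports 5.7.x.
is_mysql_57() {
    # Pull the first three-part version number (e.g. 5.7.31) out of the string.
    ver=$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    case "$ver" in
        5.7.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage before installing bkdata:
#   is_mysql_57 "$(mysql --version)" || echo "bkdata expects MySQL 5.7" >&2
```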

2.create topic failed

initdata for bkdata()
[10.xx.xx.111]20201020-094445 172 exec initdata_bkdata on 10.xx.xx.111
[10.xx.xx.111]20201020-094447 104 start to make migration for bkdata ...
[10.xx.xx.111]20201020-094447 112 on-migrate ... /data/bkce/bkdata/dataapi/on_migrate
[10.xx.xx.111]20201020-094449 10 init dataserver zk config
[10.xx.xx.111]20201020-094449 13 create topic
waiting node ready
Traceback (most recent call last):
  File "/opt/py27/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/opt/py27/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/data/bkce/bkdata/dataapi/databus/script/kafka/__main__.py", line 145, in <module>
    ensure_topic("connect-offsets." + m, 5, replication_factor=2, configs=topic_configs)
  File "/data/bkce/bkdata/dataapi/databus/script/kafka/__main__.py", line 85, in ensure_topic
    topic, error_code))
Exception: Unknown error code during creation of topic `connect-offsets.etl`: -1
[10.xx.xx.111]20201020-094450 14 create topic failed.
[10.xx.xx.111]20201020-094450 153 migrate failed for bkdata(dataapi)
[10.xx.xx.111]20201020-094450 179 Abort

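See the simple data-link troubleshooting guide: https://bk.tencent.com/s-mart/community/question/571

IV. Installing the official SaaS

The error is as follows:

Current app status: going online; deployment operations are not allowed!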

[root@rbtnode1 install]# ./bkcec install saas-o
install saas-o(all)
Deploy official saas bk_fta_solutions
2020-10-20 11:38:58 42 INFO request login token
2020-10-20 11:38:58 32 INFO <RequestsCookieJar[<Cookie bklogin_csrftoken=123abcd for paas.bk.com/>]>
2020-10-20 11:38:58 51 INFO emulate login to http://paas.bk.com:80/login/, form data: {'username': '123abcd', 'csrfmiddlewaretoken': '123abcd', 'password': '123abcd'}
2020-10-20 11:38:58 55 INFO bklogin_csrftoken: 123abcd
2020-10-20 11:38:59 32 INFO <RequestsCookieJar[<Cookie bk_csrftoken=123abcd for paas.bk.com/>]>
2020-10-20 11:38:59 77 INFO get upload token:123abcd from http://paas.bk.com:80/saas/bk_fta_solutions/release/
2020-10-20 11:38:59 89 INFO uploading file /data/src/official_saas/bk_fta_solutions_V5.1.43-bkofficial.tar.gz, url:http://paas.bk.com:80/saas/upload/bk_fta_solutions/, data: {'csrfmiddlewaretoken': '123abcd'}
...
2020-10-20 11:39:08 232 INFO query saas_version_id: 3
2020-10-20 11:39:08 235 INFO start deploy app:bk_fta_solutions url: http://paas.bk.com:80/saas/release/online/3/
2020-10-20 11:39:08 104 INFO start deploy bk_fta_solutions, upload_csrftoken: 123abcd
2020-10-20 11:39:08 113 INFO resposne: {u'message': u'SaaS App\u5e94\u7528\u5f53\u524d\u72b6\u6001\uff1a\u6b63\u5728\u4e0a\u7ebf\uff0c\u4e0d\u80fd\u8fdb\u884c\u90e8\u7f72\u64cd\u4f5c\uff01', u'result': False}
Traceback (most recent call last):
  File "deck/saas.py", line 236, in <module>
    event_id, app_code = appmgr.deploy(deploy_url, saas_env[args.deploy_env])
  File "deck/saas.py", line 116, in deploy
    logg.info(u"{}".format(deploy_result["msg"]))
KeyError: 'msg'
[10.xx.xx.111]20201020-113908 153 Deploy saas bk_fta_solutions failed.
[10.xx.xx.111]20201020-113909 47 Abort

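This did not happen when installing on a local server, only on the cloud server, so my guess is that the cloud server's spec (4 cores, 16 GiB RAM, 100 GB SSD) could not keep up. The PaaS Developer Center shows that the app really is still being deployed, so waiting a while for the deployment to finish is enough. This is a symptom of insufficient server resources; it is best to migrate bkdata to another server.

To check the logs: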

cd /data/bkce/paas_agent/apps/logs

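V. Migrating services

After installation the server could barely cope, with CPU and memory both very tight, so I migrated services following the documentation.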

./bkcec sync common

Running this command fails with:

[root@rbtnode1 install]# ./bkcec sync common
rsynchronize files of [certs,scripts,identities,common service]
[10.27.193.111]20201020-152425 232 >> rsync -a /data/src/service/py27/ root@10.27.193.199:/opt/py27/
bash: rsync: command not found
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: remote command not found (code 127) at io.c(226) [sender=3.1.2]
[10.27.193.111]20201020-152425 233 copy files to remote failed.
[10.27.193.111]20201020-152425 232 >> rsync -a /data/src/service/py27/ /opt/py27/
[10.27.193.111]20201020-152425 232 >> rsync -a /data/src/service/py36/ /opt/py36/
[10.27.193.111]20201020-152426 232 >> rsync -a /tmp/dCF2EO4AgIe7AKF/ /data/bkce/
[10.27.193.111]20201020-152426 232 >> rsync -a /data/src/ENTERPRISE /data/src/blueking.env /data/src/
[10.27.193.111]20201020-152426 232 >> rsync -a --delete /data/install/ /data/install/
[10.27.193.111]20201020-152426 232 >> rsync -a /tmp/aA4UyRO2SlHk7sI/ /data/src/

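Solution: the log shows that rsync is missing on the target machine. Prepare the target machine as described in BlueKing Documentation Center > Deployment & Maintenance > Environment Preparation.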
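Before syncing, it can help to confirm that the target machine has the tools the sync relies on. The following is a rough sketch: missing_tools is a hypothetical helper, and the list of tools to check is an assumption (the log only proves that rsync is needed).

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage locally:
#   missing_tools rsync
# A one-off check on the migration target (address as in the log):
#   ssh root@10.27.193.199 'command -v rsync' || echo "rsync missing on target" >&2
```

An empty result means everything in the list was found; any printed name still needs to be installed as part of environment preparation.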

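VI. Stopping a service fails

When an error like the one below says that a supervisor-*.sock file does not exist, the service needs to be restarted, but running ./bkcec stop <service name> directly does not kill the processes: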

[10.xx.xx.111] server unix:///data/bkce/logs/cmdb/supervisor-cmdb3.sock no such file

Check for leftover service processes:

ps -ef | grep cmdb

They have to be killed manually:

ps -ef | grep cmdb | grep -v grep | awk '{print $2}' | xargs kill -9
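The one-liner above goes straight to SIGKILL. A gentler variant, sketched here as an alternative rather than what was actually run, sends SIGTERM first and only falls back to `kill -9` for processes that are still alive after a grace period. graceful_kill is a hypothetical helper name.

```shell
# Hypothetical helper: graceful_kill <seconds> <pid>...
# Sends SIGTERM, waits up to <seconds>, then SIGKILLs whatever is still alive.
graceful_kill() {
    remaining=$1
    shift
    [ "$#" -gt 0 ] || return 0            # nothing to kill
    kill "$@" 2>/dev/null || true         # polite SIGTERM first
    while [ "$remaining" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            if kill -0 "$pid" 2>/dev/null; then alive=1; fi
        done
        [ "$alive" -eq 1 ] || return 0    # every process has exited
        sleep 1
        remaining=$((remaining - 1))
    done
    kill -9 "$@" 2>/dev/null || true      # last resort, as in the one-liner
    return 0
}

# Usage for leftover cmdb processes (pgrep -f matches the full command line):
#   graceful_kill 5 $(pgrep -f cmdb)
```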

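VII. Installing the proxy

See the common errors when installing the proxy: https://bk.tencent.com/s-mart/community/question/204. The redirect was caused by Nginx not being configured with the public IP; following the instructions there fixed it.

VIII. Monitoring data not showing up

See the simple data-link troubleshooting guide: https://bk.tencent.com/s-mart/community/question/571. Because this is a single-machine deployment, the kafka configuration needs to be changed: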

cat /data/bkce/service/kafka/config/server.properties
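The note does not say which setting was changed, so the sketch below only helps with inspection. prop_get is a hypothetical helper that prints the value of one key from a properties file, and the key in the usage line (`default.replication.factor`, a standard Kafka broker setting) is an assumption about where to look on a single broker, not a confirmed fix.

```shell
# Hypothetical helper: prop_get <file> <key>
# Prints the value of the first non-comment "key=value" line for <key>.
prop_get() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }                              # skip comment lines
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }       # trim the key
        k == key { sub(/^[^=]*=/, ""); print; exit }     # print text after the first "="
    ' "$1"
}

# Usage (prints nothing if the key is not set in the file):
#   prop_get /data/bkce/service/kafka/config/server.properties default.replication.factor
```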
