Docker Swarm in Practice
We have recently been refactoring an old system whose current runtime environment is jdk1.7 + resin-4.0.47 + activemq + redis + mysql. It is about to go onto the test environment and a batch of servers has been purchased. The base services needed include, but are not limited to, elasticsearch, logstash, kibana, filebeat, zookeeper, activemq, mongodb, redis and MySQL. tomcat/resin are no longer needed, since the project has moved to Spring Boot and simply runs as a jar. We never add monitoring to a test environment, so zabbix is set aside for now as well. What is needed right now is to stand up the runtime environment, so I went straight to Docker Swarm.
Preparation
docker-ce is already installed everywhere, rolled out in bulk with ansible, so I won't cover that here. To summarise, the base services needed are: an elasticsearch cluster, elasticsearch-head, logstash, kibana, a kafka cluster, a zookeeper cluster, activemq, mongodb, a redis cluster and MySQL.
The test environment can be kept simple, convenience first, so everything goes into containers. Enable swarm and add the servers that will run the base services to the cluster.
Start the swarm
[root@docker-manager ~]# docker swarm init --advertise-addr 172.24.90.38
Swarm initialized: current node (bh950ji26br60or076cmwvmu3) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Join the cluster
[root@docker-manager ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377
[root@docker-manager ~]# ansible service -m shell -a "docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377"
All the nodes have joined now. A side note: this swarm is not highly available, as there is only one manager for the moment. For production at least three managers are recommended, i.e. one Leader and two Reachable. Now let's create the services.
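If you later want to add managers, it only takes joining with the manager token or promoting existing workers; a quick sketch (the node name is a placeholder):
[root@docker-manager ~]# docker swarm join-token manager
[root@docker-manager ~]# docker node promote <worker-node-name>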
Create the MySQL service
Configuration file
I create all of these services from yml files, i.e. as stacks; writing config files for these services just feels cleaner. I'm not yet fluent with K8S, so swarm it is for now. Also note that my placement constraints pin services to hostnames, which I don't really recommend: if the pinned server goes down, the service throws a no suitable node error and waits forever for that server to come back. For single-instance services there isn't much of an alternative, but on servers that run many services it is better to use labels; your call. MySQL is the simple one, as follows.
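For reference, a label-based constraint would look roughly like this (the label key and value are made up for the example):
[root@docker-manager ~]# docker node update --label-add service=mysql <node-name>
# and in the yml, instead of the hostname constraint:
#   constraints: [node.labels.service == mysql]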
[root@docker-manager /swarm/mysql]# cat mysql.yml
version: '3.7'
services:
mysql:
image: registry.cn-beijing.aliyuncs.com/rj-bai/mysql:5.7
hostname: mysql
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == mysql]
ports:
- 3306:3306
environment:
MYSQL_ROOT_PASSWORD: passwd
volumes:
- /data/mysql:/var/lib/mysql
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Create the network
The service references a network named recharge, which does not exist yet, so create it first; after that the service can be started.
[root@docker-manager ~]# docker network create --driver overlay --subnet 13.14.15.0/24 --ip-range 13.14.15.0/24 --gateway 13.14.15.1 recharge
jnptx5xp3jhmcn8uw4owrzqv9
Create the service
Remember to create the data directory first.
[root@docker-manager /swarm/mysql]# ansible mysql -m file -a "path=/data/mysql state=directory"
[root@docker-manager /swarm/mysql]# docker stack deploy -c mysql.yml --with-registry-auth mysql
[root@docker-manager /swarm/mysql]# docker stack ps mysql
To verify, just check whether the data directory has anything in it; if it does, the service started correctly.
[root@docker-manager /swarm/mysql]# ansible mysql -m shell -a "ls /data/mysql/"
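For a slightly stronger check you can also connect through the published port from the manager, assuming a mysql client is installed there (the password is the one set in the yml):
[root@docker-manager ~]# mysql -h 127.0.0.1 -P 3306 -uroot -ppasswd -e "select version();"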
No problems there. Moving on to activemq.
Create the activemq service
The Dockerfile first:
Dockerfile
FROM webcenter/activemq
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone && \
sed -i '/1G/d' /opt/activemq/bin/env
The sed removes a variable named ACTIVEMQ_OPTS_MEMORY from that script; it defines the MQ startup heap and defaults to 1G. It is now supplied as an environment variable instead, so adjust it yourself. One more thing: during a production load test, ten thousand concurrent requests came in and MQ simply fell over, so I recommend tuning the configuration file as well, things like the maximum connection count and which transportConnectors actually need to be enabled. We only use 61616 plus the management port; the 61613/61614-style connectors go unused, so I removed them all from the config file and the problem went away.
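For reference, a transportConnectors section trimmed down to just the openwire connector on 61616 would look roughly like this; the line is taken from the stock ActiveMQ configuration, and maximumConnections is the knob mentioned above, so adjust it to your load:
<transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
</transportConnectors>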
Configuration file
[root@docker-manager /swarm/activemq]# cat activemq.yml
version: '3.7'
services:
activemq:
image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3
hostname: activemq
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == activemq]
ports:
- 8161:8161
- 61616:61616
environment:
ACTIVEMQ_OPTS_MEMORY: -Xms2048M -Xmx2048M
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Create the service
[root@docker-manager /swarm/activemq]# docker stack deploy -c activemq.yml --with-registry-auth activemq
[root@docker-manager /swarm/activemq]# docker stack ps activemq
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
jcmhuzekzm75 activemq_activemq.1 registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3 activemq Running Preparing 8 seconds ago
To confirm it came up, just hit port 8161, the web management port.
[root@docker-manager /swarm/activemq]# curl -I -u admin:admin 127.0.0.1:8161
HTTP/1.1 200 OK
Date: Mon, 21 Jan 2019 08:35:41 GMT
X-FRAME-OPTIONS: SAMEORIGIN
Content-Type: text/html
Content-Length: 6047
Server: Jetty(9.2.13.v20150730)
No problems there. Next up: mongodb.
Create the activemq cluster
This part was added later. MQ had been running as a single instance the whole time and never caused trouble, but it still made me a little nervous, so I recently added an MQ cluster; see here for the general approach. It is not a pseudo-cluster: I simply moved it into containers, seven in total, three forming a zookeeper cluster and the other three the activemq cluster (one master, two slaves), plus one nginx layer-4 proxy. That's it; only the key parts of the configuration are pasted below.
activemq cluster configuration
Dockerfile
Built on top of my earlier image.
FROM registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3
COPY activemq.xml /opt/activemq/conf
COPY run.sh /app/
activemq.xml
<persistenceAdapter>
<replicatedLevelDB
directory="${activemq.data}/leveldb"
replicas="3"
bind="tcp://0.0.0.0:22181"
zkAddress="mqzoo1:2181,mqzoo2:2181,mqzoo3:2181"
zkPath="/zookeeper/leveldb-stores"
hostname="sedhostname"
/>
</persistenceAdapter>
run.sh
#!/bin/sh
sed -i s#sedhostname#$HOSTNAME#g /opt/activemq/conf/activemq.xml
python /app/entrypoint/Init.py
exec /usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
Service configuration file
version: '3.7'
services:
activemq1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:cluster-5.14.3
hostname: activemq1
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == activemq001]
environment:
ACTIVEMQ_OPTS_MEMORY: -Xms1024M -Xmx1024M
networks:
- rj-bai
activemq2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:cluster-5.14.3
hostname: activemq2
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == activemq002]
environment:
ACTIVEMQ_OPTS_MEMORY: -Xms1024M -Xmx1024M
networks:
- rj-bai
activemq3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:cluster-5.14.3
hostname: activemq3
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == activemq003]
environment:
ACTIVEMQ_OPTS_MEMORY: -Xms1024M -Xmx1024M
networks:
- rj-bai
networks:
rj-bai:
external: true
name: rj-bai
zookeeper for activemq
version: '3.7'
services:
mqzoo1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: mqzoo1
deploy:
replicas: 1
placement:
constraints: [node.hostname == activemq001]
environment:
ZOO_MY_ID: 1
JVMFLAGS: -Xms512m -Xmx512m
ZOO_SERVERS: server.1=mqzoo1:2888:3888 server.2=mqzoo2:2888:3888 server.3=mqzoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- rj-bai
mqzoo2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: mqzoo2
deploy:
replicas: 1
placement:
constraints: [node.hostname == activemq002]
environment:
ZOO_MY_ID: 2
JVMFLAGS: -Xms512m -Xmx512m
ZOO_SERVERS: server.1=mqzoo1:2888:3888 server.2=mqzoo2:2888:3888 server.3=mqzoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- rj-bai
mqzoo3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: mqzoo3
deploy:
replicas: 1
placement:
constraints: [node.hostname == activemq003]
environment:
ZOO_MY_ID: 3
JVMFLAGS: -Xms512m -Xmx512m
ZOO_SERVERS: server.1=mqzoo1:2888:3888 server.2=mqzoo2:2888:3888 server.3=mqzoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- rj-bai
networks:
rj-bai:
external: true
name: rj-bai
MQ proxy service
An nginx layer-4 proxy, configured as follows.
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
stream {
log_format main '$remote_addr - [$time_local] ' '$status ' '$upstream_addr';
access_log /var/log/nginx/access.log main;
upstream activemq {
server activemq1:61616 max_fails=1 fail_timeout=1s;
server activemq2:61616 max_fails=1 fail_timeout=1s;
server activemq3:61616 max_fails=1 fail_timeout=1s;
}
server {
listen 61616;
proxy_pass activemq;
}
upstream admin {
server activemq1:8161 max_fails=1 fail_timeout=1s;
server activemq2:8161 max_fails=1 fail_timeout=1s;
server activemq3:8161 max_fails=1 fail_timeout=1s;
}
server {
listen 8161;
proxy_pass admin;
}
}
Configuration file
version: '3.7'
services:
activemq:
image: registry.cn-beijing.aliyuncs.com/rj-bai/nginx:1.15.9-mq
hostname: nginx
deploy:
replicas: 2
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
endpoint_mode: vip
ports:
- 8161:8161
- 61616:61616
networks:
- rj-bai
networks:
rj-bai:
external: true
name: rj-bai
So your application should connect to MQ through this proxy. Create things in the order zookeeper → mq cluster → mq proxy and you're done; the commands are sketched below.
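Roughly like this; the yml file names and stack names are placeholders for whatever you saved the three files above as:
[root@docker-manager /swarm/activemq-cluster]# docker stack deploy -c mq-zookeeper.yml --with-registry-auth mq-zookeeper
[root@docker-manager /swarm/activemq-cluster]# docker stack deploy -c mq-cluster.yml --with-registry-auth mq-cluster
[root@docker-manager /swarm/activemq-cluster]# docker stack deploy -c mq-proxy.yml --with-registry-auth mq-proxy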
Create the mongodb service
This uses mongo & mongo-express. mongo-express is a web management tool for mongodb, similar in spirit to phpMyAdmin, and is optional. The Dockerfiles are as follows.
Dockerfile
mongo
FROM mongo
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
mongo-express
FROM mongo-express
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Configuration files
mongo is as follows:
[root@docker-manager /swarm/mongodb]# cat mongo.yml
version: '3.7'
services:
mongo:
image: registry.cn-beijing.aliyuncs.com/rj-bai/mongodb:4.0.5
hostname: mongodb
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == mongodb]
# environment:
# MONGO_INITDB_ROOT_USERNAME: root
# MONGO_INITDB_ROOT_PASSWORD: passwd
ports:
- 27017:27017
volumes:
- /data/mongodb:/data/db
networks:
- recharge
networks:
recharge:
external: true
name: recharge
The commented-out lines define the mongodb username and password; the developers told me no password is needed, so I left them out. With no password, the usual access restrictions obviously still matter, so handle that yourself. Next, mongo-express.
mongo-express is as follows:
[root@docker-manager /swarm/mongodb]# cat mongo-express.yml
version: '3.7'
services:
mongo-express:
image: registry.cn-beijing.aliyuncs.com/rj-bai/mongodb-express:0.12.0
hostname: mongodb-express
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == mongodb]
ports:
- 8081:8081
environment:
# ME_CONFIG_MONGODB_ADMINUSERNAME: root
# ME_CONFIG_MONGODB_ADMINPASSWORD: passwd
ME_CONFIG_BASICAUTH_USERNAME: admin
ME_CONFIG_BASICAUTH_PASSWORD: Sowhat?
networks:
- recharge
networks:
recharge:
external: true
name: recharge
As for why I split these two services: stack does not yet support defining service start order. Normally mongodb should start first and mongo-express second; if mongo-express starts first, its first attempt fails, and once mongo is up it recovers. That's roughly it, so I split them into two files.
Create the services
[root@docker-manager /swarm/mongodb]# ansible mongodb -m file -a "path=/data/mongodb state=directory"
[root@docker-manager /swarm/mongodb]# docker stack deploy -c mongo.yml --with-registry-auth mongo
[root@docker-manager /swarm/mongodb]# docker stack deploy -c mongo-express.yml --with-registry-auth mongo-express
Open the management page and check that it can reach mongodb, the usual routine.
No problems. I'm not sure where this page takes its time from, since the server and container clocks are both correct, so I'll ignore it. Moving on to the redis cluster.
Create the redis cluster
I built this image by hand with the latest stable release; the config file is baked into the image and persistence is enabled. Roughly as follows.
Dockerfile
The Dockerfile first:
FROM registry.cn-beijing.aliyuncs.com/rj-bai/centos:7.5
RUN yum -y install wget make gcc && yum clean all && \
wget http://download.redis.io/releases/redis-5.0.3.tar.gz && tar zxf redis-5.0.3.tar.gz && rm -f redis-5.0.3.tar.gz && \
cd redis-5.0.3/ && make && make install
COPY start.sh /
COPY redis.conf /
CMD ["/bin/bash", "/start.sh"]
Contents of start.sh
#!/bin/bash
if [ -n "$DIR" ];
then
sed -i s\#./\#$DIR\#g /redis.conf
fi
if [ ! -n "$REDIS_PORT" ];
then
redis-server /redis.conf
else
sed -i 's#6379#'$REDIS_PORT'#g' /redis.conf && redis-server /redis.conf
fi
Main contents of redis.conf
port 6379
save 900 1
save 300 10
save 60 10000
dbfilename "dump.rdb"
dir ./
The script substitutes two values, the port and the data directory; that's about it. To illustrate, a standalone run of the image is sketched below.
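A hypothetical standalone run (outside the stack), purely to show how DIR and REDIS_PORT are consumed by start.sh:
docker run -d --net host -e DIR=/data/7000 -e REDIS_PORT=7000 -v /data/7000:/data/7000 registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3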
Configuration file
[root@docker-manager /swarm/redis]# cat redis.yml
version: '3.7'
services:
redis1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis1
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-1]
environment:
DIR: /data/7000
REDIS_PORT: 7000
volumes:
- /data/7000:/data/7000
networks:
- host
redis2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis2
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-1]
environment:
DIR: /data/7001
REDIS_PORT: 7001
volumes:
- /data/7001:/data/7001
networks:
- host
redis3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis3
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-2]
environment:
DIR: /data/7002
REDIS_PORT: 7002
volumes:
- /data/7002:/data/7002
networks:
- host
redis4:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-2]
environment:
DIR: /data/7003
REDIS_PORT: 7003
volumes:
- /data/7003:/data/7003
networks:
- host
redis5:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis5
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-3]
environment:
DIR: /data/7004
REDIS_PORT: 7004
volumes:
- /data/7004:/data/7004
networks:
- host
redis6:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis6
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-3]
environment:
DIR: /data/7005
REDIS_PORT: 7005
volumes:
- /data/7005:/data/7005
networks:
- host
networks:
host:
external: true
name: host
The network I'm using here is host. Time to create the service.
Create the service
[root@docker-manager /swarm/redis]# docker stack deploy -c redis.yml --with-registry-auth redis
[root@docker-manager /swarm/redis]# docker stack ps redis
That's not the end of it: at this point redis is only started, the cluster hasn't been formed yet. One more step:
[root@docker-manager ~]# docker run --rm -it inem0o/redis-trib create --replicas 1 172.24.89.242:7000 172.24.89.242:7001 172.24.89.241:7002 172.24.89.241:7003 172.24.89.237:7004 172.24.89.237:7005
It asks you to type yes; once you see that prompt it has succeeded.
Log into a container to confirm.
[root@docker-manager /swarm/redis]# ssh redis-1
Last login: Mon Jan 21 18:39:54 2019 from 172.24.90.38
[root@redis-1 ~]# docker exec -it redis_redis1.1.lhe09fkmcs2j5mnvj3ow7uo14 /bin/bash
[root@redis1 /]# redis-cli -c -p 7000
127.0.0.1:7000> cluster nodes
28aa26332cd2799fe7b615865fa0259b9154299a 172.24.89.241:7003@17003 slave 34f738b7eff022be6eeb5c4cceafd52a935b1fc6 0 1548067394712 4 connected
3a0ad1bc61b8e765e7548cc87db6b3bfe9a7f60f 172.24.89.237:7004@17004 master - 0 1548067394211 5 connected 10923-16383
09db8b01f9f9d0b38152af82a8c38fdf85e1a9b3 172.24.89.242:7001@17001 slave 37b6dec48b97e816431d7f2cbe71489c3afdc508 0 1548067395000 3 connected
34f738b7eff022be6eeb5c4cceafd52a935b1fc6 172.24.89.242:7000@17000 myself,master - 0 1548067394000 1 connected 0-5460
a7c5314086b0a7d57d09861da8af31c514f8a167 172.24.89.237:7005@17005 slave 3a0ad1bc61b8e765e7548cc87db6b3bfe9a7f60f 0 1548067396217 6 connected
37b6dec48b97e816431d7f2cbe71489c3afdc508 172.24.89.241:7002@17002 master - 0 1548067395214 3 connected 5461-10922
127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:1113
cluster_stats_messages_pong_sent:1068
cluster_stats_messages_sent:2181
cluster_stats_messages_ping_received:1063
cluster_stats_messages_pong_received:1113
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:2181
No problems; that's about it. Next, the zookeeper cluster.
Create the zookeeper cluster
Dockerfile
Based on the official image and its documentation, the image was rebuilt with the following Dockerfile:
FROM zookeeper:latest
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Configuration file
[root@docker-manager /swarm/zookeeper]# cat zookeeper.yml
version: '3.7'
services:
zoo1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: zoo1
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == zookeeper-1]
environment:
ZOO_MY_ID: 1
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- recharge
zoo2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: zoo2
deploy:
replicas: 1
placement:
constraints: [node.hostname == zookeeper-2]
environment:
ZOO_MY_ID: 2
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- recharge
zoo3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: zoo3
deploy:
replicas: 1
placement:
constraints: [ node.hostname == zookeeper-3]
environment:
ZOO_MY_ID: 3
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Create the service
[root@docker-manager /swarm/zookeeper]# docker stack deploy -c zookeeper.yml --with-registry-auth zookeeper
[root@docker-manager /swarm/zookeeper]# docker stack ps zookeeper
Check whether it worked: one leader and two followers is what you should see, an effect like this.
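One way to check from a node is sketched below; the container name comes from docker stack ps zookeeper, so it will differ in your case, and each node should report Mode: leader or Mode: follower:
[root@zookeeper-1 ~]# docker exec $(docker ps -q -f name=zookeeper_zoo1) zkServer.sh status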
All good. This zookeeper mainly serves as the registry centre for the project. Next, the kafka cluster.
Create the kafka cluster
kafka depends on zookeeper. The zookeeper cluster above is used by the project itself, so another one is started just for kafka. The kafka Dockerfile is as follows.
Dockerfile
FROM wurstmeister/kafka
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
The zookeeper image is built the same way as before.
Configuration files
I also split the zookeeper and kafka configuration files, because kafka depends on zookeeper: zookeeper has to be started first and kafka second. The files are as follows.
zookeeper
[root@docker-manager /swarm/kafka]# cat kafka-zookeeper.yml
version: '3.7'
services:
kaf-zoo1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: kaf-zoo1
deploy:
replicas: 1
placement:
constraints: [node.hostname == kafka-1]
environment:
ZOO_MY_ID: 1
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- recharge
kaf-zoo2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: kaf-zoo2
deploy:
replicas: 1
placement:
constraints: [node.hostname == kafka-2]
environment:
ZOO_MY_ID: 2
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- recharge
kaf-zoo3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: kaf-zoo3
deploy:
replicas: 1
placement:
constraints: [node.hostname == kafka-3]
environment:
ZOO_MY_ID: 3
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
healthcheck:
test: ["CMD-SHELL","zkServer.sh","status || exit 1"]
interval: 5s
timeout: 3s
retries: 3
networks:
- recharge
networks:
recharge:
external: true
name: recharge
kafka
[root@docker-manager /swarm/kafka]# cat kafka.yml
version: '3.7'
services:
kafka1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
hostname: kafka1
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == kafka-1]
environment:
KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
KAFKA_ADVERTISED_HOST_NAME: kafka1
KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
KAFKA_BROKER_ID: 1
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
- recharge
kafka2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
hostname: kafka2
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == kafka-2]
environment:
KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
KAFKA_ADVERTISED_HOST_NAME: kafka2
KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
KAFKA_BROKER_ID: 2
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
- recharge
kafka3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
hostname: kafka3
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == kafka-3]
environment:
KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
KAFKA_ADVERTISED_HOST_NAME: kafka3
KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
KAFKA_BROKER_ID: 3
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Create the services
Create zookeeper first, then kafka.
[root@docker-manager /swarm/kafka]# docker stack deploy -c kafka-zookeeper.yml --with-registry-auth kafka-zookeeper
[root@docker-manager /swarm/kafka]# docker stack deploy -c kafka.yml --with-registry-auth kafka
To check whether they started correctly, just look at the kafka logs.
[root@docker-manager ~]# docker service logs <service-name>
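A quick way to confirm from the manager is to grep each broker's service log for Kafka's startup message; the wording below is Kafka's own log line and may vary slightly between versions, and the service names assume the stack names used here:
[root@docker-manager ~]# for i in kafka_kafka1 kafka_kafka2 kafka_kafka3; do docker service logs $i 2>&1 | grep "started (kafka.server.KafkaServer)"; done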
Take a look yourself; no errors means it's fine. Next, the elasticsearch cluster.
Create the elasticsearch cluster
A quick note up front. Here we run es×3, logstash×1, kibana×1, es-head×1 plus the kafka cluster. kafka is needed because project logs are not pushed into es by filebeat or logstash; instead log4j pushes them to kafka, and logstash uses kafka as its input and the es cluster as its output. In other words the project pushes logs into kafka, kafka feeds logstash, and logstash finally pushes them into the es cluster. I had built this before with conventional binary installs but never wrote it up because it was too tedious. Anyway, step by step: the es cluster first, starting with the Dockerfile.
Dockerfile
First, the plugins enabled by default; if you need something else, install or remove it via the Dockerfile. Version 6.5.4 is used, which bundles the X-Pack plugin.
[root@695653b6515c elasticsearch]# elasticsearch-plugin list
ingest-geoip
ingest-user-agent
I use the two plugins above and nothing else, so they are enough. The Dockerfile:
FROM docker.elastic.co/elasticsearch/elasticsearch:6.5.4
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Configuration file
Three es nodes form the cluster, so the configuration is as follows. Try not to change the service names, as kibana relies on them.
[root@docker-manager /swarm/elasticsearch]# cat elasticsearch.yml
version: '3.7'
services:
elasticsearch:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == elasticsearch-1]
environment:
- cluster.name=es
- node.name=es-1
- http.cors.enabled=true
- http.cors.allow-origin=*
- discovery.zen.minimum_master_nodes=2
- discovery.zen.fd.ping_timeout=120s
- discovery.zen.fd.ping_retries=6
- discovery.zen.fd.ping_interval=30s
- "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
- "ES_JAVA_OPTS=-Xms2G -Xmx2G"
volumes:
- /elasticsearch:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- recharge
elasticsearch2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == elasticsearch-2]
environment:
- cluster.name=es
- node.name=es-2
- http.cors.enabled=true
- http.cors.allow-origin=*
- discovery.zen.minimum_master_nodes=2
- discovery.zen.fd.ping_timeout=120s
- discovery.zen.fd.ping_retries=6
- discovery.zen.fd.ping_interval=30s
- "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
- "ES_JAVA_OPTS=-Xms2G -Xmx2G"
volumes:
- /elasticsearch:/usr/share/elasticsearch/data
ports:
- 9201:9200
networks:
- recharge
elasticsearch3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == elasticsearch-3]
environment:
- cluster.name=es
- node.name=es-3
- http.cors.enabled=true
- http.cors.allow-origin=*
- discovery.zen.minimum_master_nodes=2
- discovery.zen.fd.ping_timeout=120s
- discovery.zen.fd.ping_retries=6
- discovery.zen.fd.ping_interval=30s
- "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
- "ES_JAVA_OPTS=-Xms2G -Xmx2G"
volumes:
- /elasticsearch:/usr/share/elasticsearch/data
ports:
- 9202:9200
networks:
- recharge
networks:
recharge:
external: true
name: recharge
That's it. Next, create the data directory and adjust a few kernel parameters.
Create the service
[root@docker-manager ~/sh]# cat es.sh
#!/bin/bash
cat >>/etc/security/limits.conf<<OEF
* soft nofile 65536
* hard nofile 65536
* soft nproc 2048
* hard nproc 4096
OEF
cat >>/etc/sysctl.conf<<OEF
vm.max_map_count=655360
fs.file-max=655360
OEF
/usr/sbin/sysctl -p
[root@docker-manager ~/sh]# ansible elasticsearch -m script -a "/root/sh/es.sh"
[root@docker-manager ~/sh]# ansible elasticsearch -m file -a "path=/elasticsearch state=directory owner=1000 group=1000 mode=755"
[root@docker-manager /swarm/elasticsearch]# docker stack deploy -c elasticsearch.yml --with-registry-auth elasticsearch
[root@docker-manager /swarm/elasticsearch]# docker stack ps elasticsearch
Check the cluster state:
[root@docker-manager ~]# curl http://127.0.0.1:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1548133528 05:05:28 es green 3 3 0 0 0 0 0 0 - 100.0%
No problems. Next, bring up logstash.
Create the logstash service
The Dockerfile first, as follows.
Dockerfile
FROM docker.elastic.co/logstash/logstash:6.5.4
USER root
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
USER logstash
RUN logstash-plugin install logstash-input-kafka && logstash-plugin install logstash-output-elasticsearch
COPY kafka.conf /usr/share/logstash/config/
COPY start.sh /
CMD ["/bin/bash","/start.sh"]
Install whatever plugins you need. I copied the config file straight into the image, though a custom one can also be used; the startup script is as follows.
#!/bin/bash
if [ -n "$CONFIG" ];
then
logstash -f "$CONFIG"
else
logstash -f ./config/kafka.conf
fi
Contents of kafka.conf
This file is best written after asking the developers which topics are in use, and then creating the corresponding topics. Nothing is defined yet, so I wrote default (a topic-creation example follows the config below).
input{
kafka{
bootstrap_servers => ["kafka1:9092,kafka2:9092,kafka3:9092"]
consumer_threads => 5
topics => ["default"]
decorate_events => true
type => "default"
}
}
filter {
grok {
match => ["message", "%{TIMESTAMP_ISO8601:logdate}"]
}
date {
match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS"]
target => "@timestamp"
}
mutate {
remove_tag => ["logdate"]
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200","elasticsearch2:9200","elasticsearch3:9200"]
index => "logstash-%{type}-%{+YYYY.MM.dd}"
}
stdout {
codec => rubydebug {}
}
}
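Kafka normally auto-creates topics on first use, but to create the topic explicitly, something like this can be run inside any of the kafka containers; a hypothetical one-off using the scripts shipped in the image:
kafka-topics.sh --create --zookeeper kaf-zoo1:2181 --replication-factor 3 --partitions 3 --topic default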
Start logstash
Configuration file
[root@docker-manager /swarm/logstash]# cat logstash.yml
version: '3.7'
services:
logstash:
image: registry.cn-beijing.aliyuncs.com/rj-bai/logstash:6.5.4
hostname: logstash
deploy:
replicas: 1
placement:
constraints: [node.hostname == logstash]
environment:
- "LS_JAVA_OPTS=-Xms1G -Xmx1G"
networks:
- recharge
networks:
recharge:
external: true
name: recharge
If you want to mount a custom configuration file, mount it as the root user, for example as sketched below.
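A hypothetical variant that mounts the pipeline file from the host instead of baking it into the image; the host path is only an example, and CONFIG is the variable checked by the start.sh shown earlier:
version: '3.7'
services:
  logstash:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/logstash:6.5.4
    hostname: logstash
    user: root
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == logstash]
    environment:
      - "LS_JAVA_OPTS=-Xms1G -Xmx1G"
      - CONFIG=/usr/share/logstash/config/kafka.conf
    volumes:
      - /swarm/logstash/kafka.conf:/usr/share/logstash/config/kafka.conf
    networks:
      - recharge
networks:
  recharge:
    external: true
    name: recharge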
Create the service
[root@docker-manager /swarm/logstash]# ansible logstash -m script -a "/root/sh/es.sh"
[root@docker-manager /swarm/logstash]# docker stack deploy -c logstash.yml --with-registry-auth logstash
[root@docker-manager /swarm/logstash]# docker stack ps logstash
No problems there. Next, kibana.
Create the kibana service
The Dockerfile is as follows.
Dockerfile
FROM docker.elastic.co/kibana/kibana:6.5.4
USER root
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Configuration file
version: '3.7'
services:
kibana:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kibana:6.5.4
hostname: kibana
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == logstash]
ports:
- 5601:5601
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Create the service
[root@docker-manager /swarm/kibana]# docker stack deploy -c kibana.yml --with-registry-auth kibana
Create es-head
The Dockerfile is as follows.
FROM mobz/elasticsearch-head:5
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
That's about it; it can be started now.
Configuration file
version: '3.2'
services:
es-head:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch-head:5
deploy:
placement:
constraints:
- node.hostname == logstash
ports:
- 9100:9100
networks:
- recharge
networks:
recharge:
external:
name: recharge
Ready to start.
Create the service
[root@docker-manager /swarm/logstash]# docker stack deploy -c es-head.yml --with-registry-auth es-head
That's everything. Finally, a look at all the services.
All services
That's the lot. Now take a look at kibana to see whether the es cluster is running normally; the home page looks like this.
Click Monitoring, enable monitoring, and have a look around.
The es cluster has no data yet and some services haven't been exercised, so after I handed over the connection details I asked the developers for a project build. From what I can see it touches everything except mq & mysql. The database needs no further discussion, I've done it N times and it is definitely fine, and they tested mq from a locally run project without issues, so let's test with this build.
Test run
There is one annoying issue. When creating redis I used the host network rather than the recharge network. Our projects connect to the base services by DNS name, effectively hosts-file entries, and the connection strings in the project configuration are things like redis1 and kafka1, so for redis the hosts entries have to be written by hand. The reason I used the host network is that when forming the cluster you cannot point it at hostnames, the support just isn't great; if redis ran on the overlay I would start six containers, have to look up each container's IP when building the cluster, and the IPs would change whenever a container died. A real pain. For production I'm considering not running redis in containers at all; for the test environment this will do. One way to wire up those hosts entries is shown below.
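A sketch of one option, reusing the node IPs from the redis-trib step above: pass host-to-IP mappings when creating the application service. Not necessarily how you would want to do it in the end:
docker service create --name spring-boot --network recharge \
  --host redis1:172.24.89.242 --host redis2:172.24.89.241 --host redis3:172.24.89.237 \
  --replicas 1 spring-boot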
For now, just start a service by hand on the manager node, no compose file this time. I grabbed an openjdk1.8 image and used a Dockerfile to copy the project jar in; the project's connection settings are these:
spring.dubbo.registry.address=zookeeper://zoo1:2181;zookeeper://zoo2:2181;zookeeper://zoo3:2181
redis.ipPorts=redis1:7000,redis1:7001,redis2:7002,redis2:7003,redis3:7004,redis3:7005
spring.data.mongodb.host=mongo
elk.kafka.urls=kafka1:9092,kafka2:9092,kafka3:9092
I'll skip the image build, since making an image that runs a jar should give nobody trouble (a minimal sketch is below); here I just start it, the image is already built.
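For completeness, a minimal sketch of what such an image could look like; the base image tag and jar name are placeholders, not what was actually used here:
FROM openjdk:8-jdk
ADD your-project.jar /app.jar
CMD ["java", "-jar", "/app.jar"]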
[root@docker-manager ~]# docker service create --name spring-boot --network recharge --constraint node.role==manager --replicas 1 spring-boot
[root@docker-manager ~]# docker service logs -f spring-boot
It started cleanly with no errors. logstash should now be receiving project log output as well; let's check.
[root@docker-manager ~]# ssh logstash
[root@docker-manager ~]# docker service logs -f logstash_logstash
Here is an excerpt; now a look in kibana.
The logs are there, and after creating the index pattern the data shows up. Of course more than this one index is in use. While I'm at it, let's add charts for the nginx logs. nginx is not running in a container, so shipping its logs needs filebeat; set that up now.
nginx log analysis
In kibana, click the kibana logo → Logging → Add log data → Nginx logs and you will see the detailed steps. All my systems are centos7+, so choose RPM. You can see two plugins are required, both already installed, so log into the nginx server and run a few commands. For convenience I added hosts entries for es and kibana on every server, hence the configuration below.
[root@nginx ~]# rpm -Uvh https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.5.4-x86_64.rpm
[root@nginx ~]# vim /etc/filebeat/filebeat.yml
setup.kibana:
host: "kibana:5601"
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["elasticsearch:9200","elasticsearch2:9201","elasticsearch3:9202"]
I don't ship every log; this nginx only has two files of interest, so it is written like this.
[root@nginx ~]# filebeat modules enable nginx
Enabled nginx
[root@nginx ~]# vim /etc/filebeat/modules.d/nginx.yml
- module: nginx
access:
enabled: true
var.paths: ["/usr/local/nginx/logs/yourlogfile.log","/usr/local/nginx/logs/yourlogfile.log"]
error:
enabled: true
var.paths: ["/usr/local/nginx/logs/error.log"]
Finally, start filebeat.
[root@nginx ~]# filebeat setup
Loaded index template
Loading dashboards (Kibana must be running and reachable)
Loaded dashboards
Loaded machine learning job configurations
[root@nginx ~]# systemctl start filebeat.service
Afterwards an index named filebeat appears and the charts show up. On the add-data page you can see many other supported log types, things like redis, mysql and system logs; explore on your own. I'm not adding more for now, that can wait for production.
Final test
According to the developers this system consists of 13 projects. Production will use Alibaba Cloud RDS for the database, and the test environment has ten application servers for now, so the cluster currently consists of 27 servers.
Honestly, this is my first time doing it this way and I don't know whether the base services can take the load, so I decided to test by starting services, the more the better, which is why I wrote the file below.
version: '3.7'
services:
spring-boot:
image: registry.cn-beijing.aliyuncs.com/rj-bai/spring-boot:latest
deploy:
replicas: 50
endpoint_mode: vip
restart_policy:
condition: on-failure
delay: 10s
max_attempts: 3
window: 10s
update_config:
parallelism: 10
delay: 60s
failure_action: rollback
rollback_config:
parallelism: 10
delay: 60s
failure_action: pause
ports:
- 1116:1116
networks:
- recharge
networks:
recharge:
external: true
name: recharge
For writing compose files I recommend the official documentation; every parameter used above can be found there. I start 50 spring-boot replicas directly, with no placement constraints, so they can land on any server. Testing like this doesn't prove much, but it gives me a rough feel for whether the servers above can survive 50 services starting at once. So, start it. This build does not touch mysql & mq but exercises everything else. Here we go.
[root@docker-manager /swarm/spring-boot]# docker stack deploy -c spring-boot.yml --with-registry-auth spring-boot
[root@docker-manager /swarm/spring-boot]# docker stack ps spring-boot
It looks roughly like this.
Then watch logstash: logs pour out, and once the stream stops it should be done. Let's see how many entries were written.
Roughly 3943 log entries. Everything has started now; check whether any service fell over.
[root@docker-manager ~]# for i in `docker stack ls | grep -vi "NAME" | awk {'print $1'}`;
> do
> docker stack ps $i --filter desired-state=shutdown
> done
nothing found in stack: activemq
nothing found in stack: elasticsearch
nothing found in stack: es-head
nothing found in stack: kafka
nothing found in stack: kafka-zookeeper
nothing found in stack: kibana
nothing found in stack: logstash
nothing found in stack: mongo-express
nothing found in stack: mongodb
nothing found in stack: mysql
nothing found in stack: redis
nothing found in stack: spring-boot
nothing found in stack: zookeeper
None; everything is normal. Finally, a glance at all the services.
[root@docker-manager ~]# docker stack ls
[root@docker-manager ~]# docker service ls
No problems at all. Let's delete the spring-boot test service, it is no longer useful. zabbix can also be deployed with swarm now, preferably on the host network; I have written about that before so I won't paste it here. Below, a quick word about jenkins.
jenkins
Finally, a quick word on jenkins. jenkins has not been moved into a container either, because the jenkins server currently does a lot of work. Previously the flow was simply: jenkins builds the project package, then calls a playbook to roll out the update, which is why ansible was installed there. The situation is different now.
Updating a project now has to go through an image. I looked at the docker plugins and none seemed to do what I want, so my interim approach is: jenkins builds the package; on success it writes a Dockerfile that copies the package into a prepared jdk1.8 image, pushes the image to the Alibaba Cloud registry, and then remotely invokes a script on the manager server to perform the update. It has been confirmed that every project exposes one port, with no conflicts, and needs a directory mounted at /www/logs for project logs. So after a successful jenkins build the steps are as follows; the variables are all jenkins built-ins.
#!/bin/bash
### Define the service properties; edit as needed
APP_FILE=<absolute path to the jar>
SERVER_NAME=<service name>
APP_PORT=<project port>
REPLICAS=<replica count>
ENVIRONMENT=<launch environment>
NODE_LABELS=<node label>
REGISTRY=<private registry address>
JAVA_OPTS="-server -Xms1024M -Xmx1024M -XX:CompressedClassSpaceSize=128M -Djava.security.egd=file:/dev/./urandom"
### Log in to the private registry
docker login --username=xxxx --password xxxx $REGISTRY
### Create a directory named after the job and build number
mkdir -p /data/docker/$JOB_NAME/$BUILD_NUMBER
### Copy the project package into the new directory
if [ -f $APP_FILE ];
then
cp $APP_FILE /data/docker/$JOB_NAME/$BUILD_NUMBER
else
echo "项目包不存在,脚本退出"
exit 1
fi
### Write the Dockerfile and build the image
cd /data/docker/$JOB_NAME/$BUILD_NUMBER
cp /data/init/entrypoint.sh /data/docker/$JOB_NAME/$BUILD_NUMBER
jar=`ls *.jar`
cat >>Dockerfile<<OEF
FROM $REGISTRY/oracle-jdk:1.8
ADD $jar /
ADD entrypoint.sh /
CMD ["/bin/bash","/entrypoint.sh"]
OEF
docker build -t $REGISTRY/$JOB_NAME:$BUILD_NUMBER .
### Push the image to the private registry
docker push $REGISTRY/$JOB_NAME:$BUILD_NUMBER
sleep 5
### Update the service on the swarm side
ssh swarm-manager "/scripts/deploy-service.sh" "$REGISTRY/$JOB_NAME:$BUILD_NUMBER" "$SERVER_NAME" "$APP_PORT" "$REPLICAS" "$ENVIRONMENT" "$NODE_LABELS" "$JAVA_OPTS"
The script on the manager is below; every project calls this one script, just make sure the parameters are passed in the right order.
#!/bin/bash
### Load environment variables
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
### Check the number of arguments; no more detailed validation is done
IMAGE=$1
SERVER_NAME=$2
APP_PORT=$3
REPLICAS=$4
ENVIRONMENT=$5
NODE_LABELS=$6
JAVA_OPTS=`echo ${@:7}`
if [ $# -lt 7 ] ; then
echo "USAGE: $0 请依次传入镜像地址、服务名称、端口号、副本个数、启动环境、JAVA_OPTS、节点标签名称撒"
exit 1;
fi
### Log in to the registry and pull the image
docker login --username=xxxx --password xxxx $IMAGE
docker pull "$IMAGE" > /dev/null 2>&1
### Check whether the pull succeeded
if [ "$?" -ne 0 ]
then
echo "Pull "$IMAGE" Failed"
exit 1
fi
### Check whether the service already exists: update it if so, create it otherwise
docker service ps "$SERVER_NAME" > /dev/null 2>&1
if [ $? -eq 0 ]
then
docker service update --with-registry-auth --image "$IMAGE" "$SERVER_NAME" > /tmp/"$SERVER_NAME"
cat /tmp/"$SERVER_NAME" | grep "rollback" > /dev/null 2>&1
if [ "$?" -eq 0 ];
then
echo "Update "$SERVER_NAME" fail,executed rollback"
exit 1
else
echo "Update "$SERVER_NAME" Success"
exit 0
fi
else
docker service create --name "$SERVER_NAME" \
--replicas "$REPLICAS" \
--network recharge \
--constraint node.labels.regin=="$NODE_LABELS" \
--with-registry-auth \
--endpoint-mode vip \
--publish "$APP_PORT:$APP_PORT" \
--update-parallelism 1 \
--update-order start-first \
--update-failure-action rollback \
--rollback-parallelism 1 \
--rollback-failure-action pause \
--health-cmd "curl 127.0.0.1:"$APP_PORT" > /dev/null 2>&1 || exit 1" \
--health-interval 30s \
--health-start-period 10s \
--health-timeout 3s \
--health-retries 3 \
--env JAVA_OPTS="${JAVA_OPTS}" \
--env ENVIRONMENT=$ENVIRONMENT \
--mount type=bind,src=/www/logs,dst=/www/logs \
$IMAGE > /dev/null
if [ "$?" -eq 0 ]
then
echo "Deploy "$SERVER_NAME" Success"
else
echo "Deploy "$SERVER_NAME" fail"
exit 1
fi
fi
While we're at it, here is entrypoint.sh as well, otherwise parts of the above look confusing and it is unclear what some parameters are for.
#!/bin/bash
## Get the jar file name
Jar=`ls /*.jar`
## Other initialisation work could be done here
## Start the jar
java ${JAVA_OPTS} -jar "$Jar" --spring.profiles.active="$ENVIRONMENT"
That's roughly it. If you don't know what a given service-creation parameter does, look it up in the official reference, it is all there. My update strategy is start-first: new containers are started before old ones are stopped. How many new ones start at once depends on the maximum number of tasks updated in parallel, i.e. the --update-parallelism parameter; mine is 1, so one new task starts, and only once it is up is one old task stopped, and so on. If your servers are short on resources, don't enable this.
Everything pasted here is already in use in production. As long as the parameters are passed correctly nothing goes wrong; the update policy, rollback policy and health checks are all in place, tune them as needed. Note that no resource limits are set on the containers, and by default containers have no limits at all. Also beware of one thing: the command used by the health check must be executable inside the container, i.e. the image must actually contain it, otherwise creation waits forever, or an update rolls back once the check fails the configured number of times within the allotted time. A quick way to check for that is shown below.
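For example, a hypothetical sanity check before relying on the curl-based health check above, run against your real image name in place of the $REGISTRY placeholder and assuming the image ships a shell:
[root@docker-manager ~]# docker run --rm $REGISTRY/oracle-jdk:1.8 sh -c 'command -v curl || echo "curl is missing"'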
The screenshot here was posted before the new year; I'll leave it in place.
Problems hit in later maintenance
Updating services
Today the boss asked me whether, if we bought one more server, all of the back-office services could be migrated onto it. I said yes, they can be moved. He then asked how much memory the back-office java processes are currently started with; I said 2G. Too much, he said, 1G is enough, just delete them and recreate them. I said there is no need, updating the services will do. There are three services related to the back office, each running one replica, so another 2-core/4G Alibaba Cloud server was bought, joined to the swarm cluster in the usual way, and then I thought through what had to be done.
This is the production environment now. Projects are deployed the way described above and the base services use the same deployment method; all applications currently run on nodes labelled server. My plan was simple: the new server is already in the cluster, so give it a label, then update the service's constraint and ENV, and that should be that. Here is what I did.
First, label the new server.
[root@docker-manager ~]# docker node update --label-add regin=upms worker-11
worker-11
[root@docker-manager ~]# docker node inspect worker-11 | grep -i "label" -A2
"Labels": {
"regin": "upms"
},
Then, before updating the service, check what the current settings are:
[root@docker-manager ~]# docker service inspect upms-admin | egrep "regin|env" -A 3
"environment=prd",
"Xms=-Xms2048M",
"Xmx=-Xmx2048M"
],
--
"node.labels.regin==server"
],
"Platforms": [
{
[root@docker-manager ~]# docker service ps upms-admin
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
k9ht7k0h1zwt upms-admin.1 registry.cn-huhehaote.aliyuncs.com/xxx/xxx:xxx worker-3 Running Running 3 hours ago
It is currently running on the worker-3 node. What needs to change is node.labels.regin==server to node.labels.regin==upms, -Xms2048M to -Xms1024M and -Xmx2048M to -Xmx1024M. Roughly that, so update it with a command; afterwards it should be running on the worker-11 node.
[root@docker-manager ~]# docker service update \
> --env-add "Xms=-Xms1024M" \
> --env-add "Xmx=-Xmx1024M" \
> --constraint-rm node.labels.regin==server \
> --constraint-add node.labels.regin==upms \
> upms-admin
[root@docker-manager ~]# docker service inspect upms-admin | egrep "regin|env" -A 3
"environment=prd",
"Xms=-Xms1024M",
"Xmx=-Xmx1024M"
],
--
"node.labels.regin==upms"
],
"Platforms": [
{
--
"environment=prd",
"Xms=-Xms2048M",
"Xmx=-Xmx2048M"
],
--
"node.labels.regin==server"
],
"Platforms": [
{
[root@docker-manager ~]# docker service ps upms-admin
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
v7m646mqanrh upms-admin.1 registry.cn-huhehaote.aliyuncs.com/xxx/xxx:xxx worker-11 Running Running 37 seconds ago
ku3zy6b3cerd \_ upms-admin.1 registry.cn-huhehaote.aliyuncs.com/xxx/xxx:xxx worker-3 Shutdown Shutdown about a minute ago
You can see it worked. However, the first project I updated this way failed; here is what I did, reproduced on my local machine.
A failed example
First create a service locally, mimicking the way services are created in production. My local test cluster only has two workers, so label them first.
[root@manager ~]# docker node update --label-add regin=server worker-1
[root@manager ~]# docker node update --label-add regin=upms worker-2
[root@manager ~]# docker node inspect worker-{1..2} | grep "regin"
"regin": "server"
"regin": "upms"
Then create a service pinned to server, i.e. worker-1, using the same policies as production; some of the variable values are typed by hand here.
[root@manager ~]# docker service create --name nginx \
> --replicas 1 \
> --network recharge \
> --constraint node.labels.regin==server \
> --with-registry-auth \
> --endpoint-mode vip \
> --publish 80:80 \
> --update-parallelism 1 \
> --update-order start-first \
> --update-failure-action rollback \
> --rollback-parallelism 1 \
> --rollback-failure-action pause \
> --health-cmd "curl 127.0.0.1:80 > /dev/null 2>&1 || exit 1" \
> --health-interval 30s \
> --health-start-period 30s \
> --health-retries 3 \
> --health-timeout 3s \
> --env "environment="prd"" \
> --env "Xms="-Xms2048M"" \
> --env "Xmx="-Xmx2048M"" \
> --mount type=bind,src=/www/logs,dst=/www/logs \
> registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl
q0pi4boho3iqal51r5xq3wjnf
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
[root@manager ~]# docker service ps nginx
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
bommsx7gbs6b nginx.1 registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl worker-1 Running Running 27 seconds ago
The service is created. Now I want to move it to worker-2, the node labelled upms. When migrating the first project I did this:
[root@manager ~]# docker service update --constraint-add node.labels.regin==upms nginx
And the result was this:
nginx
overall progress: 0 out of 1 tasks
1/1: no suitable node (scheduling constraints not satisfied on 3 nodes)
It reports that no suitable node can be found. The service itself is not affected, because the policy starts a new task before stopping the old one. In another window, the current state looks like this:
[root@manager ~]# docker service ps nginx
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ppzxj3gmx2cr nginx.1 registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl Running Pending about a minute ago "no suitable node (scheduling …"
bommsx7gbs6b \_ nginx.1 registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl worker-1 Running Running 3 minutes ago
Roughly that: the new task cannot start, the update just hangs in the terminal, and the only option is to Ctrl+C out of it and then remove the old constraint.
[root@manager ~]# docker service update --constraint-rm node.labels.regin==server nginx
nginx
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
[root@manager ~]# docker service ps nginx
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
wt4ni6n4r2ig nginx.1 registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl worker-2 Running Running 28 seconds ago
ppzxj3gmx2cr \_ nginx.1 registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl Shutdown Pending 16 minutes ago "no suitable node (scheduling …"
bommsx7gbs6b \_ nginx.1 registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl worker-1 Shutdown Shutdown 24 seconds ago
After that it starts normally. In short, if a service was created with a placement constraint and you now want to move it, the constraint-rm and the constraint-add have to be applied in the same update, otherwise it breaks. Fortunately these were back-office services, and the start-first policy kept them running. Combined into one update, the migration looks like the command below.
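The same migration as a single update, using the local example service from above:
[root@manager ~]# docker service update \
    --constraint-rm node.labels.regin==server \
    --constraint-add node.labels.regin==upms \
    nginx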
Overall swarm is already quite capable, but in terms of finer-grained management it still lags well behind K8S. For my current needs swarm is enough: so far the largest deployment for a single line of business is around 50 servers, which swarm handles comfortably. I have been poking at k8s lately as well, and it really is complex; I probably won't run it in production for a while, that would just be asking for trouble. That's it for now.
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.