New versions of systemd (like the one in Ubuntu 16.04) are able to configure the pids cgroup (think ulimit -u, but via cgroups) using TasksMax=.
Check out the commit implementing TasksMax=:
Author: Lennart Poettering <firstname.lastname@example.org>
Date:   Fri Nov 13 19:28:32 2015 +0100

    core: enable TasksMax= for all services by default, and set it to 512

    Also, enable TasksAccounting= for all services by default, too.

    See: http://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html
But now we’ve got silly defaults:
$ systemctl show -p TasksMax docker
TasksMax=512
$ systemctl show -p TasksMax mysql
TasksMax=512
So if you use Docker or MySQL, you will most likely run into trouble without even putting real load on your server.
Just set TasksMax= for your service. Feel free to pick a value fitting your load :)
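For the Docker service that could look like this (a sketch using a systemd drop-in; the file name and the value are just an example, adjust them to your service):

# /etc/systemd/system/docker.service.d/tasksmax.conf
[Service]
TasksMax=infinity

$ systemctl daemon-reload
$ systemctl restart docker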
Update 1: There will be an update for Debian/Ubuntu setting TasksMax=infinity as default.
Update 2: With version 230 there is a new default \o/ KillUserProcesses=yes. Check it out yourself and cry :(
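If you rely on screen/tmux sessions surviving logout, you can turn that new default off again in logind.conf (a sketch):

# /etc/systemd/logind.conf
[Login]
KillUserProcesses=no

$ systemctl restart systemd-logind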
In my last post I was looking for a way to do performance monitoring and I stumbled upon Prometheus. Prometheus is about much more than monitoring a single node or service. Anyway, let’s get the idea of gathering metrics, using MySQL as an example.
This is how a simple Prometheus configuration could look:
global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
  - job_name: mysql
    scheme: http
    target_groups:
      - targets:
          - '10.17.148.31:9104'
        labels:
          zone: mysql
Every minute Prometheus scrapes 10.17.148.31:9104/metrics (/metrics is a Prometheus convention) and attaches the label zone=mysql. When querying the data you can use the labels.
This is a simple configuration. The fun of Prometheus starts when you have a lot of targets and jobs.
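For example, a second job next to the mysql one could look like this (a sketch; the node job scraping node_exporter on its default port 9100 is made up for illustration):

scrape_configs:
  - job_name: mysql
    target_groups:
      - targets: ['10.17.148.31:9104']
        labels:
          zone: mysql
  - job_name: node
    target_groups:
      - targets: ['10.17.148.31:9100']
        labels:
          zone: linux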
Let’s have a look at our specific endpoint:
> curl 10.17.148.31:9104/metrics
...
mysql_global_status_threads_cached 26
mysql_global_status_threads_connected 99
mysql_global_status_threads_created 125
mysql_global_status_threads_running 2
...
As a MySQL administrator you know what this is all about. The data is provided by an exporter, in our case running in a container :)
> docker run -d -p 9104:9104 --link=mysql:backend \
    -e DATA_SOURCE_NAME=prometheus:prometheus@secret(backend:3306)/ \
    prom/mysqld-exporter
This is old school Docker. Obviously MySQL is running in a container as well (named mysql), and we are using the deprecated --link switch.
The mysqld-exporter has a lot of options:
$ docker run --rm prom/mysqld-exporter --help
Usage of /bin/mysqld_exporter:
  -collect.auto_increment.columns
        Collect auto_increment columns and max values from information_schema
  -collect.binlog_size
        Collect the current size of all registered binlog files
  -collect.global_status
        Collect from SHOW GLOBAL STATUS (default true)
  -collect.global_variables
        Collect from SHOW GLOBAL VARIABLES (default true)
  -collect.info_schema.processlist
        Collect current thread state counts from the information_schema.processlist
  -collect.info_schema.tables
        Collect metrics from information_schema.tables (default true)
  -collect.info_schema.tables.databases string
        The list of databases to collect table stats for, or '*' for all (default "*")
  -collect.info_schema.tablestats
        If running with userstat=1, set to true to collect table statistics
  -collect.info_schema.userstats
        If running with userstat=1, set to true to collect user statistics
  -collect.perf_schema.eventsstatements
        Collect metrics from performance_schema.events_statements_summary_by_digest
  -collect.perf_schema.eventsstatements.digest_text_limit int
        Maximum length of the normalized statement text (default 120)
  -collect.perf_schema.eventsstatements.limit int
        Limit the number of events statements digests by response time (default 250)
  -collect.perf_schema.eventsstatements.timelimit int
        Limit how old the 'last_seen' events statements can be, in seconds (default 86400)
  -collect.perf_schema.eventswaits
        Collect metrics from performance_schema.events_waits_summary_global_by_event_name
  -collect.perf_schema.file_events
        Collect metrics from performance_schema.file_summary_by_event_name
  -collect.perf_schema.indexiowaits
        Collect metrics from performance_schema.table_io_waits_summary_by_index_usage
  -collect.perf_schema.tableiowaits
        Collect metrics from performance_schema.table_io_waits_summary_by_table
  -collect.perf_schema.tablelocks
        Collect metrics from performance_schema.table_lock_waits_summary_by_table
  -collect.slave_status
        Collect from SHOW SLAVE STATUS (default true)
  -config.my-cnf string
        Path to .my.cnf file to read MySQL credentials from. (default "/home/golang/.my.cnf")
  -log.level value
        Only log messages with the given severity or above. Valid levels: [debug, info, warn, error, fatal, panic]. (default info)
  -log_slow_filter
        Add a log_slow_filter to avoid exessive MySQL slow logging. NOTE: Not supported by Oracle MySQL.
  -web.listen-address string
        Address to listen on for web interface and telemetry. (default ":9104")
  -web.telemetry-path string
        Path under which to expose metrics. (default "/metrics")
Prometheus ships with an expression browser, giving you the opportunity to access and graph the data. It also provides its own query language :)
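Even without graphing, two queries like these should be self-explanatory (a sketch using the metric names from the curl output above; rate() is one of the standard PromQL functions):

# currently connected threads of our MySQL, selected by our label
mysql_global_status_threads_connected{zone="mysql"}
# threads created per second, averaged over the last 5 minutes
rate(mysql_global_status_threads_created[5m])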
Brian Brazil mentioned to use another [function](https://prometheus.io/docs/querying/functions/). thx \o/
I recommend using Grafana as a dashboard. You just need to add the Prometheus server as a data source and reuse the queries you used in the expression browser. There is also PromDash, but afaik Grafana is the way to go.
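Adding the data source can even be scripted via the Grafana HTTP API (a sketch; the host names and credentials here are made up):

$ curl -X POST -u admin:admin -H 'Content-Type: application/json' \
    http://grafana:3000/api/datasources \
    -d '{"name":"prometheus","type":"prometheus","url":"http://prometheus:9090","access":"proxy"}'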
Prometheus rocks. Having a central point for the performance analysis of the whole datacenter is awesome. There are a lot of exporters you can use. Even writing your own exporter is quite easy.
There is a nice presentation; I recommend checking it out to see the nice graphs Grafana builds :)
Percona is always great at adopting new stuff. Today they announced their Percona Monitoring and Management. Of course it also uses some exporters, Prometheus and Grafana. I’m quite sure it could kill other solutions on the market \o/
This blogpost extends the last one. In the last blogpost we had a look at Docker Network and how it makes the communication between containers (over multiple hosts) easier. Of course we used Galera for that :)
In this blogpost we are going to use Docker Swarm to bootstrap a Galera Cluster.
Why use Docker Swarm?
Docker Swarm is (simplified) a proxy, so we've got one access point to manage multiple hosts. (The swarm manage service will run on 172.17.152.11:2376.) We also use Docker Swarm to abstract away from the nodes: we want the cluster to be running, but we don't want to define explicitly where to run the containers. (Think about a 3-node cluster on a Docker Swarm with 100 nodes.)
Let us point the local Docker client to Docker Swarm:
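# the swarm manage service from above
$ export DOCKER_HOST=172.17.152.11:2376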
We still got the cluster from the last blogpost running:
$ docker ps -f name=galera3 -f name=galera2 -f name=galera1
CONTAINER ID        IMAGE                  NAMES
751f4f071359        erkules/galera:basic   swarm3/galera3
24d4a2dfe3e2        erkules/galera:basic   swarm2/galera2
d3410d308171        erkules/galera:basic   swarm1/galera1
Docker Swarm extends the NAMES column, so we also see which host each container runs on.
Let's get rid of the old cluster:
$ docker rm -f swarm3/galera3 swarm2/galera2 swarm1/galera1
swarm3/galera3
swarm2/galera2
swarm1/galera1
We are going to deploy a Galera Cluster. For simplicity we are going to reuse the old overlay-network (named galera).
With Docker Swarm we also change the way we run the containers:
- We don't mention where to run the container
- Every container gets the label galera=setup1
- We tell the container not to run on a node where another container with that label is already running (affinity:...). This makes sure no two Galera instances end up on the same host.
$ docker run -d --name galera1 --net galera -e affinity:galera!=setup1 \
    --label galera=setup1 erkules/galera:basic \
    --wsrep-cluster-address=gcomm://galera1,galera2,galera3 --wsrep-new-cluster
1c3f8576cb124261c35412c7e643b341ec6f69d70c6a601b7dde8c3574774c42
$ docker run -d --name galera2 --net galera -e affinity:galera!=setup1 \
    --label galera=setup1 erkules/galera:basic \
    --wsrep-cluster-address=gcomm://galera1,galera2,galera3
611501de09b64475e9356dfb50be7f5bf179919a9e94e60b9d1e466bb7450437
$ docker run -d --name galera3 --net galera -e affinity:galera!=setup1 \
    --label galera=setup1 erkules/galera:basic \
    --wsrep-cluster-address=gcomm://galera1,galera2,galera3
582aaf272bb449733ca1b95cced1ae7b3ef2e20e105d09261a36b6a0912d9f07
So let's check if everything went fine:
$ docker ps -f label=galera=setup1
CONTAINER ID        IMAGE                  NAMES
582aaf272bb4        erkules/galera:basic   swarm2/galera3
611501de09b6        erkules/galera:basic   swarm1/galera2
1c3f8576cb12        erkules/galera:basic   swarm3/galera1
$ docker exec swarm1/galera2 \
    mysql -e 'show global status like "wsrep_cluster_size"'
Variable_name       Value
wsrep_cluster_size  3
What happens when we start a fourth Galera container?
$ docker run -d --name galera4 --net galera -e affinity:galera!=setup1 \
    --label galera=setup1 erkules/galera:basic \
    --wsrep-cluster-address=gcomm://galera1,galera2,galera3
docker: Error response from daemon: unable to find a node that satisfies galera!=setup1.
See 'docker run --help'
It failed! Very good. Having only three Docker nodes, there was no machine left to start a fourth Galera container.
Galera and Docker Network
Using Docker to run Galera on multiple nodes used to be quite a mess, as described here. It is possible, but no fun at all. As Docker does NATing, every bidirectional setup is complicated.
Starting with Docker Network (Docker 1.9) you can simply span a Docker-owned network (or even multiple ones) over multiple nodes. These networks are isolated from each other, just like the well-known bridge network Docker uses by default. Docker Network also provides simple container discovery using the --name switch. It feels like a simple DNS.
Let's have a look at how easy it is to deploy a Galera cluster. This is not for production; I use my own Docker image, just to play around.
These are our nodes:
- Swarm1 IP=172.17.152.11
- Swarm2 IP=172.17.152.12
- Swarm3 IP=172.17.152.13
At the start we still need a network. We are going to name it galera. It doesn't exist yet:
$ DOCKER_HOST=172.17.152.11:2375 docker network ls --filter "name=galera"
NETWORK ID          NAME                DRIVER
Creating a network is .. easy:
$ DOCKER_HOST=172.17.152.11:2375 docker network create -d overlay galera
b0cadfa914206c212cce0de611d500620cd07bcae289841f7dc03c26d19b6e91
$ DOCKER_HOST=172.17.152.11:2375 docker network ls --filter "name=galera"
NETWORK ID          NAME                DRIVER
b0cadfa91420        galera              overlay
Every other node is part of this network too:
$ DOCKER_HOST=172.17.152.12:2375 docker network ls --filter "name=galera"
NETWORK ID          NAME                DRIVER
b0cadfa91420        galera              overlay
$ DOCKER_HOST=172.17.152.13:2375 docker network ls --filter "name=galera"
NETWORK ID          NAME                DRIVER
b0cadfa91420        galera              overlay
So let's bootstrap a Galera Cluster. Note the --net switch. As you can see, we use the container names to connect the cluster :)
$ DOCKER_HOST=172.17.152.11:2375 docker run -d --name galera1 \
    --net galera erkules/galera:basic --wsrep-cluster-address=gcomm://
d3410d308171df5a3ef2da3b37a7d11ea6479dc8550eea24447d488cbf490a0d
$ DOCKER_HOST=172.17.152.12:2375 docker run -d --name galera2 \
    --net galera erkules/galera:basic --wsrep-cluster-address=gcomm://galera1
24d4a2dfe3e2ba6914c83c92f814116ff72d5de8daba8304fe89ed3661f57270
$ DOCKER_HOST=172.17.152.13:2375 docker run -d --name galera3 \
    --net galera erkules/galera:basic --wsrep-cluster-address=gcomm://galera1
751f4f071359495a28e31bc996b62c2042283ee08df0194b7871402e2f851e06
Let's see if everything went fine:
$ DOCKER_HOST=172.17.152.13:2375 docker exec -ti galera3 \
    mysql -e 'show status like "wsrep_cluster_size"'
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
This is a quite simple and insecure example. Docker Network will also change the way we build simple MySQL replication setups. If you are used to using --link, get rid of it: it is already deprecated in favor of Docker Network. Next time we are going to use Docker Swarm (on top of Docker Network) to deploy a Galera Cluster.
Have fun :)
Ahoy! I'm giving two talks at DevOpsCon.
Docker is kinda awesome, as it releases a lot of creativity and lets us rethink infrastructure. Think about upgrading an application (Docker container): we just stop the old one and start the new container. Rollback is as easy as stopping the new container and starting one from the old image.
Let’s have a look at nginx. Within the Docker ecosystem, in a world where the backends come and go, you profit from writing the nginx configuration in a dynamic way, most likely using confd or consul-template.
After that, you stop the container and start a new one from the image.
Sending nginx a SIGHUP would have told it to simply reread the configuration without stopping: it spawns new worker processes with the new configuration.
Nginx even has a nice trick for upgrades: on SIGUSR2 it spawns a new master process running the new binary.
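With Docker that could look like this (a sketch, assuming the container is simply named nginx):

# reload the configuration without stopping the container
$ docker kill --signal=HUP nginx
# spawn a new master process running the new binary
$ docker kill --signal=USR2 nginx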
In a standard Docker workflow you don’t use these features.
Using master-master for MySQL? To be frank, we need to get rid of that architecture. We'll skip the active-active setup and show why master-master, even for failover reasons, is the wrong decision.
So why does a DBA think master-master is good in a failover scenario?
- The recovered node gets its data automatically.
- You don't need a backup for recovery.
Please remember: MySQL Replication is async
Again: MySQL replication is async. Even the so-called semi-sync replication!
So the following scenario is quite likely. Here is a nice master-master setup:
 active                  standby
+------+        c        +------+
|      |---------------->|      |
|abcd  |                 |ab    |
|      |                 |      |
|      |<----------------|      |
+------+                 +------+
Oh my god the node went down:
  RIP                    active
+------+                 +------+
|      |------||-------->|      |
|abcd  |                 |abc   |
|      |                 |      |
|      |<--------||------|      |
+------+                 +------+
No problem, we’ve got master-master. After the takeover the recovering node catches up. (In fact it has one transaction more :( )
recovered                active
+------+                 +------+
|      |---------------->|      |
|abcd  |                 |abce  |
|      |        e        |      |
|      |<----------------|      |
+------+                 +------+
Great, our data is not in sync anymore!
recovered                active
+------+                 +------+
|      |---------------->|      |
|abcde |                 |abce  |
|      |                 |      |
|      |<----------------|      |
+------+                 +------+
In fact there is no need for master-master anyway. We’ve got GTIDs nowadays. Use simple replication. In a failover you can use GTIDs to check whether the recovering node has extra transactions.
If not, simply set up replication and it fetches all the missing data.
But if there are extra transactions on the recovering node, you have to rebuild the node anyway.
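A sketch of such a check (the host names are made up; GTID_SUBSET() is a built-in MySQL function):

# the GTID set executed on the recovering node
$ mysql -h recovering-node -e 'SELECT @@GLOBAL.gtid_executed'
# is that set a subset of what the new master has executed?
$ mysql -h new-master -e "SELECT GTID_SUBSET('<gtid_executed of the recovering node>', @@GLOBAL.gtid_executed)"
# 1: no extra transactions, simply attach the node as a replica
# 0: extra transactions, rebuild the node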
FYI: This works with GTID@MariaDB and GTID@MySQL.
Welcome to the GTID era! \o/