Benchmarking Galera Cluster

What is it about?

I used to do some benchmarking and blogged about it on my German-language blog. I’m going to do tests and benchmarks again :)

We are going to have a look at ‘benchmarking’ a 3-node Galera Cluster. The application (sysbench) runs on a separate node and accesses one node of the cluster. This would be the case in, e.g., a VIP setup.


3 Galera Nodes

  • Virtual machines (OpenStack) provided by
  • VCPU: 4
  • RAM: 4GB
  • OS: CentOS 6.4-x86_64
  • MySQL-server-5.6.14wsrep25.1-1.rhel6.x86_64
  • galera-25.3.2-1.rhel6.x86_64

Separate sysbench node

  • Same specs as the Galera nodes
  • sysbench 0.5
  • oltp test on 5 tables with 1000000 rows each (ca. 1.2GB)
  • A run took 60 seconds

MySQL Config

user                          = mysql
binlog_format                 = ROW
default-storage-engine        = innodb

innodb_autoinc_lock_mode      = 2
innodb_flush_log_at_trx_commit= 0
innodb_buffer_pool_size       = 2048M
innodb_log_buffer_size        = 128M
innodb_file_per_table         = 1

query_cache_size              = 0
query_cache_type              = 0
bind-address                  =

init_file                     = /etc/mysql/init
max_connections               = 2000

# Galera

wsrep_provider                = "/usr/lib64/galera/"
wsrep_cluster_name            = deadcandance
wsrep_cluster_address         = "gcomm://$useyourown"/
wsrep_slave_threads           = 
wsrep_certify_nonPK           = 1
wsrep_max_ws_rows             = 131072
wsrep_max_ws_size             = 1073741824
wsrep_sst_method              = rsync



We are running in a hypervisor (OpenStack) setup, so testing is not really reliable. That is not only because of the hypervisor: we also don’t know how host, storage and network resources are consumed by other users. So small variances are statistically irrelevant.

1. Test: We use different settings for wsrep_slave_threads

for i in 1 4 8 16 24 32 48 64; do set wsrep_slave_threads=$i and run; done
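The loop above is shorthand. A minimal sketch of what such a series could look like, assuming wsrep_slave_threads is changed at runtime with SET GLOBAL on every node and sysbench 0.5 is driven through its oltp.lua script. The node names, the sbtest user, the path to oltp.lua and the number of sysbench client threads are hypothetical placeholders, not the values used for the runs in this post.

```shell
# Sketch only: NODES, SB_HOST, SB_USER, OLTP_LUA and SB_THREADS are
# hypothetical placeholders, not the values used for the runs in this post.
NODES="${NODES:-galera1 galera2 galera3}"
SB_HOST="${SB_HOST:-galera1}"
SB_USER="${SB_USER:-sbtest}"
OLTP_LUA="${OLTP_LUA:-/usr/share/doc/sysbench/tests/db/oltp.lua}"
SB_THREADS="${SB_THREADS:-32}"

# Change the number of applier threads on every node at runtime.
set_slave_threads() {
    for node in $NODES; do
        mysql -h "$node" -e "SET GLOBAL wsrep_slave_threads = $1" || return 1
    done
}

# One 60 second oltp run against a single node (5 tables, 1000000 rows each).
run_sysbench() {
    sysbench --test="$OLTP_LUA" \
        --oltp-tables-count=5 --oltp-table-size=1000000 \
        --mysql-host="$SB_HOST" --mysql-user="$SB_USER" \
        --num-threads="$SB_THREADS" --max-time=60 --max-requests=0 run
}

# Run the whole series; the output goes to one log file per setting.
run_series() {
    for i in 1 4 8 16 24 32 48 64; do
        set_slave_threads "$i" || return 1
        run_sysbench > "slave_threads_$i.log"
    done
}
```

Calling run_series on the sysbench node (after the tables have been prepared) would leave eight log files, one per wsrep_slave_threads value, that can be compared afterwards.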

[Figure: galera compared]

This surprised me, as I had different results in another test. I’m not sure if it is the oltp test or the “hardware” that makes changing wsrep_slave_threads more or less useless.

2. Test: Setting gcs.fc_limit to 512 (instead of the default 16)

We could tune the replication part. See Flow Control.
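A sketch of how the limit could be raised at runtime, assuming gcs.fc_limit is set through wsrep_provider_options with SET GLOBAL on every node. The node names are hypothetical placeholders. To keep the value across restarts, the same option string would have to go into the MySQL config as wsrep_provider_options.

```shell
# Sketch only: the node names are hypothetical placeholders.
NODES="${NODES:-galera1 galera2 galera3}"

# Build the provider option string for a given flow control limit.
fc_limit_option() {
    printf 'gcs.fc_limit=%s' "$1"
}

# Apply the limit on every node at runtime.
set_fc_limit() {
    opt=$(fc_limit_option "$1")
    for node in $NODES; do
        mysql -h "$node" -e "SET GLOBAL wsrep_provider_options = '$opt'" || return 1
    done
}

# Show the gcs.fc_limit value a node is currently using.
show_fc_limit() {
    mysql -h "$1" -N -B -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'" \
        | tr ';' '\n' | grep -F 'gcs.fc_limit'
}
```

For this test that would be set_fc_limit 512, followed by show_fc_limit on one node to check that the value was taken.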

[Figure: galera flow_control]

Ah, OK, this helped. And in our setup it is fine to play with these settings. (Yes, there are more. Read the link :) But is it true? How did our Flow Control behave? Let’s have a look at the WSREP_FLOW_CONTROL_PAUSED status variable:

[Figure: galera flow_control]

OK, there you see the cluster wasn’t paused that often anymore. But the values are still too high. We are going to have a look at this setting in future tests. Right now it is quite likely that the machines couldn’t catch up.
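A sketch of how the status variable could be sampled during a run, assuming it is read with SHOW GLOBAL STATUS through the mysql client. Galera reports it as a fraction of time, so the helper below also converts it into a percentage. The node name, the interval and the sample count are hypothetical.

```shell
# Sketch only: the node name passed in is a hypothetical placeholder.

# Print the current wsrep_flow_control_paused value of one node.
flow_control_paused() {
    mysql -h "$1" -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused'" \
        | awk '{ print $2 }'
}

# Turn the fraction into a percentage with one decimal place.
paused_percent() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", f * 100 }'
}

# Sample node $1 every $2 seconds, $3 times, as "epoch value" lines.
sample_paused() {
    n=0
    while [ "$n" -lt "$3" ]; do
        echo "$(date +%s) $(flow_control_paused "$1")"
        n=$((n + 1))
        [ "$n" -lt "$3" ] && sleep "$2"
    done
    return 0
}
```

Something like sample_paused galera1 5 12 > paused.log would cover one 60 second run and give a file that is easy to plot.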

3. Test

Now we take one of the Galera runs and compare it with:

  • A standalone MySQL having the same configuration.
  • A standalone MySQL with sync_binlog and innodb_flush_log_at_trx_commit=1 set.
  • A standalone MySQL with sync_binlog and innodb_flush_log_at_trx_commit=1 set, with Semisynchronous Replication running.

Semisynchronous Replication is often used for HA setups. The argument is that it makes sure the data is on at least one slave. In fact this is wrong. But this is the use case.

[Figure: galera flow_control]

  • We see the Galera Replication ‘overhead’
  • We see the performance drop (overhead) for getting some local storage consistency. But we still see Group Commit doing a good job in scaling.
  • We see the Semisynchronous Replication ‘overhead’

Let’s see another graph comparing two different Galera runs with the Semisynchronous Replication run.

[Figure: galera flow_control]

Make up your own mind.


  • Galera is faster.
  • Galera is virtually synchronous.
  • Galera allows easy failover implementations because of the multi-master technique.

Fake Semisync

Even so, it looks like Semisynchronous Replication is good for setups with a higher concurrency. Let’s have a look at the RPL_SEMI_SYNC_MASTER_NO_TX status variable I monitored while doing the test.

[Figure: galera flow_control]

So it was not Semisynchronous Replication all the time. It switched back to asynchronous replication. So Semisynchronous Replication couldn’t keep up with the workload either. Dropping back to asynchronous replication broke the consistency of the data in the cluster. That’s where Galera reduces the performance (still higher than Semisynchronous Replication) to provide this consistency :)
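A sketch of how this fallback could be checked on the master, assuming the semisync master plugin is loaded so that the Rpl_semi_sync_master_* status variables exist. Besides RPL_SEMI_SYNC_MASTER_NO_TX it reads the YES_TX counter and the STATUS flag to put the number into context. The host name is a hypothetical placeholder.

```shell
# Sketch only: the master host name passed in is a hypothetical placeholder.

# Read one Rpl_semi_sync_master_* status value from the master.
semisync_status() {
    mysql -h "$1" -N -B -e "SHOW GLOBAL STATUS LIKE '$2'" | awk '{ print $2 }'
}

# Share of commits (in percent) that were not acknowledged by a slave.
no_tx_percent() {
    awk -v n="$1" -v y="$2" 'BEGIN {
        total = n + y
        if (total == 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * n / total
    }'
}

# One-line summary: current semisync state plus both commit counters.
semisync_report() {
    no_tx=$(semisync_status "$1" Rpl_semi_sync_master_no_tx)
    yes_tx=$(semisync_status "$1" Rpl_semi_sync_master_yes_tx)
    state=$(semisync_status "$1" Rpl_semi_sync_master_status)
    echo "status=$state no_tx=$no_tx yes_tx=$yes_tx no_tx_percent=$(no_tx_percent "$no_tx" "$yes_tx")"
}
```

A growing no_tx counter or a status of OFF during the run means the master has fallen back to asynchronous replication for at least part of the workload.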

OK, that’s, my friend, the end

  • We had a simple setup
  • Different setups, distributions and ‘hardware’ are going to be used.
  • If you have some ideas, feel free to ping/mail me.
  • I’m missing real(tm) hardware, so feel free to make me happy by providing me access to real hardware :)

Have fun

Erkan :)

This page contains a single entry by erkan from 27.01.14 22:46.
