- Part I was an introduction to some basic knowledge needed to run Redis in production.
- Part II talked about Sentinel and Cluster a bit, then dove into standing up Sentinel for production.
- Part III we finished up Sentinel and tested it out.
- Part IV we blew all that away and stood up Cluster.
And now we’re going to pause the strictly-Redis talk for a minute and reflect on a few things you should do to run any of this in production. If you read the previous posts you know we used Docker Swarm to orchestrate all of this. Docker is a great tool to help you standardize the way things are deployed. But when you deploy a tool like Redis to production you need to make sure that your servers are ready for it.
Quick story time. About a year and a half ago we were migrating our websites from Microsoft’s NLB to HAProxy. I had done tonnes of research on HAProxy and configured the application in a way that I was positive could handle hundreds of thousands of requests. In short, I was wrong. Because while the HAProxy config was fine, the kernel configuration was not. Since that time I have rolled out a ‘standardized’ sysctl.conf to all of my servers (we use Ubuntu 16.04).
I highly suggest you do the same.
Redis will throw some warnings at you right off the bat if you don’t have things configured correctly. Notice these lines towards the bottom of the startup output:
- TCP Backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128
- overcommit_memory is set to 0! … To fix this issue add ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf and then reboot or run the command ‘sysctl vm.overcommit_memory=1’
- WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command ‘echo never > /sys/kernel/mm/transparent_hugepage/enabled’ as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
I’d go ahead and add a few more:
- Do you have swapping enabled? If so, disable it. Provision your servers appropriately to avoid swapping. If you cannot, set swappiness to a low value.
- Increase your fs.file-max and fs.nr_open configs
Here is the sort of thing my /etc/sysctl.conf covers; treat the exact numbers as a representative sketch and size them for your own workload:
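```
# /etc/sysctl.conf -- representative values, tune for your own hardware/traffic

# Raise the socket listen backlog so Redis' tcp-backlog of 511 can take effect
net.core.somaxconn = 65535

# Allow memory overcommit so background saves (fork + copy-on-write) don't fail
vm.overcommit_memory = 1

# Raise the system-wide and per-process file descriptor ceilings
fs.file-max = 2097152
fs.nr_open = 2097152

# Prefer dropping page cache over swapping out application memory
vm.swappiness = 1
```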
OK, so that takes care of the server, but what about Redis itself? It has a large number of tunable knobs. The big ones I suggest you take a look at are:
- DB Backups / AOF
- Replication Tuning
- Max Memory Policy
DB Backups / AOF
These settings control how often Redis will try to save changes from memory to disk. This is one area where Redis strays from being a single-threaded app. There are two primary methods: RDB dumps and AOF (Append Only File). An RDB dump saves the entire in-memory database to disk, whereas AOF appends changes, as the name suggests. This is one area where your use case will strictly dictate the configuration. If you intend to use Redis as a transient cache that can be easily restored, then you don’t really need to save anything. However, if you’re using Redis to track real-time state for an application, you probably don’t want to lose any writes.
I currently use RDB Dumps only for my Redis Sentinel setup. But as we move to Redis Cluster (which requires some application changes as you’ll see later) I’m going to implement AOF.
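For reference, turning AOF on amounts to a couple of redis.conf directives. This is a minimal sketch; the everysec fsync policy and the rewrite thresholds shown are just the common defaults, not a recommendation for every workload:

```
# Minimal AOF sketch -- illustrative values, not a production recipe
appendonly yes

# fsync once per second: at most ~1 second of writes lost on a crash
appendfsync everysec

# Rewrite (compact) the AOF once it doubles in size, past a 64mb floor
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```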
My setting is:
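```
save 14400 1
```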
This says: perform an RDB dump every 14400 seconds if there has been at least 1 change. In my case there will have been hundreds of millions of changes in that time frame. The reason we chose this method is because we have multiple replicas online ready to take over. In order for us to be fully down with Redis it would require multiple VMs spread across multiple virtual hosts to suddenly come crashing to a halt; frankly, if this were to happen, whether or not Redis is running is like number 12 on the recovery checklist. If I’m being honest, I probably don’t even need to persist to disk.
Replication
When configuring replication you need to take your peak times into account. More than once I have seen a slave get disconnected during peak traffic only to fall into an endless replication loop where it fails to replicate, disconnects, reconnects, starts again, and so on. This puts a heavy burden on the master, because before replication can begin the master performs a fork() and a bgsave operation. Having these run against a largish dataset every 45 seconds really has a negative impact on throughput.
In my environment all my hosts are connected via 10Gb networking, with even larger pipes running between virtual hosts. I have also equipped my Redis hosts with as little slow disk as possible. These two things combine to form my suggestion of using diskless replication:
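The relevant redis.conf directives look like this. Treat it as a sketch; the 5 second delay is an assumption on my part (it is the window during which additional slaves can join an in-flight sync and share the same transfer):

```
# Diskless replication sketch -- the delay value is illustrative
# Stream the RDB from master to slave sockets, skipping the disk entirely
repl-diskless-sync yes

# Wait before starting so multiple reconnecting slaves can share one
# fork()/transfer instead of each triggering their own
repl-diskless-sync-delay 5
```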
If you really want to learn more about Redis replication, and you should if you’re running it in production, I suggest you read the docs: https://redis.io/topics/replication
Max Memory Policy
I suggest you configure a maxmemory setting of roughly 85% of the memory available on the system. If you’re running in Docker you should really do some homework to determine how large your cache should be, and then reserve that amount of memory on the Docker host.
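As a sketch, for a hypothetical host with 8 GB available to Redis, that works out to something like:

```
# ~85% of a hypothetical 8 GB (8192 MB) of available memory
maxmemory 6900mb
```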
It is also important to consider implementing a maxmemory-policy that takes volatile keys into account. What are volatile keys? They are keys that have some form of expiration on them, and they are generally the first to go when we start reaching max memory. The options, roughly as the redis.conf comments describe them, are:
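```
# volatile-lru    -> evict keys with an expire set, using an LRU algorithm
# allkeys-lru     -> evict any key, using an LRU algorithm
# volatile-random -> evict a random key among those with an expire set
# allkeys-random  -> evict a random key, any key
# volatile-ttl    -> evict the key with the nearest expire time (smallest TTL)
# noeviction      -> don't evict at all; return an error on write operations
```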
Perhaps a policy like volatile-ttl works for you (note that Redis’ out-of-the-box default is noeviction), but again this all depends on your use case.