Or, How to Run Multiple Instances of Redis on One Machine.
Why would you even want to do something like this? Well, Redis is a single-threaded application, so if you have a server with 8 cores running Redis, only 1 of those cores will ever be used by Redis.
Yes, there are some cases, such as bgsave, where this is not true.
By running multiple instances on the same machine and pinning each instance to a specific CPU core, you can make better use of the cores and serve data more quickly.
To accomplish this I use:
taskset -c N
Here is an example from my init.d file that I use to run Redis on my Ubuntu 16.04 machine:
It’s pretty simple: instead of calling redis-server directly, you first call /usr/bin/taskset and then pass in the proper arguments. If you were to type the full command out it would look like this:
taskset -c 0 redis-server /etc/redis/redis.conf
#this will use taskset to launch an instance of redis-server and pin it to core 0 on the server
The full file is below
willd@myserver.local@ORDRedis1:~$ cat /etc/init.d/redis_6380
#!/bin/sh
#Configurations injected by install_server below....
EXEC=/usr/bin/taskset
CLIEXEC=/usr/local/bin/redis-cli
PIDFILE=/var/run/redis_6380.pid
CONF="-c 1 /usr/local/bin/redis-server /etc/redis/6380.conf"
REDISPORT="6380"
###############
# SysV Init Information
# chkconfig: - 58 74
# description: redis_6380 is the redis daemon.
### BEGIN INIT INFO
# Provides: redis_6380
# Required-Start: $network $local_fs $remote_fs
# Required-Stop: $network $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Should-Start: $syslog $named
# Should-Stop: $syslog $named
# Short-Description: start and stop redis_6380
# Description: Redis daemon
### END INIT INFO
case "$1" in
start)
if [ -f $PIDFILE ]
then
echo "$PIDFILE exists, process is already running or crashed"
else
echo "Starting Redis server..."
$EXEC $CONF
fi
;;
stop)
if [ ! -f $PIDFILE ]
then
echo "$PIDFILE does not exist, process is not running"
else
PID=$(cat $PIDFILE)
echo "Stopping ..."
$CLIEXEC -p $REDISPORT shutdown
while [ -x /proc/${PID} ]
do
echo "Waiting for Redis to shutdown ..."
sleep 1
done
echo "Redis stopped"
fi
;;
status)
PID=$(cat $PIDFILE)
if [ ! -x /proc/${PID} ]
then
echo 'Redis is not running'
else
echo "Redis is running ($PID)"
fi
;;
restart)
$0 stop
$0 start
;;
*)
echo "Please use start, stop, restart or status as first argument"
;;
esac
So far we have covered a bit of ground in learning how to configure both Redis Sentinel and Redis Cluster for production. Now that we have an understanding of the server, and a running setup, it’s time to start digging into using Redis.
It would be rather tedious and redundant to start going through all the various Redis Data types and regurgitating what is already in the documentation.
Instead, it’s time to start writing some code. For this, I’ll be using C# and StackExchange.Redis as my library. I’ll be using Visual Studio (b/c I’m addicted to ReSharper) but you can just as easily install VS Code to follow along.
As far as I can tell there are going to be a couple of rather generic use cases:
A backend application, maybe a Windows service or Docker container.
A website
If you’re creating a website, and you’re using IIS, you may want to persist session state to Redis so as to make load balancing a snap. That’s pretty simple with the ASP.NET Redis Session State Provider.
So that will be our first example.
To get started you’ll want to create a new website project in VS.
Contemporary practice dictates that our web apps should be stateless. I’m not disagreeing at all; this is just a quick example of how to use the Redis state provider.
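The provider itself comes from the Microsoft.Web.RedisSessionStateProvider NuGet package; installing it is what adds the web.config entries we look at next. From the Package Manager Console:
PM> Install-Package Microsoft.Web.RedisSessionStateProvider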
Now when we open up our web.config we can see that some <sessionState> entries have been added: one is a commented-out example showing all the available settings, the other is the actual entry:
<sessionState mode="Custom" customProvider="MySessionStateStore">
<providers>
<!-- For more details check https://github.com/Azure/aspnet-redis-providers/wiki -->
<!-- Either use 'connectionString' OR 'settingsClassName' and 'settingsMethodName' OR use 'host','port','accessKey','ssl','connectionTimeoutInMilliseconds' and 'operationTimeoutInMilliseconds'. -->
<!-- 'throwOnError','retryTimeoutInMilliseconds','databaseId' and 'applicationName' can be used with both options. -->
<!--
<add name="MySessionStateStore"
host = "127.0.0.1" [String]
port = "" [number]
accessKey = "" [String]
ssl = "false" [true|false]
throwOnError = "true" [true|false]
retryTimeoutInMilliseconds = "5000" [number]
databaseId = "0" [number]
applicationName = "" [String]
connectionTimeoutInMilliseconds = "5000" [number]
operationTimeoutInMilliseconds = "1000" [number]
connectionString = "<Valid StackExchange.Redis connection string>" [String]
settingsClassName = "<Assembly qualified class name that contains settings method specified below. Which basically return 'connectionString' value>" [String]
settingsMethodName = "<Settings method should be defined in settingsClass. It should be public, static, does not take any parameters and should have a return type of 'String', which is basically 'connectionString' value.>" [String]
loggingClassName = "<Assembly qualified class name that contains logging method specified below>" [String]
loggingMethodName = "<Logging method should be defined in loggingClass. It should be public, static, does not take any parameters and should have a return type of System.IO.TextWriter.>" [String]
redisSerializerType = "<Assembly qualified class name that implements Microsoft.Web.Redis.ISerializer>" [String]
/>
-->
<add name="MySessionStateStore" type="Microsoft.Web.Redis.RedisSessionStateProvider" host="" accessKey="" ssl="true" />
</providers>
</sessionState>
Using the example comment as a guide, update the entry labeled “MySessionStateStore” to point to your Redis server. If you have followed along with the previous tutorials, or if this is running in production, you’ll probably have multiple Redis servers. You can input a valid connection string based upon the specification in the StackExchange.Redis library.
So something like this as the connection string:
redis0:6379,redis1:6380,redis2:6381
Frankly this example is a bit antiquated. Most of us these days will be using a more modern web framework, such as MVC or WebAPI.
We’ll move onto another web example where we actually write some code next time.
Part I was an introduction to some basic knowledge needed to run Redis in production.
Part II talked about Sentinel and Cluster a bit, then dove into standing up Sentinel for production.
Part III we finished up Sentinel and tested it out.
Part IV we blew all that away and stood up Cluster.
And now we’re going to pause for a minute on talking strictly Redis, and reflect on a few things you should do to run any of this in production. If you read the previous posts you know we used Docker Swarm to orchestrate all of this. Docker is a great tool that helps you standardize the way things are deployed. But when you deploy a tool like Redis to production you need to make sure that your servers are ready for it.
Quick story time. About a year and a half ago we were migrating our websites from Microsoft’s NLB to HAProxy. I had done tonnes of research on HAProxy and configured the application in a way that I was positive could handle hundreds of thousands of requests. In short, I was wrong. Because while the HAProxy config was fine, the kernel configuration was not. Since that time I have rolled out a ‘standardized’ sysctl.conf to all of my servers (we use Ubuntu 16.04).
I highly suggest you do the same.
Redis will throw some warnings at you right off the bat if you don’t have things configured correctly:
vagrant@redis1:~$ sudo docker run -it redis:latest
1:C 21 Jan 2019 01:33:22.891 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 21 Jan 2019 01:33:22.893 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 21 Jan 2019 01:33:22.893 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 5.0.3 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 1
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
1:M 21 Jan 2019 01:33:22.894 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 21 Jan 2019 01:33:22.894 # Server initialized
1:M 21 Jan 2019 01:33:22.894 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 21 Jan 2019 01:33:22.894 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 21 Jan 2019 01:33:22.894 * Ready to accept connections
Notice those lines towards the bottom?
TCP Backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128
overcommit_memory is set to 0! … To fix this issue add ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf and then reboot or run the command ‘sysctl vm.overcommit_memory=1’
WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command ‘echo never > /sys/kernel/mm/transparent_hugepage/enabled’ as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
I’d go ahead and add a few more:
Do you have swapping enabled? If so, disable it. Provision your servers appropriately to avoid swapping. If you cannot, set swappiness to a low value.
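Putting the warnings above together with the swap advice, the relevant chunk of my sysctl.conf looks roughly like this. The exact values are a judgment call; somaxconn just needs to be at least the 511 TCP backlog Redis asks for:
#append to /etc/sysctl.conf
net.core.somaxconn = 65535
vm.overcommit_memory = 1
vm.swappiness = 1
#then apply without a reboot
sudo sysctl -p
#and, as root, disable THP per the warning above
echo never > /sys/kernel/mm/transparent_hugepage/enabled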
Ok, so that takes care of the server, but what about Redis itself? It has a large number of tunable knobs. The big ones I suggest you take a look at are:
DB Backups / AOF
Replication Tuning
Max Memory Policy
DB Backups / AOF
These settings control how often Redis saves changes from memory to disk, and this is one area where Redis strays from being a single-threaded app. There are two primary methods: RDB dumps and AOF (Append Only File). An RDB dump saves the entire in-memory database to disk, whereas AOF appends changes, as the name suggests. This is one area where your use case will strictly dictate the configuration. If you’re intending to use Redis as a transient cache that can be easily restored, then you don’t really need to save anything. However, if you’re using Redis to track real-time state for an application, you probably don’t want to lose any writes.
I currently use RDB Dumps only for my Redis Sentinel setup. But as we move to Redis Cluster (which requires some application changes as you’ll see later) I’m going to implement AOF.
My setting is:
save 14400 1
This says: perform an RDB dump every 14400 seconds if there has been at least 1 change. In my case there will have been hundreds of millions of changes in that time frame. The reason we chose this method is b/c we have multiple Replicas online ready to take over. For us to be fully down with Redis, multiple VMs spread across multiple Virtual Hosts would have to suddenly come crashing to a halt; frankly, if that were to happen, whether or not Redis is running is like number 12 on the recovery checklist. If I’m being honest I probably don’t even need to persist to disk.
Replication
When configuring replication you need to take into account your peak times. More than once I have seen a slave get disconnected during peak traffic, only to fall into an endless replication loop where it fails to replicate, disconnects, reconnects, starts again, etc. This puts a heavy burden on the master, because before replication can begin the master performs a fork() and a bgsave operation. Having these run against a largish dataset every 45 seconds really has a negative impact on throughput.
In my environment all my hosts are connected via 10Gb networking, with even larger pipes running between virtual hosts, and I have given Redis as little slow disk as possible. These two things combine to form my suggestion of using diskless replication.
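In redis.conf that comes down to two directives (5 seconds is the default delay; it gives additional replicas a chance to attach before the transfer starts):
#stream the RDB straight over the socket instead of writing it to disk first
repl-diskless-sync yes
repl-diskless-sync-delay 5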
If you really want to learn more about Redis replication, and you should if you’re running in production, I suggest you read the docs: https://redis.io/topics/replication
Max Memory Policy
I suggest you configure a maxmemory setting of ~ 85% of the memory available on the system. If you’re running in Docker you should really do some homework and determine how large your cache should be, and then reserve that amount of memory on the Docker host.
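For example, on a box with 8GB given over to Redis (a made-up number; size it to your own dataset) that works out to roughly:
maxmemory 6800mb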
It is also important to consider implementing a maxmemory-policy that takes volatile keys into account. What are volatile keys? They are keys that have some form of expiration on them, and they are generally the first to go when we start reaching max memory. Copying straight from the config file:
# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select among five behaviors:
#
# volatile-lru -> Evict using approximated LRU among the keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU among the keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key among the ones with an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.
#
# LRU means Least Recently Used
# LFU means Least Frequently Used
#
# Both LRU, LFU and volatile-ttl are implemented using approximated
# randomized algorithms.
#
# Note: with any of the above policies, Redis will return an error on write
# operations, when there are no suitable keys for eviction.
#
# At the date of writing these commands are: set setnx setex append
# incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
# sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
# zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
# getset mset msetnx exec sort
#
# The default is:
#
maxmemory-policy volatile-ttl
Perhaps the default of volatile-ttl works for you, but again this all depends on your use case.
In Part III we finished setting up Redis Sentinel and performed a test by shutting down the Primary and watching as Sentinel failed over to one of the Secondaries. In this article we are going to throw all of that away and start fresh, only this time we will deploy Redis Cluster to our 3 nodes.
If you are following along with the Vagrant setup described in Part II then now is the time to run vagrant destroy.
This will wipe out the current setup. Once that is done you need to start it all back up with vagrant up
Once the new environment is built, go ahead and finish setting up Docker swarm, again described in Part II.
Now that that is done, we should have 3 machines: redis1, redis2, and redis3. Redis1 should be a Docker Swarm manager node with 2 and 3 joined as workers. Let’s get started building out our cluster.
Like I mentioned in the previous post, it’ll be best if we define our own image. We started that work with the redis-sentinel image, and we’re going to update things based on it. To start, create a new file called redis-cluster.conf.
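A minimal config along these lines, lifted from the cluster tutorial in the Redis docs, is enough (the node timeout is in milliseconds):
#https://redis.io/topics/cluster-tutorial
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000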
Now let’s build the image and push it up to Docker Hub:
docker build -t wjdavis5/redis_wordpress:latest .
#you might notice I changed the name from the previous tutorial
docker push wjdavis5/redis_wordpress:latest
Now let’s start off by creating our 3 Primaries on the Docker Swarm.
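The pattern is the same as the Sentinel services: one service pinned per host, each on its own port, and then redis-cli bootstraps the cluster. The service names and the config path inside the image are my own choices here; swap in your own IPs:
sudo docker service create --name redis_cluster1 --constraint node.hostname==redis1 --hostname redis_cluster1 --mode global --publish published=6379,target=6379,mode=host wjdavis5/redis_wordpress:latest redis-server /usr/local/etc/redis/redis-cluster.conf --port 6379
sudo docker service create --name redis_cluster2 --constraint node.hostname==redis2 --hostname redis_cluster2 --mode global --publish published=6380,target=6380,mode=host wjdavis5/redis_wordpress:latest redis-server /usr/local/etc/redis/redis-cluster.conf --port 6380
sudo docker service create --name redis_cluster3 --constraint node.hostname==redis3 --hostname redis_cluster3 --mode global --publish published=6381,target=6381,mode=host wjdavis5/redis_wordpress:latest redis-server /usr/local/etc/redis/redis-cluster.conf --port 6381
#once all three are up, turn them into a cluster; the M: lines in the
#output list the id of each master
sudo docker run -it redis:latest redis-cli --cluster create 192.168.10.123:6379 192.168.10.124:6380 192.168.10.125:6381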
These ids identify the masters in the Redis cluster; we’ll need them for later. We are going to configure the instances as follows:
redis1 (192.168.10.123): a master on 6379 and a replica on 6382 (of redis2’s master)
redis2 (192.168.10.124): a master on 6380 and a replica on 6383 (of redis3’s master)
redis3 (192.168.10.125): a master on 6381 and a replica on 6384 (of redis1’s master)
You’ll notice that with each instance pinned to a Docker host, and the replica for a master living on a different machine, we afford ourselves some fault tolerance.
Ok, now on to configuring the replicas. First we need to create the Docker services.
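Same pattern again, one replica per host on ports 6382-6384 (service names again my own):
sudo docker service create --name redis_cluster4 --constraint node.hostname==redis1 --hostname redis_cluster4 --mode global --publish published=6382,target=6382,mode=host wjdavis5/redis_wordpress:latest redis-server /usr/local/etc/redis/redis-cluster.conf --port 6382
sudo docker service create --name redis_cluster5 --constraint node.hostname==redis2 --hostname redis_cluster5 --mode global --publish published=6383,target=6383,mode=host wjdavis5/redis_wordpress:latest redis-server /usr/local/etc/redis/redis-cluster.conf --port 6383
sudo docker service create --name redis_cluster6 --constraint node.hostname==redis3 --hostname redis_cluster6 --mode global --publish published=6384,target=6384,mode=host wjdavis5/redis_wordpress:latest redis-server /usr/local/etc/redis/redis-cluster.conf --port 6384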
Once those are running we need to connect to Redis1 and issue the commands to join the replicas to the masters:
This is the part where you’ll need the master ids from the previous output. Here is the redis-cli help that may be of use:
wjd@DESKTOP-J7C5B0A:/mnt/c/Users/willi/redi/redis-5.0.3/src$ ./redis-cli --cluster help
Cluster Manager Commands:
add-node new_host:new_port existing_host:existing_port
--cluster-slave
--cluster-master-id <arg>
help
For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.
And now the commands to add the replicas:
sudo docker run -it redis:latest redis-cli --cluster add-node 192.168.10.123:6382 192.168.10.123:6379 --cluster-slave --cluster-master-id 3bf86faf87cc78977516fb2e0da5f0986026d892
sudo docker run -it redis:latest redis-cli --cluster add-node 192.168.10.124:6383 192.168.10.123:6379 --cluster-slave --cluster-master-id cfa19d648dd63c3993daa66b011df2d85dc7a5ac
sudo docker run -it redis:latest redis-cli --cluster add-node 192.168.10.125:6384 192.168.10.123:6379 --cluster-slave --cluster-master-id aaeab6aa905df8bfac214651af1b424a4cebee60
#example output:
vagrant@redis1:~$ sudo docker run -it redis:latest redis-cli --cluster add-node 192.168.10.125:6384 192.168.10.123:6379 --cluster-slave --cluster-master-id aaeab6aa905df8bfac214651af1b424a4cebee60
>>> Adding node 192.168.10.125:6384 to cluster 192.168.10.123:6379
>>> Performing Cluster Check (using node 192.168.10.123:6379)
M: aaeab6aa905df8bfac214651af1b424a4cebee60 192.168.10.123:6379
slots:[0-5460] (5461 slots) master
S: 36afb4e506bef2245e192a83abd856fdea252b34 192.168.10.124:6383
slots: (0 slots) slave
replicates cfa19d648dd63c3993daa66b011df2d85dc7a5ac
M: cfa19d648dd63c3993daa66b011df2d85dc7a5ac 192.168.10.125:6381
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: c4f9af7131f8dc89f0cd6f493c2e8c6b15d0ddf0 192.168.10.123:6382
slots: (0 slots) slave
replicates 3bf86faf87cc78977516fb2e0da5f0986026d892
M: 3bf86faf87cc78977516fb2e0da5f0986026d892 192.168.10.124:6380
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.10.125:6384 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 192.168.10.123:6379.
[OK] New node added correctly.
We now have a fully functional Redis Cluster running on Docker Swarm, with 3 masters and 1 replica per master spread across the swarm in a fault-tolerant way.
In the next post we’ll cover some basic Kernel Tuning that is essential to hosting Redis (or really any application) in Production.
In Part II we looked at the two different ways Redis is set up in production, Cluster and Sentinel, and we started working towards configuring a Redis Sentinel setup.
We’re going to continue with configuring Redis Sentinel using our already running redis1, redis2, and redis3 machines.
The next step in the process is to set up the Redis Sentinel services in Docker. We are going to pin our services to specific nodes to ensure we don’t ever schedule multiple Redis Sentinel instances on the same node.
However, this time we need to provide a configuration to our Sentinel setup. There are several ways we could do that; however, the best solution in terms of future maintainability is to just go ahead and build your own Docker image with your configuration baked in. Once you have done that you can use Docker Hub’s CI/CD pipeline to fully automate things. That topic could easily fill a full post, so we’ll come back to that part.
For now, using your favorite tool (I’m using VS Code), let’s get a new project going. If you’re using VS Code, create a directory where you like to store projects, open Code, select Open Folder, and select your new folder. I then like to “Save Workspace” into the same folder so I can easily launch Code back into this project.
Once you have the workspace set up, create a new file called: Dockerfile
Notice the lack of file extension? You can name it something else like Redis-Sentinel.Dockerfile if you’d like
Next create another file called redis-sentinel.conf and give it these contents:
#https://redis.io/topics/sentinel
#You can read the documentation at that link
port 26379
sentinel monitor mymaster 192.168.10.117 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
The 1st line is the URL for the Sentinel docs so you can get all smart.
The 3rd line tells Redis to start on port 26379.
The 4th line defines the master we want our Sentinel to monitor; it has the IP, port, and quorum configuration at the end. The quorum is the number of sentinels that must agree to initiate a failover.
Next is how long a master must fail PING checks before being considered down.
Next is how long we’ll wait before Sentinel will try to fail over the same master again.
parallel-syncs sets the number of slaves that can be reconfigured to use the new master at the same time after a failover. The lower the number, the more time the failover process will take to complete; however, if the slaves are configured to serve old data, you may not want all the slaves to re-synchronize with the master at the same time.
Now update your Dockerfile with these contents:
FROM redis:5
COPY redis-sentinel.conf /usr/local/etc/redis/redis-sentinel.conf
#CMD [ "redis-server", "/usr/local/etc/redis/redis.conf" ]
Notice a couple things here:
I specify redis:5 – using a specific version instead of latest is a must for production
We copy the configuration file to the container
The CMD is for my reference later when we create the containers. We’ll eventually update the redis-server installation to use this custom image as well and I want to be able to use the same image for both.
I have also created a .dockerignore file with this in it: .vagrant
This prevents the vagrant stuff from being sent to the docker daemon during build.
Now that we have our Dockerfile built and our redis-sentinel.conf file ready, we can build the image.
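Assuming you’re pushing the image to your own Docker Hub account:
docker build -t wjdavis5/redis_sentinel_wordpress:latest .
docker push wjdavis5/redis_sentinel_wordpress:latest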
This isn’t a Docker tutorial, but in short the build command creates an image tagged as “wjdavis5/redis_sentinel_wordpress” from the current directory “.”, and the push publishes it so the swarm nodes can pull it.
Now we can move back over to our Docker host and create our services.
sudo docker service create --name redis_sentinel1 --constraint node.hostname==redis1 --hostname redis_sentinel1 --mode global --publish published=26379,target=26379,mode=host wjdavis5/redis_sentinel_wordpress:latest redis-sentinel /usr/local/etc/redis/redis-sentinel.conf
sudo docker service create --name redis_sentinel2 --constraint node.hostname==redis2 --hostname redis_sentinel2 --mode global --publish published=26379,target=26379,mode=host wjdavis5/redis_sentinel_wordpress:latest redis-sentinel /usr/local/etc/redis/redis-sentinel.conf
sudo docker service create --name redis_sentinel3 --constraint node.hostname==redis3 --hostname redis_sentinel3 --mode global --publish published=26379,target=26379,mode=host wjdavis5/redis_sentinel_wordpress:latest redis-sentinel /usr/local/etc/redis/redis-sentinel.conf
Now we should be able to connect to an instance of Redis Sentinel, using the docker run command from before, only this time we need to specify the port with -p 26379.
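Something like this; the sentinel section of INFO reports the master state and how many sentinels have been discovered:
sudo docker run -it redis:latest redis-cli -h 192.168.10.117 -p 26379 info sentinel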
Yeah, mine says 4 sentinels b/c I created an additional one while testing this out; you should see 3. The next thing we want to do is test failover. To do that we first want to tail the logs for redis-sentinel, which we can do with the docker logs command.
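On whichever host you like (the container id below is a placeholder, grab yours from docker ps):
#find the sentinel container id
sudo docker ps
#then follow its log output
sudo docker logs -f <container_id>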
Now from another window connect to the master (should be redis1) and issue the shutdown command:
vagrant@redis1:~$ sudo docker run -it redis:latest redis-cli -h 192.168.10.117
192.168.10.117:6379> shutdown
not connected>
Now if you watch the redis-sentinel log you should see the failover initiate:
20 Jan 2019 15:01:07.934 * +sentinel-address-switch master mymaster 192.168.10.117 6379 ip 172.17.0.3 port 26379 for 08aa07e38473a554911f3aebfd720692b3b7c948
1:X 20 Jan 2019 15:08:36.236 # +sdown master mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.236 # +sdown master mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.326 # +odown master mymaster 192.168.10.117 6379 #quorum 4/2
1:X 20 Jan 2019 15:08:36.326 # +new-epoch 1
1:X 20 Jan 2019 15:08:36.326 # +try-failover master mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.330 # +vote-for-leader 08aa07e38473a554911f3aebfd720692b3b7c948 1
1:X 20 Jan 2019 15:08:36.330 # 63e82269c31d372eaee4d3bde15aea8a2c2e65f4 voted for 08aa07e38473a554911f3aebfd720692b3b7c948 1
1:X 20 Jan 2019 15:08:36.330 # d7b04497a0b905f692aea802979d030d5c73d0e9 voted for 08aa07e38473a554911f3aebfd720692b3b7c948 1
1:X 20 Jan 2019 15:08:36.330 # 08aa07e38473a554911f3aebfd720692b3b7c948 voted for 08aa07e38473a554911f3aebfd720692b3b7c948 1
1:X 20 Jan 2019 15:08:36.383 # +elected-leader master mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.383 # +failover-state-select-slave master mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.445 # +selected-slave slave 192.168.10.121:6379 192.168.10.121 6379 @ mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.445 * +failover-state-send-slaveof-noone slave 192.168.10.121:6379 192.168.10.121 6379 @ mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.516 * +failover-state-wait-promotion slave 192.168.10.121:6379 192.168.10.121 6379 @ mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.670 # +config-update-from sentinel 63e82269c31d372eaee4d3bde15aea8a2c2e65f4 172.17.0.3 26379 @ mymaster 192.168.10.117 6379
1:X 20 Jan 2019 15:08:36.670 # +switch-master mymaster 192.168.10.117 6379 192.168.10.118 6379
When we issued the shutdown command it should have terminated the Docker container that was running. Because we configured this as a Docker Service, when the container terminates Docker is automagically going to reschedule it to execute again. So at this point you should be able to reconnect to redis1, run an info command, and see that it is now a secondary (slave).
In Part I we covered some very basic steps to start up a Redis server and connect to it with the redis-cli tool. This is useful information for playing around in your dev environment, but doesn’t help us much when it’s time to move to production. In this post I will start to cover the steps we took to deploy Redis to production, and keep it running smoothly.
Redis Sentinel
When it’s time to run in production there are generally two primary ways to offer high availability. The older way of doing this is using Redis Sentinel. In this setup you have a Primary (also called the master) and 2 or more Secondaries (also called slaves). All writes must go to the primary; data is then replicated to the secondaries over the network.
You also must have at least 3 more instances of Redis running in sentinel mode. These sentinels monitor the primary. If they determine the master is unavailable, a new master will be promoted from one of the available secondaries. All other secondaries will automatically be reconfigured to pull from the new master.
There is a bit more to it than this, but that is a sufficient explanation for right now.
Redis Cluster
In cluster mode we shard reads/writes across multiple instances, and these multiple instances also have Secondaries.
Reads and Writes are distributed across the Primaries by using a computed hash slot. Hash slots are pretty easy to understand if you want to review the Cluster Specification. But suffice it to say that a hash slot is computed based upon the key name, the hash slots are divided between primaries, and it is up to the client to route to the correct instance.
Note, when I say client, I don’t mean your application, unless you plan to connect directly to Redis and speak the Redis protocol yourself. You’ll likely be using a client library like StackExchange.Redis though.
Which One To Choose?
Generally speaking, I think you should just go ahead and choose Redis Cluster if you’re going to be setting this up in a production environment. It gives you the ability to scale horizontally when you need to, and honestly isn’t much more difficult to set up than Sentinel. But I’ll cover the setup of both.
Configure Redis Sentinel
I’m going to continue using Docker here. And we’ll assume you have 3 nodes called redis1, redis2, and redis3. If you’re following along on your local machine you can use the following Vagrant file to get started:
Vagrant.configure("2") do |config|
config.vm.define "redis1" do |redis1|
redis1.vm.box = "bento/ubuntu-16.04"
redis1.vm.box_version = "201812.27.0"
redis1.vm.provision :shell, path: "bootstrap.sh", :args => "redis1"
end
config.vm.define "redis2" do |redis2|
redis2.vm.box = "bento/ubuntu-16.04"
redis2.vm.box_version = "201812.27.0"
redis2.vm.provision :shell, path: "bootstrap.sh", :args => "redis2"
end
config.vm.define "redis3" do |redis3|
redis3.vm.box = "bento/ubuntu-16.04"
redis3.vm.box_version = "201812.27.0"
redis3.vm.provision :shell, path: "bootstrap.sh", :args => "redis3"
end
end
And here is the bootstrap.sh file mentioned in the above config.
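Mine boils down to setting the hostname that was passed in as an argument and installing Docker; a minimal sketch using Docker’s convenience install script:
#!/bin/bash
#set the hostname passed in from the Vagrantfile (redis1, redis2, redis3)
hostnamectl set-hostname $1
#install docker via the convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh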
Now when you run vagrant up, in a few moments you’ll have 3 machines running with docker installed. I’ll be configuring a swarm, like I have in production. So next I need to init the swarm.
vagrant ssh redis1
#wait for connect
sudo docker swarm init
#copy the join command that gets output by the last command, it'll look like this
#sudo docker swarm join --token SWMTKN-1-2wwy0py488uvo6u0lhbpgqpvbhha1kd6w4k1t95uox9m0t4ln0-1fjjc15308hvndpdj8ui7lts9 192.168.10.114:2377
Now you’ll ssh into the other two and join the swarm with the command from the previous example.
Now we have a running Docker swarm with redis1 as our manager. The next steps are to create our Redis services. We’ll be pinning each Redis instance to a specific node. Here are the commands to create the Redis services:
sudo docker service create --name redis1 --constraint node.hostname==redis1 --hostname redis1 --mode global --publish published=6379,target=6379,mode=host redis:latest
sudo docker service create --name redis2 --constraint node.hostname==redis2 --hostname redis2 --mode global --publish published=6379,target=6379,mode=host redis:latest
sudo docker service create --name redis3 --constraint node.hostname==redis3 --hostname redis3 --mode global --publish published=6379,target=6379,mode=host redis:latest
We are pinning each instance of Redis to a node b/c we don’t want Docker to ever schedule the primary and secondaries on the same Docker host. That would remove some of the high availability we get from running multiple instances.
We now have 3 instances of redis running. You can test connecting to them using the examples from part 1 of this series.
docker run -it redis:latest redis-cli -h 192.168.10.117
#obviously update your ip address
The next step is to enlist redis2 and redis3 as slaves of redis1. To do that we’ll connect to each and run the slaveof command. First ssh into redis2 and then connect to Redis using the docker command above. Then run:
slaveof 192.168.10.117 6379
#again, update your IP and port (if you changed the port)
Redis should respond with “OK”. Repeat this step on redis3.
Once this is done you should be able to again connect to the instance of Redis on redis1 and run the info replication command.
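Roughly this, using the same docker run trick as before:
sudo docker run -it redis:latest redis-cli -h 192.168.10.117 info replication
#look for role:master and connected_slaves:2, with a slave0/slave1 line for each secondary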
Whew! Now we have 3 instances of redis running with 1 master and 2 slaves. In the next post we’ll work on getting redis-sentinel up and running to monitor them.
I recently read an article on HBR.ORG titled “Stop Working All Those Hours” and it has really made me stop and think about how people, myself included, measure success. The general premise here is that success in the workplace is often measured by the amount of time spent in the office. Stop and think about this for a moment; is it true?
Will spending 50+ hours a week in the office really make you a better employee? Or is it a perfect demonstration of not being as efficient as you could be? Metrics are an important factor in a business, I mean you have to have some empirical data by which you compare employees against each other. But is hours worked a valid metric? Does the employee that came in on Saturday deserve recognition if they could have completed the task on Friday? Is the employee that leaves early to watch their children’s Little League game less dedicated to his or her job?
Stop working so much!
That last question sort of hits close to home for me because I am frequently hard on myself for having to leave or miss work for any reason, but at the same time I refuse to be one of those parents that doesn’t get to see their children grow up. If time spent truly is a valid metric for measuring employees then my career progression is possibly doomed to be a slow and arduous process.
Admittedly, time spent is a metric we instantly gravitate towards. I do, however, believe that it is, at the very least, an unreliable total-picture metric. As leaders and managers we should strive to identify metrics that are specific to a position. By doing so we will be able to obtain a much better idea of how well our employees are actually performing.
As an employee I feel that it is important to closely evaluate our efficiency. If you can make yourself more efficient then perhaps you can spend a few of those hours focusing on your personal life as well. A couple more hours spent catering to yourself will go a long way in ensuring your happiness and overall well being.
I work at Synovia Solutions LLC, creators of the Silverlining Fleet Management software and Here Comes The Bus. Our solution installs hardware devices on vehicles that then report back to us over cellular. During peak times we are processing about 3000 messages / second over UDP.
Our current system includes a monolithic Windows service that handles pretty much all aspects of message processing. It’s written in .NET (currently 4.6.1) and runs on several physical machines located in a local data center. It uses SQL Server as a backend data store.
When I was brought on board one of my primary tasks was to migrate the existing queuing infrastructure, several SQL Server tables, into a new queuing solution. We chose RabbitMq via the hosted provider CloudAMQP. This was a pretty new paradigm for me, as I had never worked with anything other than MSMQ (GAG!).
After the initial implementation of Rabbit was written we discovered a show stopper. To explain that I’ll need to cover a bit more on how this all works.
You see, the devices on the vehicles communicate over UDP only, but once we receive a message and persist it we have to send an ACK message back to the device. If the device doesn’t receive this ACK within 30 seconds it will retransmit the same message. With our existing infrastructure already strained, we found ourselves falling behind on inbound messages several times, and as both the number of incoming messages and the average processing times increased during peak hours, we hit critical mass. If we had 3k messages in the queue to persist, and persisting was taking upwards of 10ms, devices would begin retransmitting messages, which at this point were duplicates, and our queue would snowball.
This problem was only made worse by the fact that if the vehicle turned off before all of its data was sent and ack’d, the data would reside on the device until the next time the vehicle’s ignition was turned on, at which point it would again resume trying to send. This was usually during another peak time, and so the cycle continued.
When we introduced hosted RabbitMq this problem got worse, because now we had at least a 25ms round trip from our data center to the AWS data center where CloudAMQP was hosted. We could have opted to host RabbitMq ourselves, but lacking a dedicated sysadmin this just wasn’t in the cards.
It was around this time that Dot Net Core was in beta, and we were looking at migrating our ‘Listener’ infrastructure into AWS to eliminate the 25ms round trip time and move forward with RabbitMq.
We had the idea to take another step and write a Listener microservice. At the time I was really torn between using NodeJs, Python, or Java, or risking it and using the beta version of Dot Net Core. The main requirements were:
*Be cross platform, specifically it had to run on Linux (Ubuntu).
*Be really freakin fast
*Accept messages, persist them, and Ack them
*Be stateless
*Scale automagically.
*Live behind a load balancer.
That’s it, small and fast. That last one though, the load balancer: yeah, there wasn’t much in the way of UDP load balancing when we started the project. AWS didn’t support UDP, NGINX didn’t, and the only one I found at the time was Loadbalancer.org. Once I was about halfway done NGINX released their UDP load balancing. AWS still doesn’t support it.
At the end of the day we went with the following stack:
*Loadbalancer.org for balancing
*Ubuntu OS
*AWS OpsWorks
*Dot Net Core
*Rabbit Mq
*Redis (for syncing sequence numbers across all instances)
There is still a slight bottleneck when persisting to RabbitMq b/c we have to utilize publisher CONFIRMS to ensure the message is persisted before we can send the ACK. The average time from start to finish is between 5 and 7ms.
That may not sound like much of an improvement, but it doesn’t tell the whole story either. That’s the time to process a single message, on a single instance, on a single thread. It’s about 200 messages / second.
But when I use a typical producer/consumer model with 25 processing threads we can hit 5000 messages / second, and do so without incurring any additional latency, b/c RabbitMq is just awesome. At that rate here is my CPU utilization (from DataDog) over the last 24 hours:
Needless to say we aren’t even scratching the surface of what this thing can do. Of course we have a few instances running behind the LB.ORG appliance, so we can easily handle 30k messages / second.
All of this runs on a T2.Medium instance with 2 vCPUs and 4GB of RAM, costing ~ $300 / year / instance to run 24/7. We could save even more money utilizing Spot Instances for peak times, but we just don’t need it right now.
At the end of the day it has been a pretty awesome experience learning all of these new technologies. RabbitMq is amazing. But I have to give an enormous shout out to the Dot Net Core team and MSFT. What they have done is really going to shake shit up in the development world.
Gone are the days of going to a Hackathon and being laughed at for not rocking Ruby or NodeJs. C# is an incredibly powerful language and now that it is truly cross-platform I think we are going to start seeing a major paradigm shift in the open-source world.
Some will say, yeah, Java has been doing that forever, and I get that. Java is great. But what Java lacked, at least in comparison to .NET / C#, was Visual Studio. VS is, in my humble opinion, hands down the best development environment in existence.
Seriously, you couple VS with JetBrains ReSharper and you have a code churning productivity machine. Now add in Docker for windows and I can prototype and hack on a level never before possible in this world; and on a level that is, at the very least, equal to that of any other language. I would probably even say it is superior.
Log4Net is a great logging extension for the .NET ecosystem that also supports .NET Standard / .NET Core (which you should be using if you aren’t).
Unless you really read up on the framework extensively, it can be easy to fall into some performance traps. I’ve found that in many cases these performance issues are caused by less-than-desirable appender configuration. For example, let’s say you have a FileAppender.
When you log (assuming no other appenders are configured), your application will call into log4net, which will attempt to get a file handle, and once successful will write the entry to your log file, followed by releasing the lock. The point is that your application is going to block on this thread whilst completing all those steps.
One solution that can be implemented easily is to just use the BufferingForwardingAppender.
You simply add this appender (which is a forwarding appender) to your configuration.
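A sketch of the relevant section, assuming a FileAppender named “FileAppender” is already defined elsewhere in the config; the root logger is pointed at the buffering appender instead of the file:
<appender name="BufferingForwardingAppender" type="log4net.Appender.BufferingForwardingAppender">
  <!-- flush to the downstream appender once 100 events are buffered... -->
  <bufferSize value="100" />
  <!-- ...or immediately when anything WARN or above comes through -->
  <evaluator type="log4net.Core.LevelEvaluator">
    <threshold value="WARN" />
  </evaluator>
  <appender-ref ref="FileAppender" />
</appender>
<root>
  <level value="INFO" />
  <appender-ref ref="BufferingForwardingAppender" />
</root>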
Now instead of writing directly to the FileAppender you will write to the BufferingForwardingAppender. Essentially, the calling thread drops your message in a queue. When that queue fills up (bufferSize), or if you log an event that trips the evaluator (WARN), all buffered messages are written to the downstream appenders, in this case the FileAppender. This action will block the calling thread that triggered it, but writing 100 entries to the logfile in one action is much faster than writing 100 individual messages.