I work(ed) at Synovia Solutions LLC. creators of the Silverlining Fleet Management software and Here Comes The Bus. Our solution installs hardware devices on vehicles that then report back over cellular to us. During peak times we are processing about 3000 messages / second over UDP.
Our current system includes a monolithic windows service that handles pretty much all aspects of message processing. Its written in .NET (currently 4.6.1) and runs on several physical machines located in a local Data Center. It uses SQL Server as a backend data store.
When I was brought on board one of my primary tasks was to migrate the existing queuing infrastructure, several Sql Server Tables, into a new queuing solution. We chose RabbitMq via the hosted provider CloudAMQP. This was a pretty new paradigm for me, as I had never worked with anything other than MSMQ (GAG!) .
After the initial implementation of Rabbit was written we discovered a show stopper. To explain that I’ll need to cover a bit more on how this all works.
You see the devices on the vehicle communicate over UDP only, but once we receive a message and persist it we have to send an ACK message back to the device. If the device doesnt receive this ACK within 30 seconds it will retransmit the same mesasge. With our existing infrastructure already strained, several times we found ourselves falling behind processing inbound messages, as both the number of incoming messages and the average processing times increased, during peak hours, we hit critical mass. If we had 3k messages in the queue to persist, and persisting was taking upwards to 10ms, devices would begin re transmitting messages, which, at this point are duplicates, and our queue would snowball.
This problem was only made worse by the fact that if the vehicle turned off before all of its data was sent and ack’d the data would reside on the device until the next time the vehicle’s ignition was turned on, at which point it would again resume trying to send out. This was usually during another peak time and so the cycle continued.
When we introduced hosted RabbitMq this problem got worse, because now we have at least a 25ms round trip time from our data center to the AWS data center where CloudAMQP was hosted. We could have opted to host RabbitMq ourselves, but lacking a dedicated Sys Admin this just wasnt in the cards.
It was around this time that Dot Net Core was in beta and we were looking at migrating our ‘Listener’ infrastructure into AWS to eliminate the 25MS round trip time, and move forward with RabbitMq.
We had the idea to take another step, and write a Listener Microservice. At the time I was really torn between using NodeJs, Python, Java or risking it and using the Beta version of Dot Net Core. The main requirements where:
*Be cross platform, specifically it had to run on Linux (Ubuntu).
*Be really freakin fast
*Accept messages, persist them, and Ack them
*Live behind a load balancer.
Thats it, small and fast. That last one though, the load balancer, yeah there wasnt much in the way of UDP load balancing when we started the project. AWS didnt support UDP, NGINX didnt and the only one I found at the time was Loadbalancer.org. Once I was about halfway done NGINX released their UDP load balancing. AWS still doesnt support it.
At the end of the day we went with the following stack:
*Loadbalancer.org for balancing
*Dot Net Core
*Redis (for syncing sequence numbers across all instances)
There is still a slight bottleneck when persisting to RabbitMq b/c we have to utilize CONFIRMS to ensure the message is persisted before we can send the ACK. The average time from start to finish is between 5 – 7ms.
That may not sound like much of an improvement, but it doesnt tell the whole story either. Thats the time to process a single message, on a single instance on a single thread. Its about 200 messages / second.
But when I use a typical Producer Consumer model with 25 processing threads we can hit 5000 messages / second, and do so without incurring any additional latency b/c RabbitMq is just awesome. At that rate here is my cpu utilization (from DataDog) over the last 24 hours:
Needless to say we arent even scratching the surface of what this thing can do. Of course we have a few instances running behind the LB.ORG appliance, so we can easily handle 30k messages / second.
All of this running on a T2.Medium instance with 2vPUs and 4GB of RAM, costing ~ $300 / year / instance to run 24/7. We could save even more money utilizing Spot Instances for peak times, but we just dont need it right now.
At the end of the day it has been a pretty awesome experience learning all of these new technologies. RabbitMq is amazing. But I have to give an enormous shout out to the *\*Dot Net Core**** team and MSFT. What they have done is really going to shake shit up in the development world.
Gone are the days of going to a Hackathon and being laughed at for not rocking Ruby or NodeJs. C# is an incredibly powerful language and now that it is truly cross-platform I think we are going to start seeing a major paradigm shift in the open-source world.
Some will say yeah, Java has been doing that for ever, and I get that. Java is great. But what Java lacked, at least in comparison to .NET / C#, was Visual Studio. VS is, in my humble opinion, hands down the best development environment in existence.
Seriously, you couple VS with JetBrains ReSharper and you have a code churning productivity machine. Now add in Docker for windows and I can prototype and hack on a level never before possible in this world; and on a level that is, at the very least, equal to that of any other language. I would probably even say it is superior.