Interview with Andy Domeier, Director of Technology Operations at SPS Commerce. Andy discusses SPS Commerce’s migration to the public cloud, enabled by Apache Mesos, and their ongoing digital transformation toward a container-based, microservices architecture. Learn how Andy’s team uses Mesos to provide speed and flexibility for customers, and scale in the public cloud to handle busy seasons and cyber-week demands.
Hi, Andy! Thanks so much for taking the time to talk with me today!
So, give us a little bit of background about SPS Commerce.
SPS Commerce is the largest retail supply chain network that exists today. We help retailers, suppliers, and logistics firms communicate so they can meet the demands of today’s aggressive market: giving the consumer engaging product details and accurate shipping details, and coping with the industry’s short shipping windows and complex logistics.
There are a lot of interesting technology challenges in this space, but direct-to-consumer online ordering has put a huge focus on delivering items to consumers in a timely way. If you order something at 4 o’clock this afternoon and select next-day delivery, that’s not a lot of time for a supplier to figure out how to get it on a truck and deliver it to your door. We help retailers and suppliers make it happen, and make it look easy.
How long have you been using Apache Mesos?
We rolled out Mesos about two years ago. At the time, we owned our own servers and ran almost everything on-premises. We had an orchestration and scaling challenge that we needed to figure out. We had to overprovision hardware in our datacenters to meet the demands of our peak season. But we weren’t (and still aren’t) running at peak very often, so a lot of money was wasted on hardware sitting idle.
We needed a platform that would give us speed and flexibility during our busy seasons.
SPS Commerce’s microservices, running in the cloud on Apache Mesos, translate and communicate data between suppliers, distributors, factories, sourcing companies, third party logistics companies, and retailers so that their customers can meet the demands of today’s aggressive retail market.
How big is your cluster and what workloads do you run?
Right now we have about 235 hosts in our worker pool, and the number of applications we’re running is changing all the time. On any given day we can easily be running between 500 and 1000 different instances of applications, depending on what our loads are and what the demand is at the time.
Almost all our microservices started out as Java or Python engines that transform data and orchestrate data delivery. Many were running as individual JVMs or Python apps. We’ve moved a lot of stuff into Docker, and we want to get everything into containers at some point, because there are other benefits that go along with containerization. That being said, Mesos has let us safely move to containers at our own pace, because we still have the flexibility to run things as standalone apps.
What were the biggest challenges you encountered?
The biggest challenge when we set up Mesos was figuring out how to gain confidence with it. It was a different way for us to run our service stack, so we needed to change our whole mindset about how to run our applications. That required a lot of cross-departmental conversation and organization. We were really motivated to make the move to the public cloud, and Mesos has been a great enabler for that move.
We started using the cloud to scale up when we had bursts of data, or when we were expecting a busy time like cyber week, when we were going to get a lot of traffic. We’d have direct connects out to cloud providers and then our worker pool would scale up out there. It gave us an opportunity to run instances of applications in the public cloud and start getting comfortable with that. We’ve now moved almost everything out of our physical datacenters and into the cloud. Mesos enabled us to get comfortable with the cloud, get production workloads out there, just start using it, and slowly move things over to it. It was really an enabler for us.
So you’ve gone from entirely on-premises to almost entirely in the cloud over the course of two years?
Yeah, we have.
Thanks for talking to us today! Any closing thoughts?
As a growth company, having a platform that let us scale while continuing to provide our customers with a reliable solution they could trust was a key enabler for us. I was really impressed by how much Mesos enabled us to leverage the elastic capacity of the cloud to do our scaling. It ended up being a huge win for us. Mesos is letting us focus on our business and transform our tech stack.