NETWAYS uses Mesos to run large-scale workloads

"Snowflakes are not manageable." That's why NETWAYS, an IT consulting company based in Nuremberg, Germany, uses Mesos.

NETWAYS (netways.de) is an IT consulting company based in Nuremberg, Germany. They specialize in consulting around open source projects, particularly in the field of monitoring, and have been long-term supporters of the Icinga 2 project. They also run a number of conferences in Germany, including the Open Source Data Centre conference (OSDC.de) and OpenStack Days Germany.

On top of their consulting work, NETWAYS also has a hosting division, running managed services for their customers, who include the Japanese e-commerce giant Rakuten (rakuten.de). As part of this division, NETWAYS built a software-as-a-service platform based on Apache Mesos to provide hosted instances of popular open source tools like Icinga 2, GitLab, RocketChat and NextCloud. I spoke with Sebastian Saemann from NETWAYS, who leads the engineering and operations teams that built the platform.

We began by talking about the genesis of their Mesos platform. The team started looking at the project around 2015, and were convinced early on that containers would be the technical basis for the service. Having already built a significant skill base around configuration management and automation, they did consider building something themselves, but quickly came to the realisation that a hand-crafted platform would not be sustainable in the long term. As Sebastian put it to me, “snowflakes are not manageable”. This led naturally to a requirement for container orchestration. Having evaluated Docker Swarm, Kubernetes and Mesos, the team decided on Mesos because it was more mature than the others for large-scale workloads, and because it could potentially run more than just containerised services, should they wish to expand their offering in the future.

Their current Mesos platform consists of three control nodes, each running ZooKeeper, a Mesos master and Marathon, together with 20 agent nodes that currently host 400-500 containers. They also make use of Marathon-LB and Chronos. The entire platform is hosted on-prem in NETWAYS' datacenter, running in virtual machines on top of NETWAYS' private OpenStack cloud. Deployment orchestration and configuration management are both provided by Puppet. The underlying OpenStack features full software-defined networking through MidoNet, and Linux runs all the way down into the network layer, with Mellanox switch hardware running Cumulus Linux. This allows the team to orchestrate every layer with Puppet and their other standard tooling.
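To make the sizing above concrete, here is a small back-of-the-envelope sketch. It assumes the usual majority-quorum rule that ZooKeeper ensembles and Mesos masters rely on; the function and constant names are illustrative only and do not come from NETWAYS.

```python
# Rough sizing arithmetic for the cluster described above.
# Assumption: control nodes follow the standard majority-quorum rule.

def majority_quorum(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can be lost while a majority is still available."""
    return ensemble_size - majority_quorum(ensemble_size)


CONTROL_NODES = 3                          # ZooKeeper + Mesos master + Marathon
AGENT_NODES = 20
CONTAINERS_LOW, CONTAINERS_HIGH = 400, 500

quorum = majority_quorum(CONTROL_NODES)
failures = tolerated_failures(CONTROL_NODES)
density = (CONTAINERS_LOW / AGENT_NODES, CONTAINERS_HIGH / AGENT_NODES)

print(f"quorum size: {quorum}, control nodes that may fail: {failures}")
print(f"containers per agent: {density[0]:.0f}-{density[1]:.0f}")
```

With three control nodes, a majority is two, so the control plane can lose one node and keep working. The 400-500 containers spread over 20 agents come to roughly 20-25 containers per agent.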

The Mesos agents all use the Docker containerizer, and NETWAYS make use of some Docker-specific technologies at this level. Persistent storage is provided by their Ceph cluster via a Docker plugin from YP Engineering (github.com/yp-engineering/rbd-docker-plugin), which enables Docker to provision and mount Ceph RADOS Block Devices. Network separation within the platform is provided by Docker overlay networks, which give layer 2 separation between the customers provisioning services.
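As an illustration of how these pieces could fit together, the sketch below builds a Marathon app definition that runs a Docker container on a per-customer overlay network with a volume served by a Docker volume driver. It is not NETWAYS' actual configuration. The customer name, network naming scheme, volume naming scheme, resource figures and the driver name "rbd" are all assumptions. The field layout follows the older Marathon 1.x app format ("network": "USER" together with "ipAddress.networkName") and should be checked against the Marathon version in use.

```python
import json


def build_app(customer: str, service: str, image: str, mount_path: str) -> dict:
    """Build a hypothetical Marathon app definition for one customer service."""
    volume_name = f"{customer}-{service}-data"   # assumed naming scheme
    return {
        "id": f"/{customer}/{service}",
        "instances": 1,
        "cpus": 1.0,        # illustrative resource figures
        "mem": 2048,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                # Attach to a user-defined Docker network (e.g. an overlay).
                "network": "USER",
                # Passed through to `docker run`; the driver name is assumed.
                "parameters": [
                    {"key": "volume-driver", "value": "rbd"},
                    {"key": "volume", "value": f"{volume_name}:{mount_path}"},
                ],
            },
        },
        # One overlay network per customer (assumed naming scheme).
        "ipAddress": {"networkName": f"{customer}-net"},
    }


app = build_app("acme", "gitlab", "gitlab/gitlab-ce:latest", "/var/opt/gitlab")
print(json.dumps(app, indent=2))
```

The resulting JSON is the kind of document that would be posted to Marathon's /v2/apps endpoint. Generating it from a function keeps the per-customer network and volume names consistent across services.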

Although these integrations have enabled the platform, the main issues that the team have seen in production have also been around the Docker daemon. Under certain circumstances it can stop responding and needs a restart, which in turn can cause issues further down the stack, such as locks in the Ceph layer. One of the future areas the team are interested in is leveraging the Mesos Universal Container Runtime to remove the dependency on Docker.
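One common way to notice a hung daemon early is a probe with a hard timeout. The sketch below is a generic example of that idea, not something NETWAYS described using. It assumes the docker CLI is on the PATH and treats a timed-out or failed "docker info" call as an unresponsive daemon; the function name and the timeout value are illustrative.

```python
import subprocess
import sys


def daemon_responsive(cmd=("docker", "info"), timeout=10.0) -> bool:
    """Return True if the probe command exits cleanly within the timeout."""
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,   # the child is killed if it hangs past this
        )
    except subprocess.TimeoutExpired:
        return False           # daemon (or CLI) did not answer in time
    except FileNotFoundError:
        return False           # probe binary is not installed
    return result.returncode == 0


if __name__ == "__main__":
    if not daemon_responsive():
        print("docker daemon unresponsive", file=sys.stderr)
```

A check like this only detects the hang. Whether to restart the daemon automatically is a separate decision, since, as noted above, a restart can leave locks behind in the storage layer.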

The platform has now been in production for just over a year, and NETWAYS have seen good customer uptake. Their eventual aim would be to grow the platform to 10 times its current size, depending on customer demand. Mesos’ reliability and operational simplicity have also enabled the team to remain relatively small – although there are 12 engineers in the hosting team overall, only two or three of them work with Mesos, and then only as part of a wider role. Having built the platform for their own use, NETWAYS have also leveraged their skills in Mesos to offer hosted Mesos clusters to their own customers, and are running a number of managed cluster services. As Sebastian told me, “I am a happy customer”!

NETWAYS' SaaS offering can be found at nws.netways.de.