
Q&A with Zhang Liang of Dangdang: the biggest book seller in China

This is the second post in a six-part series highlighting Chinese DC/OS and Apache Mesos users who presented at MesosCon Asia in late June. MesosCon North America is coming up from September 13-15th in Los Angeles. Register today.

Hello, thank you so much for taking the time to answer some questions about your use case for Apache Mesos. I’d like to start off by providing some background about Dangdang. What kind of company is Dangdang, and is it similar to a company that might be familiar to US readers?

Dangdang is an e-commerce company similar to Amazon, established in 1999. It is the biggest book seller in China, and it also sells other products. Although it had an IPO in 2010 on the New York Stock Exchange, Dangdang is currently private, and will IPO again in China in the future.

How does Dangdang use Apache Mesos, and what features are most helpful for you?

We use Mesos as a deployment and execution system. Mesos manages our servers, scales out our applications, and monitors and reports our system status. The two-level scheduler makes it easy to customize the details of our deployment, so we have developed our own Mesos frameworks to schedule our specific jobs and applications. We have a container management system for microservices, and the two-level scheduler has enabled us to develop and migrate our legacy middleware quickly and easily to Mesos.
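To make the two-level idea concrete: in Mesos, the master offers resources to frameworks, and each framework's own scheduler decides which offers to use and what to launch on them. The toy sketch below illustrates only that second level, using a simple first-fit rule. It does not use the real Mesos API; `Offer`, `Task`, and `place_tasks` are hypothetical names invented for illustration, and Dangdang's actual scheduling rules are not described in this interview.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Offer:
    """Hypothetical stand-in for a resource offer from one agent."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a task the framework wants to run."""
    name: str
    cpus: float
    mem_mb: int


def place_tasks(offers: List[Offer], pending: List[Task]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Framework-side decision: first-fit each pending task into the offers.

    Returns (placements per agent, names of tasks that did not fit).
    """
    # Track what is still free on each agent without mutating the offers.
    free = {o.agent: [o.cpus, o.mem_mb] for o in offers}
    placements: Dict[str, List[str]] = {o.agent: [] for o in offers}
    unplaced: List[str] = []

    for task in pending:
        for agent, (cpus, mem_mb) in free.items():
            if cpus >= task.cpus and mem_mb >= task.mem_mb:
                free[agent] = [cpus - task.cpus, mem_mb - task.mem_mb]
                placements[agent].append(task.name)
                break
        else:
            # No offer was large enough; a real framework would wait for new offers.
            unplaced.append(task.name)

    return placements, unplaced
```

Because the placement rule lives in the framework rather than in the cluster manager, swapping first-fit for a job-specific policy only means changing this one function.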

I’m glad you called out two-level scheduling, because the ability to customize scheduling rules for specific applications is one of Mesos’ strengths. Can you tell us a little bit more about the challenges that motivated you to adopt Mesos and write your own custom scheduling tooling?

Sure. Before Mesos, our developers built a new web service to deploy each job they wanted to run–without the help of a job configuration management tool. Dangdang, however, has lots of lightweight jobs, which share the same code, but use different input parameters. Writing a new service for each one of these lightweight jobs didn’t scale as our application’s user base grew. We needed a universal framework to scale different jobs in the cloud.

For example, we have a service that monitors a URL’s availability. In this service, we make URLs and timeout tolerances configurable. We built Elastic-Job-Cloud to distribute workloads onto Mesos, so developers only need to define their own URL and timeout to create a job in this service.
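As a rough illustration of "same code, different input parameters", the sketch below models a URL availability check whose only per-job settings are a URL and a timeout. It is not Elastic-Job-Cloud's API: `UrlCheckJob`, `http_probe`, and `run_job` are hypothetical names, and the status-code range treated as "available" is an assumption.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UrlCheckJob:
    """Hypothetical job definition: the only things a developer supplies."""
    name: str
    url: str
    timeout_seconds: float


def http_probe(url: str, timeout_seconds: float) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and socket timeouts;
        # ValueError covers malformed URLs.
        return False


def run_job(job: UrlCheckJob, probe: Callable[[str, float], bool] = http_probe) -> bool:
    """Shared job code: every job runs this, only the parameters differ."""
    return probe(job.url, job.timeout_seconds)
```

Defining a new check is then a one-line change, for example `UrlCheckJob("home", "https://example.com/", 2.0)`, with no new service to write. The `probe` parameter exists so the shared code can be exercised without network access.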

Dangdang also frequently runs data-sync jobs that need to process streaming data, but these can be pretty slow and hard to scale out. Elastic-Job-Cloud’s sharding feature can split a service into several smaller tasks, where every task processes part of the data. If a task fails, Elastic-Job-Cloud restarts it automatically.
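The sharding idea can be sketched in a few lines: split the input into a fixed number of shards, run the same task on each shard, and rerun a shard if it fails. This is a minimal in-process sketch, not how Elastic-Job-Cloud is implemented; `shard_items` and `run_with_restart` are hypothetical names, the round-robin split is an assumed strategy, and a simple retry loop stands in for the framework restarting a failed task.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def shard_items(items: Sequence[T], shard_count: int) -> List[List[T]]:
    """Split items round-robin so shard i gets items i, i + n, i + 2n, ..."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return [list(items[i::shard_count]) for i in range(shard_count)]


def run_with_restart(task: Callable[[List[T]], R], shard: List[T], max_attempts: int = 3) -> R:
    """Run task on one shard, restarting it on failure up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task(shard)
        except Exception:
            if attempt == max_attempts:
                raise  # Give up and surface the last error.
    raise AssertionError("unreachable")


def run_sharded(task: Callable[[List[T]], R], items: Sequence[T], shard_count: int) -> List[R]:
    """Process every shard independently and collect the per-shard results."""
    return [run_with_restart(task, shard) for shard in shard_items(items, shard_count)]
```

Here the shards run one after another for simplicity. On a cluster, each shard would be launched as a separate task so that slow data-sync jobs can be spread across machines.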

Elastic-Job-Cloud is open source, has over 2600 stars on GitHub, and has been forked over 1300 times. Please feel free to take a look at our GitHub account to explore our work more!

That’s a lot of popularity for a Mesos framework, and it’s great to hear that you are contributing so much back to the Mesos ecosystem! How big is the team of Mesos developers and operators that develop for and manage your Mesos cluster, and how big is the cluster?

We have three developers and three operators who manage the 100 servers we have on Mesos now, operating in a private cloud. Not all of our systems currently run on Mesos, and our legacy private cloud has to support all our workloads and the systems that monitor them. We plan to migrate more and more applications to Mesos in the future. We eventually expect to have over 1000 regular servers running Mesos, and during special periods such as Double 11 (the same as Black Friday in the USA), we will rent cloud servers that will use Mesos to handle scale out.

With all of your experience running on Mesos, I’m sure there are some things you might like to change. How would you like to see Mesos progress in the future?

We’d like more comprehensive help building and running microservices. A microservice component like Spring Cloud would help (though Spring Cloud also has room for improvement). The framework scheduler layer is perfect, but we would like the Mesos executor to deal better with scaling out, sharding, client-based triggers, and routing.

Challenges aren’t always purely technical. Sometimes getting information across time zones and language barriers can be hard too. How can Mesosphere help improve the community in China?

We have three ideas that would help to improve the Mesos and DC/OS community. First, translating the documentation into Chinese would help with localization. Second, hiring a local tech team for the region would be a huge help. If Mesosphere had a developer team based in China that would be very exciting! Last but not least, holding monthly meetups in China would build the community.

Great advice! Thank you so much for taking the time to talk with us today. It’s been a real pleasure!

Thank you for your attention! I believe the Mesos community will continue to grow in China. Mesos has really helped us so far, so we’re very glad to contribute any feedback we can.

Want to hear from other big Apache Mesos users like Dangdang? MesosCon North America is coming up from September 13-15th in Los Angeles; register today.