Across many industries, businesses are rapidly transforming so they can realize the greatest value of the data they collect, process and analyze. Yet, while data has become the new oil, processing data at large scale can be still challenging, especially when low latency is required. Low latency is becoming a requirement for more and more businesses, for example, for credit card transaction processing, fraud detection, and various IOT/IIOT use cases.
Modern stream processing engines, like Apache Flink, allow organizations to implement these near real-time computations across large clusters. Since version 1.2, Flink offers native support for Apache Mesos and DC/OS.
In this blog post we describe how this support is improved with the recently released version 1.4 and what other improvements will come with version 1.5.
Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.Flink provides the notion of event-time processing and state management, and it integrates well with other open source projects including Apache Kafka or Apache Cassandra.
Flink’s Initial Framework Design
Since Flink 1.2 the basic workflow for Flink running on Mesos is the following: Flink registers as a Mesos framework scheduler and then the Application Master–which includes both Flink’s ResourceManager and JobManager–start on a Mesos agent. Flink’s ResourceManager also hosts the Mesos scheduler, which communicates with the Mesos cluster and allocates resources for the Mesos tasks, which in turn run Flink’s TaskManagers.
For more details check out Till’s and Jörg’s talk at MesosCon Europe “Apache Flink Meets Apache Mesos and DC/OS”.
Improvements with Flink 1.4
Apache Flink 1.4 contains the following improvements for its Apache Mesos integration:
- First of all, the community added support for placement constraints. These constraints allow control over where Flink deploys its TaskManagers, which is especially useful in multi-tenant scenarios.
- Secondly, Flink 1.4 added support for framework roles in combination with reserved resources. Flink can accept a mixture of reserved and unreserved resources which gives you more flexibility in role setup.
Re-designed Deployment Model with Flink 1.5
The main problem of Flink’s current architecture is that it does not completely exploit Mesos’ functionality to build fully resource elastic applications. This means that a job can only be started with a fixed set of resources and cannot dynamically adapt to changing workloads by scaling up and down.
With Flink 1.5, the community wants to change this and make Flink fully resource elastic. The required re-design of Flink’s architecture is done as part of the Flink improvement proposal 6 (short FLIP-6). The goal of FLIP-6 is not only to add resource elasticity but also to make Flink more flexible with respect to different deployment scenarios.
The underlying motivation for FLIP-6 was the fact that Flink should leverage cluster management frameworks–-e.g., Mesos-–to support full resource elasticity for running jobs and provide first-class support for containerized environments.
FLIP-6, along with already-introduced features like re-scalable state, lays the groundwork for dynamic scaling in Flink. This feature enables Flink to programmatically scale up or down based on required resources–a huge step forward in terms of ease of operability and the efficiency of Flink applications.
This also comes with an improved deployment architecture shown in Figure 2. Flink 1.5 will introduce a Dispatcher component that is responsible for receiving jobs from the client and spawning a dedicated Flink Master Process for each job. Running each job in a separate process will give proper user code isolation when executing multiple jobs on a Flink cluster. Moreover, it significantly simplifies the JobManager component because it now is only responsible for executing a single Flink job.
If you want to learn more about other 1.4/1.5 features, check this Flink Blog Post.
Flink on DC/OS
While Flink provides great support for stream processing, Mesos offers an elastic and fault tolerant way to run applications on a shared cluster. Even before Flink officially supported Mesos as a scheduler, 30 percent of Flink users surveyed in September 2016 said they were running on Mesos. Mesosphere DC/OS is a production-proven, secure platform that enables easy and fault-tolerant deployments of Flink next to a large set of other services such as Apache Kafka, Apache Cassandra or TensorFlow.
The most simple way to install Flink 1.4 on a DC/OS cluster is a dcos package install flink or select Flink from the DC/OS catalogue. Check the >Flink example for more details or check out the Fraud Detection demo for more insights.
Call for Contributions
Flink, Mesos, and DC/OS are all open source projects and you are invited to contribute! Check out the Mesos component and DC/OS integration epics or join the DC/OS community #flink slack channel to get started!