Spark-Shuffle

Data Processing

Description

During large shuffles, Spark writes temporary shuffle blocks to the local directory configured by the spark.local.dir property. An external shuffle service keeps track of these files' locations and makes sure that they are not removed while the driver is still alive, so that other executors can still retrieve them after the executor that wrote them has stopped.
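As a rough illustration of the job-side settings involved, the sketch below collects the two Spark properties relevant here and renders them as spark-submit flags. spark.local.dir and spark.shuffle.service.enabled are standard Spark properties; the helper names and the /tmp/spark path are hypothetical and only serve the example, and this package may require additional settings not shown.

```python
# Minimal sketch: build the Spark properties a job would set to use an
# external shuffle service. Helper names and the example path are
# hypothetical, not part of this package.

def shuffle_service_conf(local_dir):
    """Return Spark properties for a job that relies on an external shuffle service."""
    return {
        # Directory where Spark writes its temporary shuffle blocks.
        "spark.local.dir": local_dir,
        # Ask executors to serve shuffle blocks through the external service.
        "spark.shuffle.service.enabled": "true",
    }


def to_submit_flags(conf):
    """Render properties as a list of spark-submit --conf arguments, sorted by key."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Assumed example path; use whatever directory your agents expose.
    print(" ".join(to_submit_flags(shuffle_service_conf("/tmp/spark"))))
```

The resulting flags can be appended to a spark-submit command line. Keeping the properties in a plain dictionary makes it easy to merge them with other job settings before submission.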

This package provides that mapping and shuffle service across a DC/OS cluster.

Installation Documentation: https://github.com/dcos/examples/tree/master/1.8/spark-shuffle

Pre-Install Notes

This DC/OS Service is currently in preview. There may be bugs, incomplete features, incorrect documentation, or other discrepancies.

Post-Install Notes

Service installed.

Licenses

Disclaimer

The software listed above is solely subject to the license(s) listed here, as between you and the creator of the software. Mesosphere is not responsible for, and disclaims any indemnification, warranty of any kind either express or implied, or (unless described in a mutually executed written support agreement) support, with respect to the software listed here.
