Using Spark's "Hadoop Free" Build
Spark uses Hadoop client libraries for HDFS and YARN. Starting in Spark 1.4, the project packages “Hadoop free” builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify
SPARK_DIST_CLASSPATH to include Hadoop’s package jars. The most convenient place to do this is by adding an entry in
This page describes how to connect Spark to Hadoop for different types of distributions.
For Apache distributions, you can use Hadoop’s ‘classpath’ command. For instance:
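A minimal sketch of such an entry follows. It assumes the entry lives in a script that Spark sources at startup (conf/spark-env.sh is assumed here) and that a 'hadoop' launcher is installed. The function name set_spark_dist_classpath is a hypothetical helper introduced for illustration, not part of Spark or Hadoop.

```shell
# Hypothetical helper: set SPARK_DIST_CLASSPATH from 'hadoop classpath'.
# The function name is illustrative only; it is not a Spark or Hadoop API.
set_spark_dist_classpath() {
  hadoop_bin="${1:-hadoop}"   # name on PATH, or an explicit path to the launcher
  if command -v "$hadoop_bin" >/dev/null 2>&1; then
    # 'hadoop classpath' prints the classpath needed to load Hadoop's jars.
    SPARK_DIST_CLASSPATH="$("$hadoop_bin" classpath)"
    export SPARK_DIST_CLASSPATH
  else
    echo "warning: '$hadoop_bin' not found; SPARK_DIST_CLASSPATH not set" >&2
    return 1
  fi
}

# Use the 'hadoop' binary found on PATH; carry on if it is missing.
set_spark_dist_classpath hadoop || true
```

If the launcher is not on your PATH, pass its location instead, for example set_spark_dist_classpath /path/to/hadoop/bin/hadoop. Without the guard, the same effect comes from the single line export SPARK_DIST_CLASSPATH=$(hadoop classpath).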
Hadoop Free Build Setup for Spark on Kubernetes
To run the Hadoop free build of Spark on Kubernetes, the executor image must have the appropriate version of Hadoop binaries and the correct
SPARK_DIST_CLASSPATH value set. See the example below for the relevant changes needed in the executor Dockerfile:
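As a rough shell sketch of the environment the executor image has to provide: the install paths /opt/spark and /opt/hadoop are assumptions, and in a Dockerfile the exports would typically be written as ENV instructions, with the Hadoop binaries copied into the image separately.

```shell
# Assumed install locations inside the executor image; adjust to your layout.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
export PATH="$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH"

# Keep a SPARK_DIST_CLASSPATH that was already set for the image; otherwise
# derive it from the Hadoop launcher under HADOOP_HOME, if one is present.
if [ -z "${SPARK_DIST_CLASSPATH:-}" ] && [ -x "$HADOOP_HOME/bin/hadoop" ]; then
  SPARK_DIST_CLASSPATH="$("$HADOOP_HOME/bin/hadoop" classpath)"
  export SPARK_DIST_CLASSPATH
fi
```

Deriving the value only when it is unset lets you pin a custom classpath in the image while still getting a working default from whichever Hadoop version is installed.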