Spark Release 0.6.1

Spark 0.6.1 is a maintenance release that contains several important bug fixes and performance improvements. You can download it as a source package (2.4 MB tar.gz) or as a prebuilt package (48 MB tar.gz).

The fixes and improvements in this version include:

  • Fixed overly aggressive message timeouts that could cause workers to disconnect from the cluster
  • Fixed a bug in the standalone deploy mode where hostnames were not exposed to the scheduler, which affected HDFS locality
  • Improved connection reuse in shuffle, which can greatly speed up small shuffles (contributed by Reynold Xin)
  • Fixed some potential deadlocks in the block manager (contributed by Tathagata Das)
  • Fixed a bug in getting the IDs of failed hosts from Mesos (contributed by Imran Rashid)
  • Several EC2 script improvements, like better handling of spot instances (contributed by Josh Rosen)
  • Made the local IP address that Spark binds to customizable (contributed by Mikhail Bautin)
  • Support for Hadoop 2 distributions (contributed by Thomas Dudziak)
  • Support for locating Scala on Debian distributions (contributed by Thomas Dudziak)
  • Improved standalone cluster web UI to show more information about jobs
  • Added an option to spread out jobs over the standalone cluster instead of concentrating them on a small number of nodes (spark.deploy.spreadOut)
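
To illustrate the difference between spreading a job out and concentrating it on a few nodes, here is a toy model in Python. It is only a sketch of the general idea, not Spark's actual scheduler code: the function name allocate_cores, its parameters, and the per-worker core counts are all hypothetical, and the real behavior of spark.deploy.spreadOut may differ in its details.

```python
def allocate_cores(free_cores, cores_needed, spread_out):
    """Return how many cores a job takes from each worker (toy model).

    free_cores   -- free cores per worker (hypothetical cluster state)
    cores_needed -- total number of cores the job asks for
    spread_out   -- True: hand out one core at a time, round-robin;
                    False: fill each worker in turn before moving on
    """
    assigned = [0] * len(free_cores)
    # Never try to hand out more cores than the cluster has free.
    remaining = max(0, min(cores_needed, sum(free_cores)))
    if spread_out:
        i = 0
        while remaining > 0:
            if assigned[i] < free_cores[i]:
                assigned[i] += 1
                remaining -= 1
            i = (i + 1) % len(free_cores)
    else:
        for i, free in enumerate(free_cores):
            take = min(free, remaining)
            assigned[i] = take
            remaining -= take
    return assigned


if __name__ == "__main__":
    workers = [4, 4, 4, 4]  # four hypothetical workers, four free cores each
    print(allocate_cores(workers, 4, spread_out=True))
    print(allocate_cores(workers, 4, spread_out=False))
```

In this model, spreading out touches every worker, which may help data locality when input blocks are scattered across the cluster, while concentrating keeps the job on as few nodes as possible.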

We recommend that all Spark 0.6 users update to this maintenance release.

