Spark Release 2.4.8

Spark 2.4.8 is a maintenance release containing stability, correctness, and security fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend that all 2.4 users upgrade to this stable release.

Notable changes

  • [SPARK-21492]: Fix memory leak in SortMergeJoin
  • [SPARK-25271]: Creating a Parquet table with all columns null throws an exception
  • [SPARK-26625]: spark.redaction.regex should include oauthToken
  • [SPARK-26645]: CSV infer schema bug infers decimal(9,-1)
  • [SPARK-27575]: Spark overwrites existing value of spark.yarn.dist.* instead of merging value
  • [SPARK-27872]: Driver and executors use a different service account breaking pull secrets
  • [SPARK-29574]: spark with user provided hadoop doesn’t work on kubernetes
  • [SPARK-30201]: HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
  • [SPARK-32635]: When pyspark.sql.functions.lit() function is used with dataframe cache, it returns wrong result
  • [SPARK-32708]: Query optimization fails to reuse exchange with DataSourceV2
  • [SPARK-32715]: Broadcast block pieces may leak memory
  • [SPARK-32738]: thread safe endpoints may hang due to fatal error
  • [SPARK-32794]: Rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 streaming sources
  • [SPARK-32815]: Fix LibSVM data source loading error on file paths with glob metacharacters
  • [SPARK-32836]: Fix DataStreamReaderWriterSuite to check writer options correctly
  • [SPARK-32872]: BytesToBytesMap at MAX_CAPACITY exceeds growth threshold
  • [SPARK-32900]: UnsafeExternalSorter.SpillableIterator cannot spill when there are NULLs in the input and radix sorting is used
  • [SPARK-32901]: UnsafeExternalSorter may cause a SparkOutOfMemoryError to be thrown while spilling
  • [SPARK-32908]: percentile_approx() returns incorrect results
  • [SPARK-32999]: TreeNode.nodeName should not throw malformed class name error
  • [SPARK-33094]: ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
  • [SPARK-33101]: LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
  • [SPARK-33131]: Fix grouping sets with a HAVING clause failing to resolve qualified column names
  • [SPARK-33136]: Handling nullability for complex types is broken during resolution of V2 write command
  • [SPARK-33183]: Bug in optimizer rule EliminateSorts
  • [SPARK-33230]: FileOutputWriter jobs have duplicate JobIDs if launched in same second
  • [SPARK-33268]: Fix bugs for casting data from/to PythonUserDefinedType
  • [SPARK-33277]: Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
  • [SPARK-33292]: Make Literal ArrayBasedMapData string representation unambiguous
  • [SPARK-33338]: GROUP BY using literal map should not fail
  • [SPARK-33339]: PySpark application will hang due to non-Exception errors
  • [SPARK-33372]: Fix InSet bucket pruning
  • [SPARK-33472]: IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements
  • [SPARK-33588]: Partition spec in SHOW TABLE EXTENDED doesn’t respect spark.sql.caseSensitive
  • [SPARK-33593]: Vector reader got incorrect data with binary partition value
  • [SPARK-33726]: Duplicate field names cause wrong answers during aggregation
  • [SPARK-33733]: PullOutNondeterministic should check and collect deterministic field
  • [SPARK-33756]: BytesToBytesMap’s iterator hasNext method should be idempotent.
  • [SPARK-34125]: Make EventLoggingListener.codecMap thread-safe
  • [SPARK-34187]: Use available offset range obtained during polling when checking offset validation
  • [SPARK-34212]: For parquet table, after changing the precision and scale of decimal type in hive, spark reads incorrect value
  • [SPARK-34229]: Avro should read decimal values with the file schema
  • [SPARK-34260]: UnresolvedException when creating temp view twice
  • [SPARK-34273]: Do not reregister BlockManager when SparkContext is stopped
  • [SPARK-34318]: Dataset.colRegex should work with column names and qualifiers which contain newlines
  • [SPARK-34327]: Omit inlining passwords during build process.
  • [SPARK-34596]: NewInstance.doGenCode should not throw malformed class name error
  • [SPARK-34607]: NewInstance.resolved should not throw malformed class name error
  • [SPARK-34724]: Fix Interpreted evaluation by using getClass.getMethod instead of getDeclaredMethod
  • [SPARK-34726]: Fix collectToPython timeouts
  • [SPARK-34776]: Catalyst error on certain struct operations (Couldn’t find gen_alias)
  • [SPARK-34811]: Redact fs.s3a.access.key like secret and token
  • [SPARK-34855]: SparkContext - avoid using local lazy val
  • [SPARK-34876]: Non-nullable aggregates can return NULL in a correlated subquery
  • [SPARK-34909]: conv() does not convert negative inputs to unsigned correctly
  • [SPARK-34939]: Throw fetch failure exception when unable to deserialize broadcasted map statuses
  • [SPARK-34963]: Nested column pruning fails to extract case-insensitive struct field from array
  • [SPARK-35080]: Correlated subqueries with equality predicates can return wrong results
  • [SPARK-35278]: Invoke should find the method with correct number of parameters
  • [SPARK-35288]: StaticInvoke should find the method without exact argument classes match
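
The two redaction items above (SPARK-26625 and SPARK-34811) both concern which configuration keys are masked by spark.redaction.regex. The following is a minimal Python sketch of key-based redaction, not Spark's implementation. The pattern, the replacement text, the redact helper, and the sample configuration values are all assumptions made for illustration; check the spark.redaction.regex default of your own Spark build before relying on it.

```python
import re

# Assumed pattern for illustration only; it is not quoted from the Spark source.
REDACTION_PATTERN = re.compile(r"(?i)secret|password|token|access[.]key")

# Hypothetical placeholder text used to mask sensitive values.
REPLACEMENT = "*********(redacted)"


def redact(conf):
    """Return a copy of conf with values masked where the key matches the pattern."""
    return {
        key: REPLACEMENT if REDACTION_PATTERN.search(key) else value
        for key, value in conf.items()
    }


# Sample configuration with made-up values.
conf = {
    "spark.app.name": "demo",
    "spark.hadoop.fs.s3a.access.key": "EXAMPLE-ACCESS-KEY",
    "spark.kubernetes.authenticate.submission.oauthToken": "example-token",
}

redacted = redact(conf)
for key, value in redacted.items():
    print(f"{key}={value}")
```

With a pattern of this shape, a key containing oauthToken or access.key is masked regardless of case, while unrelated keys such as spark.app.name pass through unchanged.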

Dependency Changes

Known issues

