Spark Release 3.2.2

Spark 3.2.2 is a maintenance release containing stability fixes. This release is based on the branch-3.2 maintenance branch of Spark. We strongly recommend all 3.2 users to upgrade to this stable release.

Notable changes

  • [SPARK-37290]: Exponential planning time in case of non-deterministic function
  • [SPARK-37544]: sequence over dates with month interval is producing incorrect results
  • [SPARK-37643]: When charVarcharAsString is true, char datatype partition table query incorrect
  • [SPARK-37670]: Support predicate pushdown and column pruning for de-duped CTEs
  • [SPARK-37675]: Prevent overwriting of push shuffle merged files once the shuffle is finalized
  • [SPARK-37793]: Invalid LocalMergedBlockData cause task hang
  • [SPARK-37865]: Spark should not dedup the groupingExpressions when the first child of Union has duplicate columns
  • [SPARK-37963]: Need to update Partition URI after renaming table in InMemoryCatalog
  • [SPARK-37995]: TPCDS 1TB q72 fails when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false
  • [SPARK-38018]: Fix ColumnVectorUtils.populate to handle CalendarIntervalType correctly
  • [SPARK-38019]: ExecutorMonitor.timedOutExecutors should be deterministic
  • [SPARK-38023]: ExecutorMonitor.onExecutorRemoved should handle ExecutorDecommission as finished
  • [SPARK-38030]: Query with cast containing non-nullable columns fails with AQE on Spark 3.1.1
  • [SPARK-38042]: Encoder cannot be found when a tuple component is a type alias for an Array
  • [SPARK-38056]: Structured streaming not working in history server when using LevelDB
  • [SPARK-38073]: Update atexit function to avoid issues with late binding
  • [SPARK-38075]: Hive script transform with order by and limit will return fake rows
  • [SPARK-38120]: HiveExternalCatalog.listPartitions is failing when partition column name is upper case and dot in partition value
  • [SPARK-38178]: Correct the logic to measure the memory usage of RocksDB
  • [SPARK-38180]: Allow safe up-cast expressions in correlated equality predicates
  • [SPARK-38185]: Fix data incorrect if aggregate function is empty
  • [SPARK-38204]: All state operators are at a risk of inconsistency between state partitioning and operator partitioning
  • [SPARK-38221]: Group by a stream of complex expressions fails
  • [SPARK-38236]: Absolute file paths specified in create/alter table are treated as relative
  • [SPARK-38271]: PoissonSampler may output more rows than MaxRows
  • [SPARK-38273]: decodeUnsafeRows’s iterators should close underlying input streams
  • [SPARK-38285]: ClassCastException: GenericArrayData cannot be cast to InternalRow
  • [SPARK-38286]: Union’s maxRows and maxRowsPerPartition may overflow
  • [SPARK-38304]: Elt() should return null if index is null under ANSI mode
  • [SPARK-38309]: SHS has incorrect percentiles for shuffle read bytes and shuffle total blocks metrics
  • [SPARK-38320]: (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
  • [SPARK-38325]: ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()
  • [SPARK-38333]: DPP cause DataSourceScanExec java.lang.NullPointerException
  • [SPARK-38347]: Nullability propagation in transformUpWithNewOutput
  • [SPARK-38363]: Avoid runtime error in Dataset.summary() when ANSI mode is on
  • [SPARK-38379]: Fix Kubernetes Client mode when mounting persistent volume with storage class
  • [SPARK-38407]: ANSI Cast: loosen the limitation of casting non-null complex types
  • [SPARK-38411]: Use UTF-8 when doMergeApplicationListingInternal reads event logs
  • [SPARK-38412]: from and to is swapped in the StateSchemaCompatibilityChecker
  • [SPARK-38446]: Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
  • [SPARK-38528]: NullPointerException when selecting a generator in a Stream of aggregate expressions
  • [SPARK-38542]: UnsafeHashedRelation should serialize numKeys out
  • [SPARK-38570]: Incorrect DynamicPartitionPruning caused by Literal
  • [SPARK-38579]: Requesting Restful API can cause NullPointerException
  • [SPARK-38587]: Validating new location for rename command should use formatted names
  • [SPARK-38614]: Don’t push down limit through window that’s using percent_rank
  • [SPARK-38631]: Arbitrary shell command injection via Utils.unpack()
  • [SPARK-38652]: uploadFileUri should preserve file scheme
  • [SPARK-38655]: OffsetWindowFunctionFrameBase cannot find the offset row whose input is not null
  • [SPARK-38677]: pyspark hangs in local mode running rdd map operation
  • [SPARK-38684]: Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
  • [SPARK-38787]: Possible correctness issue on stream-stream join when handling edge case
  • [SPARK-38809]: Implement option to skip null values in symmetric hash impl of stream-stream joins
  • [SPARK-38868]: assert_true fails unconditionnaly after left_outer joins
  • [SPARK-38916]: Tasks not killed caused by race conditions between killTask() and launchTask()
  • [SPARK-38922]: TaskLocation.apply throw NullPointerException
  • [SPARK-38931]: RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint
  • [SPARK-38936]: Script transform feed thread should have name
  • [SPARK-38955]: Disable lineSep option in ‘from_csv’ and ‘schema_of_csv’
  • [SPARK-38977]: Fix schema pruning with correlated subqueries
  • [SPARK-38990]: date_trunc and trunc both fail with format from column in inline table
  • [SPARK-38992]: Avoid using bash -c in ShellBasedGroupsMappingProvider
  • [SPARK-39030]: Rename sum to avoid shading the builtin Python function
  • [SPARK-39061]: Incorrect results or NPE when using Inline function against an array of dynamically created structs
  • [SPARK-39083]: Fix FsHistoryProvider race condition between update and clean app data
  • [SPARK-39084]: df.rdd.isEmpty() results in unexpected executor failure and JVM crash
  • [SPARK-39104]: Null Pointer Exeption on unpersist call
  • [SPARK-39107]: Silent change in regexp_replace’s handling of empty strings
  • [SPARK-39174]: Catalogs loading swallows missing classname for ClassNotFoundException
  • [SPARK-39259]: Timestamps returned by now() and equivalent functions are not consistent in subqueries
  • [SPARK-39283]: Spark tasks stuck forever due to deadlock between TaskMemoryManager and UnsafeExternalSorter
  • [SPARK-39293]: The accumulator of ArrayAggregate should copy the intermediate result if string, struct, array, or map
  • [SPARK-39340]: DS v2 agg pushdown should allow dots in the name of top-level columns
  • [SPARK-39376]: Do not output duplicated columns in star expansion of subquery alias of NATURAL/USING JOIN
  • [SPARK-39393]: Parquet data source only supports push-down predicate filters for non-repeated primitive types
  • [SPARK-39419]: When the comparator of ArraySort returns null, it should fail.
  • [SPARK-39422]: SHOW CREATE TABLE should suggest ‘AS SERDE’ for Hive tables with unsupported serde configurations
  • [SPARK-39447]: Only non-broadcast query stage can propagate empty relation
  • [SPARK-39476]: Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float
  • [SPARK-39496]: Inline eval path cannot handle null structs
  • [SPARK-39505]: Escape log content rendered in UI
  • [SPARK-39543]: The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1
  • [SPARK-39548]: CreateView Command with a window clause query hit a wrong window definition not found issue
  • [SPARK-39570]: inline table should allow expressions with alias
  • [SPARK-39575]: ByteBuffer forget to rewind after get in AvroDeserializer
  • [SPARK-39650]: Streaming Deduplication should not check the schema of “value”
  • [SPARK-39672]: NotExists subquery failed with conflicting attributes
  • [SPARK-39758]: NPE on invalid patterns from the regexp functions

Dependency Changes

While being a maintence release we did still upgrade some dependencies in this release they are:

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.


Spark News Archive