Spark Release 3.4.1
Spark 3.4.1 is a maintenance release containing stability fixes. This release is based on the branch-3.4 maintenance branch of Spark. We strongly recommend all 3.4 users upgrade to this stable release.
Notable changes
- [SPARK-44383]: Fix the trim logic that didn't handle ASCII control characters correctly
- [SPARK-37829]: Dataframe.joinWith outer-join should return a null value for unmatched rows
- [SPARK-42078]: Add `CapturedException` to utils
- [SPARK-42290]: Fix OOM errors not being reported when AQE is on
- [SPARK-42421]: Use the utils to get the switch for dynamic allocation used in local checkpoint
- [SPARK-42475]: Fix PySpark connect Quickstart binder link
- [SPARK-42826]: Update migration notes for pandas API on Spark
- [SPARK-43043]: Improve the performance of MapOutputTracker.updateMapOutput
- [SPARK-43050]: Fix constructing aggregate expressions by replacing grouping functions
- [SPARK-43067]: Correct the location of error class resource file in Kafka connector
- [SPARK-43069]: Use `sbt-eclipse` instead of `sbteclipse-plugin`
- [SPARK-43071]: Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation
- [SPARK-43072]: Include TIMESTAMP_NTZ type in ANSI Compliance doc
- [SPARK-43075]: Change `gRPC` to `grpcio` when it is not installed
- [SPARK-43083]: Mark `*StateStoreSuite` as `ExtendedSQLTest`
- [SPARK-43085]: Support column DEFAULT assignment for multi-part table names
- [SPARK-43098]: Fix COUNT correctness bug when a scalar subquery has a GROUP BY clause
- [SPARK-43113]: Evaluate stream-side variables when generating code for a bound condition
- [SPARK-43125]: Fix Connect Server failing to handle exceptions with a null message
- [SPARK-43126]: Mark two Hive UDF expressions as stateful
- [SPARK-43139]: Fix incorrect column names in sql-ref-syntax-dml-insert-table.md
- [SPARK-43141]: Ignore generated Java files in checkstyle
- [SPARK-43156]: Fix `COUNT(*) is null` bug in correlated scalar subquery
- [SPARK-43157]: Clone InMemoryRelation cached plan to prevent cloned plan from referencing same objects
- [SPARK-43158]: Set upperbound of pandas version for Binder integration
- [SPARK-43249]: Fix missing stats for SQL Command
- [SPARK-43281]: Fix concurrent writer not updating file metrics
- [SPARK-43284]: Switch back to url-encoded strings
- [SPARK-43293]: `__qualified_access_only` should be ignored in normal columns
- [SPARK-43313]: Add missing column DEFAULT values for MERGE INSERT actions
- [SPARK-43336]: Casting between Timestamp and TimestampNTZ requires a time zone (see the sketch after this list)
- [SPARK-43337]: Fix asc/desc arrow icons not being displayed for sorted table columns
- [SPARK-43340]: Handle missing stack-trace field in event logs
- [SPARK-43342]: Revert [SPARK-39006] “Show a directional error message for executor PVC dynamic allocation failure”
- [SPARK-43374]: Move protobuf-java to BSD 3-clause group and update the license copy
- [SPARK-43378]: Properly close stream objects in deserializeFromChunkedBuffer
- [SPARK-43395]: Exclude macOS tar extended metadata in make-distribution.sh
- [SPARK-43398]: Executor timeout should be the max of the idle, shuffle, and RDD timeouts
- [SPARK-43404]: Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error
- [SPARK-43414]: Fix flakiness in Kafka RDD suites due to port binding configuration issue
- [SPARK-43425]: Add `TimestampNTZType` to `ColumnarBatchRow`
- [SPARK-43441]: `makeDotNode` should not fail when `DeterministicLevel` is absent
- [SPARK-43450]: Add more `_metadata` filter test cases
- [SPARK-43471]: Handle missing hadoopProperties and metricsProperties
- [SPARK-43483]: Add SQL references for the OFFSET clause
- [SPARK-43510]: Fix YarnAllocator internal state when adding running executor after processing completed containers
- [SPARK-43517]: Add a migration guide for namedtuple monkey patch
- [SPARK-43522]: Fix creating struct column names with the index of an array
- [SPARK-43527]: Fix `catalog.listCatalogs` in PySpark (see the sketch after this list)
- [SPARK-43541]: Propagate all `Project` tags when resolving expressions and missing columns
- [SPARK-43547]: Update “Supported Pandas API” page to point to the proper pandas docs
- [SPARK-43587]: Run `HealthTrackerIntegrationSuite` in a dedicated JVM
- [SPARK-43589]: Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`
- [SPARK-43718]: Set nullable correctly for keys in USING joins
- [SPARK-43719]: Handle missing `row.excludedInStages` field
- [SPARK-43751]: Document `unbase64` behavior change
- [SPARK-43758]: Update Hadoop 2 dependency manifest
- [SPARK-43759]: Expose `TimestampNTZType` in `pyspark.sql.types` (see the sketch after this list)
- [SPARK-43760]: Fix nullability of scalar subquery results
- [SPARK-43802]: Fix codegen for unhex and unbase64 with failOnError=true
- [SPARK-43894]: Fix bug in df.cache()
- [SPARK-43956]: Fix the bug where a column's SQL is not displayed for Percentile[Cont|Disc]
- [SPARK-43973]: Structured Streaming UI should display failed queries correctly
- [SPARK-43976]: Handle the case where `modifiedConfigs` doesn't exist in event logs
- [SPARK-44018]: Improve hashCode and toString for some DS V2 Expressions
- [SPARK-44038]: Update YuniKorn docs with v1.3
- [SPARK-44040]: Fix stats computation when an AggregateExec node is above QueryStageExec
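Below are a few short PySpark sketches illustrating some of the user-facing items above. They are illustrative examples only; the session, column, and time zone names are made up and do not come from the tickets themselves.

For [SPARK-43759], a minimal use of the newly exposed `TimestampNTZType`:

```python
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, TimestampNTZType

spark = SparkSession.builder.appName("release-notes-demo").getOrCreate()

# TimestampNTZType can now be imported directly from pyspark.sql.types;
# it models a timestamp without a time zone (a plain wall-clock value).
schema = StructType([StructField("event_ts", TimestampNTZType(), True)])
df = spark.createDataFrame([(datetime.datetime(2023, 6, 1, 12, 0),)], schema)
df.printSchema()  # root |-- event_ts: timestamp_ntz (nullable = true)
```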
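For [SPARK-43336], continuing with the `spark` session and `df` from the sketch above: a cast between TIMESTAMP_NTZ and TIMESTAMP is resolved against the session time zone, which is why the cast requires one (the zone chosen here is an arbitrary example):

```python
# Casting TIMESTAMP_NTZ to TIMESTAMP interprets the wall-clock value
# in the session time zone, so the result depends on this setting.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
df.select(df.event_ts.cast("timestamp").alias("event_ts_local")).show(truncate=False)
```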
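And for [SPARK-43527], a quick check that `catalog.listCatalogs` works from PySpark, again reusing `spark` from above:

```python
# Returns the catalogs registered in the current session; before the
# fix this call could fail in PySpark even though the Scala API worked.
for catalog in spark.catalog.listCatalogs():
    print(catalog.name)
```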
Dependency Changes
While this is a maintenance release, we did still upgrade some dependencies. You can consult JIRA for the detailed changes.
We would like to acknowledge all community members for contributing patches to this release.