Spark Release 3.1.2
Spark 3.1.2 is a maintenance release containing stability fixes. This release is based on the branch-3.1 maintenance branch of Spark. We strongly recommend all 3.1 users to upgrade to this stable release.
Notable changes
- [SPARK-32924]: Web UI sort on duration is wrong
- [SPARK-33482]: V2 Datasources that extend FileScan preclude exchange reuse
- [SPARK-34361]: Dynamic allocation on K8s kills executors with running tasks
- [SPARK-34392]: Invalid ID for offset-based ZoneId since Spark 3.0
- [SPARK-34417]: DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot
- [SPARK-34436]: DPP support LIKE ANY/ALL
- [SPARK-34473]: Avoid NPE in DataFrameReader.schema(StructType)
- [SPARK-34482]: Correct the active SparkSession for streaming query
- [SPARK-34490]: Table maybe resolved as a view if the table is dropped
- [SPARK-34497]: JDBC connection provider is not removing kerberos credentials from JVM security context
- [SPARK-34504]: Avoid unnecessary view resolving and remove the
performCheck
flag
- [SPARK-34515]: Fix NPE if InSet contains null value during getPartitionsByFilter
- [SPARK-34534]: New protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss or correctness
- [SPARK-34543]: Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
- [SPARK-34545]: Fix inconsistent results when applying 2 Python UDFs with different return type to 2 columns together
- [SPARK-34547]: Resolve using child metadata attributes as fallback
- [SPARK-34550]: Skip InSet null value during push filter to Hive metastore
- [SPARK-34555]: Resolve metadata output from DataFrame
- [SPARK-34556]: Checking duplicate static partition columns doesn’t respect case sensitive conf
- [SPARK-34561]: Cannot drop/add columns from/to a dataset of v2
DESCRIBE TABLE
- [SPARK-34567]: CreateTableAsSelect should have metrics update too
- [SPARK-34577]: Cannot drop/add columns from/to a dataset of v2
DESCRIBE NAMESPACE
- [SPARK-34584]: When insert into a partition table with a illegal partition value, DSV2 behavior different as others
- [SPARK-34596]: NewInstance.doGenCode should not throw malformed class name error
- [SPARK-34599]: INSERT INTO OVERWRITE doesn’t support partition columns containing dot for DSv2
- [SPARK-34607]: NewInstance.resolved should not throw malformed class name error
- [SPARK-34610]: Fix Python UDF used in GroupedAggPandasUDFTests
- [SPARK-34613]: Fix view does not capture disable hint config
- [SPARK-34630]: Add type hints of pyspark.version and pyspark.sql.Column.contains
- [SPARK-34639]: Always remove unnecessary Alias in Analyzer.resolveExpression
- [SPARK-34674]: Spark app on k8s doesn’t terminate without call to sparkContext.stop() method
- [SPARK-34681]: Full outer shuffled hash join when building left side produces wrong result
- [SPARK-34682]: Regression in “operating on canonicalized plan” check in CustomShuffleReaderExec
- [SPARK-34697]: Allow DESCRIBE FUNCTION and SHOW FUNCTIONS explain about string concatenation operator
- [SPARK-34713]: Group by CreateStruct with ExtractValue fails analysis
- [SPARK-34714]: collect_list(struct()) fails when used with GROUP BY
- [SPARK-34719]: Fail if the view query has duplicated column names
- [SPARK-34723]: Correct parameter type for subexpression elimination under whole-stage
- [SPARK-34724]: Fix Interpreted evaluation by using getClass.getMethod instead of getDeclaredMethod
- [SPARK-34727]: Difference in results of casting float to timestamp
- [SPARK-34731]: ConcurrentModificationException in EventLoggingListener when redacting properties
- [SPARK-34737]: Discrepancy between TIMESTAMP_SECONDS and cast from float
- [SPARK-34756]: Fix FileScan equality check
- [SPARK-34763]: col(), $”" and df("name") should handle quoted column names properly
- [SPARK-34766]: Do not capture maven config for views
- [SPARK-34768]: Respect the default input buffer size in Univocity
- [SPARK-34770]: InMemoryCatalog.tableExists should not fail if database doesn’t exist
- [SPARK-34772]: RebaseDateTime loadRebaseRecords should use Spark classloader instead of context
- [SPARK-34776]: Catalyst error on on certain struct operation (Couldn’t find gen_alias)
- [SPARK-34790]: Fail in fetch shuffle blocks in batch when i/o encryption is enabled
- [SPARK-34794]: Nested higher-order functions broken in DSL
- [SPARK-34796]: Codegen compilation error for query with LIMIT operator and without AQE
- [SPARK-34803]: Pass the raised ImportError if pandas or pyarrow fail to import
- [SPARK-34811]: Redact fs.s3a.access.key like secret and token
- [SPARK-34814]: LikeSimplification should handle NULL
- [SPARK-34829]: Fix higher order function results
- [SPARK-34833]: Apply right-padding correctly for correlated subqueries
- [SPARK-34834]: There is a potential Netty memory leak in TransportResponseHandler
- [SPARK-34840]: Fix cases of corruption in merged shuffle blocks that are pushed
- [SPARK-34845]: computeAllMetrics may return partial metrics when some child metrics are missing
- [SPARK-34876]: Non-nullable aggregates can return NULL in a correlated subquery
- [SPARK-34897]: Support reconcile schemas based on index after nested column pruning
- [SPARK-34909]: conv() does not convert negative inputs to unsigned correctly
- [SPARK-34922]: Use better CBO cost function
- [SPARK-34923]: Metadata output should not always be propagated
- [SPARK-34926]: PartitionUtils.getPathFragment should handle null value
- [SPARK-34939]: Throw fetch failure exception when unable to deserialize broadcasted map statuses
- [SPARK-34948]: Add ownerReference to executor configmap to fix leakages
- [SPARK-34949]: Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down
- [SPARK-34963]: Nested column pruning fails to extract case-insensitive struct field from array
- [SPARK-34970]: Redact map-type options in the output of explain()
- [SPARK-35014]: A foldable expression could not be replaced by an AttributeReference
- [SPARK-35019]: Improve type hints on pyspark.sql.*
- [SPARK-35045]: Add an internal option to control input buffer in univocity
- [SPARK-35079]: Transform with udf gives incorrect result
- [SPARK-35080]: Correlated subqueries with equality predicates can return wrong results
- [SPARK-35087]: Fix some columns in table
Aggregated Metrics by Executor
of stage-detail page
- [SPARK-35093]: AQE columnar mismatch on exchange reuse
- [SPARK-35096]: foreachBatch throws ArrayIndexOutOfBoundsException if schema is case Insensitive
- [SPARK-35106]: HadoopMapReduceCommitProtocol performs bad rename when dynamic partition overwrite is used
- [SPARK-35117]: UI progress bar no longer highlights in progress tasks
- [SPARK-35127]: When switching between different stage-detail pages, the entry in newly-opened page may be blank
- [SPARK-35136]: Initial null value of LiveStage.info can lead to NPE
- [SPARK-35142]:
OneVsRest
classifier uses incorrect data type for rawPrediction
column
- [SPARK-35168]: mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
- [SPARK-35213]: Corrupt DataFrame for certain withField patterns
- [SPARK-35226]: JDBC datasources should accept refreshKrb5Config parameter
- [SPARK-35227]: Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit
- [SPARK-35244]: invoke should throw the original exception
- [SPARK-35278]: Invoke should find the method with correct number of parameters
- [SPARK-35288]: StaticInvoke should find the method without exact argument classes match
- [SPARK-35359]: Insert data with char/varchar datatype will fail when data length exceed length limitation
- [SPARK-35381]: Fix lambda variable name issues in nested DataFrame functions in R APIs
- [SPARK-35382]: Fix lambda variable name issues in nested DataFrame functions in Python APIs
- [SPARK-35411]: Essential information missing in TreeNode json string
- [SPARK-35482]: case sensitive block manager port key should be used in BasicExecutorFeatureStep
- [SPARK-35493]: spark.blockManager.port does not work for driver pod
Dependency Changes
While being a maintence release we did still upgrade some dependencies in this release they are:
You can consult JIRA for the detailed changes.
We would like to acknowledge all community members for contributing patches to this release.
Spark News Archive