Interface DataWriter<T>
- All Superinterfaces:
- AutoCloseable, Closeable
- All Known Subinterfaces:
- DeltaWriter<T>
A data writer returned by DataWriterFactory.createWriter(int, long), responsible for
 writing data for an input RDD partition.
 Each Spark task has its own exclusive data writer, so thread safety is not a concern.
 write(Object) is called for each record in the input RDD partition. If writing a record
 fails, abort() is called and the remaining records are not processed. If all records are
 written successfully, commit() is called.
 
 Once a data writer returns successfully from commit() or abort(), Spark calls
 Closeable.close() to let the DataWriter clean up its resources. After Closeable.close()
 returns, the writer's lifecycle is over and Spark will not use it again.
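
 The call order described above can be sketched as a small driver loop. This is an illustration only, not Spark's task code: `SketchWriter`, `RecordingWriter`, and `LifecycleSketch` are hypothetical names, the commit message is simplified to a `String`, and a failed task returns `null` here, whereas Spark reports the exception to the driver.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for DataWriter<T>; not the Spark interface itself.
interface SketchWriter<T> extends Closeable {
    void write(T record) throws IOException;
    String commit() throws IOException;   // Spark returns a WriterCommitMessage here
    void abort() throws IOException;
}

// Records every lifecycle call so the order can be inspected afterwards.
class RecordingWriter implements SketchWriter<String> {
    final List<String> calls = new ArrayList<>();

    @Override public void write(String record) throws IOException {
        if (record.isEmpty()) {            // an empty record simulates a write failure
            calls.add("write-failed");
            throw new IOException("empty record");
        }
        calls.add("write:" + record);
    }
    @Override public String commit() { calls.add("commit"); return "ok"; }
    @Override public void abort() { calls.add("abort"); }
    @Override public void close() { calls.add("close"); }
}

class LifecycleSketch {
    // Drives one writer over one partition in the documented order.
    // Returns the commit message, or null if the task failed and was aborted.
    static <T> String runTask(SketchWriter<T> writer, Iterator<T> partition) throws IOException {
        try {
            while (partition.hasNext()) {
                writer.write(partition.next());
            }
            return writer.commit();
        } catch (IOException e) {
            writer.abort();                // remaining records are not processed
            return null;
        } finally {
            writer.close();                // always last; the writer is not reused
        }
    }
}
```

 For a partition containing `"a"`, an empty record, and `"c"`, the recorded calls are one successful write, one failed write, `abort`, and `close`; the third record is never written.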
 
 If this data writer succeeds (all records are written successfully and commit()
 succeeds), a WriterCommitMessage is sent to the driver and passed to
 BatchWrite.commit(WriterCommitMessage[]) together with the commit messages from other data
 writers. If this data writer fails (a record fails to write or commit() fails), an
 exception is sent to the driver, and Spark may retry the writing task a few times.
 On each retry, DataWriterFactory.createWriter(int, long) receives a
 different `taskId`. Spark calls BatchWrite.abort(WriterCommitMessage[])
 when the configured number of retries is exhausted.
 
 Besides the retry mechanism, Spark may launch speculative tasks if an existing writing task
 takes too long to finish. Unlike retried tasks, which are launched one at a time after the
 previous attempt fails, speculative tasks run simultaneously. One input RDD partition may
 therefore have multiple data writers with different `taskId` values running at the same time,
 and data sources should guarantee that these data writers do not conflict and can work together.
 Implementations can coordinate with the driver during commit() to make sure that only one of
 these data writers commits successfully. Alternatively, implementations can allow all of them
 to commit successfully and provide a way to revert committed data writers without the commit
 message, because Spark accepts only the commit message that arrives first and ignores the others.
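
 The first option, coordinating with the driver so that only one attempt per partition commits, can be sketched as a first-asker-wins registry. `CommitCoordinatorSketch` and `canCommit` are hypothetical names for illustration; how a writer actually reaches the driver is left to the data source and is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical driver-side coordinator; the names are illustrative only.
class CommitCoordinatorSketch {
    // partitionId -> taskId of the attempt that won the right to commit
    private final ConcurrentMap<Integer, Long> winners = new ConcurrentHashMap<>();

    // Returns true only for the first attempt that asks for a given partition
    // (and for that same attempt if it asks again). putIfAbsent is atomic, so
    // two speculative attempts racing on one partition cannot both win.
    boolean canCommit(int partitionId, long taskId) {
        Long previous = winners.putIfAbsent(partitionId, taskId);
        return previous == null || previous == taskId;
    }
}
```

 A writer that is refused would treat its commit() as failed, which leads to abort() and cleanup of its own output.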
 
 Note that currently the type T can only be InternalRow.
- Since:
- 3.0.0
Method Summary
- void abort(): Aborts this writer if it has failed.
- WriterCommitMessage commit(): Commits this writer after all records are written successfully, and returns a commit message that is sent back to the driver and passed to BatchWrite.commit(WriterCommitMessage[]).
- default CustomTaskMetric[] currentMetricsValues(): Returns an array of custom task metrics.
- void write(T): Writes one record.
- default void write(T, T): Writes one record with metadata.
- default void writeAll(Iterator<T>): Writes all records provided by the given iterator.

Method Details
write
Writes one record with metadata.
This method is used by group-based row-level operations to pass back metadata for records that are updated or copied. New records added during a MERGE operation are written using write(Object), as there is no metadata associated with those records.
If this method fails (by throwing an exception), abort() will be called and this data writer is considered to have failed.
- Throws:
- IOException - if a failure happens during disk or network I/O, such as writing files.
- Since:
- 4.0.0
 
write
Writes one record.
If this method fails (by throwing an exception), abort() will be called and this data writer is considered to have failed.
- Throws:
- IOException - if a failure happens during disk or network I/O, such as writing files.
 
writeAll
Writes all records provided by the given iterator. By default, it calls the write method for each record in the iterator.
If this method fails (by throwing an exception), abort() will be called and this data writer is considered to have failed.
- Throws:
- IOException - if a failure happens during disk or network I/O, such as writing files.
- Since:
- 4.0.0
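
A default of this kind can be sketched as a loop over the iterator, assuming it forwards each element to the single-record write method. `BatchingWriterSketch` and `BufferedWriterSketch` are hypothetical names, and the buffering override shows only one plausible reason to replace the default; it is not taken from Spark.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in interface with only the two methods the sketch needs.
interface BatchingWriterSketch<T> {
    void write(T record) throws IOException;

    // Default: forward each record. The loop stops at the first failure
    // because the exception propagates out of it.
    default void writeAll(Iterator<T> records) throws IOException {
        while (records.hasNext()) {
            write(records.next());
        }
    }
}

// Overrides writeAll to collect the records and flush them in one step.
class BufferedWriterSketch implements BatchingWriterSketch<String> {
    final List<String> flushed = new ArrayList<>();
    int flushCount = 0;

    @Override public void write(String record) {
        flushed.add(record);
        flushCount++;             // one flush per record
    }

    @Override public void writeAll(Iterator<String> records) {
        List<String> buffer = new ArrayList<>();
        records.forEachRemaining(buffer::add);
        flushed.addAll(buffer);
        flushCount++;             // one flush for the whole iterator
    }
}
```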
 
commit
Commits this writer after all records are written successfully, and returns a commit message that is sent back to the driver and passed to BatchWrite.commit(WriterCommitMessage[]).
The written data should become visible to data source readers only after BatchWrite.commit(WriterCommitMessage[]) succeeds. This method should therefore still "hide" the written data and ask the BatchWrite on the driver side to do the final commit via the WriterCommitMessage.
If this method fails (by throwing an exception), abort() will be called and this data writer is considered to have failed.
- Throws:
- IOException - if a failure happens during disk or network I/O, such as writing files.
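
One way to keep written data hidden until the driver-side commit is to stage it and publish it later. The sketch below assumes a file-based sink; `StagedFileWriterSketch`, its file naming, and `driverCommit` are hypothetical, and the staged path stands in for the WriterCommitMessage.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical file-based sink: task output stays in a staging file
// until the driver publishes it.
class StagedFileWriterSketch {
    private final Path staged;

    StagedFileWriterSketch(Path stagingDir, int partitionId, long taskId) {
        // Including taskId in the name keeps retried and speculative attempts apart.
        this.staged = stagingDir.resolve("part-" + partitionId + "-" + taskId + ".tmp");
    }

    void write(String record) throws IOException {
        Files.writeString(staged, record + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Task-side commit: nothing becomes visible; the staged path is the "commit message".
    Path commit() {
        return staged;
    }

    // Task-side abort: remove whatever this attempt wrote.
    void abort() throws IOException {
        Files.deleteIfExists(staged);
    }

    // Driver-side commit: move every staged file into the directory readers scan.
    static void driverCommit(List<Path> messages, Path visibleDir) throws IOException {
        for (Path p : messages) {
            String name = p.getFileName().toString().replace(".tmp", ".data");
            Files.move(p, visibleDir.resolve(name));
        }
    }
}
```

Readers that list only the visible directory see nothing from a task until `driverCommit` runs, and an aborted attempt leaves no staged file behind.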
 
abort
Aborts this writer if it has failed. Implementations should clean up the data for records that were already written.
This method is called only if a record failed to write or commit() failed.
If this method fails (by throwing an exception), the underlying data source may contain leftover data that needs to be cleaned up by BatchWrite.abort(WriterCommitMessage[]) or manually, but this leftover data should not be visible to data source readers.
- Throws:
- IOException - if a failure happens during disk or network I/O, such as writing files.
 
currentMetricsValues
Returns an array of custom task metrics. By default, it returns an empty array. Putting heavy logic in this method is not recommended, as it may affect writing performance.
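
A cheap metrics method can be sketched by updating counters on the write path and only copying them when asked. `MetricSketch` and `CountingWriterSketch` are hypothetical stand-ins; Spark's own metric type is CustomTaskMetric, which is not reproduced here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical metric holder used in place of CustomTaskMetric.
final class MetricSketch {
    final String name;
    final long value;

    MetricSketch(String name, long value) {
        this.name = name;
        this.value = value;
    }
}

class CountingWriterSketch {
    private long recordsWritten = 0;
    private long bytesWritten = 0;

    void write(String record) {
        // Update counters while writing, where the information is already at hand.
        recordsWritten++;
        bytesWritten += record.getBytes(StandardCharsets.UTF_8).length;
    }

    // Cheap by construction: only copies two precomputed counters.
    MetricSketch[] currentMetricsValues() {
        return new MetricSketch[] {
            new MetricSketch("recordsWritten", recordsWritten),
            new MetricSketch("bytesWritten", bytesWritten)
        };
    }
}
```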
 