pyspark.sql.DataFrameReader.jdbc
- DataFrameReader.jdbc(url, table, column=None, lowerBound=None, upperBound=None, numPartitions=None, predicates=None, properties=None)
Construct a DataFrame representing the database table named table, accessible via JDBC URL url and connection properties.
Partitions of the table will be retrieved in parallel if either column or predicates is specified. lowerBound, upperBound and numPartitions are needed when column is specified. If both column and predicates are specified, column will be used.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
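A minimal sketch of the simplest call, assuming a hypothetical PostgreSQL URL, table, and credentials (the matching JDBC driver jar must already be on Spark's classpath):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-example").getOrCreate()

    # Placeholder connection details; substitute your own database,
    # table, and credentials.
    df = spark.read.jdbc(
        url="jdbc:postgresql://localhost:5432/mydb",
        table="public.employees",
        properties={"user": "SYSTEM", "password": "mypassword"},
    )
    df.printSchema()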
- Parameters
- table : str
the name of the table
- column : str, optional
alias of the partitionColumn option. Refer to partitionColumn in Data Source Option for the version you use.
- predicates : list, optional
a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame (see the sketch below)
- properties : dict, optional
a dictionary of JDBC database connection arguments. Normally at least the "user" and "password" properties with their corresponding values. For example { 'user' : 'SYSTEM', 'password' : 'mypassword' }
- Returns
DataFrame
- Other Parameters
- Extra options
For the extra options, refer to Data Source Option for the version you use.
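A sketch of a column-partitioned read, reusing the hypothetical connection details from the example above. Note that lowerBound and upperBound only set the partition stride; rows outside that range are still read:

    # Spark issues 8 parallel queries, each covering one slice of the
    # id range 1..1000000. The partition column must be numeric, date,
    # or timestamp.
    partitioned = spark.read.jdbc(
        url="jdbc:postgresql://localhost:5432/mydb",
        table="public.employees",
        column="id",
        lowerBound=1,
        upperBound=1000000,
        numPartitions=8,
        properties={"user": "SYSTEM", "password": "mypassword"},
    )
    print(partitioned.rdd.getNumPartitions())  # 8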
Notes
Don’t create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
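One way to keep the partition count explicit, in line with the note above, is predicate-based partitioning: each predicate becomes the WHERE clause of one query, so the length of the list fixes the number of partitions. The hire_date column here is hypothetical:

    predicates = [
        "hire_date < '2010-01-01'",
        "hire_date >= '2010-01-01' AND hire_date < '2020-01-01'",
        "hire_date >= '2020-01-01'",
    ]
    by_date = spark.read.jdbc(
        url="jdbc:postgresql://localhost:5432/mydb",
        table="public.employees",
        predicates=predicates,  # three predicates -> three partitions
        properties={"user": "SYSTEM", "password": "mypassword"},
    )
    print(by_date.rdd.getNumPartitions())  # 3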