Description:
The function queries data in Hudi tables using file retrieval methods.
Syntax:
spark_hudi(con, tableName, [startTime], [endTime])
Note:
External library function (See External Library Guide).
Hudi supports three query types: snapshot query, incremental query, and read-optimized query. The default is snapshot query.
Parameters startTime and endTime take effect only with the @i option. They restrict the query to a time interval on the Hudi table's automatically generated field _hoodie_commit_time, whose format is yyyyMMddHHmmssSSS. When both parameters are present, the interval is left-closed and right-open.
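The interval rule can be illustrated with a minimal Python sketch. It is not how spark_hudi is implemented; it only mimics a left-closed, right-open selection on _hoodie_commit_time strings. The helper names (format_commit_time, in_interval) and the sample rows are hypothetical, and the defaults follow the parameter descriptions (startTime 0, endTime the current time).

```python
from datetime import datetime
from typing import Optional


def format_commit_time(ts: datetime) -> str:
    """Render a datetime in the yyyyMMddHHmmssSSS layout (17 digits)."""
    # strftime has no millisecond code, so derive it from the microseconds.
    return ts.strftime("%Y%m%d%H%M%S") + f"{ts.microsecond // 1000:03d}"


def in_interval(commit_time: str,
                start_time: str = "0",
                end_time: Optional[str] = None) -> bool:
    """Left-closed, right-open test: start_time <= commit_time < end_time."""
    if end_time is None:
        # Assumption: a missing end time means "now".
        end_time = format_commit_time(datetime.now())
    # 17-digit strings compare in the same order as the instants they encode,
    # and the default "0" sorts before any of them.
    return start_time <= commit_time < end_time


# Hypothetical rows; only the commit-time field matters here.
rows = [
    {"id": 1, "_hoodie_commit_time": "20250426150362624"},  # equals start: kept
    {"id": 2, "_hoodie_commit_time": "20250427093000000"},  # inside: kept
    {"id": 3, "_hoodie_commit_time": "20250427164306435"},  # equals end: dropped
]

selected = [
    r["id"] for r in rows
    if in_interval(r["_hoodie_commit_time"],
                   "20250426150362624", "20250427164306435")
]
print(selected)
```

A row whose commit time equals startTime is returned, while a row whose commit time equals endTime is not.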
Parameter:
con |
The database connection string. Two kinds of connection are supported: a Spark connection using Hudi format, and a Spark connection using Hudi format with S3. |
tableName |
The table name. It is combined with the warehouse parameter in the .properties configuration file to form a URL, such as hdfs://localhost:9000/user/hive/warehouse/tableName, which gives the table's location in HDFS. |
startTime |
The starting time, whose default value is 0. |
endTime |
The ending time, whose default value is the current time. |
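How tableName combines with the warehouse setting can be sketched in Python. This assumes the URL is simply the warehouse path with the table name appended, as the example URL above suggests; table_url is a hypothetical helper, not part of the library.

```python
def table_url(warehouse: str, table_name: str) -> str:
    """Join a warehouse location and a table name into one URL."""
    # Drop any trailing slash so the join never produces a doubled separator.
    return warehouse.rstrip("/") + "/" + table_name


# The warehouse value is taken from the example URL in the parameter table.
url = table_url("hdfs://localhost:9000/user/hive/warehouse", "huditb1")
print(url)  # hdfs://localhost:9000/user/hive/warehouse/huditb1
```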
Option:
@i |
Incremental query. |
@o |
Read-optimized query. |
@d |
Do not display the automatically generated field for the Hudi table. |
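A rough Python sketch of what the options mean follows. It assumes that @i and @o select the query type and that @d removes every column whose name begins with _hoodie_; the prefix rule and both helper names are illustrative assumptions, not documented behavior of spark_hudi. The type names snapshot, incremental, and read_optimized are the values Hudi itself accepts for its hoodie.datasource.query.type read option.

```python
def query_type(options: str = "") -> str:
    """Map an option string such as "i", "o" or "id" to a query type name."""
    if "i" in options:
        return "incremental"
    if "o" in options:
        return "read_optimized"
    # Neither @i nor @o given: fall back to the default query type.
    return "snapshot"


def hide_generated_fields(row: dict, options: str = "") -> dict:
    """Return a copy of row, without Hudi-generated columns when @d is set."""
    if "d" not in options:
        return dict(row)
    # Assumption: generated columns are recognised by the "_hoodie_" prefix.
    return {k: v for k, v in row.items() if not k.startswith("_hoodie_")}


# Hypothetical row as it might come back from a query.
row = {"_hoodie_commit_time": "20250427164306435", "id": 7, "name": "a"}

print(query_type(""))                    # snapshot
print(query_type("id"))                  # incremental
print(hide_generated_fields(row, "id"))  # {'id': 7, 'name': 'a'}
```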
Return value:
Table sequence
Example:
|
|
|
|
|
Connect to a Spark database using Hudi format. |
|
|
Perform snapshot query. |
|
=spark_hudi@i(A1, "huditb1","20250426150362624","20250427164306435") |
Perform incremental query and return data within the specified time interval. |
|
=spark_hudi@i(A1, "huditb1",,"20250427164306435") |
Return data within a time interval from 0 to the specified ending time. |
|
|
Return data within a time interval from the specified starting time to the current time, and do not display the auto-generated field for the Hudi table. |
|
|
|
|
|
Connect to a Spark database using Hudi format used with S3. |
|
|
Perform read-optimized query. |
|
|
|