spark_hudi()


Description:

The function queries data in Hudi tables using file retrieval methods.

Syntax:

spark_hudi(con, tableName, [startTime], [endTime])

Note:

External library function (See External Library Guide).

 

Hudi supports three query types: snapshot query, incremental query, and read-optimized query. The default is snapshot query.

 

Parameters startTime and endTime take effect only with the @i option, which queries data within a time interval on the automatically generated field of the Hudi table (_hoodie_commit_time, whose format is yyyyMMddHHmmssSSS). When both parameters are present, the interval is left-closed and right-open.
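The interval rule can be illustrated outside SPL. The following is a minimal Python sketch, not the function's actual implementation; the helper names to_commit_time and in_interval are hypothetical. It assumes that timestamps in the yyyyMMddHHmmssSSS format are fixed-width digit strings, so plain string comparison follows chronological order.

```python
from datetime import datetime


def to_commit_time(dt: datetime) -> str:
    """Format a datetime as yyyyMMddHHmmssSSS (17 digits, millisecond precision)."""
    # strftime has no millisecond directive, so derive it from microseconds.
    return dt.strftime("%Y%m%d%H%M%S") + f"{dt.microsecond // 1000:03d}"


def in_interval(commit_time: str, start_time: str, end_time: str) -> bool:
    """Left-closed, right-open check: start_time <= commit_time < end_time."""
    # Assumption: values are compared as strings, which matches time order
    # for fixed-width digit strings.
    return start_time <= commit_time < end_time


if __name__ == "__main__":
    t = to_commit_time(datetime(2025, 4, 27, 16, 43, 6, 435000))
    print(t)
    # A commit exactly at the start is included; one exactly at the end is not.
    print(in_interval(t, t, "20250428000000000"))
    print(in_interval(t, "20250426000000000", t))
```

Under this reading, a record committed exactly at endTime is returned by the next query that uses that value as its startTime, so consecutive intervals neither overlap nor leave gaps.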

Parameter:

con

The database connection string. Both a Spark connection using the Hudi format and a Spark connection using the Hudi format with S3 are supported.

tableName

A table name. It is combined with the warehouse parameter in the .properties configuration file to form a URL, such as hdfs://localhost:9000/user/hive/warehouse/tableName, which gives the location of the table in HDFS.
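As a rough illustration of how the table location is composed, here is a small Python sketch. The helper table_location is hypothetical, and joining the two parts with a single slash is an assumption based on the sample URL above rather than documented behavior.

```python
def table_location(warehouse: str, table_name: str) -> str:
    """Join the warehouse path and the table name into one location URL."""
    # Assumption: the two parts are joined with exactly one "/" between them.
    return warehouse.rstrip("/") + "/" + table_name.strip("/")


if __name__ == "__main__":
    print(table_location("hdfs://localhost:9000/user/hive/warehouse", "huditb1"))
```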

startTime

The starting time, whose default value is 0.

endTime

The ending time, whose default value is the current time.
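The two defaults can be sketched in Python as follows. This is only an illustration of the documented defaults; the helper resolve_interval and its now parameter are hypothetical, and formatting the current time as yyyyMMddHHmmssSSS is an assumption taken from the format of _hoodie_commit_time.

```python
from datetime import datetime
from typing import Optional, Tuple


def resolve_interval(
    start_time: Optional[str] = None,
    end_time: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Tuple[str, str]:
    """Fill in omitted bounds: start defaults to "0", end to the current time."""
    if now is None:
        now = datetime.now()
    # Assumption: the current time is rendered in the commit-time format.
    current = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}"
    return (start_time or "0", end_time or current)


if __name__ == "__main__":
    # Only the ending time is given, so the interval starts at "0".
    print(resolve_interval(end_time="20250427164306435"))
    # Only the starting time is given, so the interval ends at the current time.
    print(resolve_interval(start_time="20250427164306435"))
```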

Option:

@i

Incremental query.

@o

Read-optimized query.

@d

Do not display the automatically generated field for the Hudi table.
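The effect of @d can be pictured with a short Python sketch. It is not the library's implementation: the helper drop_hoodie_fields is hypothetical, and treating every column whose name starts with _hoodie_ as automatically generated is an assumption (Hudi metadata columns such as _hoodie_commit_time and _hoodie_record_key share this prefix).

```python
from typing import Any, Dict

HOODIE_PREFIX = "_hoodie_"


def drop_hoodie_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without the Hudi metadata columns."""
    # Assumption: metadata columns are recognized by their "_hoodie_" prefix.
    return {k: v for k, v in record.items() if not k.startswith(HOODIE_PREFIX)}


if __name__ == "__main__":
    row = {
        "_hoodie_commit_time": "20250427164306435",
        "_hoodie_record_key": "id:1",
        "id": 1,
        "name": "Tom",
    }
    print(drop_hoodie_fields(row))
```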

Return value:

Table sequence

Example:

 

A

 

1

=spark_open("hudi.properties")

Connect to a Spark database using Hudi format.

2

=spark_hudi(A1, "huditb1")

Perform a snapshot query.

3

=spark_hudi@i(A1, "huditb1","20250426150362624","20250427164306435")

Perform an incremental query and return data within the specified time interval.

4

=spark_hudi@i(A1, "huditb1",,"20250427164306435")

Return data within a time interval from 0 to the specified ending time.

5

=spark_hudi@id(A1, "huditb1","20250427164306435")

Return data within a time interval from the specified starting time to the current time, and do not display the auto-generated field for the Hudi table.

6

>spark_close(A1)

 

7

=spark_open("hudi-s3.properties")

Connect to a Spark database using the Hudi format with S3.

8

=spark_hudi@o(A7, "huditb1")

Perform a read-optimized query.

9

>spark_close(A7)