parallel()


Description:

Retrieve data from a partition-based Parquet file.

Syntax:

f.parallel([col,…]; [partitionFilter/...]; [colFilter]; [n])

Note:

External library function (see External Library Guide).

The function reads a partition-based Parquet file using segment-based parallel processing. Data within each partition should be ordered.

Parameter:

f

A file object.

col

Fields to retrieve; all fields are returned by default.

partitionFilter

A partition filtering condition, which has a k=v structure, such as year=2024 and year=2023/month=10.

colFilter

A field filtering condition built with operators such as >, >=, <, <=, =, !=, not, in, and like. This parameter is ignored when the @v option is used.

n

A positive integer giving the number of records to retrieve; all records are returned when this parameter is absent. This parameter is ignored when the @c option is used.
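
The k=v structure of partitionFilter can be illustrated with a small Python sketch. It assumes a Hive-style directory layout, in which each partition level is a directory named key=value. The file paths and the helper names parse_partition_filter and partition_matches are hypothetical, and the sketch only shows which partitions a filter would select, not how the library itself resolves them.

```python
def parse_partition_filter(spec):
    """Split a filter such as 'year=2023/month=10' into a dict of k=v pairs."""
    pairs = {}
    for part in spec.split("/"):
        key, _, value = part.partition("=")
        pairs[key] = value
    return pairs


def partition_matches(path, spec):
    """Return True if every k=v pair in spec appears as a directory in path."""
    wanted = parse_partition_filter(spec)
    # Collect the key=value directory segments found in the path.
    found = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
    return all(found.get(k) == v for k, v in wanted.items())


# Hypothetical partition files under one dataset directory.
paths = [
    "test1/year=2023/month=10/part-0.parquet",
    "test1/year=2023/month=11/part-0.parquet",
    "test1/year=2024/month=10/part-0.parquet",
]

# A two-level filter selects a single partition.
selected = [p for p in paths if partition_matches(p, "year=2023/month=10")]

# A one-level filter selects every month of that year.
selected_year = [p for p in paths if partition_matches(p, "year=2023")]
```

In this sketch a filter with fewer levels, such as year=2023, matches every partition below that level, while year=2023/month=10 narrows the selection to one partition.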

Option:

@c

Return a cursor.

Return value:

Table sequence, or a cursor when the @c option is used

Example:

 

	A	
1	=file("F:/tmp/mytest.parquet")	Open a local Parquet file.
2	=A1.parallel()	Retrieve data from A1’s file and return all fields.
3	=A1.parallel@c()	Return a cursor.
4	=file("hdfs://localhost:9000/user/hive/warehouse/test1")	
5	=A4.parallel("id","product","store";"year=2023/month=10";"product < 20";10)	Retrieve the specified fields from the partition where year=2023 and month=10 and only return the first 10 records meeting the filtering condition.