spark_read()


Description:

Read a file stored in a local Spark database or on a Spark standalone cluster.

Syntax:

spark_read(con,sfile,k:v,...)

Note:

External library function (See External Library Guide).

 

The function reads the content of a file stored in a local Spark database or on a Spark standalone cluster, and returns the result as a table sequence.

Parameter:

con

Database connection, which can be a local connection or a connection to a Spark standalone cluster.

sfile

Name of the file to be read.

k:v

Separator setting for txt and csv files. For example, to set "#" as the separator of a txt file, pass the parameter "sep":"#". By default, the comma is used as the separator for a txt file and the semicolon for a csv file.

Option:

@c

Read the content of a file and return the result as a cursor.

@t

Read the first row of a text file as field names; by default, the automatically generated field names _c0, _c1… are used.

@x

Close the Spark database connection after the read.

Return value:

Table sequence/Cursor

Example:

 

 	A	 
1	=spark_open()	Connect to a local Spark database.
2	=spark_read(A1,"D:/people.txt","sep":" ")	Read a space-separated txt file.
3	=spark_read@c(A1,"D:/student.csv","sep":",")	Read a comma-separated csv file and return the result as a cursor.
4	=spark_read@t(A1,"D:/score.txt","sep":"\t")	Read a tab-separated txt file and use its first row as field names.
5	=spark_read(A1,"D:/people.json")	Read the content of the file people.json.
6	>spark_close(A1)	Close the Spark database connection.
7	=spark_open("spark.properties")	Connect to a Spark database using the configuration file spark.properties.
8	=spark_read@x(A7,"hdfs://localhost:9000/user/hive/warehouse/people.csv")	Read people.csv stored in the Spark database and then close the connection.