Spark


1. The directory containing the files of this external library is: installation directory\esProc\extlib\SparkCli. The Raqsoft core jar for this external library is scu-spark-cli-2.10.jar. The third-party jars it depends on are listed below:

antlr-runtime-3.5.2.jar

antlr4-runtime-4.9.3.jar

arrow-memory-core-12.0.1.jar

arrow-vector-12.0.1.jar

avro-1.12.0.jar

avro-ipc-1.11.2.jar

avro-mapred-1.11.2.jar

breeze_2.12-2.1.0.jar

chill_2.12-0.10.0.jar

chill-java-0.10.0.jar

commons-collections-3.2.2.jar

commons-compiler-3.1.9.jar

commons-lang-2.6.jar

commons-lang3-3.12.0.jar

commons-text-1.10.0.jar

datanucleus-api-jdo-4.2.4.jar

datanucleus-core-4.1.17.jar

datanucleus-rdbms-4.1.19.jar

derby-10.14.2.0.jar

guava-14.0.1.jar

hadoop-aws-3.2.0.jar

hadoop-client-api-3.2.0.jar

hadoop-common-3.2.0.jar

hadoop-client-runtime-3.2.0.jar

hadoop-yarn-server-web-proxy-3.2.0.jar

hive-common-2.3.9.jar

hive-exec-2.3.9-core.jar

hive-jdbc-2.3.9.jar

hive-metastore-2.3.9.jar

hive-serde-2.3.9.jar

hive-shims-0.23-2.3.9.jar

hive-shims-common-2.3.9.jar

hive-storage-api-2.8.1.jar

htrace-core4-4.1.0-incubating.jar

iceberg-spark-runtime-3.5_2.12-1.7.0.jar

jackson-annotations-2.15.2.jar

jackson-core-2.15.2.jar

jackson-core-asl-1.9.13.jar

jackson-databind-2.15.2.jar

jackson-datatype-jsr310-2.15.2.jar

jackson-mapper-asl-1.9.13.jar

jackson-module-scala_2.12-2.15.2.jar

jakarta.servlet-api-4.0.3.jar

janino-3.1.9.jar

javax.jdo-3.2.0-m3.jar

jersey-container-servlet-2.40.jar

jersey-container-servlet-core-2.40.jar

jersey-server-2.40.jar

joda-time-2.15.2.jar

json4s-ast_2.12-3.7.0-M11.jar

json4s-core_2.12-3.7.0-M11.jar

json4s-jackson_2.12-3.7.0-M11.jar

json4s-scalap_2.12-3.7.0-M11.jar

jsr305-3.0.0.jar

kryo-shaded-4.0.2.jar

libfb303-0.9.3.jar

libthrift-0.12.0.jar

lz4-java-1.8.0.jar

log4j-1.2-api-2.20.0.jar

metrics-core-4.2.19.jar

metrics-graphite-4.2.19.jar

metrics-jmx-4.2.19.jar

metrics-json-4.2.19.jar

metrics-jvm-4.2.19.jar

minlog-1.3.0.jar

netty-buffer-4.1.96.Final.jar

netty-codec-4.1.96.Final.jar

netty-common-4.1.96.Final.jar

netty-handler-4.1.96.Final.jar

netty-transport-4.1.96.Final.jar

netty-transport-native-unix-common-4.1.96.Final.jar

objenesis-3.2.jar

orc-core-1.9.4-shaded-protobuf.jar

paranamer-2.8.jar

parquet-column-1.13.1.jar

parquet-common-1.13.1.jar

parquet-encoding-1.13.1.jar

parquet-format-structures-1.13.1.jar

parquet-hadoop-1.13.1.jar

parquet-jackson-1.13.1.jar

RoaringBitmap-0.9.47.jar

scala-compiler-2.12.18.jar

scala-library-2.12.18.jar

scala-reflect-2.12.18.jar

scala-xml_2.12-2.1.0.jar

slf4j-api-2.0.7.jar

slf4j-simple-1.7.31.jar

snappy-java-1.1.8.3.jar

spark-catalyst_2.12-3.5.3.jar

spark-common-utils_2.12-3.5.3.jar

spark-core_2.12-3.5.3.jar

spark-graphx_2.12-3.5.3.jar

spark-hive_2.12-3.5.3.jar

spark-hive-thriftserver_2.12-3.5.3.jar

spark-kvstore_2.12-3.5.3.jar

spark-launcher_2.12-3.5.3.jar

spark-mllib_2.12-3.5.3.jar

spark-mllib-local_2.12-3.5.3.jar

spark-network-common_2.12-3.5.3.jar

spark-network-shuffle_2.12-3.5.3.jar

spark-repl_2.12-3.5.3.jar

spark-sketch_2.12-3.5.3.jar

spark-sql_2.12-3.5.3.jar

spark-sql-api_2.12-3.5.3.jar

spark-streaming_2.12-3.5.3.jar

spark-tags_2.12-3.5.3.jar

spark-unsafe_2.12-3.5.3.jar

spark-yarn_2.12-3.5.3.jar

stax2-api-3.1.4.jar

stax-api-1.0.1.jar

stream-2.9.6.jar

transaction-api-1.1.jar

univocity-parsers-2.9.1.jar

woodstox-core-5.0.3.jar

xbean-asm9-shaded-4.23.jar

zaws-java-sdk-bundle-1.11.375.jar

zhudi-spark3.5-bundle_2.12-0.15.0.jar

zstd-jni-1.5.5-4.jar

Note: The third-party jars are packaged in the compressed archive; users can pick the appropriate ones for their specific scenarios.

 

2. Download the following four files from the web and place them in installation directory\bin:

hadoop.dll

hadoop.lib

libwinutils.lib

winutils.exe

Note: The above files are required under Windows but not under Linux. winutils.exe is available in x86 and x64 builds; choose the one that matches your OS version.

 

3. You can put different configuration files (.properties) in scu-spark-cli-2.10.jar according to your specific needs. For the time being, the SparkCli library supports local connections, normal connections, connections using the Hudi/Iceberg formats, and connections using the Hudi/Iceberg formats with S3.

Configuration files for the different connection types are shown below; configure them as needed (a note on packaging the chosen file into the jar follows these configurations):

(1) You do not need to put the configuration file in the above-mentioned jar for the local connection;

(2) Configure the file (spark.properties) used for a normal connection as follows:

fs.default.name=hdfs://localhost:9000/

hive.metastore.uris=thrift://localhost:9083

hive.metastore.local=false

hive.metastore.warehouse.dir=/user/hive/warehouse

(3) Configure the file (hudi.properties) used for a connection using Hudi format as follows:

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog

spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension

spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar

spark.jars.packages=org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0

spark.sql.catalog.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse

spark.io.compression.codec=snappy

hive.metastore.uris=thrift://localhost:9083

master=local[*]

(4) Configure the file (iceberg.properties) used for a connection using Iceberg format as follows:

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog

spark.sql.catalog.local.type=hadoop

spark.sql.catalog.local.warehouse=hdfs://localhost:9000/user/hive/warehouse

spark.io.compression.codec=lz4

hive.metastore.uris=thrift://localhost:9083

(5) Configure the file (hudi-s3.properties) used for a connection using the Hudi format with S3 as follows:

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn

spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO

spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP

spark.hadoop.fs.s3a.region=cn-north-1

spark.hadoop.fs.s3a.path.style.access=true

spark.hadoop.fs.s3a.connection.ssl.enabled=false

spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

spark.sql.catalog.warehouse.dir=s3a://mytest/hudi

(6) Configure the file (iceberg-s3.properties) used for a connection using the Iceberg format with S3 as follows:

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn

spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO

spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP

spark.hadoop.fs.s3a.region=cn-north-1

spark.hadoop.fs.s3a.path.style.access=true

spark.hadoop.fs.s3a.connection.ssl.enabled=false

spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

hive.metastore.schema.verification=false

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog

spark.sql.catalog.local.type=hadoop

spark.sql.catalog.local.warehouse=s3a://mytest/iceberg
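Once a configuration has been prepared, the chosen .properties file needs to be added to scu-spark-cli-2.10.jar. As a minimal sketch, assuming the JDK's jar tool is on the PATH and that the file belongs at the root of the jar (verify the expected location for your version), the update command would look like:

jar uf scu-spark-cli-2.10.jar spark.properties

Any zip utility can be used instead, since a jar is an ordinary zip archive.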

 

4. A JRE of JDK 11 or above is required. You need to modify the startup file (startup.bat/startup.sh) manually before connecting:

•  startup.sh

#!/bin/bash

source [installation directory]/esproc/esProc/bin/setEnv.sh

$EXEC_JAVA $(jvm_args=$(sed -n 's/.*jvm_args=\(.*\).*/\1/p' "$START_HOME"/esProc/bin/config.txt)

echo " $jvm_args") -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp "$START_HOME"/esProc/classes:"$START_HOME"/esProc/lib/*:"$START_HOME"/common/jdbc/* -Duser.language="$language" -Dstart.home="$START_HOME"/esProc  com.scudata.ide.spl.EsprocEE

•  startup.bat

@echo off

call "[installation directory]\esProc\bin\setEnv.bat"

start "dm" %EXECJAVAW% -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp %START_HOME%\esProc\classes;%RAQCLASSPATH% -Duser.language=zh -Dstart.home=%START_HOME%\esProc -Djava.net.useSystemProxies=true  com.scudata.ide.spl.EsprocEE

 

5. Users can manually change the memory size if the default is not large enough. Under Windows, make the change in config.txt when starting esProc through the .exe file, or in the .bat file when starting the application through the .bat file; under Linux, make the modification in the .sh file.

To modify the config.txt file under Windows:

java_home=C:\ProgramFiles\Java\JDK1.7.0_11;esproc_port=48773;btx_port=41735;gtm_port=41737;jvm_args=-Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx9783m -Duser.language=zh
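For example, to enlarge the maximum heap you would only change the -Xmx option inside jvm_args; the value below is purely an illustration and should be sized to your machine:

jvm_args=-Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx16384m -Duser.language=zh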

 

 

 

6. esProc provides the functions spark_open(), spark_query(), spark_hudi(), spark_close(), spark_read() and spark_shell() to access Spark systems. Look them up in Help - Function reference to find their uses.
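As a minimal SPL sketch of the typical call sequence (the properties file name and the table name employee are assumptions for illustration only; the exact parameters of each function are given in Help - Function reference):

A1	=spark_open("spark.properties")
A2	=spark_query(A1,"select * from employee")
A3	>spark_close(A1)

A1 opens a connection using the normal-connection configuration packaged in the jar, A2 runs a SQL query against it, and A3 releases the connection.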