1. The directory containing the files of this external library is [installation directory]\esProc\extlib\SparkCli. The Raqsoft core jar of this external library is scu-spark-cli-2.10.jar; the third-party jars are listed below:
antlr-runtime-3.5.2.jar
antlr4-runtime-4.9.3.jar
arrow-memory-core-12.0.1.jar
arrow-vector-12.0.1.jar
avro-1.12.0.jar
avro-ipc-1.11.2.jar
avro-mapred-1.11.2.jar
breeze_2.12-2.1.0.jar
chill_2.12-0.10.0.jar
commons-collections-3.2.2.jar
commons-compiler-3.1.9.jar
commons-lang-2.6.jar
commons-lang3-3.12.0.jar
commons-text-1.10.0.jar
datanucleus-api-jdo-4.2.4.jar
datanucleus-core-4.1.17.jar
datanucleus-rdbms-4.1.19.jar
derby-10.14.2.0.jar
guava-14.0.1.jar
hadoop-aws-3.2.0.jar
hadoop-client-api-3.2.0.jar
hadoop-common-3.2.0.jar
hadoop-client-runtime-3.2.0.jar
hadoop-yarn-server-web-proxy-3.2.0.jar
hive-common-2.3.9.jar
hive-exec-2.3.9-core.jar
hive-jdbc-2.3.9.jar
hive-metastore-2.3.9.jar
hive-serde-2.3.9.jar
hive-shims-0.23-2.3.9.jar
hive-shims-common-2.3.9.jar
hive-storage-api-2.8.1.jar
htrace-core4-4.1.0-incubating.jar
iceberg-spark-runtime-3.5_2.12-1.7.0.jar
jackson-annotations-2.15.2.jar
jackson-core-2.15.2.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.15.2.jar
jackson-datatype-jsr310-2.15.2.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-scala_2.12-2.15.2.jar
jakarta.servlet-api-4.0.3.jar
janino-3.1.9.jar
javax.jdo-3.2.0-m3.jar
jersey-container-servlet-2.40.jar
jersey-container-servlet-core-2.40.jar
jersey-server-2.40.jar
joda-time-2.15.2.jar
json4s-ast_2.12-3.7.0-M11.jar
json4s-core_2.12-3.7.0-M11.jar
json4s-jackson_2.12-3.7.0-M11.jar
json4s-scalap_2.12-3.7.0-M11.jar
jsr305-3.0.0.jar
kryo-shaded-4.0.2.jar
libfb303-0.9.3.jar
libthrift-0.12.0.jar
llz4-java-1.8.0.jar
log4j-1.2-api-2.20.0.jar
metrics-core-4.2.19.jar
metrics-graphite-4.2.19.jar
metrics-jmx-4.2.19.jar
metrics-json-4.2.19.jar
metrics-jvm-4.2.19.jar
minlog-1.3.0.jar
netty-buffer-4.1.96.Final.jar
netty-codec-4.1.96.Final.jar
netty-common-4.1.96.Final.jar
netty-handler-4.1.96.Final.jar
netty-transport-4.1.96.Final.jar
netty-transport-native-unix-common-4.1.96.Final.jar
objenesis-3.2.jar
orc-core-1.9.4-shaded-protobuf.jar
paranamer-2.8.jar
parquet-column-1.13.1.jar
parquet-common-1.13.1.jar
parquet-encoding-1.13.1.jar
parquet-format-structures-1.13.1.jar
parquet-hadoop-1.13.1.jar
parquet-jackson-1.13.1.jar
RoaringBitmap-0.9.47.jar
scala-compiler-2.12.18.jar
scala-library-2.12.18.jar
scala-reflect-2.12.18.jar
scala-xml_2.12-2.1.0.jar
slf4j-api-2.0.7.jar
slf4j-simple-1.7.31.jar
snappy-java-1.1.8.3.jar
spark-catalyst_2.12-3.5.3.jar
spark-common-utils_2.12-3.5.3.jar
spark-core_2.12-3.5.3.jar
spark-graphx_2.12-3.5.3.jar
spark-hive_2.12-3.5.3.jar
spark-hive-thriftserver_2.12-3.5.3.jar
spark-kvstore_2.12-3.5.3.jar
spark-launcher_2.12-3.5.3.jar
spark-mllib_2.12-3.5.3.jar
spark-mllib-local_2.12-3.5.3.jar
spark-network-common_2.12-3.5.3.jar
spark-network-shuffle_2.12-3.5.3.jar
spark-repl_2.12-3.5.3.jar
spark-sketch_2.12-3.5.3.jar
spark-sql_2.12-3.5.3.jar
spark-sql-api_2.12-3.5.3.jar
spark-streaming_2.12-3.5.3.jar
spark-tags_2.12-3.5.3.jar
spark-unsafe_2.12-3.5.3.jar
spark-yarn_2.12-3.5.3.jar
stax2-api-3.1.4.jar
stax-api-1.0.1.jar
stream-2.9.6.jar
transaction-api-1.1.jar
univocity-parsers-2.9.1.jar
woodstox-core-5.0.3.jar
xbean-asm9-shaded-4.23.jar
zaws-java-sdk-bundle-1.11.375.jar
zhudi-spark3.5-bundle_2.12-0.15.0.jar
zstd-jni-1.5.5-4.jar
Note: The third-party jars above are packaged in the library's compressed archive; users can choose the appropriate ones for their specific scenarios.
2. Download the following four files from the web and place them in [installation directory]\bin:
hadoop.dll
hadoop.lib
libwinutils.lib
winutils.exe
Note: The above files are required in a Windows environment but not under Linux. winutils.exe is available in x86 and x64 builds; download the one that matches your OS version.
3. You can put different configuration files (.properties) in scu-spark-cli-2.10.jar according to your specific needs. For the time being, the SparkCli library supports the local connection, the normal connection, connections using the Hudi/Iceberg formats, and connections using the Hudi/Iceberg formats with S3.
Configuration files for the different types of connection are shown below; configure them as needed:
(1) For the local connection, you do not need to put a configuration file in the above-mentioned jar;
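For reference, a minimal SPL sketch of a local connection. It assumes spark_open() can be called without a configuration file name in this case and that spark_query() takes the connection plus an SQL statement; check the Function reference for the exact signatures:
A1=spark_open()
A2=spark_query(A1,"select 1")
A3=spark_close(A1)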
(2) Configure the file (spark.properties) used for a normal connection as follows:
fs.default.name=hdfs://localhost:9000/
hive.metastore.uris=thrift://localhost:9083
hive.metastore.local=false
hive.metastore.warehouse.dir=/user/hive/warehouse
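For reference, a minimal SPL sketch of using this configuration. It assumes spark_open() accepts the name of the properties file packaged in the jar and that spark_query() takes the connection plus an SQL statement (see the Function reference for the exact signatures); the table name default.employee is only an illustration:
A1=spark_open("spark.properties")
A2=spark_query(A1,"select * from default.employee")
A3=spark_close(A1)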
(3) Configure the file (hudi.properties) used for a connection using Hudi format as follows:
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar
spark.jars.packages=org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0
spark.sql.catalog.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse
spark.io.compression.codec=snappy
hive.metastore.uris=thrift://localhost:9083
master=local[*]
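For reference, a hedged SPL sketch of working with this Hudi configuration. The table name hudi_demo is hypothetical; the CREATE TABLE statement uses the standard Spark SQL "using hudi" clause, and if spark_query() does not accept DDL statements, spark_shell() may be the function to use (see the Function reference):
A1=spark_open("hudi.properties")
A2=spark_query(A1,"create table if not exists hudi_demo (id int, name string) using hudi")
A3=spark_query(A1,"select * from hudi_demo")
A4=spark_close(A1)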
(4) Configure the file (iceberg.properties) used for a connection using Iceberg format as follows:
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=hdfs://localhost:9000/user/hive/warehouse
spark.io.compression.codec=lz4
hive.metastore.uris=thrift://localhost:9083
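For reference, a hedged SPL sketch for this Iceberg configuration. Since the catalog is registered under the name local, tables are referenced with the local. prefix; db and events are hypothetical names:
A1=spark_open("iceberg.properties")
A2=spark_query(A1,"select * from local.db.events")
A3=spark_close(A1)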
(5) Configure the file (hudi-s3.properties) used for a connection using the Hudi format with S3 as follows:
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn
spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO
spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP
spark.hadoop.fs.s3a.region=cn-north-1
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.catalog.warehouse.dir=s3a://mytest/hudi
(6) Configure the file (iceberg-s3.properties) used for a connection using the Iceberg format with S3 as follows:
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn
spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO
spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP
spark.hadoop.fs.s3a.region=cn-north-1
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
hive.metastore.schema.verification=false
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=s3a://mytest/iceberg
4. A JRE of JDK 11 or above is required. You need to manually modify the startup file (startup.bat/startup.sh) before connecting:
• startup.sh
#!/bin/bash
source [installation directory]/esProc/bin/setEnv.sh
$EXEC_JAVA $(jvm_args=$(sed -n 's/.*jvm_args=\(.*\).*/\1/p' "$START_HOME"/esProc/bin/config.txt)
echo " $jvm_args") -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp "$START_HOME"/esProc/classes:"$START_HOME"/esProc/lib/*:"$START_HOME"/common/jdbc/* -Duser.language="$language" -Dstart.home="$START_HOME"/esProc com.scudata.ide.spl.EsprocEE
• startup.bat
@echo off
call "[installation directory]\esProc\bin\setEnv.bat"
start "dm" %EXECJAVAW% -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp %START_HOME%\esProc\classes;%RAQCLASSPATH% -Duser.language=zh -Dstart.home=%START_HOME%\esProc -Djava.net.useSystemProxies=true com.scudata.ide.spl.EsprocEE
5. Users can manually increase the memory size if the default is not large enough. Under Windows, make the change in config.txt when starting esProc through the .exe file, or in the .bat file when starting through the .bat file; under Linux, make the change in the .sh file.
To modify the config.txt file under Windows:
java_home=C:\ProgramFiles\Java\JDK1.7.0_11;esproc_port=48773;btx_port=41735;gtm_port=41737;jvm_args=-Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx9783m -Duser.language=zh
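For example, to raise the maximum heap size you only need to enlarge the -Xmx value in jvm_args; the figure below is illustrative (the PermSize options are ignored by JDK 8 and above and can be left as they are):
jvm_args=-Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx16384m -Duser.language=zh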
6. esProc provides the functions spark_open(), spark_query(), spark_hudi(), spark_close(), spark_read() and spark_shell() to access Spark systems. Look them up in【Help】-【Function reference】for details on their usage.