esProc

Tutorial

Function Reference

Code Reference

User Reference

External Library Guide

Data File Tool Manual

DQL Tutorial

Cluster Server Manager Manual

SPL WIN Manual
YModel

User Reference

JSON-style Parameter Guide
ReportLite

User Reference
NaturalSPL

User Reference
Official Website
http://doc.esproc.com/WEB-INF/layout/application.jsp1

Overview
Function

abs

acos

asc

asin

atan

avg

between

case

ceiling

char

code_find

concat

contain

cos

count

date

datetime

day

day_of_year

elapse

exp

fill

findsubs

floor

getsubs

hour

icount

if

ifn

integer

interval

isalpha

isdigit

islower

isnull

isupper

left

len

lg

like

ln

lower

max

mid

min

minute

mod

month

month_days

month_end

month_start

notnull

now

number

nvl

pad

parse

pi

power

quarter

quarter_days

quarter_end

quarter_start

rand

rands

real

replace

right

round

second

sign

sin

sqrt

string

sum

tan

time

today

trim

upper

week

week_end

week_start

week_of_year

year

year_days

year_end

year_start

ymonth
Action

align

array

calculate

compute

const

derive

distinct

expand

file

filter

join

key

list

match

pivot

rank

segment

set

sort

SQL

summarize

table

Previous | Next

summarize

Read（63） Like（0） Label: summarize, aggregation,

Mode Determination (Highest Priority, Must Execute First)

Summarize has two mutually exclusive aggregation modes: 1. single aggregation 2. double aggregation

Single aggregation means: a summary column (a set of summary parameters) executes only one value aggregation algorithm, i.e., an aggregation algorithm that reduces multiple records to a single value, specifically: first, last, sum, avg, count, icount, variance, std_deviation, max, min. Syntax:
[summarize {[condition <filter_condition>] [<aggregated_expression>] [value_aggregation_algorithm] [delimeter <symbol>] [as <new_column_name>]}] [group {[<group_expression>] [as <new_column_name>]}]

Double aggregation means: a summary column (a set of summary parameters) needs to perform two aggregations. The first is a record aggregation algorithm, which reduces multiple records to one record or multiple records with the same value, specifically: argmax, argmin. For example, the record with the maximum order amount; when the maximum value is not repeated, the result is one record; when the maximum value is repeated, the result is multiple records. After the first aggregation algorithm, the second value aggregation algorithm (the aggregation algorithm that reduces multiple records to a single value) must be immediately executed to get the final result. For instance, the first aggregation finds multiple records with the maximum order amount, and the second aggregation calculates the sum of amounts from these records; the complete algorithm is to find the sum of amounts of the records with the maximum order amount. NLC's double aggregation is similar to SQL's KEEP function. Syntax:
[summarize {[condition <filter_condition>] [<aggregated_expression>] [record_aggregation_algorithm] [<aggregated_expression>] [value_aggregation_algorithm] [delimeter <symbol>] [as <new_column_name>]}] [group {[<group_expression>] [as <new_column_name>]}]

Determination Rules (Must Execute Strictly):

If the aggregation calculation expressed by the natural language includes: argmax, argmin

→ must use double aggregation.

Conversely, if the aggregation calculation expressed by the natural language does not include: argmax, argmin, but only: first, last, sum, avg, count, icount, variance, std_deviation, max, min.

→ must use single aggregation

If both appear or semantics conflict:

→ output error, guessing is not allowed

If argmax or argmin is not explicitly expressed:

→ default to single aggregation.

Parameter Structure Description

The parameters of this action are composed of two parts: summary and group. The summary part's parameter name is "summary", must be omitted. It consists of one or more sets of identically structured parameters, each set representing a summary column, composed of 5 (single aggregation) or 7 parameters (double aggregation). For single aggregation, one set of parameters is: [condition <filter_condition>] [<aggregated_expression>] [value_aggregation_algorithm] [delimeter <symbol>] [as <new_column_name>]; for double aggregation, one set of parameters is: [condition <filter_condition>] [<aggregated_expression>] [record_aggregation_algorithm] [<aggregated_expression>] [value_aggregation_algorithm] [delimeter <symbol>] [as <new_column_name>]. The group part's parameter name is "group", cannot be omitted, and also consists of one or more sets of identically structured parameters, each set representing a group column, composed of 2 parameters: [<group_expression>] [as <new_column_name>]. First, explain the summary part parameters.

Parameter Descriptions

Parameter: aggregated_expression

The expression targeted when performing aggregation calculation on data within a group, usually an expression related to the original column, including a single column (a type of expression). For example: UnitPrice*Quantity, an expression containing multiple columns; Amount, a single column. Required parameter; type is expression; parameter name must be omitted, parameter value cannot be omitted. Note, this parameter must be used together with the value_aggregation_algorithm or record_aggregation_algorithm. A certain summary column has only one aggregated_expression parameter for single aggregation, and definitely two aggregated_expression parameters for double aggregation. Note, this parameter does not support cross-row calculation and aggregation calculation, i.e., the expression cannot contain relative position calculations like F[i], F[a:b], nor aggregate calculations like sum, average of a set.

Parameter: value_aggregation_algorithm

A fixed algorithm that aggregates data within a group to produce a single numeric value. Required parameter; enum type; parameter name must be omitted, parameter value cannot be omitted. This parameter must be used together with the aggregated_expression parameter.

The enum values are as follows:

first, last: i.e., the first item, the last item, usually only meaningful with prior sorting.

sum, avg, count, icount, variance, std_deviation, max, min: understood by common sense.

count-icount: calculate whether duplicates exist, i.e., "count minus icount", equals 0 means no duplicates, greater than 0 means duplicates exist.

concat: concatenate members with a delimiter, which is specified by the [delimeter<symbol>] parameter. Note, other aggregation algorithms do not require a delimiter.

> Find the maximum and minimum amounts of Order_example_table.

NLC: summarize Amount max, Amount min

Result:

maxAmount minAmount

20000 231

Explanation: maxAmount and Amount min are the column names of the summary result, automatically generated by the system. If you want to specify column names, use the as <new_column_name> parameter.

Parameter: record_aggregation_algorithm

A fixed algorithm that aggregates data within a group to produce a single record or multiple records with the same value. Required parameter; enum type; parameter name must be omitted, parameter value cannot be omitted. This parameter is definitely for double aggregation, must be used together with the aggregated_expression parameter and the value_aggregation_algorithm parameter.

The enum values are as follows:

argmax, argmin: the record corresponding to the maximum or minimum of the aggregated_expression; if multiple maximums exist, multiple records are returned.

Example:

Based on Order_example_table, find the maximum amount and the sum of amounts for the record with the latest order date.

NLC: summarize OrderDate max Amount max, Amount sum

Result:

maxOrderDatemaxAmount sumAmount

20000 4500000

Explanation: "maxOrderDatemaxAmount" and "sumAmount" are the column names of the summary result, automatically generated by the system. If you want to specify column names, use the as <new_column_name> parameter.

Parameter: delimeter <symbol>

When the aggregation algorithm is concat, use this parameter to concatenate the members within the group (specified by the aggregated_expression parameter). Optional parameter, default parameter value is null; type is string; parameter name cannot be omitted.

Example:

Concatenate the ClientID field of Order_example_table using commas

NLC: summarize ClientID concat delimeter comma

Result:

concatClientID

WVF,UFS,JFS,DSG,JFE,OLF,PAER,DY,JDR,WBG,GCD,WVF,UFS

Parameter: condition <filter_condition>

Before aggregation, records within a group can be filtered first. Optional parameter; type is conditional expression; parameter name cannot be omitted.

Example:

Filter records that meet the conditional expression "OrderDate year==2019 or OrderDate year==2020", then find the record with the maximum amount

NLC: summarize Amount argmax

Result:

OrderID ClientID SellerId Amount OrderDate

87 WF 15 19000 2019-03-04

42 VET 12 19000 2020-04-08

Parameter: group_expression

The expression used for grouping, can be a single field (a type of expression). For example, calculate the first 6 digits of the ID field as the grouping field "Region". When grouping by multiple columns, multiple sets of parameters are needed, each with one group column parameter. Required parameter (when the group part's parameters exist); type is expression; parameter name must be omitted. Note, this parameter does not support cross-row calculation and aggregation calculation, i.e., the expression cannot contain relative position calculations like F[i], F[a:b], nor aggregate calculations like sum, average of a set.

Example:

Group the OrderAnalysis_table by Year and ClientID, find the maximum and minimum amounts for each group.

NLC: summarize Amount max, Amount min; group Year, ClientID

Result:

Year ClientID maxAmount minAmount

2019 WF 19000 2100

2020 WF 12800 2200

2021 WF 14100 1300

2019 TEF 13000 2200

2020 ETF 23000 3100

Parameter: as <new_column_name>

Summary results and group columns can have new column names automatically generated by the system, or specified using this parameter. Optional parameter; type is (column) identifier; parameter name cannot be omitted. Note that the group part also has a parameter with the same name, with similar meaning and usage.

Example:

Find the maximum and minimum amounts of Order_example_table, named LargeOrderAmount and SmallOrderAmount respectively.

NLC: summarize Amount max, Amount min

Result:

LargeOrderAmount SmallOrderAmount

20000 231

Note: if the as <new_column_name> parameter is omitted, it means automatic naming.