compute

Read(1) Label: formula, summary, cross,

Syntax:
 {<formula> [summary] [cross] [as <new_column_name>]} [partition]

Key Rules

Ø  The parameters [summary] and [cross] are mutually exclusive, only one can be chosen, cannot appear together.

Ø  {} indicates that it can be repeated multiple times. In the syntax of this action, it specifically means multiple computed columns can be created, each with its own set of parameters <formula> [summary] [cross] [as <new_column_name>]. Different computed columns and <partition> are separated by semicolons.

Parameter: <formula>

Usually an expression based on one or more fields, also allowing special expressions like constants and null values. Required parameter; expression type; parameter name must be omitted.

Example:
calculate the target amount for Order_example_table using the expression "if OrderDate year equals 2022 and OrderDate month equals 3 then Amount*1.1".

NLC: compute (if OrderDate year equals 2022 and OrderDate month equals 3 then Amount*1.1) , as TargetAmount

Parameter: [summary]

[summary] is to continue with an aggregation calculation based on the parameter <formula>. Note that any <formula> can use this parameter; when <formula> is of the relative interval F[a:b] form (a != b), the value is a set, and this parameter must be used (the parameter [cross] cannot be used), e.g., OrderAmount[-2:1], which represents the set of OrderAmount from the 2nd record before the current position to the 1st record after, a total of 4 records; when the <formula> is not of the relative interval form, this parameter is not mandatory (i.e., choose one between this parameter and [cross]), e.g., OrderAmount, OrderAmount*0.1, UnitPrice*Quantity. When using non-relative-interval (non-set) aggregation within the same partition, each row may produce the same calculation result; when using relative-interval (set) aggregation, the calculation results of each row are usually different. Without the partition parameter, it can be considered as having only one partition. Optional parameter; enum type; parameter name must be omitted, parameter value cannot be omitted. The explanations of the enum values are as follows:

first, last: the first and last item respectively, often require prior sorting.

sum, avg, count, icount, variance, std_deviation, max, min: understood by common sense. When the parameter value (enum value) of [summary] is count, the <formula> can be omitted, or assigned as * (asterisk), to comply with SQL conventions.

count-icount: i.e., "count minus icount", equals 0 means no duplicates, greater than 0 means duplicates exist.

concat: concatenate the members of a set using null into a large string.

Example:

Concatenate the ClientID of each salesperson in Order_example_table into a new column "AllClientIDsOfSales"

NLC: compute ClientID, concat, as AllClientIDsOfSales; partition Salesperson

Example:

Calculate the 5-day moving average for a stock price table, fill into new column MA5.

NLC: compute ClosePrice[-2:2] , avg, as MA5

Example:
for each customer in Order_example_table, use the Amount field to calculate the total order amount for each customer, and fill it into the new column "TotalAmountPerCustomer" for each record.

NLC: compute Amount, sum, as TotalAmountPerCustomer; partition ClientID

Example:
Based on the previous example, continue calculating, write the result of the expression "TotalAmountPerCustomer*0.1" into a new column "Reward".

NLC: compute Reward*0.1, as Reward  //The result is the same whether partitioned or not.

Note that the above two NLC statements can be combined into the following one (not all cases can be substituted), changing the calculation order but the result remains the same.

NLC: compute Amount*0.1, sum, as Reward; partition ClientID

 

Parameter: [cross]

Performs cross-row calculation based on the parameter <formula>. Unlike aggregation, the calculation result for each row within a partition is different. Optional parameter; enum type; parameter name must be omitted, parameter value cannot be omitted. The enum values are as follows:

proportion: understood by common sense.

inc, inc_ratio: inc_ratio is the growth ratio of the current item compared to the previous item, value/value[-1]-1.

cum, cum_proportion: cum_proportion is the proportion of the current cumulative value to the total sum.

Example:
Calculate the proportion of each customer's order amount to the total amount of that customer in Order_example_table.

NLC: compute Amount, proportion, as AmountProportion; partition ClientID

Partial results:

OrderID  ClientID  SellerId  Amount  OrderDate  AmountProportion

136  ARO 25  899.0  2024-09-23  1.0

16  BDR 27  2464.8  2022-04-30  0.5760493596335421

81  BDR 29  1168.0  2023-08-25  0.2729737309526035

108  BDR 12  480.0  2024-04-03  0.11218098532298774

139  BDR 30  166.0  2024-10-11  0.038795924090866594

93  BON  6  2564.4  2023-11-04  1.0

14  BSF  26  448.0  2022-04-12  0.031124079477560095

79  BSF  9  982.0  2023-08-10  0.06822287064054468

106  BSF  27  10741.6  2024-03-10  0.7462553841878561

137  BSF  18  2222.4  2024-09-26  0.15439766569403918

Example:
Create multiple computed columns, newField1 is the proportion of each customer's order amount to the total amount of that customer, newField2 is the average of each customer's order amount.

NLC: compute Amount, proportion, as newField1; Amount, avg, as newField2; partition ClientID

Parameter: [as <new_column_name>]

The new column name created for the computed column. Required parameter; type is column identifier; parameter name cannot be omitted.

Parameter: [partition]

Calculate by partition, partitions do not affect each other, similar to SQL's PARTITION BY. Optional parameter; identifier type; parameter name cannot be omitted.

Example:
For Order_example_table already sorted by OrderDate, for each customer's records use the expression " Amount[-1]*1.1" to perform cross-row calculation, and fill the result into the new column TargetAmount, ensuring customers do not affect each other.

NLC: compute Amount[-1]*1.1, as TargetAmount; partition ClientID.