Formula language¶

Basic usage
Boolean values
Reading column values
Variables typing
Operators
Array and object operations
Object notations
Array functions
Boolean functions
Date functions
Math functions
Object functions
String functions
Value access functions
Control structures
Tests

DSS includes a language to write formulas, much like a spreadsheet.

Formulas can be used:

In data preparation, to create new columns, filter rows or flag rows
More generally, to filter rows in many places of DSS:
- In the Filtering recipe, to filter rows
- In Machine Learning, to define the extracts to use for train and test set
- In the Python and Javascript APIs, to obtain partial extracts from the datasets
- In the Public API, to obtain partial extracts from the datasets
- In the grouping, window, join and stack recipes, to perform pre and post filtering

Basic usage ¶

Note

We also have a Howto that lists common use cases with examples

Formulas define an expression, that applies row per row.

Assuming that you have a dataset with columns N1 (numeric), N2 (numeric) and S (string) , here are a few example formulas:

2 + 2
N1 + N2
min(N1, N2) # Returns the smallest of N1 and N2
replace(S, 'old', 'new') # Returns the value of S with ‘old’ replaced by ‘new’
if (N1 > N2, 'big', 'small') # Returns big is N1 > N2, small otherwise

Boolean values ¶

The formula language uses “True” and “False”, with quotes, as boolean values for true and false.

Reading column values ¶

In almost all formulas, you’ll need to read the values of the columns for the current row.

When the column has a “simple” name (i.e: starting by a letter, contains only letters, numbers and underscores), you just need to use the name of the column in the fomula: N1 + 4

For other cases, you can use:

strval("column with spaces") : returns the value of “column with spaces” as a string
numval("column with spaces") : returns the value of “column with spaces” as a number

Variables typing ¶

Variables in the formula language can have one of the following types: string, integer, decimal, array, object, boolean, date

Regardles of the schema of the dataset, or the meanings of the columns (in the case of working on dynamically-generated columns), the following rules are applied:

If column values are “standard” decimal, they are automatically parsed to decimal
If column values are “standard” integer, they are automatically parsed to integer
Else, they are kept as string

Columns containing dates, arrays, objects or booleans are not automatically converted to the matching Formula type. However:

all functions that require an array will automatically attempt to convert a string input to array (using the regular DSS JSON syntax)
all functions that require a date will automatically attempt to convert a string input to date (using the ISO-8601 format)

In other words, if the column “begin_date” is a date (and thus contains properly-formatted ISO-8601):

type(begin_date) returns “string”
inc(begin_date, 2, "days") works as expected because the inc function performed automatic conversion to date
To manually obtain an array or object, use the parseJson function
To manually obtain a date, use the asDate function

Operators ¶

The formula language supports the classical arithmetic operators:

Regular math operators: +, -, *, /
Comparison operators (evaluate to booleans): >, >=, <, <=, ==, !=
Arithmetic operators: // (integer division) and % (modulo)
Boolean operators: &&, ||
The + operator also performs string concatenation.
The comparison operators (and only them) can operate on dates. For dates arithmetic, see the diff and inc functions.

Array and object operations ¶

Formula support accessing array elements and object keys using the traditional Python/Javascript syntax:

array[0]
object["key"]
object.key (only valid if ‘key’ is a valid identifier, i.e. matches [A-Za-z0-9_]*)

Note: this requires that you actually have an array or object. You might need to use the parseJson function (see above paragraph about typing)

Object notations ¶

For all functions, you can use them the “regular” way: replace(str, 'a', 'b') or in the “object” way: str.replace('a', 'b')

In object notation, the first argument to the function is replaced by its ‘context’.

For example, the two syntaxes are equivalent:

length(trim(replace(foo, 'a', 'b')))

foo.replace('a', 'b').trim().length()

Array functions ¶

arrayContains(array a, item) boolean¶

Returns whether the array a contains item

arrayDedup(array a) array¶

Returns array a with duplicates removed

arrayIndexOf(array a, item) int¶

the index (starting from 0) of item in the array a. Return -1 if item is not in a

arrayLen(array a) int¶

Returns the length of the array a

arrayReverse(array a) array¶

Reverses array a

arraySort(array a) array¶

Sorts array a

get(o, number or string from, optional number to) Depends on actual arguments¶

If o has fields, returns the field named ‘from’ of o. If o is an array, returns o[from, to]. if o is a string, returns o.substring(from, to)

join(array a, string sep) string¶

Returns the string obtained by joining the elements of array a with the separator sep

For example, the expression join(date_elements, '-') on a column date_elements produces:

date_elements	output
date_elements	output
[2007, 7, 15]	2007-7-15
[2007, 7, 15]	2007-7-15
[2016, 1, 8]	2016-1-8
[2016, 1, 8]	2016-1-8

objectKeys(object o) array¶

Returns the keys of an object as an array

objectValues(object o) array¶

Returns the keys of an object as an array

slice(o, number from, optional number to) Depends on actual arguments¶

If o is an array, returns o[from, to]. If o is a string, returns o.substring(from, to)

substring(o, number from, optional number to) Depends on actual arguments¶

If o is an array, returns o[from, to]. If o is a string, returns o.substring(from, to)

Boolean functions ¶

and(boolean a, boolean b) boolean¶

Returns a AND b

not(boolean b) boolean¶

Returns the opposite of b

or(boolean a, boolean b) boolean¶

Returns a OR b

Returns o converted to a date. If you don’t give a format, ISO-8601 is used. You can give an ordered list of possible formats using this syntax: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

datePart(date d, string part) number¶

Returns part of a date. The available parts are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’), ‘weekday’ (the name of the day of the week, capitalized), ‘time’ (number of milliseconds since epoch)

Returned parts are always in local timezone and english

diff(o1, o2, optional string unit) number¶

Returns the difference between two dates in given time units. The available units are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’)

inc(date d, number value, string unit) date¶

Returns a date incremented or decremented by the given amount in the given unit of time. The available units are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’)

now() date¶

Returns the current time

trunc(date d, string unit) date¶

Truncates a date to a given unit. The available units are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’)

Math functions ¶

abs(number d) number¶

Returns the absolute value of a number

acos(number d) number¶

Returns the arc cosine of an angle, in the range 0 through PI

asin(number d) number¶

Returns the arc sine of an angle in the range of -PI/2 through PI/2

atan(number d) number¶

Returns the arc tangent of an angle in the range of -PI/2 through PI/2

atan2(number x, number y) number theta¶

Converts rectangular coordinates (x, y) to polar (r, theta)

ceil(number d) number¶

Returns the ceiling of a number

combin(number n, number k) number¶

Returns the number of combinations for n elements as divided into k

cos(number d) number¶

Returns the trigonometric cosine of an angle

cosh(number d) number¶

Returns the hyperbolic cosine of a value

dec2hex(long) string¶

Returns an hexadecimal representation of the input number

degrees(number d) number¶

Converts an angle from radians to degrees.

even(number d) number¶

Rounds the number up to the nearest even integer

exp(number n) number¶

Returns the exponential of a number

fact(number i) number¶

Returns the factorial of a number

factn(number i) number¶

Returns the factorial of a number

floor(number d) number¶

Returns the floor of a number

gcd(number d, number e) number¶

Returns the greatest common denominator of the two numbers

hash(string) long¶

Returns a 64 bits numerical hash of the input (not crypto-secure)

hex2dec(string) long¶

Returns a decimal representation of an hexadecimal string

lcm(number d, number e) number¶

Returns the greatest common denominator of the two numbers

ln(number n) number¶

Returns the natural log of a number

log(number n) number¶

Returns the base 10 log of a number

max(number a, number b) number¶

Returns the greater of two numbers

min(number a, number b) number¶

Returns the smaller of two numbers

mod(number a, number b) number¶

Returns a modulus b

multinomial(number d1, number d2 …) number¶

Calculates the multinomial of a series of numbers

odd(number d) number¶

Rounds the number up to the nearest odd integer

pow(number a, number b) number¶

Returns a to the power of b

quotient(number numerator, number denominator) number¶

Returns the integer portion of a division

radians(number d) number¶

Converts an angle in degrees to radians

rand(optional int min, optional int max) double or int¶

Without arguments, returns a random float between 0 and 1. With min and max arguments, returns a random integer between min (inclusive) and max (exclusive)

round(number n) number¶

Returns the rounding of number to the nearest integer

sin(number d) number¶

Returns the trigonometric sine of an angle

sinh(number d) number¶

Returns the hyperbolic sine of an angle

sum(array a) number¶

Sums the numbers of an array. Skips non-number elements from the array.

tan(number d) number¶

Returns the trigonometric tangent of an angle

tanh(number d) number¶

Returns the hyperbolic tangent of a value

toNumber(o) number¶

Returns o converted to a number

Object functions ¶

hasField(o, string name) boolean¶

Returns whether o has field name

htmlAttr(Element e, String s) string¶

Selects a value from an attribute on an Html Element

htmlText(Element e) string¶

Selects the text from within an element (including all child elements)

innerHtml(Element e) string¶

The innerHtml of an HTML element

jsonize(value) JSON literal value¶

Quotes a value as a JSON literal value

objectDel(key, [key…]) object¶

Removes one or several keys from an object and returns it. The keys must not be null

objectNew(k1, v1, k2, v2, …) object¶

Creates a new object, optionally pre-filled with key/values. Must get an even number of arguments, as successive key-value pairs. Giving 0 arguments is possible and will return an empty object (you can use objectPut to add to it)

objectPut(object o, key, value) object¶

Adds a key/value pair to an object and returns it. key must not be null

ownText(Element e) string¶

Gets the text owned by this HTML element only; does not get the combined text of all children.

parseHtml(string s) HTML object¶

Parses a string as HTML

parseJson(string s) object or array¶

Parses a JSON string as an object or array

select(Element e, String s) HTML Elements¶

Selects an element from an HTML elementn using selector syntax

type(object o) string¶

Returns the type of o

String functions ¶

chomp(string str, string tail) string¶

Removes tail from the end of str if it’s there, otherwise leave it alone.

contains(string s, string frag) boolean¶

Returns whether s contains frag

endsWith(string s, string sub) boolean¶

Returns whether s ends with sub

escape(string s, string mode) string¶

Escapes a string depending on the given escaping mode. Supported modes: ‘html’,’xml’,’csv’,’url’,’javascript’

format(string format, object… args) string¶

Formats a string using printf-like formatting. For example '%4d-%02d'.format(2004,2) gives '2004-02'.

This uses the Java Formatter syntax

fromBase64(string s, optional string charset) string¶

Returns the string whose Base64 representation is given. By default, the string is read using the UTF-8 charset.

indexOf(string s, string sub) number¶

Returns the index of the first ocurrence of sub in s. Returns -1 if there is no such occurrence.

lastIndexOf(string s, string sub) number¶

Returns the index of the last ocurrence of sub in s. Returns -1 if there is no such occurrence.

length(array or string o) number¶

Returns the length of o

match(string, string or regexp) array of strings¶

Returns an array of the groups matching the given regular expression

For example, match('hello world', 'he(.*).*(rl)d') returns ['ll', 'rl']

md5(string s) string¶

Returns the MD5 hash of a string

partition(string s, string or regex frag, optional boolean omitFragment) array¶

Returns an array of strings [a,frag,b] where a is the part before the first occurrence of frag in s and b is the part after the occurrence. If omitFragment is true, frag is not returned in the array.

replace(string s, string or regex f, string replacement) string¶

Replaces all occurrences of a substring or regex in a string

replaceChars(string s, string f, string r) string¶

Returns the string obtained by replacing all chars in f with the char in s at that same position

For example, replaceChars('abcba', 'bc', 'yz') returns ayzya The function can be used to delete characters, by replacing them by nothing, like in: replaceChars('abcba', 'bc', 'y') returns ayya

rpartition(string s, string or regex frag, optional boolean omitFragment) array¶

Returns an array of strings [a,frag,b] where a is the part before the last occurrence of frag in s and b is the part after the occurrence. If omitFragment is true, frag is not returned in the array.

sha1(string s) string¶

Returns the SHA-1 hash of a string

split(string s, string or regex sep, optional boolean preserveAllTokens) array¶

Returns the array of strings obtained by splitting s with separator sep. If preserveAllTokens is true, then empty segments are preserved.

splitByCharType(string s) array¶

Returns an array of strings obtained by splitting s grouping consecutive chars by their unicode type

splitByLengths(string s, number length1, …) array¶

Returns the array of strings obtained by splitting a string into substrings with the given lengths

For example, splitByLengths('abcdefgh', 2, 3, 1) returns ['ab','cde','f'] The function generates one element in the array for each length passed as parameter. If the input string is not long enough, the last elements are empty.

startsWith(string s, string sub) boolean¶

Returns whether s starts with sub

strip(string s) string¶

Returns copy of the string, with leading and trailing whitespace omitted.

toBase64(string s, optional string charset) string¶

Returns the Base64 representation of a string. By default, the string is written using the UTF-8 charset.

toLowercase(string s) string¶

Converts a string to lowercase

toString(o, string format (optional)) string¶

Returns o converted to a string

toTitlecase(string s) string¶

Converts a string to titlecase

toUppercase(string s) string¶

Converts a string to uppercase

trim(string s) string¶

Returns copy of the string, with leading and trailing whitespace omitted.

unescape(string s, string mode) string¶

Unescapes all escaped parts of the string depending on the given escaping mode. Available modes: ‘html’,’xml’,’csv’,’url’,’javascript’

unicode(string s) string¶

Returns the input string as an array of the unicode codepoints (numbers)

unicodeType(string s) string¶

Returns an array of strings describing each character of the input string in their full unicode notation

Value access functions ¶

numval(object o) number¶

Returns the numerical value of a given cell or expression. If the value of the cell or expression is not numerical, an empty value is returned.

For example, with columns type (with values “high” and “low”), cost_high and cost_low, the expression numval('cost_' + type) produces:

cost_low	cost_high	type	output
cost_low	cost_high	type	output
200	500	low	200.0
200	500	low	200.0
100	400	high	400.0
100	400	high	400.0
xx	400	low
xx	400	low

strval(object o, optional string defaultValue) string¶

Returns the numerical value of a given cell or expression, or the default value if the value of the cell or expression is empty.

For example, with columns type (with values “high” and “low”), code_high and code_low, the expression strval('code_' + type, 'N/A') produces:

code_low	code_high	type	output
code_low	code_high	type	output
200	500	low	200
200	500	low	200
100	4x0	high	4x0
100	4x0	high	4x0
	400	low	N/A
	400	low	N/A

Control structures ¶

Control structures allow you to perform advanced operations.

Beware: control structures cannot use object notation !

filter(array a, variable v, expression e) (returns: array)¶

Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array if the result is truish.

Note

filter(myarray, v, v < 2) is equivalent to this Python syntax: [v for v in myarray if v < 2] and returns the array with only the elements below 2

forEeach(array a, variable v, expression e) (returns: array)¶

Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array.

Note

forEach(myarray, v, v + 2) is equivalent to this Python syntax: [v+2 for v in myarray]

forEachIndex(array a, variable i, variable v, expression e) (returns: array)¶

Evaluates expression a to an array. Then for each array element, binds its value to variable name v and its index to variable name i, evaluates expression e, and pushes the result onto the result array.

Note

forEachIndex(myarray, i, v, v + i) is equivalent to this Javascript syntax: myarray.map(function(v, i) { return v+i ;})

forRange(from, to, step, variable v, expression e) (returns: array)¶

Iterates, tarting at from, incrementing by step each time while less than to. At each iteration, binds the variable v to the iteration value, evaluates expression e, and pushes the result onto the result array.

Note

forRange(0, 100, 3, v, v * 2) is equivalent to this Python syntax: [v * 2 for v in xrange(0, 100, 3)] and returns [0, 6, 12, 18 …. 198]

if(boolean, expression_true, expression_false) (returns: anything)¶

Evaluates to expression_true if the condition is true, to expression_false otherwise

objectFilter(expression a, variable k, variable v, expression test) object¶

Evaluates expression a to an object. Then for each element (k, v) of this object, binds its key to variable name k, its value to variable name v, evaluates expression test which should return a boolean. If the boolean is true, pushes (k, v) onto the result object.

with(expression o, variable v, expression e) (returns: any)¶

Evaluates expression o and binds its value to variable name v. Then evaluates expression e and returns that result.

The with control allows you to “split” a very big expression into more manageable chunks. It also makes reusing the result of a complex computation easier to read and faster to process.

Tests ¶

Beware: tests cannot use object notation

isBlank(expression o) boolean¶

Returns whether o is null or an empty string

isError(expression o) boolean¶

Returns whether o is an error

isNonBlank(expression o) boolean¶

Returns whether o is not null and not an empty string

isNotNull(expression o) boolean¶

Returns whether o is not null

isNull(expression o) boolean¶

Returns whether o is null

isNumeric(expression o) boolean¶

Returns whether o can represent a number

Formula language¶

arrayContains(array a, item) boolean¶

arrayDedup(array a) array¶

arrayIndexOf(array a, item) int¶

arrayLen(array a) int¶

arrayReverse(array a) array¶

arraySort(array a) array¶

get(o, number or string from, optional number to) Depends on actual arguments¶

join(array a, string sep) string¶

objectKeys(object o) array¶

objectValues(object o) array¶

slice(o, number from, optional number to) Depends on actual arguments¶

substring(o, number from, optional number to) Depends on actual arguments¶

and(boolean a, boolean b) boolean¶

not(boolean b) boolean¶

or(boolean a, boolean b) boolean¶

asDate(o, format1, format2, … (all optional)) date¶

datePart(date d, string part) number¶

diff(o1, o2, optional string unit) number¶

inc(date d, number value, string unit) date¶

now() date¶

trunc(date d, string unit) date¶

abs(number d) number¶

acos(number d) number¶

asin(number d) number¶

atan(number d) number¶

atan2(number x, number y) number theta¶

ceil(number d) number¶

combin(number n, number k) number¶

cos(number d) number¶

cosh(number d) number¶

dec2hex(long) string¶

degrees(number d) number¶

even(number d) number¶

exp(number n) number¶

fact(number i) number¶

factn(number i) number¶

floor(number d) number¶

gcd(number d, number e) number¶

hash(string) long¶

hex2dec(string) long¶

lcm(number d, number e) number¶

ln(number n) number¶

log(number n) number¶

max(number a, number b) number¶

min(number a, number b) number¶

mod(number a, number b) number¶

multinomial(number d1, number d2 …) number¶

odd(number d) number¶

pow(number a, number b) number¶

quotient(number numerator, number denominator) number¶

radians(number d) number¶

rand(optional int min, optional int max) double or int¶

round(number n) number¶

sin(number d) number¶

sinh(number d) number¶

sum(array a) number¶

tan(number d) number¶

tanh(number d) number¶

toNumber(o) number¶

hasField(o, string name) boolean¶

htmlAttr(Element e, String s) string¶

htmlText(Element e) string¶

innerHtml(Element e) string¶

jsonize(value) JSON literal value¶

objectDel(key, [key…]) object¶

objectNew(k1, v1, k2, v2, …) object¶

objectPut(object o, key, value) object¶

ownText(Element e) string¶

parseHtml(string s) HTML object¶

parseJson(string s) object or array¶

select(Element e, String s) HTML Elements¶

type(object o) string¶

chomp(string str, string tail) string¶

contains(string s, string frag) boolean¶

endsWith(string s, string sub) boolean¶

escape(string s, string mode) string¶

format(string format, object… args) string¶

fromBase64(string s, optional string charset) string¶

indexOf(string s, string sub) number¶