Custom scenarios API¶
This is the documentation of the API for Custom scenarios.
A quick description of custom scenarios can be found in Definitions. More details and usage samples are also available in Custom scenarios.
The Scenario is the main class you’ll use to interact with DSS in your Python custom scenario.
class dataiku.scenario.Scenario¶
Handle to the current (running) scenario.
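A minimal custom scenario script gets a handle on the running scenario and then chains the methods below. This is a sketch; the dataset name is an assumption:

    from dataiku.scenario import Scenario

    # Get a handle on the scenario currently being run
    s = Scenario()

    # Build a dataset of the current project (name is hypothetical)
    s.build_dataset("mydataset")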
add_report_item(object_ref, partition, report_item)¶
When used in the code of a custom step, adds a report item to the current step run.
build_dataset(dataset_name, project_key=None, build_mode='RECURSIVE_BUILD', partitions=None, step_name=None, async=False, fail_fatal=True)¶
Executes the build of a dataset.
Parameters: partitions – Can be given as a partitions spec. Variable expansion is supported.
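For example, a sketch building two date partitions of a hypothetical partitioned dataset (the dataset name and partition identifiers are assumptions, and the partitions spec format depends on the dataset's partitioning scheme):

    from dataiku.scenario import Scenario

    s = Scenario()
    # Build two date partitions of a partitioned dataset
    s.build_dataset("mydataset", partitions="2019-01-01,2019-01-02")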
build_folder(folder_id, project_key=None, build_mode='RECURSIVE_BUILD', step_name=None, async=False, fail_fatal=True)¶
Executes the build of a folder.
Parameters: folder_id – the identifier of the folder (!= its name)
clear_dataset(dataset_name, project_key=None, partitions=None, step_name=None, async=False, fail_fatal=True)¶
Executes a 'clear' operation on a dataset.
Parameters: partitions – Can be given as a partitions spec. Variable expansion is supported.
clear_folder(folder_id, project_key=None, step_name=None, async=False, fail_fatal=True)¶
Executes a 'clear' operation on a managed folder.
compute_dataset_metrics(dataset_name, project_key=None, partitions=None, step_name=None, async=False, fail_fatal=True)¶
Computes the metrics defined on a dataset.
Parameters: partitions – Can be given as a partitions spec. Variable expansion is supported.
create_jupyter_export(notebook_id, execute_notebook=False, name=None, async=False)¶
Create a new export from a Jupyter notebook.
Parameters:
- notebook_id – identifier of the notebook
- execute_notebook – whether the notebook should be executed prior to the export
execute_sql(connection, sql, step_name=None, async=False, fail_fatal=True)¶
Executes a SQL query.
Parameters:
- connection – name of the DSS connection to run the query on
- sql – the query to run
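As a sketch (the connection and table names are assumptions), a SQL step can be run and named so that its output can later be retrieved with get_previous_steps_outputs():

    from dataiku.scenario import Scenario

    s = Scenario()
    # Run a query on a SQL connection defined in DSS
    s.execute_sql("my_sql_connection", "SELECT COUNT(*) AS cnt FROM mytable", step_name="count_rows")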
get_all_variables()¶
Returns a dictionary of all variables (including the scenario-specific values).
get_build_state()¶
Gets a handle to query previous builds.
get_message_sender(channel_id)¶
Gets a sender for reporting messages, using one of DSS's Messaging channels.
get_previous_steps_outputs()¶
Returns the results of the steps previously executed in this scenario run. For example, if a SQL step named 'the_sql' ran earlier in the scenario, the list returned by this function will look like:

    [
        ...
        {
            'stepName': 'the_sql',
            'result': {
                'success': True,
                'hasResultset': True,
                'columns': [
                    {'type': 'int8', 'name': 'a'},
                    {'type': 'varchar', 'name': 'b'}
                ],
                'totalRows': 2,
                'rows': [
                    ['1000', 'min'],
                    ['2500', 'max']
                ],
                'log': '',
                'endedOn': 0,
                'totalRowsClipped': False
            }
        },
        ...
    ]

Important note: the exact structure of each type of step run output is not precisely defined, and may vary from one DSS release to another.
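As a sketch (reusing the hypothetical 'the_sql' step name from the example above), the output of a given step can be looked up by name:

    from dataiku.scenario import Scenario

    s = Scenario()
    # ... the 'the_sql' step ran earlier in this scenario ...

    outputs = s.get_previous_steps_outputs()
    sql_output = next((o for o in outputs if o.get('stepName') == 'the_sql'), None)
    if sql_output is not None:
        print(sql_output['result'].get('rows'))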
get_trigger_name()¶
Returns the name (if defined) of the trigger that launched this scenario run.
get_trigger_params()¶
Returns a dictionary of the params set by the trigger that launched this scenario run.
get_trigger_type()¶
Returns the type of the trigger that launched this scenario run.
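For example, a custom scenario can adapt its behavior to what launched the run. This is a sketch; the exact type strings and params depend on how the triggers are configured:

    from dataiku.scenario import Scenario

    s = Scenario()
    print("Launched by trigger %s (type: %s)" % (s.get_trigger_name(), s.get_trigger_type()))
    print("Trigger params: %s" % s.get_trigger_params())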
new_build_flowitems_step(step_name=None, build_mode='RECURSIVE_BUILD')¶
Creates and returns a helper to prepare a multi-item "build" step.
Returns: a BuildFlowItemsStepDefHelper object
package_api_service(service_id, package_id, transmogrify=False, name=None, async=False)¶
Make a package for an API service.
Parameters:
- service_id – identifier of the API service
- package_id – identifier for the created package
- transmogrify – if True, make the package_id unique by appending a number (if not unique already)
run_dataset_checks(dataset_name, project_key=None, partitions=None, step_name=None, async=False, fail_fatal=True)¶
Runs the checks defined on a dataset.
Parameters: partitions – Can be given as a partitions spec. Variable expansion is supported.
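A common pattern is to rebuild a dataset, recompute its metrics and then run its checks. This is a sketch; the dataset name is an assumption:

    from dataiku.scenario import Scenario

    s = Scenario()
    s.build_dataset("mydataset")
    s.compute_dataset_metrics("mydataset")
    # with fail_fatal=True (the default), a failed step aborts the scenario run
    s.run_dataset_checks("mydataset")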
run_global_variables_update(update_code=None, step_name=None, async=False, fail_fatal=True)¶
Runs the code for updating the DSS instance's variables defined in the global settings.
Parameters: update_code – custom code to run instead of the one defined in the global settings
run_scenario(scenario_id, project_key=None, name=None, async=False)¶
Run a scenario.
Parameters: scenario_id – identifier of the scenario (can be different from its name)
run_step(step, async=False, fail_fatal=True)¶
Run a step in this scenario.
There are 2 behaviors, depending on the value of the 'async' parameter:
- if async=False (the default), the function waits until the step has finished running and returns the result of the step
- if async=True, the function launches the step run and immediately returns a StepHandle object, on which the caller will need to call is_done() or wait_for_completion()
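As a sketch of the asynchronous behavior (the step and dataset names are assumptions; since 'async' is a reserved keyword in recent Python versions, it is passed through a dict here):

    from dataiku.scenario import Scenario

    s = Scenario()
    # Build a step definition with the helper returned by new_build_flowitems_step()
    helper = s.new_build_flowitems_step(step_name="build_sales")
    helper.add_dataset("sales_prepared")

    # Launch the step without waiting for it, then come back to it later
    handle = s.run_step(helper.get_step(), **{"async": True})
    # ... do other work ...
    handle.wait_for_completion()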
set_global_variables(step_name=None, async=False, fail_fatal=True, **kwargs)¶
Sets variables on the DSS instance. The variables are passed as named parameters to this function. For example:

    s.set_global_variables(var1='value1', var2=True)

will add 2 variables var1 and var2 to the instance's variables, with values 'value1' and True respectively.
set_project_variables(project_key=None, step_name=None, async=False, fail_fatal=True, **kwargs)¶
Sets variables on the project. The variables are passed as named parameters to this function. For example:

    s.set_project_variables('PROJ', var1='value1', var2=True)

will add 2 variables var1 and var2 to the project's variables, with values 'value1' and True respectively.
set_scenario_variables(**kwargs)¶
Define additional variables in this scenario run.
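As a sketch (the variable name and value are assumptions, and the exact layout of the dictionary returned by get_all_variables() may differ), scenario-specific variables can be set and then read back alongside the other variables:

    from dataiku.scenario import Scenario

    s = Scenario()
    # Define a variable for the duration of this scenario run
    s.set_scenario_variables(processing_date="2019-01-01")

    # Scenario-specific values are included in get_all_variables()
    variables = s.get_all_variables()
    print(variables.get("processing_date"))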
synchronize_hive_metastore(dataset_name, project_key=None, step_name=None, async=False, fail_fatal=True)¶
Synchronizes the Hive metastore from the dataset definition for a single dataset (all partitions).
train_model(model_id, project_key=None, build_mode='RECURSIVE_BUILD', step_name=None, async=False, fail_fatal=True)¶
Executes the training of a saved model.
Parameters: model_id – the identifier of the model (!= its name)
update_from_hive_metastore(dataset_name, project_key=None, step_name=None, async=False, fail_fatal=True)¶
Update a single dataset definition (all partitions) from its table in the Hive metastore.
class dataiku.scenario.BuildFlowItemsStepDefHelper(scenario, step_name=None, build_mode='RECURSIVE_BUILD')¶
Helper to build the definition of a 'Build Flow Items' step. Multiple items can be added.
add_dataset(dataset_name, project_key=None, partitions=None)¶
Add a dataset to build.
Parameters:
- dataset_name – name of the dataset
- partitions – partition spec
add_folder(folder_id, project_key=None)¶
Add a folder to build.
Parameters: folder_id – identifier of a folder (!= its name)
add_model(model_id, project_key=None)¶
Add a saved model to build.
Parameters: model_id – identifier of a saved model (!= its name)
get_step()¶
Get the step definition.
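Putting the helper together, a single 'Build Flow Items' step can cover several Flow items at once. This is a sketch; all identifiers below are assumptions:

    from dataiku.scenario import Scenario

    s = Scenario()
    # Prepare one build step covering a dataset, a partitioned dataset, a folder and a saved model
    step = s.new_build_flowitems_step(step_name="build_everything")
    step.add_dataset("customers_prepared")
    step.add_dataset("orders_enriched", partitions="2019-01-01")
    step.add_folder("a1B2c3D4")
    step.add_model("xgboost_model")

    # Run the step and wait for its completion
    s.run_step(step.get_step())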