Wrangle Class

class ds_discovery.components.wrangling.Wrangle(property_manager: Any, intent_model: Any, default_save: bool | None = None, reset_templates: bool | None = None, template_path: str | None = None, template_module: str | None = None, template_source_handler: str | None = None, template_persist_handler: str | None = None, align_connectors: bool | None = None)
add_column_description(column_name: str, description: str, save: bool | None = None)

adds a description note that is included in the ‘report_column_catalog’ report

add_connector_contract(connector_name: str, connector_contract: ConnectorContract, template_aligned: bool | None = None, save: bool | None = None)

Sets a named connector contract

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • connector_contract – a Connector Contract for the properties persistence

  • template_aligned – the connector aligns with the template so changes to the template

  • save – override of the default save action set at initialisation.

Returns:

if load is True, returns a Pandas.DataFrame else None

add_connector_from_template(connector_name: str, uri_file: str, template_name: str, save: bool | None = None, **kwargs)

Adds a connector using settings from a template connector. By default, self.TEMPLATE_SOURCE and self.TEMPLATE_PERSIST are added at initialisation.

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • uri_file – the name of the file to append to the end of the default path

  • template_name – the name of the template connector

  • save – override of the default save action set at initialisation.

  • kwargs – any kwargs to add to the default connector

Returns:

add_connector_persist(connector_name: str, uri_file: str, save: bool | None = None, **kwargs)

Adds a connector using settings from the self.TEMPLATE_PERSIST template connector. self.TEMPLATE_PERSIST is added at initialisation.

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • uri_file – the name of the file to append to the end of the default path

  • save – override of the default save action set at initialisation.

  • kwargs – any kwargs to add to the default connector

Returns:

add_connector_source(connector_name: str, uri_file: str, save: bool | None = None, **kwargs)

Adds a connector using settings from the self.TEMPLATE_SOURCE template connector.

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • uri_file – the name of the file to append to the end of the default path

  • save – override of the default save action set at initialisation.

  • kwargs – any kwargs to add to the default connector

Returns:

add_connector_uri(connector_name: str, uri: str, save: bool | None = None, template_aligned: bool | None = None, **kwargs)

Sets the contract giving the full uri path. This is a shortcut of set_source_contract(…), not requiring a ConnectorContract to be set up and using the default module and handler values.

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • uri – a fully qualified uri of the source data

  • template_aligned – the connector aligns with the template so changes to the template

  • save – (optional) if True, save to file. Default is True

add_intent_level_description(level: [<class 'int'>, <class 'str'>], text: str, save: bool | None = None)

sets a description for an intent level in the augmented knowledge ‘intent’

Parameters:
  • level – the intent level to add the comment to

  • text – the description text

  • save – (optional) override of the default save action set at initialisation.

add_notes(catalog: str, label: [<class 'str'>, <class 'list'>], text: str, constraints: list | None = None, save=None)
adds a note to the augmented knowledge.

If no label is given, a journal date of ‘year-month’ is used. If no catalog is given, the default catalog name is used.

Parameters:
  • catalog – a catalog name

  • label – a sub key label or list of labels to separate different information strands

  • text – the text to add

  • constraints – (optional) a list of allowed label values, if None then any value allowed

  • save – if True, save to file. Default is True
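
The labelling rule above can be sketched with a plain nested dictionary standing in for the augmented knowledge store. The `add_note` helper, the `'%Y-%m'` journal format and the choice to raise on a disallowed label are assumptions made for illustration, not the library's implementation.

```python
from datetime import datetime
from typing import Optional


def add_note(store: dict, catalog: str, text: str, label: Optional[str] = None,
             constraints: Optional[list] = None, now: Optional[datetime] = None) -> str:
    """Add `text` under store[catalog][label] and return the label that was used."""
    now = now or datetime.now()
    # No label given: fall back to a 'year-month' journal label (format assumed).
    if label is None:
        label = now.strftime('%Y-%m')
    # Constraints, when given, restrict the allowed labels (raising is assumed).
    if constraints is not None and label not in constraints:
        raise ValueError(f"label '{label}' is not in the allowed list {constraints}")
    store.setdefault(catalog, {}).setdefault(label, []).append(text)
    return label


notes = {}
add_note(notes, 'observations', 'source refreshed', now=datetime(2024, 5, 17))
add_note(notes, 'attributes', 'age is in years', label='age', constraints=['age', 'gender'])
print(notes)
```

The catalog and label names here are placeholders; the real method also persists the change according to the save setting.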

add_run_book(run_levels: [<class 'str'>, <class 'list'>], book_name: str | None = None, save: bool | None = None)

sets a named run book, the run levels are a list of levels and the order they are run in

Parameters:
  • run_levels – the name or list of levels to be run

  • book_name – (optional) the name of the run_book. defaults to ‘primary_run_book’

  • save – (optional) override of the default save action set at initialisation.

add_run_book_level(run_level: str, book_name: str | None = None, save: bool | None = None)

adds a single runlevel to the end of a run_book. If the name already exists it will be replaced

Parameters:
  • run_level – the run_level to add.

  • book_name – (optional) the name of the run_book. defaults to ‘primary_run_book’

  • save – (optional) override of the default save action set at initialisation.
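
A minimal sketch of the run book behaviour described above, using a dictionary of ordered lists as a stand-in for the stored run books. It assumes that "replaced" means the earlier entry is dropped and the level is appended again at the end; the level names are hypothetical.

```python
from typing import Dict, List, Optional

PRIMARY_RUN_BOOK = 'primary_run_book'  # default name taken from the parameter description


def add_run_book_level(books: Dict[str, List[str]], run_level: str,
                       book_name: Optional[str] = None) -> List[str]:
    """Append run_level to the named run book, dropping any earlier entry of that name."""
    book = books.setdefault(book_name or PRIMARY_RUN_BOOK, [])
    if run_level in book:
        book.remove(run_level)  # assumed reading of "replaced"
    book.append(run_level)
    return book


books = {PRIMARY_RUN_BOOK: ['cleaning', 'features', 'encode']}
add_run_book_level(books, 'validate')   # new level goes to the end
add_run_book_level(books, 'features')   # existing level is moved to the end
print(books[PRIMARY_RUN_BOOK])
```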

backup_canonical(connector_name: str, canonical: Any, uri: str, **kwargs)

persists the canonical to the referenced connector as a backup using the URI to replace the current Connector Contract URI.

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • canonical – the canonical data to persist

  • uri – an alternative uri to the one in the ConnectorContract

  • kwargs – arguments to be passed to the handler on persist

static canonical_report(canonical, stylise: bool = True, inc_next_dom: bool = False, report_header: str | None = None, condition: str | None = None)

The Canonical Report is a data dictionary of the canonical providing a reference view of the dataset’s attribute properties

Parameters:
  • canonical – the DataFrame to view

  • stylise – if True present the report stylised.

  • inc_next_dom – (optional) whether to include the next dominant element column

  • report_header – (optional) filter on a header where the condition is true. Condition must exist

  • condition – (optional) the condition to apply to the header. Header must exist. examples: ‘ > 0.95’, “.str.contains(‘shed’)”

Returns:

create_snapshot(suffix: str | None = None, version: str | None = None, save: bool | None = None)

creates a snapshot of contracts configuration. The name format will be <contract_name>_#<suffix>.

Parameters:
  • suffix – (optional) adds the suffix to the end of the contract name. if None then date & time used

  • version – (optional) changes the version number of the current contract

  • save – override of the default save action set at initialisation.

Returns:

a list of current contract snapshots
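
The naming rule `<contract_name>_#<suffix>` can be illustrated as follows. This is a sketch only: the helper name and the exact date and time layout used when no suffix is given are assumptions.

```python
from datetime import datetime
from typing import Optional


def snapshot_name(contract_name: str, suffix: Optional[str] = None,
                  now: Optional[datetime] = None) -> str:
    """Build '<contract_name>_#<suffix>', using a date-time suffix when none is given."""
    if suffix is None:
        # The date-time layout is an assumption made for this sketch.
        suffix = (now or datetime.now()).strftime('%Y%m%d%H%M%S')
    return f"{contract_name}_#{suffix}"


print(snapshot_name('wrangle_demo', suffix='baseline'))
print(snapshot_name('wrangle_demo', now=datetime(2024, 5, 17, 9, 30, 0)))
```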

delete_snapshot(snapshot_name: str, save: bool | None = None)

deletes a snapshot

Parameters:
  • snapshot_name – the name of the snapshot

  • save – override of the default save action set at initialisation.

Returns:

True if successful, False if not found or not deleted

property discover: DataDiscovery

The components instance

classmethod discovery_pad() DataDiscovery

A class method to use the Components discovery methods as a scratch pad

classmethod from_env(task_name: str, default_save=None, reset_templates: bool | None = None, align_connectors: bool | None = None, default_save_intent: bool | None = None, default_intent_level: bool | None = None, order_next_available: bool | None = None, default_replace_intent: bool | None = None, uri_pm_repo: str | None = None, has_contract: bool | None = None, **kwargs)

Class Factory Method that builds the connector handlers taking the property contract path from the os.environ['HADRON_PM_PATH'] or, if not found, uses the system default,

  • for Linux and IOS: ‘/tmp/components/contracts’

  • for Windows os.environ['AppData']\components\contracts

The following environment variables can be set:
  • HADRON_PM_PATH: the property contract path, if not found, uses the system default

  • HADRON_PM_REPO: the property contract should be initially loaded from a read only repo site such as github

  • HADRON_PM_TYPE: a file type for the property manager. If not found sets as json

  • HADRON_PM_MODULE: a default module package, if not set uses component default

  • HADRON_PM_HANDLER: a default handler. if not set uses component default

This method calls to the Factory Method from_uri(...) returning the initialised class instance

Parameters:
  • task_name – The reference name that uniquely identifies a task or subset of the property manager

  • default_save – (optional) if the configuration should be persisted

  • reset_templates – (optional) reset connector templates from environ variables. Default True

  • align_connectors – (optional) resets aligned connectors to the template. Default True

  • default_save_intent – (optional) The default action for saving intent in the property manager

  • default_intent_level – (optional) the default level intent should be saved at

  • order_next_available – (optional) if the default behaviour for the order should be next available order

  • default_replace_intent – (optional) the default replace existing intent behaviour

  • uri_pm_repo – The read only repo link that points to the raw data path to the contracts repo directory

  • has_contract – (optional) indicates the instance should have a property manager domain contract

  • kwargs – to pass to the property ConnectorContract as its kwargs

Returns:

the initialised class instance
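
The environment lookup described for from_env can be sketched with the standard library alone. The function name `resolve_pm_settings` is hypothetical, and the code only mirrors the documented fallbacks for HADRON_PM_PATH and HADRON_PM_TYPE; it is not the library's own resolution logic.

```python
import ntpath
import os
import platform
from typing import Mapping, Optional


def resolve_pm_settings(environ: Optional[Mapping[str, str]] = None,
                        system: Optional[str] = None) -> dict:
    """Return the property manager path and file type implied by the environment."""
    environ = os.environ if environ is None else environ
    system = platform.system() if system is None else system
    if 'HADRON_PM_PATH' in environ:
        pm_path = environ['HADRON_PM_PATH']
    elif system == 'Windows':
        # Documented Windows default: os.environ['AppData']\components\contracts
        pm_path = ntpath.join(environ['AppData'], 'components', 'contracts')
    else:
        pm_path = '/tmp/components/contracts'
    # HADRON_PM_TYPE falls back to json when it is not set.
    return {'path': pm_path, 'file_type': environ.get('HADRON_PM_TYPE', 'json')}


print(resolve_pm_settings({'HADRON_PM_PATH': 's3://bucket/contracts'}, system='Linux'))
print(resolve_pm_settings({}, system='Linux'))
```

Passing the environment as a mapping keeps the sketch testable without changing os.environ; the bucket path is a placeholder.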

classmethod from_memory(has_contract: bool | None = None, default_save_intent: bool | None = None, default_intent_level: bool | None = None, order_next_available: bool | None = None, default_replace_intent: bool | None = None, **kwargs)

Class Factory Method that creates a light touch in memory instance that leaves no residue when closed. This factory method can load a reference contract from a remote repo as a foundation.

Parameters:
  • default_save_intent – (optional) The default action for saving intent in the property manager

  • default_intent_level – (optional) the default level intent should be saved at

  • order_next_available – (optional) if the default behaviour for the order should be next available order

  • default_replace_intent – (optional) the default replace existing intent behaviour

  • has_contract – (optional) indicates the instance should have a property manager domain contract

  • kwargs – to pass to the property ConnectorContract as its kwargs

Returns:

the initialised class instance

classmethod from_uri(task_name: str, uri_pm_path: str, creator: str, uri_pm_repo: str | None = None, pm_file_type: str | None = None, pm_module: str | None = None, pm_handler: str | None = None, pm_kwargs: dict | None = None, default_save=None, reset_templates: bool | None = None, template_path: str | None = None, template_module: str | None = None, template_source_handler: str | None = None, template_persist_handler: str | None = None, align_connectors: bool | None = None, default_save_intent: bool | None = None, default_intent_level: bool | None = None, order_next_available: bool | None = None, default_replace_intent: bool | None = None, has_contract: bool | None = None) Wrangle

Class Factory Method to instantiate the components application. The Factory Method handles the instantiation of the Properties Manager, the Intent Model and the persistence of the uploaded properties. See class inline docs for an example method

Parameters:
  • task_name – The reference name that uniquely identifies a task or subset of the property manager

  • uri_pm_path – A URI that identifies the resource path for the property manager.

  • creator – A user name for this task activity.

  • uri_pm_repo – (optional) A repository URI to initially load the property manager but not save to.

  • pm_file_type – (optional) defines a specific file type for the property manager

  • pm_module – (optional) the module or package name where the handler can be found

  • pm_handler – (optional) the handler for retrieving the resource

  • pm_kwargs – (optional) a dictionary of kwargs to pass to the property manager

  • default_save – (optional) if the configuration should be persisted. default to ‘True’

  • reset_templates – (optional) reset connector templates from environ variables. Default True (see report_environ())

  • template_path – (optional) a template path to use if the environment variable does not exist

  • template_module – (optional) a template module to use if the environment variable does not exist

  • template_source_handler – (optional) a template source handler to use if no environment variable

  • template_persist_handler – (optional) a template persist handler to use if no environment variable

  • align_connectors – (optional) resets aligned connectors to the template. Default True

  • default_save_intent – (optional) The default action for saving intent in the property manager

  • default_intent_level – (optional) the default level intent should be saved at

  • order_next_available – (optional) if the default behaviour for the order should be next available order

  • default_replace_intent – (optional) the default replace existing intent behaviour

  • has_contract – (optional) indicates the instance should have a property manager domain contract

Returns:

the initialised class instance

get_persist_contract() ConnectorContract

gets the persist connector contract that can be used as the next chain source. If the uri contains environment variables it is NOT parsed at load

get_persist_uri() str

gets the persist connector contract uri that can be used as the next chain source. If the uri contains environment variables it is parsed

property intent_model: WrangleIntentModel

The intent model instance

load_canonical(connector_name: str, reset_changed: bool | None = None, has_changed: bool | None = None, return_empty: bool | None = None, **kwargs) DataFrame

returns the canonical of the referenced connector

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • reset_changed – (optional) resets the has_changed boolean to True

  • has_changed – (optional) tests if the underlying canonical has changed since last load else error returned

  • return_empty – (optional) if has_changed is set, returns an empty canonical if set to True

  • kwargs – arguments to be passed to the handler on load

load_persist_canonical(reset_changed: bool | None = None, has_changed: bool | None = None, return_empty: bool | None = None, **kwargs) DataFrame

loads the clean pandas.DataFrame from the clean folder for this contract

Parameters:
  • reset_changed – (optional) resets the has_changed boolean to True

  • has_changed – (optional) tests if the underlying canonical has changed since last load else error returned

  • return_empty – (optional) if has_changed is set, returns an empty canonical if set to True

  • kwargs – arguments to be passed to the handler on load

load_source_canonical(reset_changed: bool | None = None, has_changed: bool | None = None, return_empty: bool | None = None, **kwargs) DataFrame

returns the contracted source data as a DataFrame

Parameters:
  • reset_changed – (optional) resets the has_changed boolean to True

  • has_changed – (optional) tests if the underlying canonical has changed since last load else error returned

  • return_empty – (optional) if has_changed is set, returns an empty canonical if set to True

  • kwargs – arguments to be passed to the handler on load

property notes_catalog: list

returns the list of allowed catalog names

persist_canonical(connector_name: str, canonical: Any, **kwargs)

persists the canonical to the referenced connector. same as save_canonical

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • canonical – the canonical data to persist

  • kwargs – arguments to be passed to the handler on persist

property pm: WranglePropertyManager

The properties manager instance

property pm_name: str

The contract name of this transition instance

pm_persist(save=None)

Saves the current configuration to file

pm_reset(save: bool | None = None)

resets the contract back to a default. This does not remove the Property Manager Connector Contract or any snapshots

Parameters:

save – override of the default save action set at initialisation.

pm_transfer(transfer_connector: [<class 'str'>, <class 'aistac.handlers.abstract_handlers.ConnectorContract'>])

Takes a copy of the pm contract and saves it to a new location defined by the connector contract. This can be used to publish a property manager to a new location, change its format or as a backup

Parameters:

transfer_connector – the name of an existing connector contract or a ConnectorContract

recover_snapshot(snapshot_name: str, overwrite: bool | None = None, save: bool | None = None) bool

recovers a snapshot back to the current contract. The snapshot must be from this root contract. By default the original root contract is overwritten unless overwrite is set to False. If overwrite is False, a timestamped snapshot is created.

Parameters:
  • snapshot_name – the name of the snapshot (use self.contract_snapshots to get list of names)

  • overwrite – (optional) if the original contract should be overwritten. Default to True

  • save – override of the default save action set at initialisation.

Returns:

True if the contract was recovered, else False

remove_canonical(connector_name: str, **kwargs)

removes the current persisted canonical.

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • kwargs – arguments to be passed to the handler on remove

remove_connector_contract(connector_name: str, save: bool | None = None)

removes a named connector contract

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • save – override of the default save action set at initialisation.

remove_intent(intent_param: [<class 'str'>, <class 'dict'>] = None, level: [<class 'int'>, <class 'str'>] = None, save: bool = None)
removes part or all the intent contract.
  • If no params all intent is removed

  • if only intent then all references in all params of that named intent will be removed

  • if only level then that level is removed

  • if both level and intent then that specific intent on that level is removed

Parameters:
  • intent_param – (optional) removes the method contract

  • level – (optional) removes the level contract

  • save – (optional) override of the default save action set at initialisation.

Returns:

True if removed, False if not
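
The four removal cases listed above can be sketched over a `{level: {intent_name: params}}` dictionary. That layout, the simplified string-only arguments and the method names `method_a` and `method_b` are assumptions for illustration; the real property manager may store intent differently.

```python
from typing import Dict, Optional


def remove_intent(intent: Dict[str, Dict[str, dict]], intent_param: Optional[str] = None,
                  level: Optional[str] = None) -> bool:
    """Remove entries from a {level: {intent_name: params}} mapping; True if anything went."""
    if intent_param is None and level is None:
        removed = bool(intent)
        intent.clear()                      # no params: all intent is removed
        return removed
    if level is None:
        # only intent: remove that named intent from every level
        hits = [lv for lv, methods in intent.items() if intent_param in methods]
        for lv in hits:
            del intent[lv][intent_param]
        return bool(hits)
    if level not in intent:
        return False
    if intent_param is None:
        del intent[level]                   # only level: the whole level is removed
        return True
    # both: remove that specific intent on that level
    return intent[level].pop(intent_param, None) is not None


intent = {
    'clean': {'method_a': {'headers': ['age']}, 'method_b': {}},
    'features': {'method_a': {'headers': ['salary']}},
}
remove_intent(intent, intent_param='method_a')
print(intent)
```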

remove_notes(catalog: str, label: str | None = None, save=None)

removes all entries for a labelled note

Parameters:
  • catalog – the type of note to delete, if left empty all notes removed

  • label – (Optional) the name of the label to be removed

  • save – (Optional) if True, save to file. Default is True

Returns:

True if successful, False if not

remove_run_book(book_name: str | None = None, save: bool | None = None) bool

removes named run book. If no runbook is given then all run books are removed

Parameters:
  • book_name – (optional) the name of the run_book. defaults to ‘primary_run_book’

  • save – (optional) override of the default save action set at initialisation.

Returns:

True if removed, False if not

static report2dict(report: str, file_type: str = None, versioned: bool = None, stamped: str = None, prefix: str = None, suffix: str = None, path: [<class 'str'>, <class 'list'>] = None) dict

a utility method to help build analytics conditions by aligning method parameters with dictionary format.

Parameters:
  • report – The name of the report

  • file_type – (optional) an alternative file extension to the default ‘json’ format

  • versioned – (optional) if the component version should be included as part of the pattern

  • stamped – (optional) A string of the timestamp options [‘days’, ‘hours’, ‘minutes’, ‘seconds’, ‘ns’]

  • prefix – (optional) a prefix to put at the front of the file pattern to replace the default

  • suffix – (optional) a suffix to put at the end of the file pattern and extension

  • path – (optional) a file path that precedes the prefix and file pattern. uses os.path.join so takes a list

Returns:

a dictionary for an individual element
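
A rough illustration of aligning method parameters with dictionary format. The stand-in function `report_as_dict` is hypothetical, and dropping unset values from the result is an assumption of this sketch rather than documented behaviour.

```python
from typing import Optional, Union


def report_as_dict(report: str, file_type: Optional[str] = None,
                   versioned: Optional[bool] = None, stamped: Optional[str] = None,
                   prefix: Optional[str] = None, suffix: Optional[str] = None,
                   path: Union[str, list, None] = None) -> dict:
    """Collect the arguments that were actually set into a single report dictionary."""
    params = dict(file_type=file_type, versioned=versioned, stamped=stamped,
                  prefix=prefix, suffix=suffix, path=path)
    result = {'report': report}
    # Dropping unset (None) values is an assumption of this sketch.
    result.update({key: value for key, value in params.items() if value is not None})
    return result


print(report_as_dict('schema', file_type='csv', versioned=True, stamped='days'))
```

The keys match those listed for dictionary reports, so the result has the same shape as the dictionary examples given for save_report_canonical and set_report_persist.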

report_canonical_schema(schema: [<class 'str'>, <class 'dict'>] = None, roots: [<class 'str'>, <class 'list'>] = None, sections: [<class 'str'>, <class 'list'>] = None, elements: [<class 'str'>, <class 'list'>] = None, stylise: bool = True)

presents the current canonical schema

Parameters:
  • schema – (optional) the name of the schema

  • roots – (optional) one or more tree roots

  • sections – (optional) the section under the root

  • elements – (optional) the element in the section

  • stylise – if True present the report stylised.

Returns:

pd.DataFrame

report_column_catalog(column_name: [str, list] = None, stylise: bool = True)

generates a report on the column catalog

Parameters:
  • column_name – (optional) filters on specific column names.

  • stylise – (optional) returns a stylised DataFrame with formatting

Returns:

pd.DataFrame

report_connectors(connector_filter: [<class 'str'>, <class 'list'>] = None, inc_pm: bool = None, inc_template: bool = None, stylise: bool = True)

generates a report on the connector contracts

Parameters:
  • connector_filter – (optional) filters on the connector name.

  • inc_pm – (optional) include the property manager connector

  • inc_template – (optional) include the template connectors

  • stylise – (optional) returns a stylised DataFrame with formatting

Returns:

pd.DataFrame

report_environ(hide_not_set: bool = True, stylise: bool = True)

generates a report on the environment variables

Parameters:
  • hide_not_set – hide environ keys that are not set.

  • stylise – returns a stylised dataframe with formatting

Returns:

pd.DataFrame

report_intent(levels: [<class 'str'>, <class 'int'>, <class 'list'>] = None, stylise: bool = True)

generates a report on all the intent

Parameters:
  • levels – (optional) a filter on the levels. passing a single value will report a single parameterised view

  • stylise – (optional) returns a stylised dataframe with formatting

Returns:

pd.DataFrame

report_notes(catalog: [<class 'str'>, <class 'list'>] = None, labels: [<class 'str'>, <class 'list'>] = None, regex: [<class 'str'>, <class 'list'>] = None, re_ignore_case: bool = False, stylise: bool = True, drop_dates: bool = False)

generates a report on the notes

Parameters:
  • catalog – (optional) the catalog to filter on

  • labels – (optional) a label or list of labels to filter on

  • regex – (optional) a regular expression on the notes

  • re_ignore_case – (optional) if the regular expression should ignore case

  • stylise – (optional) returns a stylised dataframe with formatting

  • drop_dates – (optional) excludes the ‘date’ column from the report

Returns:

pd.DataFrame

report_run_book(stylise: bool = True)

generates a report on the run books

Parameters:

stylise – returns a stylised dataframe with formatting

Returns:

pd.DataFrame

report_task(stylise: bool = True)

generates a report on the component task

Parameters:

stylise – (optional) returns a stylised DataFrame with formatting

Returns:

pd.DataFrame

reset_template_connectors(save: bool | None = None)

resets connector contracts with template path and handler where they are template aligned. (see set_connector_aligned)

Parameters:

save – override of the default save action set at initialisation.

run_component_pipeline(intent_levels: [str, int, list] = None, run_book: str = None, use_default: bool = None, seed: int = None, reset_changed: bool = None, has_changed: bool = None, **kwargs)

Runs the components pipeline from source to persist.

Parameters:
  • intent_levels – a single or list of intent levels to run

  • run_book – a saved runbook to run

  • use_default – if the default runbook should be used if it exists

  • seed – a seed value for this run

  • reset_changed – (optional) resets the has_changed boolean to True

  • has_changed – (optional) tests if the underlying canonical has changed since last load else error returned

  • kwargs – any additional kwargs
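
A possible end-to-end call sequence, using only the methods and signatures listed in this reference. It assumes the ds_discovery package is installed and that the template environment variables point at valid source and persist locations; the helper name `run_wrangle_task` and its arguments are hypothetical.

```python
def run_wrangle_task(task_name: str, source_file: str, persist_file: str) -> None:
    """Configure a Wrangle task from the environment and run it from source to persist."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from ds_discovery.components.wrangling import Wrangle

    wr = Wrangle.from_env(task_name, has_contract=False)
    wr.set_source(uri_file=source_file)     # appended to the TEMPLATE_SOURCE path
    wr.set_persist(uri_file=persist_file)   # appended to the TEMPLATE_PERSIST path
    wr.run_component_pipeline()             # runs the pipeline from source to persist
```

With no intent_levels or run_book given, the run relies on whatever default behaviour the component applies; pass them explicitly to control which levels run.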

save_canonical(connector_name: str, canonical: Any, **kwargs)

saves the canonical to the referenced connector. Same as persist_canonical

Parameters:
  • connector_name – the name or label to identify and reference the connector

  • canonical – the canonical data to persist

  • kwargs – arguments to be passed to the handler on persist

save_canonical_schema(schema_name: str | None = None, canonical: DataFrame | None = None, schema_tree: list | None = None, exclude_associate: list | None = None, detail_numeric: bool | None = None, strict_typing: bool | None = None, category_limit: int | None = None, save: bool | None = None)

Saves the canonical schema to the Property contract. The default loads the clean canonical but optionally a canonical can be passed to base the schema on and optionally a name given other than the default

Parameters:
  • schema_name – (optional) the name of the schema to save

  • canonical – (optional) the canonical to base the schema on

  • schema_tree – (optional) an analytics dict (see Discovery.analyse_association(…)

  • exclude_associate – (optional) a list of dot notation tree of items to exclude from iteration (e.g. [‘age.gender.salary’] will cut ‘salary’ branch from gender and all sub branches)

  • detail_numeric – (optional) if numeric columns should have detail stats, slowing analysis. default False

  • strict_typing – (optional) stops objects and string types being seen as categories. default True

  • category_limit – (optional) a global cap on categories captured. default is 10

  • save – (optional) if True, save to file. Default is True
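
The dot notation used by exclude_associate can be pictured as pruning a nested tree. In this sketch the tree is a plain nested dictionary and `prune_tree` is a hypothetical helper; the library's schema tree may be structured differently.

```python
from typing import List


def prune_tree(tree: dict, exclude: List[str]) -> dict:
    """Remove the branch named by each dot-notation path, along with its sub-branches."""
    for dotted in exclude:
        *parents, leaf = dotted.split('.')
        node = tree
        for name in parents:
            node = node.get(name)
            if not isinstance(node, dict):
                break                      # path does not exist: nothing to cut
        else:
            node.pop(leaf, None)           # cut the leaf and everything beneath it
    return tree


tree = {'age': {'gender': {'salary': {'bonus': {}}, 'city': {}}}}
prune_tree(tree, ['age.gender.salary'])
print(tree)
```

Here 'salary' and its sub-branch 'bonus' are removed from under 'gender', while the sibling 'city' branch is kept.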

save_persist_canonical(canonical, auto_connectors: bool | None = None, **kwargs)

Saves the canonical to the clean files folder, auto creating the connector from template if not set

save_report_canonical(reports: [<class 'str'>, <class 'list'>], report_canonical: [<class 'dict'>, <class 'pandas.core.frame.DataFrame'>], replace_connectors: bool | None = None, auto_connectors: bool | None = None, save: bool | None = None, **kwargs)

saves one or a list of reports using the TEMPLATE_PERSIST connector contract. Though a report can be of any name, for convention and consistency each component has a set of REPORT constants <Component>.REPORT_<NAME> where <Component> is the component Class name and <name> is the name of the report_canonical.

The reports can be a simple string name or a list of names. Each item in the list can be a string or a dictionary providing more detailed parameters on how to represent the report. The parameter keys are:

  • key report: the name of the report

  • key file_type: (optional) a file type other than the default .json

  • key versioned: (optional) if the filename should be versioned

  • key stamped: (optional) A string of the timestamp options [‘days’, ‘hours’, ‘minutes’, ‘seconds’, ‘ns’]

Some examples

self.REPORT_SCHEMA
[self.REPORT_NOTES, self.REPORT_SCHEMA]
[self.REPORT_NOTES, {'report': self.REPORT_SCHEMA, 'uri_file': '<file_name>'}]
[{'report': self.REPORT_NOTES, 'file_type': 'json'}]
[{'report': self.REPORT_SCHEMA, 'file_type': 'csv', 'versioned': True, 'stamped': 'days'}]

Parameters:
  • reports – a report name or list of report names to save

  • report_canonical – a relating canonical to base the report on

  • auto_connectors – (optional) if a connector should be created automatically

  • replace_connectors – (optional) replace any existing report connectors with these reports

  • save – (optional) if True, save to file. Default is True

  • kwargs – additional kwargs to pass to a Connector Contract

classmethod scratch_pad() WrangleIntentModel

A class method to use the Components intent methods as a scratch pad

set_connector_aligned(connector_names: [<class 'str'>, <class 'list'>], aligned: bool, save: bool | None = None)

sets whether the named connector contracts are aligned to the template connector contract

Parameters:
  • connector_names – a name or list of names of connector contract to modify

  • aligned – if the connector contract is aligned to the template connector contract

  • save – override of the default save action set at initialisation.

set_connector_version(connector_names: [<class 'str'>, <class 'list'>], version: str, save: bool | None = None)

modifies the uri of a connector contract and resets

Parameters:
  • connector_names – a name or list of names of connector contract to modify

  • version – the new version number

  • save – override of the default save action set at initialisation.

set_description(description: str, save=None)

sets the description of this component task

Parameters:
  • description – a brief description of this component task

  • save – override of the default save action set at initialisation.

set_persist(uri_file: str | None = None, save: bool | None = None, **kwargs)

sets the persist contract CONNECTOR_PERSIST using the TEMPLATE_PERSIST connector contract

Parameters:
  • uri_file – (optional) the uri_file is appended to the template path

  • save – (optional) if True, save to file. Default is True

set_persist_contract(connector_contract: ConnectorContract, save: bool | None = None)

Sets the persist contract.

Parameters:
  • connector_contract – a Connector Contract for the persisted data

  • save – (optional) if True, save to file. Default is True

set_persist_uri(uri: str, save: bool | None = None, template_aligned: bool | None = None, **kwargs)

Sets the persist contract giving the full uri path. This is a shortcut of set_persist_contract(…), not requiring a ConnectorContract to be set up and using the default module and handler values.

Parameters:
  • uri – a fully qualified uri of the persist data

  • template_aligned – the connector aligns with the template so changes to the template

  • save – (optional) if True, save to file. Default is True

set_report_persist(reports: [<class 'str'>, <class 'list'>], project: str = None, path: [<class 'str'>, <class 'list'>] = None, prefix: str = None, suffix: str = None, file_type: str = None, versioned: bool = None, stamped: str = None, save: bool = None, **kwargs) list

sets the report persist using the TEMPLATE_PERSIST connector contract. There are preset constants that should be used. These constants can be in the form <class>.REPORT_<NAME> or <instance>.REPORT_<NAME>, where <NAME> is the name of the report and can be found in this class. Examples of reports might be:

Transition.REPORT_SCHEMA
[self.REPORT_NOTES, self.REPORT_SCHEMA]
[builder.REPORT_NOTES, {'report': builder.REPORT_SCHEMA, 'uri_file': '<file_name>'}]
[{'report': Wrangle.REPORT_NOTES, 'file_type': 'json'}]
[{'report': self.REPORT_SCHEMA, 'file_type': 'csv', 'versioned': True, 'stamped': 'days'}]

If a report is presented as a dict, the method signature parameters will be overwritten by the report values. This allows global parameters to apply generally while single reports can be modified at a granular level.

To ensure dict reports have the correct keys, the util method ‘report2dict(…)’ can be used.

Parameters:
  • reports – (optional) the name(s) of the report connector to set (see class REPORT_* constants)

  • project – (optional) an alternative project string that replaces ‘hadron’

  • path – (optional) a file path that precedes the prefix and file pattern. uses os.path.join so takes a list

  • prefix – (optional) a prefix to put at the front of the file pattern to replace the default

  • suffix – (optional) a suffix to put at the end of the file pattern and extension

  • file_type – (optional) a global file extension to the default ‘json’ format

  • versioned – (optional) if all reports should include a version

  • stamped – (optional) A string of the timestamp options [‘days’, ‘hours’, ‘minutes’, ‘seconds’, ‘ns’]

  • save – (optional) if True, save to file. Default is True

  • kwargs – (optional) additional parameters to send as kwargs for the Connector Contract

Returns:

a list of connector names created from the reports
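
The override rule, where values in a dictionary report take precedence over the method's global parameters, can be sketched as a simple merge. The helper `merge_report_params` and the report names 'notes' and 'schema' are placeholders, not the values of the REPORT_* constants.

```python
from typing import List, Union


def merge_report_params(reports: Union[str, dict, List[Union[str, dict]]],
                        **global_params) -> List[dict]:
    """Expand each report into a dict, letting per-report values override the global ones."""
    if isinstance(reports, (str, dict)):
        reports = [reports]
    merged = []
    for item in reports:
        specific = {'report': item} if isinstance(item, str) else dict(item)
        # Later keys win, so the report's own values replace the global ones.
        merged.append({**global_params, **specific})
    return merged


settings = merge_report_params(
    ['notes', {'report': 'schema', 'file_type': 'csv', 'stamped': 'days'}],
    file_type='json', versioned=True)
print(settings)
```

The 'notes' report keeps the global 'json' file type, while the 'schema' report switches to 'csv' and adds a day stamp.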

set_source(uri_file: str, save: bool | None = None, **kwargs)

sets the source contract CONNECTOR_SOURCE using the TEMPLATE_SOURCE connector contract.

Parameters:
  • uri_file – the uri_file is appended to the template path

  • save – (optional) if True, save to file. Default is True

set_source_contract(connector_contract: ConnectorContract, template_aligned: bool | None = None, save: bool | None = None)

Sets the source contract using the class CONNECTOR_SOURCE constant

Parameters:
  • connector_contract – a Connector Contract for the source data

  • template_aligned – the connector aligns with the template so changes to the template

  • save – (optional) if True, save to file. Default is True

set_source_uri(uri: str, save: bool | None = None, template_aligned: bool | None = None, **kwargs)

Sets the source contract giving the full uri path. This is a shortcut of set_source_contract(…), not requiring a ConnectorContract to be set up and using the default module and handler values.

Parameters:
  • uri – a fully qualified uri of the source data

  • template_aligned – the connector aligns with the template so changes to the template

  • save – (optional) if True, save to file. Default is True

set_status(status: str, save=None)

sets the status of this component task. Suggested status might be ‘discovery’, ‘stable’, ‘production’

Parameters:
  • status – the status to be set

  • save – override of the default save action set at initialisation.

set_version(version: str, save=None)

sets the version

Parameters:
  • version – the version to be set

  • save – override of the default save action set at initialisation.

setup_bootstrap(domain: str | None = None, project_name: str | None = None, path: str | None = None, file_type: str | None = None, description: str | None = None)

Creates a bootstrap Transition setup. Note this does not set the source

Parameters:
  • domain – (optional) The domain this component sits within e.g. ‘Healthcare’ or ‘Financial Services’

  • project_name – (optional) a project name that will replace the hadron naming on file prefix

  • path – (optional) a path added to the template path default

  • file_type – (optional) a file_type for the persisted file, default is ‘parquet’

  • description – (optional) a description of the component instance to overwrite the default

property tools: WrangleIntentModel

The intent model instance

upload_notes(canonical: dict, catalog: str, label_key: str, text_key: str, constraints: list | None = None, save=None)

Allows bulk upload of notes.

Parameters:
  • canonical – a dictionary where the key is the label and the value is the text

  • catalog – (optional) the section these notes should be put in

  • label_key – the dictionary key name for the labels

  • text_key – the dictionary key name for the text

  • constraints – (optional) the limited list of acceptable labels. If not in list then ignored

  • save – if True, save to file. Default is True

property visual: Visualisation

The visualisation instance

classmethod visual_pad() Visualisation

A class method to use the Components visualisation methods as a scratch pad