kedro.runner.ParallelRunner
class kedro.runner.ParallelRunner(max_workers=None, is_async=False)

ParallelRunner is an AbstractRunner implementation. It can be used to run the Pipeline in parallel groups formed by toposort. Please note that this runner implementation validates the catalog using the _validate_catalog method, which checks whether any of the datasets are single-process only via the _SINGLE_PROCESS dataset attribute.

Methods
create_default_data_set(ds_name) – Factory method for creating the default dataset for the runner.

run(pipeline, catalog, hook_manager[, …]) – Run the Pipeline using the datasets provided by catalog and save results back to the same objects.

run_only_missing(pipeline, catalog, hook_manager) – Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.
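The "parallel groups formed by toposort" mentioned above refers to layering the pipeline's dependency graph: nodes whose dependencies are all satisfied form a group that can be executed concurrently. The following is a minimal standalone sketch of that grouping idea in plain Python, not kedro's actual implementation:

```python
def toposort_groups(dependencies):
    """Group nodes into waves that can run in parallel.

    ``dependencies`` maps each node name to the set of node names it
    depends on. Nodes within a group have no dependencies on each other,
    so a runner may execute each group with a pool of workers.
    """
    remaining = {node: set(deps) for node, deps in dependencies.items()}
    groups = []
    while remaining:
        # Nodes whose dependencies are all satisfied form the next group.
        ready = {node for node, deps in remaining.items() if not deps}
        if not ready:
            raise ValueError("Cyclic dependency detected")
        groups.append(sorted(ready))
        remaining = {
            node: deps - ready
            for node, deps in remaining.items()
            if node not in ready
        }
    return groups

# Example: split -> (train, baseline) -> report
deps = {
    "split": set(),
    "train": {"split"},
    "baseline": {"split"},
    "report": {"train", "baseline"},
}
print(toposort_groups(deps))  # [['split'], ['baseline', 'train'], ['report']]
```

Here "train" and "baseline" land in the same group because neither depends on the other, so a parallel runner can execute them in separate worker processes.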
__init__(max_workers=None, is_async=False)

Instantiates the runner by creating a Manager.

Parameters
    max_workers (Optional[int]) – Number of worker processes to spawn. If not set, it is calculated automatically based on the pipeline configuration and the CPU core count. On Windows machines, the max_workers value cannot be larger than 61 and will be set to min(61, max_workers).
    is_async (bool) – If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.

Raises
    ValueError – Raised when bad parameters are passed.
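The Windows-specific cap on max_workers can be sketched as follows. The helper name and the use of os.cpu_count() as the fallback are assumptions for illustration, not kedro's actual code:

```python
import os
import sys

def resolve_max_workers(max_workers=None):
    # Hypothetical helper illustrating the documented behaviour.
    if max_workers is not None and max_workers <= 0:
        raise ValueError("max_workers should be positive")
    if max_workers is None:
        # Assumption: fall back to the CPU core count when not set.
        max_workers = os.cpu_count() or 1
    if sys.platform == "win32":
        # Windows process pools cannot wait on more than 61 handles,
        # so the value is capped at min(61, max_workers).
        max_workers = min(61, max_workers)
    return max_workers
```

Passing a non-positive value triggers the ValueError documented above; on Windows, a request for, say, 100 workers would come back as 61.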
create_default_data_set(ds_name)

Factory method for creating the default dataset for the runner.

Parameters
    ds_name (str) – Name of the missing dataset.

Return type
    _SharedMemoryDataSet

Returns
    An instance of _SharedMemoryDataSet to be used for all unregistered datasets.
run(pipeline, catalog, hook_manager, session_id=None)

Run the Pipeline using the datasets provided by catalog and save results back to the same objects.

Parameters
    pipeline (Pipeline) – The Pipeline to run.
    catalog (DataCatalog) – The DataCatalog from which to fetch data.
    hook_manager (PluginManager) – The PluginManager to activate hooks.
    session_id (Optional[str]) – The id of the session.

Raises
    ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type
    Dict[str, Any]

Returns
    Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.
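The return value of run can be understood as the pipeline's "free" outputs: results for which no dataset is registered in the catalog, so they cannot be saved and are handed back to the caller instead. A rough sketch of that selection, with illustrative names rather than kedro internals:

```python
def free_outputs(pipeline_outputs, catalog_datasets, node_results):
    """Return only the node results that the catalog cannot persist.

    pipeline_outputs: all output dataset names produced by the pipeline
    catalog_datasets: dataset names registered in the DataCatalog
    node_results:     mapping of output name -> computed value
    """
    unregistered = set(pipeline_outputs) - set(catalog_datasets)
    return {name: node_results[name] for name in unregistered}

results = free_outputs(
    pipeline_outputs=["model", "metrics"],
    catalog_datasets=["model"],  # "metrics" has no catalog entry
    node_results={"model": "trained", "metrics": {"acc": 0.9}},
)
print(results)  # {'metrics': {'acc': 0.9}}
```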
run_only_missing(pipeline, catalog, hook_manager)

Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.

Parameters
    pipeline (Pipeline) – The Pipeline to run.
    catalog (DataCatalog) – The DataCatalog from which to fetch data.
    hook_manager (PluginManager) – The PluginManager to activate hooks.

Raises
    ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type
    Dict[str, Any]

Returns
    Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.
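run_only_missing can be thought of as filtering the pipeline down to nodes with at least one output that does not yet exist, then running that subset. The sketch below shows only that filtering step in simplified form; the real method also accounts for upstream nodes whose results are needed as inputs, and this is assumed logic, not kedro's implementation:

```python
def nodes_to_rerun(node_outputs, existing):
    """Select nodes with at least one missing output.

    node_outputs: mapping of node name -> list of its output dataset names
    existing:     set of dataset names that already exist in storage
    """
    return [
        node
        for node, outputs in node_outputs.items()
        if any(out not in existing for out in outputs)
    ]

print(nodes_to_rerun(
    {"preprocess": ["clean_data"], "train": ["model"]},
    existing={"clean_data"},
))  # ['train']
```

Here "preprocess" is skipped because its only output already exists, while "train" is re-run because "model" is missing.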