Versioning and Memoization (Experimental)

Dagster allows for code versioning and memoization of previous outputs based upon that versioning. Listed here are APIs related to versioning and memoization.

Versioning

class dagster.VersionStrategy

Abstract class for defining a strategy to version ops and resources.

When subclassing, get_op_version must be implemented, and get_resource_version can be optionally implemented.

get_op_version should take an OpVersionContext, and get_resource_version should take a ResourceVersionContext. From that context, each should synthesize a unique string called a version, which is used to tag the outputs of the corresponding op in the job. Providing a VersionStrategy instance to a job enables memoization on that job, so that only steps whose outputs do not have an up-to-date version will run.
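The contract above can be modeled in plain Python to show how a strategy and the "only run steps without an up-to-date version" rule fit together. This sketch does not import Dagster: `VersionStrategySketch`, `PinnedVersions`, and `steps_to_run` are hypothetical stand-ins, the methods take an op or resource name instead of a real context object, and the `stored` set plays the role of what an IO manager would report as already present. It also ignores how upstream versions feed into downstream ones.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional, Set, Tuple


class VersionStrategySketch(ABC):
    """Hypothetical stand-in that mirrors the shape of dagster.VersionStrategy."""

    @abstractmethod
    def get_op_version(self, op_name: str) -> str:
        """Required: return a version string for the given op."""

    def get_resource_version(self, resource_name: str) -> Optional[str]:
        """Optional: by default, no resource version is contributed."""
        return None


class PinnedVersions(VersionStrategySketch):
    """Hypothetical strategy that looks versions up in a hand-maintained mapping."""

    def __init__(self, versions: Dict[str, str]) -> None:
        self._versions = versions

    def get_op_version(self, op_name: str) -> str:
        return self._versions[op_name]


def steps_to_run(
    op_names: Iterable[str],
    strategy: VersionStrategySketch,
    stored: Set[Tuple[str, str]],
) -> List[str]:
    """Return the ops whose (name, current version) pair has no stored output yet."""
    return [
        name
        for name in op_names
        if (name, strategy.get_op_version(name)) not in stored
    ]
```

For example, with `PinnedVersions({"extract": "v1", "transform": "v2"})` and stored outputs `{("extract", "v1"), ("transform", "v1")}`, only `transform` would be selected, because its stored output was produced under an older version.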

class dagster.SourceHashVersionStrategy

VersionStrategy that checks for changes to the source code of ops and resources.

Only checks for changes within the immediate body of the op/resource’s decorated function (or compute function, if the op/resource was constructed directly from a definition).
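One way to approximate source-based versioning with the standard library is to hash the text returned by `inspect.getsource`. This is a sketch of the idea, not Dagster's implementation: `source_version` and the example functions are hypothetical, and SHA-256 is an arbitrary choice of hash. It also illustrates the "immediate body only" caveat: editing `_helper` does not change the source text of `scale`, so the version of `scale` would not change.

```python
import hashlib
import inspect
from typing import Callable


def source_version(fn: Callable) -> str:
    """Hash the source text of fn itself, not of anything it calls."""
    source = inspect.getsource(fn)
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def _helper(x: int) -> int:
    return x * 10


def scale(x: int) -> int:
    # Editing _helper does not alter this function's source text,
    # so source_version(scale) stays the same after such an edit.
    return _helper(x)


def scale_and_shift(x: int) -> int:
    return _helper(x) + 1
```

Because this sketch hashes raw text, even comment or whitespace edits inside a function body produce a new version.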

class dagster.OpVersionContext(op_def, op_config)

Provides execution-time information for computing the version for an op.

op_def

The definition of the op to compute a version for.

Type:

OpDefinition

op_config

The parsed config to be passed to the op during execution.

Type:

Any

class dagster.ResourceVersionContext(resource_def, resource_config)

Provides execution-time information for computing the version for a resource.

resource_def

The definition of the resource whose version will be computed.

Type:

ResourceDefinition

resource_config

The parsed config to be passed to the resource during execution.

Type:

Any
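A strategy can combine the fields of these two contexts into a version. The sketch below hashes an op's name together with its parsed config, and a resource's parsed config, so that a config change yields a new version. It does not import Dagster: the `Fake*` dataclasses are hypothetical stand-ins that only mirror the attribute names documented above (`op_def`, `op_config`, `resource_def`, `resource_config`), and in real code the class would subclass `dagster.VersionStrategy` and receive the real contexts.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeOpDef:
    name: str


@dataclass
class FakeOpVersionContext:
    """Hypothetical stand-in mirroring OpVersionContext(op_def, op_config)."""
    op_def: FakeOpDef
    op_config: Any


@dataclass
class FakeResourceVersionContext:
    """Hypothetical stand-in mirroring ResourceVersionContext(resource_def, resource_config)."""
    resource_def: Any
    resource_config: Any


def _digest(payload: Any) -> str:
    # sort_keys makes the JSON text independent of dict key order.
    text = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ConfigHashStrategy:
    """Hypothetical strategy: version = hash of identifying name plus parsed config."""

    def get_op_version(self, context: FakeOpVersionContext) -> str:
        return _digest({"op": context.op_def.name, "config": context.op_config})

    def get_resource_version(self, context: FakeResourceVersionContext) -> Optional[str]:
        if context.resource_config is None:
            return None  # no config to derive a version from
        return _digest({"resource_config": context.resource_config})
```

Serializing with `sort_keys=True` is the key design choice here: two configs that differ only in key order produce the same version, so reordering a config dict does not trigger a recomputation.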

Memoization

class dagster.MemoizableIOManager

Base class for IO managers that work with memoized execution. Users should implement the load_input and handle_output methods described in the IOManager API, as well as the has_output method, which returns a boolean indicating whether a stored data object can be found.

abstract has_output(context)

The user-defined method that returns whether stored data exists for the output described by the context's metadata.

Parameters:

context (OutputContext) – The context of the step performing this check.

Returns:

True if there is data present that matches the provided context. False otherwise.

Return type:

bool

See also: dagster.IOManager.
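The three required methods can be illustrated with an in-memory store keyed by step, output name, and version, so that `has_output` reports True only when an object was stored under the exact current version. This is a hypothetical sketch without Dagster imports: `FakeOutputContext` and `FakeInputContext` are stand-ins whose field names (`step_key`, `name`, `version`, `upstream_output`) are chosen to echo Dagster's context objects, and a real implementation would subclass `dagster.MemoizableIOManager` and persist data somewhere durable.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class FakeOutputContext:
    """Hypothetical stand-in for the output fields a versioned store keys on."""
    step_key: str
    name: str
    version: str


@dataclass
class FakeInputContext:
    """Hypothetical stand-in: an input knows which upstream output it reads."""
    upstream_output: FakeOutputContext


class InMemoryVersionedIOManager:
    """Sketch of the MemoizableIOManager contract: store, load, and has_output."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str, str], Any] = {}

    @staticmethod
    def _key(context: FakeOutputContext) -> Tuple[str, str, str]:
        return (context.step_key, context.name, context.version)

    def handle_output(self, context: FakeOutputContext, obj: Any) -> None:
        self._store[self._key(context)] = obj

    def load_input(self, context: FakeInputContext) -> Any:
        return self._store[self._key(context.upstream_output)]

    def has_output(self, context: FakeOutputContext) -> bool:
        # True only if an object was stored under this exact version.
        return self._key(context) in self._store
```

Including the version in the key is what lets a changed version read as "missing", which is the signal memoized execution needs to rerun a step.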

dagster.MEMOIZED_RUN_TAG

Provide this tag to a run to toggle memoization on or off. {MEMOIZED_RUN_TAG: "true"} toggles memoization on, while {MEMOIZED_RUN_TAG: "false"} toggles memoization off.
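The toggle can be read as an override on top of the default described under VersionStrategy, where a job with a version strategy memoizes. The sketch below models that reading: "false" always turns memoization off, "true" turns it on, and without the tag the presence of a version strategy decides. This is an illustration, not Dagster's code: `uses_memoization` and `MEMO_TAG` are hypothetical (real code would use `dagster.MEMOIZED_RUN_TAG`), and letting run tags take precedence over job-level tags is an assumption.

```python
from typing import Dict, Optional

# Hypothetical placeholder key; real code would use dagster.MEMOIZED_RUN_TAG.
MEMO_TAG = "memoized_run"


def uses_memoization(
    run_tags: Dict[str, str],
    has_version_strategy: bool,
    job_tags: Optional[Dict[str, str]] = None,
) -> bool:
    # Assumption: run tags override job-level tags with the same key.
    tags = {**(job_tags or {}), **run_tags}
    value = tags.get(MEMO_TAG)
    if value == "false":
        return False  # explicit opt-out wins, even with a version strategy
    if value == "true":
        return True  # explicit opt-in
    return has_version_strategy  # no tag: fall back to the job's setup
```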