Multi-Assets#

A multi-asset represents a set of assets that are all updated by the same operation.

Relevant APIs#

NameDescription
@multi_assetA decorator used to define multi-assets.

Overview#

When working with software-defined assets, it's sometimes inconvenient or impossible to map each persisted asset to a unique op. A multi-asset is a way to define a single operation that will produce the contents of multiple data assets at once.

Multi-assets may be useful in the following scenarios:

  • A single call to an API results in multiple tables being updated (e.g. Airbyte, Fivetran, dbt)
  • A DataFrame is transformed into multiple assets, and serializing it between separate ops would be prohibitively inefficient
  • The same in-memory object is used to compute multiple assets

Defining multi-assets#

The function responsible for computing the contents of any software-defined asset is an op. Multi-assets are responsible for updating multiple assets, so the underlying op will have multiple outputs -- one for each associated asset.

A basic multi-asset#

The easiest way to create a multi-asset is with the @multi_asset decorator. This decorator functions similarly to the @asset decorator, but requires an outs parameter specifying each output asset of the function.

from dagster import Out, multi_asset


@multi_asset(
    outs={
        "my_string_asset": Out(),
        "my_int_asset": Out(),
    }
)
def my_function():
    return "abc", 123

By default, the names of the outputs will be used to form the asset keys of the multi-asset. The decorated function will be used to create the op for these assets and must emit an output for each of them. In this case, we can emit multiple outputs by returning a tuple of values, one for each asset.

Subsetting multi-assets#

By default, it is assumed that the computation inside of a multi-asset will always produce the contents all of the associated assets. This means that attempting to execute a set of assets that produces some, but not all, of the assets defined by a given multi-asset will result in an error.

Sometimes, the underlying computation is sufficiently flexible to allow for computing an arbitrary subset of the assets associated with it. In these cases, set the is_required attribute of the outputs to False, and set the can_subset parameter of the decorator to True.

Inside the body of the function, we can use context.selected_output_names or context.selected_asset_keys to find out which computations should be run.

from dagster import Out, Output, multi_asset


@multi_asset(
    outs={
        "a": Out(is_required=False),
        "b": Out(is_required=False),
    },
    can_subset=True,
)
def split_actions(context):
    if "a" in context.selected_output_names:
        yield Output(value=123, output_name="a")
    if "b" in context.selected_output_names:
        yield Output(value=456, output_name="b")

Because our outputs are now optional, we can no longer rely on returning a tuple of values, as we don't know in advance which values will be computed. Instead, we explicitly yield each output that we're expected to create.

Dependencies inside multi-assets#

When a multi-asset is created, it is assumed that each output asset depends on all of the input assets. This may not always be the case, however.

In these situations, you may optionally provide a mapping from each output asset to the set of AssetKeys that it depends on. This information is used to display lineage information in Dagit and for parsing selections over your asset graph.

from dagster import AssetKey, Out, Output, multi_asset


@multi_asset(
    outs={"c": Out(), "d": Out()},
    internal_asset_deps={
        "c": {AssetKey("a")},
        "d": {AssetKey("b")},
    },
)
def my_complex_assets(a, b):
    # c only depends on a
    yield Output(value=a + 1, output_name="c")
    # d only depends on b
    yield Output(value=b + 1, output_name="d")