A multi-asset represents a set of assets that are all updated by the same operation.
Name | Description |
---|---|
@multi_asset | A decorator used to define multi-assets. |
When working with software-defined assets, it's sometimes inconvenient or impossible to map each persisted asset to a unique op. A multi-asset is a way to define a single operation that will produce the contents of multiple data assets at once.
Multi-assets may be useful in the following scenarios:
The function responsible for computing the contents of any software-defined asset is an op. Multi-assets are responsible for updating multiple assets, so the underlying op will have multiple outputs -- one for each associated asset.
The easiest way to create a multi-asset is with the @multi_asset
decorator. This decorator functions similarly to the @asset
decorator, but requires an outs
parameter specifying each output asset of the function.
from dagster import Out, multi_asset
@multi_asset(
outs={
"my_string_asset": Out(),
"my_int_asset": Out(),
}
)
def my_function():
return "abc", 123
By default, the names of the outputs will be used to form the asset keys of the multi-asset. The decorated function will be used to create the op for these assets and must emit an output for each of them. In this case, we can emit multiple outputs by returning a tuple of values, one for each asset.
By default, it is assumed that the computation inside of a multi-asset will always produce the contents all of the associated assets. This means that attempting to execute a set of assets that produces some, but not all, of the assets defined by a given multi-asset will result in an error.
Sometimes, the underlying computation is sufficiently flexible to allow for computing an arbitrary subset of the assets associated with it. In these cases, set the is_required
attribute of the outputs to False
, and set the can_subset
parameter of the decorator to True
.
Inside the body of the function, we can use context.selected_output_names
or context.selected_asset_keys
to find out which computations should be run.
from dagster import Out, Output, multi_asset
@multi_asset(
outs={
"a": Out(is_required=False),
"b": Out(is_required=False),
},
can_subset=True,
)
def split_actions(context):
if "a" in context.selected_output_names:
yield Output(value=123, output_name="a")
if "b" in context.selected_output_names:
yield Output(value=456, output_name="b")
Because our outputs are now optional, we can no longer rely on returning a tuple of values, as we don't know in advance which values will be computed. Instead, we explicitly yield
each output that we're expected to create.
When a multi-asset is created, it is assumed that each output asset depends on all of the input assets. This may not always be the case, however.
In these situations, you may optionally provide a mapping from each output asset to the set of AssetKey
s that it depends on. This information is used to display lineage information in Dagit and for parsing selections over your asset graph.
from dagster import AssetKey, Out, Output, multi_asset
@multi_asset(
outs={"c": Out(), "d": Out()},
internal_asset_deps={
"c": {AssetKey("a")},
"d": {AssetKey("b")},
},
)
def my_complex_assets(a, b):
# c only depends on a
yield Output(value=a + 1, output_name="c")
# d only depends on b
yield Output(value=b + 1, output_name="d")