Use the output of a block to dynamically create more blocks.
Krichkorn Oparad

This feature saves me when I need to securely transfer multiple media files from one location to another (SCP), especially when the files are spread across different directories and across the network. Typically, one would have to loop through each file, but secure copy (SCP) operations can fail for various reasons, and dealing with exceptions can be error-prone. An alternative approach would be to use an API trigger (like the one provided by Mage AI) to receive POST requests for each file that needs to be securely copied. However, this method can be highly resource-intensive, particularly when dealing with thousands of pipeline runs. Dynamic blocks, which let me manage source and destination specifications in a DataFrame and then dynamically execute SCP operations without having to handle exceptions manually (thanks to the “retry incomplete block” feature), are a game-changer. Not only do they save me from the hassle of exception handling, they also keep computational resources in check by avoiding unnecessary pipeline triggers. Thank you for your world-class project. Keep up the excellent work.
A block is a dynamic block if its output data is a list of two lists of dictionaries (i.e. `List[List[Dict]]`): the 1st list contains the items used to create the downstream blocks, and the 2nd list contains one metadata dictionary per item.
For example, a data loader block returns the following output:
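A minimal sketch of that output, reconstructed from the UUIDs and values used in the examples below (in a real Mage pipeline the function would carry the `@data_loader` decorator; the function name here is illustrative):

```python
from typing import Dict, List

def load_users(*args, **kwargs) -> List[List[Dict]]:
    # 1st list: one item per downstream block to create dynamically.
    users = [
        {"id": 100, "name": "user_1"},
        {"id": 200, "name": "user_2"},
        {"id": 300, "name": "user_3"},
    ]
    # 2nd list: one metadata dictionary per dynamically created block.
    metadata = [
        {"block_uuid": "for_user_1"},
        {"block_uuid": "for_user_2"},
        {"block_uuid": "for_user_3"},
    ]
    return [users, metadata]
```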
Note: the transformer block returns a list as its output data. The items in that list are passed as positional arguments to any downstream block of this transformer block.
Each metadata dictionary supports the following keys:

| Key | Description | Required |
| --- | --- | --- |
| `block_uuid` | This value is used in combination with the downstream block’s original UUID to construct a unique UUID across all dynamic blocks. This value must be unique within the same list of metadata dictionaries. | Yes |
For example, if the dynamic block’s downstream block has the UUID `anonymize_user_data` and the metadata for the 1st dynamically created block is `{ "block_uuid": "for_user_1" }`, then that dynamically created block’s UUID is `anonymize_user_data:for_user_1`. The convention is `[original_block_uuid]:[metadata_block_uuid]`.
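The convention can be expressed directly (names taken from the example above):

```python
# UUID convention for dynamically created blocks:
# [original_block_uuid]:[metadata_block_uuid]
original_block_uuid = "anonymize_user_data"
metadata = {"block_uuid": "for_user_1"}

dynamic_uuid = f"{original_block_uuid}:{metadata['block_uuid']}"
# dynamic_uuid == "anonymize_user_data:for_user_1"
```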
For example, if a dynamic block creates 3 downstream blocks with the UUID `anonymize_user_data` and the following code:
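A sketch of what that transformer could look like, written as plain Python without Mage’s `@transformer` decorator. The assumption here is that each dynamically created block receives one item from the upstream dynamic block’s 1st output list:

```python
from typing import Dict, List

def anonymize_user_data(user: Dict, *args, **kwargs) -> List[Dict]:
    # `user` is one item from the upstream dynamic block's output,
    # e.g. {"id": 100, "name": "user_1"}.
    return [user]
```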
The following 3 blocks will be dynamically created:

| Dynamic block UUID | Return output data |
| --- | --- |
| `anonymize_user_data:for_user_1` | `[{ "id": 100, "name": "user_1" }]` |
| `anonymize_user_data:for_user_2` | `[{ "id": 200, "name": "user_2" }]` |
| `anonymize_user_data:for_user_3` | `[{ "id": 300, "name": "user_3" }]` |
If `anonymize_user_data` has 2 downstream blocks with the UUIDs `clean_column_names` and `compute_engagement_score`, those downstream blocks will also be dynamically created, once for each of:

- `anonymize_user_data:for_user_1`
- `anonymize_user_data:for_user_2`
- `anonymize_user_data:for_user_3`

The input data for those blocks (e.g. `clean_column_names`, `compute_engagement_score`) will come from their upstream block (e.g. `anonymize_user_data:for_user_1`, `anonymize_user_data:for_user_2`, `anonymize_user_data:for_user_3`).
For example, the blocks `clean_column_names:for_user_1` and `compute_engagement_score:for_user_1` will have input data with the following shape and value:
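Reading the output values from the table above, that input data would be:

```python
# Input to clean_column_names:for_user_1 and compute_engagement_score:for_user_1,
# i.e. the output of their upstream block anonymize_user_data:for_user_1:
input_data = [{"id": 100, "name": "user_1"}]
```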
If you configure `anonymize_user_data` to reduce its output, then there will only be 2 downstream blocks instead of 6. The 2 downstream blocks will be `clean_column_names` and `compute_engagement_score`. The input data to each of these 2 downstream blocks will be:
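With output reduced, each downstream block receives the combined outputs of all 3 dynamic children. A sketch of that combined value, based on the per-child outputs in the table above (the exact nesting is an assumption):

```python
# Combined outputs of anonymize_user_data:for_user_1/2/3 after reducing output:
reduced_input = [
    {"id": 100, "name": "user_1"},
    {"id": 200, "name": "user_2"},
    {"id": 300, "name": "user_3"},
]
```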
A dynamic block can itself be a dynamic child. For example, a pipeline could contain the following blocks:

- `data_loader.py`: dynamic block
- `dynamic_sql_loader`: dynamic block, dynamic child
- `transformer`: dynamic child
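A sketch of a block that plays both roles, like `dynamic_sql_loader` above: as a dynamic child it receives one item from the upstream dynamic block, and as a dynamic block it returns another `[items, metadata]` pair, fanning out again. The function body and field names are illustrative assumptions:

```python
from typing import Dict, List

def dynamic_sql_loader(query: Dict, *args, **kwargs) -> List[List[Dict]]:
    # As a dynamic child: `query` is one item from the upstream block's output.
    rows = [{"query_id": query["id"], "row": i} for i in range(2)]
    # As a dynamic block: return [items, metadata] to fan out again.
    metadata = [{"block_uuid": f"row_{i}"} for i in range(2)]
    return [rows, metadata]
```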