Special rules#
Special rules let you compute or override facet values after path/data extraction. They run with access to the already‐collected metadata, the file path/URI, and the full loaded configuration model. A rule only assigns a value if the facet is not already present in the record.
Execution model#
- Precedence: specials never clobber values already set by path/data extraction (the early continue means “first win” for any facet).
- Templating: before evaluation, strings are rendered with Jinja2 using data (see “Available variables” below).
- Evaluation scope: the available variables are defined by the drs_config:
  - datasets: dict: The dictionary of the defined datasets
  - suffixes: list: The valid path suffixes (drs_settings.suffixes)
  - index_schema: dict: The data schema as defined in drs_settings.index_schema
  - special: dict: Global special rules as defined in drs_settings.special
  - storage_options: dict: Global storage options as defined in drs_settings.storage_options
  - dialect: dict: Dialect settings defined by drs_settings.dialect
- Truthiness: assignment happens only if result is truthy. Empty strings, empty lists, 0 or False will not be assigned. If you need to set such values, return a non-empty representation or adjust the code to check is not None.
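The precedence and truthiness rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the library's implementation: apply_specials and the evaluate callback are hypothetical names, and the callback stands in for the Jinja rendering and evaluation steps.

```python
from typing import Any, Callable, Dict


def apply_specials(
    metadata: Dict[str, Any],
    rules: Dict[str, Any],
    evaluate: Callable[[Any, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Fill missing facets from special rules (hypothetical helper)."""
    record = dict(metadata)
    for facet, rule in rules.items():
        if facet in record:
            # Already set by path/data extraction: first win, skip the rule.
            continue
        result = evaluate(rule, record)
        if result:
            # Only truthy results are assigned.
            record[facet] = result
    return record


# Stand-in rules: each one is simply a function of the current record.
rules = {
    "realm": lambda rec: "ocean",            # skipped, realm is already set
    "time_aggregation": lambda rec: "mean",  # assigned
    "ensemble": lambda rec: "",              # falsy, not assigned
}
record = apply_specials(
    {"realm": "atmos", "variable": "tas"},
    rules,
    evaluate=lambda rule, rec: rule(rec),
)
print(record)
```

The realm facet keeps its extracted value, time_aggregation is filled in, and the empty string returned for ensemble is dropped.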
Available variables#
Two namespaces are available depending on the step:
Jinja2 template context:
- All current record keys from the metadata (e.g. {{ variable }}, {{ table_id }}, {{ time_frequency }}).
- The file path and URI as {{ file }} and {{ uri }}.
Python eval context:
- The entire parsed model (your TOML) exposed as a nested dictionary: datasets, suffixes, index_schema, special, storage_options, dialect. For example:
  - dialect['cordex']['domains']['EUR-11']
  - datasets['cmip6-fs']['root_path']
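The split between the two namespaces can be checked with a small experiment. The sketch below assumes the eval step is equivalent to calling Python's eval with an empty builtins table and the model dict as local scope; the model contents are placeholders, and names_visible is a hypothetical helper.

```python
# Placeholder model, standing in for the parsed TOML configuration.
model = {
    "datasets": {"cmip6-fs": {"root_path": "/placeholder/cmip6"}},
    "suffixes": [".nc", ".zarr"],
}


def names_visible(expression: str, model: dict) -> bool:
    """Return True if every name in the expression resolves in the model scope."""
    try:
        eval(expression, {"__builtins__": {}}, model)
        return True
    except NameError:
        return False


# Model keys are plain Python names inside the expression.
root_path = eval("datasets['cmip6-fs']['root_path']", {"__builtins__": {}}, model)

# Metadata keys are NOT Python names; Jinja must substitute them beforehand.
metadata_visible = names_visible("time_frequency", model)

# Builtins such as len are not available either.
builtins_visible = names_visible("len(suffixes)", model)

print(root_path, metadata_visible, builtins_visible)
```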
Rule types#
Conditional#
Evaluate a boolean Python expression (after Jinja rendering) and choose between two literal values.
TOML CONFIG
[drs_settings.special.time_aggregation]
type = "conditional"
condition = "'pt' in '{{ time_frequency | default(\"day\") | lower }}'"
true = "inst"
false = "mean"
Flow:
- Jinja renders the condition using current metadata ({{ time_frequency }}).
- The rendered string is eval’d against the model dict (no builtins).
- If truthy → assign the value of true, else the value of false (only if the facet is not already set).
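The flow can be traced by hand for the rule above. The sketch below assumes the Jinja step has already run, so the two condition strings are what the template would render to for a record with time_frequency set to 3hrPt and for a record without time_frequency (which falls back to the default day). The conditional function is a hypothetical stand-in for the real evaluation step.

```python
def conditional(rendered_condition: str, true: str, false: str, model: dict) -> str:
    """Evaluate an already-rendered condition and pick one of two literals."""
    outcome = eval(rendered_condition, {"__builtins__": {}}, model)
    return true if outcome else false


model = {}  # the condition in this example does not reference the model

# Rendered from: 'pt' in '{{ time_frequency | default("day") | lower }}'
rendered_3hrpt = "'pt' in '3hrpt'"   # time_frequency = "3hrPt"
rendered_default = "'pt' in 'day'"   # time_frequency missing

aggregation_3hrpt = conditional(rendered_3hrpt, "inst", "mean", model)
aggregation_default = conditional(rendered_default, "inst", "mean", model)
print(aggregation_3hrpt, aggregation_default)
```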
Lookup#
Lookup is a special type of data attribute lookup that stores the results of the attribute lookup in a nested cache. This allows for efficient retrieval of attributes that have already been retrieved from datasets.
Internally this calls the dataset storage backend’s lookup(path, attribute, *tree, **read_kws)
method to fetch values from a cached tree (e.g., mapping CMIP6 table_id + variable_id to
realm or frequency). Items are first rendered via Jinja.
Below you can find the signature of the method that gets involved when applying the lookup rule:
- PathTemplate.lookup(path: str, attribute: str, *tree: str, **read_kws: Any) -> Any
Get metadata from a lookup table.
This function reads metadata from a pre-defined cache table. If the metadata is not present in the cache table, it reads the object store and adds the metadata to the cache table.
- Parameters:
  - path – Path to the object store / file name
  - attribute – The attribute that is retrieved from the data. Variable attributes can be defined with a dot (.). For example: tas.long_name would get attribute long_name from variable tas.
  - *tree – A tuple representing nested attributes. Attributes are nested for more efficient lookup. (‘atmos’, ‘1hr’, ‘tas’) will translate into a tree of [‘atmos’][‘1hr’][‘tas’].
  - **read_kws – Keyword arguments passed to open the datasets.
TOML CONFIG
[drs_settings.dialect.cmip6.special.realm]
type = "lookup"
attribute = "realm"
tree = ["{{ table_id }}", "{{ variable_id }}"]
[drs_settings.dialect.cmip6.special.time_frequency]
type = "lookup"
attribute = "frequency"
tree = ["{{ table_id }}", "{{ variable_id }}"]
Note
The backend should memoize lookups in a tree cache so repeated calls across millions of files are O(1) hits after the first read.
read_kws come from dialect[standard].data_specs.read_kws (e.g., xarray engine).
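The memoization described in the note might look roughly like the following. This is a sketch under assumptions, not the backend's code: TreeCache and fake_reader are hypothetical, the reader stands in for opening a dataset, and keying the tree by attribute first is a choice made here so that realm and frequency do not collide.

```python
from typing import Any, Callable, Dict


class TreeCache:
    """Hypothetical nested cache keyed by the attribute and the tree items."""

    def __init__(self, reader: Callable[[str, str], Any]) -> None:
        self._reader = reader
        self._cache: Dict[str, Any] = {}
        self.reads = 0  # counts how often the (expensive) reader is called

    def lookup(self, path: str, attribute: str, *tree: str) -> Any:
        keys = (attribute, *tree)
        node = self._cache
        # Walk (and create) the nested dicts: cache[attribute][tree0][tree1]...
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        leaf = keys[-1]
        if leaf not in node:
            # Cache miss: read from the data once and remember the result.
            self.reads += 1
            node[leaf] = self._reader(path, attribute)
        return node[leaf]


def fake_reader(path: str, attribute: str) -> str:
    """Stand-in for reading an attribute from a file or object store."""
    return {"realm": "atmos", "frequency": "mon"}[attribute]


cache = TreeCache(fake_reader)
first = cache.lookup("/placeholder/a.nc", "realm", "Amon", "tas")
second = cache.lookup("/placeholder/b.nc", "realm", "Amon", "tas")  # cache hit
reads_after_realm = cache.reads
frequency = cache.lookup("/placeholder/a.nc", "frequency", "Amon", "tas")
print(first, second, frequency, cache.reads)
```

The second call uses a different path but the same tree, so it is answered from the cache without touching the reader.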
Call#
Render a string with Jinja and eval it as a Python expression within the model dict scope. Useful for string composition or referencing config data structures.
TOML CONFIG
[drs_settings.dialect.cordex.special.model]
type = "call"
call = "'{{ driving_model }}-{{ rcm_name }}-{{ rcm_version }}'"
You may also reference config structures as nested dicts in the expression, for example:
TOML CONFIG
[drs_settings.dialect.cordex.special.default_bbox]
type = "call"
call = "dialect['cordex']['domains'].get('{{ domain | upper }}', [0,360,-90,90])"
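To see how the default_bbox rule behaves, the sketch below evaluates the call expression as it would look after Jinja rendering, once for a domain that exists in the configuration and once for one that does not. It assumes the evaluation is a plain eval with no builtins; the bounding box stored for EUR-11 and the domain name XXX-00 are placeholders, and call_rule is a hypothetical helper.

```python
# Placeholder configuration: the EUR-11 values are made up for illustration.
model = {"dialect": {"cordex": {"domains": {"EUR-11": [10, 20, 30, 40]}}}}


def call_rule(rendered_call: str, model: dict):
    """Evaluate an already-rendered call expression against the model dict."""
    return eval(rendered_call, {"__builtins__": {}}, model)


# Rendered with domain = "eur-11" (upper-cased by the Jinja filter).
known = call_rule(
    "dialect['cordex']['domains'].get('EUR-11', [0,360,-90,90])", model
)
# Rendered with a domain that is not configured: the fallback list is returned.
unknown = call_rule(
    "dialect['cordex']['domains'].get('XXX-00', [0,360,-90,90])", model
)
print(known, unknown)
```

Dictionary methods such as get keep working without builtins because they are attributes of the dict object, not global names.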
Order and scoping#
Where to define rules:
- Global: [drs_settings.special.<facet>] (applies to all dialects)
- Per-dialect: [drs_settings.dialect.<name>.special.<facet>]
Which wins:
- Specials never overwrite a facet already set by earlier steps.
- If you apply global specials first and dialect specials second, the dialect can fill remaining gaps specific to that standard.
- If you need a dialect rule to take precedence for a facet that a global rule might also set, ensure the dialect rule runs first (so the global pass will skip, seeing the value already present). Choose your pass order accordingly in your pipeline.
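The effect of pass order can be illustrated with a toy model in which each rule has already been evaluated to a value. The fill helper and the rule values are hypothetical; the only behaviour taken from the text is that a pass never overwrites a facet that is already present.

```python
def fill(record: dict, specials: dict) -> dict:
    """Apply one pass of specials without overwriting existing facets."""
    out = dict(record)
    for facet, value in specials.items():
        if facet not in out and value:
            out[facet] = value
    return out


# Pre-evaluated rule results, for illustration only.
global_specials = {"model": "global-default", "time_aggregation": "mean"}
dialect_specials = {"model": "cordex-composite"}

# Global pass first: the dialect rule for model is skipped.
global_first = fill(fill({}, global_specials), dialect_specials)

# Dialect pass first: the global rule for model is skipped instead.
dialect_first = fill(fill({}, dialect_specials), global_specials)

print(global_first["model"], dialect_first["model"])
```

In both orders time_aggregation ends up as mean, because only the global pass defines it.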
Examples recap#
Global conditional (time aggregation)#
TOML CONFIG
[drs_settings.special.time_aggregation]
type = "conditional"
condition = "'pt' in '{{ time_frequency | default(\"day\") | lower }}'"
true = "inst"
false = "mean"
CORDEX composite model (call)#
TOML CONFIG
[drs_settings.dialect.cordex.special.model]
type = "call"
call = "'{{ driving_model }}-{{ rcm_name }}-{{ rcm_version }}'"
CMIP6 lookups (realm / frequency)#
TOML CONFIG
[drs_settings.dialect.cmip6.special]
realm.type = "lookup"
realm.tree = ["{{ table_id }}", "{{ variable_id }}"]
realm.attribute = "realm"
time_frequency.type = "lookup"
time_frequency.tree = ["{{ table_id }}", "{{ variable_id }}"]
time_frequency.attribute = "frequency"
Performance notes#
- The lookup rule is designed for high repetition: even if filenames are unique, the (table_id, variable_id) pairs repeat, so cached results eliminate costly I/O.
- Keep conditional and call expressions simple; they run per file.
Warning
- Both conditional and call use eval with your model dict as the only scope (no Python builtins). Treat configuration as trusted input.
- Prefer Jinja templating ({{ ... }}) for string assembly and limit Python expressions to straightforward logic.
- When using Jinja templating, variable quoting is important.
- Don’t use the lookup rule if you can’t expect attributes to be consistent across many files.
Troubleshooting#
- Nothing gets assigned:
  - Ensure the facet isn’t already present from path/data extraction.
  - Remember: falsy results ("", [], 0, False) are not assigned.
- Name errors in expressions:
  - In conditional/call expressions, only names from the model dict are available; use Jinja to substitute metadata values first ({{ variable }}).
  - Check quotes in Jinja templates.
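The quoting problem is easiest to see by comparing what a quoted and an unquoted template render to. The strings below are written out by hand for a record with driving_model MPI-ESM-LR and rcm_name REMO2009 (hypothetical values), and the evaluate helper assumes a plain eval with no builtins.

```python
def evaluate(rendered: str, model: dict):
    """Evaluate an already-rendered expression against the model dict."""
    return eval(rendered, {"__builtins__": {}}, model)


model = {"dialect": {}}

# Template "'{{ driving_model }}-{{ rcm_name }}'" renders to a string literal.
quoted = "'MPI-ESM-LR-REMO2009'"
# Template "{{ driving_model }}-{{ rcm_name }}" renders to bare names joined
# by minus signs, which Python tries to resolve as variables.
unquoted = "MPI-ESM-LR-REMO2009"

quoted_result = evaluate(quoted, model)

try:
    evaluate(unquoted, model)
    unquoted_error = None
except NameError as exc:
    unquoted_error = type(exc).__name__

print(quoted_result, unquoted_error)
```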