Configuration
A Scarf file is a plain YAML document with a defined structure.
Full schema
name: my-pipeline          # required; human-readable name
description: optional text
# External YAML files. Each key becomes a namespace accessible via ${key.*}.
refs:
  queries: ./sql/queries.yaml
# Default argument values.
# null marks a required arg that must be supplied at runtime.
args:
  language: en
  count: 10
  output: null             # required; CLI will prompt if not given via -p
steps:
  - id: step-name          # optional; required if referenced by other steps
    generator:
      name: dotted.path.ClassName
      args:
        language: ${args.language}
        count: ${args.count}
    transformer:           # chained after generator; values auto-injected
      name: dotted.path.fn
      args:
        extra_arg: value
    loader:                # runs after transformer; values auto-injected
      name: dotted.path.Loader
      args:
        path: ${args.output}
Step structure
Each step must declare at least one of generator, transformer / transformers, or loader / loaders.
Generator only
Produces values and stores them under the step's id. No side effects.
- id: names
  generator:
    name: scarfolder.generators.util.Constant
    args:
      value: Alice
      count: 5
Generator + transformer(s)
The transformer runs immediately after the generator. The pipeline automatically injects the generator's output as values — you do not need an explicit ${steps.*} reference.
- id: names
  generator:
    name: scarfolder.generators.util.Constant
    args:
      value: alice
      count: 5
  transformer: scarfolder.transformers.text.capitalize_first
Use transformers (or a YAML list under transformer) to chain multiple transformers in sequence. Each one receives the output of the one before it:
- id: greetings
  generator:
    name: scarfolder.generators.util.Constant
    args:
      value: hello world
      count: 3
  transformers:
    - name: scarfolder.transformers.text.capitalize_first
    - name: scarfolder.transformers.text.format_template
      args:
        template: "→ {value}"
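The chaining described above can be pictured with a short Python sketch. It is not Scarfolder's actual implementation: run_chain is a hypothetical helper, the two transformer functions are stand-ins written for this example, and the sketch assumes a transformer is a callable that accepts the previous output as a values keyword argument plus its configured extra args.

```python
from typing import Any, Callable, Iterable

# Hypothetical stand-ins for the transformers named in the YAML above.
def capitalize_first(values: Iterable[str]) -> list[str]:
    return [v[:1].upper() + v[1:] for v in values]

def format_template(values: Iterable[str], template: str) -> list[str]:
    return [template.format(value=v) for v in values]

def run_chain(
    generated: list[Any],
    transformers: list[tuple[Callable[..., list[Any]], dict[str, Any]]],
) -> list[Any]:
    """Feed the generator output through each transformer in order."""
    values = generated
    for fn, extra_args in transformers:
        # The previous output is injected as `values`; only the extra
        # args come from the step's configuration.
        values = fn(values=values, **extra_args)
    return values

result = run_chain(
    ["hello world"] * 3,  # what Constant(value="hello world", count=3) is assumed to yield
    [(capitalize_first, {}), (format_template, {"template": "→ {value}"})],
)
print(result)
```

Under these assumptions, each of the three items is capitalized first and then wrapped by the template, which mirrors the order of the transformers list.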
Standalone transformer
When there is no generator, the transformer is the step's primary producer, and all of its inputs must be supplied explicitly through args:
- id: upper_names
  transformer:
    name: scarfolder.transformers.text.upper
    args:
      values: ${steps.names}
Generator + loader
Attach a loader (or loaders list) to consume the step's output as a side effect. Values are auto-injected — path and any other loader-specific args are still declared explicitly:
- generator:
    name: scarfolder.generators.util.Range
    args:
      stop: 10
  loader:
    name: scarfolder.loaders.file.WriteLines
    args:
      path: output.txt
Fan out to multiple loaders by using a list:
- generator:
    name: scarfolder.generators.util.Constant
    args:
      value: hello
      count: 3
  loaders:
    - name: scarfolder.loaders.console.Print
    - name: scarfolder.loaders.file.WriteLines
      args:
        path: out.txt
Standalone loader
A loader-only step receives its data explicitly through args:
- loader:
    name: scarfolder.loaders.console.Print
    args:
      values: ${steps.greetings}
Plugin short form
When a plugin has no extra args, write it as a plain string:
# verbose
transformer:
  name: scarfolder.transformers.text.upper
  args: {}

# short form; identical behaviour
transformer: scarfolder.transformers.text.upper
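One way to think about the equivalence of the two forms is a normalization pass that expands the string into the verbose mapping before anything else runs. The sketch below assumes that approach; normalize_plugin is a hypothetical name, not a documented Scarfolder function.

```python
from typing import Any, Union

def normalize_plugin(spec: Union[str, dict[str, Any]]) -> dict[str, Any]:
    """Expand the short string form into the verbose name/args mapping."""
    if isinstance(spec, str):
        return {"name": spec, "args": {}}
    # Verbose form: copy it and default a missing or null args entry to {}.
    return {"name": spec["name"], "args": dict(spec.get("args") or {})}

short = normalize_plugin("scarfolder.transformers.text.upper")
verbose = normalize_plugin(
    {"name": "scarfolder.transformers.text.upper", "args": {}}
)
print(short == verbose)
```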
Args and placeholders
Declaring args
args:
  language: en     # has a default
  count: 10        # has a default
  output: null     # required; must be supplied via -p or interactive prompt
Supplying args at runtime
scarfolder run pipeline.yaml -planguage=it -pcount=50 -poutput=result.txt
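The interaction between declared defaults and -p overrides can be sketched as a simple merge. This is an illustration only: merge_args is a hypothetical helper, it takes the KEY=VALUE strings that follow each -p, it keeps overridden values as plain strings (whether the real CLI coerces types is not stated here), and it raises an error for a missing required arg where the real CLI is described as prompting.

```python
from typing import Any

def merge_args(defaults: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Overlay KEY=VALUE overrides on the defaults and check required args."""
    merged = dict(defaults)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        merged[key] = value  # kept as a string; type coercion is not modelled
    # A value that is still None (YAML null) marks a required arg nobody supplied.
    missing = [k for k, v in merged.items() if v is None]
    if missing:
        raise KeyError(f"missing required args: {', '.join(missing)}")
    return merged

defaults = {"language": "en", "count": 10, "output": None}
merged = merge_args(defaults, ["language=it", "count=50", "output=result.txt"])
print(merged)
```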
Placeholder syntax
args:
  path: ${args.output}            # from config defaults or CLI
  values: ${steps.names}          # output list of a previous step
  query: ${queries.insert_user}   # key from an external ref file
  count: ${count}                 # shorthand for ${args.count}
  token: ${env.API_TOKEN}         # OS environment variable
Type preservation: if the entire value is a single placeholder, the resolved Python object is used directly — not its string representation. This is essential for passing lists between steps:
args:
  values: ${steps.names}   # receives the actual list, not its string form
If the placeholder is embedded inside a string, the result is always a string:
args:
  message: "Hello ${args.name}!"   # always a string
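The two rules above (whole-value placeholders keep their type, embedded placeholders become text) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Scarfolder's resolver: resolve and lookup are hypothetical names, the context is assumed to be a nested dict keyed by namespace, and the ${count} shorthand is not handled.

```python
import re
from typing import Any

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def lookup(path: str, context: dict[str, Any]) -> Any:
    """Walk a dotted path such as 'steps.names' through nested dicts."""
    node: Any = context
    for part in path.split("."):
        node = node[part]
    return node

def resolve(value: Any, context: dict[str, Any]) -> Any:
    if not isinstance(value, str):
        return value
    whole = _PLACEHOLDER.fullmatch(value)
    if whole:
        # The entire value is one placeholder: return the object itself.
        return lookup(whole.group(1), context)
    # Embedded placeholder(s): substitute their string forms.
    return _PLACEHOLDER.sub(lambda m: str(lookup(m.group(1), context)), value)

context = {"steps": {"names": ["Alice", "Bob"]}, "args": {"name": "Alice"}}
as_list = resolve("${steps.names}", context)
as_text = resolve("Hello ${args.name}!", context)
print(type(as_list).__name__, type(as_text).__name__)
```

In this sketch the first call hands back the very same list object held in the context, while the second always produces a new string.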
External refs
Load external YAML files and reference their contents anywhere in the config:
refs:
  queries: ./sql/queries.yaml
steps:
  - generator:
      name: my_pkg.generators.SqlRows
      args:
        query: ${queries.select_users}
queries.yaml:
select_users: "SELECT id, name FROM users"
insert_user: "INSERT INTO users (name) VALUES (?)"
Ref files are loaded relative to the Scarf file's directory.
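To make the path rule concrete, here is a small sketch of resolving ref paths against the Scarf file's directory rather than the current working directory. The helper name resolve_ref_paths and the configs/pipeline.yaml location are made up for the example; loading and parsing the YAML itself is left out.

```python
from pathlib import Path

def resolve_ref_paths(scarf_file: str, refs: dict[str, str]) -> dict[str, Path]:
    """Map each ref namespace to a path anchored at the Scarf file's directory."""
    base = Path(scarf_file).parent
    return {key: base / rel for key, rel in refs.items()}

# Hypothetical layout: the Scarf file lives in configs/.
paths = resolve_ref_paths(
    "configs/pipeline.yaml", {"queries": "./sql/queries.yaml"}
)
print(paths["queries"])
```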
Step dependencies
Steps are automatically sorted in topological order. Any ${steps.*} reference anywhere in a step's args — including inside chained transformers and loaders — is picked up as a dependency. Declaration order in the file does not matter:
steps:
  - id: full_names
    generator:
      name: scarfolder.generators.util.Combine
      args:
        streams:
          - ${steps.first_names}   # declared below; still fine
          - ${steps.last_names}
    transformer: scarfolder.transformers.text.join
  - id: first_names
    generator:
      name: scarfolder.generators.util.Constant
      args: { value: Alice, count: 3 }
  - id: last_names
    generator:
      name: scarfolder.generators.util.Constant
      args: { value: Smith, count: 3 }
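The ordering behaviour can be approximated with Python's standard graphlib module. The sketch below is an assumption about how such sorting might work, not a description of Scarfolder's internals: find_step_refs and execution_order are hypothetical names, every step is assumed to have an id, and references are detected with a simple regular expression over all string values in the step.

```python
import re
from graphlib import TopologicalSorter
from typing import Any

_STEP_REF = re.compile(r"\$\{steps\.([A-Za-z0-9_-]+)")

def find_step_refs(node: Any) -> set[str]:
    """Collect every step id referenced via ${steps.*} in a nested structure."""
    if isinstance(node, str):
        return set(_STEP_REF.findall(node))
    if isinstance(node, dict):
        return set().union(*(find_step_refs(v) for v in node.values()))
    if isinstance(node, list):
        return set().union(*(find_step_refs(v) for v in node))
    return set()

def execution_order(steps: list[dict[str, Any]]) -> list[str]:
    # Map each step id to the ids it depends on, then sort dependencies first.
    graph = {step["id"]: find_step_refs(step) for step in steps}
    return list(TopologicalSorter(graph).static_order())

constant = "scarfolder.generators.util.Constant"
steps = [
    {"id": "full_names",
     "generator": {"name": "scarfolder.generators.util.Combine",
                   "args": {"streams": ["${steps.first_names}",
                                        "${steps.last_names}"]}},
     "transformer": "scarfolder.transformers.text.join"},
    {"id": "first_names",
     "generator": {"name": constant, "args": {"value": "Alice", "count": 3}}},
    {"id": "last_names",
     "generator": {"name": constant, "args": {"value": "Smith", "count": 3}}},
]
order = execution_order(steps)
print(order)
```

With this input, full_names always comes last; the relative order of first_names and last_names is not constrained, since neither depends on the other. A cycle between steps would make static_order raise graphlib.CycleError.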