Defining and running Pipelines through configuration files¶
Instead of programmatically defining your Pipeline in a script, you can define it through a configuration file. This allows you to run your Pipeline in the same way you would run a script, but without having to write any code.
Running pipelines from configuration files¶
You can run the pipeline defined in a configuration file with the Command Line Interface:
$ brain_pipe config_file.extension
Background¶
A Parser is used to parse a configuration string into a list of tuples of DataLoader and Pipeline objects. This list is then provided to a Runner to run all of the Pipelines.
All parsers can be found in the parsers module. The most important Parser classes are:
SimpleDictParser - parses a dictionary of configuration options into a list of DataLoader and Pipeline objects. This is the base class for most other parsers.
TextParser - parses a string of configuration options into a list of DataLoader and Pipeline objects. This parser converts a text string into a dictionary and then uses the SimpleDictParser, from which it inherits, to parse the dictionary.
FileParser - parses a file of configuration options into a list of DataLoader and Pipeline objects. This parser reads a file into text and then uses the TextParser, from which it inherits, to parse the text.
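The layering described above (file to text, text to dictionary, dictionary to loader/pipeline pairs) can be sketched with plain Python classes. This is a minimal illustration, not brain_pipe's implementation: the `Toy*` classes are hypothetical stand-ins, JSON is used as the text format so that only the standard library is needed, the `steps` key is made up, and the result contains plain dictionaries rather than real DataLoader and Pipeline objects.

```python
import json
import os
import tempfile


class ToyDictParser:
    """Hypothetical stand-in for a dictionary-based parser."""

    def parse(self, config):
        # Pair every pipeline entry with the data loader entry it names.
        loaders = {entry["name"]: entry for entry in config["data_loaders"]}
        return [(loaders[p["data_from"]], p) for p in config["pipelines"]]


class ToyTextParser(ToyDictParser):
    """Converts text into a dictionary, then defers to the dictionary parser."""

    def parse(self, text):
        return super().parse(json.loads(text))


class ToyFileParser(ToyTextParser):
    """Reads a file into text, then defers to the text parser."""

    def parse(self, path):
        with open(path, encoding="utf-8") as handle:
            return super().parse(handle.read())


# A tiny configuration; "steps" is an illustrative key only.
config_text = json.dumps({
    "data_loaders": [{"name": "raw"}],
    "pipelines": [{"data_from": "raw", "steps": []}],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.json")
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(config_text)
    pairs = ToyFileParser().parse(config_path)
```

Each subclass only adds the one conversion step it is responsible for, which mirrors the inheritance chain described in the list.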
In addition to the TextParser, a TemplateTextParser exists. This parser allows you to use Jinja2 templates in your configuration text/files, so that you can use variables and control structures that are filled in at runtime (see also Command Line Interface).
Note
The TemplateFileParser defines a __file__ and a __filedir__ variable, pointing to the file that is being parsed and the directory in which that file is located, respectively.
Note
When using Template-based parsers with the Command Line Interface, all missing variables will be requested as command line arguments.
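The idea of turning unfilled template variables into command line arguments can be sketched as follows. To stay within the standard library, this sketch deliberately swaps Jinja2 for `string.Template` (so placeholders are written `$name` instead of `{{ name }}`). The functions `find_variables` and `render_from_cli` are hypothetical helpers and say nothing about how brain_pipe implements this.

```python
import argparse
from string import Template


def find_variables(text):
    """List the placeholder names in `text`, in order of first appearance."""
    names = []
    for match in Template.pattern.finditer(text):
        # "named" matches $name, "braced" matches ${name}; "$$" matches neither.
        name = match.group("named") or match.group("braced")
        if name and name not in names:
            names.append(name)
    return names


def render_from_cli(text, argv, known=None):
    """Fill in `text`, requiring every variable not in `known` as a CLI option."""
    values = dict(known or {})
    parser = argparse.ArgumentParser()
    for name in find_variables(text):
        if name not in values:
            parser.add_argument(f"--{name}", required=True)
    values.update(vars(parser.parse_args(argv)))
    return Template(text).substitute(values)


rendered = render_from_cli(
    "path: $data_dir/$subject",
    ["--data_dir", "/data", "--subject", "s01"],
)
```

Variables passed in through `known` play the role of predefined values (comparable to `__file__` and `__filedir__`), so they are never requested on the command line.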
Typical structure for SimpleDictParsers¶
Configuration files for SimpleDictParser parsers (and subclasses) are dictionaries that require the following keys:
data_loaders - a list of dictionaries, each of which defines a DataLoader object. A name key is required for each dictionary to link it to the appropriate Pipeline objects.
pipelines - a list of dictionaries, each of which defines a Pipeline object. A data_from key is required that specifies the name of the DataLoader object from which the Pipeline should load its data.
config - a dictionary of configuration options used for additional configuration of helper classes like the Runner and logging.
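As an illustration of this layout, the dictionary below contains the three required keys, followed by a small check that every `data_from` value refers to an existing `name`. This is a sketch under assumptions: `HypotheticalLoader` and `HypotheticalPipeline` are placeholder names rather than real brain_pipe classes, `extra_paths` is assumed to hold a list, and `find_structure_problems` is an illustrative helper that is not part of the library.

```python
REQUIRED_KEYS = ("data_loaders", "pipelines", "config")

example_config = {
    "data_loaders": [
        # Placeholder callable name, not a real brain_pipe class.
        {"name": "recordings", "callable": "HypotheticalLoader"},
    ],
    "pipelines": [
        # "data_from" must match the "name" of one of the data loaders.
        {"data_from": "recordings", "callable": "HypotheticalPipeline"},
    ],
    # Assumption: extra_paths holds a list of additional search paths.
    "config": {"extra_paths": []},
}


def find_structure_problems(config):
    """Return readable problems with the layout; empty when it looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    loader_names = set()
    for index, loader in enumerate(config.get("data_loaders", [])):
        if "name" not in loader:
            problems.append(f"data_loaders[{index}] has no 'name'")
        else:
            loader_names.add(loader["name"])
    for index, pipeline in enumerate(config.get("pipelines", [])):
        source = pipeline.get("data_from")
        if source is None:
            problems.append(f"pipelines[{index}] has no 'data_from'")
        elif source not in loader_names:
            problems.append(f"pipelines[{index}] refers to unknown loader '{source}'")
    return problems
```

Running the check on a dictionary before handing it to a parser is one way to catch a mistyped `data_from` early.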
Note
When the special key callable is used in a dictionary, the value of that key will be treated as a callable object. A Finder will search for all Callable objects in the brain_pipe module and in the extra paths defined in the config dictionary under the extra_paths key. The other keys in the dictionary will be passed as keyword arguments to the callable object.
Note
If you only want to pass a reference to a callable, the is_pointer keyword can be used.
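The behaviour described in the two notes above can be sketched with a small recursive resolver. This is only an illustration of the idea: `resolve`, `make_scaler`, and the `registry` dictionary are hypothetical, and the registry stands in for the lookup that the Finder performs over modules and extra paths.

```python
def resolve(node, registry):
    """Recursively turn dictionaries with a 'callable' key into objects."""
    if isinstance(node, list):
        return [resolve(item, registry) for item in node]
    if not isinstance(node, dict):
        return node
    if "callable" not in node:
        return {key: resolve(value, registry) for key, value in node.items()}
    target = registry[node["callable"]]
    if node.get("is_pointer", False):
        # Hand back the callable itself instead of calling it.
        return target
    # All remaining keys become keyword arguments of the callable.
    kwargs = {
        key: resolve(value, registry)
        for key, value in node.items()
        if key not in ("callable", "is_pointer")
    }
    return target(**kwargs)


def make_scaler(factor, post_process=None):
    """Hypothetical callable that builds a scaling function."""
    def scale(value):
        result = value * factor
        return post_process(result) if post_process else result
    return scale


# Hypothetical registry; a real Finder would search modules instead.
registry = {"make_scaler": make_scaler, "abs": abs}

spec = {
    "callable": "make_scaler",
    "factor": -2,
    # Passed as a reference: abs is not called while resolving.
    "post_process": {"callable": "abs", "is_pointer": True},
}
scaler = resolve(spec, registry)
```

Here `make_scaler` is called with `factor` and `post_process` as keyword arguments, while `abs` is handed over untouched because of `is_pointer`.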
Specifying the parser¶
By default, the CLI (see also Command Line Interface) will try to use the most appropriate parser for the given input. Currently, the YAMLTemplateFileParser is the most common default, as it supports YAML and JSON files with or without Jinja2 templates.