Defining and running Pipelines through configuration files¶
Instead of programmatically defining your Pipeline in a script, you can define it through a configuration file. This allows you to run your Pipeline in the same way you would run a script, but without having to write any code.
Running pipelines from configuration files¶
You can run the pipeline defined in a configuration file with the Command Line Interface:
$ brain_pipe config_file.extension
Background¶
A Parser is used to parse a configuration string into a list of tuples
of DataLoader and Pipeline objects. This list will then be provided
to a Runner to run all of the Pipelines.
All parsers can be found in the parsers module. The most important Parser
classes are:
SimpleDictParser - parses a dictionary of configuration options into a list of DataLoader and Pipeline objects. This is the base class for most other parsers.
TextParser - parses a string of configuration options into a list of DataLoader and Pipeline objects. This parser converts a text string into a dictionary and then uses the SimpleDictParser, from which it inherits, to parse the dictionary.
FileParser - parses a file of configuration options into a list of DataLoader and Pipeline objects. This parser reads a file into text and then uses the TextParser, from which it inherits, to parse the text.
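The layering described above can be pictured with a small sketch. The classes below are hypothetical stand-ins, not the actual brain_pipe parsers: they return plain dictionaries instead of DataLoader and Pipeline objects, and the text level assumes JSON input so that only the standard library is needed.

```python
import json
import tempfile
from pathlib import Path


class DictParserSketch:
    """Hypothetical stand-in for a dictionary-level parser."""

    def parse(self, config):
        # Index the data loader entries by name, then pair every pipeline
        # entry with the data loader it refers to through 'data_from'.
        loaders = {entry["name"]: entry for entry in config["data_loaders"]}
        return [(loaders[p["data_from"]], p) for p in config["pipelines"]]


class TextParserSketch(DictParserSketch):
    """Turns text into a dictionary, then defers to the dictionary level."""

    def parse(self, text):
        # Assumption: the text is JSON. A real parser may accept other formats.
        return super().parse(json.loads(text))


class FileParserSketch(TextParserSketch):
    """Reads a file into text, then defers to the text level."""

    def parse(self, path):
        return super().parse(Path(path).read_text())


def parse_from_temporary_file(config):
    """Write a dictionary to a temporary JSON file and parse it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        path.write_text(json.dumps(config))
        return FileParserSketch().parse(path)
```

Each level only adds one conversion step (file to text, text to dictionary) and reuses everything below it, which is why most behaviour can live in the dictionary-level class.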
In addition to the TextParser, a TemplateTextParser exists.
This parser allows you to use Jinja2 templates in your
configuration text/files. This allows you to use variables and control structures
that can be filled in at runtime (see also Command Line Interface).
Note
The TemplateFileParser defines a __file__ and __filedir__
variable pointing to the file that is being parsed and the directory in which
the file is located, respectively.
Note
When using template-based parsers with the Command Line Interface, all missing variables
will be requested as command line arguments.
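To make the idea of missing variables becoming command line arguments more concrete, here is a toy sketch. It does not use Jinja2 and is not how brain_pipe implements this: a regular expression stands in for the template engine and only handles plain `{{ name }}` placeholders, and the helper names (`find_variables`, `build_argument_parser`, `render`) are hypothetical.

```python
import argparse
import re

# Matches simple placeholders such as {{ dataset }}. Real Jinja2 templates
# support far more (filters, loops, conditionals); this is only a stand-in.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_variables(template_text):
    """Return placeholder names in order of first appearance."""
    seen = []
    for name in PLACEHOLDER.findall(template_text):
        if name not in seen:
            seen.append(name)
    return seen


def build_argument_parser(template_text, known):
    """Create one required command line option per variable that is not yet known."""
    parser = argparse.ArgumentParser(prog="template_sketch")
    for name in find_variables(template_text):
        if name not in known:
            parser.add_argument(f"--{name}", required=True)
    return parser


def render(template_text, values):
    """Substitute every placeholder with its value."""
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template_text)


template = "data_dir: {{ __filedir__ }}/{{ dataset }}"
known = {"__filedir__": "/tmp/configs"}  # supplied by the parser, not by the user
parser = build_argument_parser(template, known)
args = parser.parse_args(["--dataset", "eeg"])  # stands in for real command line input
rendered = render(template, {**known, **vars(args)})
print(rendered)
```

Here `__filedir__` is already known, so only `dataset` has to be supplied on the command line.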
Typical structure for SimpleDictParsers¶
Configuration files for SimpleDictParser parsers (and subclasses) are dictionaries that require
the following keys:
data_loaders - a list of dictionaries, each of which defines a DataLoader object. A name key is required for each dictionary to link it to the appropriate Pipeline object.
pipelines - a list of dictionaries, each of which defines a Pipeline object. A data_from key is required that specifies the name of the DataLoader object from which the Pipeline should load its data.
config - a dictionary of configuration options used for additional configuration of helper classes like the Runner and logging.
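As an illustration of this layout, the sketch below writes a minimal configuration as a Python dictionary and checks the rules listed above. The `check_config` helper and the `HypotheticalLoader` name are made up for this example and are not part of brain_pipe.

```python
REQUIRED_KEYS = ("data_loaders", "pipelines", "config")


def check_config(config):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # Every data loader needs a name so that pipelines can refer to it.
    names = set()
    for i, loader in enumerate(config.get("data_loaders", [])):
        if "name" not in loader:
            problems.append(f"data_loaders[{i}] has no 'name'")
        else:
            names.add(loader["name"])

    # Every pipeline needs a data_from that matches a data loader name.
    for i, pipeline in enumerate(config.get("pipelines", [])):
        source = pipeline.get("data_from")
        if source is None:
            problems.append(f"pipelines[{i}] has no 'data_from'")
        elif source not in names:
            problems.append(f"pipelines[{i}] refers to unknown data loader {source!r}")
    return problems


example_config = {
    # 'HypotheticalLoader' is a placeholder, not a real brain_pipe class.
    "data_loaders": [{"name": "recordings", "callable": "HypotheticalLoader"}],
    "pipelines": [{"data_from": "recordings"}],
    "config": {},
}

print(check_config(example_config))
```

The same structure can be written in any file format that maps onto dictionaries and lists.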
Note
When the special key callable is used in a dictionary, the value of that key
will be treated as a callable object. A Finder will search for all
callable objects in the brain_pipe module and in any extra paths
defined in the config dictionary under the extra_paths key. The
other keys in the dictionary will be passed as keyword arguments to the
callable object.
Note
If you only want to pass a reference to a callable, rather than the result of
calling it, the is_pointer keyword can be used.
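The handling of callable and is_pointer can be sketched roughly as follows. This is an assumption-laden illustration, not the brain_pipe implementation: a plain dictionary (`REGISTRY`) stands in for the Finder, `scale` and `Window` are hypothetical callables, and is_pointer is assumed to be a boolean key placed next to callable.

```python
def scale(values, factor=1.0):
    """Hypothetical processing function."""
    return [value * factor for value in values]


class Window:
    """Hypothetical configurable class."""

    def __init__(self, size, overlap=0):
        self.size = size
        self.overlap = overlap


# Stand-in for the Finder: maps names to callables.
REGISTRY = {"scale": scale, "Window": Window}


def resolve(node, registry=REGISTRY):
    """Recursively replace dictionaries that contain a 'callable' key."""
    if isinstance(node, list):
        return [resolve(item, registry) for item in node]
    if not isinstance(node, dict):
        return node
    if "callable" not in node:
        return {key: resolve(value, registry) for key, value in node.items()}

    target = registry[node["callable"]]
    if node.get("is_pointer", False):
        # Assumed behaviour: hand over the callable itself without calling it.
        return target

    # All remaining keys become keyword arguments of the call.
    kwargs = {
        key: resolve(value, registry)
        for key, value in node.items()
        if key not in ("callable", "is_pointer")
    }
    return target(**kwargs)


window = resolve({"callable": "Window", "size": 64, "overlap": 8})
pointer = resolve({"callable": "scale", "is_pointer": True})
print(type(window).__name__, pointer is scale)
```

Resolving recursively means that a keyword argument can itself be a dictionary with a callable key, so nested objects are built from the inside out.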
Specifying the parser¶
By default, the CLI (see also Command Line Interface) will try to use the most appropriate
parser for the given input. Currently, the YAMLTemplateFileParser is the
most common default, as it supports YAML and JSON
files with or without Jinja2 templates.