brain_pipe.save.default.DefaultSave

class DefaultSave(root_dir: str, to_save: Mapping[str, Any] | None = None, overwrite: bool = False, clear_output: bool = False, filename_fn: Callable[[Dict[str, Any], str | None, str | None], str] = <brain_pipe.save.default.DefaultFilenameFn object>, metadata: SaveMetadata = <brain_pipe.save.default.DefaultSaveMetadata object>, save_fn: Callable[[Any, str], None] | Mapping[str, Callable[[Any, str], None]] | None = None, reload_fn: Callable[[str], Any] | Mapping[str, Callable[[str], Any]] | None = None, check_done: Callable[[Dict[str, Any], str, Dict[str, Any]], str | bool] | None = <brain_pipe.save.default.IsDoneCheck object>, check_reloadable: Callable[[Dict[str, Any], str, Dict[str, Any]], str | bool] | None = <brain_pipe.save.default.IsReloadableCheck object>)

Bases: Save

Default save class.

This class saves data_dicts to disk and also maintains a metadata file (Save.metadata_filename) that maps each unprocessed input filename to the one or more output filenames produced from it.
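As a rough illustration of the kind of mapping such a metadata file holds, the sketch below keeps one JSON object keyed by input filename, where each entry lists the output filenames produced from that input. The class name JsonSaveMetadata, its methods and the ".save_metadata.json" filename are hypothetical; the actual on-disk format of Save.metadata_filename is not described on this page.

```python
import json
import os
from typing import Dict, List


class JsonSaveMetadata:
    """Hypothetical metadata store: input filename -> list of output filenames."""

    def __init__(self, root_dir: str, filename: str = ".save_metadata.json"):
        self.path = os.path.join(root_dir, filename)

    def load(self) -> Dict[str, List[str]]:
        # A missing metadata file simply means nothing has been saved yet.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def add(self, input_filename: str, output_filename: str) -> None:
        # One input file can map to several outputs (e.g. one file per feature).
        metadata = self.load()
        outputs = metadata.setdefault(input_filename, [])
        if output_filename not in outputs:
            outputs.append(output_filename)
        with open(self.path, "w") as f:
            json.dump(metadata, f, indent=2)

    def outputs_for(self, input_filename: str) -> List[str]:
        # Return the recorded outputs, or an empty list if the input is unknown.
        return self.load().get(input_filename, [])
```

Because the store is re-read from disk on every call, a new instance pointed at the same root_dir sees earlier entries, which is what lets a rerun of a pipeline skip already processed inputs.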

Attributes

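DEFAULT_RELOAD_FUNCTIONS

Default mapping of file extensions to reload functions, used when reload_fn is None.

DEFAULT_SAVE_FUNCTIONS

Default mapping of file extensions to save functions, used when save_fn is None.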

overwrite

Whether to overwrite existing files.

__init__(root_dir: str, to_save: Mapping[str, Any] | None = None, overwrite: bool = False, clear_output: bool = False, filename_fn: Callable[[Dict[str, Any], str | None, str | None], str] = <brain_pipe.save.default.DefaultFilenameFn object>, metadata: SaveMetadata = <brain_pipe.save.default.DefaultSaveMetadata object>, save_fn: Callable[[Any, str], None] | Mapping[str, Callable[[Any, str], None]] | None = None, reload_fn: Callable[[str], Any] | Mapping[str, Callable[[str], Any]] | None = None, check_done: Callable[[Dict[str, Any], str, Dict[str, Any]], str | bool] | None = <brain_pipe.save.default.IsDoneCheck object>, check_reloadable: Callable[[Dict[str, Any], str, Dict[str, Any]], str | bool] | None = <brain_pipe.save.default.IsReloadableCheck object>)

Create a Save step.

Parameters:
  • root_dir (str) – The root directory where the data should be saved.

  • to_save (Optional[Mapping[str, Any]]) – The data to save. If None, the entire data_dict is saved. If a mapping from feature names to keys in the data_dict is given, only the data for those features is saved.

  • overwrite (bool) – Whether to overwrite existing files.

  • clear_output (bool) – Whether to clear the output data_dict after saving. This can save space when save is the last step in a pipeline.

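  • filename_fn (FilenameFnInterface) – A function to generate a filename for the data. The function should take the data_dict, the feature name, the set name and a separator as input and return a filename.

  • metadata (SaveMetadata) – The object that manages the metadata file (Save.metadata_filename), which maps input filenames to output filenames.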

  • save_fn (SaveFnInterface) – A function to save the data. The function should take the data and the filepath as inputs and save the data. If a mapping between file extensions and functions is given, the function corresponding to the file extension is used to save the data. If None, the default save functions (defined in self.DEFAULT_SAVE_FUNCTIONS) are used.

  • reload_fn (ReloadFnInterface) – A function to reload the data. The function should take the filepath as input and return the data. If a mapping between file extensions and functions is given, the function corresponding to the file extension is used to reload the data. If None, the default reload functions (defined in self.DEFAULT_RELOAD_FUNCTIONS) are used.

  • check_done (Optional[CheckInterface]) – A functor to check whether the data has already been saved. If None, no checking is done.

  • check_reloadable (Optional[CheckInterface]) – A functor to check whether the data can be reloaded. If None, no checking is done.
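The interplay of to_save, save_fn and clear_output described in the parameter list can be sketched in plain Python. This is a simplified stand-in, not brain_pipe's implementation: the helper save_features, the ".json"/".pickle" extension table (standing in for DEFAULT_SAVE_FUNCTIONS, whose actual contents are not listed here) and the "<feature><extension>" naming scheme (standing in for filename_fn) are all assumptions.

```python
import json
import os
import pickle
from typing import Any, Callable, Dict, Mapping, Optional


def _save_json(data: Any, path: str) -> None:
    with open(path, "w") as f:
        json.dump(data, f)


def _save_pickle(data: Any, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(data, f)


# Hypothetical extension -> save function table (stand-in for DEFAULT_SAVE_FUNCTIONS).
SAVE_FUNCTIONS: Dict[str, Callable[[Any, str], None]] = {
    ".json": _save_json,
    ".pickle": _save_pickle,
}


def save_features(
    data_dict: Dict[str, Any],
    root_dir: str,
    to_save: Optional[Mapping[str, Any]] = None,
    extension: str = ".pickle",
    clear_output: bool = False,
) -> Dict[str, Any]:
    """Save selected entries of data_dict, one file per feature (hypothetical helper)."""
    if to_save is None:
        # No selection given: save the whole data_dict as a single file.
        items = {"data_dict": dict(data_dict)}
    else:
        # Map each feature name to the data stored under the given data_dict key.
        items = {feature: data_dict[key] for feature, key in to_save.items()}
    save_fn = SAVE_FUNCTIONS[extension]  # pick the save function by file extension
    for feature, data in items.items():
        save_fn(data, os.path.join(root_dir, feature + extension))
    # With clear_output, hand an empty dict to later steps instead of the data.
    return {} if clear_output else data_dict
```

Selecting features through a mapping rather than a list lets the saved feature name differ from the data_dict key, which is why to_save maps feature names to keys.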

Methods

__init__(root_dir[, to_save, overwrite, ...])

Create a Save step.

is_already_done(data_dict)

Check whether the data_dict has already been saved.

is_reloadable(data_dict)

Check whether an already processed data_dict can be reloaded.

parse_dict_keys(key[, name, ...])

Parse a key or a sequence of keys.

reload(data_dict)

Reload the data_dict from the saved file.

is_already_done(data_dict: Dict[str, Any]) → bool

Check whether the data_dict has already been saved.

Parameters:

data_dict (Dict[str, Any]) – The data_dict to check.

Returns:

Whether the data_dict has already been saved. This is determined from the stored metadata.

Return type:

bool

is_reloadable(data_dict: Dict[str, Any]) → bool

Check whether an already processed data_dict can be reloaded.

Parameters:

data_dict (Dict[str, Any]) – The data_dict for which we want to reload the already processed version.

Returns:

Whether an already processed data_dict can be reloaded to continue processing.

Return type:

bool
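The two checks documented above can be sketched against a plain metadata dict. This assumes the metadata maps input filenames to lists of output paths, that the data_dict carries its input filename under a hypothetical "input_filename" key, and that "reloadable" means every recorded output still exists on disk. These are illustrative assumptions, not the behaviour of brain_pipe's IsDoneCheck or IsReloadableCheck.

```python
import os
from typing import Any, Dict, List


def is_already_done(data_dict: Dict[str, Any], metadata: Dict[str, List[str]]) -> bool:
    # "input_filename" is a hypothetical key identifying the unprocessed input.
    # Done if the metadata records at least one output for this input.
    return len(metadata.get(data_dict["input_filename"], [])) > 0


def is_reloadable(data_dict: Dict[str, Any], metadata: Dict[str, List[str]]) -> bool:
    outputs = metadata.get(data_dict["input_filename"], [])
    # Reloadable only if outputs were recorded and all of them still exist on disk.
    return len(outputs) > 0 and all(os.path.exists(path) for path in outputs)
```

Separating the two checks matters because an input can be recorded as done while its output files have since been deleted; in that case it is done but not reloadable.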

property overwrite

Whether to overwrite existing files.

Returns:

Whether to overwrite existing files.

Return type:

bool

parse_dict_keys(key: str | Sequence[str] | Mapping[str, str], name='key', require_ordered_dict=False) → OrderedDict[str, str]

Parse a key or a sequence of keys.

Parameters:
  • key (Union[str, Sequence[str], Mapping[str,str]]) – A key or a sequence of keys.

  • name (str) – The name of the key. Used for error messages.

  • require_ordered_dict (bool) – If True, a mapping passed as key must be an OrderedDict. If False, an ordinary dict is also accepted.

Returns:

A mapping of input keys to output keys.

Return type:

OrderedDict[str, str]

Raises:

TypeError – If the key is not a string, a sequence of strings or a mapping of strings, or if require_ordered_dict is True and the key is a mapping that is not an OrderedDict.
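The parsing rules above can be approximated by the standalone function below. It assumes that a single string, and each element of a sequence, maps to itself (input key equals output key); that identity mapping and the function name parse_keys are assumptions rather than documented behaviour.

```python
from collections import OrderedDict
from collections.abc import Mapping, Sequence
from typing import Union


def parse_keys(
    key: Union[str, Sequence, Mapping],
    name: str = "key",
    require_ordered_dict: bool = False,
) -> "OrderedDict[str, str]":
    # Check str first: a str is itself a Sequence.
    if isinstance(key, str):
        return OrderedDict([(key, key)])
    if isinstance(key, Mapping):
        if require_ordered_dict and not isinstance(key, OrderedDict):
            raise TypeError(f"{name} must be an OrderedDict, got {type(key).__name__}")
        if not all(isinstance(k, str) and isinstance(v, str) for k, v in key.items()):
            raise TypeError(f"{name} must map strings to strings")
        return OrderedDict(key)
    if isinstance(key, Sequence):
        if not all(isinstance(k, str) for k in key):
            raise TypeError(f"{name} must be a sequence of strings")
        # Each key in the sequence maps to itself, preserving order.
        return OrderedDict((k, k) for k in key)
    raise TypeError(f"{name} must be a string, a sequence of strings or a mapping of strings")
```

Requiring an OrderedDict is useful when the order of the output keys matters downstream; an ordinary dict preserves insertion order in modern Python, but the explicit type makes that intent visible to the caller.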

reload(data_dict: Dict[str, Any]) → Dict[str, Any]

Reload the data_dict from the saved file.

Parameters:

data_dict (Dict[str, Any]) – The data_dict for which we want to reload the already processed version.

Returns:

The reloaded data_dict.

Return type:

Dict[str, Any]
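A reload step of this kind could dispatch on the file extension, mirroring how reload_fn accepts a mapping from extensions to functions. The sketch below uses hypothetical ".json"/".pickle" loaders in place of DEFAULT_RELOAD_FUNCTIONS and assumes the saved file holds the whole data_dict; the name reload_data_dict is illustrative, and brain_pipe's actual reload logic may differ.

```python
import json
import os
import pickle
from typing import Any, Callable, Dict, Mapping


def _load_json(path: str) -> Any:
    with open(path) as f:
        return json.load(f)


def _load_pickle(path: str) -> Any:
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical extension -> reload function table (stand-in for DEFAULT_RELOAD_FUNCTIONS).
RELOAD_FUNCTIONS: Dict[str, Callable[[str], Any]] = {
    ".json": _load_json,
    ".pickle": _load_pickle,
}


def reload_data_dict(
    path: str, reload_fn: Mapping[str, Callable[[str], Any]] = RELOAD_FUNCTIONS
) -> Dict[str, Any]:
    # Pick the loader based on the file extension, e.g. ".pickle".
    extension = os.path.splitext(path)[1]
    if extension not in reload_fn:
        raise ValueError(f"No reload function registered for {extension!r}")
    data = reload_fn[extension](path)
    # The reloaded object must be a data_dict to continue processing.
    if not isinstance(data, dict):
        raise TypeError(f"Expected a data_dict in {path}, got {type(data).__name__}")
    return data
```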