brain_pipe.pipeline.cache.base.PipelineCache

class PipelineCache(cache_root: str, cache_key: str = 'cache', previous_cache_folder_key: str = 'previous_cache', serializer_fn: Callable[[Any, str], None] = <function pickle_dump_wrapper>, deserializer_fn: Callable[[str], Any] = <function pickle_load_wrapper>)

Bases: ABC

A cache to store intermediate data_dicts of a PreprocessingPipeline.

__init__(cache_root: str, cache_key: str = 'cache', previous_cache_folder_key: str = 'previous_cache', serializer_fn: Callable[[Any, str], None] = <function pickle_dump_wrapper>, deserializer_fn: Callable[[str], Any] = <function pickle_load_wrapper>)

Create a PipelineCache.

Parameters:
  • cache_root (str) – Path to the cache folder

  • cache_key (str) – Key to use to store the cache location

  • previous_cache_folder_key (str) – Key to store the previous location of the cache

  • serializer_fn (Callable[[Any, str], None]) – Function that can serialize data at a certain file location. The first argument is the data, second argument is the file location.

  • deserializer_fn (Callable[[str], Any]) – Function that can deserialize data at a certain file location. The first argument is the file location. The return value is the deserialized data.
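The two callables only need to match the signatures described above. As a minimal sketch, a JSON-based pair could look like the following; the names json_dump_wrapper and json_load_wrapper are hypothetical and are not part of brain_pipe.

```python
import json
import os
import tempfile
from typing import Any


def json_dump_wrapper(data: Any, path: str) -> None:
    """Serialize `data` to `path` (data first, file location second)."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp)


def json_load_wrapper(path: str) -> Any:
    """Deserialize and return whatever is stored at `path`."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp)


# Round trip through a temporary file to check that the pair is consistent.
with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "example.json")
    json_dump_wrapper({"subject": "sub-01", "n_samples": 128}, cache_file)
    restored = json_load_wrapper(cache_file)
```

Such a pair could presumably be passed as serializer_fn and deserializer_fn when constructing a concrete subclass. Note that JSON only handles basic types, so it is narrower than the default pickle wrappers.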

Methods

__init__(cache_root[, cache_key, ...])

Create a PipelineCache.

find_existing_cache_from_data_dict(step, ...)

Find existing cache files for a step.

find_existing_cache_from_previous_filename(...)

Find existing cache files for a step.

get_cache_dict(path, step, step_index)

Create a dictionary containing the cache information.

get_existing_cache_paths(step, data_dict, ...)

Get all existing cache paths for a step.

get_filename(data_dict)

Get the filename for a data_dict.

get_foldername(step, step_index)

Get the foldername for a step.

get_path(step, data_dict, step_index)

Create a path for a cache file.

load(path)

Load a cache file.

load_from_data_dict(data_dict)

Load a cache file from a data_dict.

predict_filenames_from_data_dict(data_dict)

Predict possible filenames from a data_dict.

predict_filenames_from_previous_filename(...)

Predict possible filenames from a previous filename.

predict_paths_from_data_dict(step, ...)

Predict the paths of the cache files that will be created by a step.

predict_paths_from_previous_filename(step, ...)

Predict the paths of the cache files that will be created by a step.

save(path, data_dict)

Save a cache file.

find_existing_cache_from_data_dict(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → Sequence[str]

Find existing cache files for a step.

Parameters:
  • step (PipelineStep) – PipelineStep that is being cached

  • data_dict (Dict[str, Any]) – Dictionary containing the data.

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Sequence of possibly existing cache paths

Return type:

Sequence[str]

find_existing_cache_from_previous_filename(step: PipelineStep, previous_filename: str, step_index: int | None) → Sequence[str]

Find existing cache files for a step.

Parameters:
  • step (PipelineStep) – PipelineStep that is being cached

  • previous_filename (str) – Path to the previous cache file.

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Sequence of possibly existing cache paths

Return type:

Sequence[str]

get_cache_dict(path: str, step: PipelineStep, step_index: int | None) → Dict[str, Any]

Create a dictionary containing the cache information.

Parameters:
  • path (str) – Path to the cache file

  • step (PipelineStep) – PipelineStep that is being cached

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Dictionary containing the cache information

Return type:

Dict[str, Any]

get_existing_cache_paths(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → Sequence[str]

Get all existing cache paths for a step.

Parameters:
  • step (PipelineStep) – PipelineStep that is being cached

  • data_dict (Dict[str, Any]) – Dictionary containing the data.

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Sequence of possibly existing cache paths

Return type:

Sequence[str]

abstract get_filename(data_dict: Dict[str, Any]) → str

Get the filename for a data_dict.

Parameters:

data_dict (Dict[str, Any]) – Dictionary containing the data

Returns:

Name of cache file

Return type:

str

abstract get_foldername(step: PipelineStep, step_index: int | None) → str

Get the foldername for a step.

Parameters:
  • step (PipelineStep) – PipelineStep that is being cached

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Name of cache folder

Return type:

str

get_path(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → str

Create a path for a cache file.

Parameters:
  • step (PipelineStep) – PipelineStep that is being cached

  • data_dict (Dict[str, Any]) – Dictionary containing the data.

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Path to the cache file

Return type:

str
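The documentation does not spell out how the path is assembled. One plausible reading, given cache_root, get_foldername and get_filename, is that the three are simply joined. The sketch below illustrates that assumption only; sketch_get_path and the example folder and file names are hypothetical.

```python
import os


def sketch_get_path(cache_root: str, foldername: str, filename: str) -> str:
    """Join the cache root, the per-step folder and the per-item filename.

    Assumption: this layout is a guess, not the documented behaviour.
    """
    return os.path.join(cache_root, foldername, filename)


example_path = sketch_get_path("cache_root", "0_load_step", "sub-01.pickle")
```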

load(path: str) → Dict[str, Any]

Load a cache file.

Parameters:

path (str) – Path to the cache file

Returns:

Dictionary containing the cache information

Return type:

Dict[str, Any]

load_from_data_dict(data_dict: Dict[str, Any]) → Dict[str, Any]

Load a cache file from a data_dict.

Parameters:

data_dict (Dict[str, Any]) – Dictionary containing the cache information

Returns:

Dictionary containing the cache information

Return type:

Dict[str, Any]

abstract predict_filenames_from_data_dict(data_dict: Dict[str, Any]) → Sequence[Sequence[str]]

Predict possible filenames from a data_dict.

Parameters:

data_dict (Dict[str, Any]) – Dictionary containing the data

Returns:

Sequence of sequences of possible filenames. All files in an innermost sequence should be (or have been) created at the same time

Return type:

Sequence[Sequence[str]]
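To illustrate the nested return value, the sketch below treats each inner sequence as a group of files that belong together and looks for the first group that is completely present on disk. This is an assumption about how the groups might be consumed, not the library's own lookup; first_complete_group and the file names are hypothetical.

```python
import os
import tempfile
from typing import Optional, Sequence


def first_complete_group(
    folder: str, filename_groups: Sequence[Sequence[str]]
) -> Optional[Sequence[str]]:
    """Return the paths of the first group whose files all exist, else None."""
    for group in filename_groups:
        paths = [os.path.join(folder, name) for name in group]
        # An empty group is skipped: it cannot point at any cached file.
        if paths and all(os.path.exists(path) for path in paths):
            return paths
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    # Only the second group is fully present on disk.
    for name in ("b_part1.pickle", "b_part2.pickle", "a_part1.pickle"):
        open(os.path.join(tmp_dir, name), "w").close()
    groups = [
        ["a_part1.pickle", "a_part2.pickle"],
        ["b_part1.pickle", "b_part2.pickle"],
    ]
    found = first_complete_group(tmp_dir, groups)
    found_names = [os.path.basename(path) for path in found]
```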

abstract predict_filenames_from_previous_filename(previous_filename: str) → Sequence[str]

Predict possible filenames from a previous filename.

Parameters:

previous_filename (str) – Filename of the previous step

Returns:

Sequence of possible filenames

Return type:

Sequence[str]
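The four abstract methods (get_filename, get_foldername, predict_filenames_from_data_dict and predict_filenames_from_previous_filename) together define a naming policy. The sketch below shows one possible policy against a stand-in base class, so that it runs without brain_pipe installed. CacheNamingSketch, OneFilePerInputNaming and the "data_path" key are all hypothetical; a real implementation would subclass PipelineCache instead.

```python
import os
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Sequence


class CacheNamingSketch(ABC):
    """Stand-in that mirrors only the four abstract methods listed above."""

    @abstractmethod
    def get_filename(self, data_dict: Dict[str, Any]) -> str: ...

    @abstractmethod
    def get_foldername(self, step: Any, step_index: Optional[int]) -> str: ...

    @abstractmethod
    def predict_filenames_from_data_dict(
        self, data_dict: Dict[str, Any]
    ) -> Sequence[Sequence[str]]: ...

    @abstractmethod
    def predict_filenames_from_previous_filename(
        self, previous_filename: str
    ) -> Sequence[str]: ...


class OneFilePerInputNaming(CacheNamingSketch):
    """Hypothetical policy: one cache file per input, named after it."""

    def get_filename(self, data_dict):
        # "data_path" is an assumed key, not one defined by the documentation.
        stem = os.path.splitext(os.path.basename(data_dict["data_path"]))[0]
        return stem + ".pickle"

    def get_foldername(self, step, step_index):
        # Prefix the step's class name with its index when one is given.
        name = type(step).__name__
        return name if step_index is None else f"{step_index}_{name}"

    def predict_filenames_from_data_dict(self, data_dict):
        # A single group holding a single file.
        return [[self.get_filename(data_dict)]]

    def predict_filenames_from_previous_filename(self, previous_filename):
        # The filename is kept unchanged from one step to the next.
        return [os.path.basename(previous_filename)]


naming = OneFilePerInputNaming()
filename = naming.get_filename({"data_path": "/data/sub-01.npy"})
foldername = naming.get_foldername(object(), 2)
```

Keeping the filename stable across steps makes the prediction from a previous filename trivial, at the cost of supporting only steps that produce exactly one output per input.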

predict_paths_from_data_dict(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → Sequence[str]

Predict the paths of the cache files that will be created by a step.

Parameters:
  • step (PipelineStep) – PipelineStep that is being cached

  • data_dict (Dict[str, Any]) – Dictionary containing the data. The path will be predicted from this.

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Sequence of predicted cache paths

Return type:

Sequence[str]

predict_paths_from_previous_filename(step: PipelineStep, previous_filename: str, step_index: int | None) → Sequence[str]

Predict the paths of the cache files that will be created by a step.

Parameters:
  • step (PipelineStep) – PipelineStep that is being cached

  • previous_filename (str) – Path to the previous cache file. The path will be predicted from this.

  • step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.

Returns:

Sequence of predicted cache paths

Return type:

Sequence[str]

save(path: str, data_dict: Dict[str, Any])

Save a cache file.

Parameters:
  • path (str) – Path to save the cache file to.

  • data_dict (Dict[str, Any]) – Dictionary containing the cache information

Returns:

Dictionary containing the cache information

Return type:

Dict[str, Any]