brain_pipe.pipeline.cache.base.PipelineCache
- class PipelineCache(cache_root: str, cache_key: str = 'cache', previous_cache_folder_key: str = 'previous_cache', serializer_fn: Callable[[Any, str], None] = pickle_dump_wrapper, deserializer_fn: Callable[[str], Any] = pickle_load_wrapper)
Bases: ABC
A cache to store intermediate data_dicts of a PreprocessingPipeline.
- __init__(cache_root: str, cache_key: str = 'cache', previous_cache_folder_key: str = 'previous_cache', serializer_fn: Callable[[Any, str], None] = pickle_dump_wrapper, deserializer_fn: Callable[[str], Any] = pickle_load_wrapper)
Create a PipelineCache.
- Parameters:
cache_root (str) – Path to the cache folder
cache_key (str) – Key to use to store the cache location
previous_cache_folder_key (str) – Key to store the previous location of the cache
serializer_fn (Callable[[Any, str], None]) – Function that can serialize data at a certain file location. The first argument is the data, second argument is the file location.
deserializer_fn (Callable[[str], Any]) – Function that can deserialize data at a certain file location. The first argument is the file location. The return value is the deserialized data.
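The serializer_fn and deserializer_fn signatures described above can be illustrated with a JSON-based pair. This is only a minimal sketch: the names json_dump_wrapper, json_load_wrapper and roundtrip are hypothetical and not part of brain_pipe; they simply follow the documented argument order (data first, file location second for serialization; file location only for deserialization).

```python
import json
import os
import tempfile
from typing import Any


def json_dump_wrapper(data: Any, path: str) -> None:
    """Hypothetical serializer: write ``data`` to ``path`` as JSON."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp)


def json_load_wrapper(path: str) -> Any:
    """Hypothetical deserializer: read and return the JSON stored at ``path``."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp)


def roundtrip(data: Any) -> Any:
    """Serialize ``data`` to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.json")
        json_dump_wrapper(data, path)
        return json_load_wrapper(path)
```

Functions shaped like these could presumably be passed as serializer_fn and deserializer_fn when constructing a concrete subclass, provided the cached data_dicts contain only JSON-compatible values.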
Methods
__init__(cache_root[, cache_key, ...]) – Create a PipelineCache.
find_existing_cache_from_data_dict(step, ...) – Find existing cache files for a step.
find_existing_cache_from_previous_filename(step, ...) – Find existing cache files for a step.
get_cache_dict(path, step, step_index) – Create a dictionary containing the cache information.
get_existing_cache_paths(step, data_dict, ...) – Get all existing cache paths for a step.
get_filename(data_dict) – Get the filename for a data_dict.
get_foldername(step, step_index) – Get the foldername for a step.
get_path(step, data_dict, step_index) – Create a path for a cache file.
load(path) – Load a cache file.
load_from_data_dict(data_dict) – Load a cache file from a data_dict.
predict_filenames_from_data_dict(data_dict) – Predict possible filenames from a data_dict.
predict_filenames_from_previous_filename(previous_filename) – Predict possible filenames from a previous filename.
predict_paths_from_data_dict(step, ...) – Predict the paths of the cache files that will be created by a step.
predict_paths_from_previous_filename(step, ...) – Predict the paths of the cache files that will be created by a step.
save(path, data_dict) – Save a cache file.
- find_existing_cache_from_data_dict(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → Sequence[str]
Find existing cache files for a step.
- Parameters:
step (PipelineStep) – PipelineStep that is being cached
data_dict (Dict[str, Any]) – Dictionary containing the data.
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Sequence of possibly existing cache paths
- Return type:
Sequence[str]
- find_existing_cache_from_previous_filename(step: PipelineStep, previous_filename: str, step_index: int | None) → Sequence[str]
Find existing cache files for a step.
- Parameters:
step (PipelineStep) – PipelineStep that is being cached
previous_filename (str) – Path to the previous cache file.
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Sequence of possibly existing cache paths
- Return type:
Sequence[str]
- get_cache_dict(path: str, step: PipelineStep, step_index: int | None) → Dict[str, Any]
Create a dictionary containing the cache information.
- Parameters:
path (str) – Path to the cache file
step (PipelineStep) – PipelineStep that is being cached
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Dictionary containing the cache information
- Return type:
Dict[str, Any]
- get_existing_cache_paths(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → Sequence[str]
Get all existing cache paths for a step.
- Parameters:
step (PipelineStep) – PipelineStep that is being cached
data_dict (Dict[str, Any]) – Dictionary containing the data.
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Sequence of possibly existing cache paths
- Return type:
Sequence[str]
- abstract get_foldername(step: PipelineStep, step_index: int | None) → str
Get the foldername for a step.
- Parameters:
step (PipelineStep) – PipelineStep that is being cached
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Name of cache folder
- Return type:
str
- get_path(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → str
Create a path for a cache file.
- Parameters:
step (PipelineStep) – PipelineStep that is being cached
data_dict (Dict[str, Any]) – Dictionary containing the data.
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Path to the cache file
- Return type:
str
- abstract predict_filenames_from_data_dict(data_dict: Dict[str, Any]) → Sequence[Sequence[str]]
Predict possible filenames from a data_dict.
- abstract predict_filenames_from_previous_filename(previous_filename: str) → Sequence[str]
Predict possible filenames from a previous filename.
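Because get_foldername, predict_filenames_from_data_dict and predict_filenames_from_previous_filename are abstract, a concrete cache has to supply a naming scheme. The sketch below shows one possible scheme on a stand-in class that does not import or subclass brain_pipe. The class names, the "data_path" key, the ".pkl" extension and the interpretation of the nested return value are all assumptions made for illustration, not the library's behaviour.

```python
import os
from typing import Any, Dict, Optional, Sequence


class DummyStep:
    """Hypothetical placeholder for a PipelineStep."""


class ExampleNamingScheme:
    """Stand-in class, NOT part of brain_pipe; mirrors the documented hooks."""

    def get_foldername(self, step: Any, step_index: Optional[int]) -> str:
        # Assumed scheme: step class name, prefixed by the index when present.
        name = type(step).__name__
        return name if step_index is None else f"{step_index}_{name}"

    def predict_filenames_from_data_dict(
        self, data_dict: Dict[str, Any]
    ) -> Sequence[Sequence[str]]:
        # Assumption: the data_dict stores its source file under "data_path".
        stem = os.path.splitext(os.path.basename(data_dict["data_path"]))[0]
        # Assumption: one inner sequence holding a single candidate filename.
        return [[f"{stem}.pkl"]]

    def predict_filenames_from_previous_filename(
        self, previous_filename: str
    ) -> Sequence[str]:
        # Assumed scheme: reuse the basename of the previous cache file.
        return [os.path.basename(previous_filename)]


scheme = ExampleNamingScheme()
folder = scheme.get_foldername(DummyStep(), 2)
```

In a real subclass of PipelineCache these methods would override the abstract ones, and the remaining path handling would be inherited from the base class.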
- predict_paths_from_data_dict(step: PipelineStep, data_dict: Dict[str, Any], step_index: int | None) → Sequence[str]
Predict the paths of the cache files that will be created by a step.
- Parameters:
step (PipelineStep) – PipelineStep that is being cached
data_dict (Dict[str, Any]) – Dictionary containing the data. The path will be predicted from this.
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Sequence of predicted cache paths
- Return type:
Sequence[str]
- predict_paths_from_previous_filename(step: PipelineStep, previous_filename: str, step_index: int | None) → Sequence[str]
Predict the paths of the cache files that will be created by a step.
- Parameters:
step (PipelineStep) – PipelineStep that is being cached
previous_filename (str) – Path to the previous cache file. The path will be predicted from this.
step_index (Optional[int]) – Index of the step in the pipeline. None if the step is not part of a pipeline.
- Returns:
Sequence of predicted cache paths
- Return type:
Sequence[str]
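The relationship between predicted filenames, predicted paths and existing cache files can be sketched as follows. This assumes, without confirmation from the source, that a cache path is the cache root joined with the step's foldername and a predicted filename, and that "finding existing cache" means keeping the predicted paths that are present on disk. The helper names predict_paths and existing_paths are hypothetical.

```python
import os
from typing import List, Sequence


def predict_paths(
    cache_root: str, foldername: str, filenames: Sequence[str]
) -> List[str]:
    """Assumed layout: <cache_root>/<foldername>/<filename>."""
    return [os.path.join(cache_root, foldername, name) for name in filenames]


def existing_paths(paths: Sequence[str]) -> List[str]:
    """Keep only the predicted paths that already exist on disk."""
    return [path for path in paths if os.path.exists(path)]
```

Under these assumptions, a step could be skipped when existing_paths returns every predicted path, since all of its outputs would already be cached.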