brain_pipe.split.base.Splitter

class Splitter(feature_mapping: Dict[str, Any] | Sequence[str] | str, split_fractions: Sequence[int | float], split_names: Sequence[str], extra_operation: SplitterOperation | None = None, axis=0)

Bases: PipelineStep, ABC

Base class for splitting data into sets.

__init__(feature_mapping: Dict[str, Any] | Sequence[str] | str, split_fractions: Sequence[int | float], split_names: Sequence[str], extra_operation: SplitterOperation | None = None, axis=0)

Create a splitter.

Parameters:
  • feature_mapping (Union[Dict[str, Any], Sequence[str], str]) – The keys of the data to split, given as a single key, a sequence of keys, or a mapping from input data keys to output keys.

  • split_fractions (Sequence[Union[int, float]]) – Fractions of the data to split into the different sets.

  • split_names (Sequence[str]) – Names of the different sets.

  • extra_operation (Optional[SplitterOperation]) – Operation to perform on the split data. If None, no operation is performed.

  • axis (int) – Axis to split the data on.
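
Since split_fractions accepts integers as well as floats, one plausible reading is that the values act as relative weights that are normalised to sum to 1, with one weight per entry in split_names. The sketch below illustrates that reading only. It is an assumption, not behaviour confirmed by this page, and normalise_fractions is a hypothetical helper, not part of brain_pipe.

```python
from typing import List, Sequence, Union


def normalise_fractions(
    split_fractions: Sequence[Union[int, float]],
    split_names: Sequence[str],
) -> List[float]:
    """Hypothetical helper: turn relative weights into fractions summing to 1."""
    # Assumption: every set name needs exactly one fraction.
    if len(split_fractions) != len(split_names):
        raise ValueError("split_fractions and split_names must have the same length.")
    total = sum(split_fractions)
    if total <= 0:
        raise ValueError("split_fractions must sum to a positive number.")
    return [fraction / total for fraction in split_fractions]


# Integer weights and float fractions describe the same 80/10/10 split.
weights = normalise_fractions([8, 1, 1], ["train", "val", "test"])
fractions = normalise_fractions([0.8, 0.1, 0.1], ["train", "val", "test"])
```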

Methods

__init__(feature_mapping, split_fractions, ...)

Create a splitter.

parse_dict_keys(key[, name, ...])

Parse a key or a sequence of keys.

split(data, shortest_length, split_fraction, ...)

Split the data into sets.

parse_dict_keys(key: str | Sequence[str] | Mapping[str, str], name='key', require_ordered_dict=False) → OrderedDict[str, str]

Parse a key or a sequence of keys.

Parameters:
  • key (Union[str, Sequence[str], Mapping[str,str]]) – A key or a sequence of keys.

  • name (str) – The name of the key. Used for error messages.

  • require_ordered_dict (bool) – If True, the key must be an OrderedDict. If False, the key can also be an ordinary dict.

Returns:

A mapping of input keys to output keys.

Return type:

OrderedDict[str, str]

Raises:

TypeError – If the key is not a string, a sequence of strings or a mapping of strings, or if require_ordered_dict is True and the key is a mapping that is not an OrderedDict.
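
The contract described above can be sketched as a standalone function. This is an illustration of the documented inputs, return type and errors, not the library's implementation. parse_keys_sketch is a hypothetical name, and mapping a string or sequence entry to itself is an assumption.

```python
from collections import OrderedDict
from collections.abc import Mapping, Sequence


def parse_keys_sketch(key, name="key", require_ordered_dict=False):
    """Illustrative stand-in for the documented contract of parse_dict_keys."""
    if isinstance(key, str):
        # Assumption: a single key maps to itself.
        items = [(key, key)]
    elif isinstance(key, Mapping):
        if require_ordered_dict and not isinstance(key, OrderedDict):
            raise TypeError(f"The {name} must be an OrderedDict.")
        items = list(key.items())
    elif isinstance(key, Sequence):
        # Assumption: each key in a sequence maps to itself.
        items = [(k, k) for k in key]
    else:
        raise TypeError(
            f"The {name} must be a string, a sequence of strings "
            "or a mapping of strings."
        )
    # Every input and output key has to be a string.
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in items):
        raise TypeError(f"The {name} must only contain strings.")
    return OrderedDict(items)


single = parse_keys_sketch("eeg")
several = parse_keys_sketch(["eeg", "envelope"])
renamed = parse_keys_sketch({"eeg": "eeg_split"})
```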

abstract split(data: Any, shortest_length: int, split_fraction: float, start_index: int) → Tuple[Any, int]

Split the data into sets.

Parameters:
  • data (Any) – Data to split.

  • shortest_length (int) – Length of the shortest data.

  • split_fraction (float) – Fraction of the data to split into the current set.

  • start_index (int) – Index to start splitting the data from.

Returns:

The split data and the index to start splitting the next data from.

Return type:

Tuple[Any, int]
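
Because split is abstract, each subclass decides how a set is cut out of the data. The sketch below shows one way the signature could be satisfied for plain sequences, written as a free function so that it runs without brain_pipe. split_sketch is a hypothetical name, and both the rounding rule and the assumption that split_fraction lies between 0 and 1 are illustrative choices.

```python
from typing import Any, Sequence, Tuple


def split_sketch(
    data: Sequence[Any],
    shortest_length: int,
    split_fraction: float,
    start_index: int,
) -> Tuple[Sequence[Any], int]:
    """Cut one contiguous set out of data and report where the next set starts."""
    # Assumption: split_fraction is a fraction of shortest_length in [0, 1].
    set_length = int(round(shortest_length * split_fraction))
    # Never read past the shortest data length.
    end_index = min(start_index + set_length, shortest_length)
    return data[start_index:end_index], end_index


# Chain the calls: the returned index is the start_index of the next set.
data = list(range(10))
sets = {}
start = 0
for set_name, fraction in zip(["train", "val", "test"], [0.8, 0.1, 0.1]):
    sets[set_name], start = split_sketch(data, len(data), fraction, start)
```

Returning the end index alongside the split data is what lets consecutive sets be carved out without overlap.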