romcomma.data.storage.Fold

class Fold(parent, k, **kwargs)

Bases: Repository

A Fold is defined as a folder containing a data.csv, a meta.json file and a test.csv file. A Fold is a Repository equipped with a test_data pd.DataFrame backed by test.csv.

Additionally, a fold can reduce the dimensionality M of the input X.

__init__(parent, k, **kwargs)

Initialize Fold by reading existing files. Creation is handled by the classmethod Fold.from_dfs.

Parameters:
  • parent (Repository) – The parent Repository.

  • k (int) – The index of the Fold within parent.

  • M – The number of input columns to use. Unless 0 < M < self.M, all columns are used.
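
The rule for M can be read as a simple clamp. The following is a minimal sketch of that rule only; `effective_input_count` is a hypothetical helper written for illustration and is not part of romcomma.

```python
def effective_input_count(requested_m: int, available_m: int) -> int:
    """Return how many input columns to use.

    Hypothetical helper (not romcomma API): mirrors the documented rule
    that all columns are used unless 0 < M < self.M.
    """
    if 0 < requested_m < available_m:
        return requested_m   # a genuine reduction of the input dimension
    return available_m       # anything else falls back to every column
```

Under this reading, a request of 2 columns out of 5 is honoured, while requests of 0, 5, 7 or any negative number all resolve to 5.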

Methods

Y_split()

Split this Repository into L Y_splits.

__init__(parent, k, **kwargs)

Initialize Fold by reading existing files.

clean_copy(dst)

Make a clean copy of this repo.

fold_folder(k)

from_csv(folder, csv[, meta])

Create a Repository from a csv file.

from_df(folder, df[, meta])

Create a Repository from a pd.DataFrame.

from_dfs(parent, k, data, test_data[, ...])

Create a Fold from training and test pd.DataFrames.

into_K_folds(K[, shuffle_before_folding, ...])

Fold this repo into K Folds, indexed by range(K).

read_meta()

rotate_folds(rotation)

Uniformly rotate the Folds in a Repository.

write_meta()

Attributes

CSV_OPTIONS

K

The number of folds contained in this Repository.

L

The number of output columns in self.data.

M

The number of input columns in self.data.

META

N

The number of samples (rows of data).

X

The input X, as an (N,M) design Matrix with column headings.

X_rotation

The rotation matrix applied to the input variables self.X, stored in X_rotation.csv.

Y

The output Y as an (N,L) Matrix with column headings.

Y_splits

Lists the index and path of every Y_split in this Repository.

data

folder

folds

The indices of the folds contained in this Repository.

meta

normalization

test_csv

test_data

test_x

The test_data input x, as an (n,M) design Matrix with column headings.

test_y

The test_data output y as an (n,L) Matrix with column headings.

property test_x: DataFrame

The test_data input x, as an (n,M) design Matrix with column headings.

property test_y: DataFrame

The test_data output y as an (n,L) Matrix with column headings.
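
As an illustration of the (n,M) and (n,L) shapes, here is a small pandas sketch. It assumes, without confirmation from this page, that test_data carries two-level column headings whose top level is "X" for inputs and "Y" for outputs; the column names and values are made up.

```python
import pandas as pd

# Assumed layout: a two-level column header, top level "X" (inputs) or "Y" (outputs).
columns = pd.MultiIndex.from_tuples([("X", "x0"), ("X", "x1"), ("Y", "y0")])
test_data = pd.DataFrame([[0.1, 0.2, 1.0],
                          [0.3, 0.4, 2.0]], columns=columns)

# Selecting on the top level keeps the lower-level column headings.
test_x = test_data["X"]   # shape (n, M) = (2, 2)
test_y = test_data["Y"]   # shape (n, L) = (2, 1)
```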

property X_rotation: ndarray

The rotation matrix applied to the input variables self.X, stored in X_rotation.csv. Rotations are applied and stored cumulatively.
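
To make "cumulatively" concrete, the sketch below composes two rotations into one stored matrix. It assumes that rows of X are samples and that a rotation R acts as X @ R.T; this convention and the helper names are assumptions for illustration, not taken from romcomma.

```python
import numpy as np

def rotation_2d(theta: float) -> np.ndarray:
    """A plain 2-D rotation matrix, used only to build example inputs."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def accumulate_rotation(stored: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Fold a new rotation into the stored cumulative one.

    Assumed convention: a rotation R is applied as X @ R.T, so applying
    `stored` and then `new` equals applying the single matrix new @ stored.
    """
    return new @ stored

X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
R1, R2 = rotation_2d(0.4), rotation_2d(-1.1)

step_by_step = (X @ R1.T) @ R2.T            # rotate twice in sequence
cumulative = accumulate_rotation(R1, R2)    # the matrix that would be stored
all_at_once = X @ cumulative.T              # rotate once with the stored matrix
```

Both routes give the same rotated inputs, which is why a single cumulative matrix is enough to record the full history under this convention.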

classmethod from_dfs(parent, k, data, test_data, normalization=None)

Create a Fold from training and test pd.DataFrames.

Parameters:
  • parent (Repository) – The parent Repository.

  • k (int) – The index of the fold to be created.

  • data (DataFrame) – Training data.

  • test_data (DataFrame) – Test data.

  • normalization (Path | str | None) – An optional normalization.csv file to use.

Return type:

Fold

Returns: The Fold created.

property K: int

The number of folds contained in this Repository.

property L: int

The number of output columns in self.data.

property M: int

The number of input columns in self.data.

property N: int

The number of samples (rows of data).

property X: DataFrame

The input X, as an (N,M) design Matrix with column headings.

property Y: DataFrame

The output Y as an (N,L) Matrix with column headings.

Y_split()

Split this Repository into L Y_splits. Each Y.l is just a Repository containing the lth output only.

Raises:

TypeError – if self is a Fold.

property Y_splits: List[Tuple[int, Path]]

Lists the index and path of every Y_split in this Repository.

clean_copy(dst)

Make a clean copy of this repo.

Parameters:

dst (Path | str) – The location of the copy.

property folds: range

The indices of the folds contained in this Repository.

classmethod from_csv(folder, csv, meta=None, **kwargs)

Create a Repository from a csv file.

Return type:

Repository

Returns: A new Repository located in folder.

classmethod from_df(folder, df, meta=None)

Create a Repository from a pd.DataFrame.

Parameters:
  • folder (Path | str) – The location (folder) of the Repository.

  • df (DataFrame) – The data to record in [Return].csv.

  • meta (Dict | None) – The metadata to record in [Return].meta.json.

Return type:

Repository

Returns: A new Repository.

into_K_folds(K, shuffle_before_folding=False, normalization=None)

Fold this repo into K Folds, indexed by range(K).

Parameters:
  • K (int) – The number of Folds, of absolute value between 1 and N inclusive. By default an improper Fold, indexed by K and using all the data for both training and testing, is also included. To suppress it, give K as a negative integer.

  • shuffle_before_folding (bool) – Whether to shuffle the data before sampling.

  • normalization (Path | str | None) – An optional normalization.csv file to use.

Return type:

Repository

Returns: self, for chaining calls.

Raises:

IndexError – unless 1 <= |K| <= N.
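
The sign convention for K is easiest to see on row indices. The sketch below is not romcomma's implementation: `k_fold_indices` is a hypothetical helper that assumes contiguous, near-equal test blocks and no shuffling, and it only illustrates how the improper Fold appears for positive K and disappears for negative K.

```python
from typing import Dict, List, Tuple

def k_fold_indices(n: int, k: int) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each fold index to (training rows, test rows).

    Hypothetical helper: contiguous test blocks are an assumption.
    """
    if not 1 <= abs(k) <= n:
        raise IndexError(f"Need 1 <= |K| <= N, got K={k}, N={n}")
    n_folds = abs(k)
    rows = list(range(n))
    folds: Dict[int, Tuple[List[int], List[int]]] = {}
    for i in range(n_folds):
        start = i * n // n_folds
        stop = (i + 1) * n // n_folds
        # Fold i tests on its own block and trains on everything else.
        folds[i] = (rows[:start] + rows[stop:], rows[start:stop])
    if k > 0:
        # The improper fold, indexed by K, trains and tests on all rows.
        folds[n_folds] = (rows, rows)
    return folds
```

With N = 6, calling `k_fold_indices(6, 3)` yields folds 0 to 3, where fold 3 is the improper one, while `k_fold_indices(6, -3)` yields only folds 0 to 2.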

rotate_folds(rotation)

Uniformly rotate the Folds in a Repository. The rotation (like normalization) applies to each fold, not the repo itself.

Parameters:
  • rotation (ndarray | None) – The (M,M) rotation matrix to apply to the inputs. If None, the identity matrix is used. If the matrix supplied has the wrong dimensions or is not orthogonal, a random rotation is generated and used instead.

Return type:

Repository

Returns: self, for chaining calls.
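
The handling of the rotation argument can be sketched with NumPy as follows. `resolve_rotation` is a hypothetical helper, and drawing the random matrix from the QR decomposition of a Gaussian matrix is an assumed technique; this page does not say how romcomma generates its random rotation.

```python
import numpy as np

def resolve_rotation(rotation, m, rng=None, tol=1e-8):
    """Return the (m, m) matrix that would actually be applied.

    Hypothetical helper illustrating the documented rules:
    None gives the identity; a wrong-shaped or non-orthogonal
    matrix is replaced by a random orthogonal one.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rotation is None:
        return np.eye(m)
    rotation = np.asarray(rotation, dtype=float)
    if rotation.shape == (m, m) and np.allclose(rotation @ rotation.T, np.eye(m), atol=tol):
        return rotation
    # Assumed fallback: Q from a QR decomposition of a Gaussian matrix is orthogonal.
    q, r = np.linalg.qr(rng.standard_normal((m, m)))
    # Fix column signs so the result does not depend on the QR sign convention.
    # The determinant may be +1 or -1, so this is orthogonal but not always a proper rotation.
    return q * np.sign(np.diag(r))
```

Passing a seeded generator, for example `resolve_rotation(np.eye(2), 3, rng=np.random.default_rng(0))`, makes the fallback reproducible.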