romcomma.data.storage.Fold

class Fold(parent, k, **kwargs)

Bases: Repository

A Fold is defined as a folder containing a data.csv, a meta.json file and a test.csv file. A Fold is a Repository equipped with a test_data pd.DataFrame backed by test.csv.

Additionally, a fold can reduce the dimensionality M of the input X.
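The folder layout described above (data.csv, meta.json and test.csv) can be checked before a Fold is constructed. The following is a minimal sketch using only the standard library; the helper name `missing_fold_files` and the folder name `fold.0` are hypothetical illustrations, not part of romcomma.

```python
import tempfile
from pathlib import Path

# The three files that the description above says define a Fold.
FOLD_FILES = ("data.csv", "meta.json", "test.csv")


def missing_fold_files(folder):
    """Return the names of the expected Fold files absent from folder."""
    folder = Path(folder)
    return [name for name in FOLD_FILES if not (folder / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    fold_dir = Path(tmp) / "fold.0"  # hypothetical folder name
    fold_dir.mkdir()
    (fold_dir / "data.csv").touch()
    (fold_dir / "meta.json").touch()
    incomplete = missing_fold_files(fold_dir)  # test.csv not written yet
    (fold_dir / "test.csv").touch()
    complete = missing_fold_files(fold_dir)  # all three files present
```

An empty list indicates that the folder has the three files a Fold expects; it says nothing about whether their contents are valid.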

Parameters:

parent (Repository) – The parent Repository.
k (int) – The index of the Fold within parent.

__init__(parent, k, **kwargs)

Initialize Fold by reading existing files. Creation is handled by the classmethod Fold.from_dfs.

Parameters:

parent (Repository) – The parent Repository.
k (int) – The index of the Fold within parent.
M – The number of input columns used. Unless 0 < M < self.M, all columns are used.
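The rule for M can be read as a simple clamp: a requested value strictly between 0 and the number of available input columns is honoured, and anything else falls back to all columns. The sketch below illustrates only that rule; `effective_M` is a hypothetical helper, not a romcomma function, and it does not say which columns are retained.

```python
def effective_M(requested_M: int, available_M: int) -> int:
    """Return the number of input columns used, following the rule above."""
    if 0 < requested_M < available_M:
        return requested_M  # a genuine reduction in dimensionality
    return available_M  # out-of-range requests fall back to all columns


# With 5 available input columns, only requests of 1 to 4 reduce the inputs.
used = [effective_M(m, 5) for m in (-1, 0, 2, 5, 9)]
```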

Methods

`Y_split`()	Split this Repository into L Y_splits.
`__init__`(parent, k, **kwargs)	Initialize Fold by reading existing files.
`clean_copy`(dst)	Make a clean copy of this repo.
`fold_folder`(k)
`from_csv`(folder, csv[, meta])	Create a Repository from a csv file.
`from_df`(folder, df[, meta])	Create a Repository from a pd.DataFrame.
`from_dfs`(parent, k, data, test_data[, ...])	Create a Fold from training and test pd.DataFrames.
`into_K_folds`(K[, shuffle_before_folding, ...])	Fold this repo into K Folds, indexed by range(K).
`read_meta`()
`rotate_folds`(rotation)	Uniformly rotate the Folds in a Repository.
`write_meta`()

Attributes

`CSV_OPTIONS`
`K`	The number of folds contained in this Repository.
`L`	The number of output columns in self.data.
`M`	The number of input columns in self.data.
`META`
`N`	The number of samples (rows of data).
`X`	The input X, as an (N,M) design Matrix with column headings.
`X_rotation`	The rotation matrix applied to the input variables self.X, stored in X_rotation.csv.
`Y`	The output Y as an (N,L) Matrix with column headings.
`Y_splits`	Lists the index and path of every Y_split in this Repository.
`data`
`folder`
`folds`	The indices of the folds contained in this Repository.
`meta`
`normalization`
`test_csv`
`test_data`
`test_x`	The test_data input x, as an (n,M) design Matrix with column headings.
`test_y`	The test_data output y as an (n,L) Matrix with column headings.

property test_x: DataFrame

The test_data input x, as an (n,M) design Matrix with column headings.

property test_y: DataFrame

The test_data output y as an (n,L) Matrix with column headings.

property X_rotation: ndarray

The rotation matrix applied to the input variables self.X, stored in X_rotation.csv. Rotations are applied and stored cumulatively.

classmethod from_dfs(parent, k, data, test_data, normalization=None)

Create a Fold from training and test pd.DataFrames.

Parameters:

parent (Repository) – The parent Repository.
k (int) – The index of the fold to be created.
data (DataFrame) – Training data.
test_data (DataFrame) – Test data.
normalization (Path | str | None) – An optional normalization.csv file to use.

Return type:

Fold

Returns: The Fold created.

property K: int

The number of folds contained in this Repository.

property L: int

The number of output columns in self.data.

property M: int

The number of input columns in self.data.

property N: int

The number of samples (rows of data).

property X: DataFrame

The input X, as an (N,M) design Matrix with column headings.

property Y: DataFrame

The output Y as an (N,L) Matrix with column headings.

Y_split()

Split this Repository into L Y_splits. Each Y.l is just a Repository containing the lth output only.

Raises: TypeError – if self is a Fold.

property Y_splits: List[Tuple[int, Path]]

Lists the index and path of every Y_split in this Repository.

clean_copy(dst)

Make a clean copy of this repo.

Parameters:

dst (Path | str) – The location of the copy.

property folds: range

The indices of the folds contained in this Repository.

classmethod from_csv(folder, csv, meta=None, **kwargs)

Create a Repository from a csv file.

Parameters:

folder (Path | str) – The location (folder) of the target Repository.
csv (Path | str) – The file containing the data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.
kwargs – Updates Repository.CSV_OPTIONS for reading the csv file, as detailed in https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pd.read_csv.html.

Return type:

Repository

Returns: A new Repository located in folder.

classmethod from_df(folder, df, meta=None)

Create a Repository from a pd.DataFrame.

Parameters:

folder (Path | str) – The location (folder) of the Repository.
df (DataFrame) – The data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.

Return type:

Repository

Returns: A new Repository.

into_K_folds(K, shuffle_before_folding=False, normalization=None)

Fold this repo into K Folds, indexed by range(K).

Parameters:

K (int) – The number of Folds, of absolute value between 1 and N inclusive. An improper Fold, indexed by K and including all data for both training and testing, is included by default. To suppress this, give K as a negative integer.
shuffle_before_folding (bool) – Whether to shuffle the data before sampling.
normalization (Path | str | None) – An optional normalization.csv file to use.

Return type:

Repository

Returns: self, for chaining calls.

Raises: IndexError – unless 1 <= K <= N, with K taken in absolute value.
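The meaning of K (its sign, its bounds and the improper Fold) can be illustrated on row indices alone. The sketch below is not romcomma's implementation: `k_fold_indices` is a hypothetical helper, and both the strided assignment of rows to test sets and the `seed` parameter are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def k_fold_indices(N: int, K: int, shuffle: bool = False,
                   seed: int = 0) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each fold index to a (train_rows, test_rows) pair."""
    n_folds = abs(K)
    if not 1 <= n_folds <= N:
        raise IndexError(f"Need 1 <= |K| <= N, got K={K} and N={N}.")
    rows = list(range(N))
    if shuffle:
        random.Random(seed).shuffle(rows)  # shuffle before folding
    folds = {}
    for k in range(n_folds):
        test = rows[k::n_folds]  # assumption: strided assignment to folds
        held_out = set(test)
        train = [r for r in rows if r not in held_out]
        folds[k] = (train, test)
    if K > 0:
        # The improper fold, indexed by K, trains and tests on all the data.
        folds[K] = (list(rows), list(rows))
    return folds


with_improper = k_fold_indices(N=6, K=3)      # folds 0, 1, 2 plus improper fold 3
without_improper = k_fold_indices(N=6, K=-3)  # folds 0, 1, 2 only
```

Passing a positive K yields K + 1 entries, the last being the improper fold; a negative K yields exactly |K| proper folds.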

rotate_folds(rotation)

Uniformly rotate the Folds in a Repository. The rotation (like normalization) applies to each fold, not the repo itself.

Parameters:

rotation (ndarray | None) – The (M,M) rotation matrix to apply to the inputs. If None, the identity matrix is used. If the matrix supplied has the wrong dimensions or is not orthogonal, a random rotation is generated and used instead.

Return type:

Repository

Returns: self, for chaining calls.
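The fallback rules for the rotation argument of rotate_folds (None gives the identity; a matrix of the wrong shape or a non-orthogonal matrix is replaced by a random one) can be sketched with NumPy. This is an illustration under assumptions rather than romcomma's code: `resolve_rotation`, its tolerance `tol`, and the QR-based random matrix are all hypothetical choices.

```python
import numpy as np


def resolve_rotation(rotation, M, rng=None, tol=1e-8):
    """Return a usable (M, M) matrix following the documented fallbacks."""
    if rotation is None:
        return np.eye(M)  # None means the identity
    rotation = np.asarray(rotation, dtype=float)
    is_square = rotation.shape == (M, M)
    if is_square and np.allclose(rotation @ rotation.T, np.eye(M), atol=tol):
        return rotation  # correct shape and orthogonal: use as supplied
    # Assumed fallback: the Q factor of a Gaussian matrix is orthogonal.
    # Its determinant may be -1, so it can include a reflection.
    rng = np.random.default_rng() if rng is None else rng
    q, _ = np.linalg.qr(rng.standard_normal((M, M)))
    return q


quarter_turn = np.array([[0.0, -1.0], [1.0, 0.0]])
kept = resolve_rotation(quarter_turn, M=2)        # orthogonal, so kept as is
replaced = resolve_rotation(2.0 * np.eye(2), M=2,  # not orthogonal, so replaced
                            rng=np.random.default_rng(0))
```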