romcomma.data.storage.Repository

class Repository(folder, **kwargs)

Bases: object

A Repository is a folder containing a data.csv file and a meta.json file.

These files specify the global dataset to be analyzed. This dataset must be further split into Folds contained within the Repository.

Parameters: folder (Path | str) – The location (folder) of the Repository.
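The folder layout described above can be illustrated with a minimal, standard-library sketch. It does not use romcomma itself: the helper `make_repo_layout`, the column names, and the metadata keys are all hypothetical, and the column-header layout that the library expects in data.csv is not specified on this page.

```python
import csv
import json
import tempfile
from pathlib import Path


def make_repo_layout(folder, rows, meta):
    """Write rows to folder/data.csv and meta to folder/meta.json (illustrative only)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    with open(folder / "data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    with open(folder / "meta.json", "w") as f:
        json.dump(meta, f, indent=2)
    return folder


# Hypothetical data: two input columns (x0, x1) and one output column (y0).
rows = [
    {"x0": 0.0, "x1": 1.0, "y0": 2.0},
    {"x0": 0.5, "x1": 1.5, "y0": 2.5},
]
repo_folder = make_repo_layout(
    Path(tempfile.mkdtemp()) / "my_repo", rows, meta={"origin": "example"}
)
print(sorted(p.name for p in repo_folder.iterdir()))  # ['data.csv', 'meta.json']
```

In practice the from_df and from_csv classmethods documented on this page are the intended way to create this layout.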

__init__(folder, **kwargs)

Parameters: folder (Path | str) – The location (folder) of the Repository.

Methods

`Y_split`()	Split this Repository into L Y_splits.
`__init__`(folder, **kwargs)
`clean_copy`(dst)	Make a clean copy of this repo.
`fold_folder`(k)
`from_csv`(folder, csv[, meta])	Create a Repository from a csv file.
`from_df`(folder, df[, meta])	Create a Repository from a pd.DataFrame.
`into_K_folds`(K[, shuffle_before_folding, ...])	Fold this repo into K Folds, indexed by range(K).
`read_meta`()
`rotate_folds`(rotation)	Uniformly rotate the Folds in a Repository.
`write_meta`()

Attributes

`CSV_OPTIONS`
`K`	The number of folds contained in this Repository.
`L`	The number of output columns in self.data.
`M`	The number of input columns in self.data.
`META`
`N`	The number of samples (rows of data).
`X`	The input X, as an (N,M) design Matrix with column headings.
`Y`	The output Y as an (N,L) Matrix with column headings.
`Y_splits`	Lists the index and path of every Y_split in this Repository.
`data`
`folder`
`folds`	The indices of the folds contained in this Repository.
`meta`

property X: DataFrame

The input X, as an (N,M) design Matrix with column headings.

property Y: DataFrame

The output Y, as an (N,L) Matrix with column headings.

property N: int

The number of samples (rows of data).

property M: int

The number of input columns in self.data.

property L: int

The number of output columns in self.data.

property K: int

The number of folds contained in this Repository.

clean_copy(dst)

Make a clean copy of this repo.

Parameters: dst (Path | str) – The location of the copy.

property folds: range

The indices of the folds contained in this Repository.

into_K_folds(K, shuffle_before_folding=False, normalization=None)

Fold this repo into K Folds, indexed by range(K).

Parameters:

K (int) – The number of Folds, of absolute value between 1 and N inclusive. By default, an improper Fold is also included, indexed by K and containing all the data for both training and testing. To suppress it, give K as a negative integer.
shuffle_before_folding (bool) – Whether to shuffle the data before sampling.
normalization (Path | str | None) – An optional normalization.csv file to use.

Return type:

Repository

Returns: self, for chaining calls.

Raises: IndexError – Unless 1 <= abs(K) <= N.
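The folding rules for K can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: the function name `k_fold_indices`, the `seed` argument, and the interleaved assignment of rows to Folds are all hypothetical. Only the bounds check, the extra improper Fold indexed by K, and its suppression by a negative K come from the description above.

```python
import random


def k_fold_indices(N, K, shuffle_before_folding=False, seed=None):
    """Return {fold_index: (train_rows, test_rows)} for a dataset of N rows.

    Illustrative only: how the library assigns rows to Folds is an assumption.
    """
    if not 1 <= abs(K) <= N:
        raise IndexError(f"Need 1 <= abs(K) <= N, got K={K}, N={N}")
    include_improper = K > 0  # a negative K suppresses the improper Fold
    K = abs(K)
    rows = list(range(N))
    if shuffle_before_folding:
        random.Random(seed).shuffle(rows)
    folds = {}
    for k in range(K):
        test = rows[k::K]  # every K-th row, starting at position k
        test_set = set(test)
        # Note: with K == 1 this simple scheme leaves the training set empty.
        train = [i for i in rows if i not in test_set]
        folds[k] = (train, test)
    if include_improper:
        # The improper Fold, indexed by K, uses all data for training and testing.
        folds[K] = (list(rows), list(rows))
    return folds


folds = k_fold_indices(N=6, K=3)
print(sorted(folds))  # [0, 1, 2, 3]
print(folds[0])       # ([1, 2, 4, 5], [0, 3])
print(sorted(k_fold_indices(N=6, K=-3)))  # [0, 1, 2]
```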

rotate_folds(rotation)

Uniformly rotate the Folds in a Repository. The rotation (like normalization) applies to each fold, not the repo itself.

Parameters:

rotation (ndarray | None) – The (M,M) rotation matrix to apply to the inputs. If None, the identity matrix is used. If the matrix supplied has the wrong dimensions or is not orthogonal, a random rotation is generated and used instead.

Return type:

Repository

Returns: self, for chaining calls.
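The fallback behaviour described for the rotation argument can be sketched as below. This is a dependency-free illustration, not the library's code: plain nested lists stand in for the ndarray, and the names `is_orthogonal`, `random_orthogonal`, `resolve_rotation`, the tolerance, and the `seed` argument are all hypothetical. How the library actually generates its random rotation is not stated on this page; Gram-Schmidt on Gaussian vectors is used here as one common choice, and it may produce a reflection as well as a proper rotation.

```python
import random


def is_orthogonal(R, M, tol=1e-9):
    """True if R is an (M, M) nested list whose rows are orthonormal."""
    if len(R) != M or any(len(row) != M for row in R):
        return False
    for i in range(M):
        for j in range(M):
            dot = sum(R[i][k] * R[j][k] for k in range(M))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


def random_orthogonal(M, rng):
    """Build a random orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < M:
        v = [rng.gauss(0.0, 1.0) for _ in range(M)]
        for b in basis:  # remove the components along the vectors found so far
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > 1e-12:  # skip the (very unlikely) degenerate draw
            basis.append([vi / norm for vi in v])
    return basis


def resolve_rotation(rotation, M, seed=None):
    """Pick the matrix to apply: identity, the supplied one, or a random fallback."""
    if rotation is None:
        return [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
    if is_orthogonal(rotation, M):
        return [list(row) for row in rotation]
    return random_orthogonal(M, random.Random(seed))


print(resolve_rotation(None, 2))  # [[1.0, 0.0], [0.0, 1.0]]
not_orthogonal = [[1.0, 2.0], [3.0, 4.0]]
print(is_orthogonal(resolve_rotation(not_orthogonal, 2, seed=0), 2))  # True
```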

Y_split()

Split this Repository into L Y_splits. Each Y.l is just a Repository containing the lth output only.

Raises: TypeError – if self is a Fold.
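The idea of one dataset per output column can be sketched in plain Python. This is a simplified illustration rather than the library's implementation: the function `y_split`, the row-of-dicts representation, and the column names are hypothetical, and it assumes that each split keeps every input column alongside its single output.

```python
def y_split(rows, input_cols, output_cols):
    """Return one dataset per output column, each keeping all inputs and one output."""
    splits = []
    for out in output_cols:
        keep = list(input_cols) + [out]
        splits.append([{c: row[c] for c in keep} for row in rows])
    return splits


# Hypothetical data with M = 2 inputs and L = 2 outputs.
rows = [
    {"x0": 0.0, "x1": 1.0, "y0": 2.0, "y1": 3.0},
    {"x0": 0.5, "x1": 1.5, "y0": 2.5, "y1": 3.5},
]
splits = y_split(rows, input_cols=["x0", "x1"], output_cols=["y0", "y1"])
print(len(splits))   # 2
print(splits[1][0])  # {'x0': 0.0, 'x1': 1.0, 'y1': 3.0}
```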

property Y_splits: List[Tuple[int, Path]]

Lists the index and path of every Y_split in this Repository.

classmethod from_df(folder, df, meta=None)

Create a Repository from a pd.DataFrame.

Parameters:

folder (Path | str) – The location (folder) of the Repository.
df (DataFrame) – The data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.

Return type:

Repository

Returns: A new Repository.

classmethod from_csv(folder, csv, meta=None, **kwargs)

Create a Repository from a csv file.

Parameters:

folder (Path | str) – The location (folder) of the target Repository.
csv (Path | str) – The file containing the data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.
kwargs – Updates Repository.CSV_OPTIONS for reading the csv file, as detailed in https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pd.read_csv.html.

Return type:

Repository

Returns: A new Repository located in folder.