romcomma.data.storage.Repository

class Repository(folder, **kwargs)

Bases: object

A repo object is defined as a folder containing a data.csv file and a meta.json file.

These files specify the global dataset to be analyzed. This dataset must be further split into Folds contained within the Repository.
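The folder layout described above can be illustrated with a small standard-library sketch. It is not romcomma code: `looks_like_repository` and `write_toy_repository` are hypothetical helpers, and the column names and metadata contents are placeholders, since the actual csv header layout and meta.json schema are not given here.

```python
import csv
import json
import tempfile
from pathlib import Path


def looks_like_repository(folder):
    """Return True if folder contains both a data.csv and a meta.json file."""
    folder = Path(folder)
    return (folder / "data.csv").is_file() and (folder / "meta.json").is_file()


def write_toy_repository(folder, header, rows, meta):
    """Write a toy folder holding the two files named in the description."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    with open(folder / "data.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)   # placeholder column names
        writer.writerows(rows)
    with open(folder / "meta.json", "w") as f:
        json.dump(meta, f)        # placeholder metadata, schema assumed
    return folder


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty"
    empty.mkdir()
    toy = write_toy_repository(
        Path(tmp) / "toy",
        header=["x0", "x1", "y0"],
        rows=[[0.1, 0.2, 1.0], [0.3, 0.4, 2.0]],
        meta={"note": "placeholder"},
    )
    empty_ok = looks_like_repository(empty)
    toy_ok = looks_like_repository(toy)
```

In practice the classmethods from_df and from_csv documented below are the intended way to create these files.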

Parameters:

folder (Path | str) – The location (folder) of the Repository.

__init__(folder, **kwargs)
Parameters:

folder (Path | str) – The location (folder) of the Repository.

Methods

Y_split()

Split this Repository into L Y_splits.

__init__(folder, **kwargs)

clean_copy(dst)

Make a clean copy of this repo.

fold_folder(k)

from_csv(folder, csv[, meta])

Create a Repository from a csv file.

from_df(folder, df[, meta])

Create a Repository from a pd.DataFrame.

into_K_folds(K[, shuffle_before_folding, ...])

Fold this repo into K Folds, indexed by range(K).

read_meta()

rotate_folds(rotation)

Uniformly rotate the Folds in a Repository.

write_meta()

Attributes

CSV_OPTIONS

K

The number of folds contained in this Repository.

L

The number of output columns in self.data.

M

The number of input columns in self.data.

META

N

The number of samples (rows of data).

X

The input X, as an (N,M) design Matrix with column headings.

Y

The output Y as an (N,L) Matrix with column headings.

Y_splits

Lists the index and path of every Y_split in this Repository.

data

folder

folds

The indices of the folds contained in this Repository.

meta

property X: DataFrame

The input X, as an (N,M) design Matrix with column headings.

property Y: DataFrame

The output Y as an (N,L) Matrix with column headings.

property N: int

The number of samples (rows of data).

property M: int

The number of input columns in self.data.

property L: int

The number of output columns in self.data.

property K: int

The number of folds contained in this Repository.

clean_copy(dst)

Make a clean copy of this repo.

Parameters:

dst (Path | str) – The location of the copy.

property folds: range

The indices of the folds contained in this Repository.

into_K_folds(K, shuffle_before_folding=False, normalization=None)

Fold this repo into K Folds, indexed by range(K).

Parameters:
  • K (int) – The number of Folds, of absolute value between 1 and N inclusive. By default an improper Fold, indexed by K and including all data for both training and testing, is also created. To suppress it, give K as a negative integer.

  • shuffle_before_folding (bool) – Whether to shuffle the data before sampling.

  • normalization (Path | str | None) – An optional normalization.csv file to use.

Return type:

Repository

Returns: self, for chaining calls.

Raises:

IndexError – unless 1 <= |K| <= N.
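The sign convention for K can be sketched in plain Python. This is not romcomma's implementation: `fold_indices` is a hypothetical helper, and the strided assignment of rows to test sets is an assumption, since the way rows are allotted to Folds is not described here. Only the range check, the optional shuffle and the improper Fold indexed by K follow the description above.

```python
import random


def fold_indices(N, K, shuffle_before_folding=False, seed=0):
    """Map each fold index to a (train_rows, test_rows) pair."""
    if not 1 <= abs(K) <= N:
        raise IndexError(f"abs(K) must lie between 1 and N={N}, got K={K}")
    rows = list(range(N))
    if shuffle_before_folding:
        random.Random(seed).shuffle(rows)
    n_folds = abs(K)
    folds = {}
    for k in range(n_folds):
        test = rows[k::n_folds]  # assumed: every n_folds-th row is held out
        held_out = set(test)
        train = [r for r in rows if r not in held_out]
        folds[k] = (train, test)
    if K > 0:
        # Improper fold: all data used for both training and testing.
        folds[n_folds] = (list(rows), list(rows))
    return folds


with_improper = fold_indices(6, 3)
without_improper = fold_indices(6, -3)
```

With K=3 the sketch yields folds 0, 1, 2 plus the improper fold 3; with K=-3 only folds 0, 1, 2 are produced.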

rotate_folds(rotation)

Uniformly rotate the Folds in a Repository. The rotation (like normalization) applies to each fold, not the repo itself.

Parameters:
  • rotation (ndarray | None) – The (M,M) rotation matrix to apply to the inputs. If None, the identity matrix is used. If the matrix supplied has the wrong dimensions or is not orthogonal, a random rotation is generated and used instead.

Return type:

Repository

Returns: self, for chaining calls.
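The fallback rule for the rotation argument can be sketched with NumPy. This is an illustration only: `resolve_rotation`, `is_orthogonal` and `random_rotation` are hypothetical names, the tolerance is arbitrary, and drawing the random rotation from the QR decomposition of a Gaussian matrix is one common technique, not necessarily the one romcomma uses.

```python
import numpy as np


def is_orthogonal(rotation, M, tol=1e-8):
    """True if rotation is an (M, M) matrix R with R @ R.T close to I."""
    rotation = np.asarray(rotation, dtype=float)
    return rotation.shape == (M, M) and np.allclose(
        rotation @ rotation.T, np.eye(M), atol=tol
    )


def random_rotation(M, rng):
    """Draw a random (M, M) orthogonal matrix with determinant +1."""
    Q, R = np.linalg.qr(rng.standard_normal((M, M)))
    Q = Q * np.sign(np.diag(R))   # fix column signs left arbitrary by QR
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]        # turn a reflection into a proper rotation
    return Q


def resolve_rotation(rotation, M, rng=None):
    """Pick the matrix to use, following the rule described above."""
    if rotation is None:
        return np.eye(M)
    if is_orthogonal(rotation, M):
        return np.asarray(rotation, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    return random_rotation(M, rng)


quarter_turn = np.array([[0.0, -1.0], [1.0, 0.0]])
kept = resolve_rotation(quarter_turn, M=2)
identity = resolve_rotation(None, M=3)
replaced = resolve_rotation(np.ones((2, 3)), M=3, rng=np.random.default_rng(0))
```

Here the quarter turn is kept as supplied, None gives the identity, and the wrongly shaped matrix is replaced by a random rotation.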

Y_split()

Split this Repository into L Y_splits. Each Y.l is just a Repository containing the lth output only.

Raises:

TypeError – if self is a Fold.
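The idea of one split per output column can be sketched with pandas. This is not romcomma code: `split_outputs` is a hypothetical helper, the flat column names are placeholders, and keeping every input column in each split is an assumption made here, since the text only says that each split contains the lth output only.

```python
import pandas as pd


def split_outputs(df, output_columns):
    """Return one DataFrame per output column, each keeping all inputs."""
    input_columns = [c for c in df.columns if c not in output_columns]
    return [df[input_columns + [y]].copy() for y in output_columns]


# Toy data with M=2 inputs, L=2 outputs and N=2 rows.
df = pd.DataFrame(
    {"x0": [0.1, 0.3], "x1": [0.2, 0.4], "y0": [1.0, 2.0], "y1": [3.0, 4.0]}
)
splits = split_outputs(df, ["y0", "y1"])
```

The sketch yields L=2 tables, each of shape (N, M+1).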

property Y_splits: List[Tuple[int, Path]]

Lists the index and path of every Y_split in this Repository.

classmethod from_df(folder, df, meta=None)

Create a Repository from a pd.DataFrame.

Parameters:
  • folder (Path | str) – The location (folder) of the Repository.

  • df (DataFrame) – The data to record in [Return].csv.

  • meta (Dict | None) – The metadata to record in [Return].meta.json.

Return type:

Repository

Returns: A new Repository.

classmethod from_csv(folder, csv, meta=None, **kwargs)

Create a Repository from a csv file.

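Parameters:
  • folder (Path | str) – The location (folder) of the Repository.

  • csv – The csv file to read the data from.

  • meta (Dict | None) – The metadata to record in [Return].meta.json.
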
Return type:

Repository

Returns: A new Repository located in folder.