romcomma.data.storage.Repository
- class Repository(folder, **kwargs)
Bases: object
A repo object is defined as a folder containing a data.csv file and a meta.json file. These files specify the global dataset to be analyzed. This dataset must be further split into Folds contained within the Repository.
- Parameters:
folder (Path | str) –
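The folder layout described above can be sketched with the standard library alone. This is only an illustration: `make_repo_folder` and `looks_like_repo` are hypothetical helpers, not part of romcomma, and the flat single-row CSV header and the contents of `meta.json` are assumptions, since this page does not specify either format.

```python
import csv
import json
import tempfile
from pathlib import Path


def make_repo_folder(folder, header, rows, meta):
    """Write data.csv and meta.json into folder (hypothetical helper, not romcomma API)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    with open(folder / "data.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # assumption: a flat, single-row header
        writer.writerows(rows)
    with open(folder / "meta.json", "w") as f:
        json.dump(meta, f, indent=2)
    return folder


def looks_like_repo(folder):
    """True if folder contains both files that define a repo."""
    folder = Path(folder)
    return (folder / "data.csv").is_file() and (folder / "meta.json").is_file()


with tempfile.TemporaryDirectory() as tmp:
    repo = make_repo_folder(
        Path(tmp) / "my_repo",
        header=["x0", "x1", "y0"],
        rows=[[0.1, 0.2, 1.0], [0.3, 0.4, 2.0]],
        meta={"origin": "example"},  # assumption: arbitrary metadata
    )
    is_repo = looks_like_repo(repo)
    missing_is_repo = looks_like_repo(Path(tmp) / "missing")
```

In practice the from_df and from_csv classmethods documented on this page create these files for you.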
Methods
Y_split() – Split this Repository into L Y_splits.
__init__(folder, **kwargs)
clean_copy(dst) – Make a clean copy of this repo.
fold_folder(k)
from_csv(folder, csv[, meta]) – Create a Repository from a csv file.
from_df(folder, df[, meta]) – Create a Repository from a pd.DataFrame.
into_K_folds(K[, shuffle_before_folding, ...]) – Fold this repo into K Folds, indexed by range(K).
read_meta()
rotate_folds(rotation) – Uniformly rotate the Folds in a Repository.
write_meta()
Attributes
CSV_OPTIONS
K – The number of folds contained in this Repository.
L – The number of output columns in self.data.
M – The number of input columns in self.data.
META
N – The number of samples (rows of data).
X – The input X, as an (N,M) design Matrix with column headings.
Y – The output Y as an (N,L) Matrix with column headings.
Y_splits – Lists the index and path of every Y_split in this Repository.
data
folder
folds – The indices of the folds contained in this Repository.
meta
- property X: DataFrame
The input X, as an (N,M) design Matrix with column headings.
- property Y: DataFrame
The output Y as an (N,L) Matrix with column headings.
- property N: int
The number of samples (rows of data).
- property M: int
The number of input columns in self.data.
- property L: int
The number of output columns in self.data.
- property K: int
The number of folds contained in this Repository.
- clean_copy(dst)
Make a clean copy of this repo.
- Parameters:
dst (Path | str) – The location of the copy.
- property folds: range
The indices of the folds contained in this Repository.
- into_K_folds(K, shuffle_before_folding=False, normalization=None)
Fold this repo into K Folds, indexed by range(K).
- Parameters:
K (int) – The number of Folds, of absolute value between 1 and N inclusive. An improper Fold, indexed by K and including all data for both training and testing, is included by default. To suppress this, give K as a negative integer.
shuffle_before_folding (bool) – Whether to shuffle the data before sampling.
normalization (Path | str | None) – An optional normalization.csv file to use.
- Return type:
Returns: self, for chaining calls.
- Raises:
IndexError – unless 1 <= K <= N.
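The indexing convention for K can be illustrated with a small standalone sketch. It does not reproduce romcomma's implementation: `k_fold_indices` is a hypothetical helper, the strided assignment of rows to test sets is an assumption, and the bound check uses the absolute value of K, following the parameter description.

```python
import random


def k_fold_indices(N, K, shuffle_before_folding=False, seed=None):
    """Return {fold index: (train_rows, test_rows)} for N samples (hypothetical helper)."""
    if not 1 <= abs(K) <= N:
        raise IndexError(f"K={K} must satisfy 1 <= |K| <= N={N}")
    rows = list(range(N))
    if shuffle_before_folding:
        random.Random(seed).shuffle(rows)
    n_folds = abs(K)
    folds = {}
    for k in range(n_folds):
        test = rows[k::n_folds]  # assumption: strided assignment of test rows
        held_out = set(test)
        train = [r for r in rows if r not in held_out]
        folds[k] = (train, test)
    if K > 0:
        # Improper fold, indexed by K: all data for both training and testing.
        folds[K] = (rows[:], rows[:])
    return folds


with_improper = k_fold_indices(6, 3)      # folds 0, 1, 2 plus improper fold 3
without_improper = k_fold_indices(6, -3)  # negative K suppresses the improper fold
```

Here `with_improper` has keys 0 to 3, with fold 3 holding every row in both roles, while `without_improper` has keys 0 to 2 only.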
- rotate_folds(rotation)
Uniformly rotate the Folds in a Repository. The rotation (like normalization) applies to each fold, not the repo itself.
- Parameters:
rotation (ndarray | None) – The (M,M) rotation matrix to apply to the inputs. If None, the identity matrix is used. If the matrix supplied has the wrong dimensions or is not orthogonal, a random rotation is generated and used instead.
- Return type:
Returns: self, for chaining calls.
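The fallback rule for the rotation argument (identity for None, a random matrix when the supplied one is the wrong shape or not orthogonal) might look roughly like the sketch below. It is a pure-Python stand-in on nested lists, not romcomma's code: `resolve_rotation`, `is_orthogonal` and `random_orthogonal` are hypothetical names, the tolerance is an assumption, and the random matrix is drawn by Gram-Schmidt on Gaussian vectors, which may differ from the library's sampling method.

```python
import random


def identity(M):
    return [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]


def is_orthogonal(R, M, tol=1e-9):
    """True if R is an (M,M) matrix whose columns are orthonormal within tol."""
    if len(R) != M or any(len(row) != M for row in R):
        return False
    for i in range(M):
        for j in range(M):
            dot = sum(R[k][i] * R[k][j] for k in range(M))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


def random_orthogonal(M, rng):
    """Orthonormalize Gaussian vectors by Gram-Schmidt (assumed sampling scheme)."""
    basis = []
    while len(basis) < M:
        v = [rng.gauss(0.0, 1.0) for _ in range(M)]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > 1e-8:  # discard nearly dependent draws
            basis.append([vi / norm for vi in v])
    return basis


def resolve_rotation(rotation, M, rng=None):
    """Pick the matrix to apply: identity, the supplied one, or a random fallback."""
    if rng is None:
        rng = random.Random()
    if rotation is None:
        return identity(M)
    if is_orthogonal(rotation, M):
        return [list(row) for row in rotation]
    return random_orthogonal(M, rng)


rng = random.Random(0)
kept = resolve_rotation([[0.0, 1.0], [1.0, 0.0]], 2, rng)      # orthogonal: used as given
replaced = resolve_rotation([[1.0, 2.0], [3.0, 4.0]], 2, rng)  # not orthogonal: random fallback
default = resolve_rotation(None, 2)                            # None: identity
```

The check only tests orthogonality, as the parameter description does, so a matrix with determinant -1 passes it.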
- Y_split()
Split this Repository into L Y_splits. Each Y.l is just a Repository containing the lth output only.
- Raises:
TypeError – if self is a Fold.
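As a rough picture of what splitting by output means, the sketch below turns one table with L output columns into L tables that each keep all inputs and a single output. It works on plain lists rather than a Repository; `y_split` here is a hypothetical function, and it assumes that the first M columns are the inputs, which this page does not state.

```python
def y_split(header, rows, M):
    """Return one (l, header, rows) table per output column (hypothetical helper)."""
    L = len(header) - M
    if L < 1:
        raise ValueError("expected at least one output column after the M inputs")
    splits = []
    for l in range(L):
        col = M + l  # assumption: outputs follow the M input columns
        split_header = header[:M] + [header[col]]
        split_rows = [row[:M] + [row[col]] for row in rows]
        splits.append((l, split_header, split_rows))
    return splits


splits = y_split(
    header=["x0", "x1", "y0", "y1"],
    rows=[[1, 2, 10, 20], [3, 4, 30, 40]],
    M=2,
)
```

Each entry of `splits` pairs the output index l with a table containing only the lth output.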
- property Y_splits: List[Tuple[int, Path]]
Lists the index and path of every Y_split in this Repository.
- classmethod from_df(folder, df, meta=None)
Create a Repository from a pd.DataFrame.
- Parameters:
folder (Path | str) – The location (folder) of the Repository.
df (DataFrame) – The data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.
- Return type:
Returns: A new Repository.
- classmethod from_csv(folder, csv, meta=None, **kwargs)
Create a Repository from a csv file.
- Parameters:
folder (Path | str) – The location (folder) of the target Repository.
csv (Path | str) – The file containing the data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.
kwargs – Updates Repository.CSV_OPTIONS for reading the csv file, as detailed in https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pd.read_csv.html.
- Return type:
Returns: A new Repository located in folder.