romcomma.data.storage.Fold
- class Fold(parent, k, **kwargs)
Bases: Repository
A Fold is defined as a folder containing a data.csv file, a meta.json file and a test.csv file. A Fold is a Repository equipped with a test_data pd.DataFrame backed by test.csv. Additionally, a Fold can reduce the dimensionality M of the input X.
- Parameters:
parent (Repository) – The parent Repository.
k (int) – The index of the Fold within parent.
- __init__(parent, k, **kwargs)
Initialize Fold by reading existing files. Creation is handled by the classmethod Fold.from_dfs.
- Parameters:
parent (Repository) – The parent Repository.
k (int) – The index of the Fold within parent.
M – The number of input columns used. If not 0 < M < self.M, all columns are used.
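The rule for M can be sketched in a few lines. This is a minimal illustration of the condition stated above, not the library's code; the helper name effective_M and its arguments are hypothetical.

```python
def effective_M(requested_M: int, available_M: int) -> int:
    """Return the number of input columns a Fold would use.

    Hypothetical helper: mirrors the documented rule that all columns are
    used unless 0 < requested_M < available_M.
    """
    if 0 < requested_M < available_M:
        return requested_M
    return available_M


# With 5 input columns available:
print(effective_M(3, 5))   # a proper reduction keeps 3 columns
print(effective_M(0, 5))   # out of range, so all 5 columns are used
print(effective_M(7, 5))   # out of range, so all 5 columns are used
```

The sketch only computes how many columns are kept; which columns are retained is not stated here and is left open.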
Methods
Y_split() – Split this Repository into L Y_splits.
__init__(parent, k, **kwargs) – Initialize Fold by reading existing files.
clean_copy(dst) – Make a clean copy of this repo.
fold_folder(k)
from_csv(folder, csv[, meta]) – Create a Repository from a csv file.
from_df(folder, df[, meta]) – Create a Repository from a pd.DataFrame.
from_dfs(parent, k, data, test_data[, ...]) – Create a Fold from a pd.DataFrame.
into_K_folds(K[, shuffle_before_folding, ...]) – Fold this repo into K Folds, indexed by range(K).
read_meta()
rotate_folds(rotation) – Uniformly rotate the Folds in a Repository.
write_meta()
Attributes
CSV_OPTIONS
K – The number of folds contained in this Repository.
L – The number of output columns in self.data.
M – The number of input columns in self.data.
META
N – The number of samples (rows of data).
X – The input X, as an (N,M) design Matrix with column headings.
X_rotation – The rotation matrix applied to the input variables self.X, stored in X_rotation.csv.
Y – The output Y as an (N,L) Matrix with column headings.
Y_splits – Lists the index and path of every Y_split in this Repository.
data
folder
folds – The indices of the folds contained in this Repository.
meta
normalization
test_csv
test_data
test_x – The test_data input x, as an (n,M) design Matrix with column headings.
test_y – The test_data output y as an (n,L) Matrix with column headings.
- property test_x: DataFrame
The test_data input x, as an (n,M) design Matrix with column headings.
- property test_y: DataFrame
The test_data output y as an (n,L) Matrix with column headings.
- property X_rotation: ndarray
The rotation matrix applied to the input variables self.X, stored in X_rotation.csv. Rotations are applied and stored cumulatively.
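Cumulative storage can be pictured as composing each new rotation with the one already stored. The sketch below is an illustration only: the composition order (new rotation applied after the stored one) and the helper names matmul and accumulate are assumptions, not taken from the romcomma source.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def accumulate(stored: Matrix, new: Matrix) -> Matrix:
    """Compose a new rotation with the stored one (assumed order: new @ stored)."""
    return matmul(new, stored)


identity = [[1, 0], [0, 1]]
quarter_turn = [[0, -1], [1, 0]]   # rotation by 90 degrees in the plane

stored = accumulate(identity, quarter_turn)   # after the first rotation
stored = accumulate(stored, quarter_turn)     # after the second rotation
print(stored)   # two quarter turns compose to a half turn
```

Integer matrices are used so that the composed result is exact; real rotation matrices would hold floats.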
- classmethod from_dfs(parent, k, data, test_data, normalization=None)
Create a Fold from a pd.DataFrame.
- Parameters:
parent (Repository) – The parent Repository.
k (int) – The index of the fold to be created.
data (DataFrame) – Training data.
test_data (DataFrame) – Test data.
normalization (Path | str | None) – An optional normalization.csv file to use.
- Return type:
Fold
Returns: The Fold created.
- property K: int
The number of folds contained in this Repository.
- property L: int
The number of output columns in self.data.
- property M: int
The number of input columns in self.data.
- property N: int
The number of samples (rows of data).
- property X: DataFrame
The input X, as an (N,M) design Matrix with column headings.
- property Y: DataFrame
The output Y as an (N,L) Matrix with column headings.
- Y_split()
Split this Repository into L Y_splits. Each Y.l is just a Repository containing the lth output only.
- Raises:
TypeError – if self is a Fold.
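To make the idea of a Y_split concrete, the sketch below splits a small table with L = 2 outputs into L single-output tables, each keeping every input column. It uses plain dictionaries in place of Repositories; the column names and the helper split_outputs are hypothetical.

```python
from typing import Dict, List

Table = Dict[str, List[float]]


def split_outputs(table: Table, input_cols: List[str],
                  output_cols: List[str]) -> List[Table]:
    """Return one table per output column, each holding all inputs plus that output."""
    splits = []
    for out in output_cols:
        split = {col: table[col] for col in input_cols}
        split[out] = table[out]
        splits.append(split)
    return splits


data = {
    "x0": [0.1, 0.2, 0.3],
    "x1": [1.0, 2.0, 3.0],
    "y0": [10.0, 20.0, 30.0],
    "y1": [-1.0, -2.0, -3.0],
}
splits = split_outputs(data, ["x0", "x1"], ["y0", "y1"])
for l, split in enumerate(splits):
    print(l, list(split))   # index l and the columns kept in the l-th split
```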
- property Y_splits: List[Tuple[int, Path]]
Lists the index and path of every Y_split in this Repository.
- clean_copy(dst)
Make a clean copy of this repo.
- Parameters:
dst (Path | str) – The location of the copy.
- property folds: range
The indices of the folds contained in this Repository.
- classmethod from_csv(folder, csv, meta=None, **kwargs)
Create a Repository from a csv file.
- Parameters:
folder (Path | str) – The location (folder) of the target Repository.
csv (Path | str) – The file containing the data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.
kwargs – Updates Repository.CSV_OPTIONS for reading the csv file, as detailed in https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pd.read_csv.html.
- Return type:
Repository
Returns: A new Repository located in folder.
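The way kwargs update the read options can be sketched as a dictionary merge. The default values shown for CSV_OPTIONS are placeholders chosen for illustration, not the library's actual defaults, and merged_csv_options is a hypothetical helper. Returning a merged copy rather than modifying the defaults in place is also an assumption.

```python
from typing import Any, Dict

# Placeholder defaults, NOT the real Repository.CSV_OPTIONS.
CSV_OPTIONS: Dict[str, Any] = {"sep": ",", "header": 0, "index_col": 0}


def merged_csv_options(**kwargs: Any) -> Dict[str, Any]:
    """Return the defaults with any caller-supplied options taking precedence."""
    return {**CSV_OPTIONS, **kwargs}


options = merged_csv_options(sep=";", skiprows=2)
print(options)
```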
- classmethod from_df(folder, df, meta=None)
Create a Repository from a pd.DataFrame.
- Parameters:
folder (Path | str) – The location (folder) of the Repository.
df (DataFrame) – The data to record in [Return].csv.
meta (Dict | None) – The metadata to record in [Return].meta.json.
- Return type:
Repository
Returns: A new Repository.
- into_K_folds(K, shuffle_before_folding=False, normalization=None)
Fold this repo into K Folds, indexed by range(K).
- Parameters:
K (int) – The number of Folds, of absolute value between 1 and N inclusive. By default an improper Fold, indexed by K and using all the data for both training and testing, is also included. To suppress it, give K as a negative integer.
shuffle_before_folding (bool) – Whether to shuffle the data before sampling.
normalization (Path | str | None) – An optional normalization.csv file to use.
- Return type:
Repository
Returns: self, for chaining calls.
- Raises:
IndexError – Unless 1 <= abs(K) <= N.
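The folding scheme can be illustrated on row indices alone. The sketch below is a minimal example under stated assumptions: the strided assignment of test rows, the seed argument and the helper name k_fold_indices are hypothetical, and the library's actual sampling may differ. The bounds check follows the description of K (absolute value between 1 and N), and the sign of K controls the improper fold.

```python
import random
from typing import Dict, List, Tuple


def k_fold_indices(N: int, K: int, shuffle: bool = False,
                   seed: int = 0) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each fold index k to (training rows, test rows).

    Assumed scheme: test rows of fold k are every |K|-th row starting at k.
    A positive K adds an improper fold, indexed by K, that trains and tests
    on all rows; a negative K suppresses it.
    """
    if not 1 <= abs(K) <= N:
        raise IndexError(f"Expected 1 <= |K| <= N, got K={K}, N={N}.")
    rows = list(range(N))
    if shuffle:
        random.Random(seed).shuffle(rows)
    n_folds = abs(K)
    folds = {}
    for k in range(n_folds):
        test = rows[k::n_folds]
        held_out = set(test)
        train = [r for r in rows if r not in held_out]
        folds[k] = (train, test)
    if K > 0:
        folds[K] = (rows, rows)   # improper fold: all data for both roles
    return folds


folds = k_fold_indices(N=6, K=3)
for k, (train, test) in folds.items():
    print(k, train, test)
```

Calling k_fold_indices(N=6, K=-3) yields only the proper folds 0, 1 and 2.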
- rotate_folds(rotation)
Uniformly rotate the Folds in a Repository. The rotation (like normalization) applies to each fold, not the repo itself.
- Parameters:
rotation (ndarray | None) – The (M,M) rotation matrix to apply to the inputs. If None, the identity matrix is used. If the matrix supplied has the wrong dimensions or is not orthogonal, a random rotation is generated and used instead.
- Return type:
Repository
Returns: self, for chaining calls.
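The fallback behaviour described for rotation can be sketched as a validation step. This is an illustration in plain Python, not the library's implementation: the helper names, the tolerance, the seed argument and the use of Gram-Schmidt to draw a random orthogonal matrix are all assumptions.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def is_orthogonal(R: Matrix, M: int, tol: float = 1e-9) -> bool:
    """True if R is (M,M) and its rows are orthonormal, i.e. R @ R.T is the identity."""
    if len(R) != M or any(len(row) != M for row in R):
        return False
    for i in range(M):
        for j in range(M):
            dot = sum(R[i][k] * R[j][k] for k in range(M))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


def random_orthogonal(M: int, rng: random.Random) -> Matrix:
    """Gram-Schmidt on Gaussian vectors; one simple way to draw an orthogonal matrix."""
    rows: Matrix = []
    while len(rows) < M:
        v = [rng.gauss(0.0, 1.0) for _ in range(M)]
        for u in rows:   # remove components along the rows already accepted
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:   # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def resolve_rotation(rotation: Optional[Matrix], M: int, seed: int = 0) -> Matrix:
    """Apply the documented fallbacks: identity for None, random for invalid input."""
    if rotation is None:
        return [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
    if is_orthogonal(rotation, M):
        return rotation
    return random_orthogonal(M, random.Random(seed))


print(resolve_rotation(None, 2))        # identity matrix
bad = [[1.0, 2.0], [3.0, 4.0]]          # correct shape, but not orthogonal
fixed = resolve_rotation(bad, 2)
print(is_orthogonal(fixed, 2))          # the replacement passes the check
```

The check tests orthogonality only, as in the parameter description; it does not distinguish proper rotations from reflections.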