sumnplot.discretisation.Discretiser

class sumnplot.discretisation.Discretiser(variable)

Bases: abc.ABC, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Abstract base class for different discretisation methods.

This abstract base class is a transformer compatible with scikit-learn.

Parameters

variable (str) – Name of the column in X to discretise when the transform method is called.

Methods

__init__(variable)

fit(X[, y, sample_weight])

Calculate cut points for given discretisation approach.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Cut variable in X at cut_points.

abstract fit(X, y=None, sample_weight=None)

Calculate cut points for given discretisation approach.

The cut_points attribute should be set by this method.
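The contract above can be illustrated with a minimal sketch. ExampleDiscretiser and its quantiles parameter are hypothetical and not part of sumnplot; the class inherits directly from the scikit-learn base classes so that it runs without the package installed, whereas a real implementation would presumably subclass Discretiser and inherit transform from it. Bin edges are taken at quantiles of the non-null values.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ExampleDiscretiser(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a concrete discretiser (not sumnplot code)."""

    def __init__(self, variable, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
        self.variable = variable
        self.quantiles = quantiles

    def fit(self, X, y=None, sample_weight=None):
        # sample_weight is accepted to match the documented signature
        # but is ignored in this sketch.
        values = X[self.variable].dropna().to_numpy()
        # fit is responsible for setting the cut_points attribute.
        self.cut_points = np.unique(np.quantile(values, self.quantiles))
        return self

    def transform(self, X):
        # Assumed behaviour: bin the column at the fitted cut points.
        return pd.cut(X[self.variable], bins=self.cut_points, include_lowest=True)


df = pd.DataFrame({"age": [18, 22, 35, 41, 58, 63, 70, None]})
disc = ExampleDiscretiser(variable="age").fit(df)
binned = disc.transform(df)
```

After fitting, disc.cut_points holds five edges, which define four bins. The missing value is left as NaN by this sketch.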

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)
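As a small illustration, fit_transform inherited from TransformerMixin gives the same result as calling fit and then transform on the same data. CentreColumns is a hypothetical toy transformer used only for this sketch.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class CentreColumns(BaseEstimator, TransformerMixin):
    """Toy transformer (hypothetical): subtracts the column means."""

    def fit(self, X, y=None):
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.means_


X = np.array([[1.0, 10.0], [3.0, 30.0]])

# fit_transform is not defined on CentreColumns; it comes from TransformerMixin.
one_step = CentreColumns().fit_transform(X)
two_step = CentreColumns().fit(X).transform(X)
```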

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
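The <component>__<parameter> form can be sketched with a scikit-learn Pipeline. The step name "scaler" is chosen for this example. For a discretiser on its own, get_params would be expected to return the constructor arguments, such as variable.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler())])

# deep=True also exposes the parameters of nested estimators.
before = pipe.get_params(deep=True)["scaler__with_mean"]

# set_params returns the estimator itself, so calls can be chained.
returned = pipe.set_params(scaler__with_mean=False)
after = pipe.get_params()["scaler__with_mean"]
```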

transform(X)

Cut variable in X at cut_points, using the pd.cut function.

A separate null category is added to the cut output.

Parameters

X (pd.DataFrame) – DataFrame containing column to discretise. This column is defined by the variable attribute.

Returns

variable_cut – Discretised variable.

Return type

pd.Series
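One way this behaviour could look is sketched below. The helper cut_with_null_category and the "Null" label are assumptions made for illustration; the library's actual handling of the null category may differ.

```python
import pandas as pd


def cut_with_null_category(series, cut_points, null_label="Null"):
    """Hypothetical helper: bin a series and give missing values their own category."""
    binned = pd.cut(series, bins=cut_points, include_lowest=True)
    # Register the extra category first, then assign missing values to it.
    binned = binned.cat.add_categories([null_label])
    return binned.fillna(null_label)


s = pd.Series([1.0, 5.0, None, 9.0], name="x")
out = cut_with_null_category(s, cut_points=[0, 5, 10])
```

The result is a categorical pd.Series with no missing values, where the row that was NaN now carries the "Null" label.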