sumnplot.discretisation.QuantileDiscretiser

class sumnplot.discretisation.QuantileDiscretiser(variable, quantiles=(0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0))[source]

Bases: sumnplot.discretisation.Discretiser

Quantile discretisation.

This transformer uses cut points defined by quantiles of the given variable.

Note that this transformer supports weighted quantiles, with weights supplied through the sample_weight argument of fit.

Parameters

variable (str) – Column to discretise in X, when the transform method is called.
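quantiles (tuple, default = (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)) – Quantiles defining the cut points at which variable is bucketed. The longer decimals shown in the signature (for example 0.30000000000000004) are the floating-point representations of these same values.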

__init__(variable, quantiles=(0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0))[source]

Methods

`__init__`(variable[, quantiles])
`fit`(X[, y, sample_weight])	Calculate cut points on the input data X.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Cut variable in X at cut_points.

fit(X, y=None, sample_weight=None)[source]

Calculate cut points on the input data X.

Cut points are (potentially weighted) quantiles specified when initialising the transformer.

Parameters

X (pd.DataFrame) – DataFrame containing column to discretise. This column is defined by the variable attribute.
y (pd.Series, default = None) – Response variable. Not used. Only implemented for compatibility with scikit-learn.
sample_weight (pd.Series or np.ndarray, default = None) – Optional, sample weights for each record in X.
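
To make the idea of weighted quantile cut points concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper name weighted_quantiles is hypothetical, and it assumes one common definition of a weighted quantile (the smallest value whose cumulative share of the total weight reaches q), which may differ from the interpolation QuantileDiscretiser actually uses.

```python
def weighted_quantiles(values, quantiles, sample_weight=None):
    """Hypothetical helper: for each q, return the smallest value whose
    cumulative share of the total weight is at least q."""
    if sample_weight is None:
        # Equal weights reduce this to an ordinary (unweighted) quantile.
        sample_weight = [1.0] * len(values)

    # Sort values and carry each value's weight along with it.
    pairs = sorted(zip(values, sample_weight))
    total = sum(weight for _, weight in pairs)

    cut_points = []
    for q in quantiles:
        running = 0.0
        for value, weight in pairs:
            running += weight
            if running / total >= q:
                cut_points.append(value)
                break
    return cut_points


unweighted = weighted_quantiles([1, 2, 3, 4], (0, 0.5, 1))
weighted = weighted_quantiles([1, 2, 3, 4], (0, 0.5, 1), sample_weight=[1, 1, 4, 2])
```

With equal weights the median cut point is 2; giving the value 3 a weight of 4 pulls the median cut point up to 3, which is the effect sample_weight is meant to have on the fitted cut points.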

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance
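
The <component>__<parameter> convention can be illustrated with standard scikit-learn objects. The sketch below uses a Pipeline with a StandardScaler step purely as a stand-in (the step name "scale" is an arbitrary choice for illustration); the same get_params and set_params calls are inherited by this transformer.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A one-step pipeline; "scale" is the component name used in nested parameters.
pipeline = Pipeline([("scale", StandardScaler())])

# Update a parameter of the nested component via <component>__<parameter>.
returned = pipeline.set_params(scale__with_mean=False)

# With deep=True, parameters of contained estimators are included as well.
params = pipeline.get_params(deep=True)
```

Because set_params returns the estimator itself, calls can be chained, for example pipeline.set_params(...).fit(X).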

transform(X)

Cut variable in X at cut_points. This function uses the pd.cut method.

A dedicated null category is added to the cut output.

Parameters: X (pd.DataFrame) – DataFrame containing column to discretise. This column is defined by the variable attribute.
Returns: variable_cut – Discretised variable.
Return type: pd.Series