sumnplot.discretisation.QuantileDiscretiser

class sumnplot.discretisation.QuantileDiscretiser(variable, quantiles=(0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0))

Bases: sumnplot.discretisation.Discretiser

Quantile discretisation.

This transformer uses cut points defined by quantiles of the given variable.

Note that this transformer supports weighted quantiles: pass sample_weight to the fit method.

Parameters
  • variable (str) – Column to discretise in X, when the transform method is called.

  • quantiles (tuple, default = (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)) – Quantiles defining the cut points to bucket variable at.

__init__(variable, quantiles=(0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0))

Methods

__init__(variable[, quantiles])

fit(X[, y, sample_weight])

Calculate cut points on the input data X.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Cut variable in X at cut_points.

fit(X, y=None, sample_weight=None)

Calculate cut points on the input data X.

Cut points are (potentially weighted) quantiles specified when initialising the transformer.

Parameters
  • X (pd.DataFrame) – DataFrame containing column to discretise. This column is defined by the variable attribute.

  • y (pd.Series, default = None) – Response variable. Not used. Only implemented for compatibility with scikit-learn.

  • sample_weight (pd.Series or np.ndarray, default = None) – Optional sample weights, one for each record in X.
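
To make the idea of weighted cut points concrete, here is a minimal standard-library sketch. It assumes one common definition of a weighted quantile (the smallest value whose cumulative weight share reaches q); the helper name weighted_quantiles is hypothetical, and the library's own calculation may differ, for example by interpolating between values.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantiles(values, quantiles, sample_weight=None):
    """Hypothetical helper: for each q, return the smallest value whose
    cumulative weight share is at least q. Not the library's own code."""
    if sample_weight is None:
        sample_weight = [1.0] * len(values)  # unweighted case: equal weights
    # Sort the values and carry their weights along.
    pairs = sorted(zip(values, sample_weight))
    sorted_values = [v for v, _ in pairs]
    cumulative = list(accumulate(w for _, w in pairs))
    total = cumulative[-1]
    cut_points = []
    for q in quantiles:
        # First position where the running weight reaches q * total.
        idx = bisect_left(cumulative, q * total)
        cut_points.append(sorted_values[min(idx, len(sorted_values) - 1)])
    return cut_points


values = [10, 20, 30, 40]
print(weighted_quantiles(values, (0.0, 0.5, 1.0)))
# -> [10, 20, 40]
print(weighted_quantiles(values, (0.0, 0.5, 1.0), sample_weight=[1, 1, 5, 1]))
# -> [10, 30, 40]
```

Giving the third record a weight of 5 moves the median cut point from 20 to 30, which is the effect sample_weight is meant to capture.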

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
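
The <component>__<parameter> convention can be illustrated with a plain scikit-learn Pipeline. This is a generic sketch: the step names "scale" and "model" and the chosen estimators are illustrative assumptions and are not part of sumnplot.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step names ("scale", "model") are arbitrary labels chosen for this example.
pipeline = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression())])

# <component>__<parameter>: update a parameter on a named step of the pipeline.
pipeline.set_params(model__C=0.5, scale__with_mean=False)

# get_params(deep=True) exposes the nested parameters under the same names.
params = pipeline.get_params(deep=True)
print(params["model__C"])          # 0.5
print(params["scale__with_mean"])  # False
```

A discretiser placed in such a pipeline would presumably be addressed the same way, using its step name followed by a double underscore and the parameter name.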

transform(X)

Cut variable in X at cut_points. This function uses the pd.cut method.

A separate null category is added to the cut output.

Parameters

X (pd.DataFrame) – DataFrame containing column to discretise. This column is defined by the variable attribute.

Returns

variable_cut – Discretised variable.

Return type

pd.Series
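
The cut-then-add-null-category step described for transform can be sketched with pandas alone. The cut points, the column name "age", the "Null" label, and the use of include_lowest=True are assumptions made for illustration; the library's own labels and options may differ.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"age": [18.0, 25.0, 40.0, np.nan, 63.0]})
cut_points = [18.0, 30.0, 63.0]  # hypothetical cut points, as if learned by fit

# include_lowest=True keeps the minimum value inside the first bucket.
variable_cut = pd.cut(X["age"], bins=cut_points, include_lowest=True)

# pd.cut leaves missing values as NaN; give them their own category instead.
variable_cut = variable_cut.cat.add_categories("Null").fillna("Null")

print(variable_cut.value_counts(sort=False))
```

The result is a categorical pd.Series with one category per interval plus the extra "Null" category, so missing records remain visible in any downstream summary.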