ranksim.RSPClassifier

class ranksim.RSPClassifier(*, n_filters='auto', max_filters=5000, n_fast_filters=1000, initialize='random', spreading='max', n_iter=5, random_state=None, filter_function='auto', create_distribution=None, **kwargs)

Rank Similarity Probabilistic (RSP) Classifier

Read more in the User Guide.

Parameters
n_filters: {‘auto’} or int, default=‘auto’

Number of filters to use. ‘auto’ will determine this based on max_filters, n_fast_filters and the size of the input data.

max_filters: int, default=5000

Maximum number of filters to allocate.

Only used when n_filters='auto'.

n_fast_filters: int, default=1000

Minimum number of filters to allocate, unless the input data has fewer samples than this number.

Only used when n_filters='auto'.
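The two limits above can be read as a range that the ‘auto’ setting stays within. The sketch below only encodes that documented range; the helper name `auto_filter_bounds` is hypothetical and not part of ranksim, and the rule ranksim actually uses to choose a value inside the range is not shown here.

```python
def auto_filter_bounds(n_samples, n_fast_filters=1000, max_filters=5000):
    """Return the (lower, upper) filter-count range implied by the docs.

    Hypothetical helper for illustration only; not part of ranksim.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    # The minimum is n_fast_filters, unless the data has fewer samples.
    lower = min(n_fast_filters, n_samples)
    # The maximum is max_filters (assumed to be at least n_fast_filters).
    upper = max_filters
    return lower, upper


print(auto_filter_bounds(200))    # (200, 5000)
print(auto_filter_bounds(20000))  # (1000, 5000)
```

With the default settings, a 200-sample dataset has a lower limit of 200 rather than 1000, because the data has fewer samples than n_fast_filters.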

initialize: {‘random’, ‘weighted_avg’, ‘plusplus’}, default=‘random’

Type of filter initialization.

  • ‘random’, filters are initialized with a random data point.

  • ‘weighted_avg’, creates filters from similar data, used when there are more filters than input data.

  • ‘plusplus’, filters are initialized with dissimilar data, as in k-means++.
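As a rough illustration of the ‘random’ option and of why an integer seed gives repeatable results, the sketch below copies randomly chosen rows of the data to act as initial filters. The function `init_random_filters` is a hypothetical stand-in, not the ranksim implementation, and sampling without replacement is an assumption.

```python
import random


def init_random_filters(X, n_filters, seed=None):
    """Pick n_filters rows of X at random as initial filters.

    Hypothetical illustration only; sampling without replacement
    is an assumption, not documented ranksim behaviour.
    """
    if n_filters > len(X):
        raise ValueError("this sketch needs n_filters <= number of rows")
    rng = random.Random(seed)  # an int seed makes the choice repeatable
    chosen = rng.sample(range(len(X)), n_filters)
    return [list(X[i]) for i in chosen]


X = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
a = init_random_filters(X, 2, seed=0)
b = init_random_filters(X, 2, seed=0)
print(a == b)  # True: the same seed selects the same rows
```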

spreading: {‘max’, ‘weighted_avg’} or None, default=‘max’

Determines how data is spread between filters during training.

  • ‘max’, the data point is allocated to the maximum responding filter.

  • ‘weighted_avg’, the weighted average of a fixed number of data points is allocated to the maximum responding filter, used when there are more filters than data.
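The ‘max’ rule can be pictured as an argmax over filter responses. The sketch below assumes the response of a filter is the dot product of its weights with the data point; that response measure and the function name `allocate_to_max_filter` are illustrative assumptions, not the ranksim implementation.

```python
def allocate_to_max_filter(x, filters):
    """Return the index of the filter that responds most strongly to x.

    Assumes a dot-product response; illustration only, not ranksim code.
    """
    # One response per filter: the weighted sum of the features of x.
    responses = [sum(w * v for w, v in zip(f, x)) for f in filters]
    # The data point goes to the filter with the largest response.
    best = max(range(len(filters)), key=responses.__getitem__)
    return best, responses


filters = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [0.2, 0.9, 0.1]
best, responses = allocate_to_max_filter(x, filters)
print(best)  # 1
```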

n_iter: int, default=5

Number of iterations/sweeps over the training dataset to perform during training.

random_state: int, RandomState instance or None, default=None

Determines random number generation for filter initialization. Pass an int for reproducible results across multiple function calls.

filter_function: {‘auto’} or callable, default=‘auto’

Function which determines the weights from subsections of the input data. ‘auto’ performs a mean and rank, optionally drawn from a distribution.
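One way to read "a mean and rank" is: average a subsection of the data feature by feature, then replace each mean with its rank. The sketch below follows that reading. The function `mean_and_rank`, the ascending rank order, and the tie handling are assumptions, and the signature ranksim expects for a custom callable is not covered.

```python
def mean_and_rank(rows):
    """Average rows feature-wise, then rank the feature means.

    Illustration only: rank 0 is the smallest mean, and ties keep
    their original feature order. Not the ranksim implementation.
    """
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    # Feature indices ordered from smallest to largest mean.
    order = sorted(range(n_features), key=lambda j: means[j])
    ranks = [0] * n_features
    for rank, j in enumerate(order):
        ranks[j] = rank
    return ranks


rows = [[0.1, 5.0, 2.0], [0.3, 7.0, 1.0]]
print(mean_and_rank(rows))  # [0, 2, 1]
```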

create_distribution: {‘confusion’}, callable or None, default=None

Creates a distribution to draw ranks from.

  • ‘confusion’ is a distribution based on the confusibility of features in the input data.

Note: the ‘confusion’ option is extremely slow.

Attributes
classes_: ndarray or list of ndarray of shape (n_classes,)

Class labels for each output.

filters_: ndarray of shape (n_filters_, n_features)

Weights of the calculated filters.

filter_labels_: list of ndarray of shape (n_classes,)

Labels of the data points used to make each filter.

n_filters_: int

Number of filters.

n_iter_: int

The number of iterations run by the spreading function.

n_outputs_: int

Number of outputs.

filterFactory_: class

Class used to create the filters.

Examples

>>> from ranksim import RSPClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=1000, random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> clf = RSPClassifier().fit(X_train, y_train)
>>> clf.predict_proba(X_test[:1, :])
array([[0.43370805, 0.56629195]])
>>> clf.predict(X_test[:5, :])
array([1, 0, 1, 0, 1])
>>> clf.score(X_test, y_test)
0.888

Methods

fit(X, y)

Fit RSP classifier from the training dataset.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict class labels using the RSP classifier.

predict_proba(X[, n_best])

Probability estimates for the RSP classifier.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.
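The mean accuracy returned by score is the fraction of samples whose predicted label equals the true label. The sketch below computes that quantity directly. The function `mean_accuracy` is a hypothetical stand-in that does not call ranksim and ignores the optional sample_weight argument.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (unweighted)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    # Each match counts as 1, each mismatch as 0.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(mean_accuracy([1, 0, 1, 0, 1], [1, 0, 0, 0, 1]))  # 0.8
```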