ranksim.RSPClassifier
- class ranksim.RSPClassifier(*, n_filters='auto', max_filters=5000, n_fast_filters=1000, initialize='random', spreading='max', n_iter=5, random_state=None, filter_function='auto', create_distribution=None, **kwargs)
Rank Similarity Probabilistic (RSP) Classifier
Read more in the User Guide.
- Parameters
- n_filters : {‘auto’} or int, default=‘auto’
Number of filters to use. ‘auto’ will determine this based on max_filters, n_fast_filters and the size of the input data.
- max_filters : int, default=5000
Maximum number of filters to allocate.
Only used when n_filters='auto'.
- n_fast_filters : int, default=1000
Minimum number of filters to allocate, unless the input data has fewer samples than this number.
Only used when n_filters='auto'.
- initialize : {‘random’, ‘weighted_avg’, ‘plusplus’}, default=‘random’
Type of filter initialization.
- ‘random’, filters are initialized with a random data point.
- ‘weighted_avg’, creates filters from similar data, used when there are more filters than input data.
- ‘plusplus’, filters are initialized with dissimilar data, as in k-means++.
- spreading : {‘max’, ‘weighted_avg’} or None, default=‘max’
Determines how data is spread between filters during training.
- ‘max’, the data point is allocated to the maximum responding filter.
- ‘weighted_avg’, the weighted average of a fixed number of data points is allocated to the maximum responding filter, used when there are more filters than data.
- n_iter : int, default=5
Number of iterations/sweeps over the training dataset to perform during training.
- random_state : int, RandomState instance or None, default=None
Determines random number generation for filter initialization. Pass an int for reproducible results across multiple function calls.
- filter_function : {‘auto’} or callable, default=‘auto’
Function which determines the weights from subsections of the input data. ‘auto’ performs a mean and rank, optionally drawn from a distribution.
- create_distribution : {‘confusion’}, callable or None, default=None
Creates a distribution to draw ranks from.
- ‘confusion’ is a distribution based on the confusibility of features in the input data.
Note: the ‘confusion’ option is extremely slow.
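The bounds implied by n_filters='auto' and the ‘max’ spreading rule can be illustrated with a short, library-independent sketch. It is not ranksim's implementation: the helper names filter_count_bounds and allocate_max are hypothetical, the lower bound of n_samples for small datasets is an assumption read from the n_fast_filters description, and the rule that picks a value between the bounds is not documented here and is not modelled.

```python
def filter_count_bounds(n_samples, max_filters=5000, n_fast_filters=1000):
    """Return (lower, upper) bounds on the filter count for n_filters='auto'.

    Hypothetical helper. Assumes n_fast_filters <= max_filters, as with the
    defaults, and that a dataset smaller than n_fast_filters lowers the
    minimum to the number of samples.
    """
    lower = min(n_fast_filters, n_samples)  # minimum, capped by the data size
    upper = max_filters                     # documented maximum
    return lower, upper


def allocate_max(responses):
    """Return the index of the maximum responding filter ('max' spreading).

    Hypothetical helper. `responses` holds one score per filter for a single
    data point; how ranksim computes these scores is not specified here.
    Ties go to the first filter with the highest score.
    """
    if not responses:
        raise ValueError("responses must contain at least one score")
    return max(range(len(responses)), key=lambda i: responses[i])


print(filter_count_bounds(250))        # small dataset: minimum drops to 250
print(filter_count_bounds(20000))      # large dataset: default bounds apply
print(allocate_max([0.2, 0.9, 0.4]))   # the second filter responds most
```

With the default settings, a 250-sample dataset gives bounds of (250, 5000) and a 20000-sample dataset gives (1000, 5000); the data point in the last call goes to the filter at index 1.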
- Attributes
- classes_ : ndarray or list of ndarray of shape (n_classes,)
Class labels for each output.
- filters_ : ndarray of shape (n_filters_, n_features)
Weights of the calculated filters.
- filter_labels_ : list of ndarray of shape (n_classes,)
Label of the datapoints used to make the filter.
- n_filters_ : int
Number of filters.
- n_iter_ : int
The number of iterations run by the spreading function.
- n_outputs_ : int
Number of outputs.
- filterFactory_ : class
Class used to create the filters.
Examples
>>> from ranksim import RSPClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=1000, random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> clf = RSPClassifier().fit(X_train, y_train)
>>> clf.predict_proba(X_test[:1, :])
array([[0.43370805, 0.56629195]])
>>> clf.predict(X_test[:5, :])
array([1, 0, 1, 0, 1])
>>> clf.score(X_test, y_test)
0.888
Methods
- fit(X, y): Fit RSP classifier from the training dataset.
- get_params([deep]): Get parameters for this estimator.
- predict(X): Predict using the RSP classifier.
- predict_proba(X[, n_best]): Probability estimates for RSP classifier.
- score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
- set_params(**params): Set the parameters of this estimator.