User guide: Using the Rank Similarity estimators¶
Rank Similarity Transform¶
The RankSimilarityTransform (RST) is a very fast non-linear transform. It is
built from the responses of rank similarity filters made from the input data.
Use it with the fit and transform methods from scikit-learn:
at fit, some parameters can be learned from X and y;
at transform, X will be transformed, using the parameters learned during fit.
Alternatively, you can call fit_transform, which combines fit and transform in a single step.
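The relationship between fit, transform and fit_transform can be sketched with a small stand-in transformer. ColumnCenterer below is hypothetical and is not part of ranksim; it only mimics the scikit-learn calling pattern in plain Python so the example runs without any dependencies. It does not implement rank similarity filters.

```python
class ColumnCenterer:
    """Hypothetical stand-in transformer (not part of ranksim)."""

    def fit(self, X, y=None):
        # Learn one parameter per column: its mean over the training rows.
        n_rows = len(X)
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self  # returning self allows chaining, as in scikit-learn

    def transform(self, X):
        # Apply the parameters learned during fit to any data.
        return [[v - m for v, m in zip(row, self.means_)] for row in X]

    def fit_transform(self, X, y=None):
        # Shortcut: fit on X, then transform the same X.
        return self.fit(X, y).transform(X)


X_train = [[1.0, 10.0], [3.0, 30.0]]
X_new = [[2.0, 20.0]]

# The two-step and one-step forms give the same result on the training data.
two_step = ColumnCenterer().fit(X_train).transform(X_train)
one_step = ColumnCenterer().fit_transform(X_train)

# Parameters come from X_train only and are then reused on unseen data.
new_out = ColumnCenterer().fit(X_train).transform(X_new)
```

Note that fit_transform is only a convenience for the training data; new samples should go through transform on an already fitted object, so the learned parameters are not overwritten.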
RST can make non-linear problems solvable by a linear classifier. For example, using the raw Olivetti faces dataset and a linear support vector machine:
>>> from ranksim import RankSimilarityTransform
>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import fetch_olivetti_faces
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.model_selection import train_test_split
>>> X, y = fetch_olivetti_faces(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> pipe = make_pipeline(RankSimilarityTransform(n_filters=100,
...                                              random_state=10),
...                      LinearSVC(random_state=10))
>>> pipe.fit(X_train, y_train)
Pipeline(...)
>>> pipe.score(X_test, y_test)
0.85
Examples:
Transform example: an example of transformation using RST.
Classification¶
The rank similarity classifiers are very fast non-linear classifiers. They use
the responses of rank similarity filters made from the input data to classify
new samples. Both RankSimilarityClassifier and RSPClassifier can perform
binary and multi-class classification.
Use them with the fit and predict methods from scikit-learn:
at fit, some parameters can be learned from X and y;
at predict, predictions will be computed from X using the parameters learned during fit;
at predict_proba, class probabilities will be output instead of labels.
The score method then builds on predict:
at score, the accuracy of the predictions is computed.
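The four methods above can be sketched with a toy classifier. CentroidClassifier is a hypothetical nearest-centroid stand-in, not the rank similarity algorithm and not part of ranksim; it only illustrates how fit, predict, predict_proba and score relate to one another, in plain Python.

```python
import math


class CentroidClassifier:
    """Hypothetical nearest-centroid stand-in (not part of ranksim)."""

    def fit(self, X, y):
        # Learn one centroid per class from the training samples.
        self.classes_ = sorted(set(y))
        self.centroids_ = []
        for label in self.classes_:
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_.append([sum(c) / len(rows) for c in zip(*rows)])
        return self

    def predict_proba(self, X):
        # Softmax over negative distances: one row per sample,
        # one column per class, each row summing to 1.
        probs = []
        for x in X:
            scores = [math.exp(-math.dist(x, c)) for c in self.centroids_]
            total = sum(scores)
            probs.append([s / total for s in scores])
        return probs

    def predict(self, X):
        # Pick the most probable class for each sample.
        out = []
        for p in self.predict_proba(X):
            out.append(self.classes_[p.index(max(p))])
        return out

    def score(self, X, y):
        # Accuracy: fraction of predictions that match the true labels.
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


X = [[0, 1], [1, 0]]
y = [0, 1]
clf = CentroidClassifier().fit(X, y)

labels = clf.predict([[0.1, 0.9], [0.8, 0.2]])   # one label per sample
proba = clf.predict_proba([[0.1, 0.9]])          # one probability per class
accuracy = clf.score(X, y)                       # predict, then compare with y
```

In this sketch predict is defined in terms of predict_proba and score in terms of predict, which mirrors the order in which the methods are described above; whether the ranksim classifiers are layered the same way internally is an assumption.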
RankSimilarityClassifier and RSPClassifier are fit using two arrays: an array X
of shape (n_samples, n_features) holding the training samples, and an array y of
class labels (strings or integers), of shape (n_samples,):
>>> from ranksim import RankSimilarityClassifier
>>> X = [[0, 1], [1, 0]]
>>> y = [0, 1]
>>> clf = RankSimilarityClassifier()
>>> clf.fit(X, y)
RankSimilarityClassifier()
Rank Similarity Classifier¶
The RankSimilarityClassifier works by fitting each class of data
separately. This makes it suitable for multiclass data. More classes can
actually make it faster because the data is split into smaller segments.
Examples:
Classifier example: an example of classification using rank similarity classifier.
RSPClassifier¶
The rank similarity probabilistic classifier (RSPClassifier) fits all
data together and then calculates post hoc the probability that each filter
belongs to a certain class. This makes it suitable for both multilabel and
multiclass data.