5. Multi-label deep learning with scikit-multilearn

Deep learning methods have spread through the Python community, and many tutorials cover classification with neural networks. Few out-of-the-box solutions exist, however, for multi-label classification with deep learning. scikit-multilearn lets you deploy single-class and multi-class DNNs to solve multi-label problems via problem transformation methods. Two main deep learning frameworks exist for Python: Keras and PyTorch. You will learn how to use either of them for multi-label problems with scikit-multilearn. Let’s start by loading some data.

In [1]:
import numpy
import sklearn.metrics as metrics
from skmultilearn.dataset import load_dataset

X_train, y_train, feature_names, label_names = load_dataset('emotions', 'train')
X_test, y_test, _, _ = load_dataset('emotions', 'test')
emotions:train - exists, not redownloading
emotions:test - exists, not redownloading

5.1. Keras

Keras is a neural network library that supports multiple backends, most notably the well-established TensorFlow, but also CNTK, which is popular on Windows. As scikit-multilearn supports Windows, Linux and macOS, you can use a backend of your choice, as described in the backend selection tutorial. To install Keras run:

pip install -U keras

5.1.1. Single-class Keras classifier

We train a two-layer neural network using Keras with TensorFlow as the backend (feel free to use others). The network is fairly simple: two ReLU layers of 12 and 8 units, finishing with a sigmoid activation and optimized via binary cross-entropy. This is a case from the Keras example page. Note that the model creation function must create a model that accepts an input dimension and outputs a relevant output dimension. The Keras wrapper from scikit-multilearn will pass the relevant dimensions upon fitting.

In [2]:
from keras.models import Sequential
from keras.layers import Dense

def create_model_single_class(input_dim, output_dim):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=input_dim, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(output_dim, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
Using TensorFlow backend.
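
The contract described above (a factory that receives `input_dim` and `output_dim` and returns a model) can be illustrated without Keras. The sketch below is only an illustration: `ToyFactoryWrapper` and `build_toy_model` are hypothetical names, the rule used to choose `output_dim` is an assumption, and none of it is the actual `skmultilearn.ext.Keras` implementation.

```python
import numpy as np

def build_toy_model(input_dim, output_dim):
    # Stand-in for a Keras model factory: it only records the shapes it was given.
    return {"weights": np.zeros((input_dim, output_dim))}

class ToyFactoryWrapper:
    """Hypothetical wrapper; NOT the skmultilearn.ext.Keras implementation."""

    def __init__(self, build_function, multi_class=False):
        self.build_function = build_function
        self.multi_class = multi_class

    def fit(self, X, y):
        # The input dimension is read off the training data at fit time.
        input_dim = X.shape[1]
        # Assumption: one output unit per distinct class in the multi-class
        # case, a single output unit otherwise.
        output_dim = len(np.unique(y)) if self.multi_class else 1
        self.model_ = self.build_function(input_dim, output_dim)
        return self

X = np.random.rand(6, 4)
y_binary = np.array([0, 1, 0, 1, 1, 0])
y_classes = np.array([0, 2, 1, 2, 0, 1])

single = ToyFactoryWrapper(build_toy_model).fit(X, y_binary)
multi = ToyFactoryWrapper(build_toy_model, multi_class=True).fit(X, y_classes)
print(single.model_["weights"].shape)  # (4, 1)
print(multi.model_["weights"].shape)   # (4, 3)
```

The point of the design is that the factory never hard-codes data shapes, so the same function can be reused for every sub-problem a transformation method creates.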

Let’s use it with a problem transformation method that converts multi-label classification problems into single-label, single-class problems, e.g. Binary Relevance, which trains one classifier per label. We will use 10 epochs and disable verbosity.
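
To make the idea of training one classifier per label concrete, here is a minimal NumPy-only sketch of the decomposition on a made-up label matrix. It shows only how the targets are split and recombined; it is not how scikit-multilearn implements Binary Relevance internally.

```python
import numpy as np

# Toy label matrix: 4 samples, 3 labels (rows = samples, columns = labels).
Y = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])

# Binary Relevance: one independent binary target vector per label column.
binary_targets = [Y[:, j] for j in range(Y.shape[1])]

for j, target in enumerate(binary_targets):
    print(f"label {j}: {target.tolist()}")

# Stacking per-label outputs column-wise restores the multi-label shape.
# Here the "predictions" are simply the targets themselves.
Y_rebuilt = np.column_stack(binary_targets)
```

Each entry of `binary_targets` would be the target of a separate single-class model, which is why the base network needs only a single sigmoid output.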

In [8]:
from skmultilearn.problem_transform import BinaryRelevance
from skmultilearn.ext import Keras

KERAS_PARAMS = dict(epochs=10, batch_size=100, verbose=0)

clf = BinaryRelevance(classifier=Keras(create_model_single_class, False, KERAS_PARAMS), require_dense=[True,True])
clf.fit(X_train, y_train)
result = clf.predict(X_test)
Out[8]:
0.42574257425742573

5.1.2. Multi-class Keras classifier

We now train a multi-class neural network using Keras with TensorFlow as the backend (feel free to use others), optimized via categorical cross-entropy. This is a case from the Keras multi-class tutorial. Note again that the model creation function must create a model that accepts an input dimension and outputs a relevant output dimension. The Keras wrapper from scikit-multilearn will pass the relevant dimensions upon fitting.

In [9]:
def create_model_multiclass(input_dim, output_dim):
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=input_dim, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

We use the Label Powerset approach, which transforms a multi-label problem into a multi-class one, but the model can also be used with all the advanced label space division methods available in scikit-multilearn. Note that we set the second parameter of our Keras wrapper to True, as the base problem is now multi-class.
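
The multi-label to multi-class mapping can be sketched in a few lines of NumPy: every distinct label combination becomes one class. The label matrix below is made up, and the sketch illustrates the idea only, not scikit-multilearn's internal code.

```python
import numpy as np

# Toy label matrix: 5 samples, 3 labels.
Y = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
])

# Each distinct row (label combination) becomes one class.
combinations, class_ids = np.unique(Y, axis=0, return_inverse=True)
class_ids = class_ids.ravel()  # some NumPy versions return a 2-D inverse

print(len(combinations))   # number of classes the network must output
print(class_ids.tolist())  # one class id per sample

# Mapping class ids back to rows recovers the original label matrix.
Y_rebuilt = combinations[class_ids]
```

The number of rows in `combinations` is what the softmax layer's output dimension has to match.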

In [10]:
from skmultilearn.problem_transform import LabelPowerset
clf = LabelPowerset(classifier=Keras(create_model_multiclass, True, KERAS_PARAMS), require_dense=[True,True])
clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)
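
Predictions such as `y_pred` can then be compared with the test labels. Two common multi-label scores are subset accuracy (the fraction of samples whose whole label set is predicted exactly) and Hamming loss (the fraction of individual label entries that are wrong); `sklearn.metrics.accuracy_score` and `sklearn.metrics.hamming_loss` compute them. The sketch below works both out by hand on made-up dense arrays. It assumes dense inputs, so sparse matrices would first need converting, for example with `.toarray()`.

```python
import numpy as np

# Made-up ground truth and predictions: 4 samples, 3 labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_hat = np.array([[1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 0],
                  [0, 0, 0]])

# Subset accuracy: a sample counts only if every label matches.
subset_accuracy = np.mean(np.all(y_true == y_hat, axis=1))

# Hamming loss: share of individual label entries that differ.
hamming_loss = np.mean(y_true != y_hat)

print(subset_accuracy)  # 0.5
print(hamming_loss)
```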

5.2. PyTorch

PyTorch is another frequently used library. It is compatible with scikit-multilearn via the skorch wrapping library. To use it, you must first install the required libraries:

pip install -U skorch torch

To start, import:

In [47]:
import torch
from torch import nn
import torch.nn.functional as F
from skorch import NeuralNetClassifier

5.2.1. Single-class PyTorch classifier

We train a two-layer neural network using PyTorch, based on a simple example from the PyTorch example page. Note that the model’s first layer has to agree in size with the input data, and the model’s last layer is two-dimensional, as there are two classes: 0 or 1.

In [99]:
input_dim = X_train.shape[1]
In [100]:
class SingleClassClassifierModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
            dropout=0.5,
    ):
        super(SingleClassClassifierModule, self).__init__()
        self.num_units = num_units

        self.dense0 = nn.Linear(input_dim, num_units)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 2)

    def forward(self, X, **kwargs):
        X = F.relu(self.dense0(X))
        X = F.relu(self.dense1(X))
        X = torch.sigmoid(self.output(X))
        return X

We now wrap the model with skorch and use scikit-multilearn for Binary Relevance classification.

In [101]:
net = NeuralNetClassifier(
    SingleClassClassifierModule,
    max_epochs=20,
    verbose=0
)
In [96]:
from skmultilearn.problem_transform import BinaryRelevance

clf = BinaryRelevance(classifier=net, require_dense=[True,True])
clf.fit(X_train.astype(numpy.float32),y_train)
y_pred = clf.predict(X_test.astype(numpy.float32))
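
The casts to `numpy.float32` in the cell above are presumably needed because PyTorch layers such as `nn.Linear` hold float32 parameters by default, while NumPy produces float64 arrays by default. A minimal NumPy-only sketch of the cast, on made-up data:

```python
import numpy as np

# NumPy creates double-precision (float64) arrays by default.
X = np.random.rand(3, 5)
print(X.dtype)    # float64

# astype returns a converted copy; the original array is left unchanged.
X32 = X.astype(np.float32)
print(X32.dtype)  # float32
```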

5.2.2. Multi-class PyTorch classifier

Similarly, we can train a multi-class DNN; this time the last layer must agree in size with the number of classes.

In [102]:
nodes = 8
input_dim = X_train.shape[1]
hidden_dim = int(input_dim/nodes)
output_dim = len(numpy.unique(y_train.rows))
In [103]:
class MultiClassClassifierModule(nn.Module):
    def __init__(
            self,
            input_dim=input_dim,
            hidden_dim=hidden_dim,
            output_dim=output_dim,
            dropout=0.5,
    ):
        super(MultiClassClassifierModule, self).__init__()
        self.dropout = nn.Dropout(dropout)

        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X
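
The requirement that the last layer match the number of classes can be checked with a NumPy stand-in for the output layer. The sizes below are hypothetical and the weights random; the sketch only shows that a softmax over an output of width `n_classes` yields one probability per class for every sample.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, hidden_dim, n_classes = 4, 9, 5    # hypothetical sizes

H = rng.normal(size=(n_samples, hidden_dim))  # hidden-layer activations
W = rng.normal(size=(hidden_dim, n_classes))  # output-layer weights

probs = softmax(H @ W)
print(probs.shape)        # (4, 5): one probability per class and sample
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```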

Now let’s skorch-wrap it:

In [104]:
net = NeuralNetClassifier(
    MultiClassClassifierModule,
    max_epochs=20,
    verbose=0
)
In [105]:
from skmultilearn.problem_transform import LabelPowerset
clf = LabelPowerset(classifier=net, require_dense=[True,True])
clf.fit(X_train.astype(numpy.float32),y_train)
y_pred = clf.predict(X_test.astype(numpy.float32))
/opt/conda/lib/python3.6/site-packages/sklearn/model_selection/_split.py:626: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=5.
  % (min_groups, self.n_splits)), Warning)