Common Loss Functions for Face Recognition and Their TensorFlow Implementations

A summary of commonly used losses

Posted by Jieson on August 25, 2019

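    In face recognition, model improvements have come mainly from the design of the loss function, which steers the optimization of the whole network. Moving from the traditional softmax loss to CosFace and ArcFace, each step has brought some improvement.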

1. Softmax loss

\[L_S = -\frac{1}{m}\sum_{i=1}^{m} \log\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{N} e^{W_j^T x_i + b_j}}\]

[Figure: softmax]

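    Softmax loss only considers whether a sample is classified correctly and does not take inter-class distance into account. It requires neither intra-class compactness nor inter-class separation, which makes it a poor fit for face recognition. Softmax therefore needs to be modified so that, beyond keeping classes separable, feature vectors are as compact as possible within a class and as far apart as possible between classes.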

In TensorFlow, this loss is provided by tf.nn.softmax_cross_entropy_with_logits.
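
As a rough illustration of what the softmax loss computes for a single sample, here is a minimal plain-Python sketch. It is not the TensorFlow implementation: the function name `softmax_cross_entropy` and the example scores are hypothetical, and the `logits` list stands in for the per-class scores fed into the softmax.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    # Subtract the max logit for numerical stability; the result is unchanged.
    top = max(logits)
    shifted = [z - top for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    # -log(softmax(logits)[label]) = log_sum_exp - shifted[label]
    return log_sum_exp - shifted[label]


# Hypothetical scores for one sample over three identities.
logits = [2.0, 1.0, 0.1]
loss_correct = softmax_cross_entropy(logits, 0)  # true class has the top score
loss_wrong = softmax_cross_entropy(logits, 2)    # true class has the lowest score
print(loss_correct, loss_wrong)
```

The loss depends only on the class scores. Nothing in it measures how tightly the features of one identity cluster together, which is the limitation described above.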

2. Center loss

\[L_C = \frac{1}{2}\sum_{i=1}^{m}\|x_i - C_{y_i}\|^2\]

[Figure: center loss]

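    Center loss requires not only that samples be classified correctly but also that the features of each class stay close to that class's center. In the formula above, \(\large C_{y_i}\) is the center of class \(y_i\) and \(x_i\) is the feature vector of each face. The authors add \(L_C\) to the softmax loss and use a parameter \(\lambda\) to control the intra-class distance. The overall loss is: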

\[L = L_S + \lambda L_C = -\frac{1}{m}\sum_{i=1}^{m} \log\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{N} e^{W_j^T x_i + b_j}} + \frac{\lambda}{2}\sum_{i=1}^{m}\|x_i - C_{y_i}\|^2\]

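import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable, tf.scatter_sub)


def center_loss(features, label, alfa, nrof_classes):
    """Center loss based on the paper "A Discriminative Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    """
    nrof_features = features.get_shape()[1]
    # One non-trainable center per class; updated by a moving average, not by gradients.
    centers = tf.get_variable('centers', [nrof_classes, nrof_features], dtype=tf.float32,
                              initializer=tf.constant_initializer(0), trainable=False)
    label = tf.reshape(label, [-1])
    # Look up the center of each sample's class.
    centers_batch = tf.gather(centers, label)
    # Move each center toward its samples; alfa is how much of the old center is kept.
    diff = (1 - alfa) * (centers_batch - features)
    centers = tf.scatter_sub(centers, label, diff)
    # Make sure the centers are updated whenever the loss is evaluated.
    with tf.control_dependencies([centers]):
        loss = tf.reduce_mean(tf.square(features - centers_batch))
    return loss, centers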
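
To make the two steps of the function above concrete (computing the loss and moving the centers), here is a small plain-Python sketch on a toy batch. It is only an illustration under assumptions: the helper name `center_loss_step` and the toy values are hypothetical, and ordinary lists replace tensors.

```python
def center_loss_step(features, labels, centers, alfa):
    """Return (loss, updated_centers) for one batch, using plain Python lists."""
    dim = len(features[0])
    # Mean squared difference between each feature and the center of its class.
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    loss = total / (len(features) * dim)

    # Moving-average update: center <- center - (1 - alfa) * (center - feature),
    # with every difference taken against the centers from before the update.
    updated = [list(c) for c in centers]
    for x, y in zip(features, labels):
        for k in range(dim):
            updated[y][k] -= (1 - alfa) * (centers[y][k] - x[k])
    return loss, updated


# Toy batch: two samples from two classes, centers initialized to zero.
features = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
centers = [[0.0, 0.0], [0.0, 0.0]]

loss, updated = center_loss_step(features, labels, centers, alfa=0.5)
loss_next, _ = center_loss_step(features, labels, updated, alfa=0.5)
print(loss, updated, loss_next)
```

With `alfa=0.5`, each center moves halfway toward its sample, so the loss on the same batch drops on the next step. A larger `alfa` keeps more of the old center and makes the centers move more slowly.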

3. Triplet loss

[Figure: triplet loss]

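    The triplet loss works on triplets made up of an Anchor, a Positive, and a Negative. As the figure above shows, it pulls samples of the same class closer together and pushes samples of different classes further apart.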

\[L_{triplet} = \sum_{i}^{N}\left[\|f(x_i^a) - f(x_i^p)\|^2 - \|f(x_i^a) - f(x_i^n)\|^2 + \alpha\right]_+\]

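In this expression, the first term is the intra-class distance, the middle term is the inter-class distance, and \(\alpha\) is the margin. Optimizing with gradient descent keeps decreasing the intra-class distance and increasing the inter-class distance.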

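Pros: the loss is computed directly from distances between embeddings, which enlarges inter-class distance and compresses intra-class distance.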

Cons: training converges slowly and is sensitive to how the triplets are selected.

import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope)


def triplet_loss(anchor, positive, negative, alpha):
    """Calculate the triplet loss according to the FaceNet paper

    Args:
      anchor: the embeddings for the anchor images.
      positive: the embeddings for the positive images.
      negative: the embeddings for the negative images.
      alpha: the margin enforced between positive and negative distances.

    Returns:
      the triplet loss according to the FaceNet paper as a float tensor.
    """
    with tf.variable_scope('triplet_loss'):
        # Squared Euclidean distances, summed over the embedding dimension.
        pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
        neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)

        basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
        # Triplets that already satisfy the margin contribute zero loss.
        loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)

    return loss
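
As a worked example of the arithmetic in the function above, the following plain-Python sketch evaluates the loss for one triplet at a time. The helper names `squared_distance` and `triplet_loss_single`, the 2-D embeddings, and the margin of 0.2 are all hypothetical values chosen for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss_single(anchor, positive, negative, alpha):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet."""
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    # Zero once the negative is farther than the positive by at least alpha.
    return max(pos_dist - neg_dist + alpha, 0.0)


anchor = [0.0, 1.0]
positive = [0.1, 0.9]
easy_negative = [1.0, 0.0]  # already far from the anchor
hard_negative = [0.2, 0.8]  # almost as close as the positive

easy = triplet_loss_single(anchor, positive, easy_negative, alpha=0.2)
hard = triplet_loss_single(anchor, positive, hard_negative, alpha=0.2)
print(easy, hard)
```

The easy triplet already satisfies the margin and yields zero loss, so it gives no gradient, while the hard triplet still contributes. This is one way to see why training is sensitive to triplet selection.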

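4. ArcFace

The softmax loss above does not consider inter-class distance. Center loss learns class centers and makes each class compact, but it does not enforce separation between classes. Triplet loss converges slowly. This motivated variants of the softmax loss such as L-Softmax, SphereFace, and ArcFace. ArcFace maximizes the classification margin directly in angular space, whereas CosFace does so in cosine space; an angular margin acts on the angle more directly than a cosine margin does.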

[Figure: ArcFace]

\[L_{arcface} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j=1, j \neq y_i} e^{s\cos\theta_j}}\]

import math

import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, tf.get_variable)


def arcface_loss(embedding, labels, out_num, w_init=None, s=64., m=0.45):
    '''
    :param embedding: the input embedding vectors
    :param labels:  the input labels, the shape should be (batch_size,) because the one-hot mask is not squeezed
    :param s: scalar value default is 64
    :param out_num: output class num
    :param m: the margin value, default is 0.45
    :return: the final calculated output, this output is sent into the tf.nn.softmax directly
    '''
    cos_m = math.cos(m)
    sin_m = math.sin(m)
    mm = sin_m * m  # issue 1
    threshold = math.cos(math.pi - m)
    with tf.variable_scope('arcface_loss'):
        # inputs and weights norm
        embedding_norm = tf.norm(embedding, axis=1, keep_dims=True)
        embedding = tf.div(embedding, embedding_norm, name='norm_embedding')
        weights = tf.get_variable(name='embedding_weights', shape=(embedding.get_shape().as_list()[-1], out_num),
                                  initializer=w_init, dtype=tf.float32)
        weights_norm = tf.norm(weights, axis=0, keep_dims=True)
        weights = tf.div(weights, weights_norm, name='norm_weights')
        # cos(theta+m)
        cos_t = tf.matmul(embedding, weights, name='cos_t')
        cos_t2 = tf.square(cos_t, name='cos_2')
        sin_t2 = tf.subtract(1., cos_t2, name='sin_2')
        sin_t = tf.sqrt(sin_t2, name='sin_t')
        cos_mt = s * tf.subtract(tf.multiply(cos_t, cos_m), tf.multiply(sin_t, sin_m), name='cos_mt')

        # this condition ensures that theta + m stays in the range [0, pi]
        #      0<=theta+m<=pi
        #     -m<=theta<=pi-m
        cond_v = cos_t - threshold
        cond = tf.cast(tf.nn.relu(cond_v, name='if_else'), dtype=tf.bool)

        keep_val = s*(cos_t - mm)
        cos_mt_temp = tf.where(cond, cos_mt, keep_val)

        mask = tf.one_hot(labels, depth=out_num, name='one_hot_mask')
        # mask = tf.squeeze(mask, 1)
        inv_mask = tf.subtract(1., mask, name='inverse_mask')

        s_cos_t = tf.multiply(s, cos_t, name='scalar_cos_t')

        output = tf.add(tf.multiply(s_cos_t, inv_mask), tf.multiply(cos_mt_temp, mask), name='arcface_loss_output')
    return output
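
To show what the function above does to the target-class logit, here is a scalar plain-Python sketch of the same arithmetic. The helper name `arcface_target_logit` and the example angles are hypothetical; the defaults `s=64.0` and `m=0.45` are simply copied from the signature above.

```python
import math


def arcface_target_logit(cos_t, s=64.0, m=0.45):
    """Scaled target-class logit, given cos_t = cos(theta)."""
    cos_m, sin_m = math.cos(m), math.sin(m)
    threshold = math.cos(math.pi - m)
    if cos_t > threshold:
        # theta + m < pi: expand cos(theta + m) with the angle-addition identity.
        sin_t = math.sqrt(max(1.0 - cos_t * cos_t, 0.0))
        return s * (cos_t * cos_m - sin_t * sin_m)
    # Otherwise use the linear fallback s * (cos_t - sin(m) * m) from the code above.
    return s * (cos_t - sin_m * m)


theta = 0.5  # hypothetical angle between an embedding and its class weight
with_margin = arcface_target_logit(math.cos(theta))
without_margin = 64.0 * math.cos(theta)
fallback = arcface_target_logit(-1.0)  # theta = pi, so theta + m would exceed pi
print(with_margin, without_margin, fallback)
```

The margin lowers the target logit relative to the plain scaled cosine, so the network has to shrink the angle further to reach the same score. Only the target-class column receives this treatment; in the function above, the other classes keep `s * cos_t` through the inverse mask.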

References

[1] https://blog.csdn.net/u012505617/article/details/89355690
[2] Wen Y, Zhang K, Li Z, et al. A discriminative feature learning approach for deep face recognition [C]// ECCV, 2016. http://ydwen.github.io/papers/WenECCV16.pdf
[3] Liu W, Wen Y, Yu Z, et al. Large-Margin Softmax Loss for Convolutional Neural Networks [C]// ICML, 2016.
[4] Liu W, Wen Y, Yu Z, et al. SphereFace: Deep Hypersphere Embedding for Face Recognition [C]// CVPR, 2017.
[5] https://arxiv.org/abs/1801.07698
[6] https://github.com/deepinsight/insightface
[7] https://github.com/davidsandberg/facenet