class LogisticGradient extends Gradient
Compute gradient and loss for a multinomial logistic loss function, as used in multi-class classification (it is also used in binary logistic regression).
In The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition
by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, which can be downloaded from
http://statweb.stanford.edu/~tibs/ElemStatLearn/ , Eq. (4.17) on page 119 gives the formula of
multinomial logistic regression model. A simple calculation shows that
$$ P(y=0|x, w) = 1 / (1 + \sum_i^{K-1} \exp(x w_i))\\ P(y=1|x, w) = exp(x w_1) / (1 + \sum_i^{K-1} \exp(x w_i))\\ ...\\ P(y=K-1|x, w) = exp(x w_{K-1}) / (1 + \sum_i^{K-1} \exp(x w_i))\\ $$
for K classes multiclass classification problem.
The model weights \(w = (w_1, w_2, ..., w_{K-1})^T\) becomes a matrix which has dimension of (K-1) * (N+1) if the intercepts are added. If the intercepts are not added, the dimension will be (K-1) * N.
As a result, the loss of objective function for a single instance of data can be written as
$$ \begin{align} l(w, x) &= -log P(y|x, w) = -\alpha(y) log P(y=0|x, w) - (1-\alpha(y)) log P(y|x, w) \\ &= log(1 + \sum_i^{K-1}\exp(x w_i)) - (1-\alpha(y)) x w_{y-1} \\ &= log(1 + \sum_i^{K-1}\exp(margins_i)) - (1-\alpha(y)) margins_{y-1} \end{align} $$
where $\alpha(i) = 1$ if \(i \ne 0\), and $\alpha(i) = 0$ if \(i == 0\), \(margins_i = x w_i\).
For optimization, we have to calculate the first derivative of the loss function, and a simple calculation shows that
$$ \begin{align} \frac{\partial l(w, x)}{\partial w_{ij}} &= (\exp(x w_i) / (1 + \sum_k^{K-1} \exp(x w_k)) - (1-\alpha(y)\delta_{y, i+1})) * x_j \\ &= multiplier_i * x_j \end{align} $$
where $\delta_{i, j} = 1$ if \(i == j\), $\delta_{i, j} = 0$ if \(i != j\), and multiplier = $\exp(margins_i) / (1 + \sum_k^{K-1} \exp(margins_i)) - (1-\alpha(y)\delta_{y, i+1})$
If any of margins is larger than 709.78, the numerical computation of multiplier and loss
function will be suffered from arithmetic overflow. This issue occurs when there are outliers
in data which are far away from hyperplane, and this will cause the failing of training once
infinity / infinity is introduced. Note that this is only a concern when max(margins)
> 0.
Fortunately, when max(margins) = maxMargin > 0, the loss function and the multiplier
can be easily rewritten into the following equivalent numerically stable formula.
$$ \begin{align} l(w, x) &= log(1 + \sum_i^{K-1}\exp(margins_i)) - (1-\alpha(y)) margins_{y-1} \\ &= log(\exp(-maxMargin) + \sum_i^{K-1}\exp(margins_i - maxMargin)) + maxMargin - (1-\alpha(y)) margins_{y-1} \\ &= log(1 + sum) + maxMargin - (1-\alpha(y)) margins_{y-1} \end{align} $$
where sum = $\exp(-maxMargin) + \sum_i^{K-1}\exp(margins_i - maxMargin) - 1$.
Note that each term, $(margins_i - maxMargin)$ in $\exp$ is smaller than zero; as a result, overflow will not happen with this formula.
For multiplier, similar trick can be applied as the following,
$$ \begin{align} multiplier &= \exp(margins_i) / (1 + \sum_k^{K-1} \exp(margins_i)) - (1-\alpha(y)\delta_{y, i+1}) \\ &= \exp(margins_i - maxMargin) / (1 + sum) - (1-\alpha(y)\delta_{y, i+1}) \end{align} $$
where each term in $\exp$ is also smaller than zero, so overflow is not a concern.
For the detailed mathematical derivation, see the reference at http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297
- Source
 - Gradient.scala
 
- Alphabetic
 - By Inheritance
 
- LogisticGradient
 - Gradient
 - Serializable
 - Serializable
 - AnyRef
 - Any
 
- Hide All
 - Show All
 
- Public
 - All
 
Instance Constructors
Value Members
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        !=(arg0: Any): Boolean
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        ##(): Int
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        ==(arg0: Any): Boolean
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        asInstanceOf[T0]: T0
      
      
      
- Definition Classes
 - Any
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        clone(): AnyRef
      
      
      
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... ) @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        compute(data: Vector, label: Double, weights: Vector, cumGradient: Vector): Double
      
      
      
Compute the gradient and loss given the features of a single data point, add the gradient to a provided vector to avoid creating new objects, and return loss.
Compute the gradient and loss given the features of a single data point, add the gradient to a provided vector to avoid creating new objects, and return loss.
- data
 features for one data point
- label
 label for this data point
- weights
 weights/coefficients corresponding to features
- cumGradient
 the computed gradient will be added to this vector
- returns
 loss
- Definition Classes
 - LogisticGradient → Gradient
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        compute(data: Vector, label: Double, weights: Vector): (Vector, Double)
      
      
      
Compute the gradient and loss given the features of a single data point.
Compute the gradient and loss given the features of a single data point.
- data
 features for one data point
- label
 label for this data point
- weights
 weights/coefficients corresponding to features
- returns
 (gradient: Vector, loss: Double)
- Definition Classes
 - Gradient
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        eq(arg0: AnyRef): Boolean
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        equals(arg0: Any): Boolean
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        finalize(): Unit
      
      
      
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws( classOf[java.lang.Throwable] )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        getClass(): Class[_]
      
      
      
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hashCode(): Int
      
      
      
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        isInstanceOf[T0]: Boolean
      
      
      
- Definition Classes
 - Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        ne(arg0: AnyRef): Boolean
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        notify(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        notifyAll(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        synchronized[T0](arg0: ⇒ T0): T0
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        toString(): String
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long, arg1: Int): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... ) @native()