Training multi-class or multi-label classification machines is embarrassingly parallelizable via the one-vs.-rest approach. However, training all-in-one multi-class learning machines such as multinomial logistic regression or all-in-one multi-class SVMs (MC-SVMs) is not parallelizable out of the box. In my talk, I present optimization strategies that distribute the training of all-in-one MC-SVMs over the classes, making them appealing for use in extreme classification.
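To illustrate why one-vs.-rest training is embarrassingly parallel, here is a minimal sketch assuming scikit-learn and joblib: each class yields an independent binary problem, so the per-class fits run with no communication between workers. The dataset, class count, and the helper `fit_binary` are hypothetical; the distributed all-in-one strategies from the talk are not shown here.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1000 samples, 5 classes.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_classes=5)

def fit_binary(c):
    # Independent binary problem for class c: "c" vs. "rest".
    clf = LinearSVC()
    clf.fit(X, (y == c).astype(int))
    return clf

# One job per class, no shared state between jobs -- this is the
# embarrassingly parallel structure of one-vs.-rest training.
classifiers = Parallel(n_jobs=-1)(
    delayed(fit_binary)(c) for c in np.unique(y)
)

# Predict with the class whose binary machine scores highest.
scores = np.column_stack([clf.decision_function(X) for clf in classifiers])
y_pred = np.unique(y)[scores.argmax(axis=1)]
```

An all-in-one MC-SVM, by contrast, couples all class weight vectors through a joint objective, so no such independent per-class decomposition is available without the kind of optimization strategies the talk addresses.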