Günther, Stefanie: Simultaneous Parallel-in-Layer Training for Deep Residual Networks

Event calendar for the Department of Mathematics

13 December 2018
Created by Ulrike Hahn

Deep residual networks (ResNets) have shown great promise for modeling complex data relations, with applications in image classification, speech recognition, and text processing, among others. Despite rapid methodological developments, compute times for ResNet training can still be tremendous, on the order of hours or even days. Common approaches to reducing training runtimes mostly rely on data parallelism, but the sequential propagation through the network layers creates a scalability barrier: training runtimes increase linearly with the number of layers.

This talk presents an approach that enables concurrency across the network layers and thus overcomes this scalability barrier. The proposed method is inspired by the fact that the propagation through a ResNet can be interpreted as an optimal control problem. In this context, the discrete network layers are interpreted as the discretization of a time-continuous dynamical system. Recent advances in parallel-in-time integration and optimization methods can thus be leveraged to speed up training. In particular, an iterative multigrid-reduction-in-time approach will be discussed, which recursively divides the time domain (i.e., the layers) into multiple time chunks that can be processed in parallel on multiple compute units. Additionally, the multigrid iterations enable a simultaneous optimization framework in which weight updates are based on inexact gradient information.
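
The dynamical-system view described above can be illustrated with a minimal sketch. It is not the speaker's implementation and it does not perform the multigrid iteration itself. It assumes a tanh layer function, a step size h, and the hypothetical helper names layer_rhs, resnet_forward, and split_into_chunks, none of which come from the talk. Each residual layer is written as one forward Euler step u_{n+1} = u_n + h * f(u_n, W_n, b_n), and the layers are split into contiguous chunks, each of which can be propagated independently once the state at its left boundary is known.

```python
import numpy as np


def layer_rhs(u, W, b):
    """Assumed layer function f(u, W, b): an affine map followed by tanh."""
    return np.tanh(W @ u + b)


def resnet_forward(u0, weights, biases, h):
    """Propagate u0 through residual layers, one forward Euler step per layer.

    Returns the list of states [u_0, u_1, ..., u_N].
    """
    u = u0
    states = [u0]
    for W, b in zip(weights, biases):
        u = u + h * layer_rhs(u, W, b)  # residual update = Euler step
        states.append(u)
    return states


def split_into_chunks(num_layers, num_chunks):
    """Split layer indices 0..num_layers into contiguous (start, stop) chunks."""
    bounds = np.linspace(0, num_layers, num_chunks + 1).astype(int)
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(num_chunks)]


# Small usage example with random (untrained) weights.
rng = np.random.default_rng(0)
dim, num_layers, h = 3, 8, 0.1
weights = [rng.standard_normal((dim, dim)) for _ in range(num_layers)]
biases = [rng.standard_normal(dim) for _ in range(num_layers)]
u0 = rng.standard_normal(dim)

states = resnet_forward(u0, weights, biases, h)

# Given the state at its left boundary, each chunk is independent of the
# others, so the chunks could be assigned to different compute units.
for start, stop in split_into_chunks(num_layers, 4):
    local = resnet_forward(states[start], weights[start:stop], biases[start:stop], h)
    assert np.allclose(local[-1], states[stop])
```

In this sketch the boundary states are taken from a prior sequential pass, which is exactly what a parallel method cannot afford. A multigrid-reduction-in-time scheme would instead start from guessed boundary states and correct them iteratively on coarser layer grids.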

Speaker: Dr. Stefanie Günther, Technische Universität Kaiserslautern, AG Scientific Computing

Time: 09:45

Location: Building 32, Room 349
