Przemyslaw Tredak, Sr. DL Frameworks Engineer @ NVIDIA
The dependency engine inside MXNet has a very elegant design that enables efficient
multithreaded execution by bypassing Python’s Global Interpreter Lock (GIL). While the design works
very well for hybridized execution, imperative execution exposes small inefficiencies in the
approach. Furthermore, as GPUs get faster and the number of GPUs used in training jobs gets larger, those
inefficiencies become a problem even for the hybridized models.
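The core idea of such a dependency engine can be illustrated with a toy sketch. This is not MXNet's actual implementation (the real engine is written in C++, tracks read/write dependencies per array to run independent operations in parallel, and uses multiple worker threads); the class name and API below are hypothetical. It only shows the asynchronous-push model: the Python caller enqueues operations and returns immediately, so computation proceeds on a worker thread outside the GIL-bound frontend loop.

```python
import queue
import threading

# Toy sketch of an asynchronous dependency engine (hypothetical API,
# not MXNet's real implementation). Ops are pushed with the variables
# they read and write; here a single worker simply runs them in push
# order, whereas a real engine would use the read/write sets to build
# a dependency graph and execute independent ops in parallel.
class ToyDependencyEngine:
    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Worker loop: pop and execute ops until shut down.
        while True:
            fn = self._q.get()
            if fn is None:
                break
            fn()
            self._q.task_done()

    def push(self, fn, reads=(), writes=()):
        # Enqueue an op and return immediately; the caller does not
        # block on the computation (this is what hides frontend latency).
        self._q.put(fn)

    def wait_all(self):
        # Block until every pushed op has completed,
        # analogous to mx.nd.waitall() in MXNet.
        self._q.join()


engine = ToyDependencyEngine()
result = []
# The second op depends on the first (it reads "a"); push order
# preserves that dependency in this single-worker toy.
engine.push(lambda: result.append(1), writes=("a",))
engine.push(lambda: result.append(result[0] + 1), reads=("a",), writes=("b",))
engine.wait_all()
# result == [1, 2]
```

The design choice this illustrates is why imperative execution can expose overhead: every small op pays the per-push frontend cost, while a hybridized graph can be scheduled as a whole.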
In this talk we will explore the current MXNet dependency engine design and look at examples of
inefficient execution. We will also propose ways to improve this design to achieve the
fastest training and inference times for both imperative and hybridized models.