The computational limits of deep learning

A new project led by MIT researchers argues that deep learning is reaching its computational limits, which they say will result in one of two outcomes: either deep learning will be forced towards less computationally intensive methods of improvement, or machine learning will be pushed towards techniques that are more computationally efficient than deep learning.

The team examined more than 1,000 research papers in image classification, object detection, machine translation and other areas, looking at how the computational requirements of each task grew as performance improved.

They warn that deep learning is facing an important challenge: to "either find a way to increase performance without increasing computing power, or have performance stagnate as computational requirements become a constraint."

Some potential improvements they discuss and compare:

Increasing computing power: Hardware accelerators. For much of the 2010s, moving to more efficient hardware platforms, such as GPUs, TPUs, FPGAs and ASICs, was a key source of increased computing power. All of these approaches sacrifice the generality of the computing platform in exchange for the efficiency gained through specialization. But such specialization faces diminishing returns, so alternative hardware frameworks, including quantum computing, are being explored.

However, these alternatives have yet to disrupt the GPU/TPU and FPGA/ASIC architectures. Among them, quantum computing perhaps has the most long-term upside, since it offers the potential for sustained exponential increases in computing power.

Reducing computational complexity: Network Compression and Acceleration. This body of work primarily focuses on taking a trained neural network and sparsifying or otherwise compressing its connections, so that it requires less computation when used for prediction. This is typically done using optimization or heuristics such as “pruning”, quantization, or low-rank compression. The resulting networks retain the performance of the original network but require fewer floating point operations to evaluate.
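To make the three compression techniques concrete, here is a minimal Python sketch under simplifying assumptions: a layer's weights are a flat list of floats, pruning removes the smallest-magnitude weights, quantization is symmetric 8-bit, and low-rank compression is represented only by its multiply-accumulate (MAC) count. The helper names (`magnitude_prune`, `quantize_int8`, `dequantize`, `dense_macs`, `low_rank_macs`) are hypothetical and not taken from the paper or any library.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric uniform quantization of floats to integers in [-127, 127].

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original floats; the error per weight is at most about scale / 2."""
    return [c * scale for c in codes]


def dense_macs(m, n):
    """MACs for multiplying an m x n weight matrix by a vector."""
    return m * n


def low_rank_macs(m, n, r):
    """MACs when the m x n matrix is replaced by factors of shape m x r and r x n."""
    return r * (m + n)


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]

pruned = magnitude_prune(weights, sparsity=0.9)
print("nonzero after pruning:", sum(w != 0.0 for w in pruned))

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print("max quantization error:", max(abs(a - b) for a, b in zip(weights, restored)))

print("dense MACs:", dense_macs(1024, 1024))          # 1024 * 1024 = 1048576
print("rank-64 MACs:", low_rank_macs(1024, 1024, 64))  # 64 * 2048 = 131072
```

Pruning and low-rank factorization cut the operation count directly (a sparse weight list only saves work if the runtime actually skips the zeros), while quantization mainly shrinks each weight from 32 bits to 8 and allows cheaper integer arithmetic. In practice the low-rank factors are commonly obtained from a truncated singular value decomposition, and compressed networks are often fine-tuned afterwards to recover accuracy.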

Thus far, these approaches have produced computational improvements that, while impressive, are small compared with the orders-of-magnitude growth in computation across the field.

Finding high-performing small deep learning architectures: Neural Architecture Search and Meta Learning. It has recently become popular to use optimization to find network architectures that are computationally efficient to train while retaining good performance on some class of learning problems, and to exploit the fact that many datasets are similar, so that information from previously trained models can be reused (meta-learning and transfer learning).
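The transfer-learning idea (reusing what a model learned on a similar problem) can be illustrated with a deliberately tiny, hypothetical example: two closely related linear-regression tasks fitted by gradient descent, where the second fit is either started from scratch or warm-started from the first task's solution. The tasks, learning rate, tolerance and the `fit_linear` helper are assumptions chosen for illustration; this is not the paper's method, and real transfer learning reuses the weights of much larger networks.

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.5, tol=1e-6, max_steps=10_000):
    """Fit y ~ w*x + b by full-batch gradient descent on mean squared error.

    Starts from the given (w, b) and returns the fitted (w, b) together with
    the number of steps taken before the loss dropped below `tol`.
    """
    n = len(xs)
    for step in range(max_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < tol:
            return w, b, step
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_steps


xs = [i / 10 for i in range(11)]
# Two hypothetical, closely related tasks.
task_a = [2.0 * x + 1.0 for x in xs]
task_b = [2.1 * x + 0.9 for x in xs]

w_a, b_a, _ = fit_linear(xs, task_a)                      # "pretrain" on task A
_, _, cold_steps = fit_linear(xs, task_b)                 # task B from scratch
_, _, warm_steps = fit_linear(xs, task_b, w=w_a, b=b_a)   # task B warm-started from A
print("steps from scratch:", cold_steps, "steps with warm start:", warm_steps)
```

Because task B's solution lies close to task A's, the warm-started fit begins with a much smaller error and reaches the same tolerance in fewer steps; the saving depends entirely on how similar the two tasks are.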

Although these methods are often quite successful, their current downside is that the overhead of meta-learning or neural architecture search is itself computationally intense, since it requires training many models on a wide variety of datasets.
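The search overhead can be seen in a toy random-search version of architecture search. In this sketch the search space, the synthetic `train_and_evaluate` stand-in (which returns a made-up score instead of actually training a network), the cost proxy `depth * width ** 2` and the `cost_weight` penalty are all hypothetical; random search is used only because it is the simplest search strategy to show.

```python
import random

# Hypothetical search space: depth (layers) and width (units per layer).
SEARCH_SPACE = {"depth": [2, 4, 8], "width": [32, 64, 128]}


def train_and_evaluate(arch, rng):
    """Stand-in for training one candidate network (hypothetical).

    Returns (score, cost). The score is synthetic: it grows with capacity,
    with diminishing returns, plus a little noise. The cost uses
    depth * width**2 as a rough proxy for the compute needed to train it.
    """
    capacity = arch["depth"] * arch["width"]
    score = 1.0 - 1.0 / (1.0 + capacity / 100.0) + rng.gauss(0.0, 0.01)
    cost = arch["depth"] * arch["width"] ** 2
    return score, cost


def random_search(num_trials, cost_weight=1e-6, seed=0):
    """Sample architectures at random; keep the best score-minus-cost trade-off."""
    rng = random.Random(seed)
    best_arch, best_objective = None, float("-inf")
    total_cost = 0
    for _ in range(num_trials):
        arch = {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        score, cost = train_and_evaluate(arch, rng)
        total_cost += cost  # every candidate has to be trained: this is the search overhead
        objective = score - cost_weight * cost  # favor computationally cheap architectures
        if objective > best_objective:
            best_arch, best_objective = arch, objective
    return best_arch, total_cost


best_arch, search_cost = random_search(num_trials=20)
_, final_cost = train_and_evaluate(best_arch, random.Random(1))
print("selected architecture:", best_arch)
print("search cost / cost of training the final model:", search_cost / final_cost)
```

Since the total search cost is summed over every trained candidate, it grows with the number of trials, which is why the search itself can cost far more than training the architecture it finally selects.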