Algorithm and Data Center Optimization

Algorithm Optimization

Many computer scientists are researching ways to reduce the amount of computation needed to train neural networks. Two such methods are pruning and quantization. Pruning removes parameters that contribute little to overall performance, making the model somewhat smaller. Quantization makes the remaining parameters leaner by lowering the amount of memory each one occupies. Individually these changes have minor effects, but applied to the billions of parameters in most modern AI systems they can make a massive difference; quantization, for example, can reduce memory requirements by up to 51 percent. Both methods work by changing the internal process of training AI to lower the energy required, which is where the name algorithm optimization comes from. (Ars Technica)
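The two ideas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production method: the 50 percent pruning fraction and the 8-bit format are assumptions chosen for the example, and real systems use more sophisticated variants of both techniques.

```python
import numpy as np

def prune(weights, fraction=0.5):
    """Magnitude pruning: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(weights), fraction)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize(weights):
    """Uniform 8-bit quantization: store each float32 weight as an
    int8 value plus one shared scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

pruned = prune(w, fraction=0.5)   # roughly half the weights become zero
q, scale = quantize(pruned)       # int8 uses 1/4 the memory of float32

print(w.nbytes, q.nbytes)         # 4000 bytes vs 1000 bytes
```

The memory saving here comes purely from the narrower number format (int8 versus float32); the zeros introduced by pruning would save additional space under a sparse storage scheme.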

Data-Center Optimization

Data-center optimization concerns how we run the algorithms that train AI models. Training a modern AI model requires thousands of GPUs running for long periods of time, usually housed in warehouses called data centers. When an AI system is being trained, the work is split into many parts, and each part is assigned to a GPU. Processing inefficiencies occur when these parts are not split evenly: every GPU runs at full speed, but they finish at different times. To improve efficiency, the GPUs that were given less work can be slowed so they consume less energy while the whole job still finishes at the same time. This approach could cut energy consumption by up to 30 percent. (Ars Technica)
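The intuition can be checked with a toy energy model. This is a sketch under stated assumptions, not a measurement: the workload numbers are made up, power is assumed to scale with the cube of clock frequency (so energy per unit of work scales with its square), and idle power is ignored, which actually understates the benefit of slowing the lightly loaded GPUs.

```python
# Hypothetical per-GPU workloads (arbitrary units) after an uneven split.
workloads = [100, 60, 80, 40]

def energy(work, freq):
    """Toy DVFS model: power ~ freq**3, runtime ~ work/freq,
    so energy ~ freq**2 * work. Assumed for illustration only."""
    return freq ** 2 * work

deadline = max(workloads)  # the job finishes when the slowest GPU does

# Baseline: every GPU runs at full speed (freq = 1.0), then sits idle.
baseline = sum(energy(w, 1.0) for w in workloads)

# Scaled: each GPU slows just enough to finish exactly at the deadline.
scaled = sum(energy(w, w / deadline) for w in workloads)

savings = 1 - scaled / baseline
print(f"energy saved: {savings:.0%}")  # prints "energy saved: 36%"
```

Even in this crude model, slowing the three underloaded GPUs saves a large fraction of the energy without delaying the job, since only the most heavily loaded GPU needs to run at full speed.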