A joint team from Microsoft and NVIDIA has introduced the Megatron-Turing Natural Language Generation model (MT-NLG), powered by DeepSpeed and Megatron. With 530 billion parameters, it is the largest and most powerful monolithic transformer language model trained to date.
MT-NLG has three times as many parameters as the previous largest model of its kind, GPT-3 (175 billion). It achieves higher accuracy than earlier models across a wide range of natural language tasks, including completion prediction, reading comprehension, commonsense reasoning, natural language inference, and word sense disambiguation. The 105-layer, transformer-based MT-NLG improved upon the prior state-of-the-art models in zero-, one-, and few-shot settings and set a new standard for large-scale language models in both model scale and quality.
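As a quick check on the "three times" figure, the ratio can be computed directly from the two parameter counts quoted above. This is only an illustrative sketch; the constant and function names are made up for the example.

```python
# Parameter counts as quoted in the article.
MT_NLG_PARAMS = 530_000_000_000
GPT3_PARAMS = 175_000_000_000


def scale_ratio(new_params: int, old_params: int) -> float:
    """Return how many times larger new_params is than old_params."""
    return new_params / old_params


ratio = scale_ratio(MT_NLG_PARAMS, GPT3_PARAMS)
print(f"MT-NLG is about {ratio:.2f}x the size of GPT-3")  # about 3.03x
```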
Large-scale language model
Training such a powerful model was made possible by numerous innovations. For example, NVIDIA and Microsoft combined a state-of-the-art GPU-accelerated training infrastructure with an advanced distributed training software stack. They built natural language training corpora containing hundreds of billions of tokens and developed training methods to improve the efficiency and stability of optimization.
MT-NLG was trained using Microsoft's Azure NDv4 and NVIDIA's Selene supercomputer, the latter powered by 560 DGX A100 servers, each equipped with eight NVIDIA A100 80GB Tensor Core GPUs. Each of these 4,480 GPUs, a class of processor originally developed for video game graphics but also extremely capable of processing large amounts of data when training AI, currently retails for thousands of dollars. The research team did not have the supercomputer's full capacity to itself, and training the model took more than a month.
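The hardware figures above lend themselves to a rough back-of-the-envelope calculation. The sketch below reproduces the GPU count from the article and then estimates how much memory the model weights alone would need. The 2 bytes per parameter (16-bit weights) is an assumption made for illustration, not a figure from the article, and the estimate ignores gradients, optimizer state, and activations, all of which add considerably to the real requirement.

```python
import math

# Hardware figures as quoted in the article.
SERVERS = 560
GPUS_PER_SERVER = 8
GPU_MEMORY_GB = 80

# Model size as quoted in the article.
PARAMS = 530_000_000_000

# ASSUMPTION (not from the article): weights stored as 16-bit values.
BYTES_PER_PARAM = 2

# Total number of GPUs in the cluster.
total_gpus = SERVERS * GPUS_PER_SERVER

# Aggregate GPU memory, in decimal terabytes.
total_memory_tb = total_gpus * GPU_MEMORY_GB / 1000

# Memory needed just to hold the weights, in decimal gigabytes.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

# Smallest number of 80 GB GPUs whose combined memory holds the weights.
min_gpus_for_weights = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"GPUs in cluster:        {total_gpus}")
print(f"Aggregate GPU memory:   {total_memory_tb:.1f} TB")
print(f"Weights at 2 B/param:   {weights_gb:.0f} GB")
print(f"GPUs to hold weights:   {min_gpus_for_weights}")
```

Under these assumptions the weights alone come to roughly 1,060 GB, more than thirteen times the memory of a single 80 GB GPU, which suggests why a model of this size has to be split across many devices by a distributed training stack.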
While giant language models are advancing the state of the art in language generation, they also suffer from issues such as bias and toxicity. Microsoft researchers observed that the MT-NLG model picks up stereotypes and biases from the data on which it is trained, which means the model can produce offensive outputs that are potentially racist or sexist. Microsoft and NVIDIA say they are committed to addressing this problem. In addition, any use of MT-NLG in production scenarios must ensure that proper measures are in place to mitigate and minimize potential harm to users.
“The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language. The innovations of DeepSpeed and Megatron-LM will benefit existing and future AI model development and make large AI models cheaper and faster to train. We look forward to how MT-NLG will shape tomorrow’s products and motivate the community to push the boundaries of NLP even further,” the company explained in a press release.
We live in a time when AI advancements are far outpacing Moore’s law. More computing power keeps becoming available with each new generation of GPUs, interconnected at lightning speeds. At the same time, the hyperscaling of AI models continues to deliver better performance, with seemingly no end in sight.