When Artificial Intelligence met Fugaku, one of the world's fastest supercomputers
In the world of large language models (LLMs) and generative artificial intelligence (AI), access to massive computational resources is crucial. Training models at the scale of GPT (Generative Pre-trained Transformer) (*1) demands enormous computational power, and it has been observed that increasing data volume and computational power can lead models to acquire a variety of abilities. However, Japan has been lagging behind the leading global companies in this regard.
To address this issue, Fujitsu has embarked on a mission to democratize high-performance computing (HPC) and AI by collaborating with Tokyo Institute of Technology, Tohoku University, and RIKEN to develop distributed parallel training methods for LLMs on Fugaku, Japan's leading supercomputer. Bringing Fugaku and LLMs together is intended to create a computational environment for advanced LLM training in Japan, helping the country catch up quickly with the advanced technology of the U.S., China, and other countries.
Fugaku, a general-purpose supercomputer, currently boasts world-class performance, ranking second in the TOP500 list. In 2021, Fujitsu achieved the world's fastest performance in the MLPerf HPC benchmark using Fugaku and CosmoFlow. On this basis, Fugaku is considered the fastest supercomputer in Japan for LLM training. However, Fugaku was not originally designed for LLMs and other large-scale models, so it has not yet been optimized for these workloads.
To address this, four parties – Tokyo Institute of Technology, Tohoku University, Fujitsu, and RIKEN – are working together to optimize Fugaku's performance for large-scale deep learning frameworks and develop technologies to harness its power for LLMs. They will focus on training LLMs using primarily Japanese data, aiming to gain insights and build an open-source large language model suitable for commercial use.
The project runs from May 24, 2023, to March 31, 2024, and aims to execute large language model training efficiently in Fugaku's ultra-large-scale parallel computing environment.
Each institution and company plays a vital role in this collaboration:
- Tokyo Institute of Technology: Overall coordination, parallelization, and acceleration of LLMs
- Tohoku University: Collection of training data, selection of models
- Fujitsu: Acceleration of LLMs
- RIKEN: Distributed parallelization, communication acceleration, and acceleration of LLMs
Furthermore, they will collaborate with Nagoya University to develop data generation and learning methods for multimodal applications in industries such as manufacturing. They will also consider partnering with CyberAgent, Inc. to provide data and technology for large-scale language model construction.
Adapting deep learning frameworks for cutting-edge supercomputers like Fugaku is a key challenge in the high-performance computing (HPC) domain. The Megatron-DeepSpeed framework (*2) has been successfully ported to Fugaku's Arm CPU environment, allowing researchers to utilize its computational power for LLM development. However, to fully harness Fugaku's capabilities, several issues must be addressed.
First, accelerating batched matrix multiplication (*3) is essential for improving LLM training efficiency, reducing the time and resources required to train large-scale models. Second, enhancing collective communication on the Tofu interconnect D (*4) is crucial for improving the performance of distributed parallel training, so that training remains smooth and efficient even with massive data volumes. Lastly, developing stable training methods using FP16 (half-precision floating-point format) (*5) is vital for maintaining LLM accuracy and reliability. Although FP16 can reduce memory usage and computational requirements, its narrow numeric range can introduce numerical instability. Overcoming these challenges will enable more efficient LLM training without compromising model quality.
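Two of these issues, batched matrix multiplication and FP16 stability, can be illustrated with a minimal NumPy sketch. It is not code from the project or from Megatron-DeepSpeed: the array shapes, the example gradient value, and the `loss_scale` factor are all hypothetical, and loss scaling is shown only as one widely used way to keep small gradients representable in FP16.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Batched matrix multiplication ---
# A stack of 8 independent (4x5) @ (5x3) products (shapes chosen for illustration).
a = rng.standard_normal((8, 4, 5)).astype(np.float32)
b = rng.standard_normal((8, 5, 3)).astype(np.float32)

# One multiplication call per matrix pair.
looped = np.stack([a[i] @ b[i] for i in range(a.shape[0])])
# A single call over the whole batch; the leading axis is the batch axis.
batched = np.matmul(a, b)
batched_matches_loop = np.allclose(looped, batched, atol=1e-5)

# --- FP16 underflow and loss scaling ---
# FP16's smallest positive value is about 6e-8, so this gradient is lost.
tiny_grad = np.float32(1e-8)            # hypothetical gradient value
unscaled_fp16 = np.float16(tiny_grad)   # rounds to zero in FP16

# Scaling up before the cast keeps the value inside FP16's range;
# the scale is divided out again in FP32 afterwards.
loss_scale = np.float32(65536.0)        # hypothetical power-of-two scale factor
scaled_fp16 = np.float16(tiny_grad * loss_scale)
recovered = np.float32(scaled_fp16) / loss_scale

print("batched result matches loop:", batched_matches_loop)
print("gradient cast directly to FP16:", unscaled_fp16)
print("gradient recovered via loss scaling:", recovered)
```

The batched call computes the same results as the loop but hands the whole batch to the underlying library at once, which is what gives an optimized kernel room to parallelize. In the FP16 part, the direct cast silently turns the gradient into zero, while the scaled path recovers it to within FP16's rounding error.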
In the field of natural language processing, key challenges include collecting and cleaning text data to ensure accurate and reliable NLP results, conducting legal reviews of research outputs (source code and models) to establish a legally compliant public release system, and developing pre-processing and fine-tuning methods to optimize LLM performance. Addressing these challenges is crucial for realizing LLMs’ full potential and advancing AI research. By leveraging Fugaku’s power and overcoming these obstacles, researchers can contribute to the development of innovative NLP applications and promote collaboration within the AI research community.
Fujitsu plans to leverage the advanced AI technologies and insights gained from this collaboration to promote the development of groundbreaking applications. By offering these technologies through its AI platform, Fujitsu Kozuchi (code name) – Fujitsu AI Platform, the company aims to contribute to the realization of a sustainable society.
This joint effort to optimize Fugaku for LLMs and develop large-scale language models using Japanese data is a significant step towards strengthening Japan’s AI research capabilities and maintaining its global competitiveness. By democratizing HPC and AI, Fujitsu and its partners are paving the way for a future where businesses and organizations of all sizes can harness the power of AI to differentiate themselves and contribute to a more sustainable world.
*1. GPT (Generative Pre-trained Transformer): a powerful natural language processing model based on the Transformer architecture, which is trained through unsupervised pre-training on large text datasets and is fine-tuned for specific tasks. GPT-3 and GPT-4 are the latest iterations of this model, with increased capacity and improved performance, enabling state-of-the-art results in various language tasks.
*2. The Megatron-DeepSpeed framework: Megatron-LM with DeepSpeed built in, maintained by Microsoft, the developer of DeepSpeed. Megatron-LM is a software framework for training Transformer models such as GPT on a large-scale computing environment. DeepSpeed is a software framework developed by Microsoft that facilitates large-scale, high-speed deep learning.
*3. Batched matrix multiplication: a technique that involves performing multiple small matrix multiplications concurrently within a single batch, enhancing parallel efficiency and computational speed by leveraging the capabilities of processors.
*4. The Tofu interconnect D: The Tofu (Torus fusion) interconnect family is a group of system interconnects for highly scalable HPC systems developed by Fujitsu. The Tofu Interconnect D (TofuD) is a newer member of this family, designed for use in Fugaku.
*5. FP16 (half-precision floating-point format): a 16-bit representation of floating-point numbers that uses less memory and computational resources compared to higher-precision formats, such as FP32 or FP64, enabling faster training and inference in deep learning models while maintaining acceptable levels of numerical accuracy and precision.