
(AGENPARL) – Fri., May 10, 2024 *Source*:
Tokyo Institute of Technology
Tohoku University
Fujitsu Limited
RIKEN
Nagoya University
CyberAgent Inc.
Kotoba Technologies Inc.
*Immediate release:* May 10, 2024
*Headline*: Release of “Fugaku-LLM” – a large language model trained on the
supercomputer “Fugaku”
*Sub-headline:* Enhanced Japanese language ability, for use in research
and business
*Note:* This news will be available at Tokyo Tech News on our official
website in a few hours: https://www.titech.ac.jp/english/news/2024/069223
Summary
– A large language model with enhanced Japanese language ability was
developed using Japanese supercomputing technology
– Distributed parallel training that maximizes the performance of the
supercomputer “Fugaku”
– Commercial use is permitted, which will lead to innovative research
and business applications such as AI for Science
Abstract
A team of researchers in Japan released Fugaku-LLM, a large language model
[1] with enhanced Japanese language capability, using the RIKEN
supercomputer Fugaku. The team is led by Professor Rio Yokota of Tokyo
Institute of Technology, Associate Professor Keisuke Sakaguchi of Tohoku
University, Koichi Shirahata of Fujitsu Limited, Team Leader Mohamed Wahib
of RIKEN, Associate Professor Koji Nishiguchi of Nagoya University, Shota
Sasaki of CyberAgent, Inc., and Noriyuki Kojima of Kotoba Technologies Inc.
To train large language models on Fugaku, the researchers developed
distributed training methods, including porting the deep learning framework
Megatron-DeepSpeed to Fugaku to optimize the performance of Transformers.
They accelerated the dense matrix multiplication library used by
Transformers, optimized communication performance for Fugaku by combining
three types of parallelization techniques, and accelerated the collective
communication library on the Tofu interconnect D network.
Fugaku-LLM has 13 billion parameters[2] and is larger than the
7-billion-parameter models that have been developed widely in Japan.
Fugaku-LLM has enhanced Japanese capabilities, with an average score of 5.5
on the Japanese MT-Bench[3], the highest performance among open models that
are trained using original data produced in Japan. In particular, the
benchmark performance for humanities and social sciences tasks reached a
remarkably high score of 9.18.
Fugaku-LLM was trained on proprietary Japanese data collected by
CyberAgent, along with English and other data. The source code of
Fugaku-LLM is available on GitHub[4] and the model is available on Hugging
Face[5]. Fugaku-LLM can be used for research and commercial purposes as
long as users comply with the license.
In the future, as more researchers and engineers participate in improving
the models and their applications, the efficiency of training will be
improved, leading to next-generation innovative research and business
applications, such as the linkage of scientific simulation and generative
AI, and social simulation of virtual communities with thousands of AIs.
Background
In recent years, the development of large language models (LLMs) has been
active, especially in the United States. In particular, the rapid spread
of ChatGPT[6], developed by OpenAI, has profoundly impacted research and
development, economic systems, and national security. Countries other than
the U.S. are also investing enormous human and computational resources to
develop LLMs in their own countries. Japan, too, needs to secure
computational resources for AI research so as not to fall behind in this
global race. There are high expectations for Fugaku, the flagship
supercomputer system in Japan, and it is necessary to improve the
computational environment for large-scale distributed training on Fugaku to
meet these expectations.
Therefore, Tokyo Institute of Technology, Tohoku University, Fujitsu,
RIKEN, Nagoya University, CyberAgent, and Kotoba Technologies have started
a joint research project on the development of large language models.
Role of each institution/company
Tokyo Institute of Technology: General oversight, parallelization and
communication acceleration of large language models (optimization of
communication performance by combining three types of parallelization,
acceleration of collective communication on the Tofu interconnect D network)
Tohoku University: Collection of training data and model selection
Fujitsu: Acceleration of computation and communication (acceleration of
collective communication on Tofu interconnect D network, performance
optimization of pipeline parallelization) and implementation of
pre-training and fine-tuning after training
RIKEN: Distributed parallelization and communication acceleration of
large-scale language models (acceleration of collective communication on
Tofu interconnect D network)
Nagoya University: Study on application methods of Fugaku-LLM to 3D
generative AI
CyberAgent: Provision of training data
Kotoba Technologies: Porting of deep learning framework to Fugaku
*Figure 1*: https://tokyotech.box.com/s/zszt2m1p7k4pw0pf975phbpuwy97vy77
Figure 1. RIKEN’s supercomputer Fugaku ©RIKEN
Research outcome
Significantly improved the computational performance of training large
language models on the supercomputer Fugaku
GPUs[7] are the common choice of hardware for training large language
models. However, there is a global shortage of GPUs due to the large
investment from many countries to train LLMs. Under such circumstances, it
is important to show that large language models can be trained using
Fugaku, which uses CPUs instead of GPUs. The CPUs used in Fugaku are
Japanese CPUs manufactured by Fujitsu, which play an important role in
revitalizing Japanese semiconductor technology.
By extracting the full potential of Fugaku, this study succeeded in
increasing the computation speed of the matrix multiplication by a factor
of 6, and the communication speed by a factor of 3. To maximize the
distributed training performance on Fugaku, the deep learning framework
Megatron-DeepSpeed was ported to Fugaku, and the dense matrix
multiplication library was accelerated for Transformers. For communication
acceleration, the researchers optimized communication performance for
Fugaku by combining three types of parallelization techniques and
accelerated the collective communication on the Tofu interconnect D
network. The knowledge gained from these efforts can be utilized in the
design of the next-generation computing infrastructure after Fugaku and
will greatly enhance Japan’s future advantage in the field of AI.
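To make the phrase “combining three types of parallelization techniques” concrete, the minimal sketch below shows how a fixed pool of nodes can be split among tensor (model) parallelism, pipeline parallelism, and data parallelism. Only the total node count is taken from this release; the individual degrees are illustrative assumptions, not the team’s actual configuration.

    # Minimal sketch: dividing a node pool across the three kinds of
    # parallelism mentioned in this release. The degrees below are
    # illustrative assumptions, not the configuration actually used.
    TOTAL_NODES = 13_824       # number of Fugaku nodes cited in the release
    TENSOR_PARALLEL = 4        # assumed: split each layer's matrices across 4 nodes
    PIPELINE_PARALLEL = 24     # assumed: split the layer stack into 24 stages

    # The remainder becomes the data-parallel degree: independent model
    # replicas that each process a different shard of the training batch.
    assert TOTAL_NODES % (TENSOR_PARALLEL * PIPELINE_PARALLEL) == 0
    DATA_PARALLEL = TOTAL_NODES // (TENSOR_PARALLEL * PIPELINE_PARALLEL)

    print(f"tensor-parallel group size : {TENSOR_PARALLEL}")
    print(f"pipeline stages per replica: {PIPELINE_PARALLEL}")
    print(f"data-parallel replicas     : {DATA_PARALLEL}")

In such a layout, gradient averaging across the data-parallel replicas is carried out with collective communication, which is why accelerating collectives on the Tofu interconnect D network has a direct effect on training throughput.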
An easy-to-use, open, and secure large language model with 13 billion
parameters
In 2023, many large language models were developed by Japanese companies,
but most of them have less than 7 billion parameters. Since the performance
of large-scale language models generally improves as the number of
parameters increases, the 13-billion-parameter model they developed is
likely to be more powerful than other domestic models. Although larger
models have been developed outside of Japan, large language models require
correspondingly large computational resources, making models with too many
parameters difficult to use. Fugaku-LLM offers high performance in a
well-balanced size.
In addition, most models developed by Japanese companies employ continual
learning[8], in which open models developed outside of Japan are
continually trained on Japanese data. In contrast, Fugaku-LLM was trained
from scratch on the team’s own data, so the entire training process can be
traced, which is an advantage in terms of transparency and safety.
Fugaku-LLM was trained on 380 billion tokens using 13,824 nodes of Fugaku,
with about 60% of the training data being Japanese, combined with English,
mathematics, and code. Compared with models that are continually trained on
Japanese data, Fugaku-LLM learned much of its information directly in
Japanese.
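As a rough illustration of the data mixture described above, the sketch below converts the stated shares into token budgets. Only the 380-billion-token total and the roughly 60% Japanese share come from this release; the split of the remaining 40% among English, mathematics, and code is a hypothetical assumption.

    # Illustrative token budget for the training mixture. Only TOTAL_TOKENS
    # and the Japanese share are stated in the release; the other weights
    # are hypothetical placeholders.
    TOTAL_TOKENS = 380e9

    mixture = {
        "japanese": 0.60,  # stated: about 60% of the training data
        "english": 0.30,   # hypothetical split of the remaining 40%
        "math": 0.05,      # hypothetical
        "code": 0.05,      # hypothetical
    }

    assert abs(sum(mixture.values()) - 1.0) < 1e-9
    for source, weight in mixture.items():
        print(f"{source:>8}: {weight * TOTAL_TOKENS / 1e9:6.1f}B tokens")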
Fugaku-LLM is the best model among open models that are produced in Japan
and trained with original data. In particular, it was confirmed that the
model shows a high benchmark score of 9.18 in the humanities and social
sciences tasks. It is expected that the model will be able to perform
natural dialogue based on keigo (honorific speech) and other features of
the Japanese language.
Future Development
The results from this research are being made public through GitHub and
Hugging Face so that other researchers and engineers can use them to
further develop large language models. Fugaku-LLM can be used for research
and commercial purposes as long as users comply with the license.
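For readers who want to try the published model, the sketch below shows how it might be loaded with the Hugging Face transformers library. The repository identifier and the prompt are assumptions for illustration; check the project’s Hugging Face page for the actual model name, license terms, and recommended settings.

    # Sketch only: loading the published model with the Hugging Face
    # "transformers" library. The repository id below is an assumption;
    # consult the project's Hugging Face page for the actual identifier.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Fugaku-LLM/Fugaku-LLM-13B-instruct"  # assumed repository id

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # A 13-billion-parameter model needs tens of gigabytes of memory;
    # device placement and reduced precision are left to the user's setup.
    prompt = "Please introduce the supercomputer Fugaku."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))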
In the future, as more researchers and engineers participate in improving
the models and their applications, the efficiency of training will be
improved, leading to next-generation innovative research and business
applications, such as the linkage of scientific simulation and generative
AI, and social simulation of virtual communities with thousands of AIs.
Acknowledgement
This research was supported by the Fugaku policy-supporting proposal
“Development of Distributed Parallel Training for Large Language Models
Using Fugaku” (proposal number: hp230254).
Terms
[1] Large language model: A model of the probability with which text
appears; it can predict the text (response) that follows a given context
(query).
[2] Parameter: A measure of the size of a neural network. The more
parameters, the higher the performance of the model, but the more data is
required for training.
[3] Japanese MT-Bench: A benchmark test provided by Stability AI.
[4] GitHub: A platform used to publish open-source software.
[5] Hugging Face: A platform used to publish AI models and datasets.
[6] ChatGPT: A large language model developed by OpenAI, which has brought
about major social change, surpassing 100 million users within about two
months of its release.
[7] GPU: Originally developed as an accelerator for graphics, but now
widely used to accelerate deep learning.
[8] Continual learning: A method for performing additional training on a
large language model that has already been trained. It is used to adapt
language models to different languages or domains.
*Contact:* Emiko Kawaguchi, Public Relations Department, Tokyo Institute of Technology
*About Tokyo Institute of Technology*
Tokyo Tech stands at the forefront of research and higher education as the
leading university for science and technology in Japan. Tokyo Tech
researchers excel in fields ranging from materials science to biology,
computer science, and physics. Founded in 1881, Tokyo Tech hosts over
10,000 undergraduate and graduate students per year, who develop into
scientific leaders and some of the most sought-after engineers in industry.
Embodying the Japanese philosophy of “monotsukuri,” meaning “technical
ingenuity and innovation,” the Tokyo Tech community strives to contribute
to society through high-impact research.
https://www.titech.ac.jp/english/