The emergence of artificial intelligence has produced breakthroughs across industries worldwide. The BLOOM model sits at the forefront of this technology as a versatile framework with advanced capabilities in natural language understanding, machine learning, and problem-solving.
BLOOM, short for "BigScience Large Open-science Open-access Multilingual Language Model," is a machine learning framework that pushes the frontiers of generative AI by applying deep learning algorithms, loosely inspired by the human brain, at massive scale.
Developed by more than 1,000 AI researchers, BLOOM was, at its release, the largest open-access multilingual language model. It creates an opportunity for small businesses, start-ups, and individuals to leverage the potential of a state-of-the-art model to build innovative applications.
Without further ado, let’s delve deep into the BLOOM AI model and see how it is a stepping stone for the next level of intelligence!
Everything you should know about BLOOM AI
BLOOM is an open-access multilingual language model with a staggering 176 billion parameters, trained on roughly 366 billion tokens. Hugging Face's BigScience team, the Microsoft DeepSpeed team, the NVIDIA Megatron-LM team, the IDRIS/GENCI team, the PyTorch team, and BigScience's engineering team were all involved in developing one of the most capable open language models in the world.
The project was founded by Hugging Face and the French NLP community and soon went on to attract participants from over 70 countries and experts from 250 institutions. Two eminent French agencies, CNRS and GENCI, provided a compute grant worth roughly three million euros for the research and training of the BLOOM model. The BLOOM model was trained on the Jean Zay supercomputer at IDRIS/CNRS, just south of Paris, for 117 days (11 March to 6 July 2022).
It is built on the Transformer architecture, which comprises an input embedding layer, 70 transformer blocks, and an output language-modeling layer. The architecture of the BLOOM model closely follows GPT-3's; however, BLOOM is trained on 46 natural languages and 13 programming languages.
What languages is BLOOM AI trained on?
BLOOM is a causal language model: it is trained as a next-token predictor, predicting the succeeding token in a sequence based on the preceding tokens. This attribute enables BLOOM to connect different concepts in a sentence and tackle arithmetic, translation, and programming problems. BLOOM's architecture comprises 70 transformer blocks, each containing a self-attention layer and a multi-layer perceptron (MLP) layer, with input and post-attention layer norms.
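This next-token setup maps directly onto the standard text-generation workflow in the Hugging Face transformers library. Below is a minimal illustrative sketch (not taken from the BLOOM paper) that assumes transformers and torch are installed; it uses the much smaller bigscience/bloom-560m checkpoint so it can run on modest hardware:

```python
# Minimal sketch: using a small BLOOM checkpoint as a next-token predictor.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small sibling of the 176B model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: the model repeatedly predicts the most likely next token.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=10)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```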
Capabilities such as text generation, summarization, translation, question answering, and code completion are among the tasks BLOOM can handle. Another practical advantage is that BLOOM's smaller released checkpoints can run on a machine with around 16 GB of RAM without a GPU, although the full 176B-parameter model requires far more memory.
What are the differentiators between BLOOM AI and ChatGPT?
Here are some differentiators that set BLOOM AI apart from other language models (a short sketch after this list shows how to verify several of them):
- Employed 384 graphics cards of 80 gigabytes each on the Jean Zay 28 PFLOPS supercomputer for training.
- Utilizes 176 billion parameters
- Seventy layers with 112 attention heads for each layer.
- Implements ALiBi positional embeddings and the GeLU activation function
- Open-source, anyone can use and access it.
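Several of these figures can be checked directly against the model configuration published on the Hugging Face Hub. A minimal sketch, assuming the transformers package is installed and the bigscience/bloom repository is reachable:

```python
# Inspect BLOOM's published hyperparameters without downloading the weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom")

print(config.n_layer)      # 70 transformer blocks
print(config.n_head)       # 112 attention heads per layer
print(config.hidden_size)  # 14336-dimensional hidden states
```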
Understanding BLOOM AI’s Architecture
How does the BLOOM model work?
The architecture of BLOOM is based on the causal decoder-only transformer model, the standard design used for developing LLMs with over 100B parameters. However, researchers and developers introduced key variations on the standard model to improve BLOOM's training stability and performance.
Here are some innovations that make BLOOM different:
- ALiBi Positional Embedding
In the standard architecture, positional information is added to the embedding layer. While building BLOOM, however, the developers implemented ALiBi (Attention with Linear Biases), which takes a different approach: it attenuates the attention scores based on the distance between keys and queries. The main motive was to leverage ALiBi's ability to extrapolate to longer sequences. To the researchers' surprise, however, ALiBi also enhanced downstream performance and led to a smoother training process, outperforming both learned and rotary embeddings.
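To make the idea concrete, here is a simplified sketch of the ALiBi bias (an illustration of the technique, not BLOOM's actual implementation, which handles head counts that are not powers of two slightly differently):

```python
# Simplified ALiBi: a head-specific linear penalty added to attention scores.
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Return an (n_heads, seq_len, seq_len) bias tensor added to attention scores."""
    # Head-specific slopes form a geometric sequence, e.g. 2^-1, 2^-2, ..., 2^-8 for 8 heads.
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    # Distance between each query position and key position (0 on the diagonal).
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).abs().float()
    # Larger distance -> larger negative bias -> lower attention score.
    # (In causal attention the upper triangle is masked out anyway.)
    return -slopes[:, None, None] * distance[None, :, :]

# The bias is added to the raw attention scores before the softmax:
# scores = q @ k.transpose(-1, -2) / math.sqrt(head_dim) + alibi_bias(n_heads, seq_len)
```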
- Embedding LayerNorm
During preliminary experiments on a 104-billion-parameter model, the development team found that adding an extra layer normalization immediately after the embedding layer significantly improved training stability, so the BigScience team decided to train BLOOM with this additional layer normalization to avoid training instabilities. Notably, the preliminary experiments were conducted in float16 while the final training was performed in bfloat16, which led to the conclusion that float16 was likely the cause of the instabilities and that, with bfloat16, the embedding LayerNorm may not be needed.
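A minimal PyTorch sketch of this extra normalization (an illustration of the idea, not BLOOM's actual code):

```python
# Token embedding followed by the stabilizing "embedding LayerNorm".
import torch.nn as nn

class EmbeddingWithLayerNorm(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)  # the extra normalization after the embedding

    def forward(self, token_ids):
        return self.norm(self.embed(token_ids))
```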
- BLOOM Training Process
The BLOOM model is trained on the ROOTS corpus, and the training process comprises several stages, including data sourcing and processing. The ROOTS corpus consists of 498 Hugging Face datasets that cover 46 natural languages and 13 programming languages.
The BLOOM model was trained with Megatron-DeepSpeed, a state-of-the-art framework for large-scale distributed training. This framework comprises two parts:
- Megatron-LM: provides the Transformer implementation, tensor parallelism, and data-loading primitives.
- DeepSpeed: provides the ZeRO optimizer, model pipelining, and general distributed training components.
This framework, formed by combining Megatron-LM and DeepSpeed, offers efficient and effective training with 3D parallelism. It builds on four essential and complementary approaches to distributed deep learning (a minimal configuration sketch follows the list below):
1. Data Parallelism
Data parallelism creates multiple replicas of the model and places each replica on a different device. Each replica is fed a different slice of the data, and the replicas are kept synchronized at the end of every training step.
2. Tensor Parallelism
Tensor parallelism focuses on partitioning individual layers of the model across multiple devices. Instead of having the whole activation or gradient stored on a single GPU, the fragments of the tensor are stored on multiple GPUs, which assists in performing horizontal parallelism and intra-layer model parallelism.
3. Pipeline Parallelism
The pipeline parallelism approach splits the model's layers across different GPUs so that each GPU handles only a fraction of the model, a form of vertical parallelism.
4. ZeRO Optimizer
ZeRO, or Zero Redundancy Optimizer, ensures that each process holds only the fraction of the data (parameters, gradients, and optimizer states) it needs for its training steps. The developers used ZeRO stage 1, in which only the optimizer states are sharded.
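To illustrate how the ZeRO piece is configured in practice, here is a minimal DeepSpeed-style configuration sketch; the batch-size values are hypothetical and are not BLOOM's actual training settings:

```python
# Hypothetical DeepSpeed configuration sketch: ZeRO stage 1 shards only the
# optimizer states, while Megatron-DeepSpeed handles tensor and pipeline parallelism.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,    # hypothetical value
    "gradient_accumulation_steps": 128,     # hypothetical value
    "bf16": {"enabled": True},              # BLOOM's final run used bfloat16
    "zero_optimization": {"stage": 1},      # shard optimizer states only
}

# This dictionary would typically be passed to deepspeed.initialize(model=...,
# config=ds_config), alongside Megatron-LM's tensor/pipeline-parallel settings.
```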
The BLOOM model was trained for 117 days and achieved a training throughput of about 150 TFLOPs per GPU, among the highest throughputs reported for A100 80GB GPUs at the time.
Advantages of the BLOOM AI model:
BLOOM offers many benefits, making it one of the most powerful tools for diverse industry domains. Here are some of its benefits:
- The BLOOM model’s ability to swiftly adapt to new tasks, even with minimal training data, is one of its most striking aspects.
- The BigScience project behind BLOOM emphasizes ethical and responsible use, releasing the model openly under a Responsible AI License to promote transparency and trustworthiness.
- As new tasks emerge, the openly available model can be extended and fine-tuned without being rebuilt from scratch.
- The BLOOM model can be fine-tuned on more recent data, helping it stay in sync with changing data distributions.
- BLOOM's strong few-shot performance, combined with its large transformer-based design, contributes to its high accuracy.
Limitations of the BLOOM AI Model:
One thing that limits its adoption by every organization is its high running cost. The BLOOM model was trained on 384 NVIDIA A100 GPUs, which cost around $32,000 each. LLM research continues to push toward ever larger models, driving training and running costs even higher.
Moreover, even the compressed version of BLOOM is 227 GB, and specialized hardware with hundreds of gigabytes of VRAM is required to run the full model. Unlike ChatGPT, which is accessed as a hosted service, running BLOOM locally calls for a computing cluster on the order of an NVIDIA DGX-2, which costs around $400,000. Hugging Face has planned an API platform for researchers at around $40 per month, which may still not be cost-effective for everyone.
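A rough back-of-envelope calculation shows why the memory requirement is so steep: holding the weights alone scales with the parameter count times the bytes used per parameter (the figures below are approximations, not official numbers):

```python
# Back-of-envelope memory needed just to hold BLOOM's 176B weights.
params = 176e9  # 176 billion parameters

for label, bytes_per_param in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    gb = params * bytes_per_param / 1e9
    print(f"{label}: ~{gb:.0f} GB")  # ~704 GB, ~352 GB, ~176 GB
```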
Besides, because the BLOOM model is trained on real-world datasets, it may generate biased content. It can over-represent some viewpoints, under-represent others, and reinforce stereotypes, which can result in factually incorrect or repetitive text.
Applications of BLOOM
BLOOM learning capabilities help in natural language processing
The BLOOM AI model presents many applications throughout various industries and businesses. Its potential can be leveraged to improve operational efficiency and open new doorways for innovation. One of the most promising applications of the BLOOM AI model is in natural language processing tasks, which include but are not limited to sentiment analysis, text summarization, and language translation.
Because it is trained on 46 natural languages and 13 programming languages, BLOOM can generate coherent text and content for different purposes, such as marketing and content creation. Researchers and developers can also use it for research and development, building advanced language models and artificial intelligence tools on top of it.
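As a small illustration of this kind of usage, the sketch below prompts a small BLOOM checkpoint for a translation-style task through the Hugging Face pipeline API; the prompt and checkpoint choice are illustrative assumptions, and output quality improves with the larger checkpoints:

```python
# Few-shot prompting of a small BLOOM checkpoint for English-to-French translation.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = (
    "Translate English to French:\n"
    "English: Good morning\nFrench: Bonjour\n"
    "English: Thank you very much\nFrench:"
)
print(generator(prompt, max_new_tokens=10)[0]["generated_text"])
```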
The researchers have warned, however, that the model's output should not be treated as authoritative: factual content, for example about mathematics or history, should not be trusted without verification, which limits its use in sensitive domains such as biomedical, political, and legal applications.
Wrapping up,
The BLOOM AI model opens the portal to next-level intelligence with its exceptional accuracy, scalability, flexibility, rapid learning, and natural language processing. All these abilities make it an excellent tool to implement in various industries to make operations easier.
The model’s capacity to handle and analyze complex data, generate human-like responses, and take decisions based on ethical approaches makes it different from other language models. Organizations can leverage the potential of BLOOM to improve their operational efficiency and productivity. The progress in AI technology opens up new doors and unlocks opportunities to revolutionize the world, and BLOOM is one of the important stepping stones in the transformational journey.
Thanks for sticking with us to the end. We appreciate your interest and commitment to exploring this fascinating field. We hope that you found the information valuable and insightful.
If you are interested in exploring Generative AI and have any relevant projects or collaborations in mind, we would be pleased to hear from you. Please feel free to contact us to discuss any ideas, questions, or potential opportunities. Once again, thank you for your readership, and we look forward to connecting with you!
About the Author:
Dr. Kiran Kumar is an accomplished AI researcher, innovator, and senior data scientist. With a Ph.D. in Supply Chain Analytics, he possesses a profound understanding of data analysis and machine-learning techniques. His extensive research contributions are showcased through numerous publications in esteemed international journals. Driven by a passion for pioneering advancements, he holds patents for groundbreaking innovations in the field. Currently, he is focused on developing cutting-edge products by leveraging his expertise in Prompt engineering and Generative AI.