Software engineers develop a way to run AI language models without matrix multiplication

Overview of the MatMul-free LM. The sequence of operations is shown for vanilla self-attention (top-left), the MatMul-free token mixer (top-right), and Ternary Accumulations. The MatMul-free LM employs a MatMul-free token mixer (MLGRU) and a MatMul-free channel mixer (MatMul-free GLU) to maintain the transformer-like architecture while reducing compute cost. Credit: *arXiv* (2024). DOI: 10.48550/arxiv.2406.02528

A team of software engineers at the University of California, working with one colleague from Soochow University and another from LuxiTec, has developed a way to run AI language models without using matrix multiplication. The team has published a paper on the arXiv preprint server describing their new approach and how well it has worked during testing.

As the power of LLMs such as ChatGPT has grown, so too have the computing resources they require. Part of the process of running LLMs involves performing matrix multiplication (MatMul), where data is combined with the weights of a neural network to produce the most likely answers to queries.
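To make the operation concrete, here is a minimal sketch in plain Python of the matrix-vector product behind a single neural-network layer. The function name `matvec` and the numbers are illustrative and do not come from the paper; production systems run this on GPUs with optimized libraries.

```python
def matvec(weights, inputs):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    outputs = []
    for row in weights:
        total = 0.0
        for w, x in zip(row, inputs):
            total += w * x  # one multiplication and one addition per weight
        outputs.append(total)
    return outputs


# Illustrative values only (assumed, not taken from the paper).
weights = [[0.5, -1.0, 2.0],
           [1.5, 0.0, -0.5]]
inputs = [2.0, 4.0, 1.0]
result = matvec(weights, inputs)
print(result)  # [-1.0, 2.5]
```

Every weight costs one multiplication, so a layer with millions of weights needs millions of multiplications for each input it processes.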

Early on, AI researchers discovered that graphics processing units (GPUs) were ideally suited to neural network applications because they can run multiple processes simultaneously (in this case, multiple MatMuls). But now, even with huge clusters of GPUs, MatMuls have become bottlenecks as the power of LLMs grows along with the number of people using them.

In this new study, the research team claims to have developed a way to run AI language models without the need to carry out MatMuls, and to do it just as efficiently.

To achieve this feat, the research team took a new approach to how data is weighted: they replaced the current method, which relies on 16-bit floating-point weights, with one that uses just three values, {-1, 0, 1}, along with new functions that carry out the same types of operations as the prior method.
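The reason three-valued weights remove the need for multiplication can be sketched in a few lines of plain Python. This is an illustration of the general idea (the figure caption calls it ternary accumulation), not the team's implementation; the function name `ternary_matvec` and the sample values are assumptions.

```python
def ternary_matvec(ternary_weights, inputs):
    """Compute a matrix-vector product when every weight is -1, 0, or 1.

    No multiplication is performed: each input is added, subtracted, or skipped.
    """
    outputs = []
    for row in ternary_weights:
        total = 0.0
        for w, x in zip(row, inputs):
            if w == 1:
                total += x   # weight +1: add the input
            elif w == -1:
                total -= x   # weight -1: subtract the input
            # weight 0: the input contributes nothing, so it is skipped
        outputs.append(total)
    return outputs


# Illustrative values only (assumed, not taken from the paper).
ternary_weights = [[1, -1, 0],
                   [0, 1, 1]]
inputs = [2.0, 4.0, 1.0]
ternary_result = ternary_matvec(ternary_weights, inputs)
print(ternary_result)  # [-2.0, 5.0]
```

The result matches what an ordinary multiply-and-add would give for the same weights, but the inner loop uses only additions and subtractions, which are cheaper operations in hardware.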

They also developed new quantization techniques that helped boost performance. Because each weight can take only three values, less processing is required, which reduces the computing power needed. They also radically changed the way LLMs process data by using what they describe as a MatMul-free linear gated recurrent unit (MLGRU) in place of the self-attention mechanism in traditional transformer blocks.
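The article does not describe the team's quantization techniques in detail. As a rough illustration of how floating-point weights can be reduced to three values, the sketch below uses absmean quantization, a scheme from prior work on ternary-weight models: divide each weight by the mean absolute weight, round, and clip to {-1, 0, 1}. Treat it as an assumption about the general approach rather than the paper's exact method; the function name and sample values are hypothetical.

```python
def quantize_ternary(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1} plus one shared scale (absmean scheme).

    This is an assumed, simplified scheme for illustration only.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    # Mean absolute value of all weights; eps avoids division by zero.
    scale = sum(magnitudes) / len(magnitudes) + eps
    quantized = [
        [max(-1, min(1, round(w / scale))) for w in row]  # round, then clip
        for row in weights
    ]
    return quantized, scale


# Illustrative values only (assumed, not taken from the paper).
weights = [[0.9, -0.05, -1.3],
           [0.02, 0.6, -0.7]]
quantized, scale = quantize_ternary(weights)
print(quantized)  # [[1, 0, -1], [0, 1, -1]]
```

Small weights collapse to 0 and large ones to +1 or -1, while the single `scale` value can be applied once to a layer's output to restore its approximate magnitude.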

In testing their new ideas, the researchers found that a system using their new approach achieved performance on par with state-of-the-art systems currently in use. At the same time, they found that their system used far less computing power and electricity than is generally the case with traditional systems.

More information:
Rui-Jie Zhu et al, Scalable MatMul-free Language Modeling, arXiv (2024). DOI: 10.48550/arxiv.2406.02528

Journal information:
arXiv

Citation:
Software engineers develop a way to run AI language models without matrix multiplication (2024, June 26)
retrieved 26 June 2024
from https://techxplore.com/information/2024-06-software-ai-language-matrix-multiplication.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
