The latest paper by David Patterson & the Google TPU team reveals details of the world's most efficient and one of the most powerful supercomputers for DNN acceleration - TPU v3, the one used to train BERT.
We definitely recommend reading the full text, but here are the key insights and tl;dr highlights.
Key Insight:
The co-design of an ML-specific programming system (TensorFlow), compiler (XLA), architecture (TPU), floating-point arithmetic (bfloat16), interconnect (ICI), and chip (TPUv2/v3) lets production ML applications scale at 96%–99% of perfect linear speedup, with ~10x gains in performance/Watt over the most efficient general-purpose supercomputers.
More highlights:
🐣🐤🐔 Three generations
There are three generations of TPU released so far. TPU v1 used fixed-point arithmetic and was used for inference only. TPU v2 and v3 operate in floating point and are used for training. TPU v4 results were presented in the MLPerf summer release, but there is no public information available yet. The TPU architecture differs from a CPU in several ways (a toy sketch follows the list):
▪️ Two-dimensional array processing units (instead of the 1D vector SIMD units in CPUs)
▪️ Narrower data (8-16 bits)
▪️ Dropped complex CPU features such as caches and branch prediction
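To get a feel for the 2D-array idea, here is a toy numpy sketch, not the real hardware: it processes matrices in 2D tiles rather than streaming 1D vectors, with the 128x128 tile size taken from the paper's description of the MXU.

```python
import numpy as np

TILE = 128  # one pass works on a 2D block of this size (MXU-style)

def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Toy model of 2D-array processing: multiply TILE x TILE blocks
    instead of pushing 1D vectors through a SIMD unit."""
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, TILE):
        for j in range(0, n, TILE):
            for p in range(0, k, TILE):
                # one "MXU pass": a 2D block of multiply-accumulates
                out[i:i+TILE, j:j+TILE] += a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE]
    return out

x = np.random.rand(256, 256).astype(np.float32)
w = np.random.rand(256, 256).astype(np.float32)
print(np.abs(tiled_matmul(x, w) - x @ w).max())  # tiny float32 rounding noise
```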
🐮🤜🐤 Fewer cores per chip (two oxen vs 1024 chickens)
NVIDIA puts thousands of CUDA cores inside each chip. TPU v3 has only 2 TensorCores per chip. It's way easier to generate a program for 2 beefier cores than for a swarm of wimpier cores.
Each TensorCore includes the following units (a toy data-flow sketch follows the list):
▪️ ICI (Inter-Core Interconnect) - connects cores across different chips
▪️ HBM - stacked DRAM on the same interposer substrate
▪️ Core Sequencer - manages instructions and performs scalar operations
▪️ Vector Processing Unit (VPU) - performs vector operations on 1D and 2D vectors
▪️ Matrix Multiply Unit (MXU) - performs the matrix multiplications
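A mental model of how data flows through those units, as a hypothetical Python sketch (names and interfaces are illustrative, not real hardware or software APIs):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TensorCore:
    hbm: dict  # HBM: holds weights and activations close to the core

    def dense_relu(self, x_key: str, w_key: str) -> np.ndarray:
        x = self.hbm[x_key]          # Core Sequencer: fetch operands from HBM
        w = self.hbm[w_key]
        acc = x @ w                  # MXU: 2D matrix multiply
        return np.maximum(acc, 0.0)  # VPU: elementwise vector op (ReLU)

core = TensorCore(hbm={"x": np.ones((8, 16), np.float32),
                       "w": np.ones((16, 4), np.float32)})
print(core.dense_relu("x", "w").shape)  # (8, 4); ICI would ship results to other chips
```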
🐱🐶❓ From inference to training chip
Key challenges on the way from the inference chip (v1) to training hardware (v2):
▪️ Harder parallelization
▪️ More computation
▪️ More memory
▪️ More programmability
▪️ Wider dynamic range of data
✂️🧮✂️ Brain Float (bf16)
IEEE FP32 and FP16 use (1+8+23) and (1+5+10) bits for the sign, exponent, and mantissa respectively. In practice, DNNs don't need the mantissa precision of FP32, but the dynamic range of FP16 is not enough, and using FP16 also requires loss scaling. The compromise format, bfloat16 (bf16), keeps the same 8 exponent bits as FP32 but reduces the mantissa to only 7 bits instead of 23. bf16 reduces memory usage and power consumption, with no loss scaling required in software.
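A small numpy experiment (an emulation, not the TPU's actual arithmetic) shows why the trade-off works: bf16 can be approximated by zeroing the low 16 bits of an FP32 value.

```python
import numpy as np

def to_bf16(x):
    """Emulate bfloat16 by zeroing the low 16 bits of an FP32 value
    (truncation; real hardware typically rounds to nearest even)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

grad = 1e-8                    # a tiny gradient value, common in DNN training
print(np.float16(grad))        # 0.0     -> FP16 underflows, hence loss scaling
print(to_bf16(grad))           # ~1e-08  -> bf16 keeps FP32's dynamic range
print(to_bf16(1.0 + 1 / 256))  # 1.0     -> but only 7 mantissa bits of precision
```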
🍩🧬⚡️ Torus topology and ICI
TPU v1 was an accelerator card for a CPU-based computer. TPU v2 and v3 are building blocks of a supercomputer. Chips are connected with the ICI interface, each link running at ~500 Gbit/s. ICI enables direct connections between chips, so no extra interfaces are needed. GPU/CPU-based supercomputers have to combine NVLink and PCI-E inside the computer chassis with InfiniBand networks and switches to connect the hosts.
Chips in TPU v2 and v3 clusters are connected in a 2D torus topology (a doughnut) and achieve remarkable near-linear scaling of performance as the number of chips grows.
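For intuition about the wraparound links, here is a tiny helper (ours, not from the paper) that lists the four ICI neighbors of a chip in an nx-by-ny 2D torus; the 32x32 = 1024-chip pod shape below is an assumption based on the TPU v3 description.

```python
def torus_neighbors(x: int, y: int, nx: int, ny: int):
    """Neighbors of chip (x, y) in an nx-by-ny 2D torus: four ICI links
    per chip, with the edges wrapping around (the 'doughnut')."""
    return [((x - 1) % nx, y), ((x + 1) % nx, y),
            (x, (y - 1) % ny), (x, (y + 1) % ny)]

print(torus_neighbors(0, 0, 32, 32))  # [(31, 0), (1, 0), (0, 31), (0, 1)]
```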
🛠⚙️🖥 XLA compiler (to orchestrate them all)
TF programs are graphs of operations, where tensor-arrays are first-class citizens. The XLA compiler front end transforms the TF graph into an intermediate representation, which is then efficiently mapped onto the selected TPU (or CPU/GPU) architecture. XLA maps TF graph parallelism across hundreds of chips, the TensorCores within each chip, and the multiple units within each core, and it provides precise reasoning about memory use at every point in the program.
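From the user's side this is mostly invisible; as a rough illustration (assuming a recent TF 2.x release, where the jit_compile flag asks for XLA compilation of a tf.function):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # compile this graph with XLA
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([128, 256])
w = tf.random.normal([256, 512])
b = tf.zeros([512])
print(dense_relu(x, w, b).shape)  # (128, 512); matmul, add, and relu fused by XLA
```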
The young XLA compiler has more room for improvement than the more mature CUDA stack.
🌲🐰🦊 Green Power (forest animals approve)
The TPU v3 supercomputer has already climbed to the 4th row of the TOP500 ranking, but what is really remarkable is its overwhelming 146.3 GFLOPS/Watt. The nearest competitor delivers roughly 10 times less.
Original paper:
A Domain-Specific Supercomputer for Training Deep Neural Networks
