🤖 MLguru #12: DeepMind Loses Millions, BERT becomes roBERTa, and a replica of GPT-2

Updated Jun 25, 2024 • 4 min read

Welcome to the 12th edition of MLguru - our bi-weekly Machine Learning newsletter.

In this edition you will read about:

DeepMind losing millions on deep learning research,
The return of BERT,
GPT-2 replica,
And much more!

DeepMind loses millions on deep learning research

Alphabet’s DeepMind, likely the world’s largest research-focused artificial intelligence operation, lost $572 million last year and more than $1 billion over the past three years. Read the article to find out why.

BERT becomes roBERTa

Researchers at Facebook AI and the University of Washington modified BERT to beat the best published results on three benchmarks. They added more training data, increased the pre-training length, and modified the loss term. The result is RoBERTa, their new BERT variation. Read more to see how they achieved it.

And while we are on the subject of BERT, let’s move on to NVIDIA, which broke records in both training and inference for real-time conversational AI. The company trained BERT, one of the most advanced AI language models and a widely used reference point for NLP, in under an hour. Read how they made it happen here.

GPT-2 replicated

Since OpenAI has not released its largest model, Aaron Gokaslan and Vanya Cohen from Brown University decided to replicate the 1.5B-parameter model so that others can build on their pretrained model and improve it further. You can access the model and generate text using their Google Colab, which you can find here. You can also read more about the whole process in their post on Medium.

Training ResNet on CIFAR-10 to 94% accuracy in 34s

Are you curious how to reach 94% accuracy in 34 seconds? Read the tips and tricks in “How to Train Your ResNet 8”. This article is the final post of a series that shows how to speed up a single-GPU training implementation so that it can take on a field of multi-GPU competitors.

New state-of-the-art optimizer - RAdam

In their new paper, Liyuan Liu and co-authors introduce RAdam, short for “Rectified Adam”. It’s a new variation of the classic Adam optimizer that automatically and dynamically adjusts the adaptive learning rate, based on the authors’ detailed study of the effects of variance and momentum during training. Read more about the paper here.
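To make the idea of "rectification" a bit more concrete, here is a minimal sketch of the scaling factor that RAdam applies to the adaptive learning rate, written from our reading of the paper rather than taken from any official implementation. The function name `radam_rectification` is hypothetical, and the default `beta2=0.999` is simply the value commonly used with Adam. In the first few steps the variance of the adaptive term cannot be estimated reliably, so the sketch returns `None` to signal that a plain momentum update (without the adaptive term) would be used instead.

```python
import math


def radam_rectification(step, beta2=0.999):
    """Return the rectification factor r_t for a 1-based training step.

    Returns None while the variance of the adaptive learning rate is
    considered intractable (rho_t <= 4), i.e. during the earliest steps.
    """
    # Maximum length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0

    # Length of the approximated simple moving average at this step.
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)

    if rho_t <= 4.0:
        # Too early: fall back to an update without the adaptive term.
        return None

    numerator = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    denominator = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    for step in (1, 5, 100, 10_000):
        print(step, radam_rectification(step))
```

The factor starts well below 1 and approaches 1 as training progresses, which acts much like a built-in warmup for the adaptive learning rate.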
