🤖 MLguru #12: DeepMind Loses Millions, BERT Becomes RoBERTa, and a Replica of GPT-2


Mateusz Opala

Updated Jan 4, 2023 • 4 min read

Welcome to the 12th edition of MLguru - our bi-weekly Machine Learning newsletter.

In this edition you will read about:

  • DeepMind losing millions on deep learning research,
  • BERT becoming RoBERTa,
  • A replica of GPT-2,
  • And much more!

Also, don’t forget to sign up for our newsletter.


DeepMind loses millions on deep learning research

Alphabet’s DeepMind, likely the world’s largest research-focused artificial intelligence operation, lost $572 million last year and more than $1 billion over the past three years. Read the article to find out why.


BERT becomes RoBERTa

Researchers at Facebook AI and the University of Washington modified BERT to beat the best published results on three benchmarks. They trained on more data, pre-trained for longer, and dropped the next-sentence-prediction term from the loss. The result is RoBERTa, their new BERT variant. Read more to see how they achieved it.
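
If you want to poke at the released weights yourself, here is a minimal sketch that loads RoBERTa and encodes a sentence. It assumes the Hugging Face transformers and torch packages and the public "roberta-base" checkpoint, which are our choice of tooling rather than anything from the original write-up:

# Minimal sketch: load the public "roberta-base" checkpoint and encode one sentence.
# Assumes the Hugging Face `transformers` and `torch` packages are installed.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

inputs = tokenizer("RoBERTa is a robustly optimized BERT variant.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per input token: (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)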


And while we are still talking about BERT, let’s move on to Nvidia, which has broken records in training and inference for real-time conversational AI. The company broke the hour mark for training BERT, one of the world’s most advanced AI language models and a widely used benchmark for NLP. Read how they made it happen here.


GPT-2 replicated

As OpenAI had not released its largest model, Aaron Gokaslan and Vanya Cohen from Brown University decided to replicate the 1.5B-parameter version so that others could build on the pretrained model and improve it further. You can access the model and generate text using their Google Colab notebook, which you can find here. You can also read more about the whole process in their post on Medium.
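
If you would rather not go through the Colab, here is a rough sketch of generating text with GPT-2 through the Hugging Face transformers library. The library and the "gpt2-xl" checkpoint name (the 1.5B-parameter weights now hosted on the Hugging Face hub) are our own assumptions, not the Brown authors’ code:

# Minimal sketch: sample a continuation from a GPT-2 checkpoint on the Hugging Face hub.
# "gpt2-xl" is the 1.5B-parameter model; use "gpt2" if the full model does not fit in memory.
# Assumes the `transformers` and `torch` packages are installed.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

prompt = "Researchers at Brown University have replicated"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Top-k sampling keeps the continuation reasonably coherent without being deterministic.
output_ids = model.generate(input_ids, max_length=60, do_sample=True, top_k=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))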


Training ResNet on CIFAR-10 to 94% accuracy in 34s

Are you curious how to reach 94% accuracy in 34 seconds? Read the tips and tricks in “How to Train Your ResNet 8”, the final post of a series that shows how to speed up a single-GPU training implementation to take on a field of multi-GPU competitors.
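
To give a flavour of the kind of trick such speed runs rely on, below is a small, purely illustrative sketch of a short piecewise-linear learning-rate schedule that peaks early and then decays to zero. It is our own example, not code from the article, and the numbers are made-up placeholders:

# Purely illustrative: a piecewise-linear learning-rate schedule of the kind used
# in fast CIFAR-10 training runs (ramp up quickly, then decay to zero).
# The step counts and peak value are placeholders, not the article's settings.
import numpy as np

def piecewise_linear_lr(step, peak_step=100, total_steps=500, peak_lr=0.4):
    """Linear warm-up to peak_lr, then linear decay back to zero."""
    return float(np.interp(step, [0, peak_step, total_steps], [0.0, peak_lr, 0.0]))

# Inspect the schedule at a few points in training.
for step in (0, 50, 100, 300, 500):
    print(step, round(piecewise_linear_lr(step), 3))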


New state-of-the-art optimizer - RAdam

In their new paper, Liyuan Liu and colleagues introduce RAdam, also called “Rectified Adam”. It is a new variant of the classic Adam optimizer that rectifies the variance of the adaptive learning rate, which their analysis shows is problematically large during the first steps of training (the same issue that learning-rate warm-up heuristics work around). Read more about the paper here.
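
To make the idea concrete, here is a minimal NumPy sketch of the RAdam update for a single parameter tensor, following the rectification rule as described in the paper. Treat it as an illustration rather than a drop-in optimizer:

# Minimal sketch of one RAdam step (not a drop-in optimizer). When too few
# gradients have been seen to trust the adaptive term, it falls back to a
# bias-corrected momentum step; otherwise it applies the rectification factor r_t.
import numpy as np

def radam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    rho_inf = 2.0 / (1.0 - beta2) - 1.0

    # Adam-style moment estimates, with bias correction for the first moment.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)

    # Length of the approximated simple moving average; small early in training.
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)

    if rho_t > 4.0:
        # Variance of the adaptive learning rate is tractable: rectify and adapt.
        v_hat = np.sqrt(v / (1.0 - beta2 ** t))
        r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                      / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        theta = theta - lr * r_t * m_hat / (v_hat + eps)
    else:
        # Too early to trust the adaptive term: plain momentum update.
        theta = theta - lr * m_hat
    return theta, m, v

# Toy usage: a few steps on random gradients (t is 1-indexed).
theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 6):
    theta, m, v = radam_step(theta, np.random.randn(3), m, v, t)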
