🤖 MLguru #12: DeepMind Loses Millions, BERT becomes roBERTa, and a replica of GPT-2
Welcome to the 12th edition of MLguru - our bi-weekly Machine Learning newsletter.
In this edition you will read about:
- DeepMind losing millions on deep learning research,
- The return of BERT,
- GPT-2 replica,
- And much more!
Also do not forget to sign up for our Machine Learning Webinar, taking place on the 12th of September. Our ML duo, Piotr & Konrad, will share loads of practical knowledge with everyone interested in implementing Machine Learning solutions. Sign up now.
DeepMind loses millions on deep learning research
Alphabet’s DeepMind, likely the world’s largest research-focused artificial intelligence operation, lost $572 million last year and more than $1 billion over the past three years. Read the article to find out why.
BERT becomes roBERTa
Researchers from Facebook AI and the University of Washington modified BERT to beat the best published results on three benchmarks. They added more training data, pre-trained for longer, and changed the training objective by dropping next-sentence prediction. The result is roBERTa, their new BERT variant. Read more to see how they achieved it.
While we are on the subject of BERT, Nvidia has broken records in both training and inference for real-time conversational AI. The company trained BERT, one of the most advanced AI language models and a widely used reference point for NLP, in under an hour. Read how they made it happen here.
GPT-2 replica
As OpenAI has not released its largest GPT-2 model, Aaron Gokaslan and Vanya Cohen from Brown University decided to replicate the 1.5B-parameter model so that others can build on their pretrained weights and improve them further. You can access the model and generate text using their Google Colab, which you can find here. You can also read more about the whole process in their post on Medium.
Training ResNet on CIFAR-10 to 94% accuracy in 34s
Are you curious how to reach 94% accuracy in 34 seconds? Read the tips and tricks in “How to Train Your ResNet 8”. This article is the final post of a series that shows how to speed up a single-GPU training implementation to take on a field of multi-GPU competitors.
New state of the art optimizer - RAdam
In their new paper, Liyuan Liu and co-authors introduce RAdam, short for “Rectified Adam”. It is a new variation of the classic Adam optimizer that automatically and dynamically adjusts the adaptive learning rate, based on the authors’ detailed study of the effects of variance and momentum during training. Read more about the paper here.
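To make the idea of a rectified adaptive learning rate a little more concrete, here is a minimal Python sketch of the rectification term as we understand it from the RAdam paper. The function name and the default beta2 value are our own choices for illustration; this is not the authors' reference implementation.

```python
import math


def rectification_term(step, beta2=0.999):
    """Return the rectification factor r_t for a training step (step >= 1).

    Returns None while the variance of the adaptive learning rate is
    still considered intractable (the early "warm-up" phase).
    """
    # Maximum length of the approximated simple moving average (SMA).
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    # Length of the approximated SMA at the current step.
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Too few samples: skip the adaptive term for this step.
        return None
    numerator = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    denominator = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    for step in (1, 10, 100, 1000, 10000):
        print(step, rectification_term(step))
```

In the first few steps the function returns None, which corresponds to falling back to a plain momentum update. Later the factor grows towards 1, and the update behaves more and more like regular Adam.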