v21’s Twitter Archive—№ 61,615

  1. RT @artetxem: Who said that training GPT-2 or BERT was expensive? "We use 512 Nvidia V100 GPUs [...] Upon the submission of this paper, tr…