Closed
Description
🌟 New model addition
Model description
T5 version t5.1.1.* is very similar to the original T5 model, with the following differences:
- GEGLU activation in the feed-forward hidden layer, rather than ReLU - see https://arxiv.org/abs/2002.05202 (a minimal sketch of this block follows the model description below).
- Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning.
- Pre-trained on C4 only without mixing in the downstream tasks.
- No parameter sharing between the embedding and classifier layers.
- "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different - larger d_model and smaller num_heads and d_ff.
The key reason these models are interesting is that, unlike the originally released models, they were trained only on unlabeled data and not on any labeled data, making them applicable to few-shot learning experiments. As they are very similar to the original T5 models, I assume they are relatively easy to implement.
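To make the GEGLU difference concrete, here is a minimal PyTorch sketch of such a gated feed-forward block. The layer names `wi_0` / `wi_1` / `wo` are assumptions modeled on common T5 ports, not taken from the official Mesh TF implementation.

```python
# Minimal sketch of the GEGLU feed-forward block described above (PyTorch).
# Layer names wi_0 / wi_1 / wo are assumptions, not the official implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.0):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gating projection (GELU-activated)
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # linear projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection
        self.dropout = nn.Dropout(dropout)  # off during pre-training, re-enabled for fine-tuning

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GEGLU: elementwise product of a GELU-activated projection and a linear projection
        hidden = F.gelu(self.wi_0(x)) * self.wi_1(x)
        return self.wo(self.dropout(hidden))
```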
Open source status
- the model implementation is available: see https://github.com/google-research/text-to-text-transfer-transformer/
- the model weights are available: see https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md (see the loading sketch below)
- who are the authors: Colin Raffel (@craffel), Noam Shazeer (@nshazeer), Adam Roberts (@adarob), Katherine Lee, Sharan Narang, Michael Matena (@mmatena), Yanqi Zhou, Wei Li, Peter J. Liu
(Also tagging @patrickvonplaten as he is mentioned in the who to tag guide for T5)
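To make the request concrete, here is a minimal sketch of what loading one of these checkpoints could look like once they are converted. The hub name `google/t5-v1_1-base` is an assumption about how the converted weights might be published, not an existing checkpoint in this issue.

```python
# Hypothetical usage sketch: loading a converted T5 v1.1 checkpoint with the
# existing T5 classes. The checkpoint name below is an assumption until the
# weights are actually converted and uploaded.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "google/t5-v1_1-base"  # hypothetical hub name for the converted weights
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

inputs = tokenizer("summarize: T5 v1.1 was pre-trained on C4 only.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```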
Activity
patrickvonplaten commented on Sep 20, 2020
Sorry for the long delay on this one - I hope to be able to take a look in the next two weeks :-)
patrickvonplaten commented on Sep 20, 2020
And thanks a lot for the very in-detail description here!
calclavia commented on Sep 30, 2020
Any update on this task?
craffel commented on Oct 1, 2020
Hi all, in case it is helpful, Noam recently wrote up a maybe-exhaustive list of the differences between T5.1.1 and a vanilla Transformer. Copying it here:
arglog commented on Oct 13, 2020
Hi @patrickvonplaten
Any updates on this? It would be exciting to be able to use the T5 v1.1 models in huggingface! Thanks!
ratthachat commented on Oct 18, 2020
Hi Patrick,
There are newly released T5.1.1 checkpoints which give SOTA on Natural Questions for non-retrieval models, which I posted about in a discussion here. Maybe it's a bit more encouragement to integrate T5.1.1 into HF :D
ratthachat commented on Oct 23, 2020
@craffel Thanks for your clarification about T5.1.1.
However, I could not find any source code for T5.1.1. Is it possible to provide a link to the source?
craffel commented on Oct 23, 2020
Hi, the source is all in the mesh TF transformer codebase
https://github.com/tensorflow/mesh/tree/master/mesh_tensorflow/transformer
Here is the gin config for t5.1.1.base
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/models/t5.1.1.base.gin
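For anyone mapping the gin config back onto the issue description: here is a rough, hypothetical sketch of how those differences might surface as configuration options in a Hugging Face-style port. The option names (`feed_forward_proj`, `tie_word_embeddings`, `dropout_rate`) and the base-size shapes are assumptions, not taken from the gin file above.

```python
# Hypothetical sketch only: how the T5.1.1 differences listed in this issue
# might map onto a transformers-style config once a port exists.
# Option names and shape values are assumptions, not confirmed API.
from transformers import T5Config

config = T5Config(
    d_model=768,                     # illustrative "base"-size value
    d_ff=2048,                       # v1.1 shapes differ a bit from the original T5
    num_heads=12,
    feed_forward_proj="gated-gelu",  # GEGLU feed-forward instead of ReLU
    dropout_rate=0.0,                # dropout off for pre-training; re-enable for fine-tuning
    tie_word_embeddings=False,       # no sharing between embedding and classifier layer
)
```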
acul3 commented on Oct 26, 2020
Multilingual T5 (mT5) has been released:
https://github.com/google-research/multilingual-t5
https://arxiv.org/abs/2010.11934
It looks like it uses the same implementation method as T5 v1.1.
Really looking forward to being able to use it in the Hugging Face library.