🌟 T5 V1.1 #6285

Closed
@timoschick

Description

🌟 New model addition

Model description

T5 version t5.1.1.* is very similar to the original T5 model, with the following differences:

  • GEGLU activation in feed-forward hidden layer, rather than ReLU - see https://arxiv.org/abs/2002.05202.
  • Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning.
  • Pre-trained on C4 only without mixing in the downstream tasks.
  • No parameter sharing between embedding and classifier layer.
  • "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different - larger d_model and smaller num_heads and d_ff.

The key reason why these models are interesting is that - unlike the originally released models - they were trained only on unlabeled data and not on any labeled data, making them applicable for few-shot learning experiments. As they are very similar to the original T5 models, I assume they are relatively easy to implement.

Open source status

(Also tagging @patrickvonplaten as he is mentioned in the who to tag guide for T5)

Activity

patrickvonplaten (Contributor) commented on Sep 20, 2020

Sorry for the long delay on this one - I hope to be able to take a look in the next two weeks :-)

patrickvonplaten (Contributor) commented on Sep 20, 2020

And thanks a lot for the very detailed description here!

calclavia commented on Sep 30, 2020

Any update on this task?

craffel commented on Oct 1, 2020

Hi all, in case it is helpful Noam recently wrote up a maybe-exhaustive list of the differences between T5.1.1 and a vanilla Transformer. Copying it here:

Here are the main differences between t5.1.1.* and most other Transformer implementations.
Hope I have not forgotten anything. Please add where appropriate.

No positional embedding

(relies on relative attention - see below)

FFN Layers

No biases
version t5.1.1.* (but not t5.1.0.*) uses Gated-GELU activation
two input projections, approximate-gelu activation on one of them, then multiply componentwise, and apply the output projection
Approximate GELU:

import numpy as np

def approximate_gelu(x):
    cdf = 0.5 * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x * x * x)))
    return x * cdf
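
For illustration, here is a minimal NumPy sketch of the gated-GELU feed-forward block described above, reusing approximate_gelu from the snippet; the names wi_0, wi_1, and wo are placeholders for the two input projections and the output projection, not the actual mesh_tensorflow variable names.

def geglu_ffn(x, wi_0, wi_1, wo):
    # x: (..., d_model); wi_0, wi_1: (d_model, d_ff); wo: (d_ff, d_model); no bias terms anywhere
    hidden = approximate_gelu(x @ wi_0) * (x @ wi_1)  # GELU on one projection, componentwise product with the other
    return hidden @ wo  # output projection back to d_model

In the t5.1.0.* configs there is no gate: a single input projection followed by ReLU.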

Attention Layers

"relative position bias" - This is a simplified form of relative attention, due to the fact that other relative attention algorithms are slow on TPU. This is present in the encoder self-attention layers and decoder self-attention layers, but not the encoder-decoder attention layers.
A learned "bias" value is added to the attention logit. The bias is different by bucketed relative position. The biases are different across attention heads, but shared across different attention layers in the same stack.
relative_position = memory_position - query_position
bucket(relative_position) is determined by the function here: https://github.com/tensorflow/mesh/blob/5f802ae5492fd9207fd506a7ced189f6dbc38f2c/mesh_tensorflow/transformer/transformer_layers.py#L996
bidirectional=True for the encoder and False for the decoder.
The variables representing the four linear transformations have their num_heads and d_kv dimensions combined. This caused the code to run faster on TPU for some unknown reason.
No biases on the input and output linear transformations.
No explicit scaling of the logits by d_kv^-0.5. This is folded into the initializers of the linear transformations. With Adafactor, it's equivalent.
Not in any of the t5.1 configs, but may be in other configs: "extra logit" - This is equivalent to appending a 0 to the set of logits prior to softmax, and truncating it after the softmax. This allows for attending to nothing, if all of the logits are much less than zero. It's not clear whether this is an improvement or just a stumbling block for compatibility.
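
As a rough sketch (not the actual mesh_tensorflow code), the bias lookup described above could look like this in NumPy; bucket_fn stands in for the bucketing function linked above, and bias_table for the learned per-head variable shared across layers of a stack:

import numpy as np

def relative_position_bias(query_len, memory_len, bias_table, bucket_fn):
    # bias_table: learned (num_buckets, num_heads) values, shared across layers of the same stack
    query_position = np.arange(query_len)[:, None]
    memory_position = np.arange(memory_len)[None, :]
    relative_position = memory_position - query_position   # (query_len, memory_len)
    buckets = bucket_fn(relative_position)                  # bucket ids, per the linked mesh_tensorflow function
    bias = bias_table[buckets]                              # (query_len, memory_len, num_heads)
    return bias.transpose(2, 0, 1)                          # (num_heads, query_len, memory_len), added to the logits

A crude stand-in for bucket_fn, just to check the shapes, could be lambda rp: np.clip(rp, -8, 7) + 8 with num_buckets=16; the real bucketing is the log-spaced scheme in the linked code, with bidirectional=True for the encoder and False for the decoder.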

Embeddings

Encoder vocab embedding shared with decoder vocab embedding
in t5.1.0.* (but not in t5.1.1.*) this variable is also shared with the classifier layer. In that case, it is multiplied by d_model**-0.5 for use in the classifier.
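
To make the tied-vs-untied classifier concrete, a rough sketch (the names lm_logits, embedding, and lm_head are illustrative placeholders):

def lm_logits(hidden, embedding, lm_head=None):
    # hidden: (..., d_model); embedding: (vocab_size, d_model), shared by encoder and decoder
    if lm_head is None:  # t5.1.0.*: classifier tied to the embedding, which is rescaled by d_model**-0.5
        d_model = embedding.shape[-1]
        return hidden @ (embedding * d_model ** -0.5).T
    return hidden @ lm_head.T  # t5.1.1.*: separate, untied classifier weights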

Residuals, etc.

Before layer stack, apply dropout
For each layer apply
Y = X + dropout(F(rms_norm(X)))
F is the core layer function, i.e. feed-forward, attention, etc.
RMS norm is a simplified version of layer norm.
After layer stack, apply rms_norm, then dropout.
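
A minimal NumPy sketch of this residual pattern, assuming an epsilon of 1e-6 inside the norm (the exact value is an assumption here):

import numpy as np

def rms_norm(x, scale, eps=1e-6):
    # simplified layer norm: no mean subtraction and no bias, only rescaling by the root mean square
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps) * scale

def layer_with_residual(x, layer_fn, scale, dropout_fn):
    # Y = X + dropout(F(rms_norm(X))), where F is the core attention or feed-forward function
    return x + dropout_fn(layer_fn(rms_norm(x, scale)))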

arglog commented on Oct 13, 2020

Hi @patrickvonplaten

Any updates on this? It would be exciting to be able to use the T5 v1.1 models in Hugging Face! Thanks!

ratthachat (Contributor) commented on Oct 18, 2020

Hi Patrick,
There are newly released T5.1.1 checkpoints which achieve SOTA on Natural Questions among non-retrieval models; I posted a discussion here. Maybe it's a bit more encouragement to integrate T5.1.1 into HF :D

ratthachat (Contributor) commented on Oct 23, 2020

@craffel Thanks for your clarification about T5.1.1. However, I could not find any source code for T5.1.1 - could you provide a link to the source?

craffel commented on Oct 23, 2020

acul3 (Contributor) commented on Oct 26, 2020

Multilingual T5 (mT5) has been released:
https://github.com/google-research/multilingual-t5
https://arxiv.org/abs/2010.11934

It looks like it uses the same implementation as T5 v1.1.
Really looking forward to being able to use it in the Hugging Face library.

19 remaining items

