🌟 T5 V1.1 #6285
Sorry for the long delay on this one - I hope to be able to take a look in the next two weeks :-)
And thanks a lot for the very detailed description here!
Any update on this task?
Hi all, in case it is helpful, Noam recently wrote up a maybe-exhaustive list of the differences between T5.1.1 and a vanilla Transformer. Copying it here:
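The quoted list is not preserved above, but one well-documented difference, mirrored in transformers' T5 implementation, is the simplified RMS-style layer norm (scale only: no mean subtraction, no bias term). A minimal PyTorch sketch for illustration; the class name here is mine:

```python
import torch
import torch.nn as nn

class T5StyleLayerNorm(nn.Module):
    """RMS-style norm as in T5: rescale by root-mean-square, learned scale, no bias."""
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Mean square is computed in float32 for numerical stability.
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        normed = hidden_states.to(torch.float32) * torch.rsqrt(variance + self.eps)
        return self.weight * normed.to(self.weight.dtype)
```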
Any updates on this? It would be exciting to be able to use the T5 v1.1 models in Hugging Face! Thanks!
Hi Patrick,
@craffel Thanks for your clarification about T5.1.1.
Hi, the source is all in the mesh TF transformer codebase.
Multilingual T5 (mT5) has been released; it looks like it uses the same implementation method as T5 v1.1.
Hey guys, I will start adding mT5 next week.
@patrickvonplaten: waiting for mT5 :)
Yep, will start working on it this week :-)
I think a reasonable estimate for the official release is in ~2 weeks: #8488
T5 v1.1 and mT5 have the same architecture. I'm struggling a bit with finding a good name for the library. Not sure if I like the names => Going for
Hi @patrickvonplaten, thanks again!
Yeah, good point @ratthachat! @craffel - we decided internally that we will make a new model file for T5v1.1 / mT5, as it's more in line with the library's philosophy. The best name that I can think of at the moment is … Would be super interested in hearing your opinion about it! Or better name suggestions in case you have some :-)
It might be confusing to refer to T5.1.1 as T5 v2 since it would result in an inconsistent versioning system. I think T511Model is probably ok, but I defer to you all as to what HF's naming convention should be. |
I would either suggest:
If possible, and if it doesn't cause any harm, I support @agemagician's choice 2 above.
I haven't been able to reproduce benchmark performance (such as GLUE CoLA, MRPC, etc.) with the PyTorch T5.1.1 so far. Is anyone else trying this?
I have now reproduced the mT5-small model by fine-tuning on the XNLI benchmark task. It seems to work.
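For anyone who wants to run the same sanity check, a minimal sketch; the checkpoint name google/mt5-small assumes the identifiers the released models were published under on the Hugging Face hub:

```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# mT5 was pre-trained on unlabeled mC4 only, so it needs fine-tuning
# (e.g. on XNLI, as above) before it produces useful task outputs.
inputs = tokenizer("This is a test.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```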
🌟 New model addition
Model description
T5 version t5.1.1.* is very similar to the original T5 model, with the following differences:
- GEGLU activation in the feed-forward hidden layer, rather than ReLU (a sketch of this block follows the list).
- Dropout was turned off in pre-training (a quality win); dropout should be re-enabled during fine-tuning.
- Pre-trained on C4 only, without mixing in the downstream tasks.
- No parameter sharing between the embedding and classifier layer.
- "xl" and "xxl" replace "3B" and "11B"; the model shapes are a bit different (larger d_model and smaller num_heads and d_ff).
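A minimal PyTorch sketch of the gated-GELU (GEGLU) feed-forward block from the first bullet; class and variable names here are illustrative (transformers implements an equivalent block for T5 v1.1):

```python
import torch
import torch.nn as nn

class GatedGeluFeedForward(nn.Module):
    """Gated-GELU feed-forward: GELU(x @ W_gate) * (x @ W_value), then project back."""
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection
        self.dropout = nn.Dropout(dropout)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of the activated gate and the linear value path.
        h = self.act(self.wi_0(x)) * self.wi_1(x)
        return self.wo(self.dropout(h))
```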
The key reason why these models are interesting is that - unlike the originally released models - they were trained only on unlabeled data and not on any labeled data, making them applicable for few-shot learning experiments. As they are very similar to the original T5 models, I assume they are relatively easy to implement.
Open source status
(Also tagging @patrickvonplaten as he is mentioned in the who to tag guide for T5)