
SentenceTransformer API vs. Transformer API + pooling #405

Open

Description

@githubrandomuser2017

In your documentation you mention two approaches to using your package to create sentence embeddings.

First, from the Quickstart, you wrote:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

# Our sentences we would like to encode
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.',
    'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
sentence_embeddings = model.encode(sentences)
print(sentence_embeddings.shape)
# (3, 768)

Second, from Sentence Embeddings with Transformers, you wrote:

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
# Model is of type: transformers.modeling_bert.BertModel

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print(sentence_embeddings.shape)
# torch.Size([3, 768])
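(For completeness, the quoted snippet relies on encoded_input and mean_pooling, which are defined earlier on that documentation page; a minimal version along the same lines would be roughly:)

import torch
from transformers import AutoTokenizer

# Tokenize the same `sentences` list as in the first snippet, padding to equal length
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Mean pooling: average the token embeddings, ignoring padding via the attention mask
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element contains all token embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, dim=1) / torch.clamp(mask.sum(dim=1), min=1e-9)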

What are the important differences between these two approaches? The only difference I can see is that in the second approach, the BertModel returns token embeddings and you manually perform pooling (mean or max). If I use this second approach, what would I be missing compared to SentenceTransformer.encode()?

Activity

nreimers (Member) commented on Sep 2, 2020

SentenceTransformer.encode() performs various optimization steps to ensure the input is encoded as fast as possible. Tokenization and embedding computation can run in parallel (if a GPU is available); further, it batches the input so that only minimal padding is needed, which also improves performance.

In general, SentenceTransformer.encode() is, in my opinion, far more convenient than the AutoModel approach if you want to get embeddings for input texts.
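For example, a typical call looks like this (the argument values here are just illustrative; defaults work fine too):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

sentences = ['This framework generates embeddings for each input sentence',
             'The quick brown fox jumps over the lazy dog.']

# encode() handles tokenization, batching with minimal padding, moving data
# to the GPU if one is available, and pooling, all in a single call.
embeddings = model.encode(
    sentences,
    batch_size=32,           # sentences per forward pass
    show_progress_bar=True,  # useful for large corpora
    convert_to_tensor=True,  # return a torch.Tensor instead of a numpy array
)
print(embeddings.shape)  # torch.Size([2, 768])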

githubrandomuser2017 (Author) commented on Sep 2, 2020

In your section Sentence Embeddings with Transformers, you wrote:

Most of our pre-trained models are based on Huggingface.co/Transformers and are also hosted in the model repository of Hugging Face.

In the HuggingFace models repository, I see a lot of different models, including those from your sentence-transformers package:

  • bert-base-uncased
  • gpt2
  • deepset/roberta-base-squad2
  • sentence-transformers/bert-base-nli-mean-tokens

Is it possible to use any of these models, or just those with sentence-transformers? Am I correct in assuming that your models are specifically configured to return the token embeddings? That model output can then be run through a pooler function.

nreimers (Member) commented on Sep 3, 2020

In theory you could use any of them; however, out of the box, they do not produce good sentence embeddings.

The sentence-transformers models were specifically trained to produce meaningful sentence embeddings.

The other models also return token embeddings. However, when you average them, the representation does not necessarily make sense.

MathewAlexander commented on Sep 8, 2020

@nreimers
When I fine-tuned XLNet-large on STS-B, I got a Pearson correlation coefficient of +0.917 on the development set. Also, in the leaderboards given here and here, I see many models above 90. Doesn't that mean they are better than the sentence-transformers models?

nreimers (Member) commented on Sep 9, 2020

Hi @MathewAlexander

It depends on your use case: If you just want to compute the similarity for two sentences, then using BERT & Co. works better. For this, you don't need this package. You pass both sentences together to BERT and get a score that indicates the similarity.
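A minimal sketch of that pairwise setup, using the CrossEncoder class from this package (the model name and printed score are just illustrative):

from sentence_transformers import CrossEncoder

# A cross-encoder reads both sentences together and outputs one similarity score.
model = CrossEncoder('cross-encoder/stsb-roberta-base')

score = model.predict([('A man is eating food.', 'A man is eating a piece of bread.')])
print(score)  # e.g. array([0.78]); higher means more similar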

However, this scales badly. Assume you have 10k sentences and you want to find the most similar pair: 10k sentences lead to roughly 50 million different pairs (n*(n-1)/2), so you would need to run the BERT cross-encoder on all of them, which takes a very long time.

With SentenceTransformer, you compute an embedding for each of the 10k sentences once and then compare them with cosine similarity. This takes only seconds.

The quality will be somewhat worse, but you get the results within seconds and don't have to wait hours or even days.
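A rough sketch of that embedding route (the corpus here is made up; in practice it would be the 10k sentences, each encoded exactly once):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

corpus = ['A man is eating food.',
          'A man is eating a piece of bread.',
          'The girl is carrying a baby.',
          'A cheetah is running behind its prey.']
embeddings = model.encode(corpus, convert_to_tensor=True)

# Cosine similarity between all pairs is a single matrix operation
cos_scores = util.pytorch_cos_sim(embeddings, embeddings)  # shape: (len(corpus), len(corpus))
print(cos_scores)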

MathewAlexander commented on Sep 9, 2020

Hi @nreimers
That makes sense. Thanks for the explanation

githubrandomuser2017 (Author) commented on Sep 9, 2020

@nreimers

With SentenceTransformer, you compute an embedding for each of the 10k sentences once and then compare them with cosine similarity.

If you compute an embedding for each sentence individually, how do you update the BERT weights during training backprop? Your paper does say that you update BERT (in Section 3):

In order to fine-tune BERT / RoBERTa, we create siamese and triplet networks (Schroff et al., 2015) to update the weights

nreimers (Member) commented on Sep 10, 2020

As mentioned in the paper, by using siamese or triplet networks, depending on the loss.

You pass a sentence pair (or triplet) with a label during training, measure the error, and do backprop.
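Roughly, with the sentence-transformers training API (the pairs, labels, and hyperparameters below are made up):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

# Each InputExample is a sentence pair plus a gold similarity label in [0, 1]
train_examples = [
    InputExample(texts=['A man is eating food.', 'A man is eating a piece of bread.'], label=0.9),
    InputExample(texts=['A man is eating food.', 'A cheetah is running behind its prey.'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Both sentences of a pair run through the same transformer weights (siamese setup);
# the loss compares the cosine similarity of the two embeddings to the label,
# and backprop updates the shared weights.
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)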

githubrandomuser2017 (Author) commented on Sep 19, 2020

@nreimers Why don't you use GPT2 as the basis of a Sentence Transformer model?

nreimers (Member) commented on Sep 21, 2020

@githubrandomuser2017
When SBERT was created, GPT2 was not available.

I never tested GPT2, but I think masked language modeling, as used in BERT, is a better pre-training task for obtaining sentence embeddings than the causal language modeling used by GPT2.

But it would be easy to fine-tune and test GPT2 with sentence-transformers.
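A rough, untested sketch of wiring GPT2 into a SentenceTransformer via the modules API (note that GPT2 ships without a padding token, so one has to be assigned):

from sentence_transformers import SentenceTransformer, models

# Wrap the Hugging Face GPT2 checkpoint as a word-embedding module
word_embedding_model = models.Transformer('gpt2', max_seq_length=128)

# GPT2 has no padding token by default; reuse the end-of-text token for padding
word_embedding_model.tokenizer.pad_token = word_embedding_model.tokenizer.eos_token

# Add mean pooling on top of the token embeddings to get one vector per sentence
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
# The resulting model can then be fine-tuned with the usual sentence-transformers losses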


Metadata

Assignees: none
Labels: none
Type: none
Projects: none
Milestone: none
Participants: @nreimers, @githubrandomuser2017, @MathewAlexander