I have a special situation where I need to embed a sentence with a Sentence-Transformers (SBERT) model and get a single numeric value for it. This is not directly possible with the model alone, because it returns a high-dimensional vector (len([v1, v2, ...]) = 384), and I don't know which valid method exists to reduce that dimensionality to a single value. I have the following code to compute the embeddings:

from sentence_transformers import SentenceTransformer
sbert_model = SentenceTransformer('all-MiniLM-L6-v2') 
sentence_embeddings = sbert_model.encode(sentences)
print('BERT Embedding Vector length', len(sentence_embeddings[0])) # = 384
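
For context, sentences here is just a plain Python list of 1000 strings (the actual sentences come from my data); a minimal stand-in would look like:

sentences = ["This is the first sentence.",
             "This is another sentence."]  # ... up to 1000 strings in my case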

I need a rigorous way to get a single-value embedding for each sentence (1000 sentences in total); currently sentence_embeddings.shape == (1000, 384). I tried the following with a linear layer:

import numpy as np
import torch
import torch.nn as nn

def linear_layer(embeddings: np.ndarray, input_m: int, output_r: int) -> np.ndarray:
    # project the (n_sentences, input_m) embeddings down to output_r features
    inputs = torch.from_numpy(embeddings)
    m_to_r_feat = nn.Linear(input_m, output_r)  # randomly initialised, untrained
    out = m_to_r_feat(inputs).flatten()
    return out.detach().numpy()

one_d_embedding = linear_layer(sentence_embeddings, 384, 1)
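
To see what this produces, I ran the function on a dummy array that just stands in for my real sentence_embeddings (same shape and dtype):

import numpy as np

dummy = np.random.rand(1000, 384).astype(np.float32)  # stand-in for sentence_embeddings
one_d = linear_layer(dummy, 384, 1)
print(one_d.shape)  # (1000,) -- one value per sentence after .flatten()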

Is this OK? And do I need to apply the layer to each sentence separately, or is the way I did it (on the whole batch at once) correct?
