Questions tagged [encoder]
24 questions
4 votes, 2 answers
What is the difference between the BERT architecture and the vanilla Transformer architecture?
I'm doing some research on the summarization task and found out that BERT is derived from the Transformer model. Every blog about BERT that I have read focuses on explaining what a bidirectional encoder is, so I think this is what made BERT…
Luong Minh Tam (143)
3 votes, 1 answer
Why transform embedding dimension in sin-cos positional encoding?
Positional encoding using sine-cosine functions is often used in transformer models.
Assume that $X \in R^{l\times d}$ is the embedding of an example, where $l$ is the sequence length and $d$ is the embedding size. This positional encoding layer…
kyc12 (165)
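For readers unfamiliar with the encoding mentioned above, here is a minimal NumPy sketch of the standard sine-cosine scheme (my own illustration, assuming an even embedding size $d$; shapes follow the question's setup with sequence length $l$):

```python
import numpy as np

def sinusoidal_positional_encoding(l, d):
    # P[i, 2j]   = sin(i / 10000^(2j / d))
    # P[i, 2j+1] = cos(i / 10000^(2j / d))
    positions = np.arange(l)[:, None]            # (l, 1)
    div = 10000 ** (np.arange(0, d, 2) / d)      # (d/2,)
    P = np.zeros((l, d))
    P[:, 0::2] = np.sin(positions / div)
    P[:, 1::2] = np.cos(positions / div)
    return P

# The encoding is added to the embedding X of shape (l, d):
# X = X + sinusoidal_positional_encoding(l, d)
```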
3 votes, 1 answer
What to do with Transformer Encoder output?
I'm in the middle of learning about Transformer layers, and I feel like I've got enough of the general idea behind them to be dangerous. I'm designing a neural network and my team would like to include them, but we're unsure how to proceed with the…
Rstan (33)
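One common answer, sketched below in PyTorch (the dimensions and the mean-pooling choice are assumptions, not the only option): pool the per-position encoder outputs into one vector per example and feed that to a task head.

```python
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(8, 20, 64)           # (batch, seq_len, d_model), hypothetical inputs
h = encoder(x)                       # same shape: one vector per position
pooled = h.mean(dim=1)               # (8, 64): fixed-size summary per example
logits = nn.Linear(64, 3)(pooled)    # e.g. a 3-class classification head
```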
2 votes, 2 answers
Role of decoder in Transformer?
I understand the mechanics of the Encoder-Decoder architecture used in the Attention Is All You Need paper. My question is more high-level, about the role of the decoder. Say we have a sentence translation task: Je suis étudiant -> I am a student
The…
kyc12 (165)
2 votes, 1 answer
Encoding correlation
I have a rather theory-based question, as I'm not that experienced with encoders, embeddings, etc. Scientifically, I'm mostly oriented around novel evolutionary model-based methods.
Let's assume we have a data set with highly correlated attributes. Usually…
Piotr Rarus (854)
2 votes, 1 answer
Doubts regarding function used for positional encoding
In positional encoding for the transformer, we usually use a sinusoidal encoding rather than a binary encoding, even though a binary encoding could capture the positional information very similarly to a sinusoidal encoding (with multiple…
Ashwin Prasad (21)
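To make the comparison in the question concrete, here is a small illustration (mine, not from the question) of a binary positional code, where bit $j$ of position $i$ alternates with period $2^{j+1}$; the sinusoids can be viewed as a smooth, continuous analogue of these alternating bits.

```python
import numpy as np

def binary_positional_encoding(l, d):
    # Row i holds the d lowest-order bits of the position index i.
    positions = np.arange(l)[:, None]        # (l, 1)
    bits = (positions >> np.arange(d)) & 1   # (l, d); bit j flips every 2**j steps
    return bits.astype(float)

print(binary_positional_encoding(8, 3))
```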
2 votes, 2 answers
Is it vital to do label encoding on the target variable?
Should I always use label encoding while doing binary classification?
Rus Pylypyuk (21)
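For a binary target with string labels, a small sklearn example (the labels here are made up) of what label encoding does:

```python
from sklearn.preprocessing import LabelEncoder

y = ["spam", "ham", "ham", "spam"]   # hypothetical string targets
le = LabelEncoder()
y_encoded = le.fit_transform(y)      # array([1, 0, 0, 1])
print(le.classes_)                   # ['ham' 'spam']
```

If the target is already stored as 0/1 integers, this step adds nothing; it only matters when the labels are strings or otherwise non-numeric.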
1 vote, 1 answer
Encode time-series of different lengths with keras
I have time-series as my data (one time-series per training example). I would like to encode the data within these series into a fixed-length vector of features using a Keras model.
The problem is that my different examples' time-series don't have the…
Contestosis (191)
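A common way to handle this, sketched below with tf.keras (shapes and layer sizes are assumptions): pad the series to a common length and use a Masking layer so the recurrent encoder ignores the padded steps while still returning one fixed-length vector per example.

```python
import numpy as np
import tensorflow as tf

# Hypothetical ragged examples, each of shape (timesteps, n_features)
series = [np.random.rand(t, 4) for t in (10, 25, 17)]

padded = tf.keras.preprocessing.sequence.pad_sequences(
    series, padding="post", dtype="float32")        # (3, 25, 4), zero-padded

encoder = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0),        # skip all-zero (padded) timesteps
    tf.keras.layers.LSTM(32),                       # one fixed-length vector per series
])
fixed_vectors = encoder(padded)                     # shape (3, 32)
```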
1 vote, 1 answer
How to add a Decoder & Attention Layer to a Bidirectional Encoder with TensorFlow 2.0
I am a beginner in machine learning and I'm trying to create a spelling-correction model that spell-checks a small vocabulary (approximately 1000 phrases). Currently, I am referring to the TensorFlow 2.0 tutorials for 1. NMT with Attention,…
Dom (11)
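A rough tf.keras sketch of the wiring (vocabulary size, dimensions, and the choice of the built-in AdditiveAttention layer are my assumptions; if I recall correctly, the referenced tutorial instead defines its own Bahdanau-style attention layer):

```python
import tensorflow as tf

vocab_size, emb_dim, units = 1000, 64, 128

enc_in = tf.keras.Input(shape=(None,))
enc_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(enc_in)
enc_seq = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units, return_sequences=True))(enc_emb)           # (batch, T_enc, 2*units)

dec_in = tf.keras.Input(shape=(None,))
dec_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(dec_in)
dec_seq = tf.keras.layers.LSTM(2 * units, return_sequences=True)(dec_emb)  # (batch, T_dec, 2*units)

# Attention: decoder states query the encoder states
context = tf.keras.layers.AdditiveAttention()([dec_seq, enc_seq])          # (batch, T_dec, 2*units)
logits = tf.keras.layers.Dense(vocab_size)(
    tf.keras.layers.Concatenate()([dec_seq, context]))

model = tf.keras.Model([enc_in, dec_in], logits)
```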
1 vote, 1 answer
sklearn serialize label encoder for multiple categorical columns
I have a model with several categorical features that need to be converted to numeric format. I am using a combination of LabelEncoder and OneHotEncoder to achieve this.
Once in production, I need to apply the same encoding to new incoming data…
revy (133)
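A minimal sketch of one way to do this (column names and the file path are hypothetical): fit the encoder once on the training data, persist it with joblib, and reload the same object in production so new data gets the identical mapping.

```python
import joblib
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with two categorical columns
train_df = pd.DataFrame({"color": ["red", "blue"], "country": ["US", "FR"]})

encoder = OneHotEncoder(handle_unknown="ignore")    # unseen categories -> all-zero row
encoder.fit(train_df)
joblib.dump(encoder, "categorical_encoder.joblib")  # persist the fitted encoder

# In the serving process: reload and apply the exact same encoding
encoder = joblib.load("categorical_encoder.joblib")
new_df = pd.DataFrame({"color": ["blue"], "country": ["DE"]})
X_new = encoder.transform(new_df)                   # same column layout as at training time
```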
1 vote, 1 answer
How do I implement a dual-encoder model in PyTorch?
I am trying to implement the paper titled Learning Cross-lingual Sentence Representations via a Multi-task Dual-Encoder Model.
Here the encoder and decoder share the same weights, but I am unable to put it into code. Any links?
gaurus (351)
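Weight sharing in PyTorch usually just means instantiating one module and calling it on both inputs. A toy sketch under that assumption (the GRU encoder and dot-product scoring are placeholders, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """One shared encoder applied to both sides, so all weights are tied."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)

    def encode(self, tokens):
        _, h = self.rnn(self.embed(tokens))     # h: (1, batch, hidden)
        return h.squeeze(0)

    def forward(self, source_tokens, target_tokens):
        u = self.encode(source_tokens)          # the same parameters encode both inputs
        v = self.encode(target_tokens)
        return (u * v).sum(dim=-1)              # dot-product similarity per pair

model = DualEncoder()
src = torch.randint(0, 10000, (4, 12))
tgt = torch.randint(0, 10000, (4, 9))
scores = model(src, tgt)                        # shape (4,)
```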
1 vote, 1 answer
How is RNN decoder output calculated?
I was trying to read the RNN Encoder-Decoder paper.
RNN (plain RNN, i.e. a non-encoder-decoder RNN)
It starts by giving the equation for an RNN: the hidden state is given as
$$h_{\langle t \rangle} = f\left(h_{\langle t-1 \rangle}, x_t\right) \qquad \text{(1)}$$
where $f$ is a non-linear activation function.
The output is a…
Mahesha999 (299)
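For reference, the decoder side of the same paper (Cho et al., 2014), written out as I recall it; the "output" the question asks about is the conditional distribution produced by $g$:

$$h_{\langle t \rangle} = f\left(h_{\langle t-1 \rangle}, y_{t-1}, c\right), \qquad
P(y_t \mid y_{t-1}, \dots, y_1, c) = g\left(h_{\langle t \rangle}, y_{t-1}, c\right),$$

where $c$ is the fixed-length summary vector produced by the encoder and $g$ yields a valid probability distribution (e.g. via a softmax).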
1 vote, 2 answers
What does the output of an encoder in an encoder-decoder model represent?
So in most blogs or books touching upon the topic of encoder-decoder architectures the authors usually say that the last hidden state(s) of the encoder is passed as input to the decoder and the encoder output is discarded. They skim over that topic…
Marek M. (63)
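A tiny PyTorch illustration of the distinction the question is after (sizes are arbitrary): an RNN encoder returns both the per-step outputs and the final hidden state, and a plain seq2seq passes only the latter to the decoder.

```python
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(2, 5, 8)          # (batch, time, features)

outputs, h_n = rnn(x)
print(outputs.shape)              # (2, 5, 16): hidden state at every time step
print(h_n.shape)                  # (1, 2, 16): hidden state of the last step only
# A plain seq2seq initialises the decoder with h_n; `outputs` is what
# attention mechanisms later attend over instead of discarding it.
```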
1 vote, 1 answer
Encode categorical data for unsupervised learning
What is the best encoder for categorical data in unsupervised learning?
I am using unsupervised learning on mixed data (such as K-means).
Before running my unsupervised algorithm, I reduce the dimensionality of my data with FAMD (PCA for mixed…
Julien PETOT (11)
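Setting FAMD aside, a common baseline is to one-hot encode the categorical columns and scale the numeric ones before clustering; a small sklearn sketch with made-up data:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed numeric/categorical data
df = pd.DataFrame({
    "age": [23, 45, 31, 52],
    "income": [30_000, 80_000, 52_000, 61_000],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(), ["city"]),
])
clustering = make_pipeline(preprocess, KMeans(n_clusters=2, n_init=10))
labels = clustering.fit_predict(df)     # one cluster label per row
```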
1 vote, 0 answers
Motivation of LSTM with no Input
I have read this paper, where the authors use an LSTM to learn the attention applied to several sets. They use the LSTM without input or output; the LSTM just uses the hidden state and evolves it:
My question is: what is the motivation for using an LSTM without any…
Oculu (11)
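What such a "no input" LSTM can look like in practice, as a hedged PyTorch sketch (a constant zero vector is fed at every step, so only the state evolves; this mirrors the "process" block of set-to-set style models, which may be what the paper refers to):

```python
import torch
import torch.nn as nn

hidden = 32
cell = nn.LSTMCell(input_size=1, hidden_size=hidden)

h = torch.zeros(1, hidden)
c = torch.zeros(1, hidden)
zero_input = torch.zeros(1, 1)     # "no input": a constant placeholder

states = []
for _ in range(5):                 # a few processing steps
    h, c = cell(zero_input, (h, c))
    states.append(h)               # the evolving h can parameterise attention at each step
```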