
How does the (decoder-only) transformer architecture work?
May 30, 2023 · In the decoder-only transformer, masked self-attention simply prevents each token from attending to later positions in the sequence (causal masking). The 'masked' term is a left-over of the original encoder-decoder …
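The causal masking described in that snippet can be sketched in a few lines of NumPy. This is a minimal illustration, not any library's actual implementation; the toy 4-token scores are made up:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # Lower-triangular boolean mask: position i may attend to positions <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Disallowed (future) positions are set to -inf before the softmax,
    # so they receive exactly zero attention weight.
    scores = np.where(mask, scores, -np.inf)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.random.randn(4, 4)               # raw attention scores for 4 tokens
weights = masked_softmax(scores, causal_mask(4))
# Each row i now puts zero weight on columns j > i.
```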
Why does CLIP use a decoder-only transformer for encoding text?
Feb 7, 2023 · I believe it's because a decoder-only model basically cuts the model size in half, and it has also been shown empirically to perform better. In the original Transformer paper, the evaluation task was …
In which situations is it helpful to use an encoder, a decoder, or both in ...
Jul 27, 2023 · I have some questions about using (encoder / decoder / encoder-decoder) transformer models, including (language) transformers and Vision Transformers. The overall form of …
Difference between encoder-decoder and decoder-only transformers w.r.t.
Nov 14, 2024 · What is the difference between an encoder-decoder transformer and a decoder-only transformer with regard to the loss calculation? Specifically, how does the loss signal …
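For the decoder-only side of that question, the loss is simply next-token cross-entropy over the input shifted by one position. A minimal NumPy sketch, using made-up random logits in place of real model outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, seq_len = 10, 6

tokens = rng.integers(0, vocab_size, size=seq_len)    # one toy training sequence
logits = rng.standard_normal((seq_len, vocab_size))   # stand-in decoder outputs

# Decoder-only loss: position t predicts token t+1, so predictions and
# targets are the same sequence offset by one.
pred_logits = logits[:-1]        # positions 0 .. T-2
targets = tokens[1:]             # tokens    1 .. T-1

# Cross-entropy: negative log-probability of each target token, averaged.
log_probs = pred_logits - np.log(np.exp(pred_logits).sum(axis=-1, keepdims=True))
loss = -log_probs[np.arange(len(targets)), targets].mean()
```

In the encoder-decoder case the same cross-entropy applies, but only over the decoder's output positions, with the encoder trained through the cross-attention gradients.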
Why does GPT-2 Exclude the Transformer Encoder?
GPT-2 is a close copy of the basic transformer architecture. It does not require the encoder part of the original transformer because it is decoder-only, and there are no encoder …
transformer - Why does GPT use a decoder-only architecture, when they …
Oct 8, 2024 · I wonder why GPT models use a decoder-only architecture instead of the full encoder-decoder architecture. In the full encoder-decoder transformer architecture, we convert the input …
How is the next token predicted in transformers?
Apr 21, 2023 · In the transformer (or GPT/decoder-only), at the end of the decoder blocks but before the final linear layer, you have X vectors (for the X tokens at the input of the decoder). …
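The step that snippet describes, from per-token decoder outputs to a predicted next token, can be sketched with NumPy. All shapes and values here are illustrative stand-ins, not real model weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size, seq_len = 8, 10, 5

# Hypothetical decoder output: one d_model-dim vector per input token.
hidden = rng.standard_normal((seq_len, d_model))

# Final linear ("unembedding") layer projects each vector to vocabulary logits.
W = rng.standard_normal((d_model, vocab_size))
logits = hidden @ W                      # shape: (seq_len, vocab_size)

# Only the LAST position's logits are used to predict the next token.
last = logits[-1]
probs = np.exp(last - last.max())
probs /= probs.sum()                     # softmax over the vocabulary
next_token = int(np.argmax(probs))       # greedy decoding; sampling also works
```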
What is the difference between transformer types? - Artificial ...
Sep 2, 2024 · The original transformer architecture (Attention Is All You Need, 2017), which was developed for machine translation tasks, utilized both an encoder and a decoder. The encoder …
deep learning - How does Chat GPT encode a question? - Artificial ...
Dec 11, 2023 · Chat GPT is based on a decoder-only Transformer, so it does not have an encoder. Given that, how is a user's question passed as input to Chat GPT's decoder? In a …
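In short, the question is tokenized and fed to the decoder as the initial context; generation then continues autoregressively from it. A toy sketch of that flow, with a hypothetical word-level vocabulary (real models use subword tokenizers like BPE, but the shape of the process is the same):

```python
# Hypothetical toy vocabulary; a real tokenizer maps subwords, not words.
vocab = {"<bos>": 0, "what": 1, "is": 2, "a": 3, "transformer": 4, "?": 5}

def encode(question: str) -> list[int]:
    # The user's question becomes the decoder's starting token sequence;
    # there is no separate encoder pass.
    words = question.lower().replace("?", " ?").split()
    return [vocab["<bos>"]] + [vocab[w] for w in words]

prompt_ids = encode("What is a transformer?")
# The decoder then generates the answer one token at a time, appending each
# new token to this same sequence before predicting the next one.
```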
What makes the transformer encoder different from its decoder part?
Nov 30, 2024 · You're right that the encoder-decoder transformer aligns with the traditional autoencoder (AE) structure, except that the AE's encoder output is usually a compressed latent …