  1. How does the (decoder-only) transformer architecture work?

    May 30, 2023 · In the decoder-only transformer, masked self-attention is nothing more than sequence padding. The 'masking' term is a left-over from the original encoder-decoder …
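A minimal pure-Python sketch of the causal ("masked") self-attention weighting this result refers to, assuming a toy 4-token sequence and illustrative function names: future positions are set to negative infinity before the softmax, so each position attends only to itself and earlier tokens.

```python
import math

def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores):
    """Set scores at masked (future) positions to -inf before the row-wise
    softmax, so future positions receive exactly zero attention weight."""
    n = len(scores)
    mask = causal_mask(n)
    out = []
    for i in range(n):
        row = [scores[i][j] if mask[i][j] else float("-inf") for j in range(n)]
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

weights = masked_softmax([[0.0] * 4 for _ in range(4)])
# row 0 attends only to itself; row i spreads weight over positions 0..i
```

In a real implementation the same effect is achieved with a lower-triangular mask tensor added to the attention logits before softmax; the sketch above only shows the weighting logic.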

  2. Why does CLIP use a decoder-only transformer for encoding text?

    Feb 7, 2023 · I believe because Decoder-only basically cuts down the model size in half, and has also shown empirically to be better. In the original Transformer paper, the evaluation task was …

  3. In which situations is it helpful to use an encoder, a decoder, or both in ...

    Jul 27, 2023 · I have some questions about using (encoder / decoder / encoder-decoder) transformer models, including (language) transformers or Vision transformers. The overall form of …

  4. Difference between encoder-decoder and decoder-only transformers w.r.t.

    Nov 14, 2024 · What is the difference between an encoder-decoder transformer and a decoder-only transformer with regard to the loss calculation? Specifically, how does the loss signal …

  5. Why does GPT-2 Exclude the Transformer Encoder?

    GPT-2 is a close copy of the basic transformer architecture. GPT-2 does not require the encoder part of the original transformer architecture as it is decoder-only, and there are no encoder …

  6. transformer - Why GPT uses a decoder-only architecture, when they …

    Oct 8, 2024 · I wonder why GPTs use a decoder-only architecture instead of a full encoder-decoder architecture. In the full encoder-decoder transformer architecture, we convert the input …

  7. How is the next token predicted in transformers?

    Apr 21, 2023 · In the transformer (or GPT/decoder only), at the end of the decoder blocks but before the final linear layer you have X vectors (for the X tokens at the input of the decoder). …
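A minimal sketch of the next-token step this result describes, assuming a greedy decoding step with hypothetical names and toy shapes: of the X output vectors, only the last position's hidden state is projected through the output matrix and softmaxed into a vocabulary distribution.

```python
import math

def predict_next_token(hidden_states, W, vocab):
    """Greedy next-token sketch: take the LAST position's hidden state,
    project it through W (hidden_dim x vocab_size), softmax the logits
    into a probability distribution, and pick the argmax token."""
    h = hidden_states[-1]  # only the last vector predicts the next token
    logits = [sum(hi * wij for hi, wij in zip(h, col)) for col in zip(*W)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return vocab[probs.index(max(probs))], probs

vocab = ["the", "cat", "sat"]
W = [[1.0, 0.0, 0.0],   # hidden_dim = 2, vocab_size = 3 (toy values)
     [0.0, 2.0, 0.0]]
token, probs = predict_next_token([[0.1, 0.2], [0.0, 1.0]], W, vocab)
# logits = [0.0, 2.0, 0.0] -> greedy pick is "cat"
```

In practice the picked token is appended to the input and the model is run again, which is why generation is autoregressive; sampling strategies (top-k, nucleus) replace the argmax here.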

  8. What is the difference between transformer types? - Artificial ...

    Sep 2, 2024 · The original transformer architecture (Attention Is All You Need, 2017) which was developed for machine translation tasks utilized both an encoder and a decoder. The encoder …

  9. deep learning - How does Chat GPT encode a question? - Artificial ...

    Dec 11, 2023 · Chat GPT is based on a decoder-only Transformer so it does not have an encoder. Given that, how is a user's question passed as input to Chat GPT's decoder? In a …

  10. What makes the transformer encoder different from its decoder part?

    Nov 30, 2024 · You’re right that the encoder-decoder transformer aligns with the traditional autoencoder (AE) structure, except that the AE’s encoder output is usually a compressed latent …