Lecture 17
Hidden representations are better representations

- NNs that represent "XOR" map vectors from criss-crossing categories into vectors in their own separate categories
- embeddings = hidden representations of embedding NNs
- better representations of words than the one-hot encodings
- an embedding NN transforms one-hot encodings into vectors whose similar numerical patterns represent grammatical similarity
- learnt by predicting words from previous words in a text (see the sketch after this list)
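
To make this concrete, here is a minimal numpy sketch (not from the lecture) of an embedding NN: one-hot inputs and a small hidden layer, trained to predict the next word from the previous one. The toy corpus, hidden size, and learning rate are all made up for illustration; after training, each row of `W_in` is that word's embedding, and words used in similar contexts end up with similar numerical patterns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus (made up): embeddings are learned by predicting each word
# from the previous word.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
V = len(vocab)
idx = {w: i for i, w in enumerate(vocab)}

def one_hot(i, size):
    v = np.zeros(size)
    v[i] = 1.0
    return v

# Training pairs: (previous word, next word)
pairs = [(idx[a], idx[b]) for a, b in zip(corpus, corpus[1:])]

D = 3                               # size of the hidden (embedding) layer
W_in = rng.normal(0, 0.1, (V, D))   # one-hot -> hidden
W_out = rng.normal(0, 0.1, (D, V))  # hidden -> next-word scores

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.1
for epoch in range(500):
    for prev_i, next_i in pairs:
        x = one_hot(prev_i, V)
        h = x @ W_in                 # hidden representation = embedding of prev word
        p = softmax(h @ W_out)       # predicted distribution over the next word
        # Error-based learning: compare the guess to the true next word
        dscores = p - one_hot(next_i, V)
        W_out -= lr * np.outer(h, dscores)
        W_in[prev_i] -= lr * (W_out @ dscores)

# The rows of W_in are the learned embeddings.
for w in vocab:
    print(w, np.round(W_in[idx[w]], 2))
```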
LLMs as Loss Minimization (Error-based) Learners
- LLMs are "large" because they have hundreds of NNs
- they find novel ways of learning regularities based on Loss
Two basic ways of learning from mistakes:
- BERT method: give the LLM a sentence, eliminate one or more of the words, and ask the LLM to guess what the missing words are
- GPT method: give the LLM a sequence of words, and ask it to guess the next word
- both methods rely on similar ways of connecting NNs (see the sketch after this list)
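
As a rough illustration (not the lecture's own code), the snippet below builds training examples in both styles from the example sentence used later in these notes. The model itself is omitted; the point is just how the two kinds of guessing problems are set up, with the mismatch between guess and true word supplying the Loss.

```python
import random

random.seed(0)
sentence = "the dog that they got is very cute".split()

# BERT-style example: hide one word and ask the model to fill it in.
i = random.randrange(len(sentence))
bert_input = sentence[:i] + ["[MASK]"] + sentence[i + 1:]
bert_target = sentence[i]
print("BERT-style:", " ".join(bert_input), "->", bert_target)

# GPT-style examples: at every position, predict the next word
# from all the words seen so far.
for t in range(1, len(sentence)):
    context, target = sentence[:t], sentence[t]
    print("GPT-style:", " ".join(context), "->", target)
```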
Why we need Large Language Models
- a language like English contains many dependencies that are not between contiguous words
- e.g. "The dog that they got is very cute"
- "dog", not "they", goes with "is"
- speakers of a language are very good at learning dependencies even if they're at arbitrary distances in a sentence
- the knowledge in an embedding NN is not sufficient for learning all the generalizations of a language
- that's why we need LLMs, which are hundreds of NNs connected to each other
Understanding the basics of how LLMs are structured
- LLMs are hundreds of NNs
- they try to do exactly what NNs do: transform not-so-great representations into much better representations
NNs are arranged in 2 separate ways:
- Dynamically (in series)
- as Ensembles (in parallel)
Dynamically
- an excellent representation reduces Loss much better than a terrible representation
- an LLM will have many stages of NNs, each of which produces a better representation than the one before
- each stage is called an encoder
- modern systems can have 24 or more such encoders (see the sketch below)
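
A minimal sketch of the in-series idea, not tied to any particular architecture: each "encoder" below is just a tiny NN that maps an 8-dimensional representation to a new one, and 24 of them are applied one after another. The sizes and random weights are placeholders; in a real LLM they would be learned by minimizing Loss on the prediction task.

```python
import numpy as np

rng = np.random.default_rng(1)

def encoder_stage(x, W, b):
    # One encoder: a small NN that maps the incoming representation
    # to a new representation of the same size.
    return np.tanh(x @ W + b)

D = 8                       # size of each representation vector (made up)
num_encoders = 24           # modern systems stack this many (or more) stages

# Randomly initialized stages; real weights come from training.
stages = [(rng.normal(0, 0.5, (D, D)), rng.normal(0, 0.1, D))
          for _ in range(num_encoders)]

x = rng.normal(0, 1, D)     # stand-in for a word's initial representation
for k, (W, b) in enumerate(stages, start=1):
    x = encoder_stage(x, W, b)   # each stage transforms the previous output
    print(f"after encoder {k:2d}:", np.round(x[:4], 2), "...")
```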
Ensembles
- we could get 50 NNs to learn randomly from the same data, each learning unique regularities
- this is called an ensemble of learners
- now we combine all of their knowledge somehow
- each of the 24 encoders we just discussed may have an ensemble of 24 separate NNs, each learning a different aspect of a sentence’s structure
- maybe one NN, for instance, learns about pronouns and how they refer, and another about subjects and verbs (see the sketch below)
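
Here is a hedged sketch of the in-parallel idea: 24 small NNs each transform the same input representation, and their outputs are concatenated and projected back down as one simple way of "combining their knowledge somehow". All sizes and weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

D = 8            # representation size (made up)
num_members = 24 # NNs in the ensemble, each free to learn a different regularity

x = rng.normal(0, 1, D)   # incoming representation for one word

# Each member of the ensemble is its own small NN with its own weights.
members = [rng.normal(0, 0.5, (D, D)) for _ in range(num_members)]
member_outputs = [np.tanh(x @ W) for W in members]

# One simple way to combine their knowledge: concatenate the outputs
# and project back down to the original size.
combined = np.concatenate(member_outputs)         # shape (num_members * D,)
W_combine = rng.normal(0, 0.1, (num_members * D, D))
y = combined @ W_combine                          # merged representation

print("combined representation:", np.round(y, 2))
```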
LLM General Structure
[Figure: diagram of an LLM's general structure]
