What is a Large Language Model (LLM)?

 

Imagine you were trying to teach a child (a robot) how to speak. You might start by giving it a very simple sentence with just a few words. For example:

"He who asks a question is a fool for five minutes; he who does not ask a question is a fool forever."

Based on this sentence, the child realises that there are 14 distinct words it can play with. Given one word, it would know what the next word should be.

Well, things get trickier when some words have multiple possible next words, e.g. "who" or "a". The child robot might then randomly choose among the possible paths:

  • "he who asks a question is a fool for five minute he who does not ask a question is a fool forever"
  • "he who asks a fool forever"
  • "he who asks a fool for five minute"
  • "he who does not ask a fool forever"
  • (so on) ...

Well, most of these do not make sense!
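
For the curious, here is a minimal Python sketch of the word-following game the child robot is playing (my illustration, not part of the original post): it builds a "what comes next" table from the sample sentence and then wanders through it at random.

```python
import random

# Toy illustration: build a "what comes next" table (a bigram table)
# from the sample sentence, then let the robot child wander through it.
sentence = ("he who asks a question is a fool for five minutes "
            "he who does not ask a question is a fool forever").split()

# Map each word to the list of words that followed it in the sample.
next_words = {}
for current, following in zip(sentence, sentence[1:]):
    next_words.setdefault(current, []).append(following)

def babble(start="he", max_len=25):
    """Randomly walk the table, just as the child robot does."""
    words = [start]
    while len(words) < max_len and words[-1] in next_words:
        words.append(random.choice(next_words[words[-1]]))
    return " ".join(words)

print(babble())  # e.g. "he who asks a fool forever"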

To help your child robot learn, you could feed it more sample sentences. The robot can then build a function that determines, when facing multiple choices, which path to choose. It might come up with a simple function that picks the most probable next word given the previous one, two, or three words, and so on.

Next_word = P(word | word-1, word-2, word-3, ...)

This probability-based function is a bit "crude", though: a dictionary easily holds over 10,000 words, so looking back just one word already gives over 10,000 possible contexts. Looking back two words means 10,000^2 = 100,000,000 possibilities, and so on. Your robot child would likely catch fire from overheating!
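
A quick back-of-the-envelope calculation makes the blow-up concrete (the 10,000-word vocabulary is the rough figure from the paragraph above):

```python
# Rough back-of-the-envelope: how many contexts must the robot remember
# if it looks back n words over a 10,000-word vocabulary?
vocab = 10_000
for n in range(1, 5):
    print(f"looking back {n} word(s): {vocab ** n:,} possible contexts")
# looking back 1 word(s): 10,000 possible contexts
# looking back 2 word(s): 100,000,000 possible contexts
# ...
```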

Soon, you would probably start looking at other, cleverer and faster approximate functions, such as:

  • a proximity function based on the distance between words (see the sketch after this list)
  • Fourier series functions
  • Taylor series functions
  • etc.
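
As a hedged illustration of the proximity idea: if each word is given a vector of numbers (an embedding), then closeness between vectors can stand in for how related the words are. The words and vectors below are made up purely for illustration.

```python
import math

# Illustrative embeddings only; real systems learn these from data.
embeddings = {
    "question": [0.9, 0.1, 0.3],
    "ask":      [0.8, 0.2, 0.4],
    "fool":     [0.1, 0.9, 0.2],
}

def cosine_similarity(a, b):
    """Angle-based proximity: close to 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity(embeddings["question"], embeddings["ask"]))   # high
print(cosine_similarity(embeddings["question"], embeddings["fool"]))  # lower
```
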
Imagine: we have been searching for and tuning functions like these since 1966!

Now comes the good news for AI (or bad news for us humans): with the breakthrough of deep learning and the Transformer neural-network architecture in 2017, this process can be computed very quickly, and your robot child no longer catches fire. Since then, your robot child has grown very hungry and has consumed more and more text, images, and other data. By 2024, this robot child has gobbled up billions of pieces of text, petabytes of data and more... and it is still counting.
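
To give a flavour of the Transformer's core trick, here is a toy sketch (not the actual model, and the random matrices stand in for learned weights) of scaled dot-product attention: every word looks at every other word at once, which is what makes the computation fast and parallel.

```python
import numpy as np

np.random.seed(0)
seq_len, d_model = 4, 8                # 4 words, each an 8-number vector
x = np.random.randn(seq_len, d_model)  # stand-in word embeddings

# In a real model, Q, K, V come from learned weight matrices;
# random ones are used here purely for illustration.
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d_model)    # how much each word attends to each other word
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax
output = weights @ V                   # each word's new, context-aware vector

print(weights.round(2))                # rows sum to 1: attention per word
```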

Well, that child is now an LLM.

And since intelligence often comes with arrogance, all that we (human) parents can do right now is summarised in the picture below.


(Continued in Part 2 - What is a Neural Network?)



