How AI Models Understand Our Text❗💻🔑

In today's world, interacting with AI models, passing prompts and generating ideas have become part of everyday life.

But for ages, understanding language has felt like a very human thing!


And now the arrival of large language models (LLMs), with their ability to interpret everything we pass as a prompt, definitely raises some curiosity about the mechanism driving them.


So, in an attempt to understand a few of these things, I explored the topic a bit, and I would love to share what I have learned in this blog in the simplest way possible.


Whenever we pass a prompt to an AI model, it does not understand anything right away. The text first goes through a series of steps.

Let's take a simple sentence:

'The boy dropped the glass and it broke'

Step 1: Tokenization

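The first thing the AI model does is break the sentence into small pieces called tokens (just like separating puzzle pieces before solving the puzzle). Real tokenizers often split text into pieces smaller than a word, but whole words keep this example simple.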

EXAMPLE:

The | boy | dropped | the | glass | and | it | broke
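
To make the idea concrete, here is a minimal Python sketch of this step. It assumes the simplest possible tokenizer, one that just splits on spaces, and the `tokenize` helper and the tiny `vocab` table are names made up for illustration. Real models use learned subword tokenizers, so treat this as a picture of the idea rather than what an LLM actually runs.

```python
# Toy word-level tokenizer: an illustration only, not a production LLM tokenizer.
sentence = "The boy dropped the glass and it broke"


def tokenize(text):
    """Split text into word tokens on whitespace (hypothetical helper)."""
    return text.split()


tokens = tokenize(sentence)

# Build a small vocabulary: each distinct lowercase token gets an integer ID.
vocab = {}
for token in tokens:
    key = token.lower()
    if key not in vocab:
        vocab[key] = len(vocab)

# Replace every token with its ID; this is the form a model works with.
token_ids = [vocab[token.lower()] for token in tokens]

print(tokens)
print(token_ids)
```

Notice that "The" and "the" end up with the same ID because the sketch lowercases each word before looking it up. That is a simplification chosen for this example.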


Step 2: Words become Numbers (Vectors)

AI models cannot understand words directly.

During training, the model builds a structured vector space of meaning: a giant multi-dimensional map in which words with similar meanings sit close to each other.

Using this learned structure, the model converts each word into a list of numbers (its coordinates in that multi-dimensional space).

EXAMPLE:

boy -> [ 0.21, -0.47, 1.32, .. ]
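
Here is a small Python sketch of what "close to each other" can mean. The three-number vectors are toy values: the ones for "boy" are the first three numbers from the example above, while the ones for "girl" and "glass" are invented for illustration. Cosine similarity is one common way to measure closeness and is simply assumed here.

```python
import math

# Hypothetical 3-dimensional vectors; real models learn hundreds or thousands of dimensions.
embeddings = {
    "boy": [0.21, -0.47, 1.32],    # first three numbers from the example above
    "girl": [0.25, -0.40, 1.28],   # made up to sit near "boy"
    "glass": [-1.10, 0.85, 0.05],  # made up to sit far from "boy"
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_boy_girl = cosine_similarity(embeddings["boy"], embeddings["girl"])
sim_boy_glass = cosine_similarity(embeddings["boy"], embeddings["glass"])

print("boy vs girl :", round(sim_boy_girl, 3))
print("boy vs glass:", round(sim_boy_glass, 3))
```

With these toy numbers, "boy" scores much higher against "girl" than against "glass", which is the kind of closeness the map of meaning is meant to capture.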

After this, every word carries a basic representation of its meaning. 

However, the model does not understand the sentence as a whole yet.


Step 3: Positioning

The model must also know the order of the words, so it adds position information to each one.

EXAMPLE:

The (1)    boy (2)    dropped (3)    the (4)    glass (5)    and (6)    it (7)    broke (8)
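
One way to picture this step is the sine-and-cosine scheme described in the 'Attention Is All You Need' paper. The sketch below is only an illustration under that assumption: many models use other position schemes, the `positional_encoding` helper and the tiny vector size of 4 are chosen for readability, and positions are counted from 0 rather than 1.

```python
import math


def positional_encoding(position, d_model):
    """Return a sinusoidal position vector for one 0-based position."""
    encoding = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share one frequency: sine on even, cosine on odd.
        angle = position / (10000 ** ((i - i % 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


tokens = "The boy dropped the glass and it broke".split()
d_model = 4  # tiny size assumed for readability

# One position vector per token, in sentence order.
position_vectors = [positional_encoding(pos, d_model) for pos in range(len(tokens))]

for token, vector in zip(tokens, position_vectors):
    print(token, [round(v, 3) for v in vector])
```

In that scheme, each position vector is added element by element to the word's meaning vector, so the same word at two different positions ends up with two slightly different vectors.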


Step 4: Attention Mechanism

At this stage, every word in the sentence already carries both its meaning and its position.

Now comes the attention mechanism, the key idea at the heart of the research paper 'Attention Is All You Need' (2017), which introduced the 'Transformer Architecture' used by modern AI models.

Prior to this, models used to process sentences sequentially, word by word, but the transformer architecture introduced a different approach.

Instead of reading the sentence one word at a time,

  • each word looks at every other word in the sentence 
  • determines which words are important for understanding its meaning 
  • refines its meaning using that information

This process is called self-attention, because the words in the sentence attend to one another to determine their meaning.

EXAMPLE:

The word "dropped" initially holds only its basic meaning, i.e. the action of letting something fall. Through the attention mechanism, it then looks at the other words in the sentence:

boy -> who did the dropping?

glass -> what was dropped?

and its meaning is refined from 'dropped' to 'boy dropped glass'.

This same process is repeated for every word and across multiple layers, so that with each layer the understanding improves and meaning starts to form.
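
The looking-and-refining idea can be sketched in a few lines of Python. This is a heavily simplified version of scaled dot-product self-attention: the three-number vectors are made up, and each word's vector is used directly as its query, key and value, whereas a real Transformer first passes it through learned weight matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified self-attention: each vector acts as its own query, key and value."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # How strongly does this word match every word in the sentence?
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        # Refined vector: a weighted blend of all the word vectors.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(d)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


tokens = ["boy", "dropped", "glass"]
vectors = [  # made-up vectors, for illustration only
    [1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
    [0.0, 1.0, 1.0],
]

outputs, weights = self_attention(vectors)
for token, row in zip(tokens, weights):
    print(token, "attends with weights", [round(w, 2) for w in row])
```

Each row of weights shows how much one word draws from every word in the sentence (including itself), and the refined vector is the blend those weights produce. Stacking this step in several layers is what lets the meaning build up gradually.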


So... what feels to us like understanding is in reality PREDICTION based on patterns learned from massive amounts of data.


And when billions of these predictions happen smoothly and accurately,

It feels like INTELLIGENCE...

But underneath, it's MATHEMATICS!!



