Vector Mapping

What if we could teach a computer what words mean? By turning words into numbers, AI can measure meaning — and even do math with it.

The Core Idea

Words as Numbers

To a computer, the word "cat" is just a string of characters. It doesn't know that cats are animals, that they're similar to dogs, or that they're very different from trucks.

The breakthrough: give every word a position in space. Each word becomes a list of numbers — a vector — that represents its meaning. Words with similar meanings land near each other. "Cat" is close to "dog" but far from "truck."

These positions are called embeddings. They're learned by reading billions of sentences and noticing which words appear in similar contexts. As the linguist J.R. Firth put it, "You shall know a word by the company it keeps."
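In code, the idea reduces to storing a vector per word and comparing distances. A minimal sketch with made-up 2D vectors (real embeddings are learned from text and have hundreds of dimensions; these numbers are invented for illustration):

```python
import numpy as np

# Toy 2D "embeddings" -- hypothetical values chosen so that
# similar words sit near each other, as in the map below.
embeddings = {
    "cat":   np.array([0.9, 0.8]),
    "dog":   np.array([0.8, 0.9]),
    "truck": np.array([-0.7, 0.1]),
}

def nearest(word):
    """Return the other word whose vector is closest (Euclidean distance)."""
    return min(
        (w for w in embeddings if w != word),
        key=lambda w: np.linalg.norm(embeddings[w] - embeddings[word]),
    )

print(nearest("cat"))  # "dog": far closer to "cat" than "truck" is
```

"Nearest neighbor" is exactly what the interactive map below computes when you click a word, just over more words and more dimensions.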

Interactive

2D Word Map

Below is a simplified 2D embedding space with ~35 words. Click any word to highlight its nearest neighbors. Notice how words cluster by meaning: animals together, colors together, countries together.

[Interactive: Word Embedding Space (click a word to explore)]
The Famous Example

Vector Arithmetic

Here's where it gets magical. Since words are positions in space, you can do math with them. The most famous example:

king − man + woman ≈ queen

Take the "king" vector, subtract the "man" direction, add the "woman" direction, and you land near "queen." The math captures the relationship between gender and royalty. Pick an equation below and watch the arrows.
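The analogy can be reproduced with a few hand-made vectors. In this sketch, one coordinate loosely stands for "royalty" and the other for "gender"; the numbers are invented so the arithmetic is easy to follow, not taken from any real model:

```python
import numpy as np

# Hypothetical 2D vectors: axis 0 ~ royalty, axis 1 ~ gender.
vecs = {
    "king":  np.array([0.9, 0.9]),   # royal, male
    "queen": np.array([0.9, 0.1]),   # royal, female
    "man":   np.array([0.1, 0.9]),   # common, male
    "woman": np.array([0.1, 0.1]),   # common, female
}

# king - man + woman: remove the "male" direction, add the "female" one.
target = vecs["king"] - vecs["man"] + vecs["woman"]   # [0.9, 0.1]

# The word whose vector lands nearest the result is the answer.
best = min(vecs, key=lambda w: np.linalg.norm(vecs[w] - target))
print(best)  # queen
```

Real embedding libraries do the same thing, except the nearest-neighbor search runs over a vocabulary of hundreds of thousands of words and usually excludes the input words from the candidates.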

Measuring Meaning

Cosine Similarity

How do we measure if two words are similar? We look at the angle between their vectors. If they point in the same direction, they're similar (angle near 0°, similarity near 1.0). If they're perpendicular, they're unrelated (angle near 90°, similarity near 0). If they point opposite ways, they're opposites (angle near 180°, similarity near −1.0).
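The formula behind this is just the dot product divided by the two vectors' lengths. A minimal implementation, checked against the three cases above:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: (a . b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Same direction -> 1.0 (length doesn't matter, only direction).
same = cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0]))

# Perpendicular -> 0.0 (unrelated).
perp = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0]))

# Opposite directions -> -1.0.
opposite = cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))

print(same, perp, opposite)  # 1.0 0.0 -1.0
```

Note that cosine similarity ignores vector length entirely: [1, 0] and [2, 0] score a perfect 1.0, which is why it's preferred over raw distance when vectors vary in magnitude.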

Drag the endpoints of the two vectors below and watch the similarity update in real time.

[Interactive: Cosine Similarity Explorer (drag the vector tips; the similarity score updates live)]
The Big Picture

Why It Matters

Embeddings are the foundation of modern AI's understanding of language. When you search Google, it converts your query into a vector and finds pages with similar vectors — even if they don't share exact words. When Netflix recommends a movie, it's comparing embedding vectors.
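That search pipeline can be sketched in a few lines: embed each document, embed the query, rank by cosine similarity. The vectors here are hypothetical stand-ins for what an embedding model would produce; the document titles are examples only:

```python
import numpy as np

# Made-up 3D document embeddings (a real system would get these
# from an embedding model, not write them by hand).
docs = {
    "tire repair guide":      np.array([0.9, 0.1, 0.0]),
    "baking sourdough bread": np.array([0.0, 0.9, 0.2]),
    "bicycle maintenance":    np.array([0.7, 0.0, 0.3]),
}
query = np.array([0.8, 0.0, 0.1])  # stands in for "how to fix a flat"

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query vector.
ranked = sorted(docs, key=lambda d: cos(query, docs[d]), reverse=True)
print(ranked[0])  # "tire repair guide" scores highest
```

No keyword from the query appears in the top result; the match comes entirely from the vectors being close, which is the whole point of semantic search.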

Large language models like GPT and Claude use embeddings as their first step: every word goes in as a vector, gets transformed through many layers, and comes out as a prediction of what word comes next. The entire magic of AI conversation starts with turning words into points in space.

Embeddings

Dense vector representations of words learned from context. Similar meanings → nearby vectors. The backbone of modern NLP.

Semantic Search

Search by meaning, not keywords. "How to fix a flat" matches "tire repair guide" because their embeddings are close.

Transformers

The architecture behind GPT, Claude, and BERT. Takes embeddings and lets words "attend" to each other across a sentence.

Dimensionality

Real embeddings use 300–1536 dimensions, not 2. More dimensions capture more nuance. Our 2D map is a simplification.