What's in a name? Exploring name embeddings

Generate names from descriptions with real-world analogies!
(Click here for demo powered by LLMs and LangChain.)

What does a “Steve” look like? How about a “Lily”? As computer scientists and expecting parents know, naming things is hard. Humans might choose names by their associations with things like stories, famous people, or feelings. But can computers do the same?

Yes, we can teach a computer to understand analogies, using word embeddings! Word embeddings are a concept in machine learning (ML) and natural language processing (NLP) that lets us represent each word as a series of numbers (a vector). This puts the words into an embedding space in which we can do math to find words that have similar or opposite meanings. For example, the paper “Efficient Estimation of Word Representations in Vector Space” (Mikolov et al. 2013, https://arxiv.org/abs/1301.3781) showed that in an embedding space, subtracting the vector for “Man” from “King” and adding “Woman” gives a vector that is closest to “Queen”.
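To make the “King - Man + Woman” arithmetic concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented purely for illustration (real embeddings are learned from text and have hundreds of dimensions), and the helper names `cosine_similarity` and `analogy` are my own, not from the paper.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real word embeddings are learned from large amounts of text.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, embeddings):
    """Return the word whose vector is closest to a - b + c, skipping the inputs."""
    target = [
        x - y + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [word for word in embeddings if word not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


result = analogy("king", "man", "woman", EMBEDDINGS)
print(result)
```

With these made-up vectors the nearest remaining word is “queen”. Excluding the input words from the candidates is a common convention, since the nearest vector would otherwise often be one of the inputs.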

I've always wanted to embed names into an embedding space to do similar math with names. That way, we could use a computer to find names associated with a description that we provide. In fact, large language models (LLMs) like ChatGPT make this easy because they can make embeddings not only from words but also from entire sentences, paragraphs, or even articles. LLMs learn to embed text from the patterns of related words in a huge number of other texts, such as Wikipedia articles. To create name embeddings, we can do something similar by embedding articles that are related to a name, and combining them into a single vector to represent that name.
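As a rough sketch of that last step, the code below embeds a few texts per name, averages them into one vector per name, and then ranks names against a description by cosine similarity. Everything here is an assumption for illustration: `embed_text` is a toy word-count stand-in for a real LLM embedding call, averaging is just one simple way to combine vectors, and the names and texts are made up rather than taken from Wikipedia.

```python
import math
from collections import Counter

# Hypothetical vocabulary for the toy embedder below.
VOCABULARY = ["sea", "dolphins", "comic", "book", "draws"]

# Made-up texts standing in for articles related to each name.
ARTICLES = {
    "Marina": ["a town by the sea", "she studies dolphins in the sea"],
    "Stan": ["he draws comic book covers", "a comic book writer"],
}


def embed_text(text, vocabulary=VOCABULARY):
    """Toy stand-in for an LLM embedding: counts of each vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def mean_vector(vectors):
    """Combine several vectors into one by averaging each component."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A text with no known words matches nothing.
    return dot / (norm_a * norm_b)


# One vector per name: the average of that name's article embeddings.
name_embeddings = {
    name: mean_vector([embed_text(text) for text in texts])
    for name, texts in ARTICLES.items()
}


def rank_names(description):
    """Return names sorted from most to least similar to the description."""
    query = embed_text(description)
    return sorted(
        name_embeddings,
        key=lambda name: cosine_similarity(query, name_embeddings[name]),
        reverse=True,
    )


print(rank_names("lover of dolphins and sea life"))
print(rank_names("draws comic book covers"))
```

Swapping `embed_text` for a real embedding model would leave the rest of the sketch unchanged, since the averaging and ranking only need vectors of equal length.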

To demonstrate this idea, I've put together a demo using LangChain which you can run here: https://colab.research.google.com/gist/devinkwok/6a9c60e82df79e94b73010d77c59bedb/whatsinaname.ipynb (note: requires a Google account to run). The demo embeds only 1000 popular first names using information from Wikipedia article summaries, but can already find some interesting (and occasionally surprising!) analogies between names and descriptions. For example, the computer suggests the name “Ariella” (which is a genus of sea snails) for the description “Lover of dolphins and sea life.” Unsurprisingly, “Draws comic book covers” gives both the names of comic book creators (Lilah) and characters from comic books (Grayson). The demo also distinguishes between masculine and feminine names when a word such as “Woman” or “Male” is added to the description, albeit inconsistently (perhaps because shorter summaries lack information such as pronouns).

A fun application of this project might be to investigate nominative determinism, which is the hypothesis that people tend to work in fields that are related to their names. If this hypothesis is true for a particular trait (e.g. a person's gender or job), some name embeddings should be much more similar to that trait than others. On the other hand, names should seem randomly distributed if this hypothesis is false. Either way, it would be interesting to rank traits by how much they bias towards certain names!
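One way to sketch that ranking: for each trait, measure how much its similarity varies across names, then sort traits by that spread. Using variance as the measure is my own assumption, not an established test, and the vectors and labels below are invented placeholders; a real analysis would also need a baseline for how much spread to expect by chance.

```python
import math
import statistics

# Invented placeholder vectors; real ones would come from an embedding model.
NAME_EMBEDDINGS = {
    "name_a": [1.0, 0.0, 0.5],
    "name_b": [0.0, 1.0, 0.5],
    "name_c": [0.5, 0.5, 0.5],
}
TRAIT_EMBEDDINGS = {
    "trait_x": [1.0, 0.0, 0.0],  # much closer to some names than others
    "trait_y": [0.0, 0.0, 1.0],  # roughly equally close to every name
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def trait_spread(trait_vector, name_embeddings):
    """Variance of the trait's similarity to each name (higher = more uneven)."""
    similarities = [
        cosine_similarity(trait_vector, vector)
        for vector in name_embeddings.values()
    ]
    return statistics.pvariance(similarities)


# Traits sorted from most to least unevenly distributed across names.
ranking = sorted(
    TRAIT_EMBEDDINGS,
    key=lambda trait: trait_spread(TRAIT_EMBEDDINGS[trait], NAME_EMBEDDINGS),
    reverse=True,
)
print(ranking)
```

Here `trait_x` ranks first because it sits close to one name and far from another, while `trait_y` is about equally similar to all three.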

2023 October 23