Why We Should Name Our Models
Certain books, read at the right time, have the ability to completely change the path of your life. For me, Asimov’s Foundation was this book. His universe instilled a love of technology that has influenced my interests more strongly than any other.
Asimov’s stories are a centerpiece of science fiction, inspiring a generation of scientists and leaders. Many influential thinkers have attributed their interest in science to Asimov, including Elon Musk, Paul Krugman, and Peter Thiel.
Asimov’s stories are important because they depict a world similar to our own. The genre is “sci-fi”, but the stories revolve around human characters and their struggle to deal with the reality of technological forces.
Asimov coined the term ‘robotics’ in the early 1940s and made these mechanical creatures central fixtures of his stories. As I interact with GPT-3, the language model I discussed last week, I’m reminded more and more of Asimov’s universe and the robots within it.
When Asimov was writing, robots were an interesting plot device used to examine the shortcomings of their human creators. Today, we deal with robots in our everyday lives, from our smartphones to our vehicles, and the reality is much stranger.
The Complete Robot is a series of stories that share a common theme of understanding and diagnosing robot psychology. The engineers of Asimov’s robots were careful to instill the Three Laws of Robotics in every machine:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.
These laws drive the plot of The Complete Robot. Characters apply these laws and spend their time dealing with the loopholes that arise. Interestingly, Asimov also anticipated the naming conventions we use for models today:
- In Runaround (1942), the robot is named SPD-13, or “Speedy”
- In Reason (1941), the robot is named QT-1, or “Cutie”
- In Liar! (1941), the robot is named RB-34, or “Herbie”
In each story, the robot is caught in a situation where the laws are ambiguous:
- Runaround describes what happens when Speedy attempts to satisfy the Second Law and Third Law simultaneously.
- Reason is about Cutie’s interpretation of the First Law based on a priori reasoning.
- Liar! focuses on the interpretation of “harm” when Herbie considers emotional harm as well as physical harm as part of the First Law.
Each situation is resolved through the careful deduction of human operators, but a clear message emerges: even with well-defined laws, there will be situations that require interpretation.
Unlike Asimov’s universe, our robots don’t have clear laws. We create our machines not through the careful wiring of a positronic brain, but through a messy training process where we feed in billions of lines of text from the internet and hope for the best. This complicates things: not only do we not have the well-structured laws of Asimov’s robots, we barely understand how our models are making decisions.
Modern neural networks display strengths, weaknesses, and quirks rivaling those of anything described in Asimov’s stories. As the techniques for training on extensive datasets become more accessible, the differences in how our models behave will become more apparent.
Different training methods result in models with different behaviors. A neural network trained on the Common Crawl web corpus will behave very differently than one trained on a catalog of video games from the 1980s.
Like the robots in Asimov’s novels, each model we create will have a unique personality depending on how it was trained. Perhaps there is a lesson we can learn from Asimov’s characters as well: the personification of models helps make their behavior more interpretable.
Asimov’s characters took it upon themselves to name the machines based on their behaviors. Speedy, Cutie, and Herbie were the shorthand names assigned by human operators. These names allowed the operators to think of the robots not as infallible machines, but as imperfect creatures with flaws not unlike our own.
Models like BERT and ELMo have already earned names based on their design and capabilities. Until recently, these capabilities were only measurable through strict quantitative benchmarks and scientific papers. With the rise of general-purpose language models like GPT-3, the average consumer will experience qualitative differences in what a neural network can do. People already rely on language models for tasks like speech recognition, but the next few years will bring new models that make AI more conversational.
As models begin to cross the uncanny valley, human nature will compel us to give them names and personalities that fit their behavior. Instead of lengthy explanations, humans rely on names as a kind of shorthand that bundles technical, cultural, and emotional associations about a topic. Names provide an abstraction that allows us to interact with the world in a more meaningful way.
I believe that we should embrace this personification. The sooner we start anthropomorphizing AI, the sooner we’ll be prepared to think about the best way to harness these systems and all of their quirks.
It sounds intimidating to talk about a “Generative Pre-trained Transformer” or “Bidirectional Encoder Representations from Transformers”, but the names GPT and BERT are accessible to non-technical users. This simple change allows a broader set of people to participate in the discussion around AI.
Naming models also forces us to take into account ethical considerations that may otherwise go unquestioned. Giving a machine a name helps users and policymakers alike to understand how we trained it and what we expect it to do.
The more I see how data shapes our models’ behaviors, the more I see the value in familiarizing everyone with the general trends of artificial intelligence. Names make this introduction easier by communicating the topic in a way that people are already familiar with.
As machine learning becomes more widely available, we have a responsibility to involve non-technical people in decisions about how it should be used. In the same way that Asimov used his stories to introduce his readers to artificial intelligence, I hope that we can use names to introduce language models to the broader public.
Bonus: Next week, I’ll be announcing a new competition where you can submit your own prompts to the GPT-3 model. This will be a great opportunity to learn more about how the model works, improve your predictive modeling skills, and even have the chance to get feedback on your results.
If you want a chance to try it out before anyone else, reply to this email. I’ll feature the best results in GPT Stories, a newsletter written by machines.
Sunday Scaries is a newsletter that answers simple questions with surprising answers. The author of this publication is currently living out of his car and traveling across the United States.
If you enjoyed this issue of Sunday Scaries, please consider sharing it with a friend. The best way to help support this publication is to spread the word.