Can the origins of human morality help with the AI alignment problem?

A newborn baby, hours old, with no experience of society and exposure to only one person, its mother, already begins to develop its sense of right and wrong. Children aren’t born with morals; they internalize them through delicate and nuanced interactions with their parents and the world around them. We know we’re not born with morality because what counts as right and wrong changes depending on the era and society you grew up in. Burning witches used to be very right. Another sign is how predictably certain childhood experiences lead to anti-social behaviour.

The origins of morality are worth understanding now, as AI capabilities make leaps that surprise even people in the industry, so that we have a plan to align AI with us before it becomes independent of us.

The pressing question in the artificial intelligence field right now is whether we can imbue an AI with our morals, so that it understands humanity’s best interests and acts accordingly. The fate of humanity seems to hinge on this. Should we fail, humans could become as inconsequential to AI as ants are to us. We harbor no ill will towards ants, but that indifference doesn’t stop us from wiping out millions of them to construct a luxurious condominium.

“The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” – Eliezer Yudkowsky, AI Researcher

We don’t have a superintelligence to align yet, but we will soon. Based on current progress, it could be only five years away. This new sense of urgency has created two camps of people: 1) the “We’ll figure it out when we get there” camp, and 2) the “We won’t have time when it happens, so we’d better figure it out now” camp.

Camp 1's laid-back approach is somewhat understandable, if a little cavalier. Our sense of right and wrong develops so early in childhood that it feels as effortless as walking. No one remembers how hard walking was to learn, but that doesn’t mean it was easy. It was even harder to teach robots to walk: roboticists have been working on it since the 1970s, and only now are they getting somewhere. As it turns out, many aspects of human behavior are far more complex than we initially realize, and it's both logical and prudent to assume that morality falls into this category.

We might look to the origins of human morality as a starting point for imagining how to teach an AI. This post explains how human morality develops in childhood and why that process is unlikely to be reproducible with AI.

Origins of human morality

Let’s return to that child in its mother’s arms. In its early, unsocialized state it’s effectively an animal, and it behaves like one too. It’s possessed by urges to hit, grab, scream, and bite – quite hard, as any breastfeeding mom will tell you.

These animalistic urges scare the child because it believes the world is full of shrieking and biting creatures just like itself; we all project our internal fears onto the external world, even from an early age.

A good mother will not punish the child for biting, and whenever the child shows love, she reciprocates it. The child starts to associate its loving actions with good feelings and its instinctual urges with bad ones.

Of course, the child cannot control its instinctual urges well and still acts badly. Over time it gets better and better, with the mother’s calm reassurance providing the motivation. As its sense of control grows, the child begins to experience a new feeling for the first time: guilt, the feeling of having done the bad thing when it knew the good thing. This guilt is the seed of our internal moral compass.

We can see how the development of human morality is a complex, nuanced process that is deeply rooted in instincts and emotions, things machines don’t have.

(This mechanism was discovered by Donald Winnicott, one of the giants in the field of psychology who dedicated his life to understanding childhood development.)

Teaching morality to AI

It’s hard to imagine teaching morality to AI with this mechanism. Machines don’t have instincts or feel love for anyone or anything. They also don’t grow up or get nurtured except in a metaphorical way by teams of engineers.

Our current way of training AIs about right and wrong is RLHF (Reinforcement Learning from Human Feedback). We give the AI a task, see how it performs, and tell it whether it did right or wrong — a.k.a. we discipline it.
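To make that concrete, here is a deliberately toy sketch in Python of the kind of feedback loop RLHF relies on. This is not real training code, and every name in it (toy_model, human_feedback, bias) is invented for illustration; the point is only that the model learns “right” and “wrong” from scores an external judge hands out.

```python
import random

def toy_model(prompt, bias):
    """Hypothetical stand-in for an AI: picks an answer, nudged by `bias`."""
    return "helpful" if random.random() < bias else "harmful"

def human_feedback(answer):
    """The external judge: +1 if we liked the answer, -1 if we didn't."""
    return 1 if answer == "helpful" else -1

bias = 0.5  # the model starts with no preference either way
for step in range(1000):
    answer = toy_model("some task", bias)
    reward = human_feedback(answer)
    # Reinforce whatever the judge rewarded. Nothing here resembles guilt
    # or an internalized sense of right and wrong; remove the judge and
    # the nudging stops.
    bias = min(1.0, max(0.0, bias + 0.01 * reward))

print(f"Learned preference for 'helpful' answers: {bias:.2f}")
```

Even in this cartoon version, all the knowledge of what counts as good lives in the judge, not in the model.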

Discipline is the application of external rules, as opposed to morality, which is an internal guide. Discipline also only works under supervision and where there’s a power dynamic: strict parents, school, the military, etc. This won’t work with a superintelligent AI, since we won’t have power over it for long, and when it grows up and leaves the house it’ll no longer need to obey us. Without a moral compass, it will do whatever it wants, whether that’s good for humans or not.

There are currently no proposed methods for AI alignment that would give us any confidence that an AI was truly aligned rather than just telling us what we wanted to hear.

I wrote this post because everything I’ve read about training AI to align with human values has been akin to discipline, not to developing morality, which is fine until we build a system that can think for itself. I have little confidence we can translate the intricate process of moral development in humans to a machine that possesses none of the same qualities and experiences. Arguing that we can simply address the issue when it arises rests an existential risk on hopes and prayers.

We used the brain’s structure as inspiration for the current AI breakthroughs. Perhaps we can draw inspiration from the field of psychology for insights on the alignment problem.


🦾 This is the first of many posts I hope to write as I start collecting my thoughts on AI and what it means for product design and our larger world. My thinking is early and my ideas will surely change over time.


Related and worth your time

The Bull and the Bear. Back-to-back, these are a fascinating combination of interviews.


🥑 Subscribe to all my posts here.
