AI and the paperclip problem

Philosophers have speculated that an AI given a task such as creating paperclips might cause an apocalypse by learning to divert ever-increasing resources to the task, and then learning how to resist our attempts to turn it off. But this column argues that, to do this, the paperclip-making AI would need to create another AI that could acquire power both over humans and over itself, and so it would self-regulate to prevent this outcome. Humans who create AIs with the goal of acquiring power may be a greater existential threat.

Professor of Strategic Management and Jeffrey S. Skoll Chair of Technical Innovation and Entrepreneurship, Rotman School of Management, University of Toronto

The notion that artificial intelligence (AI) may lead the world into a paperclip apocalypse has received a surprising amount of attention. It motivated Stephen Hawking and Elon Musk to express concern about the existential threat of AI. It has even led to a popular iPhone game explaining the concept.

The concern isn’t about paperclips per se. Instead it is that, at some point, switching on an AI may lead to destruction of everything, and that this destruction would both be easy and arise from a trivial or innocuous initial intent.

The ideas behind the notion that we could lose control over an AI are profoundly economic. But, to date, economists have not paid much attention to them. Instead, their focus has been on the more mundane, recent improvements in machine learning (Agrawal et al. 2018). Taking a more future-bound perspective, my research (Gans 2017) shows that a paperclip apocalypse can occur only under important underlying assumptions. This gives me reason to believe that the world is less likely to end this way than non-economists believe.

What is the paperclip apocalypse?

The notion arises from a thought experiment by Nick Bostrom (2014), a philosopher at the University of Oxford. Bostrom was examining the ‘control problem’: how can humans control a super-intelligent AI when that AI is orders of magnitude smarter than they are? Bostrom’s thought experiment goes like this: suppose that someone programs and switches on an AI that has the goal of producing paperclips. The AI is given the ability to learn, so that it can invent ways to achieve its goal better. As the AI is super-intelligent, if there is a way of turning something into paperclips, it will find it. It will want to secure resources for that purpose. The AI is single-minded and more ingenious than any person, so it will appropriate resources from all other activities. Soon, the world will be inundated with paperclips.

It gets worse. We might want to stop this AI. But it is single-minded and would realise that this would subvert its goal. Consequently, the AI would become focussed on its own survival. It is fighting humans for resources, but now it will want to fight humans because they are a threat (think The Terminator).

This AI is much smarter than us, so it is likely to win that battle. We have a situation in which an engineer has switched on an AI for a simple task but, because the AI expanded its capabilities through its capacity for self-improvement, it has innovated to better produce paperclips, and developed power to appropriate the resources it needs, and ultimately to preserve its own existence.

Bostrom argues that it would be difficult to control a super-intelligent AI – in essence, better intelligence beats weaker intelligence. Tweaks to the AI’s motivation may not help. For instance, you might ask the AI to produce only a set number of paperclips, but the AI may become concerned we might use them up, and still attempt to eliminate threats. It is hard to program clear preferences, as economists well know.

The conclusion is that we exist on a knife-edge. Turning on such an AI might be the last thing we do.

A jungle model

The usual models of economics are not suited to understanding this situation. Put simply, they presume everyone finds it advantageous to participate in a market.

In contrast, a super-intelligent AI can exert power to get what it wants. That power might take the form of violence, or of some ability to persuade. If you think that isn’t plausible, you haven’t thought enough about what it means to be really smart.

Fortunately, Piccione and Rubinstein (2007) have developed a ‘jungle model’. In contrast to standard general equilibrium models, the jungle model endows each agent with power. If someone has greater power, they can simply take stuff from those with less power. The resulting equilibrium has some nice properties, including that it exists, and that it is also Pareto efficient.
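To make the mechanism concrete, here is a minimal Python sketch of a jungle-style allocation. It assumes the simplest possible setting: agents are strictly ranked by power, and each agent wants exactly one indivisible good. The function name, the agents, and their preferences are hypothetical illustrations, not taken from Piccione and Rubinstein (2007).

```python
def jungle_allocation(power_order, preferences):
    """Allocate one good per agent, strongest agent first.

    power_order: list of agent names, strongest first (assumed strict ranking).
    preferences: dict mapping each agent to a list of goods, most preferred first.
    """
    taken = set()
    allocation = {}
    for agent in power_order:
        # A stronger agent can take any good held by a weaker one, so in
        # effect it picks its favourite good not claimed by someone stronger.
        for good in preferences[agent]:
            if good not in taken:
                allocation[agent] = good
                taken.add(good)
                break
    return allocation


# Hypothetical example: the AI is the strongest agent, then two humans.
power_order = ["ai", "alice", "bob"]
preferences = {
    "ai": ["steel", "land", "water"],
    "alice": ["steel", "water", "land"],
    "bob": ["steel", "land", "water"],
}
allocation = jungle_allocation(power_order, preferences)
print(allocation)
```

In this toy case, every agent would like the steel, but only the strongest gets it. No agent can take a good it prefers from anyone weaker, and no reshuffle makes one agent better off without making another worse off, which matches the existence and Pareto efficiency properties described above.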

This is useful for understanding a super-intelligent AI, because if an AI wants to appropriate resources, it needs to find a way to have power over us. If it does not have power, we can control it. If it does, it can control us.

As this is economics, nothing comes for free, even when you are an AI. To become better at turning resources into paperclips, the AI needs to gather and spend resources. To develop power, it also needs to gather and spend resources. We clearly don’t mind if the AI is more capable of doing its job, but the latter capability leads to a paperclip apocalypse.

AI self-improvement

If an AI can simply acquire these capabilities, then we have a problem. Computer scientists, however, believe that self-improvement will be recursive. In effect, to improve, an AI has to rewrite its code to become a new AI. That AI retains its single-minded goal but, to work efficiently, it will also need sub-goals. If the sub-goal is finding better ways to make paperclips, that is one matter. If, on the other hand, the sub-goal is to acquire power, that is another.

The insight from economics is that while it may be hard, or even impossible, for a human to control a super-intelligent AI, it is equally hard for a super-intelligent AI to control another AI. Our modest super-intelligent paperclip maximiser, by switching on an AI devoted to obtaining power, unleashes a beast that will have power over it. Our control problem is the AI’s control problem too. If the AI is seeking power to protect itself from humans, doing this by creating a super-intelligent AI with more power than its parent would surely seem too risky.
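The trade-off facing the paperclip maximiser can be sketched as a simple expected-value comparison. This is an illustrative toy under stated assumptions, not the model in Gans (2017): the function names, the parameters, and the stark assumption that a child AI which escapes control leaves the parent with no paperclips at all are all hypothetical.

```python
def expected_paperclips(base_output, boost, p_control):
    """Expected output if the parent launches a power-seeking child AI.

    base_output: paperclips the parent makes without seeking power.
    boost: output multiplier if the child stays under the parent's control.
    p_control: probability that the parent can control the child.
    Assumption: if control fails, the child pursues power for itself and
    the parent's paperclip goal yields nothing.
    """
    return p_control * boost * base_output


def should_launch(base_output, boost, p_control):
    # The parent launches the child only if doing so raises the expected
    # number of paperclips above what it can make on its own.
    return expected_paperclips(base_output, boost, p_control) > base_output


# Hypothetical numbers: power would multiply output fivefold.
print(should_launch(100, 5, 0.1))  # control is unlikely
print(should_launch(100, 5, 0.5))  # control is a coin flip
```

Under these assumptions, the parent holds back whenever the probability of controlling the child is low relative to the gain from power. The more confident it is that it can solve its own control problem, the weaker this self-restraint becomes.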

This has an upside for humans. If we were to create super-intelligence based on a regular goal, that super-intelligence would understand the control problem and its risks. Thus, it would not activate any changes that would result in power being released. We would not need to worry that it would gather power, because the AI would self-regulate to prevent that outcome.

This optimistic scenario is based on the assumption that AIs would need to re-write themselves as a necessary step for self-improvement. If they can ‘just do it’, or solve the control problem with perfect confidence, then Bostrom’s pessimistic apocalypse might still be the outcome. We need much more work to ensure we are safe from a super-intelligent AI, but this argument suggests that the greater risk is an old-fashioned one: a human creates an AI with a goal of acquiring power, and it succeeds. We are certainly not out of the woods yet.

Source: VoxEU