The danger of having artificially intelligent machines do our bidding is that we might not be careful enough about what we wish for. The lines of code that animate these machines will inevitably lack nuance and omit caveats, and as a result AI systems will end up with goals and incentives that do not match our true preferences.
A now-classic thought experiment illustrating this problem was posed in 2003 by the Oxford philosopher Nick Bostrom. He imagined a superintelligent robot programmed with the seemingly harmless goal of manufacturing paper clips. The robot eventually turns the whole world into a gigantic paper clip factory.
Such a scenario is often dismissed as academic, a worry that might arise only in the distant future. But misaligned AI has become a problem much sooner than expected.
Take the most alarming example, one that affects billions of people. YouTube, aiming to maximize viewing time, uses AI-based content recommendation algorithms. Two years ago, computer scientists and users began to notice that YouTube’s algorithm seemed to achieve its goal by recommending increasingly extreme and conspiratorial content. One researcher reported that after she watched footage of Donald Trump campaign rallies, YouTube offered her videos featuring “white supremacist rants, Holocaust denials and other disturbing content.” The algorithm’s habit of upping the ante went beyond politics, she said: “Videos about vegetarianism led to videos about veganism. Videos about jogging led to videos about running ultramarathons.” As a result, research suggests, YouTube’s algorithm has been helping to polarize and radicalize people and to spread misinformation, just to keep us watching. “If I were planning things out, I probably would not have made that the first test case of how we’re going to roll out this technology at a massive scale,” said Dylan Hadfield-Menell, an AI researcher at the University of California, Berkeley.
YouTube’s engineers probably did not intend to radicalize humanity. But coders cannot possibly think of everything. “The current way we do AI puts a lot of burden on the designers to understand what the consequences of the incentives they give their systems are,” said Hadfield-Menell. “And one of the things we’re learning is that a lot of engineers have made mistakes.”
A major aspect of the problem is that humans often do not know what goals to give AI systems, because we do not know what we really want. “If you ask anyone on the street, ‘What do you want your autonomous car to do?’ they would say, ‘Collision avoidance,’” said Dorsa Sadigh, an AI scientist at Stanford University who specializes in human-robot interaction. “But you realize that’s not just it; there are a bunch of preferences that people have.” Super-safe self-driving cars drive too slowly and brake so often that passengers feel sick. When programmers try to list all the goals and preferences that a robotic car must juggle at once, the list inevitably turns out to be incomplete. Sadigh said that when driving in San Francisco, she has often gotten stuck behind a self-driving car stalled in the street. It is carefully avoiding contact with a moving object, as its program requires, but the object is something like a plastic bag blowing in the wind.
To avoid such pitfalls and lay a theoretical foundation for solving the AI alignment problem, researchers have begun to develop an entirely new method of programming beneficial machines. The approach is most closely associated with the ideas and research of Stuart Russell, a 57-year-old, much-decorated computer scientist at Berkeley. In the 1980s and 1990s, Russell became known for pioneering research on rationality, decision-making and machine learning. He is the lead author of the widely used textbook Artificial Intelligence: A Modern Approach. Over the past five years, he has become a particularly influential voice on the alignment problem. A reserved, well-spoken Briton in a black suit, he is a regular at international meetings and panels on the risks and long-term governance of AI.
According to Russell, today’s goal-oriented AI is ultimately limited, for all its success at specific tasks such as beating humans at Jeopardy! and Go, identifying objects in images and words in speech, and even composing music and prose. Russell argues that asking a machine to optimize a “reward function” (a meticulous description of some combination of goals) inevitably leads to misaligned AI, because it is impossible to include and correctly weight all goals, subgoals, exceptions and caveats, or even to know which ones are right. As free-roaming, “autonomous” robots become more intelligent, giving them goals will become more dangerous, because the robots will pursue their reward function ruthlessly and will try to stop us from switching them off.
According to the new approach, instead of pursuing goals of their own, machines should seek to satisfy human preferences; their only goal should be to learn more about what those preferences are. Russell argues that uncertainty about our preferences, and the need to look to us for guidance, will keep AI systems safe. In his recent book Human Compatible, Russell lays out his thesis in the form of three “principles of beneficial machines” that echo the three laws of robotics Isaac Asimov formulated in 1942, but with less naivety. Russell’s version is:
- The machine’s only objective is to maximize the realization of human preferences.
- The machine is initially uncertain about what those preferences are.
- The ultimate source of information about human preferences is human behavior.
In recent years, Russell and his team at Berkeley, along with like-minded groups at Stanford, the University of Texas and elsewhere, have been developing innovative ways to clue AI systems in to our preferences without ever having to specify those preferences explicitly.
These labs are teaching robots to learn the preferences of people who have never articulated them and may not even be sure what they want. The robots can learn our desires by watching imperfect demonstrations, and can even invent new behaviors that help resolve human ambiguity. (At four-way stop signs, for example, self-driving cars developed the habit of backing up a bit to signal human drivers to go ahead.) These results suggest that AI might be surprisingly good at inferring our mindsets and preferences, even when we are working them out on the fly.
“These are first attempts at formalizing the problem,” Sadigh said. “It’s just recently that people are realizing we need to look at human-robot interaction more carefully.”
Whether these nascent efforts and Russell’s three principles of beneficial machines really herald a bright future for AI remains unclear. The approach pins the success of robots on their ability to understand what humans really, truly prefer, something our species has been trying to figure out for quite some time. At a minimum, said Paul Christiano, an alignment researcher at OpenAI, Russell and his team have greatly clarified the problem and helped “spec out what the desired behavior is like, what it is that we’re aiming at.”
How to Understand a Human
Russell’s main thesis came to him as an epiphany, that sublime act of intelligence. It was 2014, and he was in Paris on sabbatical from Berkeley, heading to a rehearsal of a choir he had joined as a tenor. “Because I’m not a very good musician, I was always having to learn my music on the metro on the way to rehearsal,” he recalled recently. Samuel Barber’s 1967 choral arrangement Agnus Dei filled his headphones as he sped beneath the City of Light. “It was such a beautiful piece of music,” Russell said. “It just sprang into my mind that what matters, and therefore what the purpose of AI was, was in some sense the aggregate quality of human experience.”
Robots, he realized, should not pursue goals such as maximizing viewing time or paper clips; they should simply try to improve our lives. Only one question remained: “If the obligation of machines is to try to optimize that aggregate quality of human experience, how on earth would they know what that was?”
The roots of Russell’s thinking go back much further than that Paris commute. He has studied AI since his school days in London in the 1970s, when he programmed tic-tac-toe and chess-playing algorithms on a nearby college’s computer. Later, after moving to the AI-friendly San Francisco Bay Area, he began theorizing about rational decision-making. He soon concluded that it is impossible. Humans are not even remotely rational, because being rational is not computationally feasible: we cannot calculate which action at any given moment will lead to the best outcome trillions of actions later in our distant future, and neither can an AI. Russell theorized that our decision-making is hierarchical: we crudely approximate rationality by pursuing vague long-term goals through medium-term goals while paying the most attention to our immediate circumstances. Robotic agents, he decided, would need to do something similar, or at least understand how we operate.
Russell’s Paris epiphany came at a pivotal time for artificial intelligence research. A few months earlier, an artificial neural network using a well-known approach called reinforcement learning had shocked scientists by quickly learning from scratch how to play and win Atari video games, even inventing new tricks along the way. In reinforcement learning, an AI learns to optimize its reward function, such as its score in a game; as it tries out various behaviors, the ones that increase the reward get reinforced and become more likely to occur in the future.
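To make the idea of behaviors "getting reinforced" concrete, here is a minimal sketch, not the Atari system itself. It assumes a toy game with three possible actions whose average scores are invented for illustration, and it uses a simple textbook-style preference update; the function names `softmax` and `train` are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(mean_rewards, steps=2000, step_size=0.1, seed=0):
    """Hypothetical toy learner: returns the final action probabilities."""
    rng = random.Random(seed)
    prefs = [0.0] * len(mean_rewards)
    baseline = 0.0  # running average of all rewards seen so far
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, 0.5)  # noisy score
        baseline += (reward - baseline) / t
        advantage = reward - baseline
        for a in range(len(prefs)):
            chosen = 1.0 if a == action else 0.0
            # A better-than-average reward reinforces the chosen action and
            # makes the alternatives less likely; a worse one does the reverse.
            prefs[a] += step_size * advantage * (chosen - probs[a])
    return softmax(prefs)


# Assumed average scores for three made-up actions; the third pays best.
final_probs = train([0.0, 1.0, 3.0])
```

After training, the learner should favor the highest-scoring action. Note that the reward here is fixed by the programmer in advance, which is exactly the design choice Russell questions.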
Back in 1998, Russell had developed the inverse of this approach, which he then continued to refine with his collaborator Andrew Ng. An “inverse reinforcement learning” system does not try to optimize an encoded reward function, as in reinforcement learning; instead, it tries to learn what reward function a human is optimizing. Whereas a reinforcement learning system works out the best actions for achieving a goal, an inverse reinforcement learning system, given a set of actions, deciphers the underlying goal.
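The reversal can be illustrated with a small sketch. This is not Russell and Ng's algorithm: it assumes a one-step choice rather than a full sequential task, a person who usually but not always picks the higher-reward option, and two invented reward hypotheses. All names and numbers are hypothetical.

```python
import math


def action_likelihood(action, rewards, rationality=2.0):
    """P(action | rewards) for a noisily rational chooser (softmax over rewards)."""
    weights = {a: math.exp(rationality * r) for a, r in rewards.items()}
    return weights[action] / sum(weights.values())


def infer_reward(observed_actions, hypotheses):
    """Posterior over named reward hypotheses, starting from a uniform prior."""
    posterior = {name: 1.0 / len(hypotheses) for name in hypotheses}
    for action in observed_actions:
        for name, rewards in hypotheses.items():
            posterior[name] *= action_likelihood(action, rewards)
        total = sum(posterior.values())
        posterior = {name: p / total for name, p in posterior.items()}
    return posterior


# Two made-up guesses about what a driver values.
hypotheses = {
    "values_speed": {"highway": 1.0, "scenic_route": 0.0},
    "values_scenery": {"highway": 0.0, "scenic_route": 1.0},
}
# Observed behavior: mostly the scenic route, with one exception.
observed = ["scenic_route", "scenic_route", "highway", "scenic_route"]
posterior = infer_reward(observed, hypotheses)
```

Given only the actions, the sketch ends up putting most of its belief on the hypothesis that the driver values scenery, while the single highway trip keeps it from becoming fully certain.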
A few months after his Agnus Dei-inspired epiphany, at a meeting on AI governance at the German foreign ministry, Russell got to talking about inverse reinforcement learning with Nick Bostrom, of paper clip fame. “That was where the two things came together,” Russell said. On the metro, he had understood that machines should strive to optimize the aggregate quality of human experience. Now he realized that if they are uncertain about how to do that, if computers do not know what humans prefer, “they could do some kind of inverse reinforcement learning to learn more.”
In standard inverse reinforcement learning, a machine tries to learn the reward function that a human is pursuing. But in real life, we might be willing to actively help it learn about us. Back at Berkeley after his sabbatical, Russell began working with his collaborators on a new, “cooperative” version of inverse reinforcement learning, in which a robot and a human work together to learn the human’s true preferences in various “assistance games.” These games are abstract scenarios that represent real-world situations in which one must act with partial knowledge.
One of the games the researchers developed, known as the off-switch game, addresses one of the most obvious ways an autonomous robot can become misaligned with our true preferences: by disabling its own off switch. In a BBC radio lecture in 1951 (a year after he published a pioneering paper on AI), Alan Turing suggested that it might be possible to “keep the machines in a subservient position, for instance by turning off the power at strategic moments.” Researchers now consider that too simplistic. What is to stop an AI from disabling its own off switch or, more generally, from ignoring commands to stop maximizing its reward function? In Human Compatible, Russell writes that the off-switch problem is “the core of the problem of control for intelligent systems. If we cannot switch a machine off because it won’t let us, we’re really in trouble. If we can, then we may be able to control it in other ways too.”
The key to solving this problem may be uncertainty about our preferences. This was demonstrated by the off-switch game, a formal model of the problem involving Harriet the human and Robbie the robot. Robbie is deciding whether to act on Harriet’s behalf (say, to book her a nice but expensive hotel room) but is uncertain about what she prefers. Robbie estimates that the payoff for Harriet could be anywhere from -40 to +60, with an average of +10 (Robbie thinks she will probably like the fancy room but is not sure). Doing nothing has a payoff of 0. But there is a third option: Robbie can ask Harriet whether she wants it to proceed or prefers to switch it off, that is, to take Robbie out of the hotel-booking decision. If she lets the robot proceed, the average expected payoff to Harriet becomes greater than +10. So Robbie decides to consult Harriet and, if she so wishes, let her switch it off.
Russell and his collaborators proved that in general, unless Robbie is completely certain about what Harriet herself would do, it will prefer to let her decide. “It turns out that uncertainty about the objective is essential for ensuring that we can switch the machine off,” Russell writes in Human Compatible, “even when it’s more intelligent than us.”
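The arithmetic behind the off-switch game can be checked with a short sketch. It rests on assumptions that go beyond the text: Robbie's belief about Harriet's payoff is taken to be uniform between -40 and +60 (an average of +10), and Harriet is assumed to know her own payoff and to let Robbie proceed only when it is positive. The function name `expected_payoffs` is hypothetical.

```python
def expected_payoffs(low, high):
    """Expected payoff of each of Robbie's options.

    Assumes Harriet's payoff U is uniform on [low, high] and that Harriet,
    if consulted, allows the action only when U is positive.
    """
    act_now = (low + high) / 2.0  # book the room without asking
    switch_off = 0.0              # do nothing
    if high <= 0:
        defer = 0.0               # Harriet would always say no
    elif low >= 0:
        defer = act_now           # Harriet would always say yes
    else:
        # E[max(U, 0)]: negative outcomes are replaced by 0.
        defer = (high ** 2) / (2.0 * (high - low))
    return {"act": act_now, "off": switch_off, "defer": defer}


uncertain = expected_payoffs(-40, 60)  # act: 10.0, off: 0.0, defer: 18.0
certain = expected_payoffs(10, 10)     # act: 10.0, off: 0.0, defer: 10.0
```

Under these assumptions, deferring to Harriet is worth +18 rather than +10, because she screens out the cases where the booking would hurt her. When Robbie is already certain of the payoff, asking gains nothing, which matches the general result: the incentive to leave the off switch alone comes from the uncertainty.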
These and other partial-knowledge scenarios were developed as abstract games, but Scott Niekum’s lab at the University of Texas at Austin runs preference-learning algorithms on real robots. When Gemini, the lab’s two-armed robot, watches a person place a fork to the left of a plate during a table-setting demonstration, it cannot tell at first whether forks always go to the left of plates or always go on that particular spot on the table. New algorithms allow the robot to learn the pattern after only a few demonstrations. Niekum focuses on getting AI systems to quantify their own uncertainty about human preferences, so that they can gauge whether they know enough to act safely. “We are reasoning very directly about distributions of goals in the person’s head that could be true,” he said. “And we’re reasoning about risk with respect to that distribution.”
Recently, Niekum and his collaborators found an efficient algorithm that allows robots to learn to perform tasks far better than their human demonstrators. It can be computationally demanding for a robotic vehicle to learn driving maneuvers simply by watching demonstrations by human drivers. But Niekum and his colleagues found that they could improve and dramatically speed up learning by showing the robot demonstrations ranked according to how well the human performed. “The agent can look at that ranking and say, ‘If that’s the ranking, what explains the ranking?’” Niekum said. “What’s happening more often as the demonstrations get better, what happens less often?” The latest version of the learning algorithm, called Bayesian T-REX (for “trajectory-ranked reward extrapolation”), finds patterns in the ranked demos that reveal possible reward functions that humans might be optimizing for. The algorithm also gauges the relative likelihood of different reward functions. A robot running Bayesian T-REX can efficiently infer the most likely rules of table setting or the objective of an Atari game, Niekum said, “even if it never saw the perfect demonstration.”
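The idea of asking "what explains the ranking?" can be sketched in a few lines. This is a heavily simplified illustration, not the lab's implementation: it assumes each demonstration is summarized by two hand-picked numbers (forks placed left of the plate, items dropped), that a reward hypothesis is a pair of weights on those numbers, and that only three invented hypotheses are in play. It scores each hypothesis by how well it predicts the ranking pair by pair, in the spirit of a pairwise preference likelihood.

```python
import math


def demo_return(weights, features):
    """Total reward of one demonstration under a linear reward hypothesis."""
    return sum(w * f for w, f in zip(weights, features))


def ranking_log_likelihood(weights, ranked_demos, beta=1.0):
    """Log-probability of the ranking; demos are ordered worst to best."""
    returns = [demo_return(weights, d) for d in ranked_demos]
    total = 0.0
    for i in range(len(returns)):
        for j in range(i + 1, len(returns)):
            # Log-probability that the better-ranked demo j beats demo i.
            worse, better = beta * returns[i], beta * returns[j]
            total += better - math.log(math.exp(worse) + math.exp(better))
    return total


def posterior_over_rewards(hypotheses, ranked_demos):
    """Normalized posterior over named hypotheses under a uniform prior."""
    scores = {name: math.exp(ranking_log_likelihood(w, ranked_demos))
              for name, w in hypotheses.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


# Made-up demos as (forks placed left of plate, items dropped), worst to best.
ranked_demos = [(0, 1), (2, 1), (4, 0)]
# Made-up reward hypotheses as weights on those two features.
hypotheses = {
    "forks_go_left": (1.0, 0.0),
    "forks_go_right": (-1.0, 0.0),
    "only_avoid_drops": (0.0, -1.0),
}
posterior = posterior_over_rewards(hypotheses, ranked_demos)
```

In this toy setup the ranking is best explained by the hypothesis that forks belong on the left, with some belief left over for "just avoid dropping things". Keeping that leftover belief, rather than committing to a single answer, is what lets a robot judge whether it knows enough to act.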
Our Imperfect Choices
Russell’s ideas are “making their way into the minds of the AI community,” said Yoshua Bengio, scientific director of Mila, a leading AI research institute in Montreal. He said that Russell’s approach, in which AI systems aim to reduce their own uncertainty about human preferences, can be achieved with deep learning, the powerful method behind the recent revolution in AI, in which a system sifts data through the layers of an artificial neural network in search of patterns. “Of course more research work is needed to make that a reality,” he said.
Russell sees two major challenges. “One is the fact that our behavior is so far from being rational that it could be very hard to reconstruct our true underlying preferences,” he said. AI systems will need to reason about the hierarchy of long-term, medium-term and short-term goals, the myriad preferences and commitments we are each locked into. To help us (and to avoid grave errors), robots will need to find their way around the nebulous webs of our subconscious beliefs and unarticulated desires.
The second challenge is that human preferences change. Our minds change over the course of our lives, and they can also change in an instant, depending on our mood or on altered circumstances that a robot might struggle to pick up on.
In addition, our actions do not always live up to our ideals. People can hold conflicting values at the same time. Which should a robot optimize for? To avoid catering to our worst impulses (or, worse still, amplifying those impulses and so making them easier to satisfy, as the YouTube algorithm did), robots could learn what Russell calls our meta-preferences: “preferences about what kinds of preference-change processes might be acceptable or unacceptable.” How do we feel about our changes in feeling? It is all rather a lot for a poor robot to grasp.
Like the robots, we are trying to figure out our preferences (both what they are now and what we want them to be) and how to handle the ambiguities and contradictions. Like the best possible AI, we are also striving, at least some of us, some of the time, to understand the “form of the good,” as Plato called the object of knowledge. Like us, AI systems may be stuck forever asking questions, or waiting in the off position, too uncertain to help.
“I don’t expect us to have a great understanding of what the good is anytime soon,” said Christiano, “or ideal answers to any of the empirical questions we face. But I hope the AI systems we build can answer those questions as well as a human and be engaged in the same kinds of iterative process to improve those answers that humans are, at least on good days.”
But there is a third major issue that did not make Russell’s short list of concerns: what about the preferences of bad people? What is to stop a robot from working to satisfy its evil owner’s nefarious ends? AI systems tend to find ways around prohibitions, just as wealthy people find loopholes in tax laws, so simply forbidding them from committing crimes will probably not work.
Or, to get even darker: what if we are all kind of bad? YouTube has struggled to fix its recommendation algorithm, which is, after all, picking up on ubiquitous human impulses.
Still, Russell is optimistic. Although more algorithms and more game theory research are needed, he said his gut feeling is that programmers will be able to down-weight harmful preferences, and that the same approach could even be useful “in the way we bring up children and educate people and so on.” In other words, in teaching robots to be good, we might find a way to teach ourselves. “I feel like this is an opportunity, perhaps, to lead things in the right direction,” Russell added.