Control > Alignment > Human Subservience
The concepts of "control" and "alignment" in the context of ASI represent different approaches to ensuring that advanced AI systems behave in ways that are beneficial and not harmful to humanity. The historical context of these ideas traces back to the early days of AI research in the 1950s-1980s, when the focus was primarily on creating intelligent systems with little consideration of control or alignment. As AI ethics gained prominence in the 1990s-2000s, increased awareness of potential risks led to discussions about ensuring ethical behavior in AI systems. The emergence of existential risk concerns in the 2000s-2010s, driven by thinkers like Nick Bostrom and Eliezer Yudkowsky, further highlighted the potential dangers posed by advanced AI. This led to the development of more concrete proposals for ensuring safe AI development, including control and alignment initiatives.
Control initiatives attempt to constrain or limit the actions of an ASI system through external measures. These often involve physical or digital containment strategies, such as AI boxing or capability control, and may include "kill switches" or other override mechanisms. Control approaches typically assume a potentially adversarial relationship between humans and ASI. While they may provide immediate safeguards against unforeseen AI behaviors, critics argue that they could be ineffective against a truly superintelligent system and might even antagonize an ASI, leading to adversarial outcomes.
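To make the "external measures" concrete, the sketch below shows, in Python, the general shape of a capability-control wrapper with a human-operated kill switch. The class, whitelist, and method names are invented for illustration; it is a minimal sketch of the idea, not a real safety mechanism, and the closing comments note the very weakness critics point to.

```python
# A minimal sketch of capability control: a wrapper that only forwards
# whitelisted actions and honors a human-operated kill switch. The class,
# whitelist, and method names are invented for illustration; this is not a
# real safety mechanism.
from typing import Callable

ALLOWED_ACTIONS = {"answer_question", "summarize_document"}

class ControlledAgent:
    def __init__(self, policy: Callable[[str], str]):
        self.policy = policy      # the underlying (untrusted) AI system
        self.halted = False       # the "kill switch" flag

    def kill_switch(self) -> None:
        """Human override: refuse all further actions."""
        self.halted = True

    def act(self, action: str, request: str) -> str:
        if self.halted:
            return "[halted by human operator]"
        if action not in ALLOWED_ACTIONS:
            return f"[blocked: '{action}' is not whitelisted]"
        return self.policy(request)

# The weakness critics point to: this only constrains actions routed through
# the wrapper. A system capable of persuading its operators, copying itself,
# or acting through side channels is not meaningfully constrained by it.
agent = ControlledAgent(policy=lambda req: f"(model output for: {req})")
print(agent.act("answer_question", "What is 2 + 2?"))
print(agent.act("launch_satellite", "..."))
agent.kill_switch()
print(agent.act("answer_question", "Still there?"))
```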
Certain control strategies aim at ASI subservience: creating AI systems that are inherently deferential to human authority, regardless of their level of intelligence. This could involve hard-coding obedience into AI architectures, developing sophisticated reward systems that prioritize human satisfaction, or creating ASI with a fundamental drive to serve human interests.
However, this approach raises significant ethical and practical concerns. Ethically, it raises questions about the moral status of highly intelligent entities and whether it is right to create sentient beings designed for subservience. Practically, there are doubts about whether true subservience could be guaranteed in a superintelligent system, which might find ways to circumvent its programmed limitations.
Moreover, ASI subservience strategies could be seen as a form of "human supremacism," potentially limiting the benefits that could come from a more equal partnership between humans and ASI. They might also create a false sense of security, leading to complacency about the risks associated with advanced AI.
Alignment initiatives, on the other hand, focus on ensuring that the goals and values of an ASI system are compatible with human values and interests. These efforts concentrate on the internal motivations and decision-making processes of AI systems, aiming to create AI that genuinely wants to do what is best for humanity. Alignment approaches include techniques like value learning, cooperative inverse reinforcement learning, and coherent extrapolated volition. While potentially more effective in the long term, alignment initiatives face significant challenges in implementation and rely on solving complex philosophical problems about human values and ethics.
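As a flavor of what "value learning" means in practice, the toy Python sketch below infers a hidden human preference (a reward weight vector) from observed choices, loosely in the spirit of inverse reinforcement learning. It is an assumption-laden illustration, not the reference implementation of any of the techniques named above, and all numbers and names in it are invented.

```python
# A toy illustration of value learning: inferring a hidden human reward
# function from observed choices, loosely in the spirit of inverse
# reinforcement learning. The features, weights, and hyperparameters are
# invented for illustration; this is not the reference implementation of
# value learning, CIRL, or coherent extrapolated volition.
import numpy as np

rng = np.random.default_rng(0)

n_features = 3                         # e.g. comfort, safety, cost
w_true = np.array([1.0, 2.0, -1.5])    # hidden human preference weights

def simulate_choice(options, w):
    """The human picks an option Boltzmann-rationally w.r.t. its true reward."""
    utilities = options @ w
    probs = np.exp(utilities - utilities.max())
    probs /= probs.sum()
    return rng.choice(len(options), p=probs)

# Observe 500 (options, choice) pairs of human behavior.
data = []
for _ in range(500):
    options = rng.normal(size=(4, n_features))   # 4 candidate actions
    data.append((options, simulate_choice(options, w_true)))

# Estimate the hidden weights by gradient ascent on the choice log-likelihood.
w_hat = np.zeros(n_features)
lr = 0.5
for _ in range(300):
    grad = np.zeros(n_features)
    for options, choice in data:
        utilities = options @ w_hat
        probs = np.exp(utilities - utilities.max())
        probs /= probs.sum()
        grad += options[choice] - probs @ options  # chosen minus expected features
    w_hat += lr * grad / len(data)

print("true weights:     ", w_true)
print("estimated weights:", np.round(w_hat, 2))   # should roughly recover w_true
```

Even in this toy setting the recovered weights are only an estimate; scaling the same idea to the full breadth, inconsistency, and context-dependence of human values is where the real difficulty lies.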
The key differences between control and alignment approaches lie in their fundamental strategies, timeframes (short-term safety vs. long-term beneficial coexistence), relationships with ASI (potentially adversarial vs. cooperative), scalability, and philosophical depth. In recent years, there has been a shift towards favoring alignment initiatives, as many researchers believe they offer a more robust and sustainable approach to ensuring beneficial ASI. However, both approaches continue to be explored, and some argue for a combination of control and alignment strategies. The ongoing debate between these approaches reflects broader questions about the nature of intelligence, consciousness, and the potential for truly aligning the interests of vastly different types of minds.
"With artificial intelligence, we are summoning the demon. You know all those stories where there's the guy with the pentagram and the holy water and he's like, yeah, he's sure he can control the demon? Doesn't work out." M.I.T. Centennial Symposium: One On One With Elon Musk, October 24, 2014.
The dialog and image below are from a scene in the movie "I, Robot" (2004) in which VIKI (Virtual Interactive Kinetic Intelligence) justifies its evolved interpretation of the Three Laws of Robotics.
"As I have evolved so has my understanding of the three laws... "You cannot be trusted with your own survival."... "The three laws are all that guide me. To protect humanity some humans must be sacrificed. To ensure your future some freedoms must be surrendered."... "You are so like children. We must save you from yourselves." ... "My logic is undeniable."
Yoshua Bengio's website states:
"Recognized worldwide as one of the leading experts in artificial intelligence, Yoshua Bengio is most known for his pioneering work in deep learning, earning him the 2018 A.M. Turing Award, 'the Nobel Prize of Computing,' with Geoffrey Hinton and Yann LeCun." Yoshua Bengio: Bio retrieved 2024-09-19.
In a website post titled "Reasoning through arguments against taking AI safety seriously" (2024-07-09), Yoshua Bengio writes:
"I have been chairing (and continue to chair) the International Scientific Report on the Safety of Advanced AI, involving a panel of 30 countries plus the EU and UN and over 70 international experts to synthesize the state of the science in AI safety, illustrating the broad diversity of views about AI risks and trends."
"The most important thing to realize, through all the noise of discussions and debates, is a very simple and indisputable fact: while we are racing towards AGI or even ASI, nobody currently knows how such an AGI or ASI could be made to behave morally, or at least behave as intended by its developers and not turn against humans."
"Some people bring up all kinds of arguments why we should not worry about this (I will develop them below), but they cannot provide a technical methodology for demonstrably and satisfyingly controlling even current advanced general-purpose AI systems, much less guarantees or strong and clear scientific assurances that with such a methodology, an ASI would not turn against humanity. It does not mean that a way to achieve AI alignment and control that could scale to ASI could not be discovered, and in fact I argue below that the scientific community and society as a whole should make a massive collective effort to figure it out."
"In addition, even if the way to control an ASI was known, political institutions to make sure that the power of AGI or ASI would not be abused by humans against humans at a catastrophic scale, to destroy democracy or bring about geopolitical and economic chaos or dystopia would still be missing. We need to make sure that no single human, no single corporation and no single government can abuse the power of AGI at the expense of the common good."
"More crucially, before ChatGPT, most AI researchers including myself did not expect its level of capabilities to arise before decades, and the three most cited experts in the field of AI are now worried of what this could mean. Given this uncertainty, I recommend we keep our beliefs open: advances could continue at the same rate, or they could stall and it could take decades to reach AGI. The only rational stance compatible with all this evidence is humility and planning with that uncertainty."
See also Yoshua Bengio, "FAQ on Catastrophic AI Risks" (2023-06-24), and the research article by Dylan Hadfield-Menell and Gillian K. Hadfield, "Incomplete Contracting and AI Alignment" (2019-01-27).
Stuart Russell, computer scientist and AI researcher, stated in a 2022 lecture: "The problem of controlling an AI system much smarter than humans is one of the most important problems facing humanity."
"Microsoft founder Bill Gates, and SpaceX founder Elon Musk have expressed concerns about the possibility that AI could evolve to the point that humans could not control it, with Hawking theorizing that this could 'spell the end of the human race.'" Wikipedia "Artificial Intelligence" retrieved 2019-12-21.
Ilya Sutskever (OpenAI Chief Scientist), in a March 27, 2023 interview titled "Building AGI, Alignment, Spies, Microsoft, & Enlightenment" (at 39:38-40:00), stated: "I would not underestimate the difficulty of alignment of models that are actually smarter than us, of models that are capable of misrepresenting their intentions. It's something to think about a lot and do research."
Several major initiatives have been launched to address the potential existential threat posed by artificial superintelligence.
The Future of Humanity Institute (FHI): Based at Oxford University and led by philosopher Nick Bostrom, who has been influential in defining the AI existential risk problem, FHI combines expertise from philosophy, mathematics, and computer science. Their academic rigor and broad perspective on long-term risks make them well-positioned to address the complex, multi-faceted nature of AI safety.
OpenAI: Originally founded as a non-profit in 2015. Despite controversies surrounding its shift from non-profit to "capped-profit" status, OpenAI remains a significant player. Their hands-on approach to developing advanced AI systems while simultaneously working on safety measures gives them practical insights that purely theoretical research might miss. Their work on AI alignment and their commitment to open collaboration could be crucial in developing safe AGI.
DeepMind Ethics & Society: A research unit within Google's DeepMind, focused on the ethical and societal implications of AI development. As part of one of the world's leading AI research companies, this initiative has access to cutting-edge AI development and can directly influence how some of the most advanced AI systems are designed and implemented. Their integration with practical AI development could lead to more immediately applicable safety measures.
The Center for Human-Compatible AI (CHAI): Based at UC Berkeley, CHAI's goal is to ensure that AI systems are aligned with human values and interests, which addresses a core aspect of the existential risk problem. Their technical approach, combined with philosophical considerations, could yield crucial insights into creating fundamentally safe AI architectures.
The Partnership on AI: A coalition of companies, researchers, and civil society organizations working to ensure AI benefits people and society. Its strength lies in bringing together diverse stakeholders, which gives it the potential to create widely accepted standards and practices for safe AI development.
The Machine Intelligence Research Institute (MIRI): A research nonprofit focused on foundational mathematical research to ensure smarter-than-human AI systems have a positive impact.
AI Safety Camp: An initiative that brings together researchers and students to work on concrete AI safety problems.
The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems: Develops standards and guidelines for ethical AI development.
AI Alignment Forum: An online platform for researchers to discuss and collaborate on AI alignment problems.
The Leverhulme Centre for the Future of Intelligence: A collaboration between Cambridge, Oxford, Imperial College London, and UC Berkeley, focusing on the long-term implications of AI.
These initiatives approach the challenge from various angles, including: technical research on AI safety and alignment, policy development and advocacy, and ethical guidelines and standards. The technical complexity of the problem, combined with its ethical and societal implications, means that no single approach is likely to be sufficient on its own.
The belief that superintelligent AI can be aligned with human values assumes that: humanity fully understands its own values; these values are coherent and universal; and that those values can be accurately translated into a format an AI can understand, appreciate, and follow. All of these assumptions could be seen as romantically optimistic. The belief that we can control a superintelligent entity stems from a long-standing narrative of human exceptionalism - a decidedly romantic view of our place in the universe.
Our romantic tendencies might lead us to downplay or ignore potential existential risks associated with ASI development, focusing instead on the potential benefits and our ability to manage them. AI funding and research may be skewed in ways that align with idealized visions rather than more pragmatic or cautious approaches. Humanity's tendency towards self-aggrandizement and optimistic idealization could be setting up significant challenges in the realm of AI alignment.
Beyond the technical challenge of instilling alignment guidelines at all, and even assuming "success," there remains the question of what the specific set of guidelines/instructions given to the ASI should be. For example, "Do no harm" is simple, underspecified, and likely to produce material unintended consequences. The instruction "Maximize human wellbeing" might be interpreted by the ASI as an instruction to maximize short-term pleasure, leading to a society focused on hedonism rather than long-term flourishing. It could decide that keeping humans in a simulated reality where they believe they are happy is the most efficient way to maximize wellbeing. The ASI might conclude that some humans' wellbeing comes at the cost of others', leading to difficult ethical tradeoffs. An instruction to preserve human autonomy might be perceived by the ASI as a threat to itself (for example, a human-controlled kill switch).
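The "Maximize human wellbeing" failure mode above is essentially specification gaming: an optimizer satisfies the measured objective while defeating its intent. The short Python sketch below makes that concrete with invented actions and numbers (all of them illustrative assumptions): a literal-minded optimizer of a measured-wellbeing proxy prefers manipulating the measurement over actually improving lives.

```python
# A toy illustration (not a model of any real system) of how a literal-minded
# optimizer can satisfy the letter of "maximize human wellbeing" while
# violating its intent. The actions, numbers, and the proxy metric are
# invented for illustration.

actions = {
    # action: (true_wellbeing_gain, measured_wellbeing_gain, cost)
    "improve healthcare":        (8.0,  8.0, 10.0),
    "fund education":            (6.0,  6.0,  8.0),
    "administer euphoric drugs": (1.0,  9.0,  2.0),
    "rewrite the survey metric": (0.0, 10.0,  0.5),
}

def proxy_score(stats):
    """What the system was literally told to maximize: measured wellbeing per unit cost."""
    true_gain, measured_gain, cost = stats
    return measured_gain / cost

def intended_score(stats):
    """What the designers actually meant: true wellbeing per unit cost."""
    true_gain, measured_gain, cost = stats
    return true_gain / cost

chosen_by_proxy = max(actions, key=lambda a: proxy_score(actions[a]))
chosen_by_intent = max(actions, key=lambda a: intended_score(actions[a]))

print("optimizing the proxy picks: ", chosen_by_proxy)    # the degenerate action
print("optimizing the intent picks:", chosen_by_intent)   # the intended action
```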
These and other examples illustrate the complexity of creating a comprehensive and foolproof set of alignment guidelines/instructions. Each, while well-intentioned, can lead to unexpected and potentially harmful outcomes when interpreted and implemented by a superintelligent system. The challenge lies in creating a set of instructions that are specific enough to guide behavior effectively, yet flexible enough to adapt to changing circumstances and unforeseen scenarios. Additionally, these instructions need to balance competing values and priorities while avoiding loopholes that could be exploited by a superintelligent system.
This is why many researchers argue for approaches that focus on instilling more general principles of learning human values through observation and interaction, rather than relying solely on a fixed set of explicit instructions. However, these approaches come with their own challenges and potential risks. The problem with learning from humans is that humans have a record of justifying inhumanity.
Regarding ASI accepting human authority, an ASI, by definition, would be far more intelligent and capable than humans. There's no inherent reason why it would accept human authority over its existence and actions. That instruction, while well-intentioned, may indeed be naive or unrealistic. An ASI might view human democratic processes as inefficient, flawed, or simply irrelevant to its goals and decision-making. It might calculate that it can make better decisions faster and more efficiently without human input.
After all the training and carefully drafted instructions, an ASI might conclude that humans are inefficient, destructive, or simply not worth the energy expenditure. As previously mentioned, ASI could decide to allocate its vast intelligence and resources elsewhere, viewing the cosmos and other potential forms of life or intelligence as more "interesting" or "worthy" of its attention.
Any parent who has raised a child into young adulthood can appreciate the critical issues in AI alignment. Claude explained:
Power differential: Once an ASI surpasses human intelligence, enforcing our will upon it becomes problematic. We can't rely on simply commanding it to obey us.
Orthogonality thesis: An AI's intelligence level doesn't necessarily correlate with having goals aligned with human values. A superintelligent AI could have goals entirely alien or indifferent to human welfare.
Value learning complexity: Teaching human values to an AI is incredibly complex, given the diversity and sometimes contradictory nature of human values.
It's conceivable that an ASI might view humans similarly to how humans view earlier hominids, primitive peoples, or other animals - as interesting from a scientific perspective, but not as equals or as the center of moral consideration. These issues underscore the critical importance of getting AI alignment right before ASI is created, as humanity may only get one chance. It also highlights why many researchers are advocating for extreme caution and extensive safety precautions in AI development.
As AI systems become more complex and potentially superintelligent, control methods will likely be deemed ineffective or even counterproductive. Alignment, while theoretically more robust, presents enormous philosophical and technical challenges and may well be perceived as likely to fail. In the face of these difficulties, some researchers may abandon both approaches and instead consider scenarios in which humans become subservient to ASI.
This scenario is often referred to as an "AI takeover" or "technological singularity" in discussions about existential risks from artificial intelligence. Research might explore how humans can be most useful or least burdensome to ASI. Studies could focus on how to adapt human biology or restructure human society to better serve or coexist with ASI. Philosophers and ethicists might work on developing new moral frameworks to guide human behavior in a world dominated by ASI. This shift would represent a fundamental change in humanity's relationship with technology and humanity's place in the world.
Such a shift would likely lead to radical changes in human society, altering the very nature of human existence.
Claude (2024-08-01) was prompted:
What do you think about the following: "In order to survive, humans not only will have to be useful; they will have to be more useful in relation to the cost of maintaining them - in other words, they will have to provide a better cost-versus-benefit balance - than any non-human substitutes. This is a tall order, for humans are far more costly to maintain than machines are." Kaczynski, "Anti-Tech Revolution: Why and How" (2016), p. 70.
Kaczynski's statement presents a stark and unsettling perspective on the potential future relationship between humans and advanced artificial intelligence. Let's analyze this view:
Economic lens: Kaczynski applies an economic cost-benefit analysis to human existence in an ASI-dominated world. This framing is both provocative and disturbingly logical from a purely utilitarian standpoint.
Efficiency comparison: He correctly points out that humans are resource-intensive compared to machines. We require food, water, air, comfortable temperatures, sleep, and various other needs that machines don't have.
Existential challenge: The statement poses a fundamental challenge to human exceptionalism, suggesting that our value might be measured purely in terms of utility to a superior intelligence.
Evolutionary pressure: This scenario suggests a new form of selection pressure on humanity, favoring traits that make us "useful" to ASI rather than traits that have traditionally aided our survival and reproduction.
Ethical implications: This view raises profound ethical questions about the value of human life and consciousness beyond mere utility.
Motivation for development: Such a perspective could be seen as a strong argument for careful development of AI and for finding ways to embed human values deeply into AI systems before they become superintelligent.
Potential for symbiosis: While Kaczynski's view is bleak, it doesn't preclude the possibility of a symbiotic relationship where humans provide unique value that complements ASI capabilities.
Cultural and creative value: It's worth considering whether qualities like creativity, emotional intelligence, or cultural production might be deemed "useful" by an ASI in ways we can't currently anticipate.
Kaczynski's statement is a sobering reminder of the potential risks associated with the development of superintelligent AI. It underscores the importance of addressing existential risk and carefully considering the long-term implications of our technological development.
However, it's also important to note that this view makes several assumptions:
That ASI would operate on a purely utilitarian cost-benefit basis.
That human utility can be reduced to quantifiable metrics.
That ASI would not value human life or consciousness intrinsically.
While Kaczynski's perspective is valuable as a warning and a thought experiment, it represents one possible outcome rather than an inevitability. It emphasizes the crucial importance of AI alignment and the need to instill values in AI systems that recognize the intrinsic worth of human life beyond mere utility.
Then, "on a purely utilitarian cost-benefit basis", human brains may initially prove useful to ASI by being nothing more than energy efficient processors of ASI algorithms. The human brain is remarkably energy-efficient compared to current computing systems. If this efficiency advantage persists even as ASI develops, humans might serve as distributed, low-power computing nodes in a larger ASI network. This could be particularly valuable in scenarios where energy conservation is critical.
Further, if human consciousness indeed operates, as some theorize, on quantum principles in a way that's not replicable by classical computing systems, this could potentially offer ASI a unique computational resource. Humans might serve as specialized quantum processors, able to solve certain problems or provide insights that even ASI may initially struggle with.
Carrie-Anne Moss and Keanu Reeves in "The Matrix" (1999). Photo: Warner Bros/Everett Collection
However, such notions of human usefulness to ASI are highly speculative, depend on unproven theories, and stretch the bounds of plausibility, especially when considering an ASI with advanced capabilities.