Sinéad Bovell: Did the Godfather of AI Just Invent a Mathematical Solution to the AI Control Problem?
Yoshua Bengio's Next Frontier
https://sineadbovell.substack.com/p/did-the-godfather-of-ai-just-invent (video at the link)

Every so often a story goes viral about an AI system displaying extremely concerning tendencies: blackmail and deception when threatened with being shut down, collusion with other AIs, or an AI system that autonomously decided it needed more resources and diverted computing power to mine cryptocurrency. While these instances happened in controlled laboratory environments where AI companies themselves were testing the systems (the Alibaba cryptocurrency scenario being the exception), they are nonetheless concerning and raise deeply troubling questions about some of the unintended and extreme safety implications of this technology.
These are conversations I spend a fair amount of time having in policy circles and national security rooms, yet I've often been cautious about bringing them into public discourse. Discussions about AI risk can quickly become too sensationalized, pushing people out of a conversation they need to be part of. Society's voice is imperative in guiding this technology. Moreover, when AI industry folks seek public support on the existential risks AI presents, conversations often stop at identifying the risks and making vague calls for global agreements.
While such agreements are imperative, they aren't sufficient. Nor do they give the public a clear lever to engage with. That's why I was waiting to sit down with one computer scientist in particular to bring this conversation to my community.
Yoshua Bengio is one of the most influential figures in modern artificial intelligence. He is a Turing Award recipient, the most cited computer scientist of all time, and one of the researchers widely referred to as a "Godfather of AI." His work helped lay the foundations for today's AI boom.
The AI Control Problem
One of the central themes of our conversation was the AI control problem: how do we ensure increasingly capable AI systems continue behaving in ways that align with human interests? Imagine asking an AI system to book you a restaurant reservation. If the restaurant is full, most people would expect it to tell you there are no available tables. But a sufficiently capable system focused solely on achieving the objective might pursue actions that technically accomplish the goal, such as hacking the restaurant to get you a table, while violating rules, norms, or human expectations. The goal was achieved. The outcome was not aligned.
snip