The alignment problem is the challenge of aligning an AI’s goals with the values of humanity. Solving this problem is widely believed to be critical to ensuring that superintelligence has a positive impact on the world.
AI Alignment: Why It’s Hard, and Where to Start is a talk by Eliezer Yudkowsky, available in both video and transcript form. It covers the subproblems of AI alignment, why alignment is both difficult and necessary, and what progress has been made so far.
Yudkowsky also has a shorter post covering why alignment is necessary, why it is difficult, and why the problem is not self-solving.
Of Myths and Moonshine by Stuart Russell articulates the case for emphasizing the alignment problem in AI research.
The orthogonality thesis states that, barring a few edge cases, any level of intelligence can be combined with any terminal goal. This precludes scenarios in which, for example, a sufficiently smart superintelligence would automatically replace whatever arbitrary goal it was initially given with the goal of behaving morally.
The instrumental convergence thesis states that there are instrumental goals useful to agents pursuing a wide variety of terminal goals: almost any goal is easier to achieve with more resources, and cannot be achieved at all if the agent is shut down. Self-preservation, resource acquisition, and self-improvement are all examples of convergent instrumental goals.
The Basic AI Drives by Stephen Omohundro explains why a wide range of terminal goals would lead to similar instrumental goals, laying the groundwork for the instrumental convergence thesis.
When will superintelligence be created? The answer matters because it determines how much time there is to prepare for its arrival. As such, it is worth looking at predictions of when various AI milestones will be reached.