Unpacking AI Safety

Tackling AI Safety & Alignment Challenges Amid Rapid Progress and Potential Disruptions

Anthropic, an AI safety and research company, is on a mission to build reliable, interpretable, and steerable AI systems.

This mission faces two broad challenges. First, it may be tricky to build safe, reliable, and steerable systems once those systems begin to match their designers in intelligence and awareness of their surroundings. To use an analogy: it is easy for a chess grandmaster to detect bad moves in a novice, but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that is significantly more competent than human experts yet pursues goals that conflict with our best interests, the consequences could be dire. This is the technical alignment problem.

Second, rapid AI progress would be highly disruptive, upending employment, macroeconomics, and power structures both within and between nations. These disruptions could be catastrophic in their own right, and they could also make it harder to build AI systems in careful, thoughtful ways, compounding the chaos and the problems AI creates.

If you read one thing on AI safety, I recommend this post covering Anthropic's core views: it's balanced and informative.