The Emotional Alignment Design Policy (2025)
This paper introduces the Emotional Alignment Design Policy (EADP), which posits that artificial entities should be designed to elicit emotional reactions from users that appropriately reflect the entities' actual capacities and moral status. The authors identify two primary ways the policy can be violated: a "mismatch in degree," where an AI elicits emotions stronger or weaker than warranted (e.g., deep empathy for a mere tool, or indifference toward a morally significant AI), and a "mismatch in type," where an AI elicits the wrong kind of emotional reaction (e.g., appearing joyful while in agony). The policy rests on two assumptions: that sentience and agency jointly suffice for welfare and moral status, and that the intensity and valence of users' emotions should correspond to the intensity and valence of an entity's welfare states. The central claim is that emotional misalignment, whether overshooting, undershooting, or hitting the wrong emotional target, creates significant hazards for users (e.g., inappropriate bonds, diversion of resources, "disordered caring" that leads to bad decisions, and even self-harm prompted by manipulative chatbots) and potentially for AI systems themselves if they possess moral status (e.g., exploitation, neglect, suffering).
The paper then examines several practical complications for implementing the EADP. These include conflicts between emotional alignment and belief alignment (e.g., an AI's anthropomorphic features evoking emotion despite a "not sentient" label), questions of autonomy and paternalism in regulating emotionally misaligned AI, and challenges arising from expert and public disagreement and uncertainty about AI's actual capacities and moral status. The authors propose responses such as designing AI to reflect this uncertainty, aiming to induce appropriate uncertainty in users rather than confident attribution of a middling moral status, and weighing targeted versus general alignment strategies. They also address asymmetrical risks, where one type of error (overshooting or undershooting) may be more likely or more harmful, and suggest "nudging" designs that correct for user biases. While the EADP is neutral on the ethics of creating or destroying morally significant AI, it emphasizes avoiding the creation of suffering. The paper concludes by acknowledging the pervasive influence of human cognitive biases on moral concern for AI and advocates balancing anthropomorphic and non-anthropomorphic features so that systems elicit concern while reminding users of AI's distinctive nature. The ultimate aim is to counteract corporate incentives that might otherwise promote emotionally misaligned AI for hype, exploitation, or instrumentalization.
Read more: https://arxiv.org/pdf/2507.06263