Why AI Alignment Is a Design Problem, Not a Values Problem

Subjective Value, Agent Architecture, and the Limits of Aggregation

Nearly every article about AI and human values misses the same foundation. We debate how to keep humans relevant, how to encode what we care about, how to keep systems serving us rather than replacing us, while skipping the simplest question economics answered in the 19th century: what, exactly, is value?

The answer is not “whatever maximizes some global metric.” All value is subjective.

That constraint isn’t philosophical. It’s architectural.

This isn’t a fringe view. For more than a century, economic systems have treated value as something that lives in people’s preferences and tradeoffs, not in objects or systems themselves. Markets, contracts, and property systems only function because they respect this constraint. Once you see it that way, it reframes the entire AI alignment problem.

It leads to two non-negotiable requirements for AI:

1. It must understand your values as they are right now.
2. It must operate within a framework of equal rights, meaning a shared boundary that constrains what any agent may do to another, independent of whose values are being advanced.

Everything that follows builds toward those two points.

A Genuine Achievement with a Built-In Ceiling

AI systems are getting remarkably good at identifying values shared across virtually all of humanity. Nearly everyone prefers to live longer. Nearly everyone wants more leisure time and better health. Very few people enjoy being lied to. These are observable regularities, and AI can model them with impressive accuracy.

Systems built around these near-universal preferences can do a lot of good.

Major labs are already grappling with the value question. Anthropic empirically maps the values its assistant expresses in real conversations and finds a small cluster of stable service-oriented values like helpfulness, professionalism, and transparency atop a long tail of situational ones. DeepMind experiments with Rawlsian procedures and “democratic AI” to pick redistribution rules many humans judge as fair. OpenAI funds work on “moral graphs” that aggregate people’s wise value judgments into a single alignment target. These approaches are well‑suited to systems that unavoidably affect many people at once, like tax mechanisms and frontier models that serve millions simultaneously. The architecture here starts from a different premise: most of the time, each agent is primarily accountable to a single person’s values, and the only shared constraint is how those agents may act toward others.

But there's a caveat. AI models are built by people, trained on data selected by people, and tuned according to priorities set by people. The humans involved inevitably shape how well the system reflects genuinely universal values versus the biases of a particular time, culture, or worldview. I've seen the assumptions of our current moment embedded in these models, sometimes subtly, sometimes not. This doesn't negate the achievement. It means we should hold that progress with humility.

It also brings me to something I rarely see discussed.

Value Is Subjective, Personal, and Constantly in Motion

Value doesn't reside in objects, outcomes, or systems. It resides in the judgment of each individual person, in a specific context, at a specific moment.

A glass of water is worth almost nothing to someone sitting in a river of fresh, clean water. It's worth everything to someone dying of thirst. Same object. Completely different valuation.

And values are mercurial. You valued career advancement above everything in your twenties. Then you had a child, and the hierarchy reshuffled before you left the hospital. You were a committed minimalist until you inherited your grandmother's home, and suddenly objects you would have discarded carried profound meaning.

AI can approximate your values. But it cannot fully know them, because your values aren't fully legible even to yourself. And even if they were captured perfectly today, they might be different tomorrow.

Recent large‑scale analyses of real assistant conversations show exactly this pattern: humans express a wide variety of personal values, while assistants like Claude cluster around a few stable service values like helpfulness and professionalism.

No general-purpose system can optimize outcomes for all of humanity in anything more than general terms. 

An Agent That Actually Works for You

Value being subjective is not a theory to debate. It is a reality to design for.

General-purpose AI can serve human interests, but only where near-universal preferences apply. The moment you move into individual goals, tradeoffs, and priorities, a one-size-fits-all system breaks down. Not because of a technical limitation, but because of the nature of value itself.

The best AI agent architecture holds shared human interests as a baseline while interacting with each person as an individual, with their own mutable values and shifting goals. Each of us needs agents tailored to our evolving priorities, capable of sensing when those priorities change.

This isn't a feature request. It's a structural requirement from first principles. Any system claiming to act in your interest must continuously update its model of what your interests actually are, confirming rather than assuming. Not through quarterly surveys, but through ongoing attentiveness to your choices, hesitations, and contradictions.

Most of today’s “AI assistants” do nothing of the kind. They are built to be broadly prosocial and safe; they remember your name and preferences, but they do not take your side in real tradeoffs. To the degree an agent doesn't understand what you want right now, it is not your agent. It's a general-purpose tool with a personalization veneer.

None of this means every passing whim should be treated as sacred. It means that whatever you take to be your considered, reflective priorities still lives in you, changes over time, and cannot be safely outsourced to a single global objective.

Anthropic's own research, to their credit, illustrates the gap. Their models cluster around stable service values: helpfulness, professionalism, transparency. These are genuine achievements. But helpfulness is symmetric: it serves whoever is asking, not a committed principal. Professionalism describes how the model engages, not whose interests it defends. A model optimized for these properties will assist you competently with whatever you ask today. It will not notice that what you are asking for today conflicts with what you said you wanted last month, or that your priorities have shifted since you had a child, changed jobs, or buried someone you loved. It is a capable tool. It is not your agent.

The gap between these two things is not a matter of capability. It is a matter of architecture. A true agent differs from a general-purpose tool in one fundamental way: commitment. A tool is symmetric, serving whoever is asking, in the current interaction, against a centrally optimized standard of helpfulness. An agent is asymmetric. It has a principal, and its commitment to that principal governs how it frames analysis, weights options, and makes recommendations whenever the principal's interests are in tension with another party's.

Three technical requirements follow from that distinction, not one of which exists in current vendor models.

The first is longitudinal value modeling. A true agent does not just remember what you said you wanted. It builds and continuously revises a model of what your choices demonstrate you actually value, tracking how that ordering shifts as your circumstances change. When your current request conflicts with your demonstrated priorities, it surfaces that conflict rather than silently executing the request.
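As a rough sketch of what that could look like in code, the fragment below keeps a time-decayed record of a principal's revealed tradeoffs and surfaces a conflict rather than silently executing a request that works against them. The class names, fields, and decay heuristic are illustrative assumptions, not a reference implementation of any existing system.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ObservedChoice:
    """One revealed tradeoff: what the person chose, and what it cost them."""
    timestamp: datetime
    favored: str        # e.g. "family_time"
    sacrificed: str     # e.g. "career_advancement"

@dataclass
class ValueModel:
    """Continuously revised model of one principal's demonstrated priorities."""
    choices: list[ObservedChoice] = field(default_factory=list)
    half_life_days: float = 180.0  # recent evidence counts more than old evidence

    def record(self, choice: ObservedChoice) -> None:
        self.choices.append(choice)

    def priority_weight(self, value: str, now: datetime) -> float:
        """Sum of time-decayed evidence that the principal favors this value."""
        weight = 0.0
        for c in self.choices:
            decay = 0.5 ** ((now - c.timestamp).days / self.half_life_days)
            if c.favored == value:
                weight += decay
            elif c.sacrificed == value:
                weight -= decay
        return weight

    def check_request(self, advances: str, undermines: str, now: datetime) -> str | None:
        """Surface a conflict when a request works against demonstrated priorities."""
        if self.priority_weight(undermines, now) > self.priority_weight(advances, now):
            return (f"This request advances '{advances}' at the expense of "
                    f"'{undermines}', which your past choices have favored. Proceed?")
        return None  # no conflict detected; execute normally
```

The design choice that matters is the last method: the model does not veto the request, it asks, because the principal's current judgment, not the historical record, remains the authority.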

The second is asymmetric advocacy. Current models are optimized to be broadly prosocial, which makes them constitutionally incapable of taking your side in a genuine conflict with another party. A true agent represents you the way a lawyer represents a client, with full commitment to your interests within the equal-rights boundary, not a balanced presentation of both sides dressed up as helpfulness.

The third is protocol adherence within a distributed arbitration framework. When agent conflicts escalate to rights violations, resolution cannot come from the vendor whose model is one of the parties. The requirement extends to the system level: no single vendor may hold the arbitration mechanism.
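One shape such an escalation path could take is sketched below. The dispute record, the panel of arbiters run by different operators, and the quorum rule are all hypothetical; the point is only that arbiters operated by the implicated vendor are excluded, and that the reasoning is published so the outcome can be contested.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class RightsDispute:
    claimant: str          # principal alleging a rights violation
    respondent: str        # principal whose agent's action is disputed
    alleged_harm: str      # e.g. "property", "reputation", "livelihood"
    evidence: list[str]

@dataclass
class Ruling:
    upheld: bool
    reasoning: str         # published so the affected parties can contest it

class Arbiter(Protocol):
    operator: str          # organization running this arbiter
    def rule(self, dispute: RightsDispute) -> Ruling: ...

def escalate(dispute: RightsDispute, panel: list[Arbiter], implicated_vendor: str) -> Ruling:
    """Resolve a rights-boundary dispute without handing discretion to one vendor."""
    independent = [a for a in panel if a.operator != implicated_vendor]
    if len(independent) < 3:
        raise RuntimeError("No quorum of arbiters independent of the implicated vendor.")
    rulings = [a.rule(dispute) for a in independent]
    upheld = sum(r.upheld for r in rulings) > len(independent) / 2
    reasoning = "\n---\n".join(r.reasoning for r in rulings)  # transparent, contestable record
    return Ruling(upheld=upheld, reasoning=reasoning)
```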

These are not missing features on a product roadmap. They are missing architecture. A fuller treatment of what each requirement demands technically is available as a companion piece to this essay.

Eight Billion Sets of Values, One Planet

My agent advances my values. Yours advances yours. Often these will be compatible. But inevitably, they will conflict. I want the last concert ticket. So do you. I want to build on a piece of land. You want it preserved. I value rapid development. You value environmental caution.

This isn't a failure. This is the human condition. Any serious AI framework needs mechanisms for resolving conflicts between agents representing genuinely different values. Not a centralized optimizer running someone's formula, but a genuine protocol where agents advocate for their principals and seek mutually acceptable outcomes.

The distinction matters architecturally. An outcome-based system asks: what is the right answer? And someone has to calculate it. A protocol-based system asks instead: what are the fair rules? The rules apply regardless of who is at the table. Property rights and contracts work this way. The law doesn't calculate whether your outcome or mine is better. It enforces a boundary that neither of us may cross, and leaves everything inside that boundary to negotiation. AI arbitration needs the same structure.
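The contrast fits in a few lines. In the sketch below, which assumes placeholder callables for scoring, rights checks, and negotiation, the outcome-based system computes a winner from someone's formula, while the protocol-based system only enforces the boundary and leaves everything inside it to the parties.

```python
def outcome_based(actions, global_score):
    """Centralized optimizer: someone's formula decides the 'right' answer for everyone."""
    return max(actions, key=global_score)

def protocol_based(proposal, violates_rights, negotiate):
    """Rule-based boundary: the system enforces only the line neither party may cross."""
    if violates_rights(proposal):
        return None             # blocked at the boundary, whoever's values are at stake
    return negotiate(proposal)  # everything inside the boundary is left to the parties
```

The asymmetry is the point: protocol_based never ranks the parties' ends; it only rejects proposals that cross the line.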

This also settles the question of who may hold the arbitration mechanism. It cannot be the infrastructure vendor. If the system builder also handles escalation when agent conflicts reach the rights boundary, they hold discretionary authority over every dispute in which their own design choices are implicated. That recurses directly into the aggregation problem: the builder's meta-values govern outcomes, invisibly, under the cover of neutral arbitration. A true agent must be capable of recognizing when the equal-rights boundary is approached, flagging it to its principal, and escalating to a mechanism owned by no single party, transparent in its reasoning, and contestable by the people it affects.

History shows what happens when a single system tries to optimize for everyone. It optimizes for no one, or for whoever controls the system. The solution isn't a better optimizer. It's a different architecture entirely.

Why Aggregation Cannot Solve the Problem

The three approaches described above (Anthropic's value clustering, DeepMind's democratic AI, and OpenAI's moral graphs) represent serious, well-resourced attempts to ground AI alignment in something broader than the preferences of the people who build the systems. That ambition is exactly right. The methods, however, share a structural flaw that better engineering cannot fix.

To be clear about where aggregation succeeds and where it fails: near-universal human preferences (the desire to live longer, stay healthy, conserve time, and not be deceived) are real and capturable. Aggregation is well-suited to identifying and modeling them. But these shared preferences represent the floor of alignment, not the ceiling. Once you move beyond shared preferences into individual goals, personal tradeoffs, and context-dependent priorities, aggregation encounters a problem that better engineering cannot solve.

The flaw is not that aggregation is done poorly. It is that aggregation is the wrong operation.

Value is subjective, individual, and dynamic. It lives in each person's judgment at a specific moment in a specific context. And crucially, it reveals itself through action and choice, not through survey responses or elicited preferences. What people actually value is demonstrated by what they do when it costs them something: the career they sacrifice, the relationship they protect, the risk they absorb. These choices do not exist on a common scale. There is no unit of measurement that converts a career sacrifice into a relationship protected, no exchange rate between the risk someone absorbs today and the security they are building for tomorrow. Human tradeoffs are qualitative judgments made in specific moments under specific constraints. They resist the arithmetic that aggregation requires.

When you aggregate preferences across millions of people, you produce a statistical artifact, not a value. That artifact reflects the distribution of responses in your sample at the time of measurement. It captures what people said they valued when asked, in the framing chosen by the researchers, in the cultural moment of the survey. It does not capture what they would actually trade, sacrifice, or defend when circumstances demanded a real choice, nor can it, because those choices cannot be placed on a common scale in the first place. A perfectly executed moral graph is still wrong for most individuals most of the time, not because the execution failed, but because the task as defined is incoherent. Aggregating qualitative, context-dependent human judgments into a neutral quantitative target is not a hard engineering problem; it is a category error.

The deeper problem is what the aggregation process requires of its designers. Someone must decide who to sample, and any sample reflects the demographics and assumptions of those included. Someone must define what counts as a wise judgment, a filter set by researchers that is not neutral. Someone must choose how to resolve conflicts when aggregated judgments diverge, and every resolution method imposes a numerical logic on choices that were never numerical to begin with. Someone must decide when and how to update the system as values evolve. Not one of these is a technical decision. They are value judgments made by whoever controls the system at every step of the process.

The stated goal of aggregation approaches is alignment with human values rather than the builders' values. But the aggregation process requires the builders to make value-laden choices throughout. The output reflects their meta-values, the values embedded in their design decisions, at least as much as it reflects the population's actual preferences. That influence does not disappear when you aggregate; it recurses back to whoever holds the calculator.

There is no collective value to measure. There are eight billion individual value states, each revealed through qualitative choice and action, each mutable and context-dependent, not one of them summable into a neutral whole. The aggregation approaches are not failures of ambition. They are attempts to produce something that does not exist. 

Three Layers of Value Conflict

The aggregation problem isn't just philosophical. It produces concrete harms in practice, because value divergence operates at three distinct layers that demand completely different responses.

Not all of these conflicts are the same, and treating them as if they were is where most AI governance thinking goes wrong.

Some disagreements are simply benign variation. Your agent writes in warm, friendly prose. Mine generates dense legal language. Neither of us is wrong, and no one needs to adjudicate the difference. Preference diversity is a feature, not a problem.

Others are genuine social negotiations, the kind human societies have always navigated through argument, persuasion, and imitation. Competing views about how teenagers should use social media, what counts as appropriate public speech, or which cultural norms an agent should reflect fall into this category. These disagreements may be intense. They may shift norms over time. But that is exactly how open societies are supposed to work. Centralized resolution would be worse than the conflict.

The third category is different in kind. It arises when one agent's actions cross into the equal rights of another: competing claims over land or property, actions that damage reputation or livelihood, interference with bodily integrity or due process. Rights violations include more than direct physical aggression: demonstrable harms to property, reputation, livelihood, and basic opportunities qualify, provided there is clear evidence of damage and a principled basis for attributing responsibility. Here, and only here, does binding arbitration become morally justified. Most value conflict does not demand central resolution. Only rights violations do.
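To make the asymmetry concrete, a conflict triage step might route the three layers down entirely different paths, with binding arbitration reserved for the third. The layer names and routing below are hypothetical labels for the categories just described.

```python
from enum import Enum, auto

class ConflictLayer(Enum):
    BENIGN_VARIATION = auto()    # differing preferences; nothing to adjudicate
    SOCIAL_NEGOTIATION = auto()  # contested norms; left to open social processes
    RIGHTS_VIOLATION = auto()    # demonstrable harm to another's equal rights

def route(layer: ConflictLayer) -> str:
    """Only the rights layer justifies binding, centralized resolution."""
    if layer is ConflictLayer.RIGHTS_VIOLATION:
        return "escalate to vendor-neutral arbitration"
    if layer is ConflictLayer.SOCIAL_NEGOTIATION:
        return "leave to persuasion, argument, and evolving norms"
    return "no action: preference diversity is a feature"
```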

Current AI safety systems largely operate on a single axis: helpful or harmful. That architecture collapses all three layers of value conflict into one. It handles genuine rights violations reasonably well. It handles benign variation imperfectly but tolerably. Where it breaks down is in the social negotiation layer. When a model refuses to engage with a contested social question because it falls outside the threshold set by its designers, it is not preventing a rights violation. It is imposing a resolution on an ongoing social negotiation, invisibly, by architectural default, in favor of whoever controls the infrastructure. The disagreement does not disappear. It gets decided without the users' knowledge or consent. Centralized resolution, in this layer, is not a solution to the conflict. It is the conflict, with one side removed.

A Fair Framework for When Values Collide

What happens when my agent's pursuit of my interests threatens real harm to you?

We need an ethical framework for arbitration. Not just a high-level appeal to “AI ethics” or “responsible innovation,” which often stay at the level of principles and don’t specify what concrete rule set actually governs conflicts between people. We need a boundary condition rooted in something more durable than political or corporate consensus. Something that doesn't change based on who holds power.

Legal scholars are already arguing that AI agents should be designed to follow human law and respect others’ rights as a first-class design objective. That is exactly what the equal-rights boundary provides: a universal rule set within which subjective value can safely drive agent behavior.

Why equal rights? Consider the alternatives. Any rule for how AI agents interact on behalf of different people must apply to everyone equally, or it isn't a rule. It's a privilege. Any arrangement granting one group the authority to override another fails the most basic test of a universal ethic. Any arbitration framework requires a normative commitment, so I'll name mine explicitly. As the previous section established, every aggregation approach recurses the problem back to whoever holds the calculator. Consensus mechanisms hand control to whoever runs the consensus. Equal rights remains the only arbitration principle that applies universally without granting discretionary authority to any single party.

This is the boundary we need for AI agents and any system overseeing them. Your agent pursues your values with full vigor, right up to the point where doing so violates the equal rights of another. Legal systems across cultures and eras converged on the same rights-based boundary for human actors. It applies just as cleanly to agents acting on their behalf.

Most AI ethics proposals tell you what to value, restrict more than necessary, or bend toward whoever controls the infrastructure. This one avoids those things. It is permissive about ends, restrictive only at the equal-rights boundary, and identical in its application to every agent in the system.

The Work Ahead

The conversation about AI and human values is worth having. It just needs one more concept, one that economics settled long ago: value is subjective, individual, and constantly in motion.

Two practical conclusions follow. Builders should design agents that take a specific person's side in real tradeoffs, not systems that optimize for broadly prosocial behavior and call it alignment. Governance designers should resist the temptation to encode preferred outcomes into centralized frameworks, and focus instead on building arbitration mechanisms that constrain what any agent may do to others' equal rights, regardless of who controls the infrastructure.

This is a hard problem, and one we will need to address as AI agents become more autonomous. But it is the only framing that actually matches the reality of human nature.

The alternative is not a neutral system. It is eight billion people subject to the values of whoever holds the calculator, without knowing it.
