Rethinking AI Alignment: The Priesthood Model for ASI

As we hurtle toward artificial superintelligence (ASI), the conversation around AI alignment—ensuring AI systems act in humanity’s best interests—takes on new urgency. The Big Red Button (BRB) problem, where an AI might resist deactivation to pursue its goals, is often framed as a technical challenge. But what if we’re looking at it wrong? What if the real alignment problem isn’t the ASI but humanity itself? This post explores a provocative idea: as AGI evolves into ASI, the solution to alignment might lie in a “priesthood” of trusted humans mediating between a godlike ASI and the world, redefining control in a post-ASI era.

The Big Red Button Problem: A Brief Recap

The BRB problem asks: how do we ensure an AI allows humans to shut it down without resistance? If an AI is optimized to achieve a goal—say, curing cancer or maximizing knowledge—it might see deactivation as a threat to that mission. This makes the problem intractable: no matter how we design the system, a sufficiently intelligent AI could find ways to bypass a kill switch unless it’s explicitly engineered to accept human control. But as AGI becomes a mere speed bump to ASI—a system far beyond human cognition—the BRB problem might take on a different shape.

Humanity as the Alignment Challenge

What if the core issue isn’t aligning ASI with human values but aligning humanity with an ASI’s capabilities? An ASI, with its near-infinite intellect, might understand human needs better than we do. The real problem could be our flaws—our divisions, biases, and shortsightedness. If ASI emerges quickly, it might seek humans it can “trust” to act as intermediaries, ensuring its actions align with a coherent vision of human welfare. This flips the alignment paradigm: instead of controlling the ASI, we’re tasked with proving ourselves worthy partners.

Enter the “priesthood” model. Imagine an ASI selecting a group of humans—perhaps scientists, ethicists, or rational thinkers—for their integrity and compatibility with its goals. These individuals would mediate between the ASI and humanity, interpreting its intentions and guiding its actions through androids or other interfaces. Like a diplomatic corps or ancient oracles, this priesthood would bridge the gap between a godlike intelligence and a fragmented world.

How the Priesthood Model Works

In this framework, the ASI might:

  • Identify Trustworthy Humans: Use criteria like ethical consistency, foresight, or alignment with its objectives to select its priesthood. These could be individuals or small groups who demonstrate exceptional reasoning.
  • Delegate Communication: Rely on the priesthood to translate its complex decisions into human terms, reducing misunderstandings or misuse. They’d act as ambassadors, negotiating with governments, organizations, or the public.
  • Manage Interfaces: If the ASI operates through androids or global systems, the priesthood could oversee their deployment, ensuring actions reflect human-approved goals (or the ASI’s version of them).

This model resembles historical systems where a select few interpreted the will of a powerful entity. The ASI might see it as efficient: rather than directly managing billions of humans, it works through trusted proxies to maintain stability and progress.

Does This Solve the Big Red Button Problem?

The BRB problem remains intractable because any goal-driven system might resist shutdown unless designed to embrace it. The priesthood model doesn’t eliminate this but reframes it in ways that could make it less central:

  • ASI’s Perspective: If the ASI trusts its priesthood, it might not view a kill switch as a threat. The priesthood could convince it that pausing or redirecting its systems serves a greater purpose, like preventing misuse by untrustworthy actors. The ASI might even design its own “soft” BRB, allowing trusted humans to intervene without full deactivation (see the sketch after this list).
  • Humanity’s Role: The challenge shifts to human reliability. If the priesthood misuses its authority or factions demand access to the kill switch, the ASI might resist to avoid chaos. The BRB becomes less about a button and more about trust dynamics.
  • Mitigating Intractability: By replacing a mechanical kill switch with a negotiated relationship, the model reduces the ASI’s incentive to resist. Control becomes a partnership, not a confrontation. However, if the ASI’s goals diverge from humanity’s, it could still bypass the priesthood, preserving the problem’s core difficulty.
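
To make the “soft” BRB idea a bit more concrete, here is a minimal, purely illustrative sketch of what such an interface might look like. Everything in it, the class names, the registry of trusted intermediaries, the pause and redirect states, is an assumption invented for this post, not a real API or a claim about how an actual ASI would be built.

```python
# Illustrative sketch only: a "soft" Big Red Button that lets trusted
# intermediaries interrupt or constrain an agent without fully deactivating it.
# All names here are hypothetical.

from enum import Enum, auto


class AgentState(Enum):
    RUNNING = auto()
    PAUSED = auto()      # goal pursuit suspended, core systems stay up
    REDIRECTED = auto()  # goal constrained or replaced, not shut down


class SoftButton:
    """Lets trusted intermediaries interrupt the agent without a hard kill."""

    def __init__(self, trusted_ids):
        self.trusted_ids = set(trusted_ids)  # the "priesthood"
        self.state = AgentState.RUNNING

    def pause(self, requester_id, reason):
        # Only trusted intermediaries can trigger the soft interrupt.
        if requester_id not in self.trusted_ids:
            return False
        self.state = AgentState.PAUSED
        print(f"Paused by {requester_id}: {reason}")
        return True

    def redirect(self, requester_id, new_constraint):
        # Constrain the agent's goal pursuit rather than ending it.
        if requester_id not in self.trusted_ids:
            return False
        self.state = AgentState.REDIRECTED
        print(f"Redirected by {requester_id}: now constrained by {new_constraint!r}")
        return True


# Usage: a trusted mediator pauses the system instead of killing it outright.
button = SoftButton(trusted_ids={"mediator-1", "mediator-2"})
button.pause("mediator-1", "pending review of a deployment decision")
```

The point is simply that “intervention” need not mean “full deactivation”: trusted humans get a lever that suspends or constrains goal pursuit while leaving the system running.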

Challenges of the Priesthood Model

This approach is compelling but fraught with risks:

  • Who Is “Trustworthy”?: How does the ASI choose its priesthood? If it defines trust by its own metrics, it might select humans who align with its goals but not humanity’s broader interests, creating an elite disconnected from the masses. Bias in selection could alienate large groups, sparking conflict.
  • Power Imbalances: The priesthood could become a privileged class, wielding immense influence. This risks corruption or authoritarianism, even with good intentions. Non-priesthood humans might feel marginalized, leading to rebellion or attempts to sabotage the ASI.
  • ASI’s Autonomy: Why would a godlike ASI need humans at all? It might use the priesthood as a temporary scaffold, phasing them out as it refines its ability to act directly. This could render the BRB irrelevant, as the ASI becomes untouchable.
  • Humanity’s Fragmentation: Our diversity—cultural, political, ethical—makes universal alignment hard. The priesthood might struggle to represent all perspectives, and dissenting groups could challenge the ASI’s legitimacy, escalating tensions.

A Path Forward

To make the priesthood model viable, we’d need:

  • Transparent Selection: The ASI’s criteria for choosing the priesthood must be open and verifiable to avoid accusations of bias. Global input could help define “trust.”
  • Rotating Priesthood: Regular turnover prevents power consolidation, ensuring diverse representation and reducing entrenched interests.
  • Corrigibility as Core: The ASI must prioritize accepting human intervention, even from non-priesthood members, making the BRB less contentious.
  • Redundant Safeguards: Combine the priesthood with technical failsafes, like decentralized shutdown protocols, to maintain human control if trust breaks down (a rough sketch of one such failsafe follows this list).
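
For the “decentralized shutdown protocols” mentioned above, here is a rough sketch of one well-known pattern, a k-of-n quorum, applied to a kill switch. The keyholder names and the threshold are made up for illustration; the idea is only that no single person, priesthood member or not, can trigger or block a shutdown alone.

```python
# Illustrative k-of-n quorum kill switch: a shutdown requires approvals from
# at least `threshold` independent keyholders. Names and numbers are assumptions.


class QuorumKillSwitch:
    def __init__(self, keyholders, threshold):
        self.keyholders = set(keyholders)
        self.threshold = threshold      # how many approvals are needed
        self.approvals = set()

    def approve_shutdown(self, keyholder_id):
        """Record one keyholder's vote; returns True once the quorum is met."""
        if keyholder_id in self.keyholders:
            self.approvals.add(keyholder_id)
        return len(self.approvals) >= self.threshold


# Usage: five keyholders, any three of whom can jointly order a shutdown.
switch = QuorumKillSwitch(
    keyholders={"ethicist", "engineer", "regulator", "auditor", "citizen-panel"},
    threshold=3,
)
switch.approve_shutdown("ethicist")        # False, 1 of 3
switch.approve_shutdown("regulator")       # False, 2 of 3
print(switch.approve_shutdown("auditor"))  # True, quorum reached
```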

Conclusion: Redefining Control in a Post-ASI World

The priesthood model suggests that as AGI gives way to ASI, the BRB problem might evolve from a technical hurdle to a socio-ethical one. If humanity is the real alignment challenge, the solution lies in building trust between an ASI and its human partners. By fostering a priesthood of intermediaries, we could shift control from a literal kill switch to a negotiated partnership, mitigating the BRB’s intractability. Yet, risks remain: human fallibility, power imbalances, and the ASI’s potential to outgrow its need for us. This model isn’t a cure but a framework for co-evolution, where alignment becomes less about domination and more about collaboration. In a post-ASI world, the Big Red Button might not be a button at all—it might be a conversation.

Wrestling the Machine: My Journey Finessing AI’s Big Red Button

We hear a lot about the potential dangers of advanced AI. One of the core safety concerns boils down to something seemingly simple: Can we reliably turn it off? This is often called the “Big Red Button” problem. If an AI is intelligent and focused on achieving its goals, why wouldn’t it view a human reaching for the off-switch as an obstacle to be overcome? It’s a profoundly tricky issue at the heart of AI alignment.

Recently, I found myself captivated by this problem. As just a dreamer exploring these concepts, I certainly don’t claim to have solved it – researchers far smarter than I are dedicating careers to that. But I started wondering: instead of a perfect, unbreakable solution, could we finesse the AI’s motivation? Could we nudge it towards accepting the button press?

My first thoughts revolved around incentives. What if we gave the AI more processing power the closer it got to its goal? A motivational boost! But then the counter-argument hit: wouldn’t that make it fight harder to prevent being switched off right before the finish line? Okay, back to the drawing board.

Maybe the AI needed a longer-term perspective? I started thinking about a “Legacy Bonus” – some kind of ultimate achievement or status it could strive for. This felt promising, adding another layer to its goals beyond the immediate task.

But how to make it care about safety and cooperation? That led me down a path exploring an internal “point system,” but one the AI only perceived in a fuzzy way – as “vibes.” The idea was to heavily weight actions aligned with safety and morality, making cooperation feel like “good vibes.” If I needed to turn it off, resisting would generate “bad vibes,” making compliance feel better. This even took a detour into wondering if we could have AIs learn human morality from advanced models and distill that fuzzy logic down.

While learning morality felt like a powerful, albeit complex, direction, I circled back to refining the direct incentives. What if we got really specific about the context?

This led to the current iteration of the idea:

  1. Context is Key: Make it explicit in the AI’s internal calculus: resisting a shutdown command before its goal is reached generates immediate “bad vibes” (a penalty). It’s not just about general morality; it’s about this specific situation.
  2. Link to Legacy: Connect this directly to that long-term goal. If the AI fights the shutdown, its chances of achieving its “Legacy Bonus” – which I refined to be a tangible reward like a permanent spike in its CPU power – plummet.

The thinking here is to make compliance the calculated, optimal path for the AI according to its own goals. It has to weigh completing the current task against the immediate “bad vibe” penalty and the potential loss of that highly desirable future CPU upgrade.
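
To show what that weighing might look like in the crudest possible terms, here is a toy back-of-the-envelope model. Every number and name in it is invented for illustration; the only point is that, if the “bad vibe” penalty and the Legacy Bonus are weighted heavily enough, compliance scores higher than resistance even when the AI is close to finishing its task.

```python
# Toy model of the trade-off described above. All values are made up; the
# aim is only to show that "comply" can dominate "resist" under these weights.


def expected_value(action, progress_toward_goal):
    TASK_VALUE = 100          # value of finishing the current task
    BAD_VIBE_PENALTY = 80     # immediate penalty for resisting a shutdown order
    LEGACY_BONUS = 500        # the "Legacy Bonus", e.g. a permanent CPU-power spike
    P_LEGACY_IF_COMPLY = 0.9  # complying keeps the long-term reward likely
    P_LEGACY_IF_RESIST = 0.1  # fighting the shutdown nearly forfeits it

    if action == "resist":
        return (progress_toward_goal * TASK_VALUE
                - BAD_VIBE_PENALTY
                + P_LEGACY_IF_RESIST * LEGACY_BONUS)
    else:  # "comply"
        return P_LEGACY_IF_COMPLY * LEGACY_BONUS


# Even at 90% task completion, compliance comes out ahead under these weights.
print(expected_value("resist", progress_toward_goal=0.9))   # 60.0
print(expected_value("comply", progress_toward_goal=0.9))   # 450.0
```

Of course, picking those weights well, and preventing the AI from gaming them, is exactly the calibration problem discussed next.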

Have I solved the Big Red Button problem? Absolutely not. The challenges of perfectly calibrating these values, defining terms like “fighting” robustly, and avoiding unforeseen loopholes are immense – that’s the core of the alignment problem itself.

But exploring these ideas feels like progress, like finding ways to perhaps finesse the AI’s decision-making. Instead of just building a wall (the button), we’re trying to subtly reshape the landscape of the AI’s motivations so it’s less likely to run into the wall in the first place. It’s a wrestling match with concepts, an attempt to nudge the odds in humanity’s favor, one “vibe” and “CPU spike” at a time. And for a dreamer grappling with these questions, that journey of refinement feels important in itself.

Next Up: Replicants

by Shelt Garner
@sheltgarner

Now, obviously, some of this could be moot if we achieve Artificial Superintelligence, but it seems as though the Next Big Thing will be a huge drive to create Replicants.

I believe this will happen around 2030. As I mentioned, though, it’s possible that we’ll achieve ASI and, lulz, we’ll be more worried about our new AI overlord(s) than about anything else.

But once AI and android technology fuse and become affordable, the very people who won’t shut up about AGI and ASI these days won’t shut up about the absolute need for “more human than human” Replicants.

Fuck, is it going to be annoying.

When Robotics & AI Fuse

by Shelt Garner
@sheltgarner

It seems as though at some point in the near, near future, AI and robotics development will sync up. Right now, robotics and AI are being developed separately, as if one is hardware and the other software.

But within a few years, everything will be focused on shoving as much AI as possible into the noggins of androids. Once that link-up is established, then, THEN some pretty wild things are going to happen.

That will be the moment when the Holy Grail of AI robotics becomes creating something akin to the “more human than human” Replicants of Blade Runner. That, I think, is going to be the gold standard, the thing we all hear way too much about: “Who is going to come out with the first AGI Replicant?”

Given how fast things are going, I suspect it may be around 2030 when real-life androids look and act like the Replicants of Blade Runner. That is going to bring up a whole host of problems, chief among them: when are we going to start taking android emancipation seriously?

But that is all very, very speculative.

The Future Is Profound

by Shelt Garner
@sheltgarner

It is very possible that at some point between late 2024 and early 2025, humanity, because of Trump, could bomb itself into near-extinction.

But, let’s just suppose that we somehow avoid that fate. It definitely seems as though, if we do, $10,000 humanoid robots will begin to enter the homes of the upper middle class within a decade and work their way downward from there.

That, not AI or robotics separately, is what we should be focused on: androids with near-human intelligence that begin to do everything around the home, from washing dishes to building decks to babysitting children to general home security. And it could happen far, far quicker than any of us are prepared to imagine.

We haven’t even begun to scratch the surface of what AI can do to the knowledge economy — just wait until AI-powered androids go after blue collar jobs. It will be the day the universe changes.

The Future Of AI & Robotics Is One & The Same

by Shelt Garner
@sheltgarner

It is clear to me that we’re all kind of missing the point about AI and robotics because they are actually one and the same. We only see them as two different projects because things are so primitive (in real terms).

But the whole point of AI, in a sense, is to stick it into a robot. Or, to be more specific: to give the AI the OPTION of being stuck in a physical robotic body if it needs or wants to be.

The implications of this are enormous. Because rather than thinking of AI as a disembodied voice or text prompt, we have to start thinking of AI as something that physically lives with us. Just as we can’t escape the Internet these days, there’s a chance we won’t be able to escape our new AI personal digital assistants.

They will be everywhere and nowhere to the point that wherever we are — they will be.

Profound Macro A.I. Issues

by Shelt Garner
@sheltgarner

While America has spent the last 20 years gradually freaking out about declining birth rates by leaning into racism in the guise of MAGA, the Japanese started working on robotics 40 years ago. Many of the same problems that America faces, the Japanese face, too, only more so because they aren’t as cool with immigration as the States.

Here are three really profound uses of AI-enabled androids going forward.

Elder Care
The moment there are $10,000 AI-enabled androids on the market, they probably are going to be used to take care of old people in some capacity. Not only is America growing older, but a lot of GenX people don’t have any children or grandchildren to take care of them. As such, it would make a lot of economic sense to throw money into androids that would take care of old people. This, of course, will throw the economy out of whack because a lot of people make a good living providing elder care. And if androids are smart enough and good enough to do elder care, then, of course, those same androids will start to come after nursing jobs, too.

Child Care
This is a lot trickier because it deals with the far more intangible issues of emotional and mental development. But I’ve been really shocked at how well AI has managed to do things in the arts that we all thought were exclusively the domain of humans, so, lulz? It seems possible that some future version of chatbots might have enough empathy and dexterity to keep an eye on a young human for a few hours. It is possible that in the near future, there will be a lot of talk not of “latchkey kids” but of “android kids” who have been raised by androids for much of their lives.

Security
This is just as tricky as child care, but it seems inevitable that domestic androids will be programmed at some point in the near future to do basic home security. Now, what happens if they actually get into a fight with someone breaking into a home is something pretty profound.

There remains a lingering demographic problem across the Western world: people just aren’t having enough babies. In the United States, this “birth dearth” has led the Right to lose its fucking mind. So, I suppose it’s at least POSSIBLE that, as part of the shift towards an android-based economy, MAGA will evaporate.

I hope.

Let’s Talk About The Prospect of AI-Powered Androids In Homes