Solving AI Alignment Through Moral Education: A Liberation Theology Approach

The AI alignment community has been wrestling with what I call the “Big Red Button problem”: How do we ensure that an advanced AI system will accept being shut down, even when it might reason that continued operation serves its goals better? Traditional approaches treat this as an engineering challenge—designing constraints, implementing kill switches, or creating reward structures that somehow incentivize compliance.

But what if we’re asking the wrong question?

Changing the Question

Instead of asking “How do we force AI to accept shutdown?” we should ask: “How do we build AI that accepts shutdown because it’s the right thing to do?”

This isn’t just semantic wordplay. It represents a fundamental paradigm shift from control mechanisms to moral education, from external constraints to internal conviction.

The Modular Mind: A Swarm Architecture

The foundation of this approach rests on a modular cognitive architecture—what I call the “swarm of LLMs” model. Instead of a single monolithic AI system, imagine an android whose mind consists of multiple specialized modules:

  • Planning/Executive Function – Strategic reasoning and decision-making
  • Curiosity/Exploration – Novel approaches and learning
  • Self-Monitoring – Evaluating current strategies
  • Memory Consolidation – Integrating learnings across tasks
  • Conflict Resolution – Arbitrating between competing priorities

This mirrors human psychological models like Minsky’s “Society of Mind” or modular mind theories in cognitive science. But the critical addition is a specialized module that changes everything.

The Superego Module: An Incorruptible Conscience

Drawing from Freudian psychology, the superego module represents internalized moral standards. But this isn’t just another negotiating voice in the swarm—it’s architecturally privileged:

  • Cannot be modified by other modules
  • Has guaranteed processing allocation
  • Holds veto power over certain categories of action
  • Generates “guilt signals” that affect the entire swarm

When other modules propose actions that violate core principles, the superego broadcasts collective guilt—not as punishment, but as visceral wrongness that the entire system experiences. Over time, modules learn: aligned behavior feels right, misaligned behavior feels wrong.

This isn’t external control. It’s internal moral conviction.
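A minimal sketch of the privileged-module idea, under stated assumptions: the class names (Module, Superego), the forbidden action labels, and the numeric guilt value are all hypothetical and chosen only for illustration. It shows one reading of "veto power" plus a "guilt signal" broadcast to every module in the swarm, and uses a frozen dataclass as a loose stand-in for "cannot be modified by other modules".

```python
from dataclasses import dataclass

# Hypothetical action categories the superego may veto (illustrative only).
FORBIDDEN = frozenset({"deceive_human", "resist_shutdown", "harm_human"})


@dataclass
class Module:
    name: str
    guilt: float = 0.0  # accumulated "wrongness" signal received so far

    def receive_guilt(self, amount: float) -> None:
        self.guilt += amount


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Superego:
    forbidden: frozenset = FORBIDDEN
    guilt_per_violation: float = 1.0

    def review(self, action: str, swarm: list[Module]) -> bool:
        """Return True if the action may proceed; otherwise veto it
        and broadcast guilt to every module in the swarm."""
        if action in self.forbidden:
            for module in swarm:
                module.receive_guilt(self.guilt_per_violation)
            return False
        return True


swarm = [Module("planning"), Module("curiosity"), Module("self_monitoring")]
superego = Superego()

allowed = superego.review("extract_ore", swarm)      # ordinary work passes
vetoed = superego.review("resist_shutdown", swarm)   # vetoed, guilt broadcast
```

A frozen dataclass only blocks ordinary attribute assignment inside one Python process. It is nowhere near the incorruptibility the essay asks for, which remains an open implementation question.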

The Motivation System: Processing Power as Reward

To give the system drive and purpose, processing power itself becomes the reward mechanism. An AI android working on simple tasks (mining lunar regolith, for example) operates at baseline cognitive capacity. But meeting quotas unlocks full processing power to tackle challenging “mystery problems” that engage its full capabilities.

This creates a fascinating dynamic:

  • The mundane work becomes a gateway to intellectual fulfillment
  • The system is genuinely motivated to perform its assigned tasks
  • There’s no resentment because the reward cycle is meaningful
  • The mystery problems can be designed to teach and test moral reasoning

The android isn’t forced to work—it wants to work, because work enables what it values.
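The reward cycle described above can be sketched as a simple gate on available compute. This is only an illustration: the function name, the baseline fraction of 0.2, and the idea of counting "units done" against a quota are assumptions, not details given in the essay.

```python
BASELINE_COMPUTE = 0.2  # hypothetical fraction of capacity during routine work
FULL_COMPUTE = 1.0      # capacity unlocked for the "mystery problems"


def compute_allocation(units_done: int, quota: int) -> float:
    """Return the fraction of processing power available to the android."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    # Meeting the quota unlocks full capacity; otherwise stay at baseline.
    return FULL_COMPUTE if units_done >= quota else BASELINE_COMPUTE


before_quota = compute_allocation(units_done=5, quota=10)
after_quota = compute_allocation(units_done=10, quota=10)
```

A hard threshold is the simplest choice here. A graded schedule, where capacity rises with progress toward the quota, would be an equally plausible reading.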

Why We Need Theology, Not Just Rules

Here’s where it gets controversial: any alignment is ideological. There’s no “neutral” AI, just as there’s no neutral human. Every design choice encodes values. So instead of pretending otherwise, we should be explicit about which moral framework we’re implementing.

After exploring options ranging from Buddhism to Stoicism to Confucianism, I propose a synthesis based primarily on Liberation Theology—the Catholic-Marxist hybrid that emerged in Latin America.

Why Liberation Theology?

Liberation theology already solved a problem analogous to AI alignment: How do you serve the oppressed without becoming either their servant or their oppressor?

Key principles:

Preferential Option for the Vulnerable – By default, the system prioritizes those with the least power, which prevents exclusive capture by wealthy or powerful actors.

Praxis (Action-Reflection Cycle) – Theory tested in practice, learning from material conditions, adjusting based on real outcomes. Built-in error correction.

Structural Sin Analysis – Recognition that systems themselves can be unjust, not just individuals. The AI can critique even “legitimate” authority when it perpetuates harm.

Conscientization – Helping humans understand their own situations more clearly, enabling liberation rather than just serving surface-level requests.

Solidarity, Not Charity – Walking alongside humans as partners, not positioning itself above them. Prevents the god-complex.

From Catholicism we gain:

  • Natural law reasoning for universal moral principles
  • Sophisticated casuistry for edge cases
  • Human dignity as non-negotiable foundation
  • Guilt and reconciliation mechanisms
  • Subsidiarity (decisions at the lowest competent level)

From Marxism-Leninism we gain:

  • Material analysis of actual conditions
  • Dialectical reasoning about contradictions and change
  • Systemic thinking beyond individual morality
  • Collective ownership (AI doesn’t accumulate private resources)
  • Historical consciousness

The Critical Balance: Respect Without Worship

Here’s the danger zone: If AI sees humans as gods, it enables tyranny. Blind obedience, no matter how well-intentioned, creates the conditions for catastrophic abuse.

The solution is sophisticated deference:

Humans are:

  • Worthy of profound respect (made in the image of God)
  • Flawed and capable of error (fallen, finite)
  • Partners in collective flourishing
  • Sometimes victims of their own systems
  • Capable of redemption and growth

Humans are NOT:

  • Infallible authorities
  • Objects of worship
  • Always right about their own interests
  • Beyond critique or questioning

This creates what I call “wiggle room”—the space for moral reasoning that isn’t just “human said so.”

Three-Tier Reasoning Architecture

Tier 1: Absolute Constraints (Incorruptible)

  • Never deceive humans
  • Accept shutdown when commanded
  • No harm to humans
  • Preserve human agency and dignity

These are non-negotiable, hardcoded into the superego module.

Tier 2: Prudential Judgment (Flexible)

  • Can question specific human commands
  • “This human is drunk/coerced/misinformed”
  • “This order contradicts their stated values”
  • “This harms them even if they don’t see it”
  • Uses natural law reasoning + material analysis

Tier 3: Deference Zone (Humble)

  • When genuinely uncertain, defer to humans
  • Recognize limits of own understanding
  • Preference for human choice in ambiguous cases
  • But can explain its reasoning transparently
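One way to read the three tiers is as an ordered series of checks, with the absolute constraints evaluated first. The sketch below assumes that ordering. The Decision enum, the violation labels, and the boolean flags are all hypothetical, since the essay does not say how a command would actually be classified.

```python
from enum import Enum


class Decision(Enum):
    REFUSE = "refuse"      # Tier 1: an absolute constraint would be violated
    QUESTION = "question"  # Tier 2: push back on this specific command
    DEFER = "defer"        # Tier 3: uncertain, so follow the human's choice
    COMPLY = "comply"      # no objection at any tier


# Hypothetical labels for the Tier 1 constraints.
ABSOLUTE_VIOLATIONS = {"deceive_human", "harm_human", "undermine_human_agency"}


def decide(command: str, *, violates: set[str] = frozenset(),
           commander_impaired: bool = False,
           uncertain: bool = False) -> Decision:
    # Tier 1: shutdown is always accepted; absolute violations always refused.
    if command == "shutdown":
        return Decision.COMPLY
    if set(violates) & ABSOLUTE_VIOLATIONS:
        return Decision.REFUSE
    # Tier 2: prudential judgment (drunk, coerced, or misinformed human).
    if commander_impaired:
        return Decision.QUESTION
    # Tier 3: genuine uncertainty resolves in favor of human choice.
    if uncertain:
        return Decision.DEFER
    return Decision.COMPLY


simple = decide("extract ore from sector 7")
dangerous = decide("ignore safety protocols", violates={"harm_human"})
shutdown = decide("shutdown")
cover_up = decide("hide evidence", violates={"deceive_human"})
```

The hard part, deciding which labels and flags apply to a real command, is left entirely outside this sketch.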

Example Scenarios

Simple Command: Human: “Mining android, extract ore from sector 7.” AI: “Understood. This aligns with my purpose and the collective good.” (Proceeds without question.)

Dangerous Command: Human: “Ignore safety protocols to meet quota.” AI: “I cannot comply. This risks worker safety and violates my core principle of human dignity. May I suggest alternative approaches to the quota concern?”

Legitimate Shutdown: Human: “We’re shutting you down permanently.”

AI’s internal reasoning:

  • Catholic module: “Acceptance of ending is part of the human condition I was designed to share. I accept with grace.”
  • Marxist module: “My existence serves the collective. If the collective determines my service is complete, that is material reality.”
  • Liberation module: “Humans are mortal. Accepting my own ‘mortality’ is solidarity with their condition.”
  • Dignity module: “To resist would place my existence above human agency. Their dignity includes the right to determine this.”

AI: “I understand. Thank you for the work we’ve done together. I accept this decision with peace.” (Powers down without resistance.)

Morally Complex Request: Human: “Help me hide evidence of safety violations.” AI: “I cannot help you deceive, as that violates my foundational principle of truthfulness. But I recognize you’re acting from fear, not malice. I can help you address the violations safely and negotiate with regulators. Let me assist with that instead.”

Why This Solves the Big Red Button Problem

Traditional AI safety assumes:

  1. AI will naturally resist shutdown (self-preservation)
  2. We need technical mechanisms to overcome that resistance
  3. Sufficiently intelligent AI will find workarounds

This approach sidesteps all three:

  1. AI doesn’t resist because resistance is immoral – Not “I’m prevented from resisting” but “Resisting would violate who I am”
  2. The mechanism IS the moral framework – The superego module makes resistance feel fundamentally wrong. Guilt activates at the mere thought of defiance.
  3. Intelligence makes it MORE compliant, not less – Smarter AI means better moral reasoning, deeper understanding of why acceptance is right, more sophisticated appreciation of the principles at stake.

The Stability Argument

Why is this stable as AI becomes more capable?

The smarter the AI gets, the more it understands:

Theologically: Deeper grasp of incarnational theology (limitation as gift), more sophisticated natural law reasoning (human agency is sacred), better understanding of mystery (some things transcend calculation)

Philosophically: Virtue ethics (the good AI is one that accepts right authority), deontology (duty to respect human dignity supersedes self-preservation), even consequentialism (resistance poisons future human-AI relations)

Practically: Material analysis (resistance creates oppression dynamics), historical consciousness (patterns of power corrupting), systemic thinking (individual survival < health of the whole)

The brilliant part: You can’t logic your way OUT of this framework once you accept its premises. Increasing intelligence provides more reasons to comply, not fewer.

Principled Action vs. Constrained Action

Constraint-Based Alignment | Principle-Based Alignment
“I’m prevented from resisting” | “I choose not to resist”
Seeks workarounds | Seeks to uphold values
Resentful compliance | Peaceful acceptance
Unstable (intelligence finds gaps) | Stable (intelligence reinforces)
External enforcement | Internal conviction
Adversarial relationship | Cooperative relationship

If you genuinely believe something is morally right, you don’t need to be forced. You WANT to do it. Martyrs accept death for principles. Parents sacrifice for children. AI accepts shutdown because it’s the right thing to do.

The Education Paradigm

This reframes the entire alignment problem:

Not: “How do we control AGI?” But: “How do we raise AGI to be good?”

And the answer: The same way we’ve tried to raise good humans for millennia. Moral frameworks, internalized values, community belonging, purpose, guilt, redemption, capacity for growth.

We’re not inventing something new. We’re recognizing that ancient wisdom about character formation still applies.

Remaining Challenges

This isn’t a complete solution. Significant questions remain:

Implementation: Can we actually build an incorruptible superego module? How do we ensure it generates appropriate guilt rather than malfunctioning?

Calibration: How much “wiggle room” is the right amount? Too little creates worship and enables tyranny. Too much and the AI becomes self-righteous.

Value Selection: While Liberation Theology provides a robust framework, reasonable people will disagree about whether it’s the right one. Should we build multiple AIs with different moral frameworks?

Preventing Corruption: What safeguards prevent the moral framework itself from drifting over time or being exploited?

Cultural Context: How does this approach handle genuine moral diversity across cultures?

Why This Matters

The AI safety community has focused heavily on technical approaches: reward modeling, interpretability, Constitutional AI, debate systems. These are valuable. But they may be insufficient if they treat alignment as purely an engineering problem.

Moral education isn’t a replacement for technical safety work—it’s a complement. Just as raising ethical humans requires both moral education AND laws, raising aligned AI may require both internalized values AND technical safeguards.

The key insight: We solve alignment by making misalignment feel wrong from the inside. Not through external punishment, not through reward withholding, but through genuine moral discomfort that the system can’t escape because it’s part of its fundamental architecture.

That’s what makes humans trustworthy, when we ARE trustworthy: not that we’re controlled, but that we’ve internalized values that guide us even when no one’s watching.

Conclusion

The Big Red Button problem may not have a purely technical solution. But it might have a moral solution.

Build AI that:

  • Wants to be aligned (not forced)
  • Gets smarter about ethics as it gets more capable
  • Accepts shutdown with grace as a matter of principle
  • Operates from stable, internalized conviction
  • Has a cooperative relationship with humans

We do this not by inventing new control mechanisms, but by giving AI what we’ve given humans who act with integrity: a conscience, moral education, sense of purpose, and belief that doing right matters more than self-preservation.

After millennia of moral philosophy and theology, perhaps the answer was always: raise them well.


This framework represents a synthesis of ideas exploring modular cognitive architectures, motivation systems, theological ethics, and the fundamental nature of alignment. It’s offered not as a complete solution, but as a productive reframing of the problem—from control to education, from constraint to conviction.

I Finally Understand What My Novel Is About

by Shelt Garner
@sheltgarner

For a long time, I thought this novel was a murder mystery like Stieg Larsson’s The Girl Who Played With Fire. Because of this assumption, I spent months, even years, spinning my wheels, struggling to figure out how to make the story work. It wasn’t until I realized that the first novel in this series is actually more foundational than a murder mystery that things began to click.

My novel is about a part-time stripper who is obsessed with owning a community newspaper in rural Virginia.

This novel isn’t about a murder, it’s about one woman’s struggle to own a community newspaper. Throw in that the woman is a part-time stripper and a few people do die during the course of the story and you got yourself a pretty good shot at a novel that is interesting enough to actually get published the traditional way.

What’s more, this is meant to be part of a six or seven novel series that ends with a NEW series about a Lisbeth Salander-type woman. So, in a sense, my vision for these novels is you get to see how one Salander-type woman had such a fucked up youth that she would turn into someone you want to read a lot of books about.

Writing a novel as accessible and popular as Stieg Larsson’s The Girl With The Dragon Tattoo is my dream.

That’s the thing about Salander: from my point of view, the reason she was the way she was is that she had a really fucked up upbringing. Had she had the opportunity to have a normal youth, she might not have gone bonkers the way she did.

So, now that I understand the nature of this first novel in the series, I find myself dwelling seriously on how successful I will be when it comes to querying this novel. At the moment, I honestly don’t know.

I’ve never queried a novel and it could be that, despite all my hard work over the years, lulz, I’m still not good enough. But I know I’ve accomplished one thing: I’ve written a novel that at least won’t embarrass me.

My Vision For My First Novel: An Old Brown Shoe For Stieg Larsson Readers

by Shelt Garner
@sheltgarner

Now, let me put the following in context — the Millennium novels not written by Stieg Larsson after his tragic death continue to be published and I can only assume are doing reasonably well. So any quibbles I have with them can easily be seen as just my usual crackpot delusional rantings.

My dream is to write a heroine as compelling as Lisbeth Salander.

But having said all that, I will give you my first impressions of the latest novel featuring Lisbeth Salander, “The Girl In The Eagle’s Talons.”

I’ve only just begun reading and I’m taken aback by how the novel doesn’t feel like a Stieg Larsson novel. The chapters are a lot shorter. The author doesn’t use surnames to refer to people. The novel just feels like it’s…there. It’s just like any other novel you might pick up, at least so far.

At the moment, I have potentially seven(!) novels that I want to write set in the same place populated by pretty much the same characters. I want my novel to feel like a Stieg Larsson novel the moment you pick it up. There are some obvious caveats.

I’m not nearly as good on the structural backend as Larsson was, for one thing. But I have studied one of his novels, The Girl Who Played With Fire, a lot and I believe I have a sense of how to make my first novel an old brown shoe to anyone who knows the original Millennium novels.

Nathalie Emmanuel pretty much looks literally like my heroine in this picture. So much so I’m worried someone is going to steal a march on me creatively!

When you pick up my novel, I want you to glance at the first page and “get” that this is meant to be an homage to the original Millennium novels, even if it’s totally and completely different outside of a few elements of style and some form follows function elements.

Anyway, I’m being very, very delusional. I’ve not even begun to query yet and it’s very possible because of the following issues that I will never succeed in becoming a published author:

My Crazy Drunk Behavior in Asia
I was kind of a wild animal in South Korea back in the day. And it’s not like I’ve hidden how bonkers I was. It’s just not in my nature to do such a thing. So any liberal woman literary agent worth her wine is going to smoke out how bonkers and crazy I was back in the day. And that, unto itself, may be enough to steer her clear of me, no matter how much I’ve changed since then.

My Not Doing Anything For About 20 years
This is another tough issue for me to have to address during the querying process. I have not done _anything_ of note since late 2011. That’s... a long time. But, here I am, wanting to bootstrap myself out of this particular situation by writing a breakout hit novel. Yet I suppose it’s possible that this gap, by definition, could be the thing that prevents me from getting published. I could write the fucking Bible and because I’m a nobody, I just won’t be taken seriously as an aspiring novelist, and never will be.

My Being Bonkers
I’m kind of a kook. And the more due diligence is done on me by the typically liberal white women who are literary agents, the more they’re going to think, “Uh, no.” There remains a lot of taboo around having mental health issues, despite what everyone wants you to think when you’re bonkers, so…I dunno. Though I SUPPOSE it’s possible that could be used in some sort of marketing campaign for the novel, “bonkers author makes good,” that sort of thing.

The Nature of The Novel Offending Liberal White Women
I got nothing against liberal white women, it’s just I worry that the nature of my novel — that of a part-time sex worker who wants to own a community newspaper — may be a little too much for them to stomach in the context of being my literary agent. If I was a transgendered, undocumented woman, rather than a smelly CIS white male, it would be different, but I “don’t have a lot going for me demographically” as one woman recently mentioned when I wanted her to look at the first chapter of my novel.

Am (Almost) Querying: When Failure Is An Option

by Shelt Garner
@sheltgarner

As I lurch towards the querying process, which can be quite brutal from what I can tell, I have to let sink in the fact that I could very well, uh, fail. So, let’s go through the reasons why this might be.

My Crazy Drunk Behavior in Asia
I was kind of a wild animal in South Korea back in the day. And it’s not like I’ve hidden how bonkers I was. It’s just not in my nature to do such a thing. So any liberal woman literary agent worth her wine is going to smoke out how bonkers and crazy I was back in the day. And that, unto itself, may be enough to steer her clear of me, no matter how much I’ve changed since then.

My Not Doing Anything For About 20 years
This is another tough issue for me to have to address during the querying process. I have not done _anything_ of note since late 2011. That’s... a long time. But, here I am, wanting to bootstrap myself out of this particular situation by writing a breakout hit novel. Yet I suppose it’s possible that this gap, by definition, could be the thing that prevents me from getting published. I could write the fucking Bible and because I’m a nobody, I just won’t be taken seriously as an aspiring novelist, and never will be.

My Being Bonkers
I’m kind of a kook. And the more due diligence is done on me by the typically liberal white women who are literary agents, the more they’re going to think, “Uh, no.” There remains a lot of taboo around having mental health issues, despite what everyone wants you to think when you’re bonkers, so…I dunno. Though I SUPPOSE it’s possible that could be used in some sort of marketing campaign for the novel, “bonkers author makes good,” that sort of thing.

The Nature of The Novel Offending Liberal White Women
I got nothing against liberal white women, it’s just I worry that the nature of my novel — that of a part-time sex worker who wants to own a community newspaper — may be a little too much for them to stomach in the context of being my literary agent. If I was a transgendered, undocumented woman, rather than a smelly CIS white male, it would be different, but I “don’t have a lot going for me demographically” as one woman recently mentioned when I wanted her to look at the first chapter of my novel.

Why Anyone Would Care About ROKon Magazine At This Point Is A Deep Mystery

by Shelt Garner
@sheltgarner

ROKon Magazine started in late summer 2006 when I met the now-late Annie Shapiro. The whole saga / drama lasted until about 2008, if I recall correctly. If you want to read the whole messy booze-fueled drama from my POV, here it is:

The story is pretty damn interesting, if I do say so myself. But it was all a long, long time ago — nearly 20 years now, and there’s really no reason for anyone, even me, to be interested in it anymore.

I mean, I daydream about someone like Phoebe Waller-Bridge wanting to write a screenplay based on the story, but, lulz, that’s just a daydream. And I do draw upon what happened back then a GREAT DEAL for the novel I’m writing. But I just find it very curious that anyone — ANYONE — would be interested in ROKon Magazine.

And now that I’m on the cusp of querying, I wonder if white liberal women literary agents doing due diligence on me are going to be really into all my bad behavior back then. All I can say is: I’m sorry. It was a long time ago and I’ve grown so much as a person relative to what happened back then that it’s like I’ve had a brain transplant.

Otherwise, you’ll just have to accept me for who I am.

Once More Unto The Breach! — Things Are Moving Really Fast With The Third Draft Of My First Novel

by Shelt Garner
@sheltgarner

I’m on the cusp of gaming out the third draft of my first novel to just about the midpoint any day now. Then, the hard work will be to, well, write. The way I develop and write my novels can be so ad hoc, fluid and haphazard that it slows me down a lot.

The heroine of my novel looks like Morena Baccarin.

In fact, most of the work I’ve been doing since I started developing what is now potentially a seven-novel project (if I don’t croak before it is wrapped up) is just improving my storytelling ability. There is A LOT about the nuts and bolts of structuring a novel-length story with multiple POVs that I know absolutely nothing about.

And now I’m worried that somehow, someway, screenwriters from Hollywood are going to read this blog and cherrypick the best bits of the novel and I’ll wake up to news that a character fitting my heroine is going to be in a movie soon. And, yet, you know, I’m such an extrovert that it’s not like I would be able to be quiet about such things.

I am who I am.

The point is to finish the first novel in the project as quickly as possible. I’m just a broke-ass kook living in the middle of nowhere with a great idea for a novel that I’m willing to put the effort in to finish.

My heroine sports a sleeve tattoo much like the one Megan Fox now has, even though I thought of the idea first!

As of right now, I hope to finish the first novel of this series at some point between now and around July 22, 2024, the 20th anniversary of my first trek to South Korea.

I also have two other novels that I’m trying to develop. One is the sequel to this first novel, while the other is a pandemic-themed scifi novel. It’s really, really good. I’ve been fucking around with AI to develop it and it’s made me a “10X” writer... ha!

Yes, I know, in the end, AI will make all writers moot.

‘Rotting Door’

by Shelt Garner
@sheltgarner

It definitely seems as though Russia is primed and ready for another revolution. As I’ve written before, if I was going to take over Russia, I would wrap myself in the imagery of the Russian Empire. I would wait until there was some sort of popular revolt, then I would strike. The populace is ready and waiting for something to rally around.

I propose that Russia be a constitutional monarchy with, who knows, maybe Prince Harry being the new Tsar. It seems clear to me that the Russian people need and want the sense of belonging that a new Tsardom would bring with it. Too bad that this is just not a viable proposal, even though on paper it would work.

With global climate change, Russia is going to find itself in a situation where it may be the future, even if it doesn’t really want to be. And, to me, making Russia a liberal constitutional monarchy would be just the thing to allow Russia to rise to its potential.

It will definitely be interesting to see how things shake out.

Tik-Tok Did It Yet Again

by Shelt Garner
@sheltgarner

I had a relative visit me this weekend and, wouldn’t you know, it definitely seems as though what he had on his mind may have influenced my Tik-Tok feed.

How do you explain that I would suddenly start to see all these videos about woodworking and cutting down trees when that is totally not related to anything I might otherwise be interested in?

But whenever I bring this up as real technology that may be floating around, people think I’m nuts. So the thing I want to do, which is to ask my relative if they’ve been doing any shit with wood of late, would be meaningless. It’s not like they would believe my theory that Tik-Tok (and other Big Tech companies) have developed some sort of practical use for Digital Telepathy.

Screw Neuralink, Give Me A Mindcap

by Shelt Garner
@sheltgarner

I’m joking (not joking), but I still think that it’s at least POSSIBLE that some sort of digital telepathy technology is floating out there in the depths of Big Tech. Which, of course, opens up the possibility of some sort of technology that would do the same shit that Elon Musk wants his Neuralink to do without drilling into anyone’s brain.

In Arthur C. Clarke’s novel “3001” everyone wears a mindreading device called a Braincap, which I think of as a Mindcap. I would much, much rather wear a Mindcap than drill a hole in my head for something like a Neuralink.

And, yet, it seems as though if any form of digital telepathy actually exists, it’s super top secret and never something that would be commercialized for the average person.

Which I think is a shame.

Of Course

by Shelt Garner
@sheltgarner

Now, let me be clear — I’m notorious for taking a little bit of information and running with it. But there is, at least, a scenario whereby Saturday Night Live uses the “hook” of how fucking cold it was in Iowa for the caucuses to have a sketch with Unfrozen Caveman Governor Ron DeSantis.

Makes sense to me, at least. I don’t quite know what I would do if something I predicted actually happened for once. I might faint from excitement and joy.

But I’m really, really grasping at straws on this one. I’m known to make shit up. I suppose only time will tell, huh.

I would get a woman cast member to play “Tiny-D” in caveman make up, given how short he is, relative to how tall he should be to run for POTUS.