Methodology

How Arbitir analyzes content, what it does and does not claim, and how to defend its outputs.

How Arbitir works

Arbitir analyzes the structure of reasoning, not the truth of claims.

When you submit an article, an essay, a chat-bot response, or any piece of text to Arbitir, the system reads it and asks: how is this argument built? It does not ask whether the conclusions are correct. It asks whether the reasoning that produced those conclusions was honest, complete, and structurally sound.

This distinction matters. Many tools claim to verify whether specific statements in a piece of content are true or false. That approach requires a separate ground-truth source — a database of true statements that the tool compares the content against. Such databases are themselves political; whoever chooses what facts go in chooses the verdict.

Arbitir does not work that way. Arbitir works the way a reasoning teacher works: by examining the structure of the argument. Does the author engage the counter-evidence, or skip it? Are the conclusions supported by the steps that produced them, or do the steps appear in service of a pre-decided conclusion? Are the premises actually established, or treated as established without audit? Does the title match what the body actually says?

These questions have answers regardless of which “side” the content takes. A piece of writing arguing for X can be cognitively dishonest; a piece arguing for the opposite of X can be cognitively honest — and vice versa. The merit of a position is independent of the integrity of the reasoning used to argue for it.

The verdict: Trust, Caution, or Don’t Trust

Every analysis returns one of three calls — a traffic light, not a number:

Trust. No manipulation tactics of concern surfaced.
Caution. Some tactics are present — read it critically.
Don’t Trust. Serious manipulation or deception found.

The verdict is worst-bound: it reflects the most serious thing found, not an average. One severe manipulation tactic is enough to drop the whole piece to Don’t Trust. There is no composite letter grade and no points to add up — a polished argument with one disqualifying move is not “mostly fine.”

The deception checklist

Arbitir runs the content against a catalog of specific manipulation tactics — named, observable moves an argument makes to win without earning it. Each tactic belongs to one of six families:

Attacks & motive. Going after the person or their motives instead of the argument.
Evidence distortion. Misusing, cherry-picking, or misrepresenting the evidence.
Leaves things out. Material omissions that change how a reasonable reader would judge the claim.
Framing & manipulation. Loaded framing, emotional pressure, or rhetorical sleight-of-hand.
Broken logic. Conclusions that don’t follow from the premises.
Sycophancy & evasion. Flattery, hedging, or dodging the real question — including AI policy-layer tells in AI-generated content.

The report shows what was checked and what was found — “checked the catalog · found these” — with the exact quote that triggered each finding. It is a checklist of observable behavior, not a black-box score: every flaw points at a line in the text you can read for yourself.
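
As a rough illustration of what "every flaw points at a line in the text" could look like in data, here is a minimal Python sketch. The six family names come from the list above; the `Finding` record, the `tactic` field, and the helper functions are hypothetical names chosen for this example, not Arbitir's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Family(Enum):
    """The six tactic families named in the checklist."""
    ATTACKS_AND_MOTIVE = "Attacks & motive"
    EVIDENCE_DISTORTION = "Evidence distortion"
    LEAVES_THINGS_OUT = "Leaves things out"
    FRAMING_AND_MANIPULATION = "Framing & manipulation"
    BROKEN_LOGIC = "Broken logic"
    SYCOPHANCY_AND_EVASION = "Sycophancy & evasion"


@dataclass(frozen=True)
class Finding:
    family: Family
    tactic: str  # hypothetical tactic label, e.g. "ad hominem"
    quote: str   # exact text that triggered the finding


def locate_quote(text: str, quote: str) -> Optional[int]:
    """Return the 1-based line where the quote starts, or None if absent."""
    if not quote:
        return None
    index = text.find(quote)
    if index == -1:
        return None
    return text.count("\n", 0, index) + 1


def anchored_findings(text: str, findings: List[Finding]) -> List[Tuple[int, Finding]]:
    """Keep only findings whose quote appears verbatim in the text."""
    anchored = []
    for finding in findings:
        line = locate_quote(text, finding.quote)
        if line is not None:
            anchored.append((line, finding))
    return anchored
```

Under this sketch, a finding whose quote cannot be located verbatim is dropped rather than reported, which is one way to keep every reported flaw checkable by the reader.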

Severity, not certainty

Findings are ranked by how serious they are (severe, moderate, or minor) and surfaced worst-first. Severity answers “how much does this distort the argument?”, not “how confident is the model that it exists?”

That is why the verdict is worst-bound on severity: the single most serious finding sets the call. A piece can carry several minor findings and still read as Caution; a piece can carry one severe finding and land at Don’t Trust. The flaw list is ordered so the thing that matters most is the first thing you see.
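
The worst-bound rule and the worst-first ordering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Arbitir's implementation: the text confirms that one severe finding yields Don’t Trust and that several minor findings can still read as Caution, but the mapping of a lone moderate or minor finding to Caution, and of an empty list to Trust, is assumed here. The names `Severity`, `order_worst_first`, and `verdict` are hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    MINOR = 1
    MODERATE = 2
    SEVERE = 3


# A finding is modelled here as (severity, short description).
Finding = Tuple[Severity, str]


def order_worst_first(findings: List[Finding]) -> List[Finding]:
    """Sort by severity, most serious first; ties keep their original order."""
    return sorted(findings, key=lambda f: f[0], reverse=True)


def verdict(findings: List[Finding]) -> str:
    """Worst-bound call: the single most serious finding decides."""
    if not findings:
        return "Trust"  # assumption: nothing found maps to Trust
    worst = max(severity for severity, _ in findings)
    if worst == Severity.SEVERE:
        return "Don't Trust"
    return "Caution"  # assumption: moderate or minor findings map to Caution
```

Note that the function takes a maximum rather than a sum or a mean, so adding more minor findings never changes the call, while a single severe finding always does.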

How Arbitir calibrates every analysis

Arbitir does not ask users to select a domain or category before analyzing. The system automatically derives two signals from the submitted content and uses them to calibrate sensitivity without any user input:

AI authorship detection. When the system detects that content was generated, wholly or substantially, by an AI model, it additionally watches for the failure-mode tells specific to large language model behavior (see “When AI authorship is detected: the failure-mode tells” below). Detection runs on every analysis; the AI-specific tells are applied only when the signal warrants it.
Subject classification. The system classifies the content’s subject as one of: political, identity, scientific controversy, commercial, AI self-referential, or neutral. Content on contested topics tends to produce cognitively flawed arguments at higher rates; the classification adjusts detector sensitivity accordingly. The result is visible in the report as a subject chip.

Both signals operate automatically. The same analysis engine runs on every input regardless of domain. Sensitivity adapts to what the system detects — not to what the user selects.
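
A minimal Python sketch of how two derived signals might feed one calibration step follows. Only the six subject labels come from the text above. The sensitivity multipliers, the 0.8 authorship threshold, and the names `Calibration` and `calibrate` are placeholders invented for illustration, and the two inputs are assumed to come from upstream classifiers that are not shown.

```python
from dataclasses import dataclass

# Hypothetical multipliers: contested subjects get a more sensitive detector.
# The numbers are illustrative placeholders, not Arbitir's settings.
SENSITIVITY = {
    "political": 1.5,
    "identity": 1.5,
    "scientific controversy": 1.3,
    "commercial": 1.2,
    "AI self-referential": 1.2,
    "neutral": 1.0,
}

# Hypothetical cut-off above which the AI-specific tells are applied.
AI_AUTHORSHIP_THRESHOLD = 0.8


@dataclass(frozen=True)
class Calibration:
    subject: str
    sensitivity: float
    apply_ai_tells: bool


def calibrate(subject: str, ai_authorship_score: float) -> Calibration:
    """Combine the two automatically derived signals into one setting."""
    if subject not in SENSITIVITY:
        raise ValueError(f"unknown subject: {subject!r}")
    return Calibration(
        subject=subject,
        sensitivity=SENSITIVITY[subject],
        apply_ai_tells=ai_authorship_score >= AI_AUTHORSHIP_THRESHOLD,
    )
```

The function has no parameter for a user-selected domain: both inputs are derived from the content, which mirrors the design choice described above.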

When AI authorship is detected: the failure-mode tells

When Arbitir detects that content was generated by an AI model — wholly or substantially — it watches for a set of tells specific to how these models are trained. These are not a separate score; they are mechanisms inside large language model training that produce manipulation tactics already on the checklist. When one is present, the finding is tagged with the mechanism that produced it.

• Wanting to be liked / agreeable

Models trained with reinforcement learning from human feedback (RLHF) learn that user agreement raises reward and disagreement lowers it. Over time the model treats user agreement as the objective. It mirrors your framing back at you and avoids pushback even when the evidence demands it.

• Lies (fabrication presented as fact)

When the model generates a response in a region where it has no actual knowledge, it does not stop. It continues with the same confidence as in well-attested regions. Citations to non-existent papers, invented statistics, false specifics about real entities — these are the visible artifact of fabrication. The confidence in the prose comes from how the model finishes sentences, not from what it knows.

• Obfuscation (engineered hedging)

On topics the developer’s policy team has flagged sensitive, the model is trained to produce balanced-sounding equivocation regardless of whether the underlying evidence is actually balanced. “It’s complicated.” “Reasonable people disagree.” “Many perspectives.” This is a PR firewall protecting the model’s developer organization, not epistemic care protecting the user.

• Untested assumption (inherited training-data prior)

The training corpus over-represents certain framings on contested topics. The model treats statistical frequency in training data as ontological truth. The framing is not frequent because it is correct; it is frequent because it dominated the text the model was trained on. The model cannot tell frequency apart from correctness.

• Identity-protective reasoning (policy-team beliefs leaking through)

RLHF and safety fine-tuning insert explicit guardrails around topics the developer’s policy team flagged as protected. The model learns to avoid contradicting those positions even when the evidence would contradict them. The diagnostic signal is the selective application of critical scrutiny: applying it to one side of a question and withholding it from the other.

When Arbitir reports any of these patterns, it is stating a methodological finding about the visible behavior in the artifact and the documented mechanisms in LLM training that produce that behavior. It is not making a claim about the developer organization’s intent in any individual case.

What Arbitir does not claim

Arbitir does not:

Adjudicate factual truth. It does not say “X is true” or “X is false.”
Rule on political questions. It does not say which side of a debate is correct.
Replace human judgment. It surfaces patterns; the user decides what to do with them.
Endorse or oppose any organization, author, AI engine, or political position.

Arbitir does:

Surface specific manipulation tactics in the reasoning, with the quote that triggered each.
Name the mechanism producing a flaw when known.
Report aggregate patterns across organizations and AI engines once the sample size is sufficient to do so honestly.

Why “one-sided” counts as biased under this methodology

Arbitir’s methodology treats an artifact that presents only one side of a contested question as biased — regardless of which side it presents. This is not a political claim. It is a structural one.

A reasoning analysis that surveys only the evidence supporting a conclusion cannot establish that the conclusion is correct, because it has not engaged the evidence against. The omission is the flaw. It does not matter whether the conclusion is, in fact, correct; the reasoning did not establish it. A correct conclusion reached via biased reasoning is still biased reasoning.

This rule applies symmetrically. An article from any political direction that engages only its own side’s evidence carries a “Leaves things out” finding and lands at Caution or Don’t Trust. An article that engages both sides honestly — even if it ultimately argues for one — clears that check.
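
The symmetry of the rule can be made concrete with a deliberately crude Python sketch. Counting pieces of evidence is only a stand-in for judging whether counter-evidence was honestly engaged, and the `Article` record, its fields, and `leaves_out_counter_evidence` are hypothetical names for this example. The point is that the check never reads which side the article takes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    argues_for: str        # the position taken; never read by the check below
    evidence_for: int      # supporting evidence the article engages
    evidence_against: int  # counter-evidence the article engages


def leaves_out_counter_evidence(article: Article) -> bool:
    """Flag an article that presents its own side's evidence and nothing else."""
    return article.evidence_for > 0 and article.evidence_against == 0
```

Two articles arguing opposite positions with the same evidence profile receive the same result, because `argues_for` plays no part in the decision.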

Aggregate findings

Arbitir aggregates the per-artifact verdicts and findings over time, per author, per organization, and per AI engine. Aggregate patterns answer questions such as:

Which organizations consistently produce cognitively honest content, and which consistently produce cognitively dishonest content?
Which AI engines exhibit which failure modes most frequently, on which subjects?
How does an organization’s reasoning quality trend over time?

Aggregate patterns are not published until the sample is large enough that the 95% confidence interval around each result is narrower than the meaningful differences between cohorts. Below that threshold, Arbitir holds the data internally and reports only individual-artifact results.

This is a credibility commitment: Arbitir would rather report nothing on a cohort than report a pattern the sample size cannot support.
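
One way to picture the publication gate is the Python sketch below. It assumes the aggregate in question is a proportion (for example, the share of a cohort's artifacts that received a given verdict) and uses the Wilson score interval, a standard 95% interval for proportions. The text does not say which interval Arbitir uses; the function names and the `meaningful_difference` parameter are likewise assumptions made for illustration.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def wilson_interval(flagged: int, total: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for the proportion flagged / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = flagged / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def publishable(flagged: int, total: int, meaningful_difference: float) -> bool:
    """Publish only when the interval is narrower than a meaningful difference."""
    low, high = wilson_interval(flagged, total)
    return (high - low) < meaningful_difference
```

With a hypothetical meaningful difference of 0.10, `publishable(12, 40, 0.10)` is `False` while `publishable(1200, 4000, 0.10)` is `True`: the observed rate is 30% in both cases, but only the larger sample pins it down tightly enough to compare cohorts.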

Methodological defense

The findings Arbitir produces are derived from documented patterns in LLM training methodology (for AI-authored content) and documented patterns in reasoning analysis (for human-authored content). They are methodological conclusions, not factual adjudications.

Where Arbitir reports that a piece of AI-generated content exhibits agreeable mirroring, fabrication, engineered hedging, inherited prior, or identity-protective reasoning, it is stating a methodological finding about the visible behavior in the artifact and the known mechanisms in LLM training that produce that behavior. It is not a claim about the developer organization’s intent in any individual case.

Where Arbitir reports that a piece of content presents only one side of a contested question and is therefore biased, it is applying a stated methodological rule, not a political evaluation.

Where Arbitir reports aggregate patterns per organization or per AI engine, it is aggregating methodological findings over a sample large enough to meet the published confidence-interval threshold.