Abstract
This paper presents WIBA 2.0 (What Is Being Argued), a comprehensive framework for training and deploying language models specialized in argumentation mining. We describe the complete pipeline from dataset organization through DSPy-based reasoning generation, supervised fine-tuning with LoRA adapters, optional reinforcement learning refinement via GRPO, and final model merging for vLLM deployment. The framework draws on formal argumentation theory from multiple academic sources, including Walton's argumentation schemes, Toulmin's warrant model, and Pollock's account of defeasible reasoning. Our approach produces vLLM-compatible models capable of hierarchical argument analysis with structured JSON outputs enforced through guided decoding.
Keywords: Argumentation Mining, Large Language Models, DSPy, LoRA Fine-tuning, vLLM, Stance Classification, Topic Extraction
1. Introduction
Argumentation mining, the automatic identification and analysis of argumentative structures in text, represents a critical capability for applications ranging from fact-checking to deliberative democracy platforms. This technical report documents the complete methodology for creating the WIBA argument analysis model, a vLLM-compatible system that performs five interrelated tasks:
- Argument Detection: Binary classification determining whether text contains an argument (claim + premises) or not
- Topic Extraction: Hierarchical identification of fine-grained and broad topics being argued
- Stance Classification: Determining the position (Favor/Against) taken toward identified topics
- Argument Scheme Classification: Determining the argumentation scheme a text uses to put forward a perspective
- Argument Type Classification: Determining the type of argument that is being made
The distinguishing features of our approach include:
- A multi-stage DSPy pipeline grounded in formal argumentation theory for human-in-the-loop data augmentation and labeling
- Hierarchical topic and stance modeling (fine and broad granularity)
- Optimized deployment via vLLM with guided JSON decoding
2. Theoretical Foundations
2.1 Formal Argumentation Framework
Our approach synthesizes several theoretical frameworks from the argumentation literature:
Walton's Argumentation Schemes (Walton, Reed & Macagno, 2008): We implement 11 classical argument scheme patterns including:
- argument_from_authority: Citing expert testimony
- argument_from_analogy: Reasoning from similarity
- causal_argument: Claiming causal connections
- argument_from_consequences: Arguing based on outcomes
- practical_reasoning: Means-end reasoning
- moral_argument: Value-based reasoning
Toulmin Model (Toulmin, 2003): Our system extracts and reconstructs warrants—the general rules licensing inferences from premises to conclusions. Warrants may be explicit or implicit, requiring reconstruction when not stated.
Defeasible Reasoning (Pollock, 1987): We analyze defeasible justification structures, identifying epistemic hedges, modal qualifiers, and potential defeaters (undercutting and rebutting) as additional argumentation indicators.
2.2 Task Definitions
Following Mohammad et al. (2016) for stance detection conventions:
Definition 1 (Argument): A text contains an argument if and only if it includes at least one claim (conclusion/assertion) AND at least one premise (evidence/reasoning supporting the claim).
Definition 2 (Hierarchical Topics):
- topic_fine: The specific issue being argued (e.g., "vaccine mandates")
- topic_broad: The broader policy domain (e.g., "Healthcare")
Definition 3 (Hierarchical Stance):
- stance_fine: Position on the specific topic (Favor | Against | NoArgument)
- stance_broad: Position on the broader policy area (may differ from stance_fine)
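A hypothetical annotated example (values are ours for illustration, not drawn from the training corpus) shows how the two stance levels can diverge when an author opposes a specific instrument while favoring the broader policy goal:

```python
# Illustrative annotation following Definitions 1-3; the text and all
# label values here are invented for exposition.
example = {
    "text": "This carbon tax design punishes rural drivers; regulators "
            "should pursue emissions caps instead.",
    "label": "Argument",
    "topic_fine": "carbon tax",                 # specific issue argued
    "topic_broad": "Environmental Regulation",  # broader policy domain
    "stance_fine": "Against",  # opposes the specific instrument
    "stance_broad": "Favor",   # supports regulation in general
}
```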
3. Dataset Organization
3.1 Training Data Sources
The primary training data derives from multiple annotated corpora:
UKP Lima Training Data: Primary source containing sentence-level annotations with columns for sentence, annotation, topic (broad), wiba_topics (fine), and wiba_stance. Binary label mapping: "NoArgument" preserved, all others mapped to "Argument".
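The binary label mapping described above can be sketched as a one-line function (the annotation tag names other than "NoArgument" vary by corpus, so the mapping is deliberately catch-all):

```python
def to_binary_label(annotation: str) -> str:
    """Map fine-grained corpus annotations to the binary scheme:
    'NoArgument' is preserved; every other tag becomes 'Argument'."""
    return "NoArgument" if annotation == "NoArgument" else "Argument"
```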
IBM ArgQual Dataset: Used for validation, filtered for test=True. Provides additional NoArgument examples.
3.2 Hierarchical Annotation Structure
Each training example contains:
{
  'text': str,          // Input text
  'label': str,         // 'Argument' | 'NoArgument'
  'topic_fine': str,    // Specific topic (1-3 words)
  'topic_broad': str,   // Policy domain
  'stance_fine': str,   // 'Favor' | 'Against' | 'NoArgument'
  'stance_broad': str   // May differ from stance_fine
}
4. DSPy-Based Reasoning Generation
4.1 Architecture Overview
The system employs a "Separated Concerns Architecture" implementing a multi-stage DSPy pipeline. Each stage is defined as a DSPy Signature with typed input/output fields.
4.2 Stage 1: Structured Argument Analysis
The StructuredArgumentAnalysis signature performs initial decomposition:
Discourse-Level Analysis: arguer, epistemic_stance, speech_act
Claim Analysis: claim_text (JSON list), claim_type (factual | evaluative | policy | definitional | causal | comparative)
Premise Analysis: premises (JSON list), premise_types per Walton's taxonomy
Warrant Analysis: argument_scheme, warrant_explicit, warrant_reconstruction
4.3 Stages 2-3: Verification and Evaluation
The ArgumentConstructionVerification signature applies formal validity checks including premise-claim independence, inference validity, proof burden assessment, and defeater handling.
The FormalArgumentEvaluation signature produces final classifications with gate check logic: IF claim_text has ≥1 element AND premises has ≥1 element, then is_argument = "Argument".
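The gate check is a direct encoding of Definition 1 and can be sketched as a small predicate:

```python
def gate_check(claim_text: list[str], premises: list[str]) -> str:
    """Definition 1 as a gate: a text is an 'Argument' if and only if
    it contains at least one claim AND at least one premise."""
    if len(claim_text) >= 1 and len(premises) >= 1:
        return "Argument"
    return "NoArgument"
```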
4.4 Stages 4-7: Topic, Stance, and Synthesis
Subsequent stages handle topic extraction, stance reasoning chains, broad stance mapping, and final hierarchical synthesis with consistency validation.
5. Training Data Formatting
5.1 Schema Types
Three output schemas are supported (detect, comprehensive, and full); the first two are shown below:
DETECT Schema (Minimal):
{"is_argument": true, "confidence": 0.95}
COMPREHENSIVE Schema (Core Fields):
{
  "is_argument": true,
  "claims": ["Claim text here"],
  "premises": ["Premise text here"],
  "topic_fine": "specific topic",
  "topic_broad": "Policy Domain",
  "stance_fine": "Favor",
  "stance_broad": "Favor",
  "argument_type": "Inductive",
  "argument_scheme": "argument_from_example",
  "confidence": 0.92
}
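Even with guided decoding enforcing structure at generation time, a lightweight post-hoc check is useful when consuming model outputs. A minimal validator for the COMPREHENSIVE schema, using only the standard library (field names and allowed values are taken from the schema and the task definitions in Section 2.2):

```python
import json

# Allowed stance values per Definition 3.
STANCES = {"Favor", "Against", "NoArgument"}
REQUIRED = {"is_argument", "claims", "premises", "topic_fine",
            "topic_broad", "stance_fine", "stance_broad",
            "argument_type", "argument_scheme", "confidence"}

def validate_comprehensive(raw: str) -> dict:
    """Parse a model response and check it against the COMPREHENSIVE
    schema; raises ValueError on any deviation."""
    out = json.loads(raw)
    missing = REQUIRED - out.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if out["stance_fine"] not in STANCES or out["stance_broad"] not in STANCES:
        raise ValueError("invalid stance value")
    if not 0.0 <= out["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return out
```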
6. Fine-Tuning Pipeline
6.1 Base Model Selection
The framework supports multiple Qwen model variants including Qwen2.5-3B-Instruct (default), Qwen3-4B-Instruct-2507, and Qwen3-8B.
6.2 LoRA Configuration
Parameter-Efficient Fine-Tuning via Low-Rank Adaptation with r=32, lora_alpha=64, targeting attention layers (q_proj, k_proj, v_proj, o_proj) and FFN layers (gate_proj, up_proj, down_proj).
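The configuration above maps directly onto a PEFT LoraConfig; a configuration sketch follows, where task_type is an assumption for a standard causal-LM fine-tuning setup:

```python
from peft import LoraConfig

# Mirrors Section 6.2; dropout shown for reasoning mode (0.15 in
# non-reasoning mode, per Section 6.3).
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.25,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention layers
        "gate_proj", "up_proj", "down_proj",      # FFN layers
    ],
    task_type="CAUSAL_LM",
)
```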
6.3 Training Hyperparameters
| Parameter | Reasoning Mode | Non-Reasoning Mode |
|---|---|---|
| Epochs | 7 | 3 |
| Learning Rate | 1e-4 | 5e-5 |
| LoRA Dropout | 0.25 | 0.15 |
7. Model Deployment
7.1 Merging Process
The merge_and_save_for_vllm() function prepares the model by loading the base model in FP16, merging LoRA adapters, and saving the merged model and tokenizer.
7.2 vLLM Deployment
The --guided-decoding-backend outlines flag enables JSON schema enforcement at generation time, ensuring outputs conform to expected structure without post-processing failures.
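A deployment sketch under assumed names (the checkpoint path ./wiba-merged is a placeholder for the output of merge_and_save_for_vllm(); guided_json is vLLM's OpenAI-API extension for schema-constrained decoding):

```shell
# Serve the merged checkpoint with the outlines guided-decoding backend.
vllm serve ./wiba-merged --guided-decoding-backend outlines

# Request a schema-constrained completion (DETECT schema shown).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./wiba-merged",
        "messages": [{"role": "user", "content": "Analyze: Vaccines save lives because they prevent disease."}],
        "guided_json": {
          "type": "object",
          "properties": {
            "is_argument": {"type": "boolean"},
            "confidence": {"type": "number"}
          },
          "required": ["is_argument", "confidence"]
        }
      }'
```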
8. Evaluation Methodology
8.1 Topic Evaluation: BILUO Sequence Tagging
Topics are evaluated using sequence labeling methodology with span finding, token alignment, BILUO to BIO conversion, and token-level F1 calculation.
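The tagging and conversion steps can be sketched in plain Python (exact-match alignment is a simplification; the span label TOPIC is our naming):

```python
def biluo_tags(tokens: list[str], span: list[str]) -> list[str]:
    """Tag tokens with BILUO labels for the first occurrence of the
    predicted topic span (exact token match); 'O' everywhere else."""
    tags = ["O"] * len(tokens)
    n = len(span)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == span:
            if n == 1:
                tags[i] = "U-TOPIC"          # Unit-length span
            else:
                tags[i] = "B-TOPIC"          # Begin
                for j in range(i + 1, i + n - 1):
                    tags[j] = "I-TOPIC"      # Inside
                tags[i + n - 1] = "L-TOPIC"  # Last
            break
    return tags

def biluo_to_bio(tags: list[str]) -> list[str]:
    """Collapse BILUO to BIO: U- becomes B-, L- becomes I-."""
    return [t.replace("U-", "B-").replace("L-", "I-") for t in tags]
```

Token-level F1 is then computed over the resulting BIO sequences for gold versus predicted topic spans.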
8.2 Task-Specific Metrics
| Task | Primary Metric |
|---|---|
| DETECT | F1 (is_argument) |
| EXTRACT | BERTScore (topic similarity) |
| STANCE | F1 (stance classification) |
9. Conclusion
This paper has presented the complete methodology for creating WIBA, a vLLM-compatible model for argument detection, topic extraction, stance classification, and argument scheme and type classification. Key contributions include:
- Theoretically-Grounded Architecture: Multi-stage DSPy pipeline implementing formal argumentation theory from Walton, Toulmin, Prakken, and Pollock.
- Hierarchical Analysis: Fine and broad granularity for both topics and stances, enabling nuanced argument understanding.
- Flexible Output Schemas: Three schema types (detect, comprehensive, full) supporting different deployment requirements.
- Optimized Training Pipeline: LoRA fine-tuning with auxiliary heads for clean gradients, optional GRPO refinement with asymmetric reward shaping.
- Production-Ready Deployment: Automated merging and vLLM deployment with guided JSON decoding for reliable structured outputs.
References
- Christiano, P. F., et al. (2017). Deep reinforcement learning from human preferences. NeurIPS.
- Copi, I. M., et al. (2016). Introduction to Logic (14th ed.).
- Dung, P. M. (1995). On the acceptability of arguments. Artificial Intelligence.
- Gordon, T. F., & Walton, D. (2009). Proof burdens and standards. Argumentation in AI and Law.
- Mohammad, S., et al. (2016). SemEval-2016 Task 6: Detecting stance in tweets. SemEval.
- Pollock, J. L. (1987). Defeasible reasoning. Cognitive Science.
- Prakken, H. (2010). An abstract framework for argumentation. Argument & Computation.
- Toulmin, S. E. (2003). The Uses of Argument (Updated ed.).
- Walton, D., Reed, C., & Macagno, F. (2008). Argumentation Schemes.