KoRA is a novel Parameter-Efficient Fine-Tuning (PEFT) strategy that introduces inter-adapter communication to learn robust, generalizable representations that transfer across domains — addressing a key limitation in methods like LoRA.
- 🎯 The Problem: The Brittleness of Specialization
- 💡 The KoRA Solution: From Specialists to a Coordinated Team
- 🔧 Architecture: How It Works
- 📊 Experimental Results
- 🚀 Getting Started
- 🗺️ Roadmap & Future Work
Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have been revolutionary — allowing us to adapt massive pre-trained models to specific tasks at a fraction of the cost.
However, this efficiency comes with a hidden price: brittleness.
LoRA injects small, independent adapters into each layer.
While highly effective for a single task, these adapters often overfit, learning features that don’t generalize well.
When a LoRA adapter trained on one task is transferred to another (especially across domains), performance drops significantly.
| Method | Source Task (CIFAR-100) | Transfer Task (Tiny ImageNet) |
|---|---|---|
| LoRA | 92.48% (Excellent Specialization) | 71.04% (Poor Generalization) |
| KoRA | 83.96% (Controlled Specialization) | 97.37% (Superior Generalization) |
❓ Can we create an adapter that learns fundamental, transferable knowledge, even if it sacrifices a few points on the source task?
The key limitation of LoRA is isolation — the query, key, and value adapters in a transformer block never communicate.
KoRA changes this paradigm.
Inspired by the Kolmogorov–Arnold Representation Theorem (which states that complex functions can be decomposed into compositions of simpler ones), KoRA introduces a learnable CompositionBlock.
- LoRA: A team of three brilliant specialists (Query, Key, Value) who never talk.
- KoRA: The same team, but they discuss their findings with a manager (the CompositionBlock), who synthesizes their insights into a unified decision.
This CompositionBlock creates functional dependency between adapters, forcing them to learn a shared, compositional representation of the task.
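For reference, the theorem's statement (whose inner ψ and outer Φ functions presumably inspired the "Couplings" and "Composers" in the diagram further below) is:

$$
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \psi_{q,p}(x_p)\right)
$$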
The architecture is a simple but powerful extension of LoRA.
For input x, the standard LoRA adapters first compute low-rank deltas for Query, Key, and Value: δ_q = B_q A_q x, δ_k = B_k A_k x, δ_v = B_v A_v x.
Instead of applying them directly, KoRA concatenates the three deltas and passes them through the CompositionBlock (an MLP): δ_comp = MLP([δ_q; δ_k; δ_v]).
This single composed delta is then applied to the value projection through a learnable gate g: V_new = V_original + g · δ_comp.
The gate g starts at zero, ensuring optimization stability and allowing the model to gradually "turn up the volume" on the compositional signal.
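For concreteness, here is a minimal PyTorch sketch of this mechanism. The class name `KoRAAdapter`, the ReLU inside the CompositionBlock, and the use of `d_comp * r` as the MLP's hidden width are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class KoRAAdapter(nn.Module):
    """Illustrative KoRA adapter for one attention block (not the official code).

    Three low-rank (LoRA-style) branches for Q, K, and V feed a shared
    CompositionBlock MLP; the composed delta is scaled by a learnable gate
    that starts at zero.
    """
    def __init__(self, d_model: int, r: int = 8, d_comp: int = 4):
        super().__init__()
        # One (A, B) low-rank pair per projection, exactly as in LoRA.
        self.A = nn.ModuleDict({k: nn.Linear(d_model, r, bias=False) for k in "qkv"})
        self.B = nn.ModuleDict({k: nn.Linear(r, d_model, bias=False) for k in "qkv"})
        for k in "qkv":
            nn.init.zeros_(self.B[k].weight)       # LoRA convention: B starts at zero
        # CompositionBlock: mixes the three deltas into a single update.
        hidden = d_comp * r                        # assumption about what d_comp controls
        self.compose = nn.Sequential(
            nn.Linear(3 * d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, d_model),
        )
        self.gate = nn.Parameter(torch.zeros(1))   # g = 0 at initialization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model), the input captured before the QKV projection.
        deltas = [self.B[k](self.A[k](x)) for k in "qkv"]      # δ_q, δ_k, δ_v
        delta_comp = self.compose(torch.cat(deltas, dim=-1))   # δ_comp
        return self.gate * delta_comp                          # caller adds this to V_original
```

Only the low-rank pairs, the CompositionBlock, and the gate are trained; the backbone stays frozen. The first diagram below traces this data flow inside one adapter module.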
flowchart TB
subgraph KoRA_Adapter_Module
direction TB
IN([Input x]) --> Q_ADAPTER["Q-Adapter (LoRA)"]
IN --> K_ADAPTER["K-Adapter (LoRA)"]
IN --> V_ADAPTER["V-Adapter (LoRA)"]
Q_ADAPTER --> DELTA_Q("δ_q")
K_ADAPTER --> DELTA_K("δ_k")
V_ADAPTER --> DELTA_V("δ_v")
DELTA_Q --> CONCAT["Concat [δ_q, δ_k, δ_v]"]
DELTA_K --> CONCAT
DELTA_V --> CONCAT
CONCAT --> COMPOSE_BLOCK["CompositionBlock (MLP)"]
COMPOSE_BLOCK --> DELTA_COMP("δ_comp")
end
subgraph Applying_Update
direction TB
DELTA_COMP --> GATE_MULT["⊗ (Multiply)"]
GATE["g (Learnable Gate)"] --> GATE_MULT
V_ORIG["V_original"] --> ADD["⊕ (Add)"]
GATE_MULT --> ADD
ADD --> V_NEW["V_new"]
end
IN --> V_ORIG
The second diagram contrasts standard LoRA (a) with the KoRA pipeline (b) at the block level:
flowchart LR
subgraph LORA["a) Low-Rank Adaptation (LoRA)"]
direction TB
IN_L["Input Embeddings (tokens x d)"] --> ORIG_L["Original Weights (Linear Layer)"]
ORIG_L --> OUT_L["Output Embeddings (tokens x d)"]
IN_L --> A_LoRA["Down Projection: A [r x d]"]
A_LoRA --> B_LoRA["Up Projection: B [d x r]"]
B_LoRA --> DELTA_L["Delta (tokens x d)"]
DELTA_L --> SCALE_L["Scale by alpha (α)"]
SCALE_L --> ADD_L["Elementwise Add (⊕)"]
ADD_L --> OUT_L
end
subgraph KORA["b) KoRA (this work)"]
direction TB
IN_K["Input Embeddings (tokens x d)"] --> BACKBONE["ViT Backbone (Frozen)"]
BACKBONE --> OUT_K["Backbone Outputs (tokens x d)"]
BACKBONE -.-> CAPTURE["Captured Inputs (via forward hooks)"]
CAPTURE --> ADAPTERS["Adapter Banks {Ai, Bi} (low-rank)"]
ADAPTERS --> PROJ["Adapter Projections (compose to r–d representations)"]
PROJ --> PSI["Psi (Ψ): Couplings"]
PROJ --> PHI["Phi (Φ): Composers"]
PSI --> COMPOSED["Composed Delta (Delta_compose)"]
PHI --> COMPOSED
COMPOSED --> SCALE_K["Scale by alpha (α)"]
SCALE_K --> ADD_K["Add to Backbone Outputs (⊕)"]
ADD_K --> OUT_K
end
LORA --- KORA
Fine-tuned for 5 epochs on CIFAR-100.
| Tuning Method | Params Tuned (%) | CKA Sim. | Accuracy (%) | F1 Score |
|---|---|---|---|---|
| LoRA (r=8) | 1.45 | 0.73 | 92.48 | 0.924 |
| Adapter Fusion | 1.45 | 0.71 | 92.22 | 0.922 |
| KoRA (d_comp=4) | 1.80 | 0.76 | 83.96 | 0.840 |
| KoRA (d_comp=8) | 2.18 | 0.76 | 84.19 | 0.842 |
🧠 Analysis:
LoRA dominates on single-task performance.
KoRA shows higher CKA similarity, suggesting more structured representations, even though source-task accuracy drops by several points, a deliberate trade-off in favor of generalization.
Models pre-trained on CIFAR-100, fine-tuned for one epoch on 1% of Tiny ImageNet.
| Tuning Method | Accuracy (%) | F1 Score |
|---|---|---|
| LoRA (r=8) | 71.04 | 0.8307 |
| Adapter Fusion | 46.67 | 0.6364 |
| KoRA (d_comp=4) | 97.37 | 0.9867 |
| KoRA (d_comp=8) | 98.24 | 0.9911 |
🚀 Analysis:
KoRA’s compositional approach achieves near-perfect transfer — dramatically outperforming LoRA.
This suggests that KoRA learns more robust, transferable features.
Preliminary CKA (Centered Kernel Alignment) shows that KoRA learns more structured, cross-layer dependencies.
Figure 2: CKA similarity matrices for KoRA (left) and LoRA (right) between blocks 0, 6, and 11.
KoRA maintains higher similarity between distant layers (0.50 vs 0.45), indicating a more unified feature hierarchy.
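The block-to-block similarities in Figure 2 can be reproduced with standard linear CKA (Kornblith et al., 2019). The function below is a generic sketch, not the exact evaluation protocol used for the figure:

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, features)."""
    x = x - x.mean(dim=0, keepdim=True)   # center each feature dimension
    y = y - y.mean(dim=0, keepdim=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = torch.linalg.norm(y.T @ x) ** 2
    den = torch.linalg.norm(x.T @ x) * torch.linalg.norm(y.T @ y)
    return (num / den).item()

# e.g. compare [CLS]-token activations captured from blocks 0 and 11:
# similarity = linear_cka(acts_block0, acts_block11)
```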
Models pre-trained on CIFAR-100, fine-tuned for one epoch on 1% of NYU Depth V2 (monocular depth estimation).
| Method | RMSE ↓ | AbsRel ↓ |
|---|---|---|
| LoRA | 0.2800 | 0.5629 |
| KoRA | 0.3327 | 0.6271 |
📉 Analysis:
LoRA performs better here — showing KoRA’s current bias towards classification tasks.
Future work will adapt KoRA for dense spatial reasoning.
- Python 3.8+
- PyTorch 2.0+
- Transformers
- timm
git clone https://github.com/onepunchmonk/kora.git
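After cloning, a minimal training setup might look like the sketch below, which reuses the illustrative `KoRAAdapter` from the Architecture section and injects it into timm's fused `qkv` projection via forward hooks (as in the KoRA diagram). Every name here is an assumption for illustration, not the repository's actual API:

```python
import timm
import torch

# KoRAAdapter is the illustrative class sketched in the Architecture section above.

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=100)
for p in model.parameters():
    p.requires_grad = False               # freeze the ViT backbone
for p in model.head.parameters():
    p.requires_grad = True                # keep the classifier head trainable

dim = model.embed_dim
adapters = torch.nn.ModuleList()

def attach_kora(block):
    adapter = KoRAAdapter(dim, r=8, d_comp=4)
    adapters.append(adapter)

    def hook(module, inputs, output):
        # timm's ViT uses one fused qkv Linear; add the gated composed delta
        # to the value slice only, matching V_new = V_original + g * δ_comp.
        delta = adapter(inputs[0])                      # (batch, tokens, dim)
        zeros = torch.zeros_like(delta)
        return output + torch.cat([zeros, zeros, delta], dim=-1)

    block.attn.qkv.register_forward_hook(hook)

for blk in model.blocks:
    attach_kora(blk)

# Only the KoRA parameters and the head are optimized.
optimizer = torch.optim.AdamW(
    [{"params": adapters.parameters()}, {"params": model.head.parameters()}], lr=1e-3
)
```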
The goal is to validate and refine KoRA to establish compositional adaptation as a superior approach for out-of-domain generalization.
- Explore attention-based inter-adapter communication
- Experiment with different basis functions for adapter projections
- Extend evaluation to Object Detection (MS COCO)
- Extend evaluation to Semantic Segmentation (MS COCO & PASCAL VOC)
- Conduct ablation studies on CompositionBlock complexity and gate impact
- Investigate dense prediction performance further
- Perform full-layer CKA analysis
- Study the Kolmogorov complexity connection by comparing parameter compressibility of KoRA vs. LoRA (a rough proxy is sketched below)
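One crude way to probe that last item is to compare how well the trained adapter weights compress. The sketch below (zlib over float16-serialized parameters) is only a rough proxy for compressibility, not an established protocol:

```python
import io
import zlib
import torch

def compression_ratio(module: torch.nn.Module) -> float:
    """Compressed size / raw size of a module's parameters (lower = more compressible)."""
    flat = torch.cat([p.detach().flatten() for p in module.parameters()])
    buf = io.BytesIO()
    torch.save(flat.to(torch.float16), buf)   # float16 trims mantissa noise
    raw = buf.getvalue()
    return len(zlib.compress(raw, level=9)) / len(raw)

# e.g. compare trained adapter banks:
# print(compression_ratio(kora_adapters), compression_ratio(lora_adapters))
```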
🧭 Summary:
KoRA reframes fine-tuning as a cooperative, compositional process rather than a set of isolated perturbations, trading a few points of source-task accuracy for out-of-domain transfer well beyond current PEFT baselines.



