TensorGuard-Lite: Auditing Sovereign AI Claims Through Gradient-Based Model Provenance
Adarsh Mishra
# TensorGuard-Lite
## Problem
* There is no reliable way to verify whether "Sovereign AI" models are genuinely indigenous or are fine-tuned foreign models presented as original work ("open-washing").
* Lack of accountability in state-funded AI compute programs.
## Solution
* White-box AI provenance auditor for open-weight LLMs.
* Uses gradient fingerprints, tokenizer analysis, and governance scoring.
* Runs on a Google Colab T4 GPU (16 GB VRAM).
## Core Method
### 1. Gradient Fingerprinting
* Extracts deterministic 16-dimensional fingerprints.
* Uses attention, FFN, embedding, and structural features.
* Compares models using cosine similarity and Euclidean distance.
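The comparison step can be illustrated with a minimal sketch. It assumes each model has already been reduced to a 16-dimensional fingerprint vector; the function names and the placeholder vectors below are hypothetical and are not taken from `tensorguard_lite_colab.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length fingerprint vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length fingerprint vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Placeholder 16-dimensional fingerprints (illustrative values, not audit output).
fp_base = [0.1 * (i + 1) for i in range(16)]
fp_derived = [v + 0.01 for v in fp_base]    # small perturbation of the base
fp_unrelated = [1.6 - v for v in fp_base]   # reversed feature profile

print("derived:  ", cosine_similarity(fp_base, fp_derived),
      euclidean_distance(fp_base, fp_derived))
print("unrelated:", cosine_similarity(fp_base, fp_unrelated),
      euclidean_distance(fp_base, fp_unrelated))
```

Reporting both metrics is useful because cosine similarity ignores vector magnitude while Euclidean distance does not, so a pair that looks close under one metric but not the other deserves a closer look.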
### 2. Token Tax Analysis
* Measures tokenizer fertility across:
  * English
  * Hindi
  * Tamil
* Estimates token inflation and attention-cost overhead.
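A rough sketch of how these quantities could be computed follows. It assumes fertility means tokens per whitespace-delimited word, that inflation is the ratio of a language's fertility to English fertility, and that attention cost grows quadratically with sequence length; the project may define these differently. `toy_tokenize` is a hypothetical stand-in that cuts each word into chunks of at most 4 UTF-8 bytes, used only so the example runs without downloading a model.

```python
def fertility(tokenize, text):
    """Average number of tokens per whitespace-delimited word."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenize(text)) / len(words)


def token_tax(tokenize, text, reference_text):
    """Return (inflation, attention_overhead) relative to a reference text."""
    inflation = fertility(tokenize, text) / fertility(tokenize, reference_text)
    # Assumption: self-attention cost scales with the square of sequence length,
    # so a k-times longer token sequence costs roughly k**2 times as much.
    return inflation, inflation ** 2


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: chunks of at most 4 UTF-8 bytes per word."""
    tokens = []
    for word in text.split():
        raw = word.encode("utf-8")
        tokens.extend(raw[i:i + 4] for i in range(0, len(raw), 4))
    return tokens


english = "this is a model"
hindi = "यह एक मॉडल है"  # the same sentence in Hindi

inflation, overhead = token_tax(toy_tokenize, hindi, english)
print(f"token inflation: {inflation:.2f}x, attention overhead: {overhead:.2f}x")
```

To measure a real model, the `tokenize` method of a Hugging Face tokenizer could be passed in place of `toy_tokenize`, together with a parallel corpus rather than a single sentence.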
### 3. Governance Scorecard
* 10 transparency dimensions.
* Generates risk level:
  * Low
  * Medium
  * High
  * Unverifiable
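One way such a scorecard could map dimension scores to a risk level is sketched below. The dimension names, the 0-to-1 scoring scale, the thresholds, and the rule for returning "Unverifiable" are all assumptions made for illustration; the source does not specify them.

```python
# Hypothetical placeholder names for the 10 transparency dimensions.
DIMENSIONS = [f"dimension_{i:02d}" for i in range(1, 11)]


def risk_level(scores, min_known=5, low_cutoff=0.7, medium_cutoff=0.4):
    """Map per-dimension transparency scores to a risk level.

    scores: dict of dimension name -> float in [0, 1], or None when the
    information needed to score that dimension is not available.
    All thresholds are illustrative assumptions.
    """
    known = [s for s in scores.values() if s is not None]
    # Too little evidence to judge either way.
    if len(known) < min_known:
        return "Unverifiable"
    # Unknown dimensions contribute 0, so missing disclosure lowers transparency.
    transparency = sum(known) / len(scores)
    if transparency >= low_cutoff:
        return "Low"
    if transparency >= medium_cutoff:
        return "Medium"
    return "High"


fully_open = {name: 1.0 for name in DIMENSIONS}
partly_open = {name: 0.5 for name in DIMENSIONS}
closed = {name: None for name in DIMENSIONS}

print(risk_level(fully_open), risk_level(partly_open), risk_level(closed))
```

Treating "Unverifiable" as a separate outcome, rather than folding it into "High", keeps the distinction between a model that scores poorly and one that cannot be assessed at all.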
## Models Audited
* Llama-3.2-1B
* SmolLM2-1.7B-Instruct
* Qwen2.5-1.5B
* DeepSeek-R1-Distill-Qwen-1.5B
## Key Features
* Training-free auditing
* Deterministic fingerprints
* PCA lineage clustering
* Cosine + Euclidean similarity
* Tokenizer Jaccard similarity
* Governance transparency scoring
* Interactive Gradio dashboard
* JSON / CSV / LaTeX / PNG / SVG exports
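Of the features above, tokenizer Jaccard similarity is the simplest to illustrate. The sketch below assumes each tokenizer is represented by the set of its vocabulary strings; the toy vocabularies are invented for the example and do not come from any audited model.

```python
def jaccard_similarity(vocab_a, vocab_b):
    """Size of the vocabulary intersection divided by the size of the union."""
    set_a, set_b = set(vocab_a), set(vocab_b)
    union = set_a | set_b
    if not union:
        raise ValueError("at least one vocabulary must be non-empty")
    return len(set_a & set_b) / len(union)


# Toy vocabularies (illustrative only).
vocab_parent = {"the", "ing", "tion", "model", "##s"}
vocab_child = {"the", "ing", "tion", "model", "<think>"}
vocab_other = {"der", "die", "das", "model"}

print(jaccard_similarity(vocab_parent, vocab_child))
print(jaccard_similarity(vocab_parent, vocab_other))
```

With Hugging Face tokenizers, the keys of `tokenizer.get_vocab()` could serve as the input sets. A score near 1 suggests a shared or lightly extended vocabulary, which is a hint of common lineage rather than proof of it.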
## Key Findings
* DeepSeek-R1-Distill-Qwen-1.5B clusters near Qwen2.5-1.5B, which is consistent with its Qwen base and supports the use of fingerprints for lineage detection.
* API-only models remain difficult to verify.
* Indic languages show significantly higher tokenization costs than English.
## Limitations
* Requires safetensors / white-box access.
* Similar architectures may create false positives.
* The tokenizer evaluation corpus is small.
* Best suited for models ≤3B parameters.
## Policy Proposal
**Compute-Conditional Disclosure Policy**
Organizations receiving public AI compute subsidies should disclose:
* Data lineage proofs
* Tokenizer vocabulary
* Model weights for regulators
* Architecture documentation
* Safety evaluation results
## Implementation
* `tensorguard_lite_colab.py`
* `TensorGuard_Lite_Colab.ipynb`
* Google Colab (T4 GPU)
* GPL v3 License
## Repository
https://github.com/Adarsh-Me/Sovergien
## Cite this work
@misc{
  title={(HckPrj) TensorGuard-Lite: Auditing Sovereign AI Claims Through Gradient-Based Model Provenance},
  author={Adarsh Mishra},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


