Where connoisseurship meets computation
A framework formalising what connoisseurs have done for centuries — examining the physical evidence a painter leaves on canvas — using self-supervised vision transformers, Bayesian inference, and cross-attention fusion.
Upload an artwork for style classification, artist attribution, forgery screening, workshop decomposition, and temporal dating.
Structure-tensor analysis and coherence clustering over patch-level DINOv2 embeddings reveal an artist’s characteristic facture.
Multi-axis classification along period, school, and genre using CLIP embeddings projected through independently trained linear heads.
Cosine-similarity ranking against a reference gallery with Bayesian confidence intervals and temporal cross-validation.
One-class anomaly detection via Mahalanobis distance with per-dimension z-scores, stress-tested against adversarial forgery.
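One-class screening of this kind can be sketched in a few lines: fit the mean and covariance of embeddings from securely attributed works, then score a query by its Mahalanobis distance, with per-dimension z-scores pointing at which features look off. This is a minimal illustration, not the project's implementation; function names and the regularisation constant are assumptions.

```python
import numpy as np

def fit_reference(embeddings, eps=1e-6):
    """Fit mean and regularised inverse covariance of authentic-work embeddings."""
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False) + eps * np.eye(embeddings.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(x, mu, cov_inv):
    """Anomaly score: distance from the authentic cluster, scaled by its spread."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

def per_dim_z_scores(x, mu, sigma):
    """Which embedding dimensions deviate most from the reference distribution."""
    return (x - mu) / sigma
```

A query far from the reference cluster scores high; the z-scores localise the deviation to individual feature dimensions.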
A Dirichlet-process Gaussian mixture model (DPGMM) infers how many distinct hands contributed to a painting, without requiring that number as input.
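The core idea is expressible with scikit-learn's `BayesianGaussianMixture`: give it only an upper bound on components, let the Dirichlet-process prior shrink unused ones, and count the components that keep non-trivial weight. A sketch under assumed parameter values (the weight threshold and component cap are illustrative, not the project's settings):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def infer_hands(patch_embeddings, max_components=10, weight_threshold=0.02, seed=0):
    """Fit a Dirichlet-process GMM over patch embeddings and count active components."""
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,  # an upper bound, not the answer
        weight_concentration_prior_type="dirichlet_process",
        random_state=seed,
    )
    labels = dpgmm.fit_predict(patch_embeddings)
    active = dpgmm.weights_ > weight_threshold  # components the prior did not prune
    return int(active.sum()), labels
```

The per-patch `labels` double as a workshop decomposition map: patches sharing a label are attributed to the same hand.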
Gaussian process regression over dated embeddings estimates when an undated work was most likely produced.
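Temporal dating by GP regression can be illustrated on a one-dimensional style coordinate (e.g. a PCA projection of the embedding, a simplification assumed here): regress year-of-creation onto the coordinate for dated works, then read off a posterior mean and uncertainty for an undated one. Kernel choices below are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_dating_gp(style_coord, years):
    """Regress year-of-creation onto a 1-D style coordinate from dated works."""
    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(np.asarray(style_coord).reshape(-1, 1), years)
    return gp

def estimate_date(gp, x):
    """Posterior mean year and standard deviation for an undated work."""
    mean, std = gp.predict(np.array([[x]]), return_std=True)
    return float(mean[0]), float(std[0])
```

The standard deviation is the useful part: it distinguishes "confidently 1650s" from "anywhere in the century".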
In the 1870s, Giovanni Morelli proposed that attribution should focus on incidental details — the shape of fingernails, the curl of earlobes, the rendering of drapery folds. These peripheral passages, he argued, reveal the artist’s hand more reliably than any consciously composed focal passage.
ArtSleuth formalises the Morellian intuition as a feature-extraction problem: the “incidental details” correspond to the low-level textural features that self-supervised vision transformers encode.
DINOv2 learns visual representations without labelled data via self-distillation. The resulting feature space encodes texture, directionality, and granularity — the physical surface qualities that connoisseurs evaluate.
CLIP encodes images and text in a shared embedding space. Critical for style classification, where categories like “Baroque” are culturally constructed labels.
Standard ViT preprocessing: resize, centre-crop, ImageNet normalisation. Optional art-specific corrections: varnish correction, craquelure suppression, canvas texture normalisation.
Paintings are divided into patches via Grid, Salient, or Adaptive strategies.
Structure tensor eigenvalues yield orientation, coherence, and energy. DINOv2 patch embeddings are clustered to reveal homogeneous brushwork.
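The structure-tensor step can be sketched directly: average the gradient outer products over a patch, then read orientation, coherence, and energy from the 2×2 tensor's eigendecomposition. A minimal version for a grayscale patch (a simplified stand-in for the pipeline's implementation):

```python
import numpy as np

def structure_tensor_features(patch, eps=1e-9):
    """Orientation, coherence, and energy of a grayscale patch via its structure tensor."""
    iy, ix = np.gradient(patch.astype(float))
    jxx, jxy, jyy = (ix * ix).mean(), (ix * iy).mean(), (iy * iy).mean()
    trace = jxx + jyy                                   # energy: total gradient power
    disc = np.sqrt((jxx - jyy) ** 2 + 4 * jxy ** 2)     # eigenvalue gap of the 2x2 tensor
    l1, l2 = (trace + disc) / 2, (trace - disc) / 2
    orientation = 0.5 * np.arctan2(2 * jxy, jxx - jyy)  # dominant stroke direction
    coherence = (l1 - l2) / (l1 + l2 + eps)             # 1 = parallel strokes, 0 = isotropic
    return orientation, coherence, trace
```

High coherence indicates parallel, directional brushwork; low coherence indicates stippling or scumbling. Clustering these features alongside DINOv2 patch embeddings groups regions of homogeneous handling.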
CLIP embeddings are projected through three linear heads (period, school, genre) producing calibrated distributions.
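An independently trained linear head is just a learned projection followed by a temperature-scaled softmax; three of them share the same CLIP embedding but nothing else. A schematic sketch (weights, dimensions, and the temperature mechanism as the calibration step are assumptions for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class LinearHead:
    """One independently trained projection: CLIP embedding -> class distribution."""
    def __init__(self, weight, bias, temperature=1.0):
        self.weight, self.bias, self.temperature = weight, bias, temperature

    def __call__(self, embedding):
        logits = embedding @ self.weight + self.bias
        return softmax(logits / self.temperature)  # temperature calibrates confidence
```

Because the period, school, and genre heads are trained separately, an error along one axis does not contaminate the others.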
Fused embedding compared via temperature-scaled cosine similarity. Mahalanobis distance for anomaly scoring.
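Temperature-scaled cosine ranking against a reference gallery reduces to normalising both sides, taking dot products, and sharpening with a softmax. A sketch with an illustrative temperature (the value 0.07 is a common default, not necessarily the project's):

```python
import numpy as np

def rank_gallery(query, gallery, temperature=0.07):
    """Rank gallery embeddings by temperature-scaled cosine similarity to a query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarities in [-1, 1]
    probs = np.exp(sims / temperature)
    probs /= probs.sum()              # softmax: low temperature sharpens the ranking
    return np.argsort(-sims), probs
```

A lower temperature concentrates probability mass on the nearest matches, which is what turns raw similarities into a usable attribution distribution.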
Cross-attention lets CLIP queries attend over DINOv2 patch tokens. GP with RBF kernel models temporal drift. Bayesian GMM with Dirichlet process prior infers workshop hands.
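The cross-attention fusion step, stripped to a single head, looks like this: project CLIP tokens to queries, DINOv2 patch tokens to keys and values, and let each query gather a weighted mixture of the patches. A single-head NumPy sketch with hypothetical weight matrices (the real module presumably uses multi-head attention in a deep-learning framework):

```python
import numpy as np

def cross_attention(queries, patch_tokens, wq, wk, wv):
    """Single-head cross-attention: query tokens attend over patch tokens."""
    q = queries @ wq            # (n_query, d)
    k = patch_tokens @ wk       # (n_patch, d)
    v = patch_tokens @ wv       # (n_patch, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # each query's weights over patches
    return attn @ v             # (n_query, d) fused representation
```

Each output row blends the semantic query with the textural patch evidence it attends to, which is the mechanism behind the fused embedding compared above.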
| Backbone | Style acc. | Style F1 | Artist top-1 | Artist top-5 | Genre acc. |
|---|---|---|---|---|---|
| DINOv2 ViT-B/14 | 57.5% | 0.553 | 64.7% | 90.9% | 71.0% |
| CLIP ViT-L/14 | 67.1% | 0.656 | 74.6% | 95.9% | 75.0% |
| Fusion (frozen) | 65.0% | 0.633 | 71.0% | 94.2% | 74.2% |
| Fusion (fine-tuned) † | 71.6% | 0.703 | 77.8% | 96.2% | 75.1% |
| Fusion (end-to-end) † | 72.7% | — | 79.0% | 96.9% | 76.6% |
WikiArt, 81,444 images; all metrics macro-averaged. The top three rows are reproducible via `benchmarks/wikiart.py`. † Separate training run; training code not included. Forgery validation across 125 artists: mean AUC 0.958 (CLIP), 0.873 (DINOv2), 0.897 (fused).