Computational Art Analysis

ArtSleuth

Where connoisseurship meets computation

A framework formalising what connoisseurs have done for centuries — examining the physical evidence a painter leaves on canvas — using self-supervised vision transformers, Bayesian inference, and cross-attention fusion.

Python 3.10+ · PyTorch · DINOv2 + CLIP · Cross-Attention · Gaussian Process · Bayesian Mixture · MCP Server · MIT Licence
Live Demo

Analyse Any Painting

Upload an artwork for style classification, artist attribution, forgery screening, workshop decomposition, and temporal dating.

CPU inference — first analysis may take a moment while models load
Capabilities

Six Analytical Axes

Brushstroke Analysis

Structure-tensor orientation and coherence maps, combined with clustering of patch-level DINOv2 embeddings, reveal an artist's characteristic facture.

Structure Tensor · DINOv2 · K-Means

Style Classification

Multi-axis classification along period, school, and genre using CLIP embeddings projected through independently trained linear heads.

CLIP ViT-L/14 · Multi-Axis · Calibrated

Artist Attribution

Cosine-similarity ranking against a reference gallery with Bayesian confidence intervals and temporal cross-validation.

Cosine Similarity · Bayesian CI

Anomaly Screening

One-class anomaly detection via Mahalanobis distance with per-dimension z-scores, stress-tested against adversarial forgeries.

Mahalanobis · Adversarial · Z-Score

Workshop Decomposition

A Dirichlet process Gaussian mixture model (DPGMM) infers how many distinct hands contributed to a painting without requiring the number as input.

DPGMM · Bayesian · Non-parametric
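The number of hands is left open because the Dirichlet process prior spreads weight over an unbounded set of components. A minimal numpy sketch of the truncated stick-breaking construction behind that prior (the concentration value and truncation level are illustrative, not the library's settings):

```python
import numpy as np

def stick_breaking(alpha, n_sticks, rng):
    """Truncated stick-breaking draw from a Dirichlet process prior:
    w_k = v_k * prod_{j<k}(1 - v_j), with v_k ~ Beta(1, alpha)."""
    v = rng.beta(1.0, alpha, size=n_sticks)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=1.0, n_sticks=20, rng=rng)

# A small concentration puts most mass on a few components, so the
# number of "active" hands is inferred rather than fixed in advance.
active = int((weights > 0.01).sum())
print(weights[:5].round(3), "active:", active)
```

In the full mixture each component also carries a Gaussian over patch embeddings; patches assigned to the same component are read as one hand.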

Temporal Dating

Gaussian process regression over dated embeddings estimates when an undated work was most likely produced.

Gaussian Process · Style Drift · PCA
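A toy numpy sketch of GP regression with an RBF kernel for dating: a one-dimensional style feature (standing in for a PCA coordinate of the embedding) is regressed onto production year. All data values and the length scale are illustrative:

```python
import numpy as np

def rbf(a, b, length_scale=0.3):
    """RBF kernel between 1-D feature vectors."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Toy training set: style feature of securely dated works -> year.
x = np.array([0.20, 0.35, 0.55, 0.70, 0.90])
years = np.array([1600.0, 1615.0, 1630.0, 1645.0, 1660.0])

y = years - years.mean()                 # centre so a zero-mean GP fits
K = rbf(x, x) + 1e-4 * np.eye(len(x))    # jitter for numerical stability
alpha = np.linalg.solve(K, y)

x_new = np.array([0.60])                 # feature of an undated work
k_star = rbf(x_new, x)
mean_year = (k_star @ alpha)[0] + years.mean()
var = 1.0 - (k_star @ np.linalg.solve(K, k_star.T))[0, 0]
std = np.sqrt(max(var, 0.0))
print(round(mean_year, 1), "+/-", round(2 * std, 3))
```

The posterior variance grows for features far from any dated work, which is what turns the estimate into a dated range rather than a point.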
Methodology

The Morellian Tradition

In the 1870s, Giovanni Morelli proposed that attribution should focus on incidental details — the shape of fingernails, the curl of earlobes, the rendering of drapery folds. These peripheral passages reveal the artist’s hand more reliably than any consciously composed focal passage.

ArtSleuth formalises the Morellian intuition as a feature-extraction problem: the “incidental details” correspond to the low-level textural features that self-supervised vision transformers encode.

Dual Backbone

DINOv2 & CLIP

DINOv2 learns visual representations without labelled data via self-distillation. The resulting feature space encodes texture, directionality, and granularity — the physical surface qualities that connoisseurs evaluate.

CLIP encodes images and text in a shared embedding space. Critical for style classification, where categories like “Baroque” are culturally constructed labels.

Pipeline

Preprocessing & Patch Extraction

Standard ViT preprocessing: resize, centre-crop, ImageNet normalisation. Optional art-specific steps: varnish correction, craquelure suppression, and canvas-texture normalisation.

Paintings are divided into patches via Grid, Salient, or Adaptive strategies.
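A sketch of the Grid strategy, assuming fixed 224-pixel tiles; the patch size and the edge-dropping policy here are illustrative, not the library's exact behaviour:

```python
import numpy as np

def grid_patches(image, patch=224, stride=224):
    """Grid strategy: tile the painting into fixed-size square patches.
    `image` is an H x W x 3 array; tiles that would overrun the edge
    are dropped rather than padded."""
    h, w = image.shape[:2]
    out = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            out.append(image[top:top + patch, left:left + patch])
    return np.stack(out)

canvas = np.zeros((896, 672, 3), dtype=np.uint8)   # stand-in for a painting
patches = grid_patches(canvas)
print(patches.shape)   # (12, 224, 224, 3): 4 rows x 3 columns
```

Salient and Adaptive strategies would vary where the tiles land, not this basic slicing machinery.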


Brushstroke & Style Analysis

Structure tensor eigenvalues yield orientation, coherence, and energy. DINOv2 patch embeddings are clustered to reveal homogeneous brushwork.

Coherence = (λ1 − λ2) / (λ1 + λ2)
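The coherence formula can be evaluated per patch from the 2 × 2 structure tensor of image gradients. A minimal numpy sketch, with the usual Gaussian window weighting simplified to a plain mean over the patch:

```python
import numpy as np

def patch_coherence(gray):
    """Coherence of one grayscale patch from its structure tensor.
    Eigenvalues of [[<gx*gx>, <gx*gy>], [<gx*gy>, <gy*gy>]] give
    (l1 - l2) / (l1 + l2): ~1 for parallel strokes, ~0 for isotropy."""
    gy, gx = np.gradient(gray.astype(float))
    jxx, jyy, jxy = (gx * gx).mean(), (gy * gy).mean(), (gx * gy).mean()
    tr, det = jxx + jyy, jxx * jyy - jxy * jxy
    disc = np.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    return (l1 - l2) / (l1 + l2 + 1e-12)

# Parallel vertical strokes -> high coherence; pure noise -> low.
x = np.arange(64)
strokes = np.tile(np.sin(x / 3.0), (64, 1))
noise = np.random.default_rng(0).normal(size=(64, 64))
print(round(patch_coherence(strokes), 3), round(patch_coherence(noise), 3))
```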

CLIP embeddings are projected through three linear heads (period, school, genre) producing calibrated distributions.
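How independent linear heads could turn one CLIP embedding into per-axis distributions, sketched in numpy. The weights are random placeholders for learned parameters, the class counts are illustrative, and the temperature is the usual post-hoc calibration knob:

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Temperature-scaled softmax for calibrated probabilities."""
    z = z / temperature
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
clip_dim = 768                           # CLIP ViT-L/14 embedding size

# One independently trained linear head per axis (random stand-ins).
heads = {
    "period": rng.normal(size=(clip_dim, 5)),
    "school": rng.normal(size=(clip_dim, 8)),
    "genre":  rng.normal(size=(clip_dim, 10)),
}

embedding = rng.normal(size=clip_dim)    # stand-in for a CLIP embedding
distributions = {axis: softmax(embedding @ W, temperature=1.5)
                 for axis, W in heads.items()}
for axis, p in distributions.items():
    print(axis, int(p.argmax()), round(float(p.max()), 3))
```

Because each head is trained separately, an unusual period prediction cannot distort the school or genre distributions.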


Attribution & Forgery Detection

The fused embedding is compared against the reference gallery via temperature-scaled cosine similarity; Mahalanobis distance provides the anomaly score.

D_M(x) = √( (x − μ)ᵀ Σ⁻¹ (x − μ) )
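Both scores can be sketched in a few lines of numpy against a toy reference gallery; the dimensionality, gallery size, and temperature value here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8                                       # small stand-in for 768-D

# Reference gallery: embeddings of securely attributed works.
gallery = rng.normal(size=(50, dim))
mu = gallery.mean(axis=0)
cov = np.cov(gallery, rowvar=False) + 1e-6 * np.eye(dim)
cov_inv = np.linalg.inv(cov)

def mahalanobis(x):
    """Distance of x from the gallery distribution (anomaly score)."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

def cosine_scores(x, refs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities to the gallery."""
    sims = refs @ x / (np.linalg.norm(refs, axis=1) * np.linalg.norm(x))
    z = sims / temperature
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

typical = gallery[0]
outlier = mu + 8.0 * np.ones(dim)             # far outside the gallery cloud
probs = cosine_scores(typical, gallery)
print(int(probs.argmax()),
      round(mahalanobis(typical), 2), round(mahalanobis(outlier), 2))
```

A high Mahalanobis distance does not identify a forger; it only flags that the work sits outside the statistical envelope of the reference gallery.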

Fusion, Temporal & Workshop

Cross-attention lets CLIP queries attend over DINOv2 patch tokens. A Gaussian process with an RBF kernel models temporal drift, and a Bayesian GMM with a Dirichlet process prior infers workshop hands.
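A single-head numpy sketch of the fusion step, with random matrices standing in for the learned projections and toy 64-dimensional tokens:

```python
import numpy as np

def cross_attention(queries, tokens, rng=None):
    """Single-head cross-attention: softmax(Q K^T / sqrt(d)) V, where
    the queries come from the CLIP side and the keys/values are DINOv2
    patch tokens. Random matrices stand in for learned projections."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = queries.shape[-1]
    Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    Q, K, V = queries @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V, attn

rng = np.random.default_rng(0)
clip_query = rng.normal(size=(1, 64))    # pooled CLIP token (toy 64-D)
dino_tokens = rng.normal(size=(49, 64))  # 7 x 7 grid of DINOv2 patch tokens
fused, attn = cross_attention(clip_query, dino_tokens, rng)
print(fused.shape, attn.shape)           # (1, 64) (1, 49)
```

The attention row shows which patches the semantic query drew on, which is useful for visualising where the fused evidence comes from.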

Caveats

Limitations

  • Training data bias: backbones are pre-trained largely on Western photographic imagery.
  • Probabilistic: Statistical estimates, not definitive verdicts.
  • Reference dependency: Quality scales with gallery size.

Architecture & Results

Artwork Input
Art-Specific Preprocessing
DINOv2 · Texture
CLIP · Semantics
Cross-Attention Fusion
Brushstroke
Style
Attribution
Screening
Workshop
Temporal
Unified Report
Backbone                Style Acc  F1     Artist Top-1  Artist Top-5  Genre
DINOv2 ViT-B/14         57.5%      0.553  64.7%         90.9%         71.0%
CLIP ViT-L/14           67.1%      0.656  74.6%         95.9%         75.0%
Fusion (frozen)         65.0%      0.633  71.0%         94.2%         74.2%
Fusion (fine-tuned) †   71.6%      0.703  77.8%         96.2%         75.1%
Fusion (end-to-end) †   72.7%      –      79.0%         96.9%         76.6%

WikiArt, 81,444 images; macro-averaged. Top three rows reproducible via benchmarks/wikiart.py. † Separate training run; training code not included. 125-artist forgery validation: mean AUC 0.958 (CLIP), 0.873 (DINOv2), 0.897 (Fused).

Command Line

# Install
pip install artsleuth

# Quick analysis
artsleuth analyse painting.jpg

# With web UI
pip install artsleuth[web,benchmarks]
artsleuth demo

# Compare two works
artsleuth compare a.jpg b.jpg

Python API

# Full pipeline
from artsleuth import analyse

result = analyse("painting.jpg")
print(result.summary())

# Forgery screening
from artsleuth import ForgeryDetector
det = ForgeryDetector()
result = det.detect(image, reference_artist="Vermeer")

References

  1. Morelli, G. (1890). Italian Painters: Critical Studies of Their Works. John Murray.
  2. Berenson, B. (1902). The Study and Criticism of Italian Art. George Bell & Sons.
  3. Ainsworth, M. W. (2005). From Connoisseurship to Technical Art History. Getty Research Journal, 159–176.
  4. Lyu, S., Rockmore, D. & Farid, H. (2004). A digital technique for art authentication. PNAS, 101(49). doi
  5. Johnson, C. R. et al. (2008). Image processing for artist identification. IEEE Signal Proc. Mag., 25(4). doi
  6. Saleh, B. & Elgammal, A. (2016). Large-scale classification of fine-art paintings. JOCCH, 8(4). doi
  7. Vaswani, A. et al. (2017). Attention is all you need. NeurIPS. arXiv
  8. Caron, M. et al. (2021). Emerging properties in self-supervised vision transformers. ICCV. arXiv
  9. Radford, A. et al. (2021). Learning transferable visual models from natural language supervision. ICML. arXiv
  10. Blei, D. M. & Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1). doi
  11. Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
  12. Oquab, M. et al. (2024). DINOv2: Learning robust visual features without supervision. TMLR. arXiv
  13. Jose, J. et al. (2025). DINOv2 meets text. CVPR. arXiv
  14. Anonymous (2025). PATCH: heterogeneity of artistic practice in historical paintings. arXiv
  15. Wölfflin, H. (1915). Principles of Art History.