Vis-Sieve Demonstration
Important context
The public web demo (https://iszhiyangwang.github.io/MMLLA/) is populated with the VisImages dataset (≈ 35 k chart and diagram excerpts) to ensure open evaluation and repeatability.
Institutional corpora (e.g., Princeton 2022–2023, ≥ 11 k papers, ≥ 15 k figures) can be swapped into the same interface with minimal code changes.
Vis-Sieve is a pipeline and interface for surveying, annotating, and interactively exploring large-scale collections of scientific figures. The system supports visualization-service providers by enabling:
- Facility & tool planning — evidence-based decisions on software/hardware investment.
- Technique discovery — rapid browsing of exemplar charts beyond the provider’s core domain.
- Trend analysis — longitudinal views of chart-type adoption.
- Expert identification — spotting advanced visualization practitioners for collaboration or hiring.
System Overview
| Phase | Key Steps | Technologies |
|---|---|---|
| 1 · Data Acquisition | Harvest PDFs + metadata via OpenAlex → store in DuckDB (sketched below) | Python · Playwright |
| 2 · Figure Extraction | pdffigures2 ⇢ image + caption pairs; multipart detection via VisImages-Detection (Faster R-CNN) | Java/Golang · PyTorch |
| 3 · Automated Annotation | Zero-shot chart-type labeling with GPT-4o-mini (image + caption prompt; sketched below) | OpenAI API |
| 4 · Interactive Visualization | 2D Faceted Browser (filter/search/sort); 3D Exploration (WebGL) | D3.js · Three.js · Observable |
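As a rough illustration of Phase 1, the sketch below pages through the OpenAlex works endpoint for one institution and stores basic metadata in a local DuckDB table. The institution ID, date filter, table schema, and field selection are placeholders rather than the pipeline's actual configuration, and the Playwright-based PDF download step is omitted.

```python
# Minimal sketch of Phase 1: harvest OpenAlex metadata into DuckDB.
# Institution ID, date filter, and table schema are illustrative only.
import duckdb
import requests

OPENALEX_URL = "https://api.openalex.org/works"
INSTITUTION_ID = "I0000000000"  # placeholder: replace with the institution's OpenAlex ID

def fetch_works(institution_id: str, per_page: int = 200):
    """Page through OpenAlex works for one institution using cursor paging."""
    cursor = "*"
    while cursor:
        resp = requests.get(OPENALEX_URL, params={
            "filter": f"institutions.id:{institution_id},from_publication_date:2022-01-01",
            "per-page": per_page,
            "cursor": cursor,
        })
        resp.raise_for_status()
        data = resp.json()
        yield from data["results"]
        cursor = data["meta"].get("next_cursor")  # None on the last page

con = duckdb.connect("vis_sieve.duckdb")
con.execute("""
    CREATE TABLE IF NOT EXISTS works (
        id TEXT PRIMARY KEY,
        title TEXT,
        publication_year INTEGER,
        pdf_url TEXT
    )
""")
for work in fetch_works(INSTITUTION_ID):
    oa = work.get("best_oa_location") or {}
    con.execute(
        "INSERT OR REPLACE INTO works VALUES (?, ?, ?, ?)",
        [work["id"], work.get("title"), work.get("publication_year"), oa.get("pdf_url")],
    )
```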
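For Phase 3, the following is a minimal sketch of zero-shot chart-type labeling with GPT-4o-mini through the OpenAI Python client. The prompt wording, label vocabulary, and file paths are assumptions for illustration, not the exact prompt used by the pipeline.

```python
# Minimal sketch of Phase 3: zero-shot chart-type labeling with GPT-4o-mini.
# Prompt text and label list are illustrative; the pipeline's actual prompt may differ.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CHART_TYPES = ["bar chart", "line chart", "scatterplot", "heatmap", "other"]  # assumed subset

def label_figure(image_path: str, caption: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Caption: {caption}\n"
                         f"Which of these chart types best describes the figure: "
                         f"{', '.join(CHART_TYPES)}? Answer with one label only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        max_tokens=10,
    )
    return response.choices[0].message.content.strip()

print(label_figure("figures/example.png", "Accuracy over training epochs."))
```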
2D Faceted Browser
The 2D dashboard presents a tabular view of figures with sortable columns (chart type, year, venue, etc.) and facet filters. When the demo runs on VisImages, all VisImages taxonomy classes are available for instant filtering.
3D Free-Exploration Interface
Embedding & Layout
For large collections, the system first embeds each figure using CLIP features, then applies a treemap-inspired packing that uses chart-type frequencies to partition space and minimize overlap.
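A minimal sketch of the embedding step is shown below, using the Hugging Face CLIP model to produce one feature vector per figure. The checkpoint choice and single-batch handling are assumptions, and the treemap-inspired packing itself is omitted.

```python
# Minimal sketch of the embedding step: one CLIP vector per figure image.
# Checkpoint choice and one-shot batching are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_figures(image_paths: list[str]) -> torch.Tensor:
    """Return an (N, 512) matrix of L2-normalized CLIP image embeddings."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

embeddings = embed_figures(["figures/fig_001.png", "figures/fig_002.png"])
print(embeddings.shape)  # torch.Size([2, 512])
```

Chart-type frequencies can then determine how much layout area each class receives before the embedded figures are positioned within their cells.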
Interaction
- Pan/zoom/rotate in WebGL.
- Hover → live thumbnail + metadata.
- Click → deep-link back to the 2D record.
Accuracy & Efficiency Highlights
- 35,016 VisImages fragments auto-labeled in ≈ 90 min for $32.69 in API cost.
- Manual spot-checks (n = 300) show 91.2 % mean accuracy; confusion-matrix comparison with the native VisImages labels shows an overall 89 % agreement after multipart handling.
Reproducibility & Extensibility
- Dataset agnostic. Swap in any institution’s PDF corpus; only a config file changes.
- Modular annotation. Replace GPT-4o with any vision-language model or human-in-the-loop step.
- Scalable front-end. Tiled-montage loading keeps runtime memory low even for collections of > 30 k figures.
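To illustrate the dataset-agnostic design, a corpus swap might be expressed through a configuration like the hypothetical one below; the key names and structure are illustrative and do not reflect the project's actual config schema.

```python
# Hypothetical corpus configuration; key names are illustrative only.
CORPUS_CONFIG = {
    "name": "princeton-2022-2023",              # label shown in the interface
    "openalex_institution_id": "I0000000000",   # placeholder institution ID
    "date_range": ["2022-01-01", "2023-12-31"],
    "pdf_dir": "data/pdfs/",                    # where harvested PDFs are stored
    "duckdb_path": "data/vis_sieve.duckdb",
    "annotation_model": "gpt-4o-mini",          # swappable per the modular-annotation design
}
```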