Vis-Sieve Demonstration

Important context
The public web demo (https://iszhiyangwang.github.io/MMLLA/) is populated with the VisImages dataset (≈ 35 k chart and diagram excerpts) to ensure open evaluation and repeatability.
Institutional corpora (e.g., Princeton 2022–2023, ≥ 11 k papers, ≥ 15 k figures) can be swapped into the same interface with minimal code changes.

Vis-Sieve is a pipeline and interface for surveying, annotating, and interactively exploring large-scale collections of scientific figures. The system supports visualization-service providers across the full workflow, from data acquisition through automated annotation to interactive exploration.

System Overview

| Phase | Key Steps | Technologies |
| --- | --- | --- |
| 1 · Data Acquisition | Harvest PDFs + metadata via OpenAlex → store in DuckDB | Python · Playwright |
| 2 · Figure Extraction | pdffigures2 ⇢ image + caption pairs; multipart detection via VisImages-Detection (Faster R-CNN) | Java/Golang · PyTorch |
| 3 · Automated Annotation | Zero-shot chart-type labeling with GPT-4o-mini (image + caption prompt) | OpenAI API |
| 4 · Interactive Visualization | 2D Faceted Browser (filter/search/sort); 3D Exploration (WebGL) | D3.js · Three.js · Observable |
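Phase 1 amounts to normalizing harvested metadata records into rows of a local table that the later phases query. A minimal sketch, assuming illustrative table and column names (not the project's actual schema); the demo stores rows in DuckDB, but `sqlite3` stands in here so the sketch runs without extra dependencies, and the same SQL works against DuckDB's Python API:

```python
# Illustrative Phase 1 storage sketch. Table/column names are assumptions;
# sqlite3 is a stand-in for DuckDB so the example is dependency-free.
import sqlite3

def normalize_work(work: dict) -> tuple:
    """Flatten the fields the later pipeline stages need from one OpenAlex record."""
    return (
        work["id"],
        work.get("title"),
        work.get("publication_year"),
        (work.get("primary_location") or {}).get("pdf_url"),
    )

def store_works(con, works: list) -> None:
    """Create the papers table if needed and upsert the normalized rows."""
    con.execute("""CREATE TABLE IF NOT EXISTS papers (
        openalex_id TEXT PRIMARY KEY, title TEXT, year INTEGER, pdf_url TEXT)""")
    con.executemany("INSERT OR REPLACE INTO papers VALUES (?, ?, ?, ?)",
                    [normalize_work(w) for w in works])

# Real records would come from https://api.openalex.org/works (fetched over
# HTTP, with Playwright for script-rendered pages); a stub record stands in.
con = sqlite3.connect(":memory:")
store_works(con, [{"id": "W123", "title": "A Study", "publication_year": 2023,
                   "primary_location": {"pdf_url": "https://example.org/a.pdf"}}])
```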

2D Faceted Browser

The 2D dashboard presents a tabular view of figures with sortable columns (chart type, year, venue, etc.) and facet filters. When the demo runs on VisImages, all VisImages taxonomy classes are available for instant filtering.
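Under the hood, faceted filtering of this kind reduces to a conjunction of predicates, one per active facet, over the figure table. A minimal sketch of such a query builder, with assumed table and column names (not taken from the project's code):

```python
# Hypothetical facet-to-SQL translation: each active facet contributes one
# predicate; multi-select facets become IN clauses. Names are illustrative.
def facet_query(filters: dict) -> tuple:
    """Build a parameterized query from active facets, e.g.
    {"chart_type": ["bar", "line"], "year": 2023}."""
    clauses, params = [], []
    for column, value in filters.items():
        if isinstance(value, list):            # multi-select facet
            clauses.append(f"{column} IN ({', '.join('?' * len(value))})")
            params.extend(value)
        else:                                  # single-value facet
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM figures WHERE {where} ORDER BY year DESC", params

sql, params = facet_query({"chart_type": ["bar", "line"], "year": 2023})
```

Sorting is the same trick in reverse: the sortable column picked in the UI replaces the `ORDER BY` target.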


3D Free-Exploration Interface

Embedding & Layout
For large collections, the system first embeds each figure using CLIP features, then applies treemap-inspired packing, which uses chart-type frequencies to partition space and minimize overlap.
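The frequency-weighted partitioning step can be sketched in one dimension as a slice-and-dice simplification: each chart type is reserved a band proportional to its share of the collection. This is an illustration only, with assumed function names; the actual layout works in more dimensions and additionally uses the CLIP embeddings to place figures within each region.

```python
# One-dimensional sketch of frequency-weighted space partitioning.
# Each chart type gets an x-interval proportional to its frequency.
from collections import Counter

def partition_by_type(chart_types, width=1.0):
    """Map each chart type to an x-interval of a strip of the given width."""
    freq = Counter(chart_types)
    total = sum(freq.values())
    bands, x = {}, 0.0
    for ctype, count in freq.most_common():   # most frequent types first
        band_width = width * count / total
        bands[ctype] = (x, x + band_width)
        x += band_width
    return bands

bands = partition_by_type(["bar", "bar", "line", "scatter"])
# "bar" holds 2 of the 4 figures, so it is packed into half the strip
```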

Interaction
– Pan/zoom/rotate in WebGL.
– Hover → live thumbnail + metadata.
– Click → deep-link back to the 2D record.


Accuracy & Efficiency Highlights


Reproducibility & Extensibility