VideoMask SDK: Video → Segmentation Datasets (SAM-3, Python)

Hey HF community 👋

Just shipped VideoMask SDK — a Python toolkit for turning raw videos into segmentation-ready datasets.

*(Demo GIF: Storyboard Hero)*

What it does:

  • Takes video files → extracts frames → runs SAM-3 → exports clean masks + metadata

  • Pluggable backends (SAM-3 with GPU, dummy for testing)

  • Simple folder output: frames_raw/, masks/, metadata.json (see the sketch just after this list)

  • CLI and Python API
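
For orientation, here's a minimal sketch of how you might consume that output folder after a run. The exact filenames and the metadata.json schema are my assumptions about the on-disk layout, not documented API:

```python
# Minimal sketch: pairing extracted frames with their masks after a run.
# Assumes masks/ mirrors frames_raw/ by filename and metadata.json is a
# flat JSON object; both are layout assumptions, not documented behavior.
import json
from pathlib import Path

out = Path("output")
meta = json.loads((out / "metadata.json").read_text())

pairs = [
    (frame, out / "masks" / frame.name)  # frame <-> mask by shared filename
    for frame in sorted((out / "frames_raw").iterdir())
]
print(f"{len(pairs)} frame/mask pairs; metadata keys: {list(meta)}")
```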

Why I built it: Working on Physical AI datasets, I kept hitting the same bottleneck: getting from raw videos to usable segmentation masks required too much glue code. SAM-3 is great but wiring it up reliably is annoying.

Example:

```python
from videomask.pipeline.segmenter import VideoSegmenter

# SAM-3 backend on GPU: sample 1 frame/sec, resize to 512 px,
# and segment everything matching the text prompt
seg = VideoSegmenter(
    backend="sam3",
    fps=1,
    resize=512,
    backend_kwargs={"text_prompt": "person", "device": "cuda"},
)

# writes frames_raw/, masks/, and metadata.json under output/
seg.run("video.mp4", out_dir="output/")
```

Includes a Colab notebook for GPU workflows.

This is v0.1; planning COCO export, HF Hub dataset integration, and concept-level backends for v0.2.
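
Until that integration lands, here's one hypothetical way to bridge the current output into a `datasets.Dataset` yourself, under the same folder-layout assumptions as above:

```python
# Hypothetical bridge from VideoMask output to a Hugging Face dataset,
# pending native integration. Assumes masks/ mirrors frames_raw/ by filename.
from pathlib import Path
from datasets import Dataset, Image

out = Path("output")
frames = sorted(str(p) for p in (out / "frames_raw").iterdir())
masks = sorted(str(p) for p in (out / "masks").iterdir())

ds = (
    Dataset.from_dict({"image": frames, "mask": masks})
    .cast_column("image", Image())  # decode file paths as images on access
    .cast_column("mask", Image())
)
```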

Repo: https://github.com/msunbot/videomask (a lightweight Python SDK that turns raw videos into segmentation-ready datasets using ffmpeg, SAM-3, and pluggable backends)
Colab: GPU demo notebook

Would love feedback from folks building vision datasets or working with SAM-3!
