Hey HF community!
Just shipped VideoMask SDK — a Python toolkit for turning raw videos into segmentation-ready datasets.

What it does:
- Takes video files → extracts frames → runs SAM-3 → exports clean masks + metadata
- Pluggable backends (SAM-3 with GPU, dummy for testing)
- Simple folder output: frames_raw/, masks/, metadata.json (see the sketch after this list)
- CLI and Python API
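
To give a sense of how that output folder can be consumed downstream, here is a minimal sketch that pairs extracted frames with their masks. It assumes frames and masks share filename stems; that pairing convention and the folder-walking logic are illustrative assumptions, not the SDK's guaranteed layout:

```python
from pathlib import Path

out_dir = Path("output/")

# Pair raw frames with their masks by filename stem.
# ASSUMPTION: frames and masks share stems (e.g. frame_0001.jpg / frame_0001.png);
# check metadata.json in your own output for the authoritative mapping.
frames = sorted((out_dir / "frames_raw").glob("*"))
masks = {p.stem: p for p in (out_dir / "masks").glob("*")}

for frame in frames:
    mask = masks.get(frame.stem)
    if mask is not None:
        print(f"{frame.name} -> {mask.name}")
```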
Why I built it:
Working on Physical AI datasets, I kept hitting the same bottleneck: getting from raw videos to usable segmentation masks took too much glue code. SAM-3 is great, but wiring it up reliably is annoying.
Example:

```python
from videomask.pipeline.segmenter import VideoSegmenter

seg = VideoSegmenter(
    backend="sam3",
    fps=1,
    resize=512,
    backend_kwargs={"text_prompt": "person", "device": "cuda"},
)
seg.run("video.mp4", out_dir="output/")
```
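For quick CPU-only smoke tests (e.g. in CI), the dummy backend mentioned above can be swapped in. A minimal sketch, assuming the backend is selected with backend="dummy" and needs no extra kwargs (check the repo for the actual name and defaults):

```python
from videomask.pipeline.segmenter import VideoSegmenter

# Dummy backend: no GPU or SAM-3 weights needed, handy for exercising the
# frame-extraction and export plumbing end to end.
# ASSUMPTION: the backend identifier and its defaults may differ in the repo.
seg = VideoSegmenter(backend="dummy", fps=1, resize=512)
seg.run("video.mp4", out_dir="output_dummy/")
```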
Includes a Colab notebook for GPU workflows.
This is v0.1 — planning COCO export, HF dataset integration, and concept-level backends next.
Repo: https://github.com/msunbot/videomask
Colab: Google Colab
Would love feedback from folks building vision datasets or working with SAM-3! Planning HF Hub integration in v0.2.