CVPR Demo Track

non-profit

http://cvpr2022.thecvf.com/

Activity Feed Request to join this org

AI & ML interests

CVPR Demo Track @ CVPR 2022

Recent Activity

victor submitted a paper 19 days ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

ma-xu authored a paper 20 days ago

Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning

liangyuch authored a paper 27 days ago

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

View all activity

Nymbo

posted an update 1 day ago

Post

4225

We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.

3 replies

victor

submitted a paper to Daily Papers 19 days ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published 20 days ago • 43

ma-xu

authored a paper 20 days ago

Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning

Paper • 2602.09439 • Published Feb 10 • 13

noamrot

submitted a paper to Daily Papers 29 days ago

SemanticMoments: Training-Free Motion Similarity via Third Moment Features

Paper • 2602.09146 • Published Feb 9 • 21

ma-xu

submitted a paper to Daily Papers about 1 month ago

Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning

Paper • 2602.09439 • Published Feb 10 • 13

victor

posted an update about 2 months ago

Post

1252

Interesting article: use Claude Code to help open models write CUDA kernels (for eg) by turning CC traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill

gagan3012

authored 2 papers 2 months ago

From RAG to Agentic RAG for Faithful Islamic Question Answering

Paper • 2601.07528 • Published Jan 12 • 2

Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics

Paper • 2601.04946 • Published Jan 8

noamrot

authored a paper 2 months ago

CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature

Paper • 2601.03319 • Published Jan 6 • 53

Nymbo

posted an update 2 months ago

Post

2486

Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return

In my experience the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.

Nymbo

posted an update 3 months ago

Post

2795

🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, no new infrastructure required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets

I've included a music-downloader skill as a working demo, it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot

victor

posted an update 3 months ago

Post

3449

Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥

https://huggingface.co/blog/nvidia/nemotron-3-nano-evaluation-recipe

Nymbo

posted an update 4 months ago

Post

5277

🚀 I've just shipped a major update to the Nymbo/Tools MCP server: the Agent_Terminal, a single "master tool" that cuts token usage by over 90%!

Anthropic found 98.7% context savings using code execution with MCP, Cloudflare published similar findings. This is my open-source implementation of the same idea.

# The Problem

Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window — a 10,000-row spreadsheet? That's all going into context just to sum a column.

# The Solution: One Tool to Rule Them All

Agent_Terminal wraps all 12 tools (Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, Code_Interpreter) into a single Python code execution gateway.

Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:

# Search for Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)

Don't know what tools are available? The agent can discover them at runtime:

print(search_tools('image'))  # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool

The individual direct tool calls are all still there, but they can be disabled if using the Agent_Terminal. Try it now - https://www.nymbo.net/nymbot

1 reply

yumingj

authored a paper 4 months ago

RynnVLA-002: A Unified Vision-Language-Action and World Model

Paper • 2511.17502 • Published Nov 21, 2025 • 28

BwZhang

authored a paper 4 months ago

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20, 2025 • 113

noamrot

authored a paper 4 months ago

Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising

Paper • 2511.08633 • Published Nov 9, 2025 • 55

abidlabs

authored 3 papers 4 months ago

posted an update 4 months ago

Post

10175

Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.

In these cases, the power of the model is not how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.

8 replies

AI & ML interests

Recent Activity

Team members 265

CVPR's activity