EdgeTAM: The On-Device Track-Anything AI Engine

Deploy EdgeTAM on your mobile or embedded device and instantly segment, track and mask any object in video with high quality and blazing speed. From phones to IoT cameras, EdgeTAM brings advanced video understanding to every edge.

What Is EdgeTAM?

EdgeTAM is a breakthrough promptable segmentation & tracking engine designed for edge-device execution (e.g., iPhone, mobile GPU, embedded boards). Unlike bulky cloud-only models, it takes a point, box or mask prompt and tracks anything you select across time with minimal latency.

  • Prompt → Mask & Track (Instant)
    Provide a point, bounding box or mask prompt on a frame; EdgeTAM propagates object masks forward and backward across the video at real-time speeds, ideal for live AR, mobile apps or smart cameras (see the quickstart sketch after this list).
  • Mobile-Optimised & High-Throughput
    EdgeTAM runs at ~16 FPS on iPhone 15 Pro Max and is >20× faster than standard SAM 2 on the same device, enabling true edge AI without server latency.
  • Video-First Temporal Intelligence
    Built with a novel 2D Spatial Perceiver and distilled video memory module, EdgeTAM seamlessly handles object masks over time, not just static images—making it ideal for video workflows.
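
For developers, the promptable workflow above maps to a few lines of code. The sketch below is a minimal illustration assuming the SAM 2-style Python video-predictor interface that the EdgeTAM release builds on; the config/checkpoint paths and exact method names are placeholders to adapt to the build you install.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor  # EdgeTAM's release reuses a SAM 2-style codebase

# Placeholder config/checkpoint names; substitute the files shipped with your EdgeTAM download.
predictor = build_sam2_video_predictor("configs/edgetam.yaml", "checkpoints/edgetam.pt")

with torch.inference_mode():
    # In SAM 2-style loaders, init_state takes a directory of extracted video frames.
    state = predictor.init_state(video_path="./video_frames")

    # Prompt frame 0 with a single positive click on the object of interest.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[320, 240]], dtype=np.float32),  # (x, y) in pixels
        labels=np.array([1], dtype=np.int32),              # 1 = positive click, 0 = negative
    )

    # Propagate the prompted mask through the rest of the clip.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # boolean masks per tracked object
```

In SAM 2-style builds the propagation call also accepts a reverse flag, which is how the backward pass mentioned above is driven.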

Typical Use Cases

EdgeTAM fits into a wide array of video-centric workflows, from mobile AR, smart cameras and robotics to creative post-production and beyond.

Augmented Reality & Mobile Apps

Enable real-time object tracking and masking in AR games, mobile filters or interactive experiences. EdgeTAM runs natively on device, so experiences stay responsive and work fully offline.

Smart Cameras & IoT Analytics

Deploy EdgeTAM on surveillance cameras or embedded vision boards to track people, vehicles or equipment continuously—on-device, light on power, high on performance.

Creative Post-Production & VFX

Let video editors, VFX artists or content creators quickly mask and track objects in footage—EdgeTAM speeds up rotoscoping, motion graphics and compositing workflows.

Why Choose EdgeTAM

For creators, developers and enterprises who demand edge-level video understanding, EdgeTAM delivers unmatched efficiency, versatility and deployment freedom for promptable segmentation and tracking.

Design Prompts Instead of Pre-Labels

Rather than relying on fixed object classes, EdgeTAM accepts free-form prompts (points, boxes, masks) and tracks user-defined targets over time—ensuring flexibility across domains.
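
As a rough illustration of what "free-form prompts" means in practice, the payloads below follow the conventions of the SAM 2-style predictors EdgeTAM builds on; exact argument names may differ in your SDK build.

```python
import numpy as np

# Point prompt: pixel coordinates plus labels (1 = foreground click, 0 = background click).
point_prompt = {
    "point_coords": np.array([[412, 288]], dtype=np.float32),
    "point_labels": np.array([1], dtype=np.int32),
}

# Box prompt: a single (x1, y1, x2, y2) box in pixel coordinates.
box_prompt = {"box": np.array([100, 80, 540, 460], dtype=np.float32)}

# Mask prompt: a low-resolution mask (logits) from a previous prediction or an external tool.
mask_prompt = {"mask_input": np.zeros((1, 256, 256), dtype=np.float32)}
```

Any of these can seed the tracker; the selected target is then followed over time without a class label ever being defined.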

Fine-Grained Temporal Control

EdgeTAM preserves spatial memory and leverages both global and patch-level cues to deliver accurate mask propagation across video frames—reducing drift and preserving detail.

Seamless Input: Image or Video Pipeline

Whether your asset is a single image, live camera feed, or pre-recorded clip, EdgeTAM adapts. Use static segmentation or full-video tracking in the same pipeline.
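
A sketch of the single-image path, again assuming the SAM 2-style image predictor that the EdgeTAM code base exposes (class, config and checkpoint names here are placeholders):

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Same placeholder config/checkpoint as in the video quickstart above.
model = build_sam2("configs/edgetam.yaml", "checkpoints/edgetam.pt")
predictor = SAM2ImagePredictor(model)

# Static segmentation: one frame, one box prompt, no temporal state.
image = np.array(Image.open("frame.jpg").convert("RGB"))
predictor.set_image(image)
masks, scores, _ = predictor.predict(box=np.array([100, 80, 540, 460], dtype=np.float32))
```

For clips or live feeds, the same checkpoint drives the video predictor from the quickstart, so one asset pipeline can branch on "still or stream" and reuse a single set of weights.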

Edge Deployment & Real-Time Streaming

Optimised for mobile and embedded hardware, EdgeTAM operates with sub-100 ms per-frame latency on supported edge devices, no cloud required, making it ideal for AR, robotics and live analytics.

Commercial-Friendly Licence & Integration

Released under the Apache 2.0 licence, EdgeTAM supports commercial use, modification and deployment without per-frame fees—perfect for startups, OEMs and enterprise projects.

Developer-First API & SDK Support

EdgeTAM provides Python, C++ and mobile SDKs plus REST interfaces so you can integrate segmentation & tracking into apps, firmware and cloud-to-edge pipelines with ease.

Technology Deep Dive

Dive into EdgeTAM’s architecture, efficient codec integration, memory-aware temporal engine and edge-deployment workflow.

1

Model Architecture & Efficiency Engine

EdgeTAM builds on the foundation of SAM 2 and integrates a novel 2D Spatial Perceiver module to drastically cut latency by encoding frame-level memory via a fixed query set. This design enables EdgeTAM to run on mobile hardware while retaining strong segmentation accuracy.
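
The released implementation lives in the EdgeTAM repository; the toy module below only illustrates the core idea of the paragraph above, namely that a fixed set of learned queries cross-attends to dense frame features so per-frame memory compresses to a constant-size token set regardless of resolution. All names and sizes here are illustrative, not the actual architecture.

```python
import torch
import torch.nn as nn

class SpatialPerceiverSketch(nn.Module):
    """Toy stand-in for a Perceiver-style memory encoder: learned queries summarise
    a dense frame-feature map into a small, fixed number of memory tokens."""

    def __init__(self, dim: int = 256, num_queries: int = 256, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)  # fixed learned query set
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, H*W, dim) dense memory features for one frame
        b = frame_feats.shape[0]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        summary, _ = self.cross_attn(q, frame_feats, frame_feats)
        return self.norm(summary + q)  # (B, num_queries, dim): constant-size frame memory

# A 64x64 feature map (4,096 tokens) is summarised into 256 memory tokens per frame.
tokens = SpatialPerceiverSketch()(torch.randn(1, 64 * 64, 256))
print(tokens.shape)  # torch.Size([1, 256, 256])
```

Downstream memory attention then scales with the number of query tokens rather than with the full H×W feature map, which is where the latency saving comes from.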

2

Temporal Memory & Tracking Pipeline

Rather than re-inferring each frame independently, EdgeTAM maintains and propagates mask states using a memory bank. The system splits queries into global and patch-level groups to preserve spatial structure and ensure smooth object tracking across time.
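
A minimal sketch of the bookkeeping such a memory bank implies (the structure and sizes are assumptions for illustration, not the released data structures):

```python
from collections import deque
import torch

class MemoryBankSketch:
    """Toy memory bank: keep the prompted frame's memory permanently, plus a bounded
    FIFO of recent compressed frame memories (each already built from global and
    patch-level query outputs upstream), and expose them as one context tensor."""

    def __init__(self, max_recent: int = 6):
        self.prompt_memory = None               # memory of the user-prompted frame, always kept
        self.recent = deque(maxlen=max_recent)  # rolling window of recent frame memories

    def add_prompt_frame(self, mem: torch.Tensor) -> None:
        self.prompt_memory = mem

    def add_frame(self, mem: torch.Tensor) -> None:
        self.recent.append(mem)  # oldest entry drops out automatically once the window is full

    def context(self) -> torch.Tensor:
        # Concatenate along the token axis; memory attention conditions the current frame on this.
        banks = ([self.prompt_memory] if self.prompt_memory is not None else []) + list(self.recent)
        return torch.cat(banks, dim=1)  # (B, total_tokens, dim)
```

Because each per-frame entry has a constant size after compression, the cost of this context stays bounded however long the video runs.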

3

Latency, Edge Scalability & Inference

Optimised for on-device deployment, EdgeTAM yields ~16 FPS on iPhone 15 Pro Max, a more than 20× speedup over SAM 2 on the same hardware, making real-time video intelligence feasible without the cloud.
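
The shipping numbers above come from the on-device runtime, but a quick relative sanity check is easy to run against the desktop PyTorch build. The helper below assumes the video-predictor interface used in the quickstart and is not how the published 16 FPS figure was measured.

```python
import time
import torch

def measure_fps(predictor, state, warmup: int = 5) -> float:
    """Average propagation throughput (frames/second) over a clip, skipping warm-up frames."""
    frame_times, last = [], None
    with torch.inference_mode():
        for i, _ in enumerate(predictor.propagate_in_video(state)):
            if torch.cuda.is_available():
                torch.cuda.synchronize()  # make GPU timings meaningful
            now = time.perf_counter()
            if last is not None and i > warmup:
                frame_times.append(now - last)
            last = now
    return len(frame_times) / sum(frame_times) if frame_times else 0.0
```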

4

Integration & Developer Workflow

EdgeTAM ships with PyTorch models, mobile-ready SDKs, and usage examples for image- and video-based pipelines. Licensed under Apache 2.0, it supports commercial build-outs, edge device firmware and hybrid cloud-edge workflows.

Testimonials

What Our Users Say About EdgeTAM

Hear from innovators, engineers and creators who use EdgeTAM to unlock edge-video intelligence in novel ways.

Alex Johnson

AR Engineer

“EdgeTAM transformed our mobile AR app—object tracking with masks is now smooth and runs entirely on the phone, no server needed.”

Li Wei

IoT Vision Lead

“Deploying EdgeTAM on our embedded device cut latency and bandwidth by half, while still reliably tracking people and objects in live camera feeds.”

Maria Hernandez

Post-Production Supervisor

“We used EdgeTAM to automatically mask and track elements in our VFX pipeline—what used to take hours is now done in minutes.”

Satoshi Nakamura

Robotics Software Architect

“With EdgeTAM’s prompt-based tracking interface, our robotic system dynamically identifies and follows tools and components in industrial video streams—intuitive and efficient.”

Nina Petrov

Startup Founder

“As a startup building smart cameras, the Apache 2.0 licence of EdgeTAM meant we could experiment freely and integrate the model on-device without hidden fees.”

Omar Farouk

Surveillance Product Lead

“EdgeTAM running on edge hardware enables real-time object tracking in our security product without cloud dependence—great for privacy and performance.”

Zara Ali

Mobile Game Producer

“We embedded EdgeTAM in our mobile game to allow players to create their own AR/object-tracking challenges—performance was solid and integration was simple.”

FAQ

EdgeTAM — Frequently Asked Questions

Answers to the most common queries about using EdgeTAM for on-device segmentation & tracking.

1

What is EdgeTAM?

EdgeTAM is a promptable segmentation and tracking model designed for edge-device execution. It accepts object prompts and tracks the selected objects across images or video in real time.

2

How does EdgeTAM differ from standard segmentation tools?

Unlike fixed-class models, EdgeTAM lets you prompt any object and handles video mask propagation, while being optimised for mobile and embedded devices rather than just servers.

3

How fast can it process video?

On an iPhone 15 Pro Max it runs at approximately 16 frames per second, more than a 20× speed improvement over SAM 2 on the same device.

4

Can it be used in commercial projects?

Yes — EdgeTAM is released under the Apache 2.0 licence, so commercial deployment, modification and distribution are permitted without per-frame licensing fees.

5

What input formats or prompts are supported?

You can supply a point, box or rough mask on an image or video frame. EdgeTAM will then segment and track that object across frames. Static image segmentation is also supported.

6

Which devices and platforms are supported?

EdgeTAM is optimised for devices like the iPhone 15 Pro Max and is deployable on mobile GPUs, embedded vision boards or IoT cameras. It also supports desktop inference via PyTorch.

7

How is user data and privacy handled?

Since EdgeTAM runs on-device, sensitive video data does not need to leave the hardware. Deployment can be fully offline—ideal for privacy-critical applications.