FAST: Efficient Action Tokenization for Vision-Language-Action Models - AI Robotics Tool

Overview

FAST is an action tokenizer that maps robot action sequences into dense, discrete tokens; FAST+ is its pre-trained, universal variant. The repository provides the pre-trained FAST+ tokenizer along with tooling to train custom tokenizers, both intended for training autoregressive vision-language-action models.

Key Features

  • Maps robot action sequences to dense, discrete tokens
  • Pre-trained FAST+ tokenizer available for immediate use
  • Tools and scripts to train custom tokenizers on user data
  • Designed for autoregressive vision-language-action model training
  • Produces compact token sequences suitable for sequence modeling
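
As a rough illustration of the features above, the sketch below loads the pre-trained tokenizer, round-trips a batch of action chunks, and fits a custom tokenizer. It is modeled on the usage pattern shown on the repository's model card, but the `AutoProcessor` loading path, the callable tokenizer, `decode`, `fit`, and `save_pretrained` should all be treated as assumptions to verify against the repository. The helper names (`load_fast_tokenizer`, `roundtrip`, `fit_custom_tokenizer`) are hypothetical and not part of FAST.

```python
def load_fast_tokenizer(repo_id="physical-intelligence/fast"):
    """Load the pre-trained FAST+ tokenizer from the Hugging Face Hub."""
    # Lazy import: needs the `transformers` package and network access.
    from transformers import AutoProcessor

    # Assumption: the repo ships its own processor code, hence trust_remote_code.
    return AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)


def roundtrip(tokenizer, action_chunks):
    """Encode action chunks to tokens, then decode them back to actions."""
    tokens = tokenizer(action_chunks)    # assumed: one token list per chunk
    decoded = tokenizer.decode(tokens)   # assumed: approximate reconstruction
    return tokens, decoded


def fit_custom_tokenizer(tokenizer, action_chunks, save_dir=None):
    """Train a new tokenizer on user data via an assumed `fit` method."""
    custom = tokenizer.fit(action_chunks)
    if save_dir is not None:
        custom.save_pretrained(save_dir)  # assumed to mirror the usual HF API
    return custom
```

In practice `action_chunks` would likely be a NumPy array shaped (batch, timesteps, action_dim), passed as `roundtrip(load_fast_tokenizer(), action_chunks)`. The exact shape and any required normalization should be taken from the repository documentation.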

Ideal Use Cases

  • Tokenizing robot action datasets for model training
  • Training autoregressive vision-language-action models
  • Fine-tuning tokenizers on custom robot behavior data
  • Compressing action sequences for sequence modeling and evaluation
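
To make the compression use case concrete, the following sketch compares the number of raw action values in a dataset with the number of tokens a tokenizer produced for it. It is a minimal sketch under stated assumptions: `compression_stats` is a hypothetical helper, chunks are assumed to be nested lists of shape (timesteps, action_dim), and `token_lists` is assumed to hold one token list per chunk from whichever tokenizer is in use.

```python
def compression_stats(action_chunks, token_lists):
    """Summarize how compactly a tokenizer encoded a set of action chunks.

    action_chunks: list of chunks, each a list of timesteps, each a list of floats.
    token_lists:   list of token-id lists, one per chunk (assumed layout).
    """
    if len(action_chunks) != len(token_lists):
        raise ValueError("need exactly one token list per action chunk")

    # Count every scalar action value across all chunks and timesteps.
    raw_values = sum(len(step) for chunk in action_chunks for step in chunk)
    n_tokens = sum(len(tokens) for tokens in token_lists)
    if n_tokens == 0:
        raise ValueError("no tokens to compare against")

    return {
        "raw_values": raw_values,
        "tokens": n_tokens,
        # Higher means fewer tokens are needed per raw action value.
        "values_per_token": raw_values / n_tokens,
        "mean_tokens_per_chunk": n_tokens / len(token_lists),
    }
```

A higher `values_per_token` indicates shorter sequences for the downstream model, though it says nothing about reconstruction quality, which would need to be checked separately.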

Getting Started

  • Visit the Hugging Face repository: https://huggingface.co/physical-intelligence/fast
  • Load the provided pre-trained FAST+ tokenizer into your pipeline
  • Prepare robot action sequences in the repository's expected format
  • Run the supplied training scripts to train a custom tokenizer
  • Convert tokenized sequences into inputs for autoregressive model training
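
The last step above can be sketched as follows. This is one common way to prepare autoregressive training examples, not a procedure documented by the FAST repository: action token IDs are shifted into a reserved range of the model vocabulary, appended to a prompt prefix, and the prefix positions are masked out of the loss. The function name, `action_token_offset`, and `eos_id` are hypothetical, and the `-100` label value is the default ignore index of PyTorch's cross-entropy loss.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def build_training_example(
    prefix_ids: Sequence[int],
    action_tokens: Sequence[int],
    action_token_offset: int,
    eos_id: int,
) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) for next-token training on action tokens.

    prefix_ids:          token IDs for the image/text prompt (already encoded).
    action_tokens:       token IDs produced by the action tokenizer.
    action_token_offset: start of the vocabulary range reserved for actions
                         (an assumption of this sketch).
    """
    # Shift action tokens so they do not collide with text token IDs.
    shifted = [action_token_offset + t for t in action_tokens]

    input_ids = list(prefix_ids) + shifted + [eos_id]
    # Only action tokens and the end marker contribute to the loss.
    labels = [IGNORE_INDEX] * len(prefix_ids) + shifted + [eos_id]
    return input_ids, labels
```

Labels here are aligned position by position with `input_ids`, on the assumption that the training loop or model performs the one-step shift internally; if it does not, shift the labels left by one before computing the loss.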

Pricing

No pricing is disclosed; the repository is available on Hugging Face.

Key Information

  • Category: Robotics
  • Type: AI Robotics Tool