Florence-2-large - AI Vision Models Tool

Overview

Florence-2-large is a Microsoft vision foundation model for vision and vision-language tasks. It uses a prompt-based sequence-to-sequence transformer pretrained on the FLD-5B dataset and supports zero-shot and fine-tuned settings for tasks such as captioning, object detection, OCR, and segmentation.

Key Features

Prompt-based sequence-to-sequence transformer architecture
Pretrained on the FLD-5B dataset
Supports zero-shot inference
Supports fine-tuning for downstream tasks
Handles image captioning
Performs object detection
Extracts text from images with OCR
Supports image segmentation
Designed as a vision foundation model

Ideal Use Cases

Generate descriptive captions for images
Detect and localize objects in images
Extract text from scanned documents
Produce segmentation masks for images
Fine-tune for custom vision tasks
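
As a rough illustration of the detection use case, the sketch below turns a parsed detection result into a list of labeled boxes. It assumes the result has the shape shown on the Hugging Face model card for the `<OD>` task prompt (a dict with `bboxes` as `[x1, y1, x2, y2]` lists and parallel `labels`); the helper names `detections_from_result` and `box_area` and the sample values are hypothetical, not part of any library.

```python
# Assumed shape of a parsed object detection result (per the model card):
# {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": ["car", ...]}}

def detections_from_result(result, task="<OD>"):
    """Pair each label with its bounding box (hypothetical helper)."""
    payload = result.get(task, {})
    bboxes = payload.get("bboxes", [])
    labels = payload.get("labels", [])
    return [{"label": label, "box": tuple(box)} for label, box in zip(labels, bboxes)]


def box_area(box):
    """Area of an (x1, y1, x2, y2) box in square pixels."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


# Made-up sample data standing in for real model output.
sample = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [0.0, 0.0, 50.0, 40.0]],
        "labels": ["person", "dog"],
    }
}

detections = detections_from_result(sample)
largest = max(detections, key=lambda d: box_area(d["box"]))
print(largest["label"], box_area(largest["box"]))
```

Keeping the parsing separate from inference makes it easy to filter or rank boxes without rerunning the model.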

Getting Started

Open the model page on Hugging Face: https://huggingface.co/microsoft/Florence-2-large
Read the model card and available documentation
Load the model with the Hugging Face transformers library (the model card examples use PyTorch)
Run zero-shot prompts on sample images
Fine-tune with a labeled dataset for specific tasks
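
The loading and zero-shot steps above can be sketched as follows. This is a minimal example that follows the pattern on the Hugging Face model card, assuming `transformers`, `torch`, and `Pillow` are installed; the function names `build_prompt` and `run_task` and the image path are hypothetical, and the task prompt strings (such as `<CAPTION>`, `<OD>`, `<OCR>`) are taken from the model card rather than from this page. Check the model card for the current recommended usage before relying on it.

```python
MODEL_ID = "microsoft/Florence-2-large"


def build_prompt(task, text_input=None):
    """Task prompt, optionally followed by extra text (e.g. a phrase to ground)."""
    return task if text_input is None else task + text_input


def run_task(image_path, task="<CAPTION>", text_input=None):
    """Run one zero-shot task on a local image and return the parsed result."""
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    use_cuda = torch.cuda.is_available()
    device = "cuda:0" if use_cuda else "cpu"
    dtype = torch.float16 if use_cuda else torch.float32

    # trust_remote_code is needed because the model ships custom modeling code.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(task, text_input), images=image, return_tensors="pt"
    ).to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    # Converts the raw token string into a task-specific dict keyed by the task prompt.
    return processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )


# Example calls (downloads the model weights on first use):
# run_task("sample.jpg", "<CAPTION>")
# run_task("sample.jpg", "<OD>")
# run_task("scan.png", "<OCR>")
```

The same function covers captioning, detection, and OCR because the task is selected by the prompt string alone; only the structure of the returned dict differs between tasks.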

Pricing

Pricing is not disclosed on the model page. Check Hugging Face or Microsoft for licensing and hosting costs.

Key Information

Category: Vision Models
Type: AI Vision Models Tool

Related Tools

Recraft V3

Real-ESRGAN

CodeFormer

DeepBrain AI Studios

Submagic

NSFWGenerator