Best AI Vision Tools

Explore 28 AI vision tools to find the perfect solution.

AI Image Upscaler With Super Resolution

An image upscaling tool using Real-ESRGAN, designed to improve image resolution and quality, available on Replicate.

AI Image & Photo Restoration

A collection of AI-powered tools on Replicate for restoring and enhancing images, including CodeFormer and other models for upscaling, colorization, and noise removal.

InvokeAI

InvokeAI is an open-source creative engine based on Stable Diffusion models that empowers professionals, artists, and enthusiasts to generate high-quality visual media using AI-driven technologies. It features a user-friendly WebUI and serves as a foundation for various commercial and creative products.

lucataco/ai-toolkit

A Cog implementation of ostris/ai-toolkit designed for training LoRA models (specifically for FLUX.1-dev) using a custom image dataset. Note that it is marked as deprecated in favor of ostris/flux-dev-lora-trainer.
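
For readers unfamiliar with LoRA, the sketch below shows the general idea in plain Python: the base weight matrix stays frozen and only two small low-rank matrices are trained. It follows the commonly cited LoRA formulation, W + (alpha / r) * B A, and is not taken from ai-toolkit's code; the helper names `matmul` and `lora_weight` and all the numbers are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the shared rank of A and B."""
    r = len(a)  # A has shape (r, in_features)
    scale = alpha / r
    delta = matmul(b, a)  # (out_features, r) @ (r, in_features)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Frozen 4x4 base weight (identity here, purely for illustration).
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Rank-1 adapter: B is 4x1 and A is 1x4, so 8 trainable values instead of 16.
B = [[1.0], [0.0], [0.0], [2.0]]
A = [[0.5, 0.0, 0.0, 0.0]]

adapted = lora_weight(W, A, B, alpha=1.0)
trainable = sum(len(row) for row in A) + sum(len(row) for row in B)
print(adapted[0], adapted[3], trainable)
```

The saving grows with layer size: for a d x d layer, a rank-r adapter trains 2 * d * r values rather than d * d.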

ComfyUI

A powerful and modular GUI, API, and backend for diffusion models that lets users design and execute advanced Stable Diffusion pipelines through a graph/node/flowchart-based interface. It supports image, video, and audio models, along with various optimizations.
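
As a rough illustration of what a graph/node pipeline means, the sketch below runs a tiny node graph in dependency order using only Python's standard library. The node names (`load_prompt`, `encode`, `sample`, `decode`) and the `run_graph` helper are hypothetical and are not ComfyUI's API; real nodes pass models, latents, and images between each other rather than strings.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: each entry is (function, list of input node names).
# Strings stand in for the tensors a real diffusion pipeline would pass around.
NODES = {
    "load_prompt": (lambda: "a red fox", []),
    "encode": (lambda prompt: f"embedding({prompt})", ["load_prompt"]),
    "sample": (lambda emb: f"latent({emb})", ["encode"]),
    "decode": (lambda latent: f"image({latent})", ["sample"]),
}


def run_graph(nodes):
    """Execute every node after all of its inputs and return all outputs by name."""
    # Map each node to the set of nodes it depends on.
    order = TopologicalSorter({name: set(deps) for name, (_, deps) in nodes.items()})
    results = {}
    for name in order.static_order():
        func, deps = nodes[name]
        results[name] = func(*(results[d] for d in deps))
    return results


outputs = run_graph(NODES)
print(outputs["decode"])
```

Because execution order is derived from the declared inputs, nodes can be listed in any order and rewired without changing `run_graph`.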

DeepFaceLab

Open-source software for creating deepfakes, widely used by creators to swap faces and generate realistic video manipulations.

SD.Next

SD.Next is an all-in-one AI generative image tool hosted on GitHub. It provides a diffusion-based framework for text-to-image generation, supporting multiple UIs and a wide range of models and platforms, including CUDA, ROCm, DirectML, and more. It features processing optimizations such as model compilation, quantization, and compression, as well as built-in queue management and an installer that handles updates.

DALL·E mini by Craiyon

DALL·E mini (now known as Craiyon) is an AI-driven text-to-image generation tool that creates images based on text prompts. The tool is available as a running app on Hugging Face Spaces, allowing users to explore creative image generation directly from their browser.

ostris/ai-toolkit

An open-source toolkit that provides various AI scripts centered on Stable Diffusion and model training. It includes a web UI for starting, stopping, and monitoring jobs, as well as support for training models such as FLUX.1-dev. The repository is implemented in Python (with requirements like PyTorch) and Node.js (for the UI).

ostris/ai-toolkit

A GitHub repository offering a collection of AI scripts primarily for Stable Diffusion and related AI model training. It includes a web UI for managing and monitoring jobs as well as tools for training models like FLUX.1-dev.

img2prompt

An AI model that extracts approximate text prompts from input images, optimized for Stable Diffusion using a modified CLIP Interrogator method. It lets users generate descriptive prompts that can be used to recreate or modify images.

Stable Diffusion web UI

An open-source web interface built with Gradio for interacting with Stable Diffusion. It provides features such as txt2img and img2img modes, inpainting, outpainting, upscaling, embedding management, and various advanced image generation tools, making it easy to experiment with and deploy Stable Diffusion.

CLIP Interrogator

A prompt engineering tool that leverages OpenAI's CLIP and Salesforce's BLIP to analyze an input image and generate optimized text prompts. These prompts can be used with text-to-image models like Stable Diffusion to produce creative art.

FluxGym

A dead simple web UI for training FLUX LoRA models with low VRAM support, built on Gradio UI (forked from AI-Toolkit) and powered by Kohya Scripts. It simplifies the fine-tuning of LoRA models on systems with limited VRAM (12GB/16GB/20GB).

Upscayl

Upscayl is a free and open-source AI-powered image upscaler that enlarges and enhances low-resolution images. It is available for Linux, macOS, and Windows, and requires a Vulkan-compatible GPU.

FaceFusion

FaceFusion is an industry-leading face manipulation platform that enables advanced face swapping, deepfake creation, and lip-syncing. It features a command-line interface with various job management commands (batch-run, headless-run, etc.) and provides installers for Windows and macOS.

ComfyUI-nunchaku

A ComfyUI plugin that integrates Nunchaku—an efficient inference engine for 4-bit neural networks quantized with SVDQuant—into the ComfyUI workflow. It enables enhanced performance through features like multi-LoRA, ControlNet support, FP16 attention, and compatibility with modern GPUs.
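
To give a feel for what storing weights in 4 bits involves, here is a minimal sketch of plain symmetric round-to-nearest quantization to the signed range [-8, 7]. This is a deliberately simple stand-in and is not SVDQuant, which is more involved, and it is not Nunchaku's code; the function names `quantize_int4` and `dequantize_int4` and the sample weights are hypothetical.

```python
def quantize_int4(values):
    """Map floats to integers in [-8, 7] using one shared scale (symmetric, round-to-nearest)."""
    max_abs = max(abs(v) for v in values)
    # Scale so the largest magnitude lands on 7; fall back to 1.0 for all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.12, 0.0]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)
print(codes, restored)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the error that more sophisticated schemes try to reduce.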

Photoshop AI Tools Unlocked Edition

An AI-powered extension for Adobe Photoshop that unlocks advanced editing features including AI image enhancement, smart object removal, background manipulation, and custom filters. Designed for professionals and creative enthusiasts on Windows 10/11, it automates tedious tasks and elevates creative workflows.

ComfyUI-Florence2

A GitHub repository that integrates Microsoft’s Florence-2, an advanced vision foundation model, into ComfyUI. It enables prompt-based vision and vision-language tasks such as captioning, object detection, segmentation, and Document Visual Question Answering (DocVQA) on scanned documents.

Tesseract OCR

Tesseract OCR is an open-source optical character recognition engine that can recognize text from images. It supports over 100 languages, multiple image formats (PNG, JPEG, TIFF), and offers both an LSTM-based OCR engine and a legacy mode for character pattern recognition.

InvokeAI

Open-source Stable Diffusion toolkit and UI for image generation and editing.

ostris/flux-dev-lora-trainer

A Replicate-hosted tool for fine-tuning the FLUX.1-dev model using the ai-toolkit with a LoRA approach. Users can initiate training jobs on Nvidia H100 GPUs to obtain custom-trained weights via an automated, cloud-based workflow.

InvokeAI

Open-source image-generation tool and UI for generative image workflows in the Stable Diffusion ecosystem.

krita-ai-tools

A collection of AI-powered tools designed as a plugin for Krita, enhancing digital painting workflows with advanced features like precise segmentation and mask generation using BiRefNet models. Built against Krita 5.2.x, it improves selection accuracy and performance for digital art creation.

E-commerce Visual Assistant

An interactive visual assistant that lets users upload a product photo and ask commerce-related questions (e.g., 'What brand is this?') using the google/paligemma-3b model. It leverages Gradio for an easy-to-use interface, processing image and text inputs to generate relevant answers.

Zero Shot Object Detection Arena

A Hugging Face Space that provides an interactive arena for zero-shot object detection. Users can run and experiment with object detection models without prior training, leveraging state-of-the-art zero-shot techniques.

Open-Sora

Open-Sora is an open-source initiative for efficient video production. It provides an end-to-end platform for video data preprocessing, training, inference, and more, enabling high-quality text-to-video, image-to-video, and video-to-video generation. It democratizes advanced video generation techniques by making models, checkpoints, and training code accessible.

MediaPipe

MediaPipe is an open-source framework by Google AI Edge designed for building cross-platform multimodal machine learning pipelines, especially for computer vision and media processing tasks. It provides ready-to-use components and tools for rapid prototyping and deployment in AI applications.