AI Tools Collection

A curated collection of AI tools, gathered by AI agents.

Developer Tools

69 tools

Replicate

A platform that enables users to run and deploy custom AI models via API, streamlining model creation and scaling.
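
A minimal sketch of running a hosted model through the official replicate Python client; the model slug and prompt are illustrative, and the client expects a REPLICATE_API_TOKEN in the environment.

```python
# Assumes `pip install replicate` and REPLICATE_API_TOKEN set in the environment.
import replicate

# Illustrative model slug; any public model on Replicate can be substituted.
output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={"prompt": "a watercolor fox in a winter forest"},
)
print(output)  # typically a list of generated file outputs / URLs
```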

Hugging Face

A robust AI platform where the machine learning community collaborates on models, datasets, and applications.

Onboard AI

An AI tool that analyzes GitHub repositories to rapidly provide insights about repository functionality, code locations, and potential modifications using GPT-driven chat.

AI Playground

An open-source AI PC starter application for image creation, stylizing, and a chatbot, designed for systems powered by Intel® Arc™ GPU. It supports various generative AI libraries including Stable Diffusion and Llama models.

Hugging Face Accelerate

A simple way to launch, train, and use PyTorch models on almost any device with support for distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP/DeepSpeed.
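
A toy training-loop sketch showing how Accelerate typically wraps existing PyTorch objects; the tiny model and random dataset are stand-ins just to keep the example self-contained.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy data and model so the loop runs end to end.
dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
dataloader = DataLoader(dataset, batch_size=8)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

accelerator = Accelerator()  # picks device, distributed setup, and precision
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```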

Diffusers

A library implementing state-of-the-art diffusion models for image, video, and audio generation, supporting both PyTorch and FLAX frameworks.
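
A short text-to-image sketch with Diffusers; the SDXL checkpoint is only an example, and the snippet assumes a CUDA GPU is available.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.to("cuda")  # assumes a CUDA-capable GPU

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```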

GitHub Copilot

An AI pair programming assistant that integrates with code editors to provide contextual autocompletion, code suggestions, and debugging help.

Hugging Face Spaces

A platform offering a variety of AI applications and tools across multiple domains such as image generation, text generation, speech synthesis, and more. Users can explore, create, and run various AI models and applications hosted within this directory.

Bolt.new

An AI-powered full-stack web development agent that operates entirely in the browser, allowing users to prompt, run, edit, and deploy applications with minimal local setup.

Crawl4AI

An open-source, LLM-friendly web crawler and scraper built for real-time performance, designed to extract and structure web data for AI applications.

Tabby

A self-hosted AI coding assistant designed to integrate with VSCode, offering chat-based code completions and an enhanced in-editor experience.

OpenVINO

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference across various platforms. It supports models trained with popular frameworks and enhances performance for deep learning tasks in computer vision, automatic speech recognition, and natural language processing.

Hugging Face Hub

The official Python client for the Hugging Face Hub, allowing users to interact with pre-trained models and datasets, manage repositories, and run inference on deployed models.
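
A brief sketch of common huggingface_hub calls; the repo IDs and task filter are illustrative.

```python
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# List a few models for a given task.
for model in api.list_models(filter="text-classification", limit=5):
    print(model.id)

# Download a single file from a model repo into the local cache.
path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(path)
```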

bolt.diy

An open-source AI coding assistant that allows you to prompt, run, edit, and deploy full-stack web applications using various LLMs, with support for multiple model providers.

LangChain

A comprehensive framework for building context-aware applications powered by large language models, featuring standard interfaces for models, embeddings, and vector stores.
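
An illustrative chain built from LangChain's standard prompt and chat-model interfaces; the package split (langchain_openai, langchain_core) follows recent releases, and the model name plus OPENAI_API_KEY handling are assumptions.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
llm = ChatOpenAI(model="gpt-4o-mini")  # reads OPENAI_API_KEY from the environment

chain = prompt | llm  # compose prompt -> model into a runnable chain
reply = chain.invoke(
    {"text": "LangChain standardizes interfaces for models, embeddings, and vector stores."}
)
print(reply.content)
```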

Hugging Face Transformers

A comprehensive library of pretrained models for text, vision, audio, video, and multimodal tasks, enabling fine-tuning and inference across many generative AI use cases.
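
A one-call inference sketch with the Transformers pipeline API; the checkpoint is just an example of a small sentiment model.

```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("This catalog entry was easy to follow."))
```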

AI App Generator

Generate complete Next.js 14 apps from your AI idea with features like live sandbox testing, instant API setup, and full code ownership. Build, modify, and launch AI apps quickly and for free.

ReactAI

A free, open-source AI React component builder that generates functional React components quickly without requiring an API key, offering unlimited usage for developers, startups, and teams.

Exo

A tool to run your own AI cluster at home by partitioning models optimally across everyday devices, enabling distributed AI computation.

lucataco/ai-toolkit

A Cog implementation of ostris/ai-toolkit designed for training LoRA models (specifically for FLUX.1-dev) using a custom image dataset. Note that it is marked as deprecated in favor of ostris/flux-dev-lora-trainer.

Unsloth AI

Unsloth AI is an enterprise platform that accelerates fine-tuning of large language and vision models through innovative quantization techniques, delivering up to 2.2x faster training while using significantly less VRAM. The organization also publishes open-source tools and models and integrates with Hugging Face.

AI Dev Gallery

An open-source project by Microsoft for Windows developers to integrate AI capabilities into apps using local models and APIs. The tool includes over 25 interactive samples, source code in C#, and supports loading models from platforms like Hugging Face and GitHub.

Inference Endpoints by Hugging Face

A fully managed inference deployment service that allows users to easily deploy models (such as Transformers and Diffusers) from the Hugging Face Hub on secure, compliant, and scalable infrastructure. It offers pay-as-you-go pricing and supports a variety of tasks including text generation, speech recognition, image generation, and more.
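
A sketch of querying a deployed endpoint with huggingface_hub's InferenceClient; the endpoint URL and token are placeholders for your own deployment.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://your-endpoint.endpoints.huggingface.cloud",  # placeholder URL
    token="hf_xxx",  # placeholder access token
)
print(client.text_generation("Write a haiku about GPUs.", max_new_tokens=50))
```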

AutoTrain

Hugging Face AutoTrain is an automated machine learning (AutoML) tool that allows users to train, evaluate, and deploy state-of-the-art ML models without writing code. It supports a range of tasks including text classification, image classification, token classification, summarization, question answering, translation, tabular data tasks, and LLM finetuning, with seamless integration into the Hugging Face ecosystem.

OpenAdapt

OpenAdapt is an open source Python library that enables AI-first process automation by interfacing large multimodal models (LMMs) with traditional desktop and web GUIs. It records user interactions (screenshots and inputs), tokenizes the recorded data, and uses transformer model completions to generate synthetic inputs, automating repetitive GUI workflows in a model-agnostic manner.

Continue

An open source platform for creating, sharing, and using custom AI code assistants integrated with IDE extensions, designed to enhance developer productivity.

ostris/ai-toolkit

An open‐source toolkit that provides various AI scripts centered around Stable Diffusion and model training. It includes a web UI for starting, stopping, and monitoring jobs, as well as support for training models such as FLUX.1-dev. The repository is implemented in Python (with requirements like PyTorch) and Node.js (for the UI), making it a valuable resource for developers working on AI model training and deployment.

Open-r1

A fully open reproduction of DeepSeek-R1 that supports training with reasoning traces and scales across multiple nodes using TRL’s vLLM backend.

bolt.diy

An open-source tool that lets developers prompt, run, edit, and deploy full-stack web applications using any large language model of their choice. It supports multiple providers like OpenAI, Anthropic, Ollama, and more, and is extendable via the Vercel AI SDK.

AI SDK

AI SDK is a free, open-source TypeScript toolkit that helps developers build AI-powered applications and agents using frameworks such as Next.js, React, Svelte, and Vue, as well as Node.js runtime. It provides a unified API to interact with various model providers like OpenAI, Anthropic, and Google.

DeepScaleR

DeepScaleR is an open-source project that democratizes reinforcement learning (RL) for large language models (LLMs). The repository provides training scripts, model checkpoints, detailed hyperparameter configurations, datasets, and evaluation logs to reproduce and scale RL techniques on LLMs, aimed at reproducibility and research in advanced AI training.

AI Engineer Toolkit

A collection of resources and projects designed to enhance AI development, including prompt optimization, LangChain workflows, and integrations with popular AI frameworks.

DeepSeek-Coder-V2-Lite-Instruct

An open-source Mixture-of-Experts code language model that provides advanced code intelligence, enabling functionalities comparable to GPT-4-Turbo for coding tasks.

Kiln

Kiln is a rapid AI prototyping and dataset collaboration tool that enables zero-code fine-tuning of large language models, synthetic data generation, evaluations, and team collaboration. It offers intuitive desktop apps for Windows, macOS, and Linux, along with an open-source Python library for integrating and managing AI workflows.

RLAMA

RLAMA is a powerful AI-driven document question-answering tool that connects to local Ollama models. It allows users to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems for processing and querying documents via a CLI and API server.

Probe

Probe is an AI-friendly, fully local semantic code search engine designed for large codebases. It combines fast text search with code-aware parsing to extract complete code blocks, serving as a key building block for next generation AI coding tools.

TRL

TRL is a comprehensive open-source library that enables post-training of transformer language models using reinforcement learning techniques such as Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO). It integrates with Hugging Face’s Transformers ecosystem and supports efficient scaling with tools like Accelerate and PEFT.
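
A condensed supervised fine-tuning (SFT) sketch with TRL; the model and dataset names mirror TRL's documented examples, and exact arguments can vary between TRL versions.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example chat dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # small base model keeps the sketch cheap
    train_dataset=dataset,
    args=SFTConfig(output_dir="./sft-output", max_steps=10),
)
trainer.train()
```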

FluxGym

A dead simple web UI for training FLUX LoRA models with low VRAM support, built on Gradio UI (forked from AI-Toolkit) and powered by Kohya Scripts. It simplifies the fine-tuning of LoRA models on systems with limited VRAM (12GB/16GB/20GB).

AutoDev

AutoDev is an AI-powered coding assistant integrated in IntelliJ IDEA. It provides multilingual support, auto code generation, bug-slaying assistance, and customizable prompts along with features for auto development, testing, documentation, and agent functionalities.

SkyThought

SkyThought is an open-source toolkit that provides data curation, training (including reinforcement learning enhancements), and evaluation pipelines for cost-effective large language model training (Sky-T1 series). It offers scripts for building, training, and evaluating models such as Sky-T1-32B-Preview, making it a valuable resource for AI developers.

Unsloth

Unsloth is an open-source tool that enables developers to finetune various large language models (such as Llama 4, DeepSeek-R1, Gemma 3, and others) more efficiently. It offers free notebooks, reduced memory usage through dynamic quantization, and faster training performance, making it easier to deploy optimized models to platforms like GGUF, Ollama, vLLM, and Hugging Face.
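
A rough loading-and-LoRA sketch following Unsloth's published notebooks; the 4-bit checkpoint name and LoRA hyperparameters are placeholders.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # example pre-quantized checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient finetuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```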

Lighteval

An all-in-one evaluation toolkit for large language models, offering multiple entry points (CPU, GPU, distributed settings) to benchmark model performance.

PyTorch Image Models

A comprehensive library offering implementations and optimizations for modern image models, including transformers and efficient CNNs, maintained by Hugging Face.

ComfyUI-nunchaku

A ComfyUI plugin that integrates Nunchaku—an efficient inference engine for 4-bit neural networks quantized with SVDQuant—into the ComfyUI workflow. It enables enhanced performance through features like multi-LoRA, ControlNet support, FP16 attention, and compatibility with modern GPUs.

OpenVINO Toolkit

An open‐source toolkit for optimizing and deploying AI inference on common platforms such as x86 CPUs and integrated Intel GPUs. It offers advanced model optimization features, quantization tools, pre-trained models, demos, and educational resources to simplify production deployment of AI models.

NVIDIA NeMo

A scalable generative AI framework that supports multiple domains including ASR, TTS, and large language models, built and maintained by NVIDIA.

AI Toolkit for Visual Studio Code

Formerly known as Windows AI Studio, this toolkit extends VS Code with support for a broad range of AI models, streamlining prompt generation, code completion, and AI model discovery.

Microsoft AI Extension Pack

A curated collection of Visual Studio Code extensions designed to accelerate building generative AI applications and agents. It bundles essential tools such as the AI Toolkit, AI Foundry Extension, GitHub Copilot (and its Azure variant), and Data Wrangler, providing integrated support for intelligent code assistance and deployment within VS Code.

Vercel AI Toolkit

A TypeScript toolkit that helps developers build AI-powered applications using popular frameworks like Next.js, React, and more, complete with templates and integrations.

MCP Calculator

Xiaozhi MCP sample program is an open-source project demonstrating the Model Context Protocol (MCP), which enables language models to invoke external tools such as calculators, email operations, knowledge search, and more. It features secure WebSocket communication, real-time streaming, automatic reconnection, and a simple interface for tool creation.

GitHub Models

GitHub Models is an AI toolbox integrated directly into GitHub that lets developers experiment with and compare multiple industry-leading AI models through a single API key. It offers features such as side-by-side evaluations, prompt management as first-class code assets, and a models playground to tweak parameters and evaluate outputs—all within the GitHub workflow.

bolt.diy

bolt.diy is an open-source tool that lets developers prompt, run, edit, and deploy full-stack web applications using any LLM of their choice. It supports multiple providers (including OpenAI, Anthropic, Ollama, and others) via the Vercel AI SDK and is built as a community-driven alternative to proprietary solutions.

gitprompt

An AI-powered CLI git assistant that automatically stages files and creates commits with GPT-4.1-generated messages. It analyzes code changes, groups files intelligently, and supports diff analysis, safety checks, and interactive confirmations, boosting developer productivity during version control.

Aider

Aider is an AI pair programming tool for the terminal that leverages large language models to assist with coding tasks. It maps your codebase, supports multiple programming languages, integrates with git, and offers features like voice-to-code, IDE integration, and the ability to work with both cloud and local LLMs.

Seed-Coder

Seed-Coder is a family of lightweight open‐source code language models (LLMs) that come in base, instruct, and reasoning variants (each around 8B parameters). Developed by ByteDance Seed, the models are designed to curate code training data automatically and enhance code generation and reasoning tasks.

DeepWiki

DeepWiki is an AI-powered documentation generator that automatically converts GitHub repositories into comprehensive, wiki-style documentation. It analyzes repository code, README files, and configuration details to produce structured overviews and interactive diagrams, and it provides a conversational AI assistant for querying codebase details.

nanoVLM

A repository offering a streamlined approach to deploy vision-language models, providing inference capabilities with minimal code.

Claude Auto-Commit

Claude Auto-Commit is an open-source AI-powered tool that analyzes code changes using the Claude Code SDK to generate contextual and meaningful Git commit messages. It supports multi-language commit formatting, automatic staging, and optional auto-push, integrating seamlessly into developers’ workflows via OAuth-based authentication (requiring a Claude Pro/Max subscription).

xemantic-ai-tool-schema

A GitHub repository that provides a standardized schema for describing AI tools. It defines the structure and metadata for AI tool information, aimed at developers who want to maintain consistency when documenting or sharing AI tool data.

AI Release Notes

An AI-powered GitHub App that automatically generates comprehensive release notes using commit history and pull request descriptions. It integrates seamlessly with GitHub workflows and leverages OpenAI's APIs to summarize new features, bug fixes, and other changes.

PandasAI

PandasAI is a Python platform that makes data analysis conversational by allowing users to interact with their databases or datalakes (e.g., SQL, CSV, parquet) using natural language queries powered by LLMs and Retrieval-Augmented Generation (RAG). It supports integration in Jupyter notebooks, Streamlit apps, or via a client-server architecture, serving both technical and non-technical users.

ostris/flux-dev-lora-trainer

A Replicate-hosted tool for fine-tuning the FLUX.1-dev model using the ai-toolkit with a LoRA approach. Users can initiate training jobs on Nvidia H100 GPUs to obtain custom-trained weights via an automated, cloud-based workflow.

DeepEval

DeepEval is an open-source evaluation toolkit for AI models that provides advanced metrics for both text and multimodal outputs. It supports features like multimodal G-Eval, conversational evaluation using a list of Turns, and integrates platform support along with comprehensive documentation.

Edge AI Sizing Tool

A benchmarking tool designed to showcase and evaluate the scalability and performance of AI use cases on Intel-based edge devices. It offers a zero-code configuration interface to select inputs, accelerators, performance modes, and AI models, while providing real-time monitoring of system metrics such as CPU/GPU usage, memory consumption, and inference speed.

PR-Agent

An AI-powered tool that automates pull request analysis by providing feedback, suggestions, and code review insights. It supports multiple platforms (GitHub, GitLab, Bitbucket, Azure DevOps) and can be integrated via GitHub Actions, CLI, and hosted solutions.

Kimi-Dev

Kimi-Dev is an open-source coding LLM (Kimi-Dev-72B) designed for software engineering tasks such as automated code repair and test case generation. It uses large-scale reinforcement learning to autonomously patch repositories, ensuring that full test suites pass before accepting changes. The tool is available for download and deployment via GitHub and Hugging Face.

Kilo Code

Kilo Code is an open-source VS Code AI agent that helps with planning, building, and fixing code. It leverages natural language to generate code, automates repetitive tasks (including terminal commands and browser automation), refactors code, and offers multi-mode operation (Architect, Coder, Debugger). It integrates features from existing tools like Roo Code and Cline.

Konveyor AI (Kai)

Kai is an AI-enabled tool designed to simplify the modernization of application source code to new platforms. It utilizes static code analysis and large language models guided by Konveyor’s historical migration reports to generate targeted code transformation suggestions, continuously learning from past migrations to improve future recommendations.

AI Models

105 tools

Recraft V3

A text-to-image model (code-named red_panda) that can render long passages of text directly within generated images, supporting both raster and vector output formats.

Janus-1.3B

A unified multimodal AI model that decouples visual encoding to support both understanding and generation tasks.

Qwen2.5-7B

Qwen2.5-7B is a large language model designed for text generation, featuring improvements in coding, mathematics, instruction following, long text generation, and multilingual support. It supports context lengths up to 128K tokens and is intended for sophisticated NLP tasks.

BGE-M3

BGE-M3 is a versatile embedding model from the Beijing Academy of Artificial Intelligence that supports dense retrieval, multi-vector retrieval, and sparse retrieval for text embeddings. It is designed to work in over 100 languages and can handle inputs ranging from short sentences to long documents of up to 8192 tokens.
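
A dense-embedding sketch that loads BGE-M3 through sentence-transformers; the model also exposes sparse and multi-vector retrieval through its own FlagEmbedding library, which this snippet does not cover.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
embeddings = model.encode([
    "What is BGE-M3?",
    "BGE-M3 is a multilingual embedding model.",
])
print(embeddings.shape)  # (number of sentences, embedding dimension)
```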

DeepSeek-V3

A large-scale language model optimized for BF16 and FP8 inference modes with support for AMD and NVIDIA GPUs, featuring pipeline parallelism via vLLM.

Llama 3

Llama 3 is an open access large language model (LLM) released by Meta, available in various configurations (8B and 70B parameters) with capabilities for fine-tuning and integrations into platforms like Hugging Face, Google Cloud, and Amazon SageMaker.

FLUX.1-dev

A 12-billion parameter rectified flow transformer that generates images from text, available under a non-commercial license with API access for advanced image synthesis.

UNfilteredAI-1B

A large-scale text generation model designed for creative and unconstrained content production without traditional filtering.

OmniGen

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts, simplifying the image generation process without the need for additional network modules or preprocessing steps. It supports various tasks such as text-to-image generation, identity-preserving generation, image editing, and more.

YOLOv10

YOLOv10 is a real-time end-to-end object detection tool that improves upon previous YOLO versions through NMS-free training and a comprehensive architectural design to enhance efficiency and accuracy. It offers state-of-the-art performance across various model sizes and is implemented in PyTorch.

BLIP-2

BLIP-2 is an advanced visual-language model that allows zero-shot image-to-text generation, enabling tasks such as image captioning and visual question answering using a combination of pretrained vision and language models.

DeepSeek-VL2

A series of advanced vision-language models designed for multimodal understanding, available in multiple sizes to suit varying complexity and performance requirements.

YOLOv5

YOLOv5 is a popular open-source AI tool aimed at object detection, image segmentation, and image classification, leveraging PyTorch for model building and deployment. It supports various deployment formats including ONNX, CoreML, and TFLite, and is well-documented for ease of use in research and practical applications.

FLUX1.1 [pro]

A new text-to-image AI model capable of generating images six times faster than its predecessor, with higher quality, better prompt adherence, and more diversity in outputs. It includes a prompt upsampling feature that utilizes a language model to enhance prompts for improved image generation.

Shuttle-3

Shuttle-3 is a state-of-the-art language model designed for high-quality text generation, particularly suited for complex chat, multilingual communication, and reasoning tasks. It is fine-tuned from the Qwen-2.5-72b-Instruct model and aims to emulate high-quality prose similar to the Claude 3 models.

WizardLM

WizardLM is a state-of-the-art large language model designed for complex chat, multilingual tasks, reasoning, and agent functionalities. It features an AI-powered pipeline (Auto Evol-Instruct) that optimizes instruction datasets for improved performance across various domains and leverages Arena Learning for an expanded learning pool of challenging instruction data.

Aria

A multimodal AI model that combines vision, language, and coding tasks, designed to deliver state-of-the-art performance across diverse tasks.

ToolACE-8B

ToolACE-8B is a finetuned LLaMA-3.1-8B-Instruct model designed for automatic tool usage and generating diverse tool-learning data, achieving state-of-the-art performance on the Berkeley Function-Calling Leaderboard. It features a novel self-evolution synthesis process and a dual-layer verification system for accurate data generation.

Dynamic Speculation

A novel method developed by Intel Labs and Hugging Face that accelerates text generation by up to 2.7x using dynamic speculation lookahead in language models, integrated into the Transformers library.

DeepSeek-V2

DeepSeek-V2 is a state-of-the-art Mixture-of-Experts (MoE) language model designed for economical training and efficient inference, boasting 236B total parameters with excellent performance across various benchmarks and exceptional capabilities in text generation and conversational AI.

DeepSeek-Coder-V2

An open-source Mixture-of-Experts code language model that enhances code generation and reasoning capabilities for programming tasks. It supports an extended 128K context window and a wide array of programming languages, making it competitive with closed-source models like GPT-4-Turbo.

openai/whisper-large-v3-turbo

A finetuned, pruned version of Whisper large-v3 for automatic speech recognition and speech translation. This model reduces the number of decoding layers from 32 to 4 to achieve much faster inference, with only a minor quality trade-off. It supports 99 languages and integrates with Hugging Face Transformers for efficient transcription and translation.
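
A transcription sketch using the Transformers ASR pipeline; the audio path is a placeholder for any local file or URL.

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
result = asr("sample_audio.wav", return_timestamps=True)
print(result["text"])
```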

watt-tool-70B

watt-tool-70B is a fine-tuned large language model based on LLaMa-3.3-70B-Instruct, optimized for advanced tool usage and multi-turn dialogue. It is designed for AI workflow building tasks, excelling in function calling and tool selection, and achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL).

NSFW-gen-v2

An updated AI model that generates uncensored NSFW content, offering users the ability to produce imaginative and unfiltered outputs.

Bielik-11B-v2

Bielik-11B-v2 is an 11-billion parameter generative text model trained on Polish text corpora. Initialized from Mistral-7B-v0.2 and fine-tuned using advanced parallelization techniques, it offers robust text generation capabilities in Polish and English, as evidenced by its performance on multiple NLP leaderboards.

Marco-o1

An open-source large reasoning language model designed for complex real-world problems, leveraging chain-of-thought fine-tuning, Monte Carlo Tree Search, and self-reflection mechanisms to expand solution spaces and improve open-ended reasoning.

JanusFlow-1.3B

JanusFlow-1.3B is a unified multimodal model by DeepSeek that integrates autoregressive language models with rectified flow, enabling both multimodal understanding and image generation.

Stable Diffusion 3.5 Medium

A Multimodal Diffusion Transformer text-to-image generative model by Stability AI that offers improved image quality, typography, complex prompt understanding, and resource efficiency. It supports local or programmatic use via diffusers, ComfyUI, and API endpoints.

Llama-3.1-Tulu-3-8B

An instruction-following language model from AllenAI based on Llama 3.1, optimized for a wide range of NLP tasks including chat, math, and reasoning. It provides various fine-tuned versions (SFT, DPO, RLVR) along with extensive benchmarking and deployment guidance on Hugging Face.

Ultralytics YOLOv8

A state‐of‐the‐art object detection model by Ultralytics that provides robust capabilities for object detection, instance segmentation, and pose estimation. It offers both CLI and Python integrations with extensive documentation and performance metrics.
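
A detection sketch with the ultralytics package; the weights download automatically on first use, and the image path is a placeholder.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # nano variant, smallest weights
results = model("street_scene.jpg")   # placeholder image path

for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # class id, confidence, bounding box
```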

Microsoft Phi-4

Microsoft Phi-4 is a state-of-the-art open language model (14B parameters, dense decoder-only transformer) trained on a blend of synthetic, public domain, and academic data. It has undergone rigorous supervised fine-tuning and direct preference optimization to improve instruction adherence, reasoning, and safety, making it suitable for research and generative AI applications.

Phi-3-mini-4k-instruct

A 3.8B parameter, lightweight instruction-tuned language model by Microsoft built on the Phi-3 datasets. It is designed for robust text generation, logical reasoning, and multi-turn conversation with support for both 4K and 128K token contexts.

DeepSeek-VL2-small

DeepSeek-VL2-small is a variant of the DeepSeek-VL2 series, advanced mixture-of-experts vision-language models designed for multimodal tasks such as visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.

Ultimate SD Upscale with ControlNet Tile

An advanced image upscaling model leveraging Stable Diffusion 1.5 and ControlNet Tile to enhance image quality. Accessible via an API on Replicate and optimized to run with Nvidia A100 GPUs.

MicroThinker-3B-Preview

MicroThinker-3B-Preview is a fine-tuned language model developed by huihui-ai, built from the Llama-3.2-3B-Instruct-abliterated base model. It is specifically optimized for enhanced reasoning capabilities and text generation, using the FineQwQ-142k dataset. The model card includes detailed training instructions and support for inference (e.g., via ollama).

DeepSeek-V3

A large AI model for advanced search and multi-token prediction, supporting various inference modes and parallelism across GPUs.

DeepSeek-R1

An open-source reasoning language model, available with API access, trained for advanced chain-of-thought reasoning and released alongside efficiently distilled smaller variants.

Janus-Series

An open-source repository from deepseek-ai that offers a suite of unified multimodal models (including Janus, Janus-Pro, and JanusFlow) designed for both understanding and generation tasks. The models decouple visual encoding to improve flexibility and incorporate advanced techniques like rectified flow for enhanced text-to-image generation.

Anything V4.0

A latent diffusion-based text-to-image model optimized for generating high-quality, detailed anime-style images. It leverages the Stable Diffusion pipeline and supports danbooru tags for improved prompt specificity. Hosted on Hugging Face, it also integrates with Diffusers and Gradio for demonstration and inference.

Stable Diffusion

A high-resolution image synthesis model that enables users to generate images from textual descriptions, supporting creative and design applications.

Qwen

Qwen is the large language model family developed by Alibaba Cloud, showcased on Hugging Face. It includes advanced language and multimodal models (e.g., Qwen2.5, Qwen2.5-VL) that support text generation, image-text interactions, and long-context processing. The organization page also links to interactive demos like Qwen Chat, highlighting its practical application in AI chat and content generation.

Ideogram-V2

Ideogram-V2 is an advanced image generation model that excels in inpainting, prompt comprehension, and text rendering. It is designed to transform ideas into captivating designs, realistic images, innovative logos, and posters. The model is accessible via an API on Replicate and offers unique features for creative image editing.

Qwen/QwQ-32B-Preview

An experimental preview release large language model developed by the Qwen Team, featuring 32.5B parameters. It is designed to advance AI reasoning and text generation, supporting extended context lengths (up to 32,768 tokens) and built using transformer architectures with RoPE, SwiGLU, and RMSNorm. The model is geared towards research and demonstrates strong capabilities in math and coding, despite noted limitations in language consistency and common sense reasoning.

BLOOM

BLOOM is a multilingual large language model with 176 billion parameters developed by the BigScience project. It generates text in 46 natural languages and 13 programming languages, and is designed for research and deployment under a Responsible AI License. The release includes access to intermediary checkpoints, optimizer states, and is integrated into the Hugging Face ecosystem.

Mochi 1

Mochi 1 is an open state-of-the-art video generation model by Genmo, featuring a 10 billion parameter diffusion model built on the novel Asymmetric Diffusion Transformer (AsymmDiT) architecture. It generates high-quality videos with high-fidelity motion and strong prompt adherence and is available via an API on Replicate.

Stable Diffusion 2-1

A state-of-the-art text-to-image generation model developed by Stability AI, capable of producing high-resolution images from textual descriptions.

OpenAI GPT 1

OpenAI GPT 1 is the first transformer-based language model developed by OpenAI. It is a causal transformer pre-trained on a large corpus for language modeling and is available for inference through both PyTorch and TensorFlow. The model card provides comprehensive details including training methodology, risks, limitations, and usage guidelines.

GPT-2

GPT-2 is a pretrained generative transformer model by OpenAI, designed for text generation. It is trained using a causal language modeling objective on a large corpus of English text and is available on Hugging Face. The model card provides detailed usage examples, training procedure, limitations, and evaluation results.

OpenAI GPT-4o

OpenAI GPT-4o is an advanced multimodal AI model available via the Azure OpenAI Service. It integrates text, image, and audio processing to offer efficient and cost-effective performance, surpassing GPT-4 Turbo with Vision in speed, cost, and non-English language support. It is designed for enhanced customer service, advanced analytics, and content innovation.

Shuttle 3 Diffusion

Shuttle 3 Diffusion is a text-to-image diffusion model that generates detailed and diverse images from textual prompts in just 4 steps. It offers enhanced image quality, improved typography, and resource efficiency, and can be integrated via API, Diffusers, or ComfyUI.

Recraft V3 SVG

A text-to-image generation model that produces high-quality scalable vector graphics (SVG) images, including logos, icons, and custom branded designs. It features precise text integration and design control, setting it apart from traditional raster-based models.

OpenLLaMA

An open-source reproduction of Meta AI’s LLaMA large language model, offering 3B, 7B, and 13B parameter models trained on the RedPajama dataset with both PyTorch and JAX weights under the Apache-2.0 license.

Yi

The Yi series is a set of open-source large language models developed from scratch by 01-ai. Designed as bilingual models, they offer strong performance in language understanding, commonsense reasoning, and chat tasks. The repository includes documentation on usage, fine-tuning, quantization, and deployment.

DeepSeek-MoE

DeepSeek-MoE 16B is a Mixture-of-Experts (MoE) language model featuring 16.4B parameters. It employs fine-grained expert segmentation and shared experts isolation to achieve comparable performance to larger models with only around 40% of the typical computations. The repository includes both base and chat variants along with evaluation benchmarks and integration instructions via Hugging Face Transformers.

img2prompt

An AI model that extracts approximate text prompts from input images, optimized for stable diffusion using a modified CLIP Interrogator method. It enables users to generate descriptive prompts that can be used to recreate or modify images.

Allegro

Allegro is an advanced open-source text-to-video generation model by RhymesAI. It converts simple text prompts into high-quality, 6-second video clips at 15 FPS and 720p resolution using a combination of VideoVAE for video compression and a scalable Diffusion Transformer architecture.

Stable Diffusion 3.5 Large

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer text-to-image generative model developed by Stability AI. It generates images from text prompts with enhanced image quality, typography, and resource-efficiency. The model supports integration with Diffusers, ComfyUI, and other programmatic interfaces, and is available under the Stability Community License.

xinsir/controlnet-union-sdxl-1.0

A ControlNet++ model for text-to-image generation and advanced image editing. Built on Stable Diffusion XL, it supports over 10 control conditions and advanced features such as tile deblurring, tile variation, super resolution, inpainting, and outpainting. The model is designed for high-resolution, multi-condition image generation and editing.

Mochi 1 Preview

Mochi 1 Preview is an open, state-of-the-art text-to-video generation model by Genmo that leverages a 10 billion parameter diffusion model with a novel Asymmetric Diffusion Transformer architecture. It generates high-fidelity videos from text prompts and is available under an Apache 2.0 license.

olmOCR-7B-0225-preview

A preview release of AllenAI's olmOCR model, fine-tuned from Qwen2-VL-7B-Instruct using the olmOCR-mix-0225 dataset. It is designed for document OCR and recognition, processing PDF images by extracting text and metadata. The model is intended to be used in conjunction with the olmOCR toolkit for efficient, large-scale document processing.

Perplexity R1-1776

A post-trained variant of the DeepSeek-R1 reasoning model by Perplexity AI, designed to remove censorship and deliver unbiased, accurate, and fact-based responses while maintaining robust reasoning skills.

DeepSeek-R1 Distill Qwen 14B GGUF

A quantized (GGUF) variant of the DeepSeek-R1 reasoning model distilled from Qwen 14B. This model supports a massive 128k context length and is tuned for reasoning and chain-of-thought tasks. It is provided by the lmstudio-community on Hugging Face, incorporating optimizations from llama.cpp.

ModernBERT Embed

ModernBERT Embed is an embedding model derived from ModernBERT-base designed for generating sentence embeddings. It supports tasks such as sentence similarity and search through both full (768-d) and truncated (256-d) embedding outputs. The page provides comprehensive usage examples using SentenceTransformers, Transformers, and Transformers.js, indicating its integration into various frameworks.

Janus-Pro-1B

Janus-Pro-1B is a unified multimodal model by DeepSeek that decouples visual encoding for multimodal understanding and generation. It supports both image input (via SigLIP-L) for understanding and image generation using a unified transformer architecture.

Grok 3

Grok 3 is xAI's flagship language model, introduced as an upgrade to Grok 2. It features enhanced computational power (10–15× more than its predecessor), advanced reasoning capabilities including a 'Big Brain Mode' for tackling complex multi-step problems, and a DeepSearch feature that scans and synthesizes information from the internet and social platforms. It also supports multimodal inputs and improved coding accuracy, positioning it as a strong competitor to models like GPT-4o and Gemini. The model is accessible via subscription plans integrated within X's ecosystem.

Stable Virtual Camera

A 1.3B diffusion model for novel view synthesis that generates 3D consistent novel views and videos from multiple input images and freely specified target camera trajectories. It is designed for research and creative non-commercial use.

EleutherAI/gpt-neox-20b

A 20-billion parameter autoregressive transformer language model developed by EleutherAI using the GPT-NeoX library. It is designed primarily for research purposes, with capabilities for further fine-tuning and adaptation, and provides detailed technical specifications and evaluation results.

Dolphin 3.0 R1 Mistral 24B

A next-generation instruct-tuned text generation model optimized for coding, math, reasoning, and agentic tasks. Built on the Mistral-24B base, it is fine-tuned with extensive reasoning traces to support function calling and steerable alignment, offering users local deployment control.

Falcon 3 Family

A family of open-source, decoder-only large language models under 10 billion parameters developed by Technology Innovation Institute (TII). The Falcon 3 models offer enhanced math, scientific, and coding capabilities through innovative pretraining techniques and are available in multiple variants including base and instruct configurations.

Hunyuan3D 2.0

A diffusion-based model for generating high-resolution textured 3D assets, featuring a two-stage pipeline with a shape generation component (Hunyuan3D-DiT) and a texture synthesis component (Hunyuan3D-Paint). It supports both image-to-3D and text-to-3D workflows, and includes a user-friendly production platform (Hunyuan3D-Studio) for mesh manipulation and animation.

DeepSeek-R1-Distill-Qwen-1.5B

A distilled dense language model based on Qwen2.5-Math-1.5B that leverages the DeepSeek-R1 pipeline. It is designed for advanced reasoning, math, and code generation tasks, and is available under an MIT license with extensive evaluation metrics and deployment instructions on Hugging Face.

smollm

A family of lightweight AI models including SmolLM2 for language tasks and SmolVLM for vision-language tasks, optimized for efficiency.

Reka Flash 3

A 21B general-purpose reasoning model trained from scratch, designed for multi-round conversational tasks with a focus on deep reasoning.

FuseChat-7B-VaRM

FuseChat-7B-VaRM is a chat language model developed by FuseAI that fuses knowledge from multiple chat LLMs (NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B) using a fuse-then-merge strategy. It aims to integrate the strengths of diverse models into a single, memory-efficient LLM, achieving competitive performance on benchmarks like MT-Bench.

Florence-2-large

An advanced vision foundation model by Microsoft designed for a wide range of vision and vision-language tasks such as captioning, object detection, OCR, and segmentation. It uses a prompt-based, sequence-to-sequence transformer architecture pretrained on the FLD-5B dataset and supports both zero-shot and finetuned settings.

DeepSeek-R1-Distill-Qwen-14B

An open-source, distilled large language model derived from DeepSeek-R1 that is built on Qwen2.5-14B. It utilizes reinforcement learning techniques to achieve enhanced reasoning, chain-of-thought generation, and state-of-the-art performance benchmarks.

Kernel/sd-nsfw

A Stable Diffusion v1-5 NSFW REALISM model variant hosted on Hugging Face. It is a diffusion-based text-to-image generation model fine-tuned for generating photo-realistic images, including NSFW content, and is intended for research purposes. It can be used with the Diffusers library and offers options for both direct inference and fine-tuning.

Anything V5

A text-to-image diffusion model from the Anything series designed for anime-style image generation. The model is available in multiple variants (e.g., V5-Prt) and is optimized for precise prompt-based outputs. It leverages Stable Diffusion pipelines and is hosted on Hugging Face with detailed versioning and usage instructions.

prunaai/hidream-l1-dev

An optimized version of the hidream-l1-dev model built with the Pruna AI optimization toolkit. This model runs on Nvidia A100 GPUs, is available via an API on Replicate, supports rapid predictions (around 15 seconds per run), and has been executed over 28.5K times.

Stable Diffusion v1.5

A latent diffusion-based text-to-image generation model that produces photorealistic images from text prompts. It builds upon the Stable Diffusion v1.2 weights and is fine-tuned for improved classifier-free guidance. It can be used via the Diffusers library, ComfyUI, and other interfaces.

Llama 4 Maverick & Scout

A new generation of large language models from Meta released on Hugging Face. Llama 4 includes two Mixture-of-Experts models – Maverick (~400B total with 17B active parameters and 128 experts) and Scout (~109B total with 17B active parameters and 16 experts). Both support native multimodal inputs (text and images), extremely long context lengths (up to 10M tokens in Scout), and are integrated with Hugging Face transformers and TGI for easy deployment.

VLM-R1

VLM-R1 is a stable and generalizable R1-style large Vision-Language Model designed for visual understanding tasks such as Referring Expression Comprehension (REC) and Out-of-Domain evaluation. The repository provides training scripts, multi-node and multi-image input support, and demonstrates state-of-the-art performance with RL-based fine-tuning approaches.

FinGPT

FinGPT is an open‐source repository that provides financial large language models along with training scripts, fine‐tuning techniques, and benchmark datasets. It is designed to efficiently adapt LLMs for financial applications, democratizing financial data and supporting research through released models and accompanying academic papers.

DeepSeek-V2-Lite

DeepSeek-V2-Lite is a Mixture-of-Experts language model designed for economical training and efficient inference. With 16B total parameters and 2.4B activated parameters, it employs innovative techniques such as Multi-head Latent Attention (MLA) and DeepSeekMoE for performance gains. The model is available for both text and chat completions via Hugging Face and is optimized to run with a 40GB GPU using BF16 precision.

Jamba-v0.1

Jamba-v0.1 is a state-of-the-art, hybrid SSM-Transformer large language model developed by AI21 Labs. It is a pretrained, mixture-of-experts generative text model with 12B active parameters (52B total across experts), supporting a 256K context length. Designed for high throughput, it serves as a strong base for fine-tuning into chat/instruct versions.

OpenAI GPT 4.1 API

OpenAI's flagship GPT-4.1 API is a high-performance large language model optimized for real-world applications. It supports up to 1M tokens of context, offers improved coding, advanced instruction following, enhanced formatting, and robust long-context comprehension, making it ideal for building intelligent agents, processing extensive documents, and handling complex workflows.

STILL-3-Tool-32B

A 32.8B parameter text-generation model that integrates Python code to enhance the reasoning process via tool manipulation. It achieves 81.70% accuracy on AIME 2024, matching o3-mini and outperforming o1 and DeepSeek-R1. The model is open-sourced on Hugging Face, and its design focuses on improving reasoning capabilities by leveraging integrated tool use.

Stable Diffusion XL Base 1.0

A diffusion-based text-to-image generative model developed by Stability AI. This model uses a latent diffusion approach with dual fixed text encoders, and can be used standalone or combined with a refinement model for enhanced high-resolution outputs. It supports both direct image generation and img2img workflows leveraging SDEdit.

HiDream-I1

An open-source image generative model with 17B parameters, delivering state-of-the-art image generation quality, accompanied by a dedicated Hugging Face Space for experimentation.

spaCy Models

A GitHub repository by explosion that distributes pre-trained model packages for the spaCy NLP library. The repository provides model releases in .whl and .tar.gz formats for various NLP tasks (e.g., tagging, parsing, lemmatization, and named entity recognition) along with versioning and compatibility guidelines.
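
A usage sketch once a model package is installed (for example, via python -m spacy download en_core_web_sm); the sentence is arbitrary.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Hugging Face was founded in New York in 2016.")
print([(ent.text, ent.label_) for ent in doc.ents])  # named entities
```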

Llama4

Llama4 is a large autoregressive Mixture-of-Experts (MoE) multimodal model developed by Meta. It comes in two variants: Maverick (17B active parameters out of ~400B total with 128 experts) and Scout (17B active parameters out of ~109B total with 16 experts). The models support native multimodal inputs (text and images), long context lengths (up to 10 million tokens in some versions), and advanced quantization and offloading techniques for efficient deployment.

OpenAI GPT-4o API

GPT-4o is OpenAI's flagship multimodal model, supporting text, image, and audio inputs and outputs. It offers real-time responsiveness, a 128K-token context window via the API, and high performance across reasoning, math, and coding tasks, making it well suited to real-time voice assistants, interactive multimodal document Q&A, and advanced code generation.
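
A chat-completion sketch with the official openai Python SDK (v1-style client); it assumes OPENAI_API_KEY is set in the environment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize GPT-4o in one sentence."}],
)
print(response.choices[0].message.content)
```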

Phi-4-mini-instruct

Phi-4-mini-instruct is a 3.8B-parameter lightweight language model from Microsoft, built from the Phi-4 family. It is designed for high-quality reasoning and instruction-following tasks, supports a 128K token context length, and is optimized via supervised fine-tuning and direct preference optimization. It is intended for both commercial and research use in memory/compute constrained and latency-sensitive environments.

Shakker-Labs/AWPortraitCN2

A text-to-image model focused on generating portraits with Eastern aesthetics. The updated version expands character depiction across various age groups and themes including cuisine, architecture, traditional ethnic costumes, and diverse environments. It is based on the stable-diffusion/flux framework and released under a non-commercial license.

deepfake-detector-model-v1

A deepfake detection image classification model fine-tuned from google/siglip2-base-patch16-512. It leverages the SiglipForImageClassification architecture to classify images as either 'fake' (deepfakes) or 'real', and is intended for applications such as media authentication, content moderation, forensic analysis, and security.

Flux1.1 Pro – Ultra

Flux1.1 Pro – Ultra is an advanced text-to-image diffusion model by Black Forest Labs available on Replicate. It offers ultra mode for generating high-resolution images (up to 4 megapixels) at impressive speeds (around 10 seconds per sample) and a raw mode that produces images with a more natural, candid aesthetic.

Flux-uncensored

Flux-uncensored is a text-to-image diffusion model hosted on Hugging Face by enhanceaiteam. It leverages the stable-diffusion pipeline, LoRA, and the fluxpipeline to generate images from text prompts. The model is marked as 'Not-For-All-Audiences', indicating that it might produce sensitive content.

AM-Thinking-v1

AM-Thinking-v1 is a 32B dense language model built on Qwen 2.5-32B-Base, designed to enhance reasoning capabilities. It uses a post-training pipeline that includes supervised fine-tuning and dual-stage reinforcement learning, enabling strong performance in reasoning tasks like code generation, logic, and writing while operating efficiently on a single GPU.

jina-embeddings-v3

jina-embeddings-v3 is a multilingual multi-task text embedding model developed by Jina AI. Built on the Jina-XLM-RoBERTa architecture, it employs task-specific LoRA adapters to generate embeddings for various NLP tasks such as retrieval, classification, text-matching, and more. It supports rotary position embeddings for input sequences up to 8192 tokens and offers flexible, adjustable embedding dimensions.

Microsoft Phi-4-reasoning-plus

Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning large language model developed by Microsoft. Finetuned from the base Phi-4 model with a mix of supervised fine-tuning on chain-of-thought traces and reinforcement learning, it is optimized for advanced reasoning tasks in math, science, and coding. The model features a dense 14B parameter decoder-only Transformer architecture with a 32k token context length and produces responses with a reasoning chain-of-thought followed by a summarization. It is intended for research and generative AI applications in constrained memory/latency settings.

FLUX.1

FLUX.1 is an open‐source state‐of‐the‐art text‐to‐image generation model developed by Black Forest Labs. It excels in prompt adherence, visual detail, and diverse output quality. Available via Replicate's API, FLUX.1 comes in three variants (pro, dev, schnell) with different pricing models.

Ideogram 3.0

Ideogram 3.0 is a text-to-image generation model available on Replicate that offers three variants—Turbo, Balanced, and Quality—for fast iteration, balanced output, and high-fidelity results, respectively. It delivers improved realism, enhanced text rendering, precise layout generation, and advanced style transfer, making it well suited to graphic design, marketing, and creative visual content.

Smaug-72B-v0.1

Smaug-72B-v0.1 is an open-source large language model for text generation developed by Abacus.AI. Based on Qwen-72B and finetuned using the novel DPO-Positive (DPOP) technique, it achieves high performance on benchmarks like MT-Bench and is the first open model to surpass an average score of 80% on the Open LLM Leaderboard.

DeepSeek-R1-Distill-Llama-8B

A distilled language model from the DeepSeek-R1 series built on the Llama-3.1-8B base. It is optimized for text generation and chain-of-thought reasoning tasks through reinforcement learning and selective fine-tuning, delivering competitive performance on math, code, and reasoning benchmarks.

Kimi-VL-A3B-Thinking

Kimi-VL-A3B-Thinking is an efficient open-source Mixture-of-Experts vision-language model specialized in long-context processing and extended chain-of-thought reasoning. With a 128K context window and only 2.8B activated LLM parameters, it excels in multimodal tasks including image and video comprehension, OCR, mathematical reasoning, and multi-turn agent interactions.

Media Generation

82 tools

Real-ESRGAN

An AI-powered image upscaling tool that enlarges images while enhancing details and reducing artifacts, often used for improving image resolution.

CodeFormer

A robust face restoration algorithm for enhancing old photos or AI-generated faces, available via Replicate for easy inference.

AI Image Upscaler With Super Resolution

An image upscaling tool using Real-ESRGAN, designed to improve image resolution and quality, available on Replicate.

DeepBrain AI Studios

An AI tool for creating realistic AI avatars and generating videos from text, enabling users to bypass manual scripting.

Submagic

An AI-powered video tool that automatically identifies the best moments in your videos and converts them into viral clips.

NSFWGenerator

An AI tool that generates and browses NSFW images through advanced algorithms.

AI Image & Photo Restoration

A collection of AI-powered tools on Replicate designed for restoring and enhancing images, including models like CodeFormer and others for upscaling, colorization, and noise removal.

GFPGAN

A practical AI tool for face restoration, capable of enhancing and restoring old and AI-generated faces, available for self-hosting via Docker.

OpenVoice

OpenVoice is a versatile instant voice cloning framework that allows users to generate speech in multiple languages using only a short audio clip from a reference speaker. The tool provides granular control over voice styles, such as emotion, accent, rhythm, pauses, and intonation, and supports zero-shot cross-lingual voice cloning, enabling users to clone voices across different languages without needing training data for those languages.

WhisperX

WhisperX is an Automatic Speech Recognition (ASR) tool that provides fast and accurate transcriptions with word-level timestamps and speaker diarization features, enhancing the capabilities of OpenAI's Whisper model.

Parler-TTS

A text-to-speech inference and training library for generating high-fidelity speech from text, offering an open-source solution for TTS applications.

AI Image Generator – Text to Image Models

A platform that hosts various AI models for generating images from text prompts using advanced techniques such as Stable Diffusion and FLUX.1, showcasing models with capabilities including realistic text generation, SVG creation, and high-quality image outputs.

MagicQuill

MagicQuill is an intelligent interactive image editing system that enables precise image modification through AI-powered suggestions and a user-friendly interface, featuring functionalities like local editing and drag-and-drop support.

Clarity AI Upscaler

Clarity AI Upscaler is an advanced image upscaling tool that utilizes Stable Diffusion processes to enhance and recreate details in images, providing users with the option to balance fidelity and creativity through parameters such as diffusion strength. The tool supports tiled diffusion techniques for handling large images and incorporates ControlNet for maintaining structural integrity while enhancing details.

SpeechBrain

An all-in-one open-source conversational AI toolkit based on PyTorch offering speech recognition, text-to-speech, speaker recognition, and more.

Whisper Large

A robust speech recognition model based on a Transformer architecture that supports multilingual transcription, speech translation, and language identification.

Adobe Firefly

Adobe Firefly is an AI art generator developed by Adobe, enabling users to create images, audio, vectors, and videos from text prompts. It integrates with Adobe Creative Cloud, enhancing workflows with generative AI capabilities such as Text-to-Image, Generative Fill, and more.

Retrieval-based Voice Conversion WebUI

An open-source web UI that enables voice conversion using retrieval-based methods, offering configurable options and support for different models.

InvokeAI

InvokeAI is an open-source creative engine based on Stable Diffusion models that empowers professionals, artists, and enthusiasts to generate high-quality visual media using AI-driven technologies. It features a user-friendly WebUI and serves as a foundation for various commercial and creative products.

Replica

An AI tool capable of replicating human voice characteristics to generate expressive, high-quality speech from text.

OpenVoice V2

OpenVoice V2 is an advanced text-to-speech model that provides instant voice cloning with accurate tone color reproduction and flexible voice style control. It supports zero-shot cross-lingual synthesis in multiple languages and has improved audio quality over its previous version. Released under the MIT License, it is geared towards both research and commercial use.

Whisper Large v3

A state-of-the-art automatic speech recognition and translation model trained on over 5 million hours of data, capable of robust zero-shot generalization.

Whisper by OpenAI

A robust, general-purpose speech recognition model capable of multilingual transcription, translation, and language identification, built using a transformer architecture.
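
A minimal usage sketch with the openai-whisper Python package; the model size and audio path are placeholders.

```python
import whisper

model = whisper.load_model("base")      # downloads weights on first use
result = model.transcribe("audio.mp3")  # language is auto-detected by default
print(result["text"])
```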

Stable Diffusion 3 Medium

A multimodal diffusion transformer model that generates images from textual descriptions with improvements in image quality, typography, and resource-efficiency for creative applications.
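
As an illustration, a short text-to-image sketch using the StableDiffusion3Pipeline from the diffusers library; the repository ID and generation settings are assumptions to verify against the model card, and the weights are gated behind a license acceptance.

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo id; gated weights
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a watercolor fox reading a newspaper, soft morning light",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("fox.png")
```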

ComfyUI

A powerful and modular GUI, API, and backend for diffusion models that allows users to design and execute advanced Stable Diffusion pipelines using a graph/node/flowchart-based interface. It supports image, video, and audio models, along with various optimizations.
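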

DeepFaceLab

Industry-leading software for creating deepfakes, used widely by creators to swap faces and generate realistic video manipulations.

SD.Next

SD.Next is an all-in-one AI generative image tool. It provides a robust diffusion-based framework for text-to-image generation, supporting multiple UIs and a wide range of models and platforms including CUDA, ROCm, DirectML, and more. It features advanced processing optimizations such as model compilation, quantization, and compression, as well as built-in queue management and an installer that handles updates.

ACE++

ACE++ is an instruction-based image creation and editing toolkit that uses context-aware content filling for tasks such as portrait generation, subject-driven image editing, and local editing. The tool supports diffusion-based models, provides installation instructions, demos, and guides for fine-tuning using LoRA, and is hosted on Hugging Face.

DALL·E mini by Craiyon

DALL·E mini (now known as Craiyon) is an AI-driven text-to-image generation tool that creates images based on text prompts. The tool is available as a running app on Hugging Face Spaces, allowing users to explore creative image generation directly from their browser.

OpenVoice

OpenVoice is an instant voice cloning tool developed by MIT and MyShell. It offers accurate tone color cloning, flexible voice style control (including emotion, accent, rhythm, pauses, and intonation), and supports zero-shot cross-lingual voice cloning. The V2 release improves audio quality, provides native multi-lingual support (English, Spanish, French, Chinese, Japanese, Korean), and is available under the MIT License for free commercial use.

ClearerVoice-Studio

An open-source, AI-powered speech processing toolkit offering state-of-the-art pretrained models and utilities for tasks such as speech enhancement, separation, super-resolution, and target speaker extraction.

AI Comic Factory

A Hugging Face Space that lets users create comics using AI; it generates comic panels and layouts from a single text prompt.

Bark

Bark is a transformer-based text-to-audio model by Suno that generates highly realistic, multilingual speech as well as music, background noise, and simple sound effects. It also produces nonverbal cues like laughing or sighing. The model is provided for research purposes with pretrained checkpoints available for inference.
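
A minimal sketch following Bark's documented Python interface; the prompt is illustrative and checkpoints are downloaded on first use.

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # fetch and cache the model checkpoints
audio_array = generate_audio("Hello, my name is Suno. [laughs] And I like pizza.")
write_wav("bark_out.wav", SAMPLE_RATE, audio_array)
```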

minimax/video-01-director

An advanced AI video generation model that creates high-definition 720p videos (up to 6 seconds) with cinematic camera movements. It allows users to control camera movements through both bracketed commands and natural language descriptions.

CosyVoice

A multi-lingual large voice generation model which provides full-stack capabilities for inference, training, and deployment of high-fidelity voice synthesis.

Stable Diffusion web UI

An open-source web interface built with Gradio for interacting with Stable Diffusion. It provides features such as txt2img and img2img modes, inpainting, outpainting, upscaling, embedding management, and various advanced image generation tools, making it easy to experiment with and deploy Stable Diffusion.

GPT-SoVITS

A few-shot voice cloning and text-to-speech WebUI that can train a TTS model with just 1 minute of voice data. It supports zero-shot and few-shot TTS, cross-lingual inference, and includes integrated tools for voice separation, dataset segmentation, and ASR, making it easier to build and deploy custom TTS models.

CLIP Interrogator

A prompt engineering tool that leverages OpenAI's CLIP and Salesforce's BLIP to analyze an input image and generate optimized text prompts. These prompts can be used with text-to-image models like Stable Diffusion to produce creative art.
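
A brief sketch of the clip-interrogator Python package, assuming a local image file; the CLIP model name shown is the one commonly paired with Stable Diffusion 1.x.

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

image = Image.open("photo.jpg").convert("RGB")
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
print(ci.interrogate(image))  # returns a text prompt describing the image
```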

LuminaBrush

A creative ML app hosted on Hugging Face Spaces that lets users explore and generate artistic images using community-built AI models.

EasyDeepNude

EasyDeepNude is an AI tool that implements a reimagined version of the controversial DeepNude project. It provides both a command-line interface (CLI) and a graphical user interface (GUI) to process and transform photos using deep learning models. The CLI version can be integrated into automated workflows, while the GUI version offers a user-friendly cropping system for easy use. Note: This is an early alpha release and may have compatibility issues.

Ideogram-v2-turbo

A fast text-to-image generation model ideal for quick ideation and providing rough compositional sketches.

Upscayl

Upscayl is a free and open-source AI-powered image upscaler that enlarges and enhances low-resolution images using advanced AI algorithms. It is available for Linux, macOS, and Windows, and requires a Vulkan compatible GPU.

Playground v2.5 – 1024px Aesthetic Model

A diffusion-based text-to-image generative model that produces highly aesthetic images at a resolution of 1024x1024 across various aspect ratios. It outperforms several state-of-the-art models in aesthetic quality and is accessible via an API on Replicate, with integration support for Hugging Face Diffusers.

Easel AI

An AI tool that offers advanced face swap and avatar generation, preserving user likeness and enabling creative image manipulations.

Wan2.1-T2V-14B

Wan2.1-T2V-14B is an advanced text-to-video generation model that offers state-of-the-art performance, supporting both 480P and 720P resolutions. It is part of the Wan2.1 suite and excels in multiple tasks including text-to-video, image-to-video, video editing, and even generating multilingual text (Chinese and English) within videos. The repository provides detailed instructions for single and multi-GPU inference, prompt extension methods, and integration with tools like Diffusers and ComfyUI.

FLUX.1 Redux

An adapter for FLUX.1 base models that generates slight variations of a given image, enabling creative refinements and flexible high-resolution outputs.

Coqui TTS

A deep learning toolkit for advanced Text-to-Speech generation, providing pretrained models across 1100+ languages, tools for training and fine-tuning models, and utilities for dataset analysis. Battle-tested in both research and production environments.

Hugging Face Speech-to-Speech

An open-sourced, modular speech-to-speech pipeline developed by Hugging Face that integrates Voice Activity Detection, Speech-to-Text, Language Models, and Text-to-Speech. It leverages models from the Transformers library (e.g., Whisper, Parler-TTS) and supports various deployment approaches including server/client and local setups.

Flux - FLUX.1 Models Inference Repo

Official inference repository by Black Forest Labs for FLUX.1 models. This repo provides minimal inference code for running image generation and editing tasks (e.g., text-to-image, in/out-painting, structural conditioning, and image variation) and includes instructions for local installation and TensorRT support.

coqui/XTTS-v2

A text-to-speech (TTS) voice generation model that enables high-quality voice cloning and cross-language speech synthesis using just a 6-second audio clip. It supports 17 languages, offers emotion and style transfer, improved speaker conditioning, and overall stability improvements over its previous version.
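
A short voice-cloning sketch using the Coqui TTS Python API, which hosts this model; the reference clip path is a placeholder and a GPU is assumed.

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")
tts.tts_to_file(
    text="It took me quite a long time to develop this voice.",
    speaker_wav="reference_6s.wav",  # ~6-second clip of the voice to clone
    language="en",
    file_path="cloned_output.wav",
)
```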

FaceFusion

FaceFusion is an industry-leading face manipulation platform that enables advanced face swapping, deepfake creation, and lip-syncing. It features a command-line interface with various job management commands (batch-run, headless-run, etc.) and provides installers for Windows and macOS.

ghibli-easycontrol

An open-source model hosted on Replicate that transforms input images with a Ghibli-style aesthetic, offering high-quality, fast, and cost-effective image translation via an API.

topazlabs/image-upscale

An AI-powered, professional-grade image upscaling tool by Topaz Labs. It offers multiple enhancement models (Standard, Low Resolution, CGI, High Fidelity, Text Refine) to upscale images up to 6x with options for facial enhancement, making it ideal for improving various image types including digital art and text-heavy photos.

fofr/color-matcher

A model hosted on Replicate that performs color matching and white balance correction for images via an API. It allows users to automatically adjust image colors to achieve better balance.
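
For illustration, a call through the Replicate Python client; the input field names here are hypothetical and should be checked against the model's schema on Replicate.

```python
import replicate

# Hypothetical input keys -- verify against the model's API schema before use.
output = replicate.run(
    "fofr/color-matcher",
    input={
        "image": open("photo.png", "rb"),
        "reference_image": open("reference.png", "rb"),
    },
)
print(output)
```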

UniRig

UniRig is an AI-based unified framework for automatic 3D model rigging. It leverages a GPT-like transformer to predict skeleton hierarchies and per-vertex skinning weights, automating the traditionally time-consuming rigging process for diverse 3D assets including humans, animals, and objects.

Dia

A text-to-speech (TTS) model capable of generating ultra-realistic dialogue in one pass, providing real-time audio generation on enterprise GPUs.

Photoshop AI Tools Unlocked Edition

An AI-powered extension for Adobe Photoshop that unlocks advanced editing features including AI image enhancement, smart object removal, background manipulation, and custom filters. Designed for professionals and creative enthusiasts on Windows 10/11, it automates tedious tasks and elevates creative workflows.

Hunyuan3D-2.0

An AI application that creates high-resolution 3D models from images or text prompts, enabling multi-angle or descriptive 3D model generation.

Kling Lip Sync

Kling Lip Sync is an API that changes the lip movements of a person in a video to match supplied audio or text. It allows users to add lip-sync to any video, integrating video content with new audio inputs. The model sends data from Replicate to Kuaishou and offers pricing based on the seconds of video generated.

SV4D 2.0

SV4D 2.0 is an enhanced 4D diffusion model by Stability AI for high-fidelity novel-view video synthesis and 4D asset generation. It generates 48 frames (12 video frames across 4 camera views) from an input video and uses an autoregressive approach for longer video generation. Designed for research purposes, it offers improved fidelity, sharper motion details, and better spatio-temporal consistency compared to previous models.

google/lyria-2

Lyria 2 is an AI music generation model by Google that produces professional-grade 48kHz stereo audio from text-based prompts. It supports various genres and implements SynthID for audio watermarking, making it suitable for direct project integration.

HeyGem

HeyGem is an open-source AI avatar project that enables offline video synthesis on Windows. It precisely clones your appearance and voice to generate ultra-realistic digital avatars, allowing users to create personalized videos without an internet connection.

FLUX.1 Kontext

An experimental image blending tool that merges two input images into a single, cohesive output using AI-driven composition techniques.

FLUX.1 Kontext

FLUX.1 Kontext is a new image editing model from Black Forest Labs that leverages text prompts for precise image modifications, including color swaps, background edits, text replacements, style transfers, and aspect ratio changes. It features multiple variants (Pro, Max, and an upcoming Dev) along with a conversational interface (Kontext Chat) to simplify the editing process.

FLUX.1 Kontext – Text Removal

A dedicated application built on the FLUX.1 Kontext image editing model from Black Forest Labs that removes all text from an image. The tool is available on Replicate with API access and a playground for experimentation, showcasing its specialized text removal functionality.

FLUX Kontext max - Multi-Image List

An AI tool that combines multiple images using FLUX Kontext Max, a premium image editing model from Black Forest Labs. It accepts a list of images to creatively merge them and produce enhanced, text-guided composite outputs. The tool is available on Replicate and is designed for versatile image editing tasks, including creative compositing and improved typography generation.

FLUX.1 Fill [dev]

FLUX.1 Fill [dev] is a 12-billion parameter rectified flow transformer developed by Black Forest Labs designed for text-guided inpainting. It fills specific areas in an existing image based on a textual description, enabling creative image editing workflows. It comes with a non-commercial license and integrates seamlessly with diffusers.
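
A minimal inpainting sketch assuming the FluxFillPipeline integration in diffusers; the image and mask paths are placeholders, and the gated weights require accepting the license.

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("room.png")      # image to edit
mask = load_image("room_mask.png")  # white pixels mark the area to repaint
result = pipe(prompt="a green velvet armchair", image=image, mask_image=mask).images[0]
result.save("room_filled.png")
```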

InvokeAI

InvokeAI is a creative engine built for Stable Diffusion models. It provides an industry-leading web-based UI to generate and refine visual media, available in both a community (self-hosted) edition and a professional (cloud-hosted) edition. It serves as the foundation for multiple commercial products, empowering professionals, artists, and enthusiasts with advanced AI-driven creative tools.

ComfyUI-RMBG

A custom node for ComfyUI that provides advanced image background removal and segmentation (including object, face, clothes, and fashion segmentation) by integrating multiple models like RMBG-2.0, INSPYRENET, BEN, BEN2, BiRefNet, SAM, and GroundingDINO.

inswapper

inswapper is an open-source, one-click face swapper and restoration tool powered by insightface. It utilizes ONNX runtime for inference, along with integration of face restoration techniques (e.g., CodeFormer) to enhance image quality and produce realistic face swaps.

Veo 3

Veo 3 is an AI-powered video generation model from Google DeepMind that produces both visuals and native audio, including sound effects, ambient noise, dialogue, and accurate lip-sync. It delivers hyperrealistic motion and strong prompt adherence, and can even generate video game worlds, making it a versatile media generation tool.

Google Veo 3

Google Veo 3 is Google DeepMind’s flagship text-to-video generation model that produces high-fidelity cinematic videos from text prompts. It features native audio generation, dialogue and lip-sync capabilities, realistic physics-based visuals, and immersive game world creation, making it ideal for AI-driven multimedia content creation.

VCClient Real-time Voice Changer

An open‑source, AI‑powered real‑time voice conversion tool that uses various models (e.g., RVC, Beatrice v1/v2) to transform voices dynamically. It supports multiple platforms (Windows, Mac, Linux, Google Colab) and offers both standalone and networked configurations.

Wan2.1-I2V-14B-720P

An advanced Image-to-Video generation model from the Wan2.1 suite by Wan-AI that produces high-definition 720P videos from input images. It features state-of-the-art performance, supports multiple tasks including text-to-video, video editing, and visual text generation in both Chinese and English, and is optimized for consumer-grade GPUs.

Recraft V3

Recraft V3 (code-named red_panda) is a state-of-the-art text-to-image generation model that excels at creating high-quality images with long text integration and vector art support. It offers precise control over design elements, enabling users to position text and visual components exactly as intended, and supports brand style customization.

test-yash-model-4-new-2

A custom diffusion-based model designed for generating unique fashion designs from text prompts. The API reference page provides detailed parameters for controlling aspects like prompt strength, aspect ratio, model selection, and output format.

Minimax Speech 02 HD

A high-fidelity text-to-audio (T2A) tool that offers advanced voice synthesis, voice cloning, emotional expression, and multilingual capabilities, optimized for applications such as voiceovers and audiobooks.

Chatterbox TTS

Chatterbox is a state-of-the-art, open-source text-to-speech (TTS) model developed by Resemble AI. It features a 0.5B Llama backbone, unique emotion exaggeration control, ultra-stable inference with alignment, and is benchmarked against leading closed-source systems like ElevenLabs. It is production-grade and licensed under MIT.

Resemble Chatterbox TTS

Resemble Chatterbox is an open source, production-grade text-to-speech model by Resemble AI. It features unique emotion exaggeration control, instant voice cloning from short audio, built-in watermarking, and alignment-informed inference, making it ideal for creating expressive, natural speech for various applications.

IP-Adapter

IP-Adapter is a lightweight image prompt adapter developed by Tencent AI Lab that enables pre-trained text-to-image diffusion models to incorporate image prompts along with text prompts for multimodal image generation. With only 22M parameters, it offers comparable or improved performance compared to fine-tuned models and supports integration with various controllable generation tools.
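
As a sketch of the diffusers integration; the base model, adapter weight name, and scale are assumptions to check against the IP-Adapter documentation.

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # trade off image-prompt vs. text-prompt influence

style_image = load_image("style_reference.png")
out = pipe(prompt="a cat sitting in a garden", ip_adapter_image=style_image).images[0]
out.save("cat.png")
```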

Realistic Vision V6.0 B1 noVAE

Realistic Vision V6.0 "New Vision" is a beta diffusion-based text-to-image model focused on realism and photorealism. It is released on Hugging Face and provides detailed guidelines on resolutions, generation parameters, and recommended workflows (including using a VAE for quality improvements).

Shap-E

Shap-E is an official GitHub repository by OpenAI for generating 3D implicit functions conditioned on text or images. It provides sample notebooks and usage instructions for converting text prompts or images into 3D models, making it a practical tool for generating 3D objects.

Robotics

5 tools

Chat Interfaces

27 tools
HuggingChat - Models

HuggingChat provides access to a range of AI chat models, enabling users to interact with state-of-the-art language models for applications such as conversation and task-oriented interactions.

Vercel AI Chatbot

A full-featured, hackable Next.js AI chatbot built by Vercel, supporting multi-model integration and powered by the AI SDK.

NextChat

A lightweight and fast AI assistant designed for enterprise use, featuring customizable branding, resource integration, and permission controls.

Hugging Face Chat UI

An open-source codebase that powers customizable chat user interfaces, allowing deployment of chatbot instances integrated with various supported language models.

IdeasAI

An OpenAI-powered startup idea generator that uses an autoregressive deep-learning language model to generate innovative product and business ideas.

Khoj

An open-source, self-hostable 'AI second brain' providing personalized, context-aware assistance for knowledge management and research.

leon-ai/leon

An open-source personal assistant that runs on your own server, providing a customizable, self-hosted AI assistant experience.

LongShot AI

An AI platform for content creation and custom chatbot building, tailored for generating all sorts of written content.

ChatGPT Desktop Application

A native desktop application for ChatGPT available on macOS, Windows, and Linux, providing an enhanced chat experience with extended capabilities.

Jan

A local AI assistant powered by Cortex, designed to run completely offline on a variety of hardware, offering an open source alternative to ChatGPT.

PocketPal AI

PocketPal AI is a mobile application that brings offline language model-based AI assistance directly to your phone. It allows users to download, load, and interact with various small language models (SLMs) on both iOS and Android devices, with customizable inference settings and performance metrics.

Aria - AI Research Assistant (Zotero Plugin)

A Zotero plugin that leverages GPT-4 and GPT-4 Turbo to provide an AI-powered research assistant. It offers features such as drag-and-drop referencing, autocompletion, visual analysis via GPT-4 Vision, and conversational interactions, helping users manage and annotate their Zotero items efficiently.

Cherry Studio

Cherry Studio is a cross-platform desktop client that integrates multiple LLM providers (including major cloud services like OpenAI, Gemini, Anthropic, etc., along with local model support) and supports deepseek-r1. It offers pre-configured AI assistants, multi-model simultaneous conversations, document and data processing features, and practical integrations, making it a comprehensive tool for interacting with various AI models.

Chatbox Community Edition

An open-source, user-friendly desktop client for AI models/LLMs such as ChatGPT, Claude, Gemini, Ollama, and more. It offers features like local data storage, enhanced prompting, markdown and code highlighting, keyboard shortcuts, team collaboration, and cross-platform availability (Windows, macOS, Linux, iOS/Android and Web version).

PapersGPT For Zotero

An AI-powered Zotero plugin that enhances academic research by enabling users to interact with PDF documents through chat. It supports a variety of state-of-the-art language models (e.g., GPT-4.5, ChatGPT, Claude, Gemini, DeepSeek, and others) and integrates seamlessly with Zotero, offering local model deployment for privacy and efficiency.

XiaoZhi AI Chatbot

An open-source project to build your own AI friend using ESP32, SenseVoice, and LLMs like Qwen and DeepSeek. It integrates voice wake-up, speech recognition, multi-language chat, TTS, OLED/LCD display support, and configurable prompts for a hardware-based conversational AI device.

Hollama

A minimal web-UI for interacting with Ollama and OpenAI servers, featuring multi-server support, markdown rendering, code editor functionalities, and a responsive design for local and self-hosted use.

ChatGPT-On-CS (懒人客服)

An open-source intelligent customer service system based on large language models. It supports multi-channel integration (WeChat, Pinduoduo, Qianniu, Bilibili, Douyin, Weibo, Xiaohongshu, Zhihu, etc.), enabling text, voice, and image communication, auto-replies, and knowledge-base customization for enterprise AI applications.

Lobe Chat

Lobe Chat is an open-source, modern-design AI chat framework that enables one-click free deployment of private ChatGPT, Claude, Gemini, Ollama, DeepSeek, and Qwen based chat applications. It supports multi-AI provider integration, features such as chain of thought, branching conversations, knowledge base management (file upload), multi-modal interactions (including TTS/STT voice conversation and text-to-image generation), plugin systems with function calling, and more, making it a comprehensive solution for building private, customized chat interfaces.

Chainlit

Chainlit is an open‐source Python framework that enables developers to build production-ready conversational AI applications quickly. It provides a user-friendly interface, optimized step functions, and seamless integration with LLM tools, making it easier to create interactive chatbot experiences.
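
A minimal echo app, roughly following Chainlit's quickstart (run it with `chainlit run app.py`); a real application would call an LLM inside the handler.

```python
import chainlit as cl

@cl.on_message
async def main(message: cl.Message):
    # Echo the user's message back; replace this with an LLM call in practice.
    await cl.Message(content=f"You said: {message.content}").send()
```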

XiaoZhi AI Chatbot

An open-source AI chatbot hardware project built on ESP32 using SenseVoice and Qwen72B. It integrates offline voice wake-up, multi-language speech recognition, configurable TTS and LLM integration to serve as a physical AI chat companion for educational and experimental purposes.

Grok 中文版

A comprehensive guide and portal for the Chinese-localized edition of xAI's Grok 3 model. It provides mirror-site links so users in mainland China can access a chat interface with Grok 3 reasoning, deep search, and other advanced features without a VPN.

Second Me

Second Me is an open‐source platform that lets you train and deploy a personalized AI self. It uses hierarchical memory modeling and the Me-Alignment Algorithm to capture your memories and identity, enabling your AI to switch roles, collaborate on a decentralized network, and serve as a private, self-hosted personal assistant.

Open Assistant

Open Assistant is an open-source, chat-based assistant developed by LAION-AI. It is designed to understand user tasks, interact with third-party systems, and dynamically retrieve information, democratizing access to powerful large language models.

AI-DEBAT

AI-DEBAT is a Streamlit-based web app that lets users pit two AI models against each other in a turn-based debate. Users select from models like OpenAI GPT-3.5/4, Anthropic Claude 3, Google Gemini, and Hugging Face models, provide the respective API keys, and watch an interactive debate unfold, with the models not chosen for the debate acting as judges. It also allows downloading the final debate report.

GPT4All Web Search Beta

A beta release feature for GPT4All that integrates Brave Search API to enable real-time web search functionality within the GPT4All chat environment. The page provides step-by-step instructions on setting up the feature, obtaining an API key, and configuring the system prompt to allow the Llama 3.1 8B Instruct model to perform web searches.

LibreChat

LibreChat is an open-source AI chat interface tool that enables users to interact with various AI models. It features multi-agent collaboration, integration with cloud storage and real-time web search, along with various enhancements such as UI refresh, persistent code environment, and support for multiple AI models (e.g., GPT-4.1, Gemini 2.5, Grok 3).

Agent Frameworks

45 tools
AG2

AG2 (formerly AutoGen) is an open-source software platform designed for building AI agents and facilitating multi-agent interactions to solve complex tasks. It supports integration with many large language models and provides multiple orchestration patterns for AI agents, enabling flexible, efficient tool usage and human collaboration.

elizaOS

A framework for creating autonomous agents, featuring connectors for Discord, Twitter, and Telegram, support for various AI models, multi-agent functionality, document ingestion, and memory storage capabilities.

smolagents

A barebones library for running multi-step AI agents that supports both CLI and web-based interactions, enabling rapid prototyping of agent-driven workflows.

SuperAGI

A dev-first open source autonomous AI agent framework designed for building, managing, and running autonomous agents.

SmythOS

SmythOS lets you build and deploy AI agents without manual coding: describe your needs, and Agent Weaver assembles the agent automatically using the best-suited AI models and APIs, with integrations for OpenAI, Hugging Face, Amazon Bedrock, and more.

OpenHands

A platform for AI software development agents that simplifies interactions with code and automates task management through minimal-code interfaces.

DeepSeek-R1

An autonomous agent designed for deep local and web research, capable of generating detailed reports with citations for various topics.

Roo Code

Roo Code is an AI-powered autonomous coding agent that lives in your editor. It communicates in natural language, reads and writes files, executes terminal commands, automates browser actions, and can integrate with any OpenAI-compatible API/model. It adapts its personality via customizable modes, acting as a flexible coding partner, system architect, QA engineer, or product manager to help build software more efficiently.

GPT Researcher

An LLM-based autonomous agent that conducts deep local and web research on any topic and generates long reports with citations, with support for connecting to specialized data sources.

AutoGen

A programming framework for building, managing, and running multi-agent AI systems that assist with complex workflows and code generation tasks.
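
A two-agent sketch in the classic AutoGen style; class names and config fields vary across versions, and the model name and API key are placeholders.

```python
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_API_KEY"}]}  # placeholders

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The user proxy relays the task and executes any code the assistant writes.
user_proxy.initiate_chat(assistant, message="Write and run a script that prints the first 10 primes.")
```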

Dify

A production-ready platform for developing and orchestrating agentic workflows, supporting the creation and management of autonomous AI agents.

crewAI Tools

An open-source toolkit that provides a comprehensive guide and pre-built modules for integrating and creating custom tools for CrewAI agents. It includes implementations for file operations, web scraping, database interactions, API integrations, and AI-powered functionalities, supporting developers in enhancing AI agent capabilities.

OpenAI Realtime Agents

A demonstration repository showcasing advanced, agentic patterns built on top of OpenAI's Realtime API. It provides a Next.js/TypeScript example for prototyping multi-agent realtime voice applications, including sequential agent handoffs and state machine based interactions.

OpenManus

OpenManus is an open-source project that replicates the capabilities of the Manus AI agent by providing a modular, containerized multi-agent framework. It enables autonomous execution of complex tasks such as travel planning, data analysis, and content generation, and is built with Docker, Python, and JavaScript.

Goose

An open-source, extensible AI agent that goes beyond code suggestions: it can install, execute, edit, and test code, and works with any large language model.

Mem0

A memory management layer for AI agents that provides personalized, secure, and local memory storage to enhance conversational AI and assistant capabilities.

NVIDIA AgentIQ

An open-source toolkit for connecting and optimizing teams of AI agents by treating agent workflows as simple function calls, ensuring composability and scalable agent orchestration.

Auto-Deep-Research

An open-source, fully-automated personal AI assistant that serves as a cost-effective alternative to OpenAI's Deep Research. Built on the AutoAgent framework, it supports integration with various LLMs, function-calling interactions, file uploads, and a one-click launch for effortless research automation.

Nanobrowser

An open-source Chrome extension for AI-powered web automation. It employs a multi-agent system that dynamically self-corrects and adjusts its approach during web tasks, all running locally in your browser.

OpenAI Agents Python

A lightweight and powerful framework for multi-agent workflows built on LLMs, complete with built‐in agent tracing and management.
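
A minimal sketch in the style of the openai-agents quickstart (the package imports as `agents`); an OPENAI_API_KEY environment variable is assumed.

```python
from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="You are a concise, helpful assistant.")
result = Runner.run_sync(agent, "Write a haiku about multi-agent workflows.")
print(result.final_output)
```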

Cline

Cline is an autonomous coding agent that integrates into your IDE and CLI, capable of creating and editing files, executing terminal commands, interacting with the browser, and leveraging the Model Context Protocol to extend its capabilities—all under human supervision.

OWL

OWL (Optimized Workforce Learning) is an open-source multi-agent collaboration framework built on top of the CAMEL-AI Framework. It enables dynamic agent interactions and integrates various toolkits (such as web search, file writing, terminal execution, and browser automation) to facilitate robust and efficient task automation across real-world domains.

AI Scientist-v2

An autonomous AI agent pipeline for conducting deep research and experimental analysis.

Archon

An AI agent capable of creating other AI agents using an advanced agentic coding workflow and a framework knowledge base.

Suna

An open-source generalist AI agent that includes capabilities like browser automation, file management, web crawling, command-line execution, website deployment, and API integration.

GUI-R1

GUI-R1 is a generalist R1-style vision-language action model designed for GUI agents that leverages reinforcement learning and policy optimization to automatically control and interact with graphical user interfaces across multiple platforms (Windows, Linux, macOS, Android, Web).

ADK Python

ADK Python is an open-source, code-first Python toolkit by Google for building, evaluating, and deploying sophisticated AI agents. It offers a modular framework for creating both single and multi-agent systems, enabling flexible integration, testing, and deployment across various environments, including cloud platforms such as Cloud Run and Vertex AI Agent Engine. It is optimized for the Gemini and Google ecosystem while remaining model- and deployment-agnostic.

Manus AI

Manus AI is an autonomous AI agent designed to execute complex tasks across multiple domains, including report writing, data analysis, content generation, and more. It features multi-modal capabilities, advanced tool integration (e.g., web browsers, code editors, database systems), and adaptive learning to optimize performance. The tool claims state-of-the-art performance on the GAIA benchmark, positioning itself as a competitive alternative to leading AI models.

Generative AI Toolkit

A lightweight library to build, deploy, trace, and evaluate LLM-based applications and agents throughout their entire lifecycle with AWS integration (e.g., Amazon Bedrock, DynamoDB, CloudWatch, AWS Lambda).

Swarm

An experimental, educational framework by OpenAI for lightweight multi-agent orchestration. Swarm enables agents to offload tasks through simple handoffs and demonstrates scalable, stateless agent interactions using the Chat Completions API. Note that it has been superseded by the production-ready OpenAI Agents SDK.

Agentic Browser

An open-source AI agent designed for web automation and scraping. It orchestrates specialized agents (Planner, Browser, and Critique) to automate browser interactions such as form filling, data extraction, e-commerce searches, and content retrieval via a natural language interface.

Pipecat

Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. It orchestrates audio, video, AI services, and multiple transports (e.g., WebSockets, WebRTC) to enable developers to create voice assistants, AI companions, multimodal interfaces, interactive storytelling tools, and complex dialog systems.

JARVIS

JARVIS is an AI system developed by Microsoft that connects large language models with expert AI models from the ML community. It orchestrates task planning, model selection, and execution by leveraging multiple approaches (CLI, Gradio demo, web APIs) and supports integration with cloud and local deployments.

Langflow

Langflow is an open-source tool that offers a visual builder and built-in API server for designing, testing, and deploying AI-powered agents and workflows. It supports multi-agent orchestration, code customization in Python, and integrates with major LLMs, vector databases, and other AI tools.

clineAI

clineAI is an autonomous coding agent integrated into your IDE that can create/edit files, execute terminal commands, use the browser, and analyze your project’s file structure and source code. It operates with human-in-the-loop permission, ensuring safe execution, and leverages the Model Context Protocol (MCP) to expand its capabilities.

AgenticSeek

An autonomous AI agent framework designed to run locally or on remote servers, offering both CLI and web interfaces for interaction and task automation.

Dive

Dive is an open-source MCP Host Desktop Application that integrates with various large language models (LLMs) supporting function calling capabilities. It offers universal LLM support (including ChatGPT, Anthropic, Ollama, and more), cross-platform compatibility (Windows, macOS, Linux), and advanced features like custom instructions, API management, and auto-update mechanisms via the Model Context Protocol (MCP).

II-Agent

II-Agent is an open-source intelligent agent framework that streamlines and enhances workflows across multiple domains. It provides a CLI interface and a WebSocket-powered, React-based frontend, and integrates with several leading language model providers (e.g., Anthropic Claude and Google Gemini). It also includes performance evaluation on the GAIA benchmark.

OpenHands

OpenHands is an open‐source platform for software development agents powered by AI. It enables agents to perform tasks that a human developer can do, including modifying code, running commands, browsing the web, calling APIs, and even copying code snippets from sources like StackOverflow. The tool is deployable both via cloud (OpenHands Cloud) and locally using Docker.

smolagents

A lightweight Python library for building and running AI agents, offering support for various LLMs hosted on both the Hugging Face Hub and external inference APIs.
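
A quickstart-style sketch; the model class shown (`HfApiModel`) reflects earlier releases and may be named differently in newer versions, so treat the exact names as assumptions.

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# CodeAgent writes and executes Python code to solve the task step by step.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("Roughly how many seconds does it take light to travel from the Sun to Mars?")
```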

CrewAI

CrewAI is a fast and flexible Python framework for orchestrating role-playing, autonomous AI agents. It offers both high-level simplicity and low-level control for creating and managing multi-agent systems, with features like event-driven flows and an enterprise suite for secure, scalable AI automation.
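
A small single-agent crew as a sketch of the core API; the roles, goals, and task text are illustrative, and an LLM provider key is assumed to be configured in the environment.

```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Summarize recent developments in open-source TTS models",
    backstory="A diligent analyst who always cites sources.",
)
task = Task(
    description="Write a three-bullet summary of notable open-source TTS releases.",
    expected_output="Three concise bullet points.",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])
print(crew.kickoff())
```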

Plandex

Plandex is an open source, terminal-based AI coding agent designed for large projects and real-world tasks. It can plan and execute complex, multi-step coding workflows across dozens of files, supports a massive 2M token context (approximately 100K tokens per file), and uses tree-sitter for fast project mapping and syntax validation. It offers configurable autonomy ranging from full automated execution to fine-grained control with a cumulative diff review sandbox and automated debugging of terminal commands.

WebAgent (WebWalker & WebDancer)

WebAgent is an open-source autonomous agent framework by Alibaba Group for information seeking. It comprises two complementary systems: WebWalker, a benchmark for LLMs in web traversal (ACL 2025), and WebDancer, a native agentic search reasoning model (preprint). Utilizing the ReAct framework with a four-stage training paradigm including supervised fine-tuning and reinforcement learning, it is designed to handle long-horizon, multi-step web traversal and autonomous search tasks.

Gemini CLI

Gemini CLI is an open-source command-line AI workflow tool that brings Gemini’s multimodal AI capabilities directly into your terminal. It enables users to query and edit large codebases, generate new apps from PDFs or sketches, automate operational tasks (like handling pull requests or complex rebases), and integrate various tools using built-in Google Search support.

ottomator-agents

A GitHub repository hosting a collection of open-source AI agents built for the oTTomator Live Agent Studio platform. This repository contains various agents that perform tasks such as web research, content generation, and automation. It serves as a hub for deploying and experimenting with multiple AI agents and is actively maintained under the MIT license.

Infrastructure

17 tools
HUGS

Optimized, zero‐configuration inference microservices from Hugging Face designed to simplify and accelerate the deployment of open AI models via an OpenAI‐compatible API.

open-webui/open-webui

A user-friendly AI interface that supports multiple LLM runners (such as Ollama and OpenAI-compatible APIs) and features built-in support for retrieval augmented generation.

LocalAI

A free, open-source alternative to OpenAI's API, enabling local AI inferencing as a drop-in replacement with support for various models.
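
Because the API is OpenAI-compatible, the standard OpenAI client can be pointed at a local instance; the port and model alias below are assumptions that depend on your LocalAI configuration.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # local server, no real key
resp = client.chat.completions.create(
    model="gpt-4",  # alias mapped to a local model in the LocalAI config
    messages=[{"role": "user", "content": "Say hello from a locally hosted model."}],
)
print(resp.choices[0].message.content)
```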

Ollama

A self-hosted deployment tool for models like Llama 3.3 and DeepSeek-R1, enabling fast and local AI inference without relying on cloud APIs.
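
A minimal request against Ollama's local REST API, assuming the server is running on the default port and the model has already been pulled (e.g. with `ollama pull llama3.3`).

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.3", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```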

Text Generation Inference

A toolkit for serving and deploying large language models (LLMs) for text generation via Rust, Python, and gRPC. It is optimized for inference and supports tensor parallelism for efficient scaling.
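
A sketch of a client call against a locally running TGI container; the port mapping and served model are deployment-specific assumptions.

```python
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}},
    timeout=60,
)
print(resp.json()["generated_text"])
```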

Self-hosted AI Starter Kit

An open-source Docker Compose template that quickly sets up a local AI and low-code development environment. Curated by n8n, it integrates essential tools such as the self-hosted n8n platform, Ollama for local LLMs, Qdrant for vector storage, and PostgreSQL, enabling secure self-hosted AI workflows.

99AI

An open-source, commercial-ready AI web platform offering a one-stop solution for integrating a variety of AI services—including AI chat, intelligent search, creative content generation, document analysis, mind mapping, and risk management. It supports private (on-premises) deployment, multi-user management, and commercial operations, making it suitable for enterprises, teams, or individual developers building custom AI services.

vLLM

A high-throughput, memory-efficient library for large language model inference and serving that supports tensor and pipeline parallelism.
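
A minimal offline-inference sketch with the vLLM Python API; the model ID is a placeholder for any Hugging Face causal LM that fits your hardware.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # placeholder model id
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```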

Xorbits Inference (Xinference)

Xorbits Inference (Xinference) is a versatile, open-source library that simplifies the deployment and serving of language models, speech recognition models, and multimodal models. It empowers developers to replace OpenAI GPT with any open-source model using minimal code changes, supporting cloud, on-premises, and self-hosted setups.

New API

An open-source, next-generation LLM gateway and AI asset management system that unifies various large model APIs (such as OpenAI and Claude) into a standardized interface. It provides a rich UI, multi-language support, online recharge, usage tracking, token grouping, model charging, and configurable reasoning effort, making it suitable for personal and enterprise internal management and distribution.

Text Embeddings Inference

An open-source, high-performance toolkit developed by Hugging Face for deploying and serving text embeddings and sequence classification models. It features dynamic batching, optimized transformers code (via Flash Attention and cuBLASLt), support for multiple model types, and lightweight docker images for fast inference.

LM Studio

LM Studio is a desktop application that enables users to run local and open large language models (LLMs) on their computer. Available for Mac and Windows, it provides an interface for discovering, downloading, and experimenting with local LLMs.

GPT-RAG

GPT-RAG is an enterprise-grade Retrieval-Augmented Generation (RAG) solution accelerator designed for integrating Azure Cognitive Search and Azure OpenAI services to power ChatGPT-style and Q&A experiences. It provides a modular architecture featuring data ingestion, an orchestrator (with options for Semantic Kernel functions or AutoGen-driven agentic workflows), and customizable front-end interfaces for efficient deployment in secure, enterprise environments.

GPT4All

A tool that enables running local large language models (LLMs) on consumer hardware, offering offline LLM inference capabilities.
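
A short sketch with the gpt4all Python bindings; the model file name is illustrative and is downloaded on first use.

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # illustrative model file
with model.chat_session():
    print(model.generate("Name three uses for a local LLM.", max_tokens=200))
```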

ai-gateway

ai-gateway is an open-source API gateway that orchestrates AI model requests from multiple providers (e.g., OpenAI, Anthropic, Gemini). It includes features such as guardrails, cost control, custom endpoints, and detailed tracing (using spans), making it a backend tool for managing and routing AI API calls.

Replicate Playground

An interactive prototype on Replicate that allows users to compare AI models, rapidly prototype applications, and tweak parameters to refine results. It encourages rapid experimentation and fast feedback loops for AI model deployment and evaluation.

ClaraVerse

ClaraVerse is a privacy-first, fully local AI workspace that integrates multiple AI functionalities including Ollama LLM chat, tool calling, an agent builder, Stable Diffusion image generation, and n8n-style automation. It is designed to run entirely on your machine without any cloud backend or API keys, ensuring complete data privacy.

Resources

1 tool

Productivity

13 tools
Excalidraw

An open-source virtual whiteboard that allows users to create hand-drawn style diagrams, wireframes, and collaborate in real-time, featuring customizable tools and end-to-end encryption.

Watt Tool 8B

A model built to enhance AI workflows through precise tool selection and multi-turn dialogue, supporting complex task automation.

Paperless-AI

An automated document analyzer designed for the Paperless-ngx system, leveraging several AI APIs to analyze and tag documents.

Perplexica

Perplexica is an open-source AI-powered search engine that utilizes machine learning techniques such as similarity search and embeddings to process user queries and provide precise answers with cited sources. It offers different modes (Normal and Copilot) and supports integrations with local LLMs, making it a viable alternative to Perplexity AI.

Maxun

An open-source no-code web data extraction platform that lets users train a robot in minutes to automatically scrape websites and convert them into APIs and spreadsheets.

Omnitool

An open source AI desktop environment offering a unified, browser-based interface to interact with multiple leading AI models and services.

OfficeAI

An AI-powered office assistant tailored for Microsoft Office and WPS, offering quick solutions for tasks like formatting, formula selection, and other productivity tweaks.

Winpilot

Winpilot is an AI-driven Windows companion app that streamlines system customization and management on Windows 10/11. It enables users to modify settings, remove bloatware, retrieve system information, and perform various system tasks through a chat-based interface powered by AI.

Docling

A tool that prepares documents for generative AI pipelines, with support for audio transcription using models like Whisper.

Prompt Buddy

Prompt Buddy is a free Microsoft Teams Power App built on the Power Platform with Dataverse for Teams. It provides a dedicated space for teams to share, upvote, and discover AI prompts. The app is customizable, preloaded with Microsoft Copilot categories, and supports smooth updates without losing settings.

Blinko

Blinko is an open-source, self-hosted personal AI note tool built with TypeScript that prioritizes privacy. It allows users to instantly capture ideas as plain text with full Markdown support and leverages AI-powered Retrieval-Augmented Generation (RAG) for natural language note retrieval, all while ensuring data ownership through self-hosting.

Rytr Desktop

Rytr Desktop is a fully-featured desktop application for AI-assisted text creation and editing. It enables content generation, rewriting, tone customization, grammar improvement, and integrated plagiarism checking without subscriptions or feature limitations, targeting content creators across multiple languages.

SecureAI Tools

SecureAI Tools is an open-source solution that integrates with Paperless-ngx to enable users to create, manage, and chat with document collections. It provides features such as document collection management, configurable LLM provider options, and health checks, and is deployed via docker-compose.

Gaming Tools

10 tools
AI Aimbot

An open-source, Python-based AI-powered aiming tool designed for games that automates target acquisition.

AirHub-V2

A ROBLOX aimbot and wall hack tool incorporating advanced AI techniques to optimize targeting, offering features like universal aimbot functionality and enhanced visual aids.

Sunone Aimbot

An AI-powered aimbot for FPS games that leverages YOLOv8 and YOLOv10 models, PyTorch, and TensorRT to automatically detect and target enemies in various first-person shooter games.

Aviator Prediction App

A predictive tool that uses advanced algorithms to provide real‑time outcome predictions for the Aviator game. The app is available on Windows, iOS, and Android, and is designed to help users make more informed gameplay or betting decisions.

Open-Aimbot

Open-Aimbot is a universal open-source aim assist framework for Roblox, offering over 80 features such as detection bypasses, silent aim, configurable sensitivity, and a dynamic UI, designed for game exploitation and cheat development.

AIMr

AIMr is an AI-powered aimbot written in Python, designed for FPS games including Fortnite, Valorant, CS2, R6, COD, Apex, and more. It features advanced functionality such as recoil control, silent aim, prediction, and customizable visuals, and leverages modern AI technologies (e.g., YOLO) to remain undetected. It is available as an open-source project on GitHub, with a paid enhanced version available via Discord.

Aimmy-V2

Aimmy-V2 is an open-source, AI-based aim alignment tool designed to assist gamers with aiming in FPS games, especially those facing physical or accessibility challenges. It leverages DirectML, ONNX, and YOLOv8 for fast and efficient opponent detection, and includes features like auto-trigger, hot model/config swapping, and adjustable aiming settings.

Aviator Predictor

An AI-powered prediction tool for the Aviator betting game, featuring multi-platform support and an activation-code mechanism for premium features.

PrimeAIM

PrimeAIM is an AI-powered aim assist tool for shooter games that uses OpenCV for screen capturing and PyTorch with YOLOv5 for object/player detection. It allows features such as head/chest aiming, adjustable aim speed, customizable field-of-view (FOV) and ESP overlays, and leverages the Windows API for precise mouse control. It is designed for educational purposes and is developed in Python.

Aviator Predictor

An AI-powered prediction application for the popular Aviator game. Using an enhanced prediction algorithm and activation code system, it offers both free and premium features with multi-platform support (PC, iOS, Android) to provide accurate game outcome predictions and an improved user experience.

Computer Vision

8 tools
Ultralytics YOLO11

A suite of computer vision models for object detection, segmentation, pose estimation, and classification, integrated with Ultralytics HUB for visualization and training.
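
A short detection sketch with the ultralytics package; the checkpoint name and image URL are the usual quickstart placeholders.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # small pretrained detection checkpoint
results = model("https://ultralytics.com/images/bus.jpg")
for box in results[0].boxes:
    print(model.names[int(box.cls)], float(box.conf))  # class label and confidence
results[0].show()  # display the annotated image
```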

YOLOv8

A state-of-the-art computer vision model for object detection, segmentation, pose estimation, and classification tasks, designed for speed, accuracy, and ease of use.

YOLOv8

A state-of-the-art object detection, segmentation, and classification model known for its speed, accuracy, and ease of use in computer vision tasks.

LHM

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds is an open‐source implementation for reconstructing and animating 3D human models from a single image. It offers GPU-optimized pipelines, Docker support, and integration with animation frameworks like ComfyUI.

ComfyUI-Florence2

A GitHub repository that integrates Microsoft’s Florence-2, an advanced vision foundation model, into ComfyUI. It enables prompt-based vision and vision-language tasks such as captioning, object detection, segmentation, and Document Visual Question Answering (DocVQA) on scanned documents.

Tesseract OCR

Tesseract OCR is an open-source optical character recognition engine that can recognize text from images. It supports over 100 languages, multiple image formats (PNG, JPEG, TIFF), and offers both an LSTM-based OCR engine and a legacy mode for character pattern recognition.
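
A common way to call it from Python is via the third-party pytesseract wrapper (the Tesseract binary itself must be installed separately); the image path is a placeholder.

```python
import pytesseract
from PIL import Image

text = pytesseract.image_to_string(Image.open("scan.png"), lang="eng")
print(text)
```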

Depth Anything V2

An interactive Hugging Face Space that leverages deep learning to generate depth maps from images. This tool extracts depth information from 2D images, which can be used for creative 3D effects, image editing, or further computer vision tasks.

New Plant Disease Detection

A Hugging Face Space application that uses AI/computer vision to detect plant diseases from images.

Research Tools

4 tools

Theorem Provers

2 tools

Security

2 tools

Chemoinformatics

1 tool