#Multimodal
26 tools tagged “Multimodal”

ScreenAgent
Open-source VLM agent that controls computer GUIs by planning and executing mouse and keyboard actions.

Grok 2
xAI's reasoning-focused chatbot with image generation and multimodal input support.

Nexa AI
On-device AI runtime for running models locally across phones, PCs, and edge hardware.

Project Astra
Google DeepMind's universal AI agent that sees, hears, and reasons about the world in real time.

OmniVision
Compact vision-language model built for on-device and edge AI deployment.

Mistrezz AI
Uncensored gamified adult AI chat with private NSFW companions, voice, images and video.

Humain AI
A Saudi-backed AI company building large-scale infrastructure and multimodal Arabic LLMs for global AI services.

Uni-1 by Luma AI
Multimodal AI model for high-fidelity image generation with strong spatial reasoning and accurate text rendering.

Black Forest Labs
A pioneering AI startup specializing in state-of-the-art generative models for image and video synthesis.

Veo 4
Multi-shot cinematic AI video generation with native synchronized audio.

OpenArt
Creative AI suite for generating art, video, and audio from text or images.

SuperAnnotate
End-to-end data annotation and management platform for building high-quality AI training datasets.

Gemma 3
An open-source AI model optimized for single-GPU performance, supporting multimodal inputs and over 140 languages.

GLM-4.6V
Open-source multimodal GLM from Z.ai unifying vision, text, and tool calling for long-context reasoning, search, coding, and UI-to-code.

HappyHorse
Open-source model that generates video paired with synchronized audio from a single prompt.

OpenAI GPT-4
OpenAI's multimodal large language model for text, code, and image understanding.

Codex CLI
Open-source terminal AI assistant that reads, writes, and runs code locally with multimodal input.

LTX 2.3 Video Generator
AI video generator that unifies text prompts, images, and audio into cohesive short-form clips.

MyShell
No-code AI consumer platform to build, share, and own AI apps.

LiveKit Agents
Open-source framework for building real-time, multimodal voice and video AI agents.

Aivah
Platform for building interactive AI avatar agents for immersive digital experiences.

OpenAI Advanced Voice
Real-time, natural voice conversations with ChatGPT.

Voice-gen.ai
Unified platform for AI-generated voiceovers, images, and videos in one workspace.

WebVoyager
An LMM-powered web agent completing user instructions end-to-end by interacting with real-world websites.

Jina AI
Multimodal search foundation for embeddings, reranking, and RAG pipelines.

Seedance 1.5 Pro
AI creation platform for generating videos with synchronized audio (voice, lip-sync, SFX) from text or images, plus image generation and AI image editing tools.
