#Multimodal

26 tools tagged “Multimodal”



#1

ScreenAgent

Open‑source VLM agent to control computer GUIs via mouse/keyboard planning and execution.

4.4 (5)
Freemium
#2

Grok 2

xAI's reasoning-focused chatbot with image generation and multi-modal input support.

4.3 (4)
Freemium
#3

Nexa AI

On-device AI runtime for running models locally across phones, PCs, and edge hardware.

4.8 (6)
Free
#4

Project Astra

Google DeepMind's universal AI agent that sees, hears, and reasons about the world in real time.

5.0 (4)
Freemium
#5

OmniVision

Compact vision-language model built for on-device and edge AI deployment.

4.6 (5)
Freemium
#6

Mistrezz AI

Uncensored gamified adult AI chat with private NSFW companions, voice, images and video.

4.8 (5)
Free
#7

Humain AI

A Saudi-backed AI company building large-scale infrastructure and multimodal Arabic LLMs for global AI services.

4.7 (6)
Contact
#8

Uni-1 by Luma AI

Multimodal AI model for high-fidelity image generation with strong spatial reasoning and accurate text rendering.

4.7 (6)
Freemium
#9

Black Forest Labs

A pioneering AI startup specializing in state-of-the-art generative models for image and video synthesis.

5.0 (6)
Freemium
#10

Veo 4

Multi-shot cinematic AI video generation with native synchronized audio.

4.6 (5)
Freemium
#11

OpenArt

Creative AI suite for generating art, video, and audio from text or images.

4.7 (6)
Freemium
#12

SuperAnnotate

End-to-end data annotation and management platform for building high-quality AI training datasets.

4.4 (5)
Freemium
#13

Gemma 3

An open-source AI model optimized for single-GPU performance, supporting multimodal inputs and over 140 languages.

4.8 (5)
Free
#14

GLM-4.6V

Open-source multimodal GLM from Z.ai unifying vision, text, and tool calling for long-context reasoning, search, coding, and UI-to-code.

4.3 (6)
Free
#15

HappyHorse

Open-source model that generates video paired with synchronized audio from a single prompt.

4.7 (6)
Free
#16

OpenAI GPT-4

OpenAI's multimodal large language model for text, code, and image understanding.

4.5 (6)
Freemium
#17

Codex CLI

Open-source terminal AI assistant that reads, writes, and runs code locally with multimodal input.

4.8 (5)
Free
#18

LTX 2.3 Video Generator

AI video generator that unifies text prompts, images, and audio into cohesive short-form clips.

4.5 (4)
Free
#19

MyShell

No-code AI consumer platform to build, share, and own AI apps.

4.8 (5)
Freemium
#20

LiveKit Agents

Open-source framework for building real-time, multimodal voice and video AI agents.

4.5 (6)
Freemium
#21

Aivah

Build interactive AI avatar agents for immersive digital experiences.

4.8 (4)
Freemium
#22

OpenAI Advanced Voice

Real-time, natural voice conversations with ChatGPT.

4.7 (6)
Freemium
#23

Voice-gen.ai

Unified platform for AI-generated voiceovers, images, and videos in one workspace.

4.8 (4)
Free
#24

WebVoyager

An LMM-powered web agent completing user instructions end-to-end by interacting with real-world websites.

5.0 (5)
Freemium
#25

Jina AI

Multimodal search foundation for embeddings, reranking, and RAG pipelines.

4.2 (5)
Free
#26

Seedance 1.5 Pro

AI creation platform for generating videos with synchronized audio (voice, lip-sync, SFX) from text or images, plus image generation and AI image editing tools.

4.8 (5)
Paid