Mistral unveils Pixtral 12B, its breakthrough multimodal model

French AI startup Mistral has introduced its first model capable of handling both images and text. Known as Pixtral 12B, this 12-billion-parameter model is approximately 24GB in size and promises enhanced problem-solving abilities compared to models with fewer parameters.

Pixtral 12B Features

Built on Mistral’s Nemo 12B text model, Pixtral 12B can process questions related to numerous images of varying sizes using URLs or base64-encoded images. Similar to other multimodal models, Pixtral 12B is designed to perform tasks like image captioning and object counting.

Availability and Usage

Pixtral 12B can be downloaded, customized, and utilized under an Apache 2.0 license. Mistral provides access to this model via a torrent link on GitHub and the Hugging Face platform.

Mistral’s Growth and Strategy

Following a successful $645 million funding round led by General Catalyst, Mistral has garnered attention in the AI community. As a newer player compared to industry giants like OpenAI, Mistral’s approach involves offering open models for free, providing managed versions for a fee, and delivering consulting services to corporate clients.

Overall, Pixtral 12B signifies Mistral’s commitment to innovation and advancement in the AI space, positioning itself as a significant player in the European AI landscape.