Summary: Join host George Anadiotis and guest Purvanshi Mehta, cofounder of Lica World, for a discussion about multimodal AI and its applications. Trained on various types of data, from text to images to audio and video, multimodal AI models are expanding the range of AI applications we can build. New large AI models such as GPT-4, Gemini, and Claude 3 are all general-purpose multimodal foundation models. More specialized multimodal AI models, such as OpenAI’s yet-to-be-released Sora, which generates video from text, and Suno AI, which generates songs from text, are fueling ideas for how AI might automate and augment tasks in robotics, entertainment, healthcare, manufacturing, and other industries. George and Purvanshi discuss where this technology stands today and share their thoughts on where the field is headed.