AI Superstream: Multimodal Generative AI

Bibliographic Details
Corporate Author: O'Reilly (Firm), publisher
Other Authors: Chang, Susan Shu, speaker; Gandhi, Rikin, speaker; Pai, Suhas, speaker; Alam, Nahid, speaker; Susevski, Anthony, speaker; Betlen, Andrei, speaker; Iyer, Shekhar, speaker; Gao, Jingying, speaker; Barth, Antje, speaker; Aldughayem, Omar, speaker; Fregly, Chris, speaker
Format: Online Video
Language: English
Published: [Sebastopol, California] : O'Reilly Media, Inc., 2024.
Edition: [First edition]
See on Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009850432906719
Description
Summary: While large language models are groundbreaking tools for automating everyday text-based tasks such as summarization, translation, and generation, we've also seen the emergence of more complex generative AI models that can process and output other types of data, such as images, audio, and even video. Multimodal AI models, such as GPT-4, can work across different data formats, for example to generate speech from text, text from images, or text from audio. By combining modalities, multimodal AI can interact with humans in more natural, intuitive ways, mimicking how humans perceive and understand the world around them. The possibilities opened up by processing inputs more holistically and providing more intuitive outputs are already nudging us closer to true artificial general intelligence.
Physical Description: 1 online resource (1 video file (4 hr., 11 min.)) : sound, color