AI Superstream: Multimodal Generative AI


Bibliographic Details
Corporate Author: O'Reilly (Firm), publisher
Other Authors: Chang, Susan Shu, speaker; Gandhi, Rikin, speaker; Pai, Suhas, speaker; Alam, Nahid, speaker; Susevski, Anthony, speaker; Betlen, Andrei, speaker; Iyer, Shekhar, speaker; Gao, Jingying, speaker; Barth, Antje, speaker; Aldughayem, Omar, speaker; Fregly, Chris, speaker
Format: Online video
Language: English
Published: [Sebastopol, California] : O'Reilly Media, Inc., 2024.
Edition: [First edition]
View in the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009850432906719
Description
Summary: While large language models are groundbreaking tools for automating everyday text-based tasks such as text summarization, translation, and generation, we've also seen the emergence of more complex generative AI models that can process and output different types of data, such as images, audio, and even video. Multimodal AI models, such as GPT-4, are capable of working across different data formats, for example, to generate speech from text, text from images, or text from audio. By combining different modalities, multimodal AI can interact with humans in more natural, intuitive ways, mimicking how humans perceive and understand the world around them. The possibilities from processing inputs more holistically and providing more intuitive outputs are already nudging us closer to true artificial general intelligence.
Physical Description: 1 online resource (1 video file (4 hr., 11 min.)) : sound, color