Updated Jul 8, 2020 - Python
multimodal
Here are 658 public repositories matching this topic...
🤖 A framework for building AI Agents with LLMs, integrating multimodal generative AI technologies including voice, images, videos, and digital humans 🌈💎✨
Updated Jul 31, 2023
A notebook to learn about ML for astronomy through BTSbot.
Updated Feb 7, 2024 - Jupyter Notebook
Visuo-haptic integration during texture exploration
Updated Jan 12, 2024 - Processing
In this course, you'll select open source models from the Hugging Face Hub to perform NLP, audio, image, and multimodal tasks using the Hugging Face transformers library.
Updated Mar 22, 2024 - Jupyter Notebook
Collaborative generation of unique audiovisual experiences using NFC identity cards
Updated Jan 20, 2021 - TypeScript
All content produced for the curricular unit PF (Projeto FEUP) of the Informatics and Computing Engineering degree at FEUP
Updated Oct 11, 2021
Multitasking multimodal AI material that focuses on human interaction and assistance
Updated Apr 29, 2023 - PureBasic
Utilizing a multimodal architecture to predict the appropriate speaker turn in a dialogue.
Updated Feb 21, 2024 - Python
This repo collects multimodal machine learning papers.
Updated Jul 15, 2020
AMR extension for the spatial domain, with grounded frame of reference tracking
Updated Oct 5, 2023
Accepted at The Web Conference 2024.
Updated Feb 6, 2024 - Python
Multi-angle multimodal lip video data
Updated Apr 18, 2024
NeuralTalk is a Python + NumPy project for learning multimodal recurrent neural networks that describe images with sentences.
Updated Dec 22, 2020 - Python
Dataset from the paper "The Semantic Typology of Visually Grounded Paraphrases"
Updated May 9, 2022
Updated Jun 9, 2023 - HTML
Application template for choosing a hotel and a travel tour
Updated Nov 24, 2023 - Kotlin
This repository contains the implementation of the Personalized Real Estate Agent project from Udacity's Generative AI Nanodegree.
Updated Mar 5, 2024 - Jupyter Notebook
IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS) is an open-access reproduction of Flamingo, a closed-source visual language model developed by DeepMind. Like GPT-4, the multimodal model accepts arbitrary sequences of image and text inputs and produces text outputs.
Updated Oct 16, 2023 - Python