Omni Geoguessr AI: A Vision Transformer AI integrated with Geoguessr for automated geographic location prediction and gameplay using streetview panoramas.
A curated list of foundation models for vision and language tasks
Research and Materials on Hardware implementation of Transformer Model
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
AiTLAS implements state-of-the-art AI methods for exploratory and predictive analysis of satellite images.
Final project for 6.8301 - Computer Vision (spring 2024)
OpenMMLab Detection Toolbox and Benchmark
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Self-Supervised Vision Transformers for multiplexed imaging datasets
Transformers 3rd Edition
RadioCare: Fighting Inefficiencies in Medical Imaging
A Simplified PyTorch Implementation of Vision Transformer (ViT)
A lightweight and extensible toolbox for image classification
[Nature Biomedical Engineering 2023] Decoding surgical activity from videos with a vision transformer
A series of foundational computer vision projects that anyone diving into the field should tackle.
Official PyTorch implementation of the CVPR 2024 paper: State Space Models for Event Cameras (Spotlight).
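Several of the repositories above center on the Vision Transformer (ViT), whose first step is turning an image into a sequence of patch tokens. The sketch below illustrates that patch-embedding step in plain NumPy; the function name, dimensions, and random projection matrix are illustrative assumptions (a real ViT learns the projection), not taken from any of the listed projects.

```python
import numpy as np

def patch_embed(image, patch_size=16, embed_dim=8, seed=0):
    """Split an (H, W, C) image into non-overlapping patches and linearly
    project each flattened patch to embed_dim, yielding a token sequence.
    The projection matrix here is random for illustration; ViT learns it."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    # Rearrange into (num_patches, patch_size * patch_size * c)
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch_size * patch_size * c)
    )
    proj = np.random.default_rng(seed).normal(size=(patches.shape[1], embed_dim))
    return patches @ proj  # shape: (num_patches, embed_dim)

tokens = patch_embed(np.zeros((64, 64, 3)))
print(tokens.shape)  # (16, 8): a 4x4 grid of patches, each an 8-dim token
```

After this step, a transformer encoder processes the token sequence exactly as it would a sentence of word embeddings, which is what the ViT implementations listed above build on.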