Vision-Language Models for Vision Tasks: A Survey Vision-Language Pretraining Methods - Video search

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

May 4, 2020

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

November 27, 2018

DINOv3: A Next-Gen Vision Model via Self-Supervised Learning | OpenCV University posted on the topic | LinkedIn

In vision-and-language pretraining (VLP), objects can be used as anchor points to make aligning semantics between image-text pairs easier. Learn how Oscar, a novel VLP framework utilizing objects, sets new state of the art on six vision-and-language tasks: https://aka.ms/AA8flix | Microsoft Research

23,000 views · May 15, 2020

Facebook · Microsoft Research

Research talk: Large-scale, self-supervised pretraining: From language to vision

November 16, 2021

NICE Session 80: ICCV 2025 Paper Sharing Session 2

50 views · 3 months ago

YouTube · NLP Academic Exchange Platform

Computer Vision: Did one breakthrough change everything?

YouTube · Big Ideas Only

Robot Foundation Models - The Path from RT-1 to RT-2 | Uplatz

1 view · 1 month ago

Vision Language Models #GlobalSensorAwards #sensorawa…

YouTube · Global Sensor

The Future of AI That Thinks Before It Speaks | VL-JEPA Explained: Ho…

87 views · 1 month ago

YouTube · Neural Nexus

#20. Types of Foundation Models

16 views · 1 month ago

YouTube · Tech With Mala

A Survey of Large Language Model Architectures and Their Impact o…

4 views · 1 month ago

YouTube · Paper to PPT : Natural Language Processing

Vision Encoders in Vision-Language Models: A Survey

83 views · 1 month ago

YouTube · AI Papers Podcast Daily

RynnVLA-002: A Unified Vision-Language-Action and World Mode…

28 views · 2 months ago

YouTube · AI Papers Slop

What is Self-Supervised Learning?

YouTube · Data Science Made Easy

VaulTech on Instagram: "End of LLMs? VL-JEPA stands for Vision …

386 views · 1 month ago

Instagram · vaultechi

Stanford Seminar - Robot Learning in the Era of Large Pretrained Mod…

10,000 views · March 13, 2024

YouTube · Stanford Online

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contr…

April 29, 2024

HKML S3E11 - FinBERT: A Pretrained Language Model for Fi…

2,357 views · June 26, 2021

Large Vision Language Models Tutorial for BRAILS ++

587 views · September 12, 2024

YouTube · NHERI DesignSafe

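Paper-by-paper walkthrough of robot foundation models and classic VLA papers (includes screen-share version): "Humans are the most …

3,500 views · 10 months ago

YouTube · 張小珺Xiaojùn Podcast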

ICCV 2025 Paper Sharing Session 2

643 views · 5 months ago

bilibili · NICE学术

A Survey on Efficient Vision-Language-Action Models(TJU 2025)

717 views · 3 months ago

bilibili · mardinff

A Survey on Large Multimodal Reasoning Models-2-Multimodal large models …

1,568 views · 5 months ago

bilibili · 小林绿子的怀中猫

Classic Multimodal Papers 7: BEiT-3

3,301 views · 9 months ago

bilibili · DeepFinder

Open source! SceneSplat, the first native 3D Gaussian large model, unlocks end-to-end recognition of everything…

4,835 views · 5 months ago

bilibili · 深蓝学院

What does the unified multimodal large model VL-BEIT do? What are the advantages of BERT?

776 views · June 14, 2024

bilibili · Ph-D-Vlog

Latest 2024 large-model research roundup! 10 brand-new papers to spark your research ideas; bookmark it right …

1,245 views · July 3, 2024

bilibili · account deleted

OpenAI CLIP: Connecting Text and Images (Paper Explained)

169,000 views · January 12, 2021

YouTube · Yannic Kilcher

Learning to "see" the world from scratch: how does DINOv3 teach AI vision?

5,481 views · 5 months ago

bilibili · 极市平台
