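Sina
February
Stable training, data efficiency: Tsinghua University proposes SAC Flow, a new reinforcement learning method for flow policies
This article presents a new approach that trains flow policies with SAC, a highly data-efficient reinforcement learning algorithm. It optimizes the actual flow policy end to end, without resorting to surrogate objectives or policy distillation. The core idea of SAC Flow is to treat the flow policy as a residual RNN and then parameterize the velocity in two ways: with GRU gating and with a Transformer decoder. SAC Flow on MuJoCo, OGBench ...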