Zhaohan Daniel Guo
DeepMind
Verified email at google.com
Title | Cited by | Year
Bootstrap your own latent: A new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, P Richemond, E Buchatskaya, ...
Advances in Neural Information Processing Systems 33, 21271-21284, 2020
Cited by 6108 | 2020
Agent57: Outperforming the Atari human benchmark
AP Badia, B Piot, S Kapturowski, P Sprechmann, A Vitvitskyi, ZD Guo, ...
International Conference on Machine Learning, 507-517, 2020
Cited by 644 | 2020
Bootstrap your own latent: A new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, P Richemond, E Buchatskaya, ...
Advances in Neural Information Processing Systems 33, 21271-21284, 2020
Cited by 455 | 2020
Never give up: Learning directed exploration strategies
AP Badia, P Sprechmann, A Vitvitskyi, D Guo, B Piot, S Kapturowski, ...
arXiv preprint arXiv:2002.06038, 2020
Cited by 329 | 2020
Joint semantic utterance classification and slot filling with recursive neural networks
D Guo, G Tur, W Yih, G Zweig
2014 IEEE Spoken Language Technology Workshop (SLT), 554-559, 2014
Cited by 246 | 2014
A general theoretical paradigm to understand learning from human preferences
MG Azar, ZD Guo, B Piot, R Munos, M Rowland, M Valko, D Calandriello
International Conference on Artificial Intelligence and Statistics, 4447-4455, 2024
Cited by 159 | 2024
Bootstrap latent-predictive representations for multitask reinforcement learning
ZD Guo, BA Pires, B Piot, JB Grill, F Altché, R Munos, MG Azar
International Conference on Machine Learning, 3875-3886, 2020
Cited by 145 | 2020
Neural predictive belief representations
ZD Guo, MG Azar, B Piot, BA Pires, R Munos
arXiv preprint arXiv:1811.06407, 2018
Cited by 89 | 2018
A PAC RL algorithm for episodic POMDPs
ZD Guo, S Doroudi, E Brunskill
Artificial Intelligence and Statistics, 510-518, 2016
Cited by 65 | 2016
BYOL-Explore: Exploration by bootstrapped prediction
Z Guo, S Thakoor, M Pîslar, B Avila Pires, F Altché, C Tallec, A Saade, ...
Advances in Neural Information Processing Systems 35, 31855-31870, 2022
Cited by 58 | 2022
Using options and covariance testing for long horizon off-policy policy evaluation
Z Guo, PS Thomas, E Brunskill
Advances in Neural Information Processing Systems 30, 2017
Cited by 48 | 2017
Nash learning from human feedback
R Munos, M Valko, D Calandriello, MG Azar, M Rowland, ZD Guo, Y Tang, ...
arXiv preprint arXiv:2312.00886, 2023
Cited by 46 | 2023
Bootstrap your own latent: A new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, PH Richemond, E Buchatskaya, ...
arXiv preprint arXiv:2006.07733, 2020
Cited by 41 | 2020
Geometric entropic exploration
ZD Guo, MG Azar, A Saade, S Thakoor, B Piot, BA Pires, M Valko, ...
arXiv preprint arXiv:2101.02055, 2021
Cited by 40 | 2021
Concurrent PAC RL
Z Guo, E Brunskill
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
Cited by 30 | 2015
Understanding self-predictive learning for reinforcement learning
Y Tang, ZD Guo, PH Richemond, BA Pires, Y Chandak, R Munos, ...
International Conference on Machine Learning, 33632-33656, 2023
Cited by 27 | 2023
Generalized preference optimization: A unified approach to offline alignment
Y Tang, ZD Guo, Z Zheng, D Calandriello, R Munos, M Rowland, ...
arXiv preprint arXiv:2402.05749, 2024
Cited by 23 | 2024
PAC continuous state online multitask reinforcement learning with identification
Y Liu, Z Guo, E Brunskill
Proceedings of the 2016 International Conference on Autonomous Agents …, 2016
Cited by 21 | 2016
Understanding the performance gap between online and offline alignment algorithms
Y Tang, ZD Guo, Z Zheng, D Calandriello, Y Cao, E Tarassov, R Munos, ...
arXiv preprint arXiv:2405.08448, 2024
Cited by 13 | 2024
Directed exploration for reinforcement learning
ZD Guo, E Brunskill
arXiv preprint arXiv:1906.07805, 2019
Cited by 12 | 2019
Articles 1–20