Vladimir Mikulik

Cited by

	All	Since 2019
Citations	3705	3702
h-index	13	13
i10-index	14	14

Citations per year: 2020: 54, 2021: 418, 2022: 651, 2023: 905, 2024: 1652

Public access

1 article available, 0 articles not available

Based on funding mandates


Vladimir Mikulik

DeepMind

Verified email at google.com

AI Safety Interpretability NLP


Title	Authors	Venue	Cited by	Year
Inferring the effectiveness of government interventions against COVID-19	JM Brauner, S Mindermann, M Sharma, D Johnston, J Salvatier, ...	Science 371 (6531), eabd9338, 2021	1076	2021
Gemini: a family of highly capable multimodal models	Gemini Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...	arXiv preprint arXiv:2312.11805, 2023	1042	2023
Scaling language models: Methods, analysis & insights from training gopher	JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ...	arXiv preprint arXiv:2112.11446, 2021	862	2021
Teaching language models to support answers with verified quotes	J Menick, M Trebacz, V Mikulik, J Aslanides, F Song, M Chadwick, ...	arXiv preprint arXiv:2203.11147, 2022	165	2022
Alignment of language agents	Z Kenton, T Everitt, L Weidinger, I Gabriel, V Mikulik, G Irving	arXiv preprint arXiv:2103.14659, 2021	133	2021
Risks from learned optimization in advanced machine learning systems	E Hubinger, C van Merwijk, V Mikulik, J Skalse, S Garrabrant	arXiv preprint arXiv:1906.01820, 2019	116	2019
Specification gaming: the flip side of AI ingenuity	V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ...	DeepMind Blog 3, 2020	94	2020
Does circuit analysis interpretability scale? evidence from multiple choice capabilities in chinchilla	T Lieberum, M Rahtz, J Kramár, G Irving, R Shah, V Mikulik	arXiv preprint arXiv:2307.09458, 2023	41	2023
Tracr: Compiled transformers as a laboratory for interpretability	D Lindner, J Kramár, S Farquhar, M Rahtz, T McGrath, V Mikulik	Advances in Neural Information Processing Systems 36, 2024	39	2024
Meta-trained agents implement Bayes-optimal agents	V Mikulik, G Delétang, T McGrath, T Genewein, M Martic, S Legg, ...	Advances in Neural Information Processing Systems 33, 2020	39	2020
The hydra effect: Emergent self-repair in language model computations	T McGrath, M Rahtz, J Kramár, V Mikulik, S Legg	arXiv preprint arXiv:2307.15771, 2023	30	2023
Neural networks are a priori biased towards boolean functions with low entropy	C Mingard, J Skalse, G Valle-Pérez, D Martínez-Rubio, V Mikulik, ...	arXiv preprint arXiv:1909.11522, 2019	28	2019
Scaling Language Models: Methods, Analysis & Insights from Training Gopher	JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ...	arXiv, 2021	23	2021
Causal analysis of agent behavior for ai safety	G Delétang, J Grau-Moya, M Martic, T Genewein, T McGrath, V Mikulik, ...	arXiv preprint arXiv:2103.03938, 2021	10	2021
Challenges with unsupervised LLM knowledge discovery	S Farquhar, V Varma, Z Kenton, J Gasteiger, V Mikulik, R Shah	arXiv preprint arXiv:2312.10029, 2023	7	2023


Articles 1–15
