| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| PaLM: Scaling language modeling with Pathways | A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... | Journal of Machine Learning Research 24 (240), 1-113 | 3432 | 2023 |
| Generating Wikipedia by summarizing long sequences | PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer | arXiv preprint arXiv:1801.10198 | 912 | 2018 |
| Model-based reinforcement learning for Atari | L Kaiser, M Babaeizadeh, P Milos, B Osinski, RH Campbell, ... | arXiv preprint arXiv:1903.00374 | 877 | 2019 |
| Tensor2Tensor for neural machine translation | A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... | arXiv preprint arXiv:1803.07416 | 612 | 2018 |
| Mesh-TensorFlow: Deep learning for supercomputers | N Shazeer, Y Cheng, N Parmar, D Tran, A Vaswani, P Koanantakool, ... | Advances in Neural Information Processing Systems 31 | 363 | 2018 |
| Scaling up models and data with t5x and seqio | A Roberts, HW Chung, G Mishra, A Levskaya, J Bradbury, D Andor, ... | Journal of Machine Learning Research 24 (377), 1-8 | 113 | 2023 |
| Pathways: Asynchronous distributed dataflow for ML | P Barham, A Chowdhery, J Dean, S Ghemawat, S Hand, D Hurt, M Isard, ... | Proceedings of Machine Learning and Systems 4, 430-449 | 111 | 2022 |
| PaLM: Scaling language modeling with Pathways (arXiv version) | A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... | arXiv preprint arXiv:2204.02311 | 79 | 2022 |
| Attention-based decoder-only sequence transduction neural networks | NM Shazeer, LM Kaiser, E Pot, M Saleh, BD Goodrich, PJ Liu, R Sepassi | US Patent 11,556,786 | 9 | 2023 |
| Asynchronous distributed data flow for machine learning workloads | JA Dean, S Roy, MA Isard, A Chowdhery, B Saeta, CA Thekkath, DW Hurt, ... | US Patent 11,556,381 | | 2023 |