A study of BFLOAT16 for deep learning training D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ... arXiv preprint arXiv:1905.12322, 2019 | 300 | 2019 |
Mixed precision training of convolutional neural networks using integer operations D Das, N Mellempudi, D Mudigere, D Kalamkar, S Avancha, K Banerjee, ... arXiv preprint arXiv:1802.00930, 2018 | 186 | 2018 |
Ternary neural networks with fine-grained quantization N Mellempudi, A Kundu, D Mudigere, D Das, B Kaul, P Dubey arXiv preprint arXiv:1705.01462, 2017 | 131 | 2017 |
Performing power management in a multicore processor VW Lee, ET Grochowski, D Kim, Y Bai, S Li, NK Mellempudi, ... US Patent 10,234,930, 2019 | 126 | 2019 |
FP8 formats for deep learning P Micikevicius, D Stosic, N Burgess, M Cornea, P Dubey, R Grisenthwaite, ... arXiv preprint arXiv:2209.05433, 2022 | 70 | 2022 |
Mixed precision training with 8-bit floating point N Mellempudi, S Srinivasan, D Das, B Kaul arXiv preprint arXiv:1905.12334, 2019 | 69 | 2019 |
Dynamic precision management for integer deep learning primitives N Mellempudi, D Mudigere, D Das, S Sridharan US Patent 10,643,297, 2020 | 45 | 2020 |
Optimized compute hardware for machine learning operations D Das, R Gramunt, M Smelyanskiy, J Corbal, D Mudigere, NK Mellempudi, ... US Patent 10,776,699, 2020 | 43 | 2020 |
On scale-out deep learning training for cloud and HPC S Sridharan, K Vaidyanathan, D Kalamkar, D Das, ME Smorkalov, ... arXiv preprint arXiv:1801.08030, 2018 | 34 | 2018 |
Mixed low-precision deep learning inference using dynamic fixed point N Mellempudi, A Kundu, D Das, D Mudigere, B Kaul arXiv preprint arXiv:1701.08978, 2017 | 28 | 2017 |
Performing power management in a multicore processor VW Lee, D Kim, Y Bai, S Ji, S Li, DD Kalamkar, NK Mellempudi US Patent 9,910,481, 2018 | 22 | 2018 |
Incremental precision networks using residual inference and fine-grain quantization A Kundu, N Mellempudi, D Mudigere, D Das US Patent 11,556,772, 2023 | 17 | 2023 |
Ternary residual networks A Kundu, K Banerjee, N Mellempudi, D Mudigere, D Das, B Kaul, ... arXiv preprint arXiv:1707.04679, 2017 | 14 | 2017 |
Conversion hardware mechanism N Mellempudi, D Das, MEI Chunhui, K Wong, DD Kalamkar, HH Jiang, ... US Patent 11,494,163, 2022 | 13 | 2022 |
Dynamic precision management for integer deep learning primitives N Mellempudi, D Mudigere, D Das, S Sridharan US Patent 11,321,805, 2022 | 7 | 2022 |
Technologies for scaling deep learning training NK Mellempudi, S Sridharan, D Mudigere, D Das US Patent 11,068,780, 2021 | 5 | 2021 |
High performance scalable FPGA accelerator for deep neural networks S Srinivasan, P Janedula, S Dhoble, S Avancha, D Das, N Mellempudi, ... arXiv preprint arXiv:1908.11809, 2019 | 5 | 2019 |
Performing power management in a multicore processor VW Lee, ET Grochowski, D Kim, Y Bai, S Li, NK Mellempudi, ... US Patent 10,775,873, 2020 | 4 | 2020 |
K-tanh: Hardware efficient activations for deep learning A Kundu, S Srinivasan, EC Qin, D Kalamkar, NK Mellempudi, D Das, ... arXiv preprint arXiv:1909.07729, 2019 | 4 | 2019 |
Hardware apparatuses and methods relating to elemental register accesses V Lee, U Echeruo, G Chrysos, N Mellempudi US Patent 9,996,347, 2018 | 3 | 2018 |