Injecting numerical reasoning skills into language models M Geva*, A Gupta*, J Berant Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 199 | 2020 |
Arithmetic circuits: A chasm at depth 3 A Gupta, P Kamath, N Kayal, R Saptharishi SIAM Journal on Computing 45 (3), 1064-1079, 2016 | 182* | 2016 |
Break It Down: A Question Understanding Benchmark T Wolfson, M Geva, A Gupta, M Gardner, Y Goldberg, D Deutch, J Berant Transactions of the Association for Computational Linguistics 8, 183-198, 2020 | 171 | 2020 |
Approaching the chasm at depth four A Gupta, P Kamath, N Kayal, R Saptharishi Journal of the ACM (JACM) 61 (6), 1-16, 2014 | 136 | 2014 |
On the parameterization and initialization of diagonal state space models A Gu, A Gupta, K Goel, C Ré Advances in Neural Information Processing Systems 35, 35971-35983, 2022 | 124 | 2022 |
Diagonal state spaces are as effective as structured state spaces A Gupta, A Gu, J Berant Advances in Neural Information Processing Systems 35, 22982-22994, 2022 | 114 | 2022 |
Long range language modeling via gated state spaces H Mehta, A Gupta, A Cutkosky, B Neyshabur The Eleventh International Conference on Learning Representations, 2023 | 96 | 2023 |
SCROLLS: Standardized comparison over long language sequences U Shaham, E Segal, M Ivgi, A Efrat, O Yoran, A Haviv, A Gupta, W Xiong, ... Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 76 | 2022 |
Analyzing transformers in embedding space G Dar, M Geva, A Gupta, J Berant Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 57 | 2023 |
GMAT: Global memory augmentation for transformers A Gupta, J Berant arXiv preprint arXiv:2006.03274, 2020 | 45 | 2020 |
Reconstruction of depth-4 multilinear circuits with top fan-in 2 A Gupta, N Kayal, S Lokam Proceedings of the forty-fourth annual ACM symposium on Theory of computing …, 2012 | 29 | 2012 |
Algebraic geometric techniques for depth-4 PIT & Sylvester-Gallai conjectures for varieties A Gupta Electronic Colloquium on Computational Complexity (ECCC) 21 (130), 1, 2014 | 26 | 2014 |
Memory-efficient Transformers via Top-k Attention A Gupta, G Dar, S Goodman, D Ciprut, J Berant Proceedings of the Second Workshop on Simple and Efficient Natural Language …, 2021 | 21 | 2021 |
Random arithmetic formulas can be reconstructed efficiently A Gupta, N Kayal, Y Qiao Computational Complexity 23, 207-303, 2014 | 21 | 2014 |
Efficient reconstruction of random multilinear formulas A Gupta, N Kayal, S Lokam 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, 778-787, 2011 | 18 | 2011 |
Diagonal state space augmented transformers for speech recognition G Saon, A Gupta, X Cui ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 15 | 2023 |
Simplifying and understanding state space models with diagonal linear RNNs A Gupta, H Mehta, J Berant arXiv preprint arXiv:2212.00768, 2022 | 13 | 2022 |
Value-aware Approximate Attention A Gupta, J Berant Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 4 | 2021 |
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors I Amos, J Berant, A Gupta arXiv preprint arXiv:2310.02980, 2023 | 2 | 2023 |
Exploring the limits of decoder-only models trained on public speech recognition corpora A Gupta, G Saon, B Kingsbury arXiv preprint arXiv:2402.00235, 2024 | 1 | 2024 |