Cross-modal interaction networks for query-based moment retrieval in videos Z Zhang, Z Lin, Z Zhao, Z Xiao Proceedings of the 42nd International ACM SIGIR Conference on Research and …, 2019 | 239 | 2019 |
Weakly-supervised video moment retrieval via semantic completion network Z Lin, Z Zhao, Z Zhang, Q Wang, H Liu Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), 11539 …, 2020 | 163 | 2020 |
Counterfactual contrastive learning for weakly-supervised vision-language grounding Z Zhang, Z Zhao, Z Lin, X He Advances in Neural Information Processing Systems 33, 18123-18134, 2020 | 131 | 2020 |
Where does it exist: Spatio-temporal video grounding for multi-form sentences Z Zhang, Z Zhao, Y Zhao, Q Wang, H Liu, L Gao Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 113 | 2020 |
Cascaded prediction network via segment tree for temporal video grounding Y Zhao, Z Zhao, Z Zhang, Z Lin Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 84 | 2021 |
UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis Z Zhang, J Ma, C Zhou, R Men, Z Li, M Ding, J Tang, J Zhou, H Yang Advances in Neural Information Processing Systems 34, 2021 | 74 | 2021 |
Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks Z Zhao, Z Zhang, S Xiao, Z Yu, J Yu, D Cai, F Wu, Y Zhuang IJCAI 2018, 2018 | 72 | 2018 |
Regularized two-branch proposal networks for weakly-supervised moment retrieval in videos Z Zhang, Z Lin, Z Zhao, J Zhu, X He Proceedings of the 28th ACM International Conference on Multimedia, 4098-4106, 2020 | 71 | 2020 |
Moment retrieval via cross-modal interaction networks with query reconstruction Z Lin, Z Zhao, Z Zhang, Z Zhang, D Cai IEEE Transactions on Image Processing 29, 3750-3762, 2020 | 57 | 2020 |
Long-form video question answering via dynamic hierarchical reinforced networks Z Zhao, Z Zhang, S Xiao, Z Xiao, X Yan, J Yu, D Cai, F Wu IEEE Transactions on Image Processing 28 (12), 5939-5952, 2019 | 42 | 2019 |
Multi-turn video question answering via hierarchical attention context reinforced networks Z Zhao, Z Zhang, X Jiang, D Cai IEEE Transactions on Image Processing 28 (8), 3860-3872, 2019 | 33 | 2019 |
Connecting language and vision for natural language-based vehicle retrieval S Bai, Z Zheng, X Wang, J Lin, Z Zhang, C Zhou, H Yang, Y Yang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 32 | 2021 |
Video moment retrieval with noisy labels W Pan, Z Zhao, W Huang, Z Zhang, L Fu, Z Pan, J Yu, F Wu IEEE Transactions on Neural Networks and Learning Systems 35 (5), 6779-6791, 2022 | 29 | 2022 |
Temporal textual localization in video via adversarial bi-directional interaction networks Z Zhang, Z Zhao, Z Zhang, Z Lin, Q Wang, R Hong IEEE Transactions on Multimedia 23, 3306-3317, 2020 | 29 | 2020 |
Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding Z Zhang, Z Zhao, Z Lin, B Huai, NJ Yuan IJCAI 2020, 2020 | 27 | 2020 |
Text-guided image inpainting Z Zhang, Z Zhao, Z Zhang, B Huai, J Yuan Proceedings of the 28th ACM International Conference on Multimedia, 4079-4087, 2020 | 20 | 2020 |
Open-ended long-form video question answering via hierarchical convolutional self-attention networks Z Zhang, Z Zhao, Z Lin, J Song, X He IJCAI 2019, 2019 | 16 | 2019 |
Localizing Unseen Activities in Video via Image Query Z Zhang, Z Zhao, Z Lin, J Song, D Cai IJCAI 2019, 2019 | 14 | 2019 |
Learning to Rehearse in Long Sequence Memorization Z Zhang, C Zhou, J Ma, Z Lin, J Zhou, H Yang, Z Zhao ICML 2021, 2021 | 12 | 2021 |