Xiaojie Jin, 靳潇杰
Bytedance Research, USA
Verified email at bytedance.com - Homepage
Title
Cited by
Year
Dual path networks
Y Chen, J Li, H Xiao, X Jin, S Yan, J Feng
NIPS, 4467-4475, 2017
1145 · 2017
Deepvit: Towards deeper vision transformer
D Zhou, B Kang, X Jin, L Yang, X Lian, Z Jiang, Q Hou, J Feng
arXiv preprint arXiv:2103.11886, 2021
878 · 2021
Conflict-averse gradient descent for multi-task learning
B Liu, X Liu, X Jin, P Stone, Q Liu
NeurIPS, 2021, 34, 18878-18890, 2021
623 · 2021
Deep learning with s-shaped rectified linear activation units
X Jin, C Xu, J Feng, Y Wei, J Xiong, S Yan
AAAI, 2016, 2016
304 · 2016
All tokens matter: Token labeling for training better vision transformers
ZH Jiang, Q Hou, L Yuan, D Zhou, Y Shi, X Jin, A Wang, J Feng
NeurIPS, 2021, 34, 18590-18602, 2021
292 · 2021
Contrastive masked autoencoders are stronger vision learners
Z Huang, X Jin, C Lu, Q Hou, MM Cheng, D Fu, X Shen, J Feng
IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (4), 2506-2517, 2023
279 · 2023
Deep self-taught learning for weakly supervised object localization
Z Jie, Y Wei, X Jin, J Feng, W Liu
CVPR, 2017, 1377-1385, 2017
246 · 2017
Pixellm: Pixel reasoning with large multimodal model
Z Ren, Z Huang, Y Wei, Y Zhao, D Fu, J Feng, X Jin
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
225 · 2024
Tree-structured reinforcement learning for sequential object localization
Z Jie, X Liang, J Feng, X Jin, W Lu, S Yan
NIPS, 2016, 127-135, 2016
161 · 2016
Human-centric spatio-temporal video grounding with visual transformers
Z Tang, Y Liao, S Liu, G Li, X Jin, H Jiang, Q Yu, D Xu
IEEE Transactions on Circuits and Systems for Video Technology 32 (12), 8238 …, 2021
155 · 2021
Atomnas: Fine-grained end-to-end neural architecture search
J Mei, Y Li, X Lian, X Jin, L Yang, A Yuille, J Yang
ICLR 2020, 2019
155 · 2019
Video scene parsing with predictive feature learning
X Jin, X Li, H Xiao, X Shen, Z Lin, J Yang, Y Chen, J Dong, L Liu, Z Jie, ...
ICCV, 2017, 5580-5588, 2017
151 · 2017
Vista-llama: Reducing hallucination in video language models via equal distance to visual tokens
F Ma, X Jin, H Wang, Y Xian, J Feng, Y Yang
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
127* · 2024
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
H Zhang, Y Wang, Y Tang, Y Liu, J Feng, X Jin
arXiv preprint arXiv:2506.23825, 2025
105* · 2025
Refiner: Refining self-attention for vision transformers
D Zhou, Y Shi, B Kang, W Yu, Z Jiang, Y Li, X Jin, Q Hou, J Feng
arXiv preprint arXiv:2106.03714, 2021
97 · 2021
Training skinny deep neural networks with iterative hard thresholding methods
X Jin, X Yuan, J Feng, S Yan
arXiv preprint arXiv:1607.05423, 2016
96 · 2016
Predicting scene parsing and motion dynamics in the future
X Jin, H Xiao, X Shen, J Yang, Z Lin, Y Chen, Z Jie, J Feng, S Yan
NIPS, 2017, 6915-6924, 2017
88 · 2017
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
M Ding, X Lian, L Yang, P Wang, X Jin, Z Lu, P Luo
CVPR, 2021, 2982-2992, 2021
87 · 2021
Neural Architecture Search for Lightweight Non-Local Networks
Y Li, X Jin, J Mei, X Lian, L Yang, C Xie, Q Yu, Y Zhou, S Bai, AL Yuille
CVPR, 10297-10306, 2020
76 · 2020
Token labeling: Training a 85.5% top-1 accuracy vision transformer with 56m parameters on imagenet
Z Jiang, Q Hou, L Yuan, D Zhou, X Jin, A Wang, J Feng
arXiv preprint arXiv:2104.10858 3 (6), 7, 2021
62 · 2021
Articles 1–20