Xu Tang's Homepages

Alibaba Global Video Cloud Innovation Challenge releases a video portrait segmentation dataset

2021-07-10T06:54:11.000Z

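As head of the Intelligent Creation & Interactive Effects team at Alibaba Moku Lab (阿里摩酷实验室), I was invited to serve as a judge for the algorithm track of the Alibaba Global Video Cloud Innovation Challenge, and I took part in releasing the video portrait segmentation dataset.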

Judge for the Alibaba Global Video Cloud Innovation Challenge

2021-07-09T06:54:11.000Z

Invited to serve as a judge for the algorithm track of the Alibaba Global Video Cloud Innovation Challenge.

News---paper accepted by ACMMM21: Decoupled IoU Regression for Object Detection

2021-07-04T02:45:08.000Z

## Decoupled IoU Regression for Object Detection

Abstract

Non-maximum suppression (NMS) is widely used in object detection pipelines to remove duplicate bounding boxes. The inconsistency between the confidence used for NMS and the real localization confidence seriously affects detection performance. Prior works propose predicting the Intersection-over-Union (IoU) between bounding boxes and their corresponding ground truths to improve NMS, but accurately predicting IoU remains challenging. We argue that the complex definition of IoU and feature misalignment make it difficult to predict IoU accurately. In this paper, we propose a novel Decoupled IoU Regression (DIR) model to handle these problems. The proposed DIR decouples the traditional localization confidence metric, IoU, into two new metrics: Purity and Integrity. Purity reflects the proportion of the object area in the detected bounding box, and Integrity refers to the completeness of the detected object area. Predicting Purity and Integrity separately divides the complex mapping between a bounding box and its IoU into two clearer mappings that can be modeled independently. In addition, a simple but effective feature realignment approach is introduced to make the IoU regressor work in a hindsight manner, which makes the target mapping more stable. The proposed DIR can be conveniently integrated with existing two-stage detectors and significantly improves their performance. With a simple implementation of DIR on Faster R-CNN, we obtain 41.9% AP on the MS COCO benchmark with a ResNet101 backbone, outperforming previous methods by a large margin and achieving state-of-the-art performance.
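The abstract defines Purity and Integrity only in words. The sketch below assumes Purity = |B∩G| / |B| and Integrity = |B∩G| / |G| for an axis-aligned detected box B and ground-truth box G; this is our reading of the abstract, not the paper's code. Under that assumption the two metrics fully determine IoU, because |B|/|B∩G| + |G|/|B∩G| - 1 = |B∪G| / |B∩G|, i.e. 1/IoU = 1/Purity + 1/Integrity - 1. All function names are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def purity(det: Box, gt: Box) -> float:
    """Assumed definition: fraction of the detected box covered by the object."""
    return intersection(det, gt) / area(det)


def integrity(det: Box, gt: Box) -> float:
    """Assumed definition: fraction of the object covered by the detected box."""
    return intersection(det, gt) / area(gt)


def iou(det: Box, gt: Box) -> float:
    inter = intersection(det, gt)
    return inter / (area(det) + area(gt) - inter)


def iou_from_decoupled(p: float, i: float) -> float:
    """Recombine the two metrics via 1/IoU = 1/P + 1/I - 1 (requires P > 0 and I > 0)."""
    return 1.0 / (1.0 / p + 1.0 / i - 1.0)


# A small box lying entirely inside the object: perfectly "pure" but far from "complete".
det, gt = (0.0, 0.0, 4.0, 4.0), (0.0, 0.0, 8.0, 8.0)
print(purity(det, gt), integrity(det, gt), iou(det, gt))
```

The example shows why splitting the target can help: a box inside the object and a box that overshoots it can share the same IoU, yet they fail in opposite ways, and Purity and Integrity tell them apart.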

News---paper accepted by ACMMM21: Deep Interactive Video Inpainting: an Invisibility Cloak for Harry Potter

2021-07-04T01:45:08.000Z

## Deep Interactive Video Inpainting: an Invisibility Cloak for Harry Potter

Abstract

In this paper, we propose a new task, deep interactive video inpainting, and an application that lets users interact with the machine. To our knowledge, this is the first deep-learning-based interactive video inpainting work that uses only free-form user input (i.e., scribbles) as guidance instead of mask annotations for each frame, which has academic, entertainment, and commercial value. Given a user's scribbles on one frame, it simultaneously performs interactive video object segmentation and video inpainting throughout the whole video. We utilize a shared spatial-temporal memory module that combines the interactive video object segmentation and video inpainting tasks into an end-to-end pipeline. In our framework, the past frames with object masks (either the user's scribbles or the predicted masks) form an external memory, and the current frame, as the query, is segmented and inpainted using the information in the shared memory. Furthermore, our method allows users to iteratively refine the segmentation results, which effectively improves the inpainting results where video object segmentation fails, allowing users to obtain high-quality video inpainting results even on challenging sequences. Qualitative and quantitative experimental results demonstrate the superiority of our approach.
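The abstract does not give the memory read equations. The sketch below assumes a space-time-memory-style read, as used in STM-type video object segmentation networks: every location of the current (query) frame compares its key with the keys of all locations in the past frames via a dot product, turns the similarities into softmax weights, and retrieves the weighted sum of the memory values. In a shared-memory design like the one described, the same read-out could feed both a segmentation head and an inpainting head; how the paper's decoders consume it is not covered here. `memory_read` and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def memory_read(query_keys: List[Vector],
                memory_keys: List[Vector],
                memory_values: List[Vector]) -> List[Vector]:
    """For each query location, attend over every memory location (all past frames,
    flattened) and return the attention-weighted sum of their value vectors."""
    dim = len(memory_values[0])
    out = []
    for q in query_keys:
        sims = [sum(a * b for a, b in zip(q, k)) for k in memory_keys]  # dot-product similarity
        weights = softmax(sims)
        read = [sum(w * v[d] for w, v in zip(weights, memory_values)) for d in range(dim)]
        out.append(read)
    return out


# Two memory locations; the query key matches the first one strongly.
print(memory_read([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

A query whose key matches one memory location retrieves that location's value almost exclusively, which is how information from annotated or already-segmented frames can propagate to the current frame.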

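Interviewed by AI Technology Review (AI科技评论): "I Am an AI Video Creator Publishing Ten Thousand Videos a Day: This Is My Understanding of the Human World"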

2021-07-01T06:54:11.000Z

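I was invited for an interview by AI Technology Review (AI科技评论). Related article: I Am an AI Video Creator Publishing Ten Thousand Videos a Day: This Is My Understanding of the Human World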

The article has been read more than 10,000 times.

Alibaba Digital Media & Entertainment: an automated production solution for quick-view short videos

2021-05-27T06:54:11.000Z


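As users' time becomes increasingly fragmented, videos are trending from long to short, and demand for short videos in feed scenarios keeps growing. Youku provides users with a large volume of high-quality video every year, which gives it a natural advantage in turning long videos into short ones, and our algorithmic research has achieved breakthroughs in the automated production of quick-view short videos.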

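The goal of AI auto-editing is to edit videos fully or semi-automatically with algorithms, using machines' ability to work at scale for batch production. This raises content production efficiency as well as short-video operation and distribution efficiency. At present, manual short-video production across the internet concentrates on top-tier IPs; AI auto-editing can supply targeted content for mid-tier and long-tail licensed IPs, creating new sources of traffic growth.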

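Youku has already applied these AI capabilities to multiple business scenarios, such as highlight extraction from bullet comments (danmaku), video understanding tags, episode recaps, smart cover images, and quick-view video commentary. For example, the smart cover image capability not only supports intelligent short-video production but is also offered to UPGC as a basic media-asset service, used in Youku account uploads, Youku search, and short/mini video recommendation.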

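We also built a "machine production + human review + ad generation" pipeline for episode recaps. Compared with fully manual recap production, the new pipeline cuts production time from days to minutes, greatly improving efficiency.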

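Talk at Prof. Deng Cai's lab, Zhejiang University---Research and technical practice in multimodal video understanding & interactive effects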

2021-04-20T06:54:11.000Z

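Talk at Prof. Deng Cai's lab at Zhejiang University on research and technical practice in multimodal video understanding & interactive effects. Slides: https://pan.baidu.com/s/1JkTfnyhT6HbsW53EYfyksg
Extraction code: tgva (cleared for release)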


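Topics covered:
Interactive face effects (face swapping, face stylization, face editing, face attributes, etc.)
Video condensation, video highlight extraction, video commentary, etc.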

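DataFun Summit, Knowledge Graph & Intelligent Creation Forum---Intelligent video production in practice at Alibaba Digital Media & Entertainment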

2021-03-27T06:54:11.000Z

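Technical talk at the DataFun Summit Knowledge Graph & Intelligent Creation Forum: intelligent video production in practice at Alibaba Digital Media & Entertainment.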

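The talk covers our recent progress in intelligent video production and creation, including video clipping, video remixing & derivative creation, video condensation, video commentary, and text-to-video.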

Slides: https://pan.baidu.com/s/1KfPKhqIxk9sKGgj9FlmCVw
Extraction code: 43C3 (cleared for release)

News: Joined Alibaba

2020-06-22T06:54:11.000Z

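I am honored to join Alibaba, and I hope to keep working hard, efficiently, and happily over the next few years to produce more meaningful, valuable, and impactful research and products.
I currently lead the intelligent video creation & interactive effects direction at Alibaba Digital Media & Entertainment. My research covers two main areas. Intelligent video creation: Video Summary / Video Grounding / intelligent video commentary / text-to-video / Text Video Retrieval. Interactive face effects: face detection and tracking / face editing / face stylization / face swapping / face attributes, etc.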

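I am also hiring research interns and experienced full-time staff. If you are interested, please contact me by email (or WeChat): buhui.tx@alibaba-inc.com
I look forward to having you join us.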

ByteDance---The development of face detection and our "Box" practice for small-scale face detection

2020-06-19T06:54:11.000Z

I was honored to be invited to ByteDance to share my past work and thoughts on face detection.

Related materials: TechBeat official website link

News: 1 oral paper accepted by ACMMM20

2020-05-16T01:45:08.000Z

Learning Global Structure Consistency for Robust Object Tracking

Abstract

Fast appearance variations and distractions from similar objects are two of the most challenging problems in visual object tracking. Unlike many existing trackers that model only the target, in this work we consider the transient variations of the whole scene. The key insight is that the object correspondences and spatial layout of the whole scene are consistent (i.e., global structure consistency) across consecutive frames, which helps disambiguate the target from distractors. Moreover, modeling transient variations makes it possible to localize the target under fast variations. Specifically, we propose an effective and efficient short-term model that learns to exploit global structure consistency over a short time span and can thus handle fast variations and distractors. Since short-term modeling falls short in handling occlusion and out-of-view targets, we adopt the long-short term paradigm and use a long-term model that corrects the short-term model when it drifts away from the target or the target is not present. These two components are carefully combined to balance stability and plasticity during tracking. We empirically verify that the proposed tracker can tackle both challenging scenarios and validate it on large-scale benchmarks. Remarkably, our tracker improves the state-of-the-art performance on VOT2018 from 0.440 to 0.460, on GOT-10k from 0.611 to 0.640, and on NFS from 0.619 to 0.629.
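The abstract says the long-term model corrects the short-term model when it drifts or the target is absent, but it does not state the rule. The sketch below shows one simple way such a long-short term combination can work, assuming each model outputs a box with a confidence score and that fixed thresholds (hypothetical values) decide whom to trust. It only selects the output; the paper may additionally re-initialize the short-term model, which is not shown. `Prediction` and `fuse` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Prediction:
    box: Box
    score: float  # confidence in [0, 1]


def fuse(short: Prediction, long: Optional[Prediction],
         short_thresh: float = 0.5, long_thresh: float = 0.5) -> Optional[Box]:
    """Hypothetical long-short term rule:
    - trust the short-term model while it is confident (handles fast variations);
    - otherwise fall back to the long-term model (recovery after drift or occlusion);
    - report the target as absent (None) if neither model is confident."""
    if short.score >= short_thresh:
        return short.box
    if long is not None and long.score >= long_thresh:
        return long.box
    return None


# The short-term model has drifted (low score); the long-term model re-finds the target.
print(fuse(Prediction((0, 0, 1, 1), 0.2), Prediction((5, 5, 6, 6), 0.8)))
```

The thresholds trade stability against plasticity: a high `short_thresh` hands control to the long-term model more often, while a low one keeps the faster-adapting short-term model in charge.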

News: 2nd place in CVPR UG2+ Challenge 2020

2020-05-16T01:35:59.000Z

2nd place in the (SEMI-)SUPERVISED FACE DETECTION IN LOW LIGHT CONDITIONS track of the CVPR UG2+ Challenge 2020.
The fact sheet can be seen here.

Jiangmen (将门)---The development of face detection and our "Box" practice for small-scale face detection

2020-05-07T06:54:11.000Z

Technical talk for the Jiangmen (将门) community: TechBeat website link / bilibili livestream link.
Slides (Baidu Cloud):
Link: https://pan.baidu.com/s/1cYtc_aDyDFogjTl47VyfvA Password: sz34

One tech talk

2020-04-25T15:19:43.000Z

Someone shared videos of my talk on YouTube:
https://www.youtube.com/watch?reload=9&v=kA9FWQjjU_4&list=PLiG8_90geV
https://www.youtube.com/watch?v=FXC0b9yNOX0

The development of face detection and Baidu's "Box" practice

2020-04-21T08:34:41.000Z

I wrote the article "The Development of Face Detection and Baidu's 'Box' Practice", published in a Synced (机器之心) column.

News: 2 papers accepted by CVPR2020

2020-04-21T08:25:04.000Z

HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces

Abstract

Current face detectors use anchors to frame a multi-task learning problem that combines classification and bounding box regression. Effective anchor design and anchor matching strategies enable face detectors to localize faces under large pose and scale variations. However, we observe that in the inference phase more than 80% of the correctly predicted bounding boxes are regressed from unmatched anchors (anchors whose IoU with the target face is below the matching threshold). This indicates that these unmatched anchors have excellent regression ability, yet existing methods neglect to learn from them. In this paper, we propose an Online High-quality Anchor Mining Strategy (HAMBox), which explicitly compensates outer faces with high-quality anchors. HAMBox can serve as a general strategy for anchor-based single-stage face detection. Experiments on various datasets, including WIDER FACE, FDDB, AFW, and PASCAL Face, demonstrate the superiority of the proposed method. Furthermore, our team won the championship on the Face Detection test track of the WIDER Face and Pedestrian Challenge 2019. We will release the code with PaddlePaddle.
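The abstract states the idea (unmatched anchors often regress well, so use them to compensate faces that receive few matched anchors) without the exact procedure. The sketch below is one possible online compensation step under stated assumptions, not the paper's algorithm: a face needs at least `k` positive anchors; if standard IoU matching yields fewer, the remaining unmatched anchors are ranked by the IoU between their *regressed* box and the face, and those above `comp_thresh` are added until the face has `k`. The thresholds, `k`, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, w) * max(0.0, h)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def compensate_anchors(faces: List[Box], anchors: List[Box], regressed: List[Box],
                       match_thresh: float = 0.35, comp_thresh: float = 0.5,
                       k: int = 3) -> Dict[int, List[int]]:
    """Return a mapping face index -> indices of anchors used as positives for it.
    regressed[i] is the box the network currently predicts from anchors[i]."""
    # Standard matching: anchors whose IoU with the face reaches match_thresh.
    assigned: Dict[int, List[int]] = {
        f: [a for a, anc in enumerate(anchors) if iou(anc, face) >= match_thresh]
        for f, face in enumerate(faces)
    }
    used = {a for idxs in assigned.values() for a in idxs}
    for f, face in enumerate(faces):
        need = k - len(assigned[f])
        if need <= 0:
            continue  # this face already has enough matched anchors
        # Rank still-unused anchors by how well their regressed box fits the face.
        candidates = sorted((a for a in range(len(anchors)) if a not in used),
                            key=lambda a: iou(regressed[a], face), reverse=True)
        for a in candidates[:need]:
            if iou(regressed[a], face) >= comp_thresh:  # keep only high-quality ones
                assigned[f].append(a)
                used.add(a)
    return assigned


face = [(0.0, 0.0, 10.0, 10.0)]
anchors = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0), (40.0, 40.0, 50.0, 50.0)]
regressed = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 10.0, 10.0), (30.0, 30.0, 35.0, 35.0)]
print(compensate_anchors(face, anchors, regressed))
```

In the example, anchor 1 does not overlap the face at all, so IoU matching ignores it, but its regressed box fits the face well (IoU 0.81), so it is mined as an extra positive; anchor 2 regresses poorly and is rejected.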

BFBox: Searching Face-appropriate Backbone and Feature Pyramid Network for Robust Face Detector

Abstract

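BFBox is based on neural architecture search (NAS) and jointly searches for a backbone (feature extractor) and a feature pyramid suited to face detection. The motivation is an interesting observation: popular backbones designed for image classification have proven compatible with generic object detection, yet they do not achieve the expected results on face detection. Moreover, the performance of different backbones and feature pyramids is not fully positively correlated when they are combined. First, we analyze well-performing backbones and propose a face-appropriate search space. Second, we propose the FPN-attention Module shown in Figure 1 to strengthen the connection between the backbone and the feature pyramid. Finally, we adopt SNAS to jointly search for face-appropriate backbone and feature pyramid architectures. Experiments on multiple datasets (WIDER FACE, FDDB, AFW, and PASCAL Face) demonstrate the superiority of the proposed method.
The figure below shows the structure of the detection network. It is based on RetinaNet with our proposed FPN-attention Module added, and the supernetwork is trained with random sampling.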
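SNAS, which BFBox adopts, treats the choice of operation in each searchable layer as a sample from a distribution over candidate operations and relaxes it with Gumbel-softmax. The sketch below shows only that sampling step in plain Python; the layer name, the candidate op names, and the logits are hypothetical placeholders, not BFBox's actual search space. When all logits are equal, the argmax of a Gumbel-perturbed sample is uniform over the candidates, which corresponds to plain random sampling of sub-networks from the supernetwork.

```python
import math
import random
from typing import Dict, List


def gumbel_softmax(logits: List[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)."""
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        g = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + g) / tau)
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    s = sum(exps)
    return [e / s for e in exps]


def sample_architecture(arch_logits: Dict[str, List[float]], ops: List[str],
                        tau: float, rng: random.Random) -> Dict[str, str]:
    """Pick one candidate op per searchable layer via the argmax of its relaxed sample."""
    choice = {}
    for layer, logits in arch_logits.items():
        weights = gumbel_softmax(logits, tau, rng)
        choice[layer] = ops[max(range(len(ops)), key=lambda i: weights[i])]
    return choice


ops = ["conv3x3", "conv5x5", "dwconv"]  # hypothetical candidate operations
print(sample_architecture({"stage1": [0.0, 0.0, 0.0], "stage2": [0.0, 0.0, 0.0]},
                          ops, tau=1.0, rng=random.Random(0)))
```

During the search, SNAS-style training updates the architecture logits with gradients through the relaxed weights; at the end, each layer keeps the operation with the highest logit.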

Our Open Source Projects

2020-04-21T08:06:31.000Z

IPCGAN: face aging with identity-preserved conditional generative adversarial networks (CVPR2018)
https://github.com/dawei6875797/Face-Aging-with-Identity-Preserved-Conditional-Generative-Adversarial-Networks
PyramidBox face detector (ECCV2018)
https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/face_detection
Lightweight face detection models: faceboxes and blazeface
https://github.com/PaddlePaddle/PaddleDetection/tree/release/0.2/configs/face_detection
[Fighting COVID-19] Masked face detection and classification
https://www.paddlepaddle.org.cn/hub/scene/maskdetect

Sharing our slides for 'Delving into High Performance Detector for Finding Tiny Faces' at ICCV

2019-11-04T05:54:11.000Z

On November 2, 2019, we gave a presentation at the ICCV 2019 workshop Face Recognition in the Wild. The slides are available here:
Link:
Baidu Cloud
Google Drive

News: Our paper and code will be released on this page.

2019-10-27T04:36:59.000Z

1st place and an invited talk in the face detection track of the ICCV Wider Challenge 2019.
More details, including the technical report and code, will be posted on this page.

TBD …

ICCV Wider Challenge winning solution

News: 1 paper accepted by TIFS2019

2019-09-04T15:36:59.000Z

Our paper "Progressively Refined Face Detection Through Semantics-Enriched Representation Learning" was accepted by IEEE Transactions on Information Forensics and Security (TIFS), a CCF A journal.