MULTI-MEDIA LAB

  • Yuxing Han, Jiangtao Wen, Jisheng Li, Yubin Hu
    US Patent
    The present disclosure relates to a target tracking method, device, medium and apparatus, belongs to the technical field of computers, and can ensure the completeness of tracking. The target tracking method comprises performing object detection on all telephoto image frames captured by telephoto cameras in a multi-camera system, and constructing a detection point set based on the object detection result for each telephoto image frame; selecting a target tracking point from each detection point set based on the starting tracking point; and connecting the selected target tracking points as a tracking sequence for the starting tracking point.
  • Yuxing Han, Jiangtao Wen, Minhao Tang, Yu Zhang, Jiawen Gu, Bichuan Guo, Ziyu Zhu
    US Patent
    A video processing method, including: acquiring a plurality of raw videos, the plurality of raw videos being videos acquired by a plurality of dynamic image recorders arranged according to preset locations; determining an overlapping region between every two adjacent raw videos according to preset rules corresponding to the preset locations, the adjacent raw videos being the raw videos acquired by the dynamic image recorders arranged adjacent to each other; performing multi-stage optical flow calculation on the raw videos in each overlapping region to obtain a plurality of pieces of target optical flow information; and splicing the overlapping region of every two adjacent raw videos based on the target optical flow information to obtain target videos.
  • Yanghao Li, Xinyao Chen, Jisheng Li, Jiangtao Wen, Yuxing Han, Shan Liu, Xiaozhong Xu
    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
    Rate control is a critical part of video compression, especially in bandwidth-limited tasks such as live streaming and broadcasting. Recently emerging learned video compression has shown advantageous rate-distortion (RD) performance in previous research, but the lack of rate control heavily limits its usage in real coding scenarios. In this work, we present the first rate control scheme tailored for learned video compression. Specifically, we explore the inter-frame dependency of learned video compression and propose a novel R-D-λ model accordingly for efficient rate allocation. Additionally, a staged update algorithm is developed for robust parameter estimation. Experiments on public datasets show that the proposed rate control scheme achieves low rate error while maintaining equal or even higher RD performance, without introducing coding time overhead.
  • Xinrong Zhang, Zihou Ren, Xi Li, Shuqi Liu, Yunlong Deng, Yadi Xiao, Yuxing Han, Jiangtao Wen
    arXiv preprint arXiv:2201.02915
    The deluge of new papers has significantly hindered academic development, mainly because author-level and publication-level evaluation metrics focus only on quantity. Those metrics have caused several severe problems: they trouble scholars who focus on an important research direction for a long time, and they even promote an impetuous academic atmosphere. To solve those problems, we propose Phocus, a novel academic evaluation mechanism for authors and papers. Phocus analyzes the sentence containing a citation and its context to predict the sentiment towards the corresponding reference. Combining other factors, Phocus classifies citations coarsely, ranks all references within a paper, and utilizes the results of the classifier and the ranking model to get the local influential factor of a reference to the citing paper. The global influential factor of the reference to the citing paper is the product of the local influential factor and the total influential factor
  • 2021 Patent Camera
    Jiangtao Wen, Yuxing Han, Zhong Bao
    US Patent
    FIG. 1 is a perspective view of a camera showing our new design; FIG. 2 is a front view thereof; FIG. 3 is a rear view thereof; FIG. 4 is a left side view thereof; FIG. 5 is a right side view thereof; FIG. 6 is a top view thereof; and, FIG. 7 is a bottom view thereof.
  • Jisheng Li, Yuze He, Jinghui Jiao, Yubin Hu, Yuxing Han, Jiangtao Wen
    Proceedings of the 29th ACM International Conference on Multimedia
    Three-degrees-of-freedom (3-DoF) omnidirectional imaging has been widely used in various applications ranging from street maps to 3-DoF VR live broadcasting. Although it allows for navigating viewpoints rotationally inside a virtual world, it does not provide motion parallax, which is key for human 3D perception. Recent research mitigates this problem by introducing 3 translational degrees of freedom (6-DoF) using multi-sphere images (MSI), which are beginning to show promise in handling occlusions and reflective objects. However, the design of MSI naturally limits the range of authentic 6-DoF experiences, as existing mechanisms for MSI rendering cannot fully utilize multi-layer information when synthesizing novel views between multiple MSIs. To tackle this problem and extend the 6-DoF range, we propose an MSI interpolation pipeline that utilizes adjacent MSIs' 3D information embedded inside their layers. In this work …
  • Jisheng Li, Yuze He, Yubin Hu, Yuxing Han, Jiangtao Wen
    2021 IEEE International Conference on Image Processing
    Omnidirectional video is an essential component of Virtual Reality. Although various methods have been proposed to generate content that can be viewed with six degrees of freedom (6-DoF), existing systems usually involve complex depth estimation, image inpainting or stitching pre-processing. In this paper, we propose a system that uses a 3D ConvNet to generate a multi-sphere image (MSI) representation that can be experienced in 6-DoF VR. The system utilizes conventional omnidirectional VR camera footage directly without the need for a depth map or segmentation mask, thereby significantly reducing the overall complexity of 6-DoF omnidirectional video composition. By using a newly designed weighted sphere sweep volume (WSSV) fusing technique, our approach is compatible with most panoramic VR camera setups. A ground truth generation approach for high-quality artifact-free 6-DoF contents …
  • Jiangtao Wen, Yuxing Han, Yanghao Li, Jiawen Gu, Rui Zhang
    US Patent
    An image processing method, device, storage medium and camera are provided. The method, applied to the camera, comprises: capturing a target image; acquiring a target feature map of the target image through a preset target convolution layer, wherein the target convolution layer includes at least one of a plurality of convolution layers of a convolutional neural network (CNN); and outputting the target feature map. That is to say, after the target image is captured by the camera, the target feature map may be acquired by processing the target image through the target convolution layer pre-integrated in the camera. In this way, the camera transmits only the target feature map, which reduces the transmitted data volume, thereby shortening transmission delay and saving the bandwidth required for image transmission.
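
The in-camera feature map idea in the last entry above can be illustrated with a small sketch. This is not the patented implementation: the single-channel image, the 3x3 averaging kernel, the stride of 2, the ReLU, and the function names `conv2d` and `camera_output` are all assumptions chosen only to show how a strided convolution layer can produce an output with fewer values than the captured image.

```python
def conv2d(image, kernel, stride):
    """Valid 2D cross-correlation of a single-channel image, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(max(acc, 0.0))  # ReLU keeps the feature map non-negative
        out.append(row)
    return out


def camera_output(image, kernel, stride=2):
    """Hypothetical camera-side step: emit the feature map instead of the image."""
    return conv2d(image, kernel, stride)


# Toy 8x8 "captured" image and an assumed 3x3 averaging kernel.
image = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

feature_map = camera_output(image, kernel)
pixels_in = len(image) * len(image[0])
values_out = len(feature_map) * len(feature_map[0])
print(f"captured values: {pixels_in}, transmitted values: {values_out}")
```

Whether the feature map is actually smaller than the image depends on the layer's stride and channel count; a real CNN layer with many output channels would need a larger stride or further compression to save bandwidth.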
Partners