Aim and Scope:
With the rapid development of artificial intelligence and computer vision, visual perception and understanding technologies are continuously breaking through traditional boundaries, becoming core enablers of the perceptual and decision-making capabilities of intelligent systems. Empowered by cutting-edge techniques such as deep learning, Transformer architectures, multimodal fusion, and large-scale pre-trained models, visual perception has evolved from static images to dynamic videos and from single-modal input to multimodal collaboration, while visual understanding has expanded from classification and recognition to semantic reasoning and cross-domain generalization. Related research now shows parallel innovation in both theoretical methods and application scenarios.
The session covers a wide range of research topics. These include the refinement and extension of fundamental visual theories and classical methods, such as traditional image segmentation algorithms, feature extraction theory, and visual perception mechanisms; innovative applications of deep learning, including the evolution of convolutional network architectures, self-attention mechanisms and vision Transformers, graph neural networks, generative and diffusion models for segmentation assistance, neural architecture search, and lightweight model optimization; the continuing evolution of core semantic segmentation techniques, encompassing fully convolutional and encoder-decoder structures, multi-scale feature fusion, attention-based enhancements, weakly supervised and semi-supervised learning, instance and panoptic segmentation, continual learning, cross-domain adaptation, and segmentation of video and 3D data; and the exploration of new paradigms in visual understanding, such as foundation and large-scale models, cross-modal perception and open-world segmentation, few-shot and zero-shot learning, self-supervised and unsupervised training strategies, interactive and prompt-driven segmentation, and the interpretability and reliability of segmentation models. Beyond these topics, contributions in other relevant areas of visual perception and understanding are equally encouraged.
This special session, “Advances in Visual Perception and Understanding”, focuses on key modeling approaches, task mechanisms, and system-level practices in the field of vision. It aims to systematically present new progress in integrated research on “perception–understanding–decision-making”, and to promote the development of computer vision towards higher precision, stronger generalization, and more intelligent cognition. The session welcomes research on fundamental theories, model innovations, algorithm optimization, and comprehensive applications across multi-source data and diverse tasks, including but not limited to the following topics:
1. Feature modeling, structure design, and semantic representation in visual perception
2. Innovative applications of Transformers, graph neural networks, diffusion models, etc., in visual modeling
3. Novel modeling approaches for self-supervised, weakly supervised, and semi-supervised learning
4. Video understanding, 3D perception, and multimodal perception modeling
5. Model optimization for typical tasks such as image segmentation, object detection, image generation, and visual question answering
6. Visual generalization learning in open-world, zero-shot, and few-shot settings
7. Construction and application of foundational vision models, prompt-based learning, and large-scale vision models
8. Visual intelligent systems with interpretability, robustness, and task adaptability
In addition to the aforementioned topics, other research areas related to visual perception and understanding are also welcome.
This special session looks forward to gathering diverse, cutting-edge, and cross-disciplinary research contributions, and to promoting continuous innovation and broad application of visual technologies toward higher precision, stronger generalization, reduced supervision dependence, and enhanced intelligent cognition.
Acknowledgements:
This special session is organized by Prof. Yuefei Wang (Chengdu University, China).
Introduction to organizers:
Yuefei Wang received the Ph.D. degree in Computer Application Technology in 2019. He is currently an Associate Professor at the School of Computer Science, Chengdu University, China. His research interests include fully supervised and semi-supervised semantic segmentation, deep clustering, and multi-view anomaly detection. He has published 14 SCI-indexed papers as first or corresponding author, including 6 papers in top-tier journals, as well as several papers indexed by EI and Chinese core journals.
Submission Process:
If you wish to participate in this special session (AVPU), please submit your manuscript through ConfSync at https://confsync.cn/csae/submission and select the section “Advances in Visual Perception and Understanding”. We will assign your submission to Prof. Yuefei Wang for a preliminary review. After passing the preliminary review, your manuscript will undergo a secondary review by experts. Acceptance notifications will be issued concurrently with the main conference notifications. For any questions, please contact info@confsync.cn.