AI paper index

Combating Data and Target Shifts in Visual Tasks

2031-01-01 · Open MIND

One-line summary

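A thesis proposing methods (GMDG, L-Reg, PL-Reg, and SADA) to improve how visual models generalize and transfer under data shifts, target shifts, and cross-modality settings.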

Engineering notes

Engineering notes will be added by the aipentium editorial team.

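Chinese explanation

Chinese explanation pending: this site will prioritize adding Chinese-language notes for high-value papers on large language models, generative AI, ChatGPT-related technology, computer vision, deep learning, and similar topics.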

Original abstract

The rapid advancement of deep learning models for visual tasks has led to significant progress in many domains. However, a key challenge remains: ensuring that models can generalize effectively to unseen samples or novel classes, especially in real-world scenarios where training data-target pairs are often limited or unavailable, and test data may exhibit shifts in both data and targets. This thesis addresses the problem of shifts across data and targets, proposing novel methods to improve model generalization and transferability in complex settings. The thesis identifies the different shift scenarios for visual tasks and presents their problem forms. The research first relaxes the static target distribution assumption in Multi-Domain Generalization (mDG) tasks, introducing the General Multi-Domain Generalization (GMDG) objective to improve generalization under varying data shifts. Extensive experiments validate that GMDG is feasible for classification, segmentation, and regression tasks. Further considering extreme target distribution shifts where even unknown novel classes emerge, the study also explores logical regularization techniques for tasks such as Generalized Category Discovery (GCD), mDG+GCD, and Class Incremental Learning (CIL), proposing a sample-based logical regularization term (L-Reg) to enhance data-, target-, and all-shift generalization. Additionally, a partial logic framework is introduced to address challenges in CIL, enabling models to retain room for unknown classes during training. This is further extended by the Partial-Logic Regularization (PL-Reg), which improves generalization across GCD, mDG+GCD, and transferability for CIL tasks. Furthermore, the thesis proposes a novel Semantic-aware Data Augmentation (SADA) framework for cross-modality generalization in text-to-image synthesis (T2Isyn), text-image retrieval (TIR), and image-text retrieval (ITR) tasks. This framework ensures semantic preservation across text and image modalities, improving both semantic consistency and image quality in cross-modality tasks. The methods proposed in this thesis offer significant advancements in visual generalization and transfer scenarios, providing theoretical insights and practical solutions for handling data shifts, target shifts, and multi-modalities in complex scenarios. Extensive experiments validate the effectiveness of the proposed methods, demonstrating their superior performance across a range of tasks.
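
Illustrative note (not part of the abstract): the abstract separates shifts in the targets from shifts in the data, including the extreme case where novel classes appear at test time. The sketch below is only a toy diagnostic for the target side of that distinction. It is not GMDG, L-Reg, PL-Reg, or SADA, and the function names `label_distribution` and `target_shift_report` are hypothetical names chosen for this example. It assumes labels are available for both splits, which the thesis setting often does not grant.

```python
from collections import Counter


def label_distribution(labels):
    """Return the empirical class distribution as {class: probability}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def target_shift_report(train_labels, test_labels):
    """Summarize target shift between two label sets (toy diagnostic).

    Returns the total variation distance between the two empirical label
    distributions and the set of classes that appear only at test time.
    """
    p = label_distribution(train_labels)
    q = label_distribution(test_labels)
    classes = set(p) | set(q)
    # Total variation distance: half the L1 distance between distributions.
    tv_distance = 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)
    # Novel classes: present in the test labels, absent from training labels.
    novel_classes = set(q) - set(p)
    return tv_distance, novel_classes


# Hypothetical labels: "fox" never appears during training.
train = ["cat", "cat", "dog", "dog"]
test = ["cat", "dog", "dog", "fox"]
tv, novel = target_shift_report(train, test)
print(tv, novel)
```

A distance of 0 with no novel classes means the two label distributions match; a non-empty novel-class set corresponds to the situation that tasks such as GCD and CIL are meant to handle.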

Engineering value: 5.0
Research novelty: 7.0
Business relevance: 4.0

Links and sources

