自然对抗条件下的鲁棒视觉表征方法研究
On Robust Visual Representation Methodology for Natural Adversarial Conditions
【Author】 Zhai Wei (翟伟)
【Advisor】 Zha Zhengjun (查正军)
【Author information】 University of Science and Technology of China, Cyberspace Security, 2022, PhD
【摘要】 随着以深度卷积神经网络为代表的深度学习技术的快速发展,智能视觉感知技术在过去的几年内已经为人类社会带来了巨大的经济和社会效益,成为了日常生活、科学研究和社会生产中不可缺少的有力工具。但是在深入地应用过程中暴露出了许多难以解决的问题,其中最重要的一个问题便是基于深度卷积神经网络的视觉模型的可信性问题。由于基于深度卷积神经网络的视觉模型开发设计所面向的场景和真实场景之间存在不可估计的偏差,例如光照变化、遮挡、扭曲等自然因素的影响,造成视觉模型在真实场景应用过程中无法输出可信的结果。而这些基于深度卷积神经网络的视觉模型“漏洞”很有可能被不法分子所利用,并最终造成不良社会影响。模拟上述自然影响作为攻击手段的攻击类型称为自然对抗,自然对抗与非自然对抗攻击(如:PGD和FGSM)不同的是自然对抗方式更加自然、符合人类视觉认知、并且不需要考虑对抗攻击的可迁移性问题,更重要的是其构造成本低廉,即只需要对物理世界中的采集设备(摄像头)或者被采集场景(对象)加以干涉和影响、在不访问设备所收集到的图像数据的前提下就可以达到攻击的目的。自然对抗现象在当今社会中变得越来越常见。因此,保证基于深度卷积神经网络的视觉模型在部署后更少地受到自然对抗的影响,是智能视觉感知模型落地过程中需要克服的重要问题。通过对常见自然对抗攻击类型、所攻击的潜在目标以及现有视觉表征方法的分析,总结了现有基于深度卷积神经网络视觉表征方法在视觉底-中-高三个层面的问题:1)底层视觉计算中缺乏对局部视觉基元空间排布的感知能力;2)中层视觉计算中缺乏对不同邻近视觉区域之间在空间上分组特性的感知能力;3)中层视觉计算中缺乏对视觉内容构型统计特性的感知能力;4)高层视觉计算中缺乏对视觉属性关联性的感知能力。本文针对上述问题开展了以下几方面工作:(1)针对底层视觉计算中缺乏对局部视觉基元空间排布的感知能力问题。本工作通过观察发现自然图像中的多种不同视觉基元之间通常存在着内在结构依赖性,并且依此提出了一种新的深层结构揭示网络,利用视觉基元之间的空间依赖性作为鲁棒的底层视觉表示。为了明确验证所提出方法抵御基于构造图像局部扭曲的自然对抗攻击的有效性,本工作利用多个包涵丰富的空间扭曲畸变的纹理数据集构造了上述自然对抗类型的实验并进行了验证。(2)针对中层视觉计算中缺乏对不同邻近视觉区域之间在空间上分组特性的感知能力的问题。本工作通过观察发现自然图像中不同邻近局部区域之间在空间上的通常存在着连贯性,并且依此提出了一种新的深度基元连贯性网络,利用前景和背景区域的视觉基元的空间组织作为鲁棒的视觉中层线索。为了明确验证所提出方法抵御基于构造低对比度环境的自然对抗攻击的有效性,利用专门面向图像低对比度的伪装目标检测数据集构造了上述自然对抗类型的实验并进行了验证。(3)针对中层视觉计算中缺乏对视觉内容构型统计特性的感知能力的问题。本工作探索了符合人类视觉认知理论的图-底分配机制对于抵御基于构造遮挡的自然对抗攻击的有效性。并提出了一个新的图底辅助模块来学习视觉场景的构型统计,利用其来减少复杂空间结构所带来的视觉歧义/模糊。此外,本工作还设计了一个基础且良好的视觉分离测试,以清晰的验证所提出方法的图底分配能力。进一步,为了明确验证所提出方法抵御基于构造遮挡的自然对抗攻击的有效性,利用多个包涵丰富视觉歧义性的真实数据集构造了上述自然对抗类型的实验并进行了验证。(4)针对高层视觉计算中缺乏对视觉属性关联性的感知能力的问题。本工作通过观察发现自然图像中存在的多视觉属性之间通常存在关联性,并且探索了该特性对于抵御基于构造多重视觉概念的自然对抗攻击的有效性。具体地,提出了一种新型的深度多属性感知网络,通过相互强化的方式逐步学习视觉属性及其之间的关联性。为了明确验证所提出方法抵御基于构造多重视觉概念的自然对抗攻击的有效性,利用多个包涵丰富的不完美标签的纹理数据集构造了上述自然对抗类型的实验并进行了验证。基于上述研究,本文研究了自然对抗条件下的鲁棒视觉表征方法,通过在多个构造的自然对抗测试上与先前工作进行比较,验证了所提出的自然对抗条件下的鲁棒视觉表征方法的有效性和优越性。本文工作为后续关于自然对抗条件下的视觉表征研究提供了新的视角与思路。
【Abstract】 With the rapid development of deep learning, represented by deep convolutional neural networks (CNNs), intelligent visual perception has brought great economic and social benefits in the past few years and has become an indispensable tool in daily life, scientific research, and industrial production. As these models are applied more widely, however, many hard problems have surfaced, the most important of which is the trustworthiness of CNN-based visual models. The scenes a model is designed and developed for differ from real scenes in ways that cannot be estimated in advance, for example through illumination changes, occlusion, distortion, and other natural factors, so the model cannot produce trustworthy outputs when deployed in real scenes. Such "vulnerabilities" are likely to be exploited by malicious actors and may ultimately cause adverse social impact. Attacks that simulate these natural influences are called natural adversarial attacks. Compared with non-natural adversarial attacks (e.g., PGD and FGSM), they look more natural, conform to human visual cognition, and do not need to consider the transferability of the attack. More importantly, they are cheap to construct: the attacker only needs to interfere with the capture device (camera) or the captured scene (object) in the physical world, and can achieve the attack without accessing the image data collected by the device. Natural adversarial phenomena are becoming increasingly common in today's society. Ensuring that CNN-based visual models are less affected by natural adversarial conditions after deployment is therefore an important problem to overcome in putting intelligent visual perception models into practice.

By analyzing common types of natural adversarial attacks, their potential targets, and existing visual representation methods, this dissertation identifies the following problems in existing CNN-based visual representation methods across the low, middle, and high levels of vision: 1) low-level visual computing lacks the ability to perceive the spatial arrangement of local visual primitives; 2) mid-level visual computing lacks the ability to perceive the spatial grouping properties of neighboring visual regions; 3) mid-level visual computing lacks the ability to perceive the configural statistics of visual content; 4) high-level visual computing lacks the ability to perceive the correlations among visual attributes. The dissertation addresses these problems through the following work.

(1) For the lack of perception of the spatial arrangement of local visual primitives in low-level visual computing, this work observes that the different visual primitives in natural images usually have intrinsic structural dependencies, and accordingly proposes a new deep structure-revealed network that uses the spatial dependencies among visual primitives as a robust low-level visual representation. To explicitly verify the effectiveness of the proposed method against natural adversarial attacks based on constructing local image distortions, experiments of this attack type are constructed and evaluated on multiple texture datasets that contain rich spatial distortions.

(2) For the lack of perception of the spatial grouping properties of neighboring visual regions in mid-level visual computing, this work observes that neighboring local regions in natural images are usually spatially coherent, and accordingly proposes a new deep texton coherence network that uses the spatial organization of visual primitives in foreground and background regions as a robust mid-level visual cue. To explicitly verify the effectiveness of the proposed method against natural adversarial attacks based on constructing low-contrast environments, experiments of this attack type are constructed and evaluated on a camouflaged object detection dataset that specifically targets low image contrast.

(3) For the lack of perception of the configural statistics of visual content in mid-level visual computing, this work explores how effective a figure-ground assignment mechanism, consistent with theories of human visual cognition, is at resisting natural adversarial attacks based on constructing occlusions. It proposes a new figure-ground-aided module that learns the configural statistics of visual scenes and uses them to reduce the visual ambiguity caused by complex spatial structures. In addition, a basic, well-controlled visual segregation test is designed to clearly validate the figure-ground assignment capability of the proposed method. To further verify the effectiveness of the proposed method against occlusion-based natural adversarial attacks, experiments of this attack type are constructed and evaluated on multiple real-world datasets that contain rich visual ambiguity.

(4) For the lack of perception of the correlations among visual attributes in high-level visual computing, this work observes that the multiple visual attributes present in natural images are usually correlated, and explores how effective this property is at resisting natural adversarial attacks based on constructing multiple visual concepts. Specifically, it proposes a novel deep multi-attribute perceiving network that progressively learns visual attributes and the correlations between them through mutual reinforcement. To explicitly verify the effectiveness of the proposed method against this attack type, experiments are constructed and evaluated on multiple texture datasets that contain many imperfect labels.

Building on the work above, this dissertation studies robust visual representation methods under natural adversarial conditions. Comparisons with prior work on several constructed natural adversarial tests verify the effectiveness and superiority of the proposed methods. The work offers new perspectives and ideas for subsequent research on visual representation under natural adversarial conditions.