Title
Mammography classification with multi-view deep learning techniques: Investigating graph and transformer-based architectures
01
Paper Digest: Introduction
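Mammography is the primary imaging modality for breast cancer screening and one of the most important tools for reducing breast cancer mortality (Broeders et al., 2012; Morra et al., 2015). Because screening mammography involves a high reading volume, a well-defined diagnostic task, and a relatively standardized acquisition process, it is an ideal candidate for automated or semi-automated reading. Recent studies (Rodríguez-Ruiz et al., 2019; Kyono et al., 2018; Dembrower et al., 2020) suggest that deep learning systems could provide an independent assessment and thereby relieve radiologists' burden. However, designing deep learning systems for mammography still faces several challenges. With a cancer prevalence below 1%, screening mammography is a typical "needle in a haystack" problem that requires very large and rich datasets to reach high performance (Wu et al., 2020; Schaffter et al., 2020). High-resolution images must be processed (Wu et al., 2020). Information must be integrated across multiple scales (Shen et al., 2021b; Pinto Pereira et al., 2009) and across multiple views (Van Schie et al., 2011; Samulski and Karssemeijer, 2011; Perek et al., 2018; Famouri et al., 2020; Ren et al., 2021).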
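One approach to fully automated processing of screening mammograms is the so-called multi-view architecture, which combines information from the four views typically included in a screening exam to produce an exam-level classification score (for example, the probability that the exam contains cancer). Multi-view architectures can perform ipsi-lateral analysis and contra-lateral analysis at the same time: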
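Ipsi-lateral analysis combines the craniocaudal (CC) and mediolateral oblique (MLO) views to address high breast density and tissue superimposition (Sacchetto et al., 2016; Wei et al., 2011; Van Gils et al., 1998; Ren et al., 2021; Samulski and Karssemeijer, 2011).
Contra-lateral analysis integrates information from both breasts, for example to detect asymmetries that may not be apparent when views are analyzed individually (Rangayyan et al., 2007).
One advantage of these architectures is that, in principle, they can be trained from exam-level labels, bypassing the need for costly pixel-level supervision. In the DREAM challenge, the first attempts to train deep neural networks (DNNs) with weakly supervised image labels showed that DNNs trained on strongly supervised external data significantly outperformed DNNs relying on image labels alone (Schaffter et al., 2020). As deep learning architectures have advanced, the performance of breast cancer detection based on image-level supervision has improved markedly (Wu et al., 2020; Shen et al., 2021b).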
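As a rough illustration of how a multi-view architecture might turn four views into one exam-level score, the sketch below fuses per-view feature vectors in two steps: an ipsi-lateral step that averages the CC and MLO features of each breast, and a contra-lateral step that measures left/right asymmetry. This is not the method of the paper or of any cited work. The view names, the mean and absolute-difference fusion rules, and the linear classifier with hand-set weights are all assumptions made for illustration; a real system would learn the feature extractor and the fusion from data.

```python
import math
from typing import Dict, List

# Hypothetical view identifiers for the four standard screening views.
VIEWS = ("L-CC", "L-MLO", "R-CC", "R-MLO")


def mean_pool(a: List[float], b: List[float]) -> List[float]:
    """Average two feature vectors of equal length, element by element."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def exam_score(features: Dict[str, List[float]],
               weights: List[float],
               bias: float = 0.0) -> float:
    """Return an exam-level probability from per-view feature vectors.

    `features` maps each name in VIEWS to a feature vector of length d.
    `weights` must have length 3 * d (left, right, asymmetry features).
    """
    # Ipsi-lateral fusion (assumed rule): combine CC and MLO of the same breast.
    left = mean_pool(features["L-CC"], features["L-MLO"])
    right = mean_pool(features["R-CC"], features["R-MLO"])

    # Contra-lateral fusion (assumed rule): absolute left/right difference.
    asymmetry = [abs(l - r) for l, r in zip(left, right)]

    fused = left + right + asymmetry
    if len(fused) != len(weights):
        raise ValueError("weights must have length 3 * feature dimension")

    # Linear classifier followed by a sigmoid to obtain a probability.
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # A perfectly symmetric exam scored with weights that only look at asymmetry.
    symmetric_exam = {view: [1.0, 2.0] for view in VIEWS}
    print(exam_score(symmetric_exam, [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]))
```

Because only an exam-level probability comes out of the function, a model of this shape can in principle be trained from exam-level labels alone, which is the weakly supervised setting described above.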
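Although most recent mammography solutions are based on convolutional neural networks (CNNs), residual networks in particular, some alternative deep architectures have also appeared in the literature: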
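Visual Transformers (ViT) have outperformed CNNs on several medical and non-medical tasks (Dosovitskiy et al., 2020; He et al., 2022; Xu et al., 2022; Matsoukas et al., 2022). Compared with CNNs, Transformers excel in three respects:
Optimizing the allocation of computational resources to the relevant regions of the image (not all pixels are equally important).
Optimizing semantic encoding.
Connecting spatially distant semantic features through self-attention (Dosovitskiy et al., 2020).
Graph-based architectures are designed to explicitly mimic how radiologists read an exam, performing ipsi-lateral and contra-lateral analysis together (Ren et al., 2021; Du et al., 2019; Liu et al., 2021b; Zhang et al., 2021; Yang et al., 2021). These architectures use the ipsi-lateral views to resolve tissue superimposition, and identify potential lesions by jointly analyzing structures that are spatially co-located and visually similar (Wei et al., 2011; Samulski and Karssemeijer, 2011; Ren et al., 2021; Yang et al., 2021).
In this paper, we directly compare multi-view architectures with different inductive biases. Our contributions are as follows:
We extend existing Transformer-based architectures (van Tulder et al., 2021; Matsoukas et al., 2022) and architectures based on graph convolutional networks (GCNs) (Liu et al., 2021b) to handle four mammographic views.
We introduce a new Transformer-based architecture that combines ipsi-lateral and contra-lateral cross-view attention.
We evaluate the architectures not only in terms of performance, but also in terms of how they integrate local and global features.
The results show that the architectures are inherently complementary, each being sensitive to specific features, and that combining multiple architectures detects breast cancer more effectively, even though Transformers outperform convolution-based architectures overall.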
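To make the idea of cross-view attention more concrete, here is a minimal sketch of scaled dot-product attention in which the queries come from one view (for example CC) and the keys and values come from another (for example MLO). It is a simplification and not the architecture proposed in the paper: the learned query, key, and value projections are replaced by the identity, there is a single head, and the function name `cross_view_attention` and the token layout are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cross_view_attention(query_tokens: List[Vector],
                         context_tokens: List[Vector]) -> List[Vector]:
    """Attend from the tokens of one view to the tokens of another view.

    Each query token is replaced by a weighted average of the context
    tokens, weighted by scaled dot-product similarity. Learned projections
    are omitted (identity), which is an assumption of this sketch.
    """
    dim = len(query_tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in query_tokens:
        # Similarity of this query to every token of the other view.
        weights = softmax([dot(query, key) / scale for key in context_tokens])
        # Weighted average of the context tokens (used as values).
        attended.append([
            sum(w * token[i] for w, token in zip(weights, context_tokens))
            for i in range(dim)
        ])
    return attended


if __name__ == "__main__":
    cc_tokens = [[1.0, 0.0], [0.0, 1.0]]               # hypothetical CC patch features
    mlo_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical MLO patch features
    for row in cross_view_attention(cc_tokens, mlo_tokens):
        print(row)
```

Since every query token can attend to every token of the other view, spatially distant regions in the two projections can be related in a single step. Swapping the view pair (CC with MLO of the same breast, or left with right) would give the ipsi-lateral and contra-lateral variants respectively.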
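The remainder of the paper is organized as follows: Section 2 reviews the main architectures for exam-level mammography analysis; Section 3 analyzes the architectures explored in the experiments; the dataset and the experimental setup are described in Sections 4 and 5, respectively; results and discussion are presented in Sections 6 and 7, respectively; and Section 8 draws brief conclusions.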
Abstract
The potential and promise of deep learning systems to provide an independent assessment and relieve radiologists’ burden in screening mammography have been recognized in several studies. However, the low cancer prevalence, the need to process high-resolution images, and the need to combine information from multiple views and scales still pose technical challenges. Multi-view architectures that combine information from the four mammographic views to produce an exam-level classification score are a promising approach to the automated processing of screening mammography. However, training such architectures from exam-level labels, without relying on pixel-level supervision, requires very large datasets and may result in suboptimal accuracy. Emerging architectures such as Visual Transformers (ViT) and graph-based architectures can potentially integrate ipsi-lateral and contra-lateral breast views better than traditional convolutional neural networks, thanks to their stronger ability to model long-range dependencies. In this paper, we extensively evaluate novel transformer-based and graph-based architectures against state-of-the-art multi-view convolutional neural networks, trained in a weakly-supervised setting on a middle-scale dataset, both in terms of performance and interpretability. Extensive experiments on the CSAW dataset suggest that, while transformer-based architectures outperform other architectures, different inductive biases lead to complementary strengths and weaknesses, as each architecture is sensitive to different signs and mammographic features. Hence, an ensemble of different architectures should be preferred over a winner-takes-all approach to achieve more accurate and robust results. Overall, the findings highlight the potential of a wide range of multi-view architectures for breast cancer classification, even in datasets of relatively modest size, although the detection o