Ultra-Fast Digital Tomosynthesis Reconstruction Using General-Purpose GPU Programming for Image-Guided Radiation Therapy

Authors: Park Justin C; Park Sung Ho; Kim Jin Sung*; Han Youngyih; Cho Min Kook; Kim Ho Kyung; Liu Zhaowei; Jiang Steve B; Song Bongyong; Song William Y
Source: Technology in Cancer Research and Treatment, 2011, 10(4): 295-306.
DOI: 10.7785/tcrt.2012.500206

Abstract

The purpose of this work is to demonstrate an ultra-fast reconstruction technique for digital tomosynthesis (DTS) imaging, based on the algorithm proposed by Feldkamp, Davis, and Kress (FDK), using a standard general-purpose graphics processing unit (GPGPU) programming interface. To this end, the FDK-based DTS algorithm was programmed in-house in C for 1) a GPU card and 2) a central processing unit (CPU). The GPU card had 480 processing cores (2 x 240, dual chip) with a 1,242 MHz processing clock and 1,792 MB of memory. The CPU system had a 2.68 GHz clock and 12.0 GB of DDR3 RAM, running a 64-bit OS. The performance of the proposed algorithm was tested on twenty-five patient cases (5 lung, 5 liver, 10 prostate, and 5 head-and-neck) scanned in either full-fan or half-fan mode on our cone-beam computed tomography (CBCT) system. For the full-fan scans, the projections from 157.5° to 202.5° (a 45° scan) were used to reconstruct coronal DTS slices, whereas for the half-fan scans, the projections from both 157.5° to 202.5° and 337.5° to 22.5° (a 2 x 45° scan) were used to reconstruct coronal DTS slices with a larger field of view (FOV). The 45° scan contained approximately 80 projections in full-fan mode, and the 2 x 45° scan contained approximately 160 projections in half-fan mode, each projection having 1024 x 768 pixels at 32-bit precision. Absolute pixel-value differences, profiles, and contrast-to-noise ratios (CNR) were calculated to compare and evaluate the images reconstructed with the GPU- and CPU-based implementations. The dependence of reconstruction time on volume size was also tested with volumes of 512 x 512 x 16, 32, 64, 128, and 256 slices. In the end, the GPU-based implementation required at most 1.3 and 2.5 seconds to complete the full reconstruction of a 512 x 512 x 256 volume for the full-fan and half-fan modes, respectively.
In turn, this means that our implementation can process > 13 projections per second (pps) and > 18 pps for the full-fan and half-fan modes, respectively. Since a commercial CBCT system nominally acquires 11 pps (at 1 gantry revolution per minute), our GPU-based implementation is fast enough to handle the incoming projection data as they are acquired and to reconstruct the entire volume immediately after the scan completes. In addition, increasing the number of slices (and hence the volume) to be reconstructed from 16 to 256 produced only minimal increases in reconstruction time for the GPU-based implementation: from 0.73 to 1.27 seconds for the full-fan mode and from 1.42 to 2.47 seconds for the half-fan mode. This resulted in a speed improvement of up to 87 times over the CPU-based implementation (for the 256-slice case), with visually identical images, small pixel-value discrepancies (< 6.3%), and small CNR differences (< 2.3%). With this achievement, we have shown that the time allocated to DTS image reconstruction is virtually eliminated and that clinical implementation of this approach has become quite appealing. In addition, at this speed, further image processing and real-time applications that were previously prohibited by time restrictions can now be explored.

  • Publication date: 2011-8