GPU technology is the hope for near real-time Monte Carlo dose calculations Xun Jia, X. George Xu, and Colin G. Orton Citation: Medical Physics 42, 1474 (2015); doi: 10.1118/1.4903901 Published by the American Association of Physicists in Medicine

POINT/COUNTERPOINT Suggestions for topics suitable for these Point/Counterpoint debates should be addressed to Colin G. Orton, Professor Emeritus, Wayne State University, Detroit. Persons participating in Point/Counterpoint discussions are selected for their knowledge and communicative skill. Their positions for or against a proposition may or may not reflect their personal opinions or the positions of their employers.

Xun Jia, Ph.D. Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, Texas 75390 (Tel: 214-648-3224)

X. George Xu, Ph.D. Nuclear Engineering Program, Rensselaer Polytechnic Institute, Troy, New York 12180 (Tel: 518-276-4014)

Colin G. Orton, Ph.D., Moderator (Received 15 November 2014; accepted for publication 20 November 2014; published 11 March 2015) [http://dx.doi.org/10.1118/1.4903901]

OVERVIEW

Monte Carlo (MC) dose calculations are recognized as the most accurate method for radiotherapy treatment planning, but because of the excessive computation time they require, they cannot presently be used for near real-time dose calculations. Currently, the most common way to accelerate MC dose calculations is to use clusters of central processing units (CPUs), but some believe that the future of near real-time MC dose calculations lies not with clusters of CPUs but with graphics processing unit (GPU) technology. This is the claim debated in this month’s Point/Counterpoint.

Arguing for the Proposition is Xun Jia, Ph.D. Dr. Jia received his Master’s degree in Applied Mathematics and his Ph.D. degree in Physics, both from UCLA. He is currently an Assistant Professor in the Department of Radiation Oncology, University of Texas Southwestern Medical Center. Dr. Jia’s research focuses on GPU-based high-performance computing for medical physics and medical imaging. He has developed several Monte Carlo packages to improve efficiency for photon, electron, and proton transport. Dr. Jia’s research has been supported by government and industrial grants, and he has published 60 peer-reviewed papers. He is currently a section editor of the Journal of Applied Clinical Medical Physics.

Med. Phys. 42 (4), April 2015

Arguing against the Proposition is X. George Xu, Ph.D. Dr. Xu obtained his Ph.D. in Nuclear Engineering from Texas A&M University, College Station, TX, and for the past 20 years he has been on the faculty of Rensselaer Polytechnic Institute, Troy, NY, where he currently holds the Edward E. Hood Endowed Chair of Engineering. Dr. Xu’s research has centered on applications of Monte Carlo methods to problems in radiation protection, imaging, and radiation therapy. He has been continuously funded by the NIH over the past ten years, including an R01 grant to develop a new Monte Carlo code for heterogeneous computing involving GPUs and coprocessors. He is the author of more than 150 journal papers and book chapters and of 270 conference abstracts. Dr. Xu is a Fellow of the American Association of Physicists in Medicine, the Health Physics Society, and the American Nuclear Society. In 2014, he was re-elected to a 6-yr term as a council member of the National Council on Radiation Protection and Measurements.

FOR THE PROPOSITION: Xun Jia, Ph.D.

Opening Statement

Clinical applications of MC dose calculations have been limited by the long computation time needed to achieve a sufficient level of precision. Over the years, great efforts have been devoted to accelerating MC simulations. Recently, with the success of GPU-based high-performance computing,1,2 particularly for MC simulations, near real-time (e.g., a few seconds or less) dose calculation is becoming feasible. Achieving this would not only facilitate the routine clinical use of MC dose calculation but also enable novel applications that advance radiotherapy practice, such as MC-based inverse treatment planning. To date, the computation time for a typical photon plan has been reduced to less than a minute with ∼1% uncertainty using only one GPU, and the speed can be further boosted with multiple GPUs by a factor proportional to the number of GPUs. Computation times as low as seconds to tens of seconds have also been reported for other applications.3,4 Notably, the group at UT Southwestern5 has developed a GPU application that visualizes an MC-reconstructed dose delivery process in almost real time during beam delivery, with a refresh frequency of >10 Hz. These achievements clearly demonstrate the potential of near real-time MC dose calculations.

Besides their speed, GPUs have other features that favor clinical applications. First, GPUs cost orders of magnitude less than a conventional high-performance computing infrastructure with similar processing power. Second, GPUs are locally hosted and managed. This is particularly important for near real-time applications, since data-transfer and job-scheduling times cannot be neglected if the computing facility is remote and shared by many users. Patient privacy may also be a concern when medical data are transferred to a remote facility.

Of course, the disadvantages of using GPUs for MC cannot be ignored. Because the GPU is a new platform, codes must be redeveloped. However, the burden of initial code development has been overcome to a large extent, and several packages have been successfully built.
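The claim that speed scales with the number of GPUs rests on the statistical independence of MC particle histories: the N histories needed for a target uncertainty can be split into batches that run on separate devices with no communication until their dose tallies are summed, and the relative uncertainty of the summed result falls roughly as 1/√N however the batches were distributed. The Python sketch below illustrates this under deliberately simplified, hypothetical assumptions (a 1D slab, a single attenuation coefficient `MU_PER_CM`, local energy deposition, no electron transport); the names `simulate_batch`, `combine`, and `batch_uncertainty` are ours and do not come from any of the cited codes. The batches run sequentially here, each standing in for one GPU.

```python
import math
import random

# Toy model with hypothetical parameters (not taken from any cited code):
# photons enter a 1D slab at depth 0, travel an exponentially distributed
# distance to their first interaction, and deposit one unit of energy in the
# voxel where that interaction occurs. Electron transport is ignored.
MU_PER_CM = 0.07  # hypothetical attenuation coefficient (1/cm)
N_VOXELS = 20     # number of depth voxels
VOXEL_CM = 1.0    # voxel thickness (cm)


def simulate_batch(n_histories, seed):
    """Run one independent batch (one 'GPU') and return its private dose tally."""
    rng = random.Random(seed)  # independent random stream for this batch
    tally = [0.0] * N_VOXELS
    for _ in range(n_histories):
        depth = rng.expovariate(MU_PER_CM)  # distance to first interaction
        voxel = int(depth / VOXEL_CM)
        if voxel < N_VOXELS:  # photons that leave the slab score nothing
            tally[voxel] += 1.0
    return tally


def combine(tallies):
    """Sum private tallies voxel by voxel: the only step that needs communication."""
    return [sum(values) for values in zip(*tallies)]


def batch_uncertainty(tallies, voxel):
    """Relative standard error of the summed dose in one voxel, from batch spread."""
    xs = [t[voxel] for t in tallies]
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(variance / n) / mean


if __name__ == "__main__":
    n_batches = 10  # e.g., one batch per device
    for per_batch in (1_000, 16_000):
        tallies = [simulate_batch(per_batch, seed=s) for s in range(n_batches)]
        dose = combine(tallies)
        print(f"{n_batches * per_batch} histories: surface dose {dose[0]:.0f}, "
              f"rel. uncertainty {batch_uncertainty(tallies, 0):.3f}")
```

Going from 1,000 to 16,000 histories per batch should cut the relative uncertainty by roughly a factor of four (√16), up to statistical noise in the ten-batch estimate. Because each batch keeps a private tally, no two workers ever write to the same memory location until the final sum; whether linear scaling is reached in practice depends on overheads such as data transfer and that final reduction, which this sketch does not model.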
Efforts have also been initiated to write MC packages in OpenCL to increase portability.6 There are also technical issues that hinder computational efficiency, e.g., thread divergence and memory write conflicts, but many solutions exist to remove or alleviate them.4,7

I would also like to mention a strong competitor of the GPU: the Intel many integrated core (MIC) processor. What makes it particularly attractive is its x86 compatibility, which allows existing CPU codes to run with minor modification. However, just as with GPUs, substantial effort is needed to achieve optimal performance.8 Simply running an existing code may not yield high acceleration, because parallel-computing-specific issues such as memory access and vectorization were not sufficiently considered in conventional CPU codes. As of today, there have been only limited studies of MC dose calculations on MIC processors. While the MIC holds the potential to improve efficiency, much research is still needed.

In conclusion, GPU technology has the capability of substantially accelerating MC simulations. Its advantages and the extensive research efforts behind it demonstrate the hope for near real-time dose calculations.

AGAINST THE PROPOSITION: X. George Xu, Ph.D.

Opening Statement

Since the invention of computers in the 1940s, MC codes have been developed for nuclear engineering, high-energy physics, and, recently, medical physics applications. However, most radiation treatment planning is currently done using dosimetry algorithms that are extremely fast but only “approximately” correct.9 Given the lasting interest in accelerating MC methods, the recent hype related to the GPU is not surprising. Originally marketed by NVIDIA as household devices, GPU-based game consoles offered amazingly fast graphics at an affordable price. It did not take long, however, for the scientific community to realize that these desktop toys were actually parallel computers. As summarized in two review papers,1,2 GPU adopters from the medical physics community wasted no time in reporting overwhelmingly positive experiences, including a dozen studies that focused specifically on MC dosimetry. Impressive but inconsistent “speedup factors,” ranging from single digits to several hundred, were reported within months, sometimes by the same group. It has become a cliché to highlight how fast an MC-based dose calculation can be done with a GPU. Such results have indeed attracted a lot of attention from medical physicists, who are notoriously busy and eager for expediency.

There are two strong indications that GPU technology is only hype and not the hope for near real-time, fully MC dose calculations. First, we have not seen any convincing evidence that the GPU is indeed better than traditional solutions for running MC dose calculations. Both of the above review papers1,2 enjoyed citing the rapidly increasing number of GPU-related journal articles, which only reinforces the concept of a “hype cycle.” Furthermore, the authors of the GPU-accelerated MC studies obscure the issue by omitting details of how they compared GPU performance with that of traditional CPUs. CPU-based clusters are currently so cheap that one can assemble a desk-side 32-core cluster for about US$3000, the cost of a high-end CPU/GPU system.
Using software optimization schemes and hyperthreading, such a CPU cluster may achieve a speedup similar to the best reported for GPUs, without the painful process of rewriting the MC code for the GPU/compute unified device architecture (GPU/CUDA) environment. Yet few GPU enthusiasts optimized their CPU code to make fair performance comparisons. It has been observed that the lack of “fair comparison” measures is responsible for exaggerated claims of GPU performance.10

Second, competing technologies are mostly ignored by GPU adopters. Intel’s Xeon Phi coprocessor, for example, which comes with 60 embedded Pentium cores, is capable of achieving a level of parallelism similar to that of GPUs.11–13 Adopting the coprocessor is relatively easy, and a large number of them are, in fact, used in Tianhe-2, the world’s number-1 supercomputer. The “heterogeneous computing” era has just begun, and it is uncertain which hardware (and software) technology will dominate the market.14 The excitement brought by the GPU has reignited our interest in achieving real-time MC dose calculations, and one should take full advantage of the research opportunities.15 However, inflated expectations can be counterproductive, especially when investing in a single technology that may be obsolete in ten years.
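The fairness argument above is essentially arithmetic: a speedup measured against one single-threaded CPU core shrinks once the baseline is a multicore CPU running a parallelized, optimized version of the same code. The sketch below assumes the CPU code scales linearly with core count up to a single parallel-efficiency factor (a simplification that folds serial fractions and memory effects into one number); the function name `fair_speedup` and all numbers in the example are hypothetical and are not taken from any of the cited studies.

```python
def fair_speedup(reported_speedup: float, cpu_cores: int,
                 parallel_efficiency: float) -> float:
    """Rescale a GPU speedup quoted against one CPU core so that it is quoted
    against a multicore CPU running a parallelized version of the same code.

    Assumes the CPU code scales as cpu_cores * parallel_efficiency, where
    parallel_efficiency in (0, 1] lumps together serial fractions, memory
    contention, and threading overhead.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    if not 0.0 < parallel_efficiency <= 1.0:
        raise ValueError("parallel_efficiency must be in (0, 1]")
    multicore_speedup = cpu_cores * parallel_efficiency
    return reported_speedup / multicore_speedup


if __name__ == "__main__":
    # Hypothetical numbers: a GPU code reported as 120x faster than one CPU core,
    # compared instead with a 32-core desk-side cluster at 75% parallel efficiency.
    print(fair_speedup(120.0, 32, 0.75))  # → 5.0
```

Under these assumed numbers, a headline “120x” becomes a 5x advantage over the multicore system, before any labor cost of porting the code is taken into account.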


Rebuttal: Xun Jia, Ph.D.

I agree that reported GPU-acceleration factors vary because of different degrees of software/hardware utilization and optimization. However, it is quite difficult, if not impossible, to conduct an absolutely fair comparison. For example, one software aspect treats GPUs unfairly: software optimization schemes, such as the variance reduction techniques widely employed in CPU-based MC packages, have barely been explored for GPUs. The deterministic nature of such algorithms is expected to be particularly favorable for the GPU’s single-instruction, multiple-thread structure. Yet it is absolute computational efficiency, rather than performance relative to CPUs, that determines the feasibility of near real-time MC calculations. The fact that a single GPU can already compute dose in seconds strongly supports this feasibility.

Practicality should also be considered. While a low-end cluster of 4–8 computers may offer high speed, GPU-enabled computers are more advantageous in a clinical environment in terms of energy efficiency, ease of management, etc. The utilization of GPUs in scientific computing is absolutely more than hype. Among the world’s top 500 supercomputers, 46 use GPU-based coprocessors, compared with only 17 that use MIC coprocessors. A few major radiotherapy vendors, e.g., RaySearch and Elekta, already employ GPUs in their products.

I agree that multiple options are available to substantially accelerate MC in this era of booming technology. Intel MIC is a great example. Nonetheless, it too may be hype that emphasizes only the ease of programming based on existing CPU codes while hiding the effort required for performance tuning. There is probably no single technology that is undoubtedly better than the others. However, based on an overall consideration of the GPU’s advantages and the developments so far, I believe that GPU technology is the hope for near real-time MC dose calculations.

Rebuttal: X. George Xu, Ph.D.

I agree with Dr. Jia that the capability of real-time MC dose calculations is within reach, owing largely to the innovative technology and marketing strategies of NVIDIA. The greatest roadblock to the GPU is that translating legacy MC codes into the new CUDA programming environment is prohibitively expensive. The GPU also faces tough technological challenges, including limited memory and data bandwidth.14 Given the steep investment and market risk, it would be costly and unwise for everyone to jump onto the GPU bandwagon.

For CPU enthusiasts, multithreading techniques such as OpenMP and Pthreads are readily available for parallel computing. Intel CPUs come with hyperthreading for concurrent execution, and various compiler options can be used for optimization. As a competing architecture, Intel’s MIC is much easier to adopt. To avoid an “unfair comparison” between GPU and CPU,11 one should consider the above-mentioned software optimization techniques and pick a multicore (instead of a single-core) CPU at a price similar to that of the GPU implementation. Comparative studies should also consider software-related labor expenses. When we recently compared the performance of ARCHER, an MC dosimetry code developed from scratch by my Ph.D. students,11–13 on the CPU, GPU, and MIC platforms, we found that the GPU’s advantages as a dose engine are less dramatic than some of those reported in the literature.

All things considered, traditional CPU clusters and MIC remain serious competitors to GPUs when energy efficiency is not the priority. In the next five years, all these technologies are expected to evolve rapidly. The potential waste of capital and human resources due to hype and misleading information should be avoided. To this end, the peer-review processes for journal publications and grant applications should emphasize balanced GPU studies that offer the best methodologies and practices to the medical physics community.

1. X. Jia, P. Ziegenhein, and S. B. Jiang, “GPU-based high-performance computing for radiation therapy,” Phys. Med. Biol. 59, R151–R182 (2014).
2. G. Pratx and L. Xing, “GPU computing in medical physics: A review,” Med. Phys. 38, 2685–2697 (2011).
3. S. Hissoiny, M. D’Amours, B. Ozell, P. Despres, and L. Beaulieu, “Sub-second high dose rate brachytherapy Monte Carlo dose calculations with bGPUMCD,” Med. Phys. 39, 4559–4567 (2012).
4. X. Jia, J. Schuemann, H. Paganetti, and S. B. Jiang, “GPU-based fast Monte Carlo dose calculation for proton therapy,” Phys. Med. Biol. 57, 7783–7797 (2012).
5. F. Shi, X. Gu, Y. Graves, S. Jiang, and X. Jia, “A real-time virtual delivery system for photon radiotherapy delivery monitoring,” Med. Phys. 41(6), 432 (2014).
6. Khronos OpenCL Working Group, “The open standard for parallel programming of heterogeneous systems” (2013), available at: https://www.khronos.org/opencl/.
7. S. Hissoiny, B. Ozell, H. Bouchard, and P. Despres, “GPUMCD: A new GPU-oriented Monte Carlo dose calculation platform,” Med. Phys. 38, 754–764 (2011).
8. D. Mackay, “Optimization and performance tuning for Intel® Xeon Phi™ coprocessors–Part 1: Optimization essentials” (2012), available at: https://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization.
9. D. W. O. Rogers, “Fifty years of Monte Carlo simulations for medical physics,” Phys. Med. Biol. 51, R287–R301 (2006).
10. V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, “Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU,” in Proceedings of the 37th Annual International Symposium on Computer Architecture (ACM, New York, NY, 2010), Vol. 38(3), pp. 451–460.
11. T. Liu, X. G. Xu, and C. D. Carothers, “Comparison of two accelerators for Monte Carlo radiation transport calculations, NVIDIA Tesla M2090 GPU and Intel Xeon Phi 3120 coprocessor: A case study for x-ray CT imaging dose calculation,” in Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo (SNA + MC 2013), Paris, France, 27–31 October (EDP Sciences, Les Ulis, France, 2014).
12. L. Su, Y. M. Yang, B. Bednarz, E. Sterpin, X. Du, T. Liu, W. Ji, and X. G. Xu, “ARCHERRT—A photon-electron coupled Monte Carlo dose computing engine for GPU: Software development and application to helical tomotherapy,” Med. Phys. 41, 071709 (13pp.) (2014).
13. X. G. Xu, T. Liu, L. Su, X. Du, M. J. Riblett, W. Ji, D. Gu, C. D. Carothers, M. S. Shephard, F. B. Brown, M. K. Kalra, and B. Liu, “, a new Monte Carlo software tool for emerging heterogeneous computing environments,” in Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo (SNA + MC 2013), Paris, France, 27–31 October (EDP Sciences, Les Ulis, France, 2014).
14. B. R. Gaster, L. Howes, D. R. Kaeli, P. Mistry, and D. Schaa, Heterogeneous Computing with OpenCL, 2nd ed. (Elsevier, Inc., Waltham, MA, 2013).
15. T. Friedman, “Do believe the hype,” New York Times, 2 November 2010, available at: http://www.nytimes.com/2010/11/03/opinion/03friedman.html?_r=0.

counterpoint. GPU technology is the hope for near real-time Monte Carlo dose calculations.

counterpoint. GPU technology is the hope for near real-time Monte Carlo dose calculations. - PDF Download Free
1MB Sizes 0 Downloads 9 Views