Source-level Translation of OpenMP Device Constructs to CUDA and Runtime Optimization Methods

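This paper proposes a compiler that translates C source code written with OpenMP 4.5 device constructs into equivalent CUDA source code, together with a runtime system that supports it. We first examine the OpenMP execution model, memory model, and synchronization process, and then describe the source-level translation method. We also introduce runtime system optimizations designed to improve performance, such as a buddy allocator and UDTE. The experiments use the SPEC-ACCEL 1.2 benchmark suite. Compared with the baseline, gcc7, the framework achieves a performance improvement of more than 6x, and of more than 2x even when mriq is excluded. We expect that further compiler and runtime optimization techniques can be developed on top of the framework presented in this paper.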
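The abstract does not show what the translator emits, so the following is a rough illustration only. It is a toy, template-based emitter in C for the simplest case: a `target teams distribute parallel for` loop with `map` clauses.

The following are our own hypothetical choices, not the paper's:
- the `MapItem` and `TargetLoop` descriptions;
- the kernel name;
- the fixed 256-thread block size;
- the assumption that every mapped array has exactly `n` elements.

The emitter prints CUDA source in which the loop becomes a grid-stride `__global__` kernel. The `map` clauses become `cudaMalloc`, `cudaMemcpy` and `cudaFree` calls around the kernel launch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one array named in a map clause.
   Assumption: the array length equals the loop trip count. */
typedef struct {
    const char *name;  /* host array name, e.g. "a" */
    const char *type;  /* element type, e.g. "float" */
    int to_dev;        /* 1 for map(to:) or map(tofrom:) */
    int from_dev;      /* 1 for map(from:) or map(tofrom:) */
} MapItem;

/* Hypothetical description of
     #pragma omp target teams distribute parallel for map(...)
     for (int iv = 0; iv < n; iv++) body                           */
typedef struct {
    const char *kernel;  /* name of the generated __global__ function */
    const char *iv;      /* loop induction variable */
    const char *n;       /* trip count: an int variable in scope */
    const char *body;    /* loop body as a single C statement */
    const MapItem *maps;
    int nmaps;
} TargetLoop;

/* Kernel: the iteration space is spread over all threads of the grid
   with a grid-stride loop, so any launch size covers all n iterations. */
static void emit_kernel(FILE *out, const TargetLoop *t) {
    assert(out != NULL && t != NULL);
    /* The induction variable must not shadow the trip-count parameter. */
    assert(strcmp(t->iv, t->n) != 0);
    fprintf(out, "__global__ void %s(", t->kernel);
    for (int i = 0; i < t->nmaps; i++)
        fprintf(out, "%s *%s, ", t->maps[i].type, t->maps[i].name);
    fprintf(out, "int %s) {\n", t->n);
    fprintf(out, "  for (int %s = blockIdx.x * blockDim.x + threadIdx.x;\n", t->iv);
    fprintf(out, "       %s < %s; %s += gridDim.x * blockDim.x) {\n",
            t->iv, t->n, t->iv);
    fprintf(out, "    %s\n  }\n}\n", t->body);
}

/* Host side: device buffers are allocated, map(to:) data is copied in,
   the kernel is launched, and map(from:) data is copied back. All calls
   use the default stream, so the copy-back starts after the kernel ends. */
static void emit_host(FILE *out, const TargetLoop *t) {
    assert(out != NULL && t != NULL);
    for (int i = 0; i < t->nmaps; i++) {
        const MapItem *m = &t->maps[i];
        fprintf(out, "%s *d_%s;\n", m->type, m->name);
        fprintf(out, "cudaMalloc((void **)&d_%s, %s * sizeof(%s));\n",
                m->name, t->n, m->type);
        if (m->to_dev)
            fprintf(out, "cudaMemcpy(d_%s, %s, %s * sizeof(%s), "
                         "cudaMemcpyHostToDevice);\n",
                    m->name, m->name, t->n, m->type);
    }
    fprintf(out, "%s<<<(%s + 255) / 256, 256>>>(", t->kernel, t->n);
    for (int i = 0; i < t->nmaps; i++)
        fprintf(out, "d_%s, ", t->maps[i].name);
    fprintf(out, "%s);\n", t->n);
    for (int i = 0; i < t->nmaps; i++) {
        const MapItem *m = &t->maps[i];
        if (m->from_dev)
            fprintf(out, "cudaMemcpy(%s, d_%s, %s * sizeof(%s), "
                         "cudaMemcpyDeviceToHost);\n",
                    m->name, m->name, t->n, m->type);
        fprintf(out, "cudaFree(d_%s);\n", m->name);
    }
}

/* Usage: vector add c[i] = a[i] + b[i], with a and b mapped to the
   device and c mapped back to the host. */
static void emit_vadd_example(FILE *out) {
    static const MapItem maps[] = {
        {"a", "float", 1, 0}, {"b", "float", 1, 0}, {"c", "float", 0, 1}};
    const TargetLoop t = {"vadd_kernel", "i", "n", "c[i] = a[i] + b[i];",
                          maps, 3};
    emit_kernel(out, &t);
    emit_host(out, &t);
}
```

`emit_vadd_example(stdout)` prints a kernel with the signature `__global__ void vadd_kernel(float *a, float *b, float *c, int n)`, followed by the host allocation, copy, launch and copy-back code.

A grid-stride loop keeps the generated kernel correct for any grid size. A real translator would also have to handle several things this sketch ignores:
- `num_teams` and `thread_limit`;
- scalar and `firstprivate` data;
- reductions;
- nested parallelism.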

This paper presents an OpenMP framework for GPU offloading. The framework consists of a compiler, which converts C programs written with the OpenMP 4.5 device constructs into CUDA programs, and a runtime system that supports it. We first examine the execution model, memory model, and synchronization process of OpenMP, and explain how to translate them at the source level. We also apply runtime optimization techniques, such as a buddy allocator and UDTE, to improve execution performance. On the SPEC-ACCEL 1.2 benchmark suite, the framework performs up to 6 times better than the gcc7 framework. We expect that additional runtime and compiler optimization techniques can be built on the framework presented in this paper.
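The abstract lists a buddy allocator among the runtime optimizations but does not describe its design. Below is a textbook binary buddy allocator in C, offered only as a sketch under our own assumptions:
- It hands out byte offsets into one fixed pool. A GPU runtime might obtain that pool with a single up-front `cudaMalloc`, instead of calling `cudaMalloc` and `cudaFree` for every mapped buffer.
- The pool size (64 KiB) and minimum block size (256 bytes) are hypothetical constants.
- The names `Buddy`, `buddy_init`, `buddy_alloc` and `buddy_free` are illustrative, not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_ORDER 8   /* smallest block: 256 bytes (assumed) */
#define MAX_ORDER 16  /* whole pool: 64 KiB (assumed) */
#define LEVELS (MAX_ORDER - MIN_ORDER + 1)
#define NBLOCKS (1 << (MAX_ORDER - MIN_ORDER)) /* min-size blocks in pool */

/* Block indices count minimum-size blocks; level L means 256 << L bytes. */
typedef struct {
    int head[LEVELS];              /* free-list head per level, -1 if empty */
    int next[NBLOCKS], prev[NBLOCKS];
    signed char level[NBLOCKS];    /* level of the block starting here, -1 if none */
    unsigned char is_free[NBLOCKS];
} Buddy;

static void list_push(Buddy *b, int idx, int lv) {
    b->prev[idx] = -1;
    b->next[idx] = b->head[lv];
    if (b->head[lv] >= 0) b->prev[b->head[lv]] = idx;
    b->head[lv] = idx;
    b->level[idx] = (signed char)lv;
    b->is_free[idx] = 1;
}

static void list_remove(Buddy *b, int idx, int lv) {
    if (b->prev[idx] >= 0) b->next[b->prev[idx]] = b->next[idx];
    else b->head[lv] = b->next[idx];
    if (b->next[idx] >= 0) b->prev[b->next[idx]] = b->prev[idx];
    b->is_free[idx] = 0;
}

void buddy_init(Buddy *b) {
    for (int lv = 0; lv < LEVELS; lv++) b->head[lv] = -1;
    memset(b->level, -1, sizeof b->level);
    memset(b->is_free, 0, sizeof b->is_free);
    list_push(b, 0, LEVELS - 1);   /* the whole pool starts as one free block */
}

/* Returns a byte offset into the pool, or -1 if no block is large enough. */
long buddy_alloc(Buddy *b, size_t size) {
    int want = 0;                  /* smallest level whose block fits size */
    while (want < LEVELS && ((size_t)1 << (MIN_ORDER + want)) < size) want++;
    if (want == LEVELS) return -1;
    int lv = want;                 /* find the smallest non-empty free list */
    while (lv < LEVELS && b->head[lv] < 0) lv++;
    if (lv == LEVELS) return -1;
    int idx = b->head[lv];
    list_remove(b, idx, lv);
    while (lv > want) {            /* split: keep lower half, free upper buddy */
        lv--;
        list_push(b, idx + (1 << lv), lv);
    }
    b->level[idx] = (signed char)want;
    return (long)idx << MIN_ORDER;
}

void buddy_free(Buddy *b, long offset) {
    int idx = (int)(offset >> MIN_ORDER);
    int lv = b->level[idx];
    assert(lv >= 0 && !b->is_free[idx]);    /* catches double or bogus frees */
    while (lv < LEVELS - 1) {
        int buddy = idx ^ (1 << lv);        /* buddy differs in exactly one bit */
        if (!b->is_free[buddy] || b->level[buddy] != lv) break;
        list_remove(b, buddy, lv);          /* merge with the free buddy */
        b->level[buddy] = -1;
        b->level[idx] = -1;
        idx &= ~(1 << lv);                  /* merged block starts at lower half */
        lv++;
    }
    list_push(b, idx, lv);
}
```

Two 300-byte requests each round up to a 512-byte block, at offsets 0 and 512. After both are freed, the buddies coalesce level by level until the full 64 KiB pool is again a single free block. Finding a block's buddy takes one XOR, and both splitting and merging are bounded by the number of levels. This keeps allocation cheap compared with going to the driver each time.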

Keywords: OpenMP, device offloading, CUDA, source-level translation, runtime optimization methods