Anomaly Detection of Hadoop Log Data Using Moving Average and 3-Sigma

최근 빅데이터 처리를 위한 연구들이 활발히 진행 중이며, 관련된 다양한 제품들이 개발되고 있다. 이에 따라, 기존 환경에서는 처리가 어려웠던 대용량 로그 데이터의 저장 및 분석이 가능해졌다. 본 논문은 다수의 서버에서 빠르게 생성되는 대량의 로그 데이터를 Apache Hive에서분석할 수 있는 데이터 저장 구조를 제안한다. 그리고 저장된 로그 데이터로부터 특정 서버의 이상 유무를 판단하기 위해, 이동 평균 및 3-시그마 기반의 이상 탐지 기술을 설계 및 구현한다. 또한, 실험을 통해 로그 데이터의 급격한 증가폭을 나타내는 구간을 이상으로 판단하여, 제안한 이상 탐지 기술의 유효성을 보인다. 이 같은 결과를 볼 때, 본 연구는 하둡 기반으로 로그 데이터를 분석하여 이상치를 바르게 탐지할 수있는 우수한 결과라 사료된다.

In recent years, there have been many research efforts on Big Data, and many companies developed a variety of relevant products. Accordingly, we are able to store and analyze a large volume of log data, which have been difficult to be handled in the traditional computing environment. To handle a large volume of log data, which rapidly occur in multiple servers, in this paper we design a new data storage architecture to efficiently analyze those big log data through Apache Hive. We then design and implement anomaly detection methods, which identify abnormal status of servers from log data, based on moving average and 3-sigma techniques. We also show effectiveness of the proposed detection methods by demonstrating that our methods identifies anomalies correctly. These results show that our anomaly detection is an excellent approach for properly detecting anomalies from Hadoop log data.

키워드열기/닫기 버튼

빅데이터

이 키워드로 연구동향 분석 이 키워드로 논문 검색

아파치 하둡

이 키워드로 연구동향 분석 이 키워드로 논문 검색

아파치 하이브

이 키워드로 연구동향 분석 이 키워드로 논문 검색

로그 데이터

이 키워드로 연구동향 분석 이 키워드로 논문 검색

이상 탐지

이 키워드로 연구동향 분석 이 키워드로 논문 검색

Big Data, Apache Hadoop, Apache Hive, Log Data, Anomaly Detection

피인용 횟수

KCI 2회
FWCI (2023-07-26 기준) 0.57 열기/닫기 버튼
같은 출판연도, 주제분야, 논문 형태에 따라 인용을 측정하여 정규화한 인용지수입니다.

인용현황

KCI에서 이 논문을 인용한 논문의 수는 2건입니다. 열기/닫기 버튼

참고문헌(25) 열기/닫기 버튼 * 2023년 이후 발행 논문의 참고문헌은 현재 구축 중입니다.

오류신고