
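Two topic modeling techniques, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), were applied to papers published in the Journal of the Wind Engineering Institute of Korea, a journal specializing in wind engineering, and their suitability for extracting research topics was compared and evaluated. A correlation analysis method using the document-topic matrix was proposed to evaluate the similarity between topics. Applying it showed that LDA, which uses the joint probabilities of words, used more than twice as many words per topic as LSA, which extracts topics from the feature vectors of the document-word matrix, and therefore extracted more independent topics. Taken together, the journal's research topics took ‘building’ and ‘bridge’ as the ‘research object’, ‘wind speed’, ‘wind load’, and ‘vibration control’ as the ‘research purpose’, and ‘wind tunnel test’ and ‘numerical method’ as the ‘research method’. Future topic modeling should be improved to reflect how words are actually used, by defining a research topic as a structural combination of ‘research object’, ‘research purpose’, and ‘research method’.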
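The abstract does not spell out how the proposed correlation analysis is computed. As a minimal sketch, assuming it amounts to Pearson correlation between the topic columns of a document-topic matrix (documents as rows, topics as columns), topic similarity could be measured as follows. The function names `topic_correlation` and `mean_offdiag_abs`, and the use of the mean absolute off-diagonal correlation as an independence score, are illustrative assumptions rather than the authors' definitions.

```python
import numpy as np


def topic_correlation(doc_topic: np.ndarray) -> np.ndarray:
    """Pearson correlation between topics of a document-topic matrix.

    doc_topic has one row per document and one column per topic
    (e.g. LSA document coordinates or LDA topic proportions).
    Returns a k x k matrix whose entry (i, j) is the correlation of
    topic i and topic j across documents.
    """
    # rowvar=False: columns (topics) are the variables, rows (documents) the observations.
    # A topic column that is constant across all documents yields NaN correlations.
    return np.corrcoef(doc_topic, rowvar=False)


def mean_offdiag_abs(corr: np.ndarray) -> float:
    """Mean |r| over all pairs of distinct topics (assumed independence score).

    Values near 0 suggest the topics vary independently across documents;
    values near 1 suggest strongly overlapping topics.
    """
    k = corr.shape[0]
    off_diagonal = ~np.eye(k, dtype=bool)  # mask out each topic's self-correlation
    return float(np.abs(corr[off_diagonal]).mean())


if __name__ == "__main__":
    # Toy document-topic matrix (4 documents x 3 topics); values are made up.
    doc_topic = np.array([
        [0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
        [0.1, 0.2, 0.7],
        [0.2, 0.1, 0.7],
    ])
    corr = topic_correlation(doc_topic)
    print(np.round(corr, 2))
    print("mean |r| between distinct topics:", round(mean_offdiag_abs(corr), 3))
```

Under this reading, comparing the score for an LSA document-topic matrix against the score for an LDA one on the same corpus would indicate which method yields less correlated, i.e. more independent, topics.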


This study compared and evaluated the suitability of two topic modeling techniques, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), for extracting research topics from papers published in the Journal of the Wind Engineering Institute of Korea. To evaluate the similarity between the extracted topics, a correlation analysis method based on the document-topic matrix was proposed. LDA, which uses the joint probabilities of words, employed more than twice as many words to compose each topic as LSA, which extracts topics from the feature vectors of the document-word matrix; as a result, the topics extracted by LDA were more independent than those extracted by LSA. In summary, the journal's studies took ‘building’ and ‘bridge’ as the ‘research object’, addressed ‘wind speed’, ‘wind load’, and ‘vibration control’ as the ‘research purpose’, and used ‘wind tunnel test’ or ‘numerical method’ as the ‘research method’. It is concluded that topic modeling should be improved to reflect how words are actually used, by defining a research topic as a structural combination of ‘research object’, ‘research purpose’, and ‘research method’.
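For LSA, topics are the feature vectors of the document-word matrix. A minimal sketch, assuming raw term counts and a plain truncated singular value decomposition (the paper's weighting and preprocessing are not given in the abstract), is shown below. The helper names `lsa` and `top_words` are hypothetical, and the toy counts and vocabulary are invented for illustration, not taken from the journal corpus.

```python
import numpy as np


def lsa(doc_word: np.ndarray, k: int):
    """Latent Semantic Analysis by truncated SVD of a document-word matrix.

    doc_word: n_docs x n_words matrix (raw counts here; weighting is an assumption).
    Returns (doc_topic, topic_word):
      doc_topic  -- n_docs x k, document coordinates U_k * S_k
      topic_word -- k x n_words, the top-k right singular vectors (feature vectors)
    """
    u, s, vt = np.linalg.svd(doc_word, full_matrices=False)
    doc_topic = u[:, :k] * s[:k]  # scale each column of U_k by its singular value
    topic_word = vt[:k, :]
    return doc_topic, topic_word


def top_words(topic_word: np.ndarray, vocab: list, n: int = 3) -> list:
    """Return the n words with the largest absolute loading in each topic."""
    # SVD signs are arbitrary, so rank by magnitude rather than by signed value.
    return [[vocab[j] for j in np.argsort(-np.abs(row))[:n]] for row in topic_word]


if __name__ == "__main__":
    # Toy counts (4 documents x 6 words), invented for illustration only.
    vocab = ["building", "bridge", "wind load",
             "wind tunnel test", "numerical method", "vibration control"]
    doc_word = np.array([
        [4, 3, 1, 0, 0, 0],
        [3, 4, 1, 0, 0, 0],
        [0, 0, 0, 1, 2, 1],
        [0, 0, 0, 1, 1, 2],
    ], dtype=float)
    doc_topic, topic_word = lsa(doc_word, k=2)
    for i, words in enumerate(top_words(topic_word, vocab, n=2)):
        print(f"topic {i}: {words}")
```

The returned `doc_topic` is the kind of document-topic matrix a correlation-based topic comparison would take as input. LDA is not shown here; it instead estimates a probability distribution over words for each topic.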