
This research evaluates the effectiveness of bidirectional language models, particularly those based on the BERT architecture, in distinguishing human-authored text from machine-generated text in Korean. Through an extensive empirical analysis on a newly established benchmark, KoMGTDetect-Bench, we explore the linguistic attributes that differentiate machine-generated text from human-authored text, such as vocabulary usage, sentence structure, and syntax. Our findings reveal that BERT-based models excel at detecting subtle nuances and irregularities in text, and we attribute this success to the deep semantic and syntactic understanding enabled by their bidirectional, context-aware training strategy. This study also examines the performance of these models across linguistic phenomena specific to Korean, highlighting their strengths and limitations.