This study aimed to identify the machine learning model best suited to predicting turbidity from water temperature, pH, electrical conductivity (EC), and dissolved oxygen (DO) data collected from non-point source pollution monitoring networks, for use when turbidity records are missing. Three algorithms were trained: K-NN, SVM, and Decision Tree. To assess the sensitivity of each algorithm to the scale of the monitoring data, both raw and normalized data sets were used. Hyperparameters were tuned to find the values that gave each algorithm its best performance, and K-fold cross-validation was employed to reduce the risk of overfitting. After tuning, the 10 models with the highest NSE were evaluated on separate test data that had not been used in the tuning process, which allowed model performance to be further validated using NSE, MSE, RMSE, and MAE. The results indicated that the Decision Tree algorithm achieved the highest prediction accuracy, followed by SVM and K-NN. Decision Tree was particularly well suited to accurate turbidity prediction from the other water quality monitoring data. These results suggest that machine learning techniques can be used effectively to predict a water quality parameter when its records are partially missing or erroneous.
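As an illustration of the evaluation metrics named above, the following is a minimal sketch using their standard definitions, with NSE taken as the Nash-Sutcliffe efficiency. The function name `evaluate` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluate(observed, predicted):
    """Return NSE, MSE, RMSE, and MAE for paired observed/predicted values.

    Hypothetical helper for illustration; not the study's own code.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n

    # Sum of squared and absolute prediction errors.
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))

    # Spread of the observations around their own mean (NSE denominator).
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when observations are constant")

    mse = sq_err / n
    return {
        "NSE": 1.0 - sq_err / spread,  # 1.0 means a perfect fit
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": abs_err / n,
    }


# Made-up turbidity values, for demonstration only.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

With these made-up values, one prediction is off by 1 and the other three are exact, giving an NSE of 0.8, an MSE of 0.25, an RMSE of 0.5, and an MAE of 0.25. An NSE of 1 indicates a perfect match, and an NSE of 0 means the model does no better than predicting the mean of the observations.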