Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends

Jina Jeong, Eungyu Park, Weon Shik Han, Kueyoung Kim, Sungwook Choung, Il Moon Chung

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Hydrogeological datasets often contain substantial deviations that need to be inspected. In the present study, three outlier identification methods are proposed: the three sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD). Each takes advantage of the ensemble regression method to account for the non-Gaussian characteristics of groundwater data. For validation, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well at a 30% outlier level. When the methods are applied to real groundwater data, the IQR method is found to outperform the other two methods in identifying outliers. However, the IQR method tends to identify an excessive number of false outliers, a limitation that may be overcome by applying it jointly with other methods (for example, the 3σ rule and the MAD method). The proposed methods can also be applied as potential tools for detecting future anomalies by training the models on currently available data.
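The abstract describes flagging outliers as deviations from an ensemble-estimated long-term trend. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. The trend is approximated by the pointwise median of bootstrap polynomial fits, which is a hypothetical stand-in for the paper's ensemble regression. The three rules are then applied to the residuals with conventional thresholds: k = 3 for 3σ, Tukey fences with k = 1.5 for IQR, and a 1.4826-scaled MAD with k = 3. The joint rule `IQR & (3σ | MAD)` is only one illustrative way to combine the methods. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def ensemble_trend(t, y, n_members=200, degree=3, seed=0):
    """Hypothetical stand-in for ensemble trend estimation: fit a polynomial
    to many bootstrap resamples and take the pointwise median of the fits."""
    rng = np.random.default_rng(seed)
    fits = np.empty((n_members, t.size))
    for k in range(n_members):
        idx = rng.integers(0, t.size, t.size)  # resample with replacement
        coef = np.polyfit(t[idx], y[idx], degree)
        fits[k] = np.polyval(coef, t)
    return np.median(fits, axis=0)


def three_sigma(r, k=3.0):
    # Flag residuals farther than k standard deviations from the mean.
    return np.abs(r - r.mean()) > k * r.std()


def iqr_rule(r, k=1.5):
    # Flag residuals outside Tukey fences [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = np.percentile(r, [25, 75])
    iqr = q3 - q1
    return (r < q1 - k * iqr) | (r > q3 + k * iqr)


def mad_rule(r, k=3.0):
    # Flag residuals farther than k scaled MADs from the median
    # (1.4826 makes the MAD consistent with sigma for Gaussian data).
    med = np.median(r)
    mad = 1.4826 * np.median(np.abs(r - med))
    return np.abs(r - med) > k * mad


# Hypothetical synthetic series: quadratic decline + skewed (gamma) noise
# + 5% injected spikes of random sign and magnitude 1 to 2.
rng = np.random.default_rng(42)
n = 500
t = np.linspace(0.0, 10.0, n)
y = 20.0 - 0.3 * t + 0.02 * t**2 + (rng.gamma(2.0, 0.05, n) - 0.1)
out_idx = rng.choice(n, size=n // 20, replace=False)
y[out_idx] += rng.choice([-1.0, 1.0], out_idx.size) * rng.uniform(1.0, 2.0, out_idx.size)

r = y - ensemble_trend(t, y)
flags = {"3sigma": three_sigma(r), "IQR": iqr_rule(r), "MAD": mad_rule(r)}
# Illustrative joint rule: keep an IQR flag only if 3sigma or MAD agrees.
joint = flags["IQR"] & (flags["3sigma"] | flags["MAD"])

for name, f in {**flags, "IQR & (3sigma | MAD)": joint}.items():
    hits = int(f[out_idx].sum())
    print(f"{name:22s} flagged={int(f.sum()):3d} true hits={hits}/{out_idx.size}")
```

On skewed noise like this, the IQR and MAD rules typically catch all injected spikes. They may also flag some points in the long right tail of the noise. The 3σ rule can miss some of the smaller spikes, because the spikes themselves inflate its mean and standard deviation.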

Original language: English
Pages (from-to): 135-144
Number of pages: 10
Journal: Journal of Hydrology
Volume: 548
DOI: 10.1016/j.jhydrol.2017.02.058
Publication status: Published - 2017 May 1

Fingerprint

outlier
groundwater
long-term trend
method
identification method
anomaly

All Science Journal Classification (ASJC) codes

  • Water Science and Technology

Cite this

Jeong, Jina ; Park, Eungyu ; Han, Weon Shik ; Kim, Kueyoung ; Choung, Sungwook ; Chung, Il Moon. / Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends. In: Journal of Hydrology. 2017 ; Vol. 548. pp. 135-144.
@article{5830f94bdbb043f193e984ad0a0723b9,
title = "Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends",
abstract = "Hydrogeological datasets often contain substantial deviations that need to be inspected. In the present study, three outlier identification methods are proposed: the three sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD). Each takes advantage of the ensemble regression method to account for the non-Gaussian characteristics of groundwater data. For validation, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5{\%} outlier level, whereas only the IQR method performs well at a 30{\%} outlier level. When the methods are applied to real groundwater data, the IQR method is found to outperform the other two methods in identifying outliers. However, the IQR method tends to identify an excessive number of false outliers, a limitation that may be overcome by applying it jointly with other methods (for example, the 3σ rule and the MAD method). The proposed methods can also be applied as potential tools for detecting future anomalies by training the models on currently available data.",
author = "Jina Jeong and Eungyu Park and Han, {Weon Shik} and Kueyoung Kim and Sungwook Choung and Chung, {Il Moon}",
year = "2017",
month = "5",
day = "1",
doi = "10.1016/j.jhydrol.2017.02.058",
language = "English",
volume = "548",
pages = "135--144",
journal = "Journal of Hydrology",
issn = "0022-1694",
publisher = "Elsevier",

}


TY - JOUR

T1 - Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends

AU - Jeong, Jina

AU - Park, Eungyu

AU - Han, Weon Shik

AU - Kim, Kueyoung

AU - Choung, Sungwook

AU - Chung, Il Moon

PY - 2017/5/1

Y1 - 2017/5/1

AB - Hydrogeological datasets often contain substantial deviations that need to be inspected. In the present study, three outlier identification methods are proposed: the three sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD). Each takes advantage of the ensemble regression method to account for the non-Gaussian characteristics of groundwater data. For validation, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well at a 30% outlier level. When the methods are applied to real groundwater data, the IQR method is found to outperform the other two methods in identifying outliers. However, the IQR method tends to identify an excessive number of false outliers, a limitation that may be overcome by applying it jointly with other methods (for example, the 3σ rule and the MAD method). The proposed methods can also be applied as potential tools for detecting future anomalies by training the models on currently available data.

UR - http://www.scopus.com/inward/record.url?scp=85014870293&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014870293&partnerID=8YFLogxK

U2 - 10.1016/j.jhydrol.2017.02.058

DO - 10.1016/j.jhydrol.2017.02.058

M3 - Article

AN - SCOPUS:85014870293

VL - 548

SP - 135

EP - 144

JO - Journal of Hydrology

JF - Journal of Hydrology

SN - 0022-1694

ER -