In a typical classification problem, human coders (annotators) manually code a subset of the available observations, and these coded observations are used to train a classifier that then labels the uncoded observations automatically. Most classification methods assume that the human-assigned labels are reliable, an assumption that does not always hold in practice. This paper explores how double-coding can improve automatic classification in the presence of coding error. Four double-coding strategies are proposed and compared with single-coding. Our study shows that double-coding is preferable when coding error is non-negligible. We also find that disagreements between the two manual annotators are better resolved by employing a more expensive expert coder, or simply by removing the disputed observations, than by having a third regular annotator code them.
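As a rough illustration of how disagreements between two coders might be handled, the following Python sketch shows two of the resolution options mentioned above: dropping observations on which the coders disagree, and consulting a third label (from an expert or from another regular annotator) only for those observations. The function names, the toy labels, and the exact rules are assumptions made for illustration; the abstract does not specify the four strategies in detail, so this should not be read as the paper's implementation.

```python
from typing import Hashable, List, Sequence, Tuple

Label = Hashable


def resolve_by_removal(
    labels_a: Sequence[Label], labels_b: Sequence[Label]
) -> Tuple[List[int], List[Label]]:
    """Keep only observations on which both coders agree.

    Returns the indices of the kept observations and their agreed labels.
    (Hypothetical helper, not taken from the paper.)
    """
    kept_indices: List[int] = []
    kept_labels: List[Label] = []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            kept_indices.append(i)
            kept_labels.append(a)
    return kept_indices, kept_labels


def resolve_by_tiebreaker(
    labels_a: Sequence[Label],
    labels_b: Sequence[Label],
    labels_c: Sequence[Label],
) -> List[Label]:
    """Use the third label only where the first two coders disagree.

    labels_c may come from an expert or from a third regular annotator;
    the code is identical, only the reliability of labels_c differs.
    (Hypothetical helper, not taken from the paper.)
    """
    return [a if a == b else c for a, b, c in zip(labels_a, labels_b, labels_c)]


# Toy data, invented for illustration only.
coder_a = ["pos", "neg", "pos", "neg"]
coder_b = ["pos", "pos", "pos", "neg"]
tiebreak = ["pos", "neg", "neg", "neg"]  # consulted only at index 1

kept_idx, kept_lab = resolve_by_removal(coder_a, coder_b)
resolved = resolve_by_tiebreaker(coder_a, coder_b, tiebreak)
```

With removal, the training set shrinks but contains only labels both coders agreed on; with a tiebreaker, every observation is kept and the quality of the resolved labels depends on how reliable the third coder is.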
Language of Oral Presentation
English
Language of Visual Aids
English