Who Should Obey Content Standards?


It is clear what it means to apply these guidelines to manual annotation, or to annotation produced by automatic transducers (based on machine learning, manual rules, etc.) whose output is similar to manual annotation. However, it is not clear how far these guidelines should extend. How can we clearly define which classes of application output should and should not be regulated by annotation guidelines? We can certainly start with a list and assume that all system output should obey standards for segmentation, tokenization, part of speech, anchor-selection, and possibly some others (coreference?). However, it would be useful to find some clear principles to help handle border cases, possible extensions, what counts as a valid exception, etc.

