Wish List of Standards
- 1. Coreference: How easy is it to map between coreference-chain analyses and entity analyses? Is coreference a good choice for standardization?
- 2. Phrase/Chunk boundaries: Is there some level of phrase structure that is worth standardizing? Why or why not?
- 3. Choosing sentence boundaries: This is crucial to interoperability. However, there already seems to be a lot of agreement, embodied in existing standards such as [XCES]. A few issues remain:
- A. Commas or semicolons should sometimes be treated as end-of-sentence markers. One such circumstance: a set of sentences ends in commas or semicolons and is not clearly embedded under some other sentence, e.g., a file consists of a list of sentences, each followed by a semicolon or comma, with no higher-level sentence.
- B. Another such circumstance: the sentences are assembled into a list that follows either a centered title of some kind or an NP and a colon, e.g., "Reasons to be Cheerful: you are wearing a nice hat, you are reading a good book, you made a friend, etc."
- 4. Named Entities: Some guidelines (e.g., CoNLL) exist. Can these be made a standard?
- 5. Others?
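
As a rough illustration of issue 3B, the sketch below treats an "NP and a colon" heading as introducing a list and splits the rest on commas and semicolons, so that each item becomes its own sentence. The function name `split_list_sentences` and the decision to drop a trailing "etc." are assumptions made for this example, not part of any existing standard. The rule is deliberately naive: an item that itself contains a comma would be split incorrectly.

```python
import re

def split_list_sentences(text):
    """Naive sketch: split 'Heading: item, item; item' into (heading, items).

    Hypothetical helper for illustration only. Returns (None, [text]) when
    there is no colon, i.e. when the heading pattern does not apply.
    """
    heading, sep, body = text.partition(":")
    if not sep:
        return None, [text.strip()]
    # Treat every comma or semicolon as a sentence boundary (assumption).
    parts = [p.strip().rstrip(".").strip() for p in re.split(r"[;,]", body)]
    # Drop empty pieces and a trailing "etc." (assumption: it is not a sentence).
    items = [p for p in parts if p and p.lower() != "etc"]
    return heading.strip(), items

heading, items = split_list_sentences(
    "Reasons to be Cheerful: you are wearing a nice hat, "
    "you are reading a good book, you made a friend, etc."
)
```

Here `heading` holds the introducing title and `items` holds the three list members as separate sentence candidates. A real segmenter would need additional evidence (layout, parallel structure, absence of a higher-level sentence) before treating commas this way.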