Is it possible to evaluate a grammar empirically using a parsed corpus?
Tentatively, yes. This question is something of a ‘holy grail’ in linguistics. Crudely, theoretical linguistics concerns itself with models of grammar and their intrinsic (internally deductive) properties, while computational linguistics attempts to fit these models to data. Both are absolutely necessary stages in parsing a corpus. Once a corpus has been parsed, it is a source of three types of evidence.
• Frequency evidence is evidence of how often known phenomena occur. For example, corpus studies have found that, for particular lexical items, the verb form is more frequent than the noun form, although dictionaries generally assume that the noun predominates. A parser can be ‘trained’ on a corpus by recording how often each rule in its knowledge base is used. This can be done with an uncorrected corpus, i.e. by applying the parser to a number of sentences and counting the number of times each rule was applied in the final analysis. However, if human linguists c