Redington, F. M. (1992). A statistical approach to syntax acquisition.
Unpublished master's thesis, Department of Artificial Intelligence, University
of Edinburgh.
Abstract
I investigated the extent to which simple distributional (statistical
and/or connectionist) learning techniques can provide information
about the syntactic categories of individual words. The particular
technique was one independently discovered by Finch & Chater (1992),
where the bigram statistics of individual words are recorded, and
these words are clustered according to the similarity of their bigram
statistics. Words with the same predominant syntactic category are
found to be clustered closer together than one would expect by chance.
The quality of the clustering is such that groups of nouns, verbs,
adjectives, etc., are immediately apparent to the naked eye. I
investigated the properties of the method on a number of small-scale
benchmark problems, and applied the technique to a corpus of
transcribed adult speech taken from the CHILDES corpora
(MacWhinney & Snow, 1985), which is a closer approximation to the
language to which children are exposed than text corpora. Results
from this analysis were comparable in quality (appropriateness of
clustering) to those from text corpora. This work demonstrates that
simple distributional approaches, using no a priori
information, can provide strong constraints for determining initial
syntactic categories. This and similar approaches provide a means of
investigating the extent to which linguistic knowledge is innate, as
where these methods fail, a priori knowledge may be required.
The learning mechanisms which might embody such methods could
themselves be considered "knowledge" which constitutes part of the
language learner's initial state.
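
As a rough illustration of the general idea described above (recording the
bigram statistics of each word and grouping words whose statistics are
similar), a minimal sketch follows. It is not the procedure used in the
thesis: the toy corpus, the use of cosine similarity, the nearest-neighbour
lookup in place of a full clustering, and all function names are assumptions
made only for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical boundary markers so edge words also have two contexts.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            word = tokens[i]
            vectors[word][("L", tokens[i - 1])] += 1  # preceding-word bigram
            vectors[word][("R", tokens[i + 1])] += 1  # following-word bigram
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (an assumed measure)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(word, vectors):
    """Return the other word whose context vector is most similar to `word`'s."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine(vectors[word], vectors[w]),
    )


# Toy corpus, invented for illustration only.
corpus = [
    "the cat sees the dog",
    "the dog sees the cat",
    "a cat chases a dog",
    "a dog chases a cat",
]

vectors = context_vectors(corpus)
for w in ("cat", "sees", "the"):
    print(w, "->", nearest(w, vectors))
```

On this toy corpus each word's nearest neighbour shares its syntactic
category (noun with noun, verb with verb, determiner with determiner), even
though no category labels are supplied. A real analysis would use a much
larger corpus and a proper clustering step.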
Last modified: Jan 10, 1999