Antoinette Renouf (University of Liverpool) Most neologisms are very low-frequency words. Linguistically, this means that a study of the low-frequency words in a textual database is likely to reveal much about linguistic productivity. In NLP terms, it indicates that this statistically insignificant linguistic stratum is nevertheless a resource for fine-tuning database search indexes to improve performance. In a dynamic database, monitored chronologically, it could also support updating. We have, in the APRIL Project, developed a fine-grained system of software filters for monitoring morphological and lexical change across time. This tool identifies and classifies new words at stages in a flow of electronic news data, allowing trends and generalisations to emerge. It also supports extrapolation from existing patterns to predict aspects of the future structure of the lexicon. In this paper, I shall report on the project, with particular reference to quantitative methods of analysis and linguistic results. [ Retour... ]
|