IS DESCRIBING LANGUAGE MERE BUTTERFLY COLLECTION? ON EPISTEMOLOGY, STATISTICAL LANGUAGE MODELS, AND CORPUS
Universidade Federal Rural do Rio de Janeiro (BRAZIL)
About this paper:
Appears in: ICERI2019 Proceedings
Publication year: 2019
Pages: 10900-10903
ISBN: 978-84-09-14755-7
ISSN: 2340-1095
doi: 10.21125/iceri.2019.2673
Conference name: 12th annual International Conference of Education, Research and Innovation
Dates: 11-13 November, 2019
Location: Seville, Spain
Abstract:
Long before computer-based corpora emerged, the British linguist J.R. Firth (1957) introduced the Contextual Theory of Meaning, which underlies a conventional approach to language. Firth proposed that the study of meaning and context should be central to linguistics. Speech events, he believed, are recurring and repeatedly observable. This perspective clearly leads to a focus on language description, since we can draw more precise conclusions about events in language only by verifying empirical data.

More recently, in 2011, at the “Brains, Minds, and Machines” symposium held during MIT’s 150th anniversary celebrations, Noam Chomsky was asked about the success of statistical methods in language processing. One of his claims against them was that “although statistical language models have had engineering success, this is irrelevant to science”. His radical rationalist perspective regards pure description of language as merely “a butterfly collection”. In other words, the real object of science should not be what people actually do with language (performance), but rather what people should do with language (competence). This statement led computer scientist Peter Norvig to write an essay called “On Chomsky and the Two Cultures of Statistical Learning”, in which he questions what kind of science Chomsky favors.

In this study, we therefore reinforce some of the ideas that Norvig raises and claim that Sociolinguistics and, more recently, Corpus Linguistics and Statistical Language Models mark a watershed in linguistics as a science. Taken together, if they could ever be taken apart, these domains reveal language as first conceived by Firth: as an “event”, as a way of “doing things”, which is why it would be legitimate for a linguist to stick to discourse events themselves.

While Chomsky grounds his philosophy on the idea that mere descriptions of reality do not matter, a great deal of progress in language processing has depended on Labovian approaches combined with statistical models, not on formal models. This seems to be a symptom that language science is indeed different from other sciences, as claimed by Wittgenstein in Philosophical Investigations, first published in 1953. He refuses to conceive of the study of language in the way proposed for the sciences in general, mainly because the "scientist", in this special case, is one of the pieces involved in what he calls "language-games" (Wittgenstein, 1979). Description should therefore not be underrated, since it reveals the way the game is played.
Keywords:
Corpus Linguistics, Epistemology, Statistical Language Models.