Letting in the light and working with the Web: A dynamic corpus development approach to interpreting metaphor

FEDERICI, STEFANO;WADE, JOHN CHRISTOPHER
2007-01-01

Abstract

Setting up corpora is a laborious process, requiring time and resources. One problem is that, once a corpus has been created, its static nature may very soon cease to reflect the way language is currently used. This raises the question of how to make corpus-building a dynamic process. Sharoff (2006) refers to ‘open-source corpora’, which make use of the Internet to collect data that can be constantly updated, following the trends of language change. Clearly, this process must be made rapid and efficient for research purposes. This paper firstly describes the initial development of a tool specifically designed to facilitate the search for linguistic data in a series of steps: starting from the search for an initial, small amount of “thematic” linguistic data on the web; followed by the manual examination and analysis of the collected data; and, lastly, automatically extending the analysis, by means of analogical comparison with what was manually analysed, so as to allow the further extraction of a wider sample of data. Secondly, a small-scale study examines the ways in which such an approach can be exploited in a specific context. Here attention is focused on the linguistic patterns characterising the metaphorical use of LIGHT = UNDERSTANDING (cf. Lakoff and Johnson 1980). We illustrate the process of acquiring relevant contexts of such usages from the web and outline the crucial step of the acquisition process: the analogy-based mechanism that extracts examples of figurative usages of LIGHT from the web, discriminating these from literal usages on the basis of analogical similarity to the manual analysis carried out on the initial data collection.
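
As a rough illustration only, the analogy-based step could be pictured along the following lines. The abstract does not specify the similarity measure, so this sketch assumes a simple Jaccard overlap between context words and a nearest-neighbour decision; the function names, example sentences and labels are all hypothetical and are not taken from the tool described here.

```python
import re


def context_words(sentence, target="light"):
    """Lowercased word set of the sentence, excluding the target word itself."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w != target}


def jaccard(a, b):
    """Jaccard overlap of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_by_analogy(candidate, analysed):
    """Return (label, score) of the manually analysed example most similar
    to the candidate sentence. Assumed mechanism, not the authors' own."""
    cand = context_words(candidate)
    best_label, best_score = None, -1.0
    for sentence, label in analysed:
        score = jaccard(cand, context_words(sentence))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical seed set standing in for the manually analysed initial data.
ANALYSED = [
    ("The report sheds light on the causes of the crisis", "figurative"),
    ("Her explanation threw new light on the problem", "figurative"),
    ("Switch off the light in the kitchen", "literal"),
    ("The light from the lamp was too dim to read", "literal"),
]

label, score = classify_by_analogy("This study sheds light on the issue", ANALYSED)
```

A new web context is given the label of the closest analysed example, so figurative and literal usages of LIGHT are separated by their resemblance to the seed data rather than by hand-written rules. A realistic system would presumably need richer linguistic features than bare word overlap.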
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/104587