When one reaches the end of this sort of work, one realizes how many factors played a part in the process, and understands that, without the help and cooperation of everyone in one’s life, nothing would have happened.


In the first place I should like to thank the Fundação Calouste Gulbenkian for funding me to attend the International Lexicography Seminar at the University of Exeter in March, 1989, and the Instituto Nacional de Investigação Científica for providing funds for my research with the Birmingham Corpus in October, 1991.  I should also like to thank Professor John Sinclair for his permission to consult the Birmingham Corpus, and Antoinette Renouf for her support and generous friendship on these occasions.  My thanks also go to the Oxford University Computing Service for providing me with computerised texts for research, and to the individuals who gave me their permission to use some of these texts.  I am also very grateful to the Centro de Linguística da Universidade do Porto for the use of its library, and for the more practical help it has offered on many occasions.


My interest in linguistics has grown slowly over the years, but perhaps the single most influential contribution to the process was made by Professor Doutor Oscar Lopes when he introduced me to the relationship between language and thought during the seminars for my Master's Degree.  Although I was already interested in linguistics from a more professional point of view, it is this aspect of the subject that has led me on.


Some years before this, however, I had rather unceremoniously presented myself in the office of Professor Reinhard Hartmann in Exeter, and demanded to know what help he could give me, as a leitor of English Language, in integrating an element of linguistics into my courses.  Over the years, I have returned to him for advice, and I should now like to thank him for his patience on these occasions, and for all the valuable help he has given me over the last few years in organizing my research and in controlling my natural tendency to go off at a tangent.


The friendship, help, and encouragement of Professor Doutor Mário Vilela have been invaluable to me over the last few years, in the preparation of both my Master's and my doctoral dissertations.  Without his generous professional and moral support, I should never have finished, and I thank him for his wonderful mixture of praise and criticism, and his ability to use tact and outspokenness as and when it was necessary.


My admiration for the work of M.A.K. Halliday is obvious throughout my dissertation, and I should like to thank him for his inspiration, and for the practical suggestions he made when I was lucky enough to meet him in Birmingham in October, 1991.  I should also like to thank Dr. Anna Wierzbicka for her generosity in sending me all the articles, published and forthcoming, that she had written on the subject of emotion.  I am grateful, too, to Professor Philip Johnson-Laird for his advice, and to Professor Barbara Lewandowska for her personal interest in my project.


I should like to express my gratitude to my colleagues at the Faculdade de Letras, and in other academic departments.  I should particularly like to thank Professor Doutor Manuel Gomes da Torre, as the Director of English Studies, for his cooperation, and Professores Doutores António Franco and Maria da Graça Pinto for their help and bibliographical suggestions, as well as for their friendship and moral support.  My thanks also go to Professora Doutora Maria Cândida Pacheco for her advice on the more philosophical aspects of my work, and to Professor Doutor José Henrique Barros de Oliveira and Dra. Ângela Costa Maia for introducing me to much of the bibliography on psychology in the area of cognition and emotion.  I am also very grateful to Dr. Manuel Azevedo Fernandes, Dr. José Adriano Fernandes, and the other members of our multidisciplinary discussion group, for reminding me of the complexity of the human psyche at a stage when my work was making my outlook on life depressingly materialistic.


I should like to thank those who helped me discover the power of the computer, particularly Professor André Camlong, who was so generous in providing me with the software he had devised.  My thanks also go to Dr. Simão Cerveira Cardoso for the computerised texts he provided, and both to him and to Dr. Sergio Matos for their friendly and patient help in exploring the possibilities of this machine.


Many others have contributed to this project in different ways, including my colleagues, who have offered encouragement, especially Linda Weinrich, and my students, who have allowed me to test my theories about Portuguese against their intuitions.  All these people have helped me in their different ways, and I thank them for all their ideas and their generosity.  However, any errors of fact or judgement that may become apparent are, of course, my own.


Last, but not least, my thanks go to my husband, children, extended family, and friends, in England, Portugal and other parts of the world, for their support and encouragement, for their patience with my behaviour, and for putting up with the neglect to which they have been subjected during the last few years.  My debt to them is immeasurable, and I hope they still believe me when I tell them I love them.



0.1   Keywords


The two keywords in the title of this book are emotion and corpora: the first refers to the subject, the second to the method of study.  I originally set out to study the subject of emotion because it seemed an interesting area in which to investigate the competing claims of language universals and linguistic relativism.  Only later did I understand the wider implications of such a study.


My use of electronic corpora was prompted by the need I felt to judge ideas about the language of emotion on a quantitative as well as a qualitative basis.  I knew that there were interesting and conflicting views about the linguistic expression of emotion, and yet it seemed to me that many of them needed examining in the light of real language in texts, rather than relying solely on the linguist's intuitions about what was merely acceptable.  The work described here was carried out over several years, and should be seen as one person's attempt to make use of the quantitative analysis offered by the corpora then being developed, rather than as a study based on present state-of-the-art corpora.


0.2  The Language of Emotion as a subject for study


Although I was drawn to this subject by a variety of factors, I soon found that others were interested in emotion for reasons I had not originally contemplated.  At a more philosophical level, emotions, and the general language and specific concepts with which we attempt to express our experience of them, are at the centre of contemporary arguments about the nature of human brains and/or minds and of human consciousness, and about the (im)possibility of consciousness existing in artificial intelligence.  Both emotion and the relationship between cognition and emotion interest psychologists, AI experts, philosophers, and others, and some of them have approached the subject by looking at the language of emotion.


The study of the language of emotion brings an interesting paradox into focus.  Most of us feel we know what is meant by words such as fear, love, or anger, and the lexicon would seem to describe an area of human behaviour which could arguably be considered innate.  However, different languages vary significantly in the way they provide concepts for this area of human experience, and experiments by psychologists and linguists have shown that the way individuals use and interpret the lexicon of Emotion varies more widely than it does in most other lexical fields.  Some psychologists and linguists have, in fact, abandoned the study of the official lexicon altogether and concentrated on the paraphrases and metaphors with which people sometimes prefer to describe their experience.  Others, however, have continued to study the official Emotion nomenclature and, in recent years, a large body of research has built up around it, most, though not all, of it from the point of view of psychology.


The semantics and syntax of the lexicon of Emotion have provided linguists with food for thought for some time, as a study of semantic classes of verbs or of deep case theory will show.  Certain syntactic uses of lexemes of Emotion have, in fact, been at the centre of arguments between the different schools of linguistics.  It is also significant that these analyses have very often been influenced by current theories in psychology and philosophy.  I believe that separating the study of the lexicon from that of the syntax in which it is typically embedded contributes to some of the confusion that exists in this field.  I also believe that neither lexemes nor sentences should be considered abstractly, or without reference to at least some immediate situation or context.  The language of Emotion can only be studied fruitfully if the three broad levels of lexicon, syntax and context are studied in conjunction.


It was my aim, therefore, to bring together the ideas of the linguists and the psychologists, and to test them against the qualitative and quantitative findings from a relatively large corpus of examples taken from unedited texts.  In this way, I hoped to be able to demonstrate the problems of universality and relativity more effectively.


The material I have accumulated leads me to propose that there is a similarity between English and Portuguese which, when these findings are compared with those for other languages, could serve as a starting point in the search for a possible level of universality.  This similarity, however, is not localized specifically in the lexicon or in the syntax, but rather in a combination of the two, in conjunction with the needs of language users in context.  On the other hand, I hope to demonstrate that both the lexicons and the syntax of the two languages offer rather different options for interpreting the experiences we call emotion.


I believe that language is inextricably bound up with human experience, and that the two cannot be considered separately.  Whether one believes that human experience shaped language, or that language affects that experience and our analysis of it, I hope to demonstrate the benefits of attempting a more holistic view which takes both into account.


0.3  The use of electronic corpora    


My first direct contact with an electronic corpus was with the Birmingham Corpus, which I was able to consult for a week in March, 1990.  This visit drew my attention to the fact that lexemes favour distinct syntactic patterns which are, in their turn, influenced by semantic factors.  At that time, the corpus consisted of about 7 million words, but, by my second visit, for ten days in October, 1991, this had been increased to about 17.5 million, and I was able to examine these patterns in more detail, and take notes on the frequency of the different syntactic forms of the lexemes involved.


However, although the results of this study were valuable as a norm against which to compare the quantitative lexical and syntactic findings from my own corpus, two things were missing for the purposes I had in mind.  First, I needed much more time in which to analyse the semantic and contextual levels and, second, I needed a comparable corpus of Portuguese.  I resolved, therefore, to build my own corpora.  I realize that using literary texts for linguistic analysis naturally invites criticism, but at the time I was preparing my corpus the English texts available in electronic form for academic use from the Oxford University Computing Service were largely literary.  In order to build a comparable Portuguese corpus by scanning, therefore, I turned to Portuguese literary texts.


There are, of course, more theoretical justifications for using literary texts, and the most important is that this is precisely the type of text which is most likely to contain the Emotion lexicon.  About 40% of the Birmingham Corpus I consulted consisted of literary texts, and when I was searching it for examples, it was perfectly clear that the vast majority of these examples came from that 40%.  If literary texts yield the most examples of the lexicon of Emotion, there must be a reason for it.  This is almost certainly because writers of fiction are exceptionally interested in describing the emotional states and reactions of their characters.  My defence of the use of literary texts is, therefore, based on the conviction that this type of text is the best source of examples for the lexicon of Emotion.


In the end I was able to base my findings on an English corpus of approximately 778,500 words and a Portuguese corpus of approximately 819,500 words.  From these corpora I collected 9,755 examples for the Emotion groups in English, and 11,893 in Portuguese.  A further 1,545 examples were collected for Desire in English, and 2,051 in Portuguese.  This gave a total of over 25,000 examples on which to work.
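The example totals above can be checked with a few lines of arithmetic.  The sketch below is purely illustrative: the dictionary simply restates the counts given in this paragraph, and the variable names are hypothetical.

```python
# Example counts restated from the text, keyed by (lexical group, corpus):
# "E" = English Corpus, "P" = Portuguese Corpus.
examples = {
    ("Emotion", "E"): 9_755,
    ("Emotion", "P"): 11_893,
    ("Desire", "E"): 1_545,
    ("Desire", "P"): 2_051,
}

# Grand total over both groups and both corpora.
total = sum(examples.values())

# Subtotal per corpus, summing over both lexical groups.
by_language = {
    lang: sum(n for (_group, corpus), n in examples.items() if corpus == lang)
    for lang in ("E", "P")
}

print(f"Total examples: {total:,}")
print(by_language)
```

The combined figure is 25,244 (11,300 for English and 13,944 for Portuguese), consistent with the "over 25,000" stated above.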


The methodology involved extracting from the corpora the examples of the English and Portuguese lexical sets belonging to the semantic field of Emotion, and analysing:

  1. the lexemes in relation to the others within each set, and to those in the other language;

  2. the syntactic form of the lexemes, and the syntactic structures in which they tend to co-occur;

  3. the semantic nature of the syntax involved;

  4. the semantic roles of the participants in the Emotion situation, whether expressed at the level of the sentence or the co-text;

  5. certain more pragmatic factors which are only observable after consideration of the type of quantitative data available in such corpora.
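The first step of this methodology, collecting the examples of each lexical set from the corpora, is essentially a concordance search.  The sketch below is not the software actually used for this study; it is a minimal keyword-in-context (KWIC) routine, assuming plain-text input and a hand-supplied list of word forms.  The function name kwic, the window size and the sample sentence are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def kwic(text, word_forms, width=4):
    """Return (left context, keyword, right context) for each hit.

    Matching is case-insensitive and on whole word forms only, so
    inflected forms (e.g. 'feared', 'temia') must be listed explicitly.
    """
    targets = {w.lower() for w in word_forms}
    # Runs of letters, including accented ones such as 'ã' or 'é';
    # digits and punctuation are discarded.
    tokens = re.findall(r"[^\W\d_]+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() in targets:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample sentence, not taken from the corpora.
sample = "She was afraid of the dark, and her fear grew. Fear kept her awake."
lines = kwic(sample, ["fear", "afraid"])
for left, key, right in lines:
    print(f"{left:>30} [{key}] {right}")

# Frequency of each word form: a first step towards quantitative data.
print(Counter(key.lower() for _, key, _ in lines))
```

Each concordance line would then still need to be classified by hand for syntactic form, semantic roles and context, as described in points 2 to 5 above; the sketch only covers the retrieval step.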


0.4  The Texts


The choice of texts was dictated by many factors, of which the most important were availability and the existence of official translations.  The texts in my corpora are listed below.  The numbers of words are approximate, as not all the electronic versions were free from repeated passages and omissions.


The English Corpus



Author Date Title *No. of words
CARROLL, Lewis 1865 Alice in Wonderland 27,000
CONRAD, Joseph 1900 Lord Jim 143,000
DICKENS, Charles 1860 Great Expectations 190,000
FITZGERALD, F. S. 1925 The Great Gatsby 48,000
GREENE, Graham 1978 The Human Factor 85,000
LAWRENCE, D.H. 1930 The Virgin and the Gypsy 31,500
LE CARRÉ, John 1963 The Spy who came in from the Cold 69,000
WAUGH, Evelyn 1945 Brideshead Revisited 115,000
WOOLF, Virginia 1927 To the Lighthouse 70,000
TOTAL     778,500




Portuguese Texts


Author Date Title *No. of words
BAPTISTA, António Alçada 1989 Tia Susana, Meu amor 27,000
LUÍS, Agustina Bessa 1953 A Sibila 84,000
MONTEIRO, Luís de Sttau 1961 Angústia para o Jantar 49,500
NAMORA, Fernando 1949 Retalhos da Vida de um Médico 40,500
PIRES, José Cardoso 1982 Balada da Praia dos Cães 76,000
QUEIROZ, Eça de 1888 Os Maias 214,000
RIBEIRO, Aquilino 1913 O Jardim das Tormentas 59,000
RIBEIRO, Aquilino 1922 Malhadinhas 34,000
RIBEIRO, Aquilino 1957 A Casa Grande dos Romarigães 90,000
RIBEIRO, Aquilino 1958 A Mina de Diamantes 41,500
RIBEIRO, Aquilino 1958 Quando os Lobos Uivam 84,500
TORGA, Miguel 1940 Os Bichos 20,000
TOTAL     819,500


*The number of words is approximate, given the problems with scanning and digitization at the time.


0.5  Key to Abbreviations and Printing Conventions in the Text


Printing conventions


Word in CAPITALS = specific semantic term, e.g. SENSER

Word in CAPITALS = specific syntactic term, e.g. -ING clause, -SE pronoun

Word / phrase in [CAPITALS] + square brackets = semantic primitive

Emotion word with capital letter = refers to a lexical group or semantic field, e.g. Joy

Italics = used for:

            a)  words considered as lexemes

            b)  examples included in the text

            c)  words that are neither English nor Portuguese, e.g. Latin qua

            d)  book titles

‘’ + title = titles of articles

‘’ + individual words = used to focus on specific terms.

“” + words, phrases or sentences = quotations from works referred to.


Abbreviations in the text




AI = Artificial Intelligence

BC = Birmingham Corpus

BI = Biological Intelligence 

CA = Contrastive Analysis

(E) = Used to mark data from English Corpus

EA = Error Analysis

EC = English Corpus

(P) = Used to mark data from Portuguese Corpus

PC = Portuguese Corpus

PFoc = PHENOMENON  focusing

PH. type = Phenomenon type

S-R = Stimulus - Response

S.O.E.D.  = Shorter Oxford English Dictionary

SFoc = SENSER  focusing

TEFL = Teaching English as a Foreign Language




Object types  (see 3.3.8)


a  =  noun phrase

b  =  non-finite infinitive clause (S = same as main clause)

c  =  non-finite infinitive clause (S = different from main clause)

d  =  non-finite -ING clause (S = same as main clause)

e  =  non-finite -ING clause (S = different from main clause)

f  =  finite (THAT) / QUE clause (S = same as main clause)

g  =  finite (THAT) / QUE clause (S = different from main clause)

h  =  finite WH-/ O QUE clause




PHENOMENON types (see 3.1.2)


1.  Unknown, or unspecified in the immediate context.

2.  Self, or permanent quality of SENSER

3.  State or situation of SENSER

4.  Emotion, perception or cognitive processes of SENSER

5.  Action by SENSER

6.  The Other

7.  State or situation of the Other

8.  Emotion, perception or cognitive processes of the Other

9.  Action by the Other

10.  A non-human object, concrete or abstract.

11.  A complex proposition about the world.


Abbreviations after examples from the corpora


A         = from Angústia para o Jantar

Aq       = from Aquilino Ribeiro's books

Ba        = from Balada da Praia dos Cães

Bh        = from Brideshead Revisited

Bi         = from Os Bichos

Ge        = from Great Expectations

Gg        = from The Great Gatsby

Hf        = from The Human Factor

Lj         =  from Lord Jim

M         = from Os Maias

N         = from Retalhos da Vida de um Médico

Si         =  from A Sibila

Spy      = from The Spy who came in from the Cold

Su        = from Tia Suzana, meu Amor

Vg        = from The Virgin and the Gypsy

W        = from Alice in Wonderland

Wo      =  from To the Lighthouse


