- ISSN 1442-438X
- CALL-EJ Online
- Vol. 3, No. 2, January 2002
Abstract
This paper charts a corpus analysis research investigation which was conducted in response to a classroom question. The linguistic features under investigation are “used to” and “be used to”; two grammatical forms whose constructional similarity often causes problems for beginner-level students.This intentionally limited study outlines, by way of a step-by-step approach, the practical procedures involved in the assimilation and manipulation of computer-generated data. It is hoped that novice investigators may gain some valuable insight as to what even simplistic inquiries can bring for themselves as linguistic theorists, and to their learners embarking on a greater understanding of language meaning and usage.
A Brief History of Corpus Linguistics
Studies of language can be divided into two main areas: studies of structure and studies of use. Corpus analysis (CA) focuses on the second of these, studying actual language used in naturally occurring texts. Ever since Firth (1957) stated that “You shall know a word by the company it keeps”, it has been a practice in linguistics to classify words not only on the basis of their meanings, but also on the basis of their co-occurrence with other words. However, in a purely practical sense it is only in recent times that machines have given us the ability to identify these relationships in a meaningful and significant way.From the simple listing of words in the Middle Ages by hand, to the earliest corpus-based analyses of literary styles, through to the first modern electronically readable corpus, the Brown University Corpus of American English, (and its close cousins the Lancaster-Oslo/Bergen corpus and the Kolhapur Corpus), the computer-aided analysis of vast amounts of authentic data has come a long way in a very short time. Almost half a century ago Firth (1957: 31) made the following prophetic statement: “The use of machines in linguistic analysis is now established”. John Sinclair (1991: 1) describes the evolution through the last three decades in the following way: “ Thirty years ago when this research started it was considered impossible to process texts of several million words in length. Twenty years ago it was considered marginally possible but lunatic. Ten years ago it was considered quite possible but still lunatic. Today it is very popular”. This popularity has led to an increased understanding of the relationship of meaning to form as formal patterns, previously undetected, have come to light. Sinclair states again, “At the very least, the quality of linguistic evidence is going to be improved out of all recognitionq¥Êt is my belief that a new understanding of the nature and structure of language will shortly be available as a result of the examination by computer of large collections of texts” (1991b: 489). Stubbs (1996) concurs, “computer-assisted analysis of texts and corpora can provide new understanding of form-meaning relations”.
It should be noted that CA involves far more than using computers for the simple counting and quantifying of linguistic features into sets of statistics. Though this may be seen as the first step in a two-stage process, it is the subsequent, qualitative analysis that provides the more revealing evidence “to propose functional interpretations explaining why the patterns exist” (Biber, Conrad & Reppen, 1998: 9). As a practical investigation, however, this paper focuses primarily on the procedures involved in obtaining and manipulating the data required to create a corpus, and while it does present some insight into possible pedagogic considerations and offer tentative conclusions based on corpus generated evidence, its scope is intentionally, limited.
Choosing a Corpus
Source, size and selectionIn response to a recent classroom inquiry, the linguistic features under investigation are “used to” and “be used to”; two grammatical forms whose constructional similarity often causes problems for beginner-level students. For the purposes of this investigation I chose to use two established corpora, the Lancaster-Oslo/Bergen Corpus (LOB), of British English established by Geoffrey Leech and Jan Svartvik, and its American counterpart, the Brown University Corpus of American English (Brown), running parallel investigations under different methodological conditions. The two corpora are very similar in design: each taken from a total of some five hundred texts across a wide-range of registers, a combined total of approximately two million words.
Size is a prime concern for successful corpus-based lexicographic research. As Biber et al. warn: “To study the meaning and use of words, we need a very large corpus — a 1-million word corpus will not provide sufficient data for many words to allow for meaningful generalizations” (1998: 30). However, with more common words in a text of this size, frequencies are generally considered to be quite reliable. At a million or so words each, I was hoping that my choice of general purpose corpora would provide enough evidence to sufficiently highlight linguistic elements for possible future pedagogic exploitation.
Methodology
As primarily a practical research study, I chose to conduct this investigation employing a number of differing methods. In the first instance, I examined the LOB corpus using a CD-ROM provided by the International Computer Archive of Modern English (ICAME), running the analysis through a software application, the Aston Text Analyser (ATA), supplied by Aston University. I also used part of the LOB corpus to examine the practical problems one might encounter in the creation of a pedagogic corpus, established corpora not always being readily available for investigation and exploitation.As a reflection of recent advances in Internet technology, I was also interested in conducting a limited parallel study, making use of an on-line version of the Brown corpus, a free but time-restricted service provided by the University of Pennsylvania's Linguistic Data Consortium, (LDC). Details of distribution and copyright restrictions pertaining to both texts are included, (Appendix C).
It should be noted here that although the Brown corpus is also supplied on the ICAME CD-ROM, I chose not to access it in the traditional way preferring instead to examine the benefits and shortcomings of locating and accessing corpora via the alternative, and increasingly popular, on-line method.
Equipment Used
The study was conducted with the aid of a generic desktop personal computer running the Windows operating system. Software support was provided by the WinATA Mark 2 text analyser, a word processor, MS-Word 97 and an Optical Character Recognition (OCR) program, Caere Omni-Page Pro 9.0 used in conjunction with a flatbed scanner.Data Input: Scanning and OCR
Equipment and procedure
In some instances, teachers and researchers may not have access to established corpora due to resource limitations. In other cases, most notably for investigations in English for Specific Purposes (ESP), it might be necessary to manually create a specific pedagogic corpus. In creating such a corpus for use in CA, one possible means of inputting data is to scan text directly into a computer using a suitable combination of hardware and software. In order to explore the limitations of such a procedure, I used a Microtek ScanMaker X6 scanner, a low budget flatbed model, together with Caere Omni-Page Pro 9.0 OCR software, which was supplied as part of the scanner package.For the limited purposes of this exercise, I first selected a section of some five hundred words from my LOB corpus, cut and pasted them into a new document and saved this as a separate text file. This was then printed onto a standard sheet of A4 paper, and then scanned directly into the computer. An almost flawless text conversion is testimony to the development of OCR software in recent times. A few years ago a similar exercise may well have resulted in a bout of severe frustration, even when scanning a simple page of text. These days, more advanced programs such as Omni-Page Pro offer much greater speed, reliability and flexibility, especially when integrated into established word processing applications such as Word and Word Perfect. Carefully scanned pages of text assimilated in this way can form the basis for a ‘personal’ pedagogic corpus, to be subsequently examined by a suitable text analysis program.
Some Points to Note
There are two significant considerations that can effect the quality of the final output from the scanning procedure. Firstly, and most importantly, is the quality and condition of the document that one wishes to scan. I was using a clearly printed black text on a clean sheet of white plain paper. Highly colored, glossy, marked or even creased papers have all been known to cause problems with OCR software. The second consideration relates to the complexity of the document. As my inquiry revealed, regular text is not really a problem for this kind of application. However, when one mixes text, graphics and tables, more time needs to be spent in the setup process before attempting the conversion. I also found in this exercise that the software occasionally flagged correct words simply because they were not in the dictionary it was using.LOB and ATA
Installation
Installation of the ATA software suite is via CD-ROM. It is important to note during the installation process that in order for the software to function correctly, all files must be extracted into the same location and not into separate folders. Correct installation creates two executable programs; ataIndex and ataInsight which must be run separately, one after the other. The first of these, as the name suggests, creates and indexes the corpus. In the case of LOB, this entails specifying the correct path for the location of the text to be indexed and titling the project appropriately. When the indexing has been completed, it is then necessary to run the second application, ataInsight. This opens an ‘Open ATA project’ window in which the now indexed LOB text can be found. On selecting ‘OK’, the program starts its analysis of the chosen project.Frequency and filter
My investigation is to specifically look for occurrences of “used to” within the corpus. To do this, it is first necessary to locate “used” from the ‘Word Frequency List’ which opens automatically on the left side of the screen. Selecting this entry, (with ‘Collocations’ checked in the right-button mouse menu) creates a list of contexts in a right-hand window; some 181 entries in total.Next, it is desirable to refine a little further using the collocation‘Filter’option, reducing the list to those lines containing my chosen sub-string. Adding “to_” to the filter generates a final list of 178 concordances which contain only my target search string, “used to”. By selecting ‘Export’ from the right-button mouse menu, concordances can then be exported with relative ease from within the application and opened in a word processor, ready for tabulation, (Appendix A). From a total of 1,022,828 tokens, the following frequency list is generated. Relative frequencies are out of 10,000:
Observations
Presentation, an important consideration not merely for aesthetic purposes, also demands a practical working knowledge of basic word processing operations. Ideally for beginner-level students, concordances are presented in a clear and easy to read tabular format, sorted alphabetically to enable the swift identification of collocation patterns, (Appendix A and Appendix B).Brown Corpus
As mentioned above, the Brown corpus is accessed through the University of Pennsylvania's LDC internet site. It offers a selection of corpora for real-time analyses though access, as a ‘guest user’ is restricted to twenty days. On acceptance of the user terms and conditions, one is invited to enter the relevant search criteria in a series of selectable fields.An initial search returns a tagged frequency list, and generates concordances for the identified search pattern. The complete list of Brown concordances is provided in their processed form, (Appendix B).
From a total of 1,189,209 tokens, the following frequency list is generated. Once again, relative frequencies are calculated out of 10,000:
Observations
Established corpora are often the culmination of a great deal of time, effort and, most significantly, money. Such investment is jealously guarded and may not, therefore, be made generally available without due considerations of costs. In some cases this may prove to be prohibitive to the less fortuitous researcher. In this light, it can be seen that the ability to access a large on-line corpus in real-time is extremely useful for those unable to avail themselves of the more traditional resources, and also appealing to those who lack the practical wherewithal necessary for the successful exploitation of a complicated text analysis program. Such corpora also offer the added benefit of speed; a list of concordances can be generated in a matter of seconds. However, at this early stage of development the on-line corpus does not yet offer the flexibility or power of a dedicated software package, such as ATA, to sort or to filter, as need dictates.Analysis
The majority of the concordances in LOB are taken up with “used to” employed to describe past situations and events. There is a visible tendency within the list to collocate with the verb “to be” and also with other common verbs:- as fresh as it used to be, though an
- you herself what she used to be.
- But then I used to be a racing
- reading ," wrote Francis Williams,” used to be a Socialist
The corpus provides twenty-eight instances of “be used to” meaning to be “accustomed to”. The propensity is for the item to collocate with a noun or a verb, notably the gerund. Of the total number, only eleven actually occur with the gerund, which is the collocate most commonly highlighted in beginner-level textbooks. Textbooks also tend to focus on the gerund occurring after the target form:
- time before I got used to calling them portholes.
- Clara was used to following his lead
- seemed to have been used to seeing couples engaged
whereas LOB offers examples of the gerund occupying a position before the target form:
- a bit of getting used to
- She took time getting used to the indoor lavatories
And a single instance of a noun coming between the two:
- garage, but he was used to Grant taking his
A further significant observation is that more than half of the these concordances demonstrate collocations with the verb “get”:
- You'll have to get used to my bad morning
- heavy, but one got used to this
Though not the focus of this particular exercise, the list also provides some examples of the target form performing a third linguistic function, the passive voice:
- descriptions can also be used to refer to performances
- ratio decidendi}is normally used to refer to some
- beggars, a term often used to describe the population,
- ferromagnetic spinel is sometimes used to describe those ferrites
With Brown, as with LOB above, “used to” describing past events tends to collocate with the verb “to be” and other common verbs:
- eem high, but they used to be even higher,
- spe said, This soil used to be like that
- ard roll. <s> This used to be part of
Also present, as noted in LOB, are instances of “used to” employed in the passive voice:
- ma. The method used to scan the eye
- I rand, IOCSIXG, is used to specify the second
The Brown corpus offers twelve examples of “used to” meaning to be “accustomed to”; less than half of the total number present in LOB. Of these, only five collocate with the gerund:
- ke a little getting used to — not because it
- ur people have been used to accepting things as
- that must have been used to booming, `` and th
- he governor was not used to having his integrit
- jealous. <s> He's, used to me bringing home
- and only twoof the twelve co-occur with the verb “get”:
- ke a little getting used to — not because it
- little time to get, used to. After a
Possible Pedagogic Applications
In the classroom, concordances produced through the analysis of a suitable corpus can provide valuable data for the testing of existing grammatical models and practical material for the production of cloze exercises. Closer examination can also reveal patterns and constructions that may not be covered in prescribed textbooks.The initial intent of this study was to examine the differences in usage between “used to” and “be used to”. My learners do not have a significant problem with the former, but do express confusion when attempting to differentiate it from the latter. My institution's current choice of text only instructs in the use of “be used to” co-occurring with the gerund and, consequently, my students have only been exposed to this construction in their English classes. However, the majority of these concordances in Brown and LOB occur with no gerund at all, a point worthy of highlighting in the classroom. Though different in meaning, the number of cases of “get used to” provided by the corpora, most prominently LOB, may be seen as noteworthy and also deserving of my students' attention, as this particular construction is not covered in the students' textbook at all. A practical pedagogic approach to both of these issues would be to expose my students to the corpus-generated data as part of a series of carefully coordinated lessons. Through the insights I have gained in the course of this particular study, my eventual aim would be to bring CA directly into the classroom, possibly as part of the school's regular computer studies classes, and allow my students to join the investigation as part of a hands-on practical exercise.
However, to add a note of caution, as my own small investigation reveals, there are significant differences in both frequency and usage to be found even across two very ‘similar’ corpora. It is important therefore to make only tentative inferences regarding grammatical rules or patterns of use and to acknowledge the limitations of dealing with such a small sample of data. A future piece of research conducted on a much larger text might allow for some more definite conclusions to be made.
A further possible pedagogic option, requiring an extension of this study, would be to heed the advice of Willis & Willis (1996) and Peacock (1997: 152) to produce a set of authentic materials: “materials which are used in genuine communication in the real world” (Wong, Kwok & Choi, 1995: 318), taken from a spoken, rather than written, corpus and to investigate specifically any increased signs of motivation with my less-conscientious learners.
It is perhaps a fitting conclusion to note that in the course of writing this paper a further development in the evolution of computational linguistics and the internet is reported: ICAME is now the latest in a growing number of institutions offering on-line access to all of its corpora, in this case to registered users of its commercially available CD-ROM. It seems likely that such innovations, offering increased levels of accessibility to an ever-growing body of linguistic data, will continue into the foreseeable future.
References
- Biber, D., Conrad, S., & Reppen, R. (1998).CORPUS LINGUISTICS: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
- Brown University Corpus of American English.
- University of Pennsylvania, Linguistic Data Consortium: http://www.ldc.upenn.edu/
- Firth, J. R. (1957). A synopsis of linguistic theory. Studies in linguistic analysis. Oxford: Oxford University Press.
- Lancaster-Oslo/Bergen corpus (1961). International Computer Archive of Modern English. Bergen, Norway.
- Peacock, M. (1997). The effect of authentic materials on the motivation of EFL learners. ELTJ, 51(2), 144-156.
- Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
- Stubbs, M. (1996). Text and Corpus Analysis. London: Blackwell.
- Willis, J. & Willis, D. (1996). Consciousness-raising activities. In Willis, D. & Willis, J. (Eds.), Challenge and Change in Language Teaching. London: Heinemann.
- Wong, V., Kwok, P. & Choi, N. (1995). The use of authentic materials at tertiary level. ELTJ, 49(4), 318-322.
Author
Sean Maddalena holds a first degree in Law, a Diploma in TEFL and an MSc in TESOL from Aston University. Originally from the United Kingdom, he has made his home in Japan since 1989 and is currently employed by Ashiya University. His specific research interests include Course and Syllabus Design and Computational Linguistics.Appendix A
a bit of gettingq_ | used toq_ . |
plane can only beq_ | used toq_ a limited extent |
of the man habituallyq_ | used toq_ a shoulder-holster . |
such computers can beq_ | used toq_ advantage when a |
Gissingq_ | used toq_ ask ~ * ' Has he |
affluent society should beq_ | used toq_ assist the less |
Rolled barley isq_ | used toq_ balance grass or |
as fresh as itq_ | used toq_ be , though an |
you herself what sheq_ | used toq_ be . |
man myself though : Iq_ | used toq_ be a { 0G.P . } |
But then Iq_ | used toq_ be a racing |
reading , " wrote Francis Williams , "q_ | used toq_ be a Socialist |
done by administrative actq_ | used toq_ be accomplished in |
the subject Social Psychologyq_ | used toq_ be called Home-making |
of the May songq_ | used toq_ be current in |
Itq_ | used toq_ be fancier , but |
At one time " mind * * "q_ | used toq_ be identified with " |
of their larger carsq_ | used toq_ be made available |
her hair , it neverq_ | used toq_ be quite that |
This lessonq_ | used toq_ be read only |
Sometimes that pleasant Citroenq_ | used toq_ be subject to |
Harry of the jointq_ | used toq_ be the barman |
Thereq_ | used toq_ be three separate |
I was younger Iq_ | used toq_ be what is |
Like heq_ | used toq_ be years ago . . . |
three feet long butq_ | used toq_ being handled , in |
of the gold filletsq_ | used toq_ bind up the pŽº/span> |
Miniature cedar trees areq_ | used toq_ block out the |
technical school ) should beq_ | used toq_ broaden the youngsters ' |
British sources have beenq_ | used toq_ calculate the effective |
time before I gotq_ | used toq_ calling them portholes . |
I alwaysq_ | used toq_ clean my rifle |
Heq_ | used toq_ come every day |
Heq_ | used toq_ come to Pierre's |
remember a woman whoq_ | used toq_ come to see |
at Saintes , has beenq_ | used toq_ complete the drawing |
have been or areq_ | used toq_ control impurity build |
is what bedizened boysq_ | used toq_ dance before Mogul |
its phrases , especially thoseq_ | used toq_ describe a visit |
Kunst wasq_ | used toq_ describe certain branches |
with the conventional equationsq_ | used toq_ describe fluxes in |
unit , can be properlyq_ | used toq_ describe soils in |
a root that isq_ | used toq_ describe the herding |
as the wave functionsq_ | used toq_ describe the motion |
equation can indeed beq_ | used toq_ describe the motion . |
however , they may beq_ | used toq_ describe the motions |
beggars , a term oftenq_ | used toq_ describe the population , |
ferromagnetic spinel is sometimesq_ | used toq_ describe those ferrites |
method of measurement wasq_ | used toq_ determine accurately the |
year group was thenq_ | used toq_ determine what would |
his Cambridge days , heq_ | used toq_ display a corresponding |
elaborate dresses than theyq_ | used toq_ do . |
Mould many years backq_ | used toq_ do . |
Peopleq_ | used toq_ do all their |
strain , the two beingq_ | used toq_ draw true stress / |
young the Royal Navyq_ | used toq_ drink it before |
Heq_ | used toq_ drink the cheap , |
that report has beenq_ | used toq_ estimate the theoretical |
diametrically opposed contacts wereq_ | used toq_ facilitate the observation |
gouge , and the fileq_ | used toq_ finish off . |
the former crop beingq_ | used toq_ finish off the |
Clara wasq_ | used toq_ following his lead , |
The method wasq_ | used toq_ forecast visibility ( as |
concrete tube sections beingq_ | used toq_ form the sump |
smoothing plane can beq_ | used toq_ form the taper . |
Bank years ago weq_ | used toq_ get good hauls , 12 |
song , told me : Weq_ | used toq_ get up at |
This solution may beq_ | used toq_ give the contribution |
those places where weq_ | used toq_ go . |
much as Cecil Sharpq_ | used toq_ go about in |
Sheq_ | used toq_ go about the |
garage , but he wasq_ | used toq_ Grant taking his |
Iq_ | used toq_ hate Creedy , when |
for a drink heq_ | used toq_ have his grouse . |
The Caxtonsq_ | used toq_ have their holidays |
told me " I alwaysq_ | used toq_ hear a lot |
Weq_ | used toq_ hear talk about |
took time to becomeq_ | used toq_ hearing so much |
household possessions may beq_ | used toq_ help with the |
Apparently heq_ | used toq_ hide it in |
they may be fruitfullyq_ | used toq_ His Glory . |
and these can beq_ | used toq_ illustrate the type |
overclothe them as theyq_ | used toq_ in the old |
The term quasi-classical isq_ | used toq_ indicate that their |
growth equilibrium " paths , areq_ | used toq_ investigate the stability |
man , if you aren'tq_ | used toq_ it , * * ' he heard |
You'll getq_ | used toq_ it , adorable baby . |
that we should getq_ | used toq_ it . |
I never gotq_ | used toq_ its travel-film colours |
Two methods can beq_ | used toq_ join the crochet |
differences between jobs beq_ | used toq_ justify differences in |
a young man , weq_ | used toq_ keep strictly to |
to meet people Iq_ | used toq_ know , to see |
electric effect can beq_ | used toq_ launch ultrasonic waves |
Iq_ | used toq_ lie awake planning |
a counter-irritant almost Iq_ | used toq_ listen of nights |
Marc Chagallq_ | used toq_ live here and |
Then that's why * - " " Heq_ | used toq_ live in Tangier , " |
Theyq_ | used toq_ look * - and some |
of an elephant , wasq_ | used toq_ make a cake |
Some separated lead-210 wasq_ | used toq_ make reference standards |
crochet lace can beq_ | used toq_ make tablecloths , traycloths |
provision which was nowq_ | used toq_ make the { 0T.E . |
ancient Britons , I believe ,q_ | used toq_ make water hot |
as it is nowq_ | used toq_ mark a paragraph |
Section the term wasq_ | used toq_ mean something like |
Georgeq_ | used toq_ mix 100 stone of |
junior to Humbert , whoq_ | used toq_ mock him affectionately |
You'll have to getq_ | used toq_ my bad morning |
gauge can now beq_ | used toq_ nick in the |
three following winters wereq_ | used toq_ obtain an independent |
Heq_ | used toq_ organise film shows |
which can then beq_ | used toq_ perform an operation . |
and devices to beq_ | used toq_ perform the various |
Iq_ | used toq_ play about in |
Iq_ | used toq_ play rugger , * * ' said |
lead carrier solution isq_ | used toq_ prepare the reference |
how Alexander the Greatq_ | used toq_ recline and transact |
descriptions can also beq_ | used toq_ refer to performances |
ratio decidendi } is normallyq_ | used toq_ refer to some |
it may have beenq_ | used toq_ relate Christ's healing |
migre * ? 2s , who notoriouslyq_ | used toq_ repair to the |
she said chattily , Iq_ | used toq_ ride a bicycle . |
and personality which journalistsq_ | used toq_ ridicule , can be |
the gate the cockerelq_ | used toq_ run to meet |
for you fellows , * * ' heq_ | used toq_ say , you can |
Laughable , theyq_ | used toq_ say . |
Heq_ | used toq_ say : ^ Have whatever |
Of Kitchener heq_ | used toq_ say with humorous |
reminiscent of what weq_ | used toq_ see pŽ®St . |
seemed to have beenq_ | used toq_ seeing couples engaged |
embarrassment if she isq_ | used toq_ seeing her mother |
that force should beq_ | used toq_ settle this problem . |
the May carol heq_ | used toq_ sing , with his |
me the one sheq_ | used toq_ sing in Kimbolton |
a shaped rubber isq_ | used toq_ smooth the hollow |
was young schoolboy I | used toq_ sneak off to |
Sheq_ | used toq_ solve all the |
the clinical weekends heq_ | used toq_ spend with her . |
applied , and every meansq_ | used toq_ stop the train , |
in contrasting tones wereq_ | used toq_ strengthen garments at |
model which may beq_ | used toq_ study both the |
Heq_ | used toq_ stump round the |
possibility of power beingq_ | used toq_ supplement hand tools . |
Iq_ | used toq_ take the small |
and colleague , Campbell Dixon ,q_ | used toq_ tell of a |
The straight-edge can beq_ | used toq_ test the straightness |
is bought , can beq_ | used toq_ the best advantage . |
at ( B ) . A malletq_ | used toq_ the chisel is |
become ( 1 ) tired , or ( 2 ) moreq_ | used toq_ the disturbance . |
Soho , to get meq_ | used toq_ the food , he |
might as well getq_ | used toq_ the idea . |
they very quickly getq_ | used toq_ the idea of |
She took time gettingq_ | used toq_ the indoor lavatories |
They'req_ | used toq_ the snatch racket . |
that most people getq_ | used toq_ them . |
Jane wasq_ | used toq_ these sudden exigencies |
or chieftain to getq_ | used toq_ these trimmings because |
to tinsel compliments , weq_ | used toq_ think him unworldly , |
in an Embassy * - Iq_ | used toq_ think it was |
heavy , but one gotq_ | used toq_ this . |
You are not yetq_ | used toq_ this sort of |
decorative kale are convenientlyq_ | used toq_ tone in with |
horses ; they had beenq_ | used toq_ trains since they |
The brush contacts wereq_ | used toq_ trigger off a |
He oftenq_ | used toq_ try to imagine |
His friendsq_ | used toq_ try to persuade |
friend , William James , whoq_ | used toq_ urge that the |
in London that Jonesq_ | used toq_ use in the |
slaves * - everything he wasq_ | used toq_ using while he |
a literary province Iq_ | used toq_ visit fairly often ; |
Sheq_ | used toq_ walk straight to |
Heq_ | used toq_ walk to the |
page , would have beenq_ | used toq_ weigh bales of |
They could beq_ | used toq_ weigh several sacks |
its simplest form itq_ | used toq_ work in the |
they are a teamq_ | used toq_ working together , they |
like that she hadq_ | used toq_ write to me . |
Appendix B
ke a little gettingq_q_ | used to -- not because it |
iling teasing as heq_q_ | used to . <p> <s> `` Huskyq_q_q_q_ |
from it that sheq_q_ | used to . <p> <s> `` You |
little time to getq_q_ | used to . <s> After a |
ur people have beenq_q_ | used to accepting things as |
a new melody isq_q_ | used to accompany his narraq_q_q_ |
repetitious The logical schemeq_q_ | used to accomplish the formq_q_q_q_ |
residual hese inquiries wereq_q_ | used to adjust compilationsq_q_q_ tient |
uestions . <s>I 'mq_ | used to all three , but |
herse one hebephrenic manq_q_ | used to annoy me , month |
ageq_ seven-iron shot heq_q_ | used to approach the greenq_q_q_q_ |
s> They could beq_q_ | used to attack a nation ' |
platform and can beq_q_ | used to automatically holdq_q_q_q_ iling |
citiz--uglier than youq_q_ | used to be , and you |
ss glorious than itq_q_ | used to be , it is |
nistered here as itq_q_ | used to be , with unleaveneq_q_q_ |
or less than itq_q_ | used to be ? ? <p> <s> |
eem high , but theyq_q_ | used to be even higher '' ,q_q_q_ |
spe said , This soilq_q_ | used to be like that |
ard roll . <s> Thisq_q_ | used to be part of |
as e Catskills , whichq_q_ | used to be the summer |
that must have beenq_ | used to booming , `` and th |
ese profiles can beq_q_ | used to calculate a temperaq_q_q_ |
feeli ransports that wereq_q_ | used to carry Communist ageq_q_q_q_ |
the mails were thenq_q_ | used to carry it out '' . <q_q_q_q_ |
tional codes can beq_q_ | used to challenge and countq_q_q_q_ |
and d Margaret recall ,q_q_ | used to characterize her asq_q_q_ > |
les of crystals areq_q_ | used to classify and identiq_q_q_ |
of materials can beq_q_ | used to construct a satisfaq_q_q_ |
cattle of thousand spectatorsq_q_ | used to crowd it in |
holes and can beq_q_ | used to cut exact-size discq_q_q_ |
the words he hadq_q_ | used to defend Cromwell . <q_q_q_ he |
grea emical methods wereq_q_ | used to demonstrate the renq_q_q_ |
K factor , a termq_q_ | used to denote the rate |
s> Mines can beq_q_ | used to deny access to |
elastic resonance shifts isq_q_ | used to derive a general |
was a Spanish wordq_q_ | used to describe cattle ofq_q_q_q_ |
s ,sometimes it isq_q_ | used to describe felt humanq_q_q_ |
integritq_ ind words travelersq_q_ | used to describe Little Rocq_q_q_ |
prbody temperature isq_q_ | used to describe the radiatq_q_q_ |
e aircraft could beq_q_ | used to destroy other mobilq_q_q_ |
ese sound waves areq_q_ | used to detect submarines ,q_q_q_ ma . < |
the the anonymous Womanq_q_ | used to do , and he |
each time as heq_q_ | used to do . <s> When |
second aerated lagoons beq_q_ | used to eliminate the problq_q_q_ |
h tiles , marble areq_q_ | used to emphasize the feeliq_q_q_ |
ve operation EQU isq_q_ | used to equate symbolic namq_q_q_ |
d transom which wasq_q_ | used to fasten them to |
a satisfa lf-unloading wagonsq_q_ | used to fill silos spreadsq_q_q_q_ |
tten 2 B filter wasq_q_ | used to filter off residualq_q_q_ |
er last week ,Iq_q_ | used to follow Williams eveq_q_q_ |
power which can beq_q_ | used to frustrate the citizq_q_q_q_ -- |
atement may also beq_q_ | used to generate an RDW |
old days when `` weq_q_ | used to get the seamen |
af A hebephrenic manq_q_ | used to give a repetitiousq_q_q_q_q_ |
was another . <s> Iq_q_ | used to go with Watson |
mulated that can beq_q_ | used to good advantage . <pq_q_q_ |
eel lonely , and weq_q_ | used to hang a sign |
aps as the cave-menq_q_ | used to have in the |
he governor was notq_ | used to having his integrit |
and had already becomeq_ | used to Hesperus ' snappingq_q_q_ he |
eem strange to earsq_q_ | used to hillbilly and jazz |
and he was notq_q_ | used to horseback . <s> Now |
ngs Thorpe, can beq_q_ | used to illustrate anotherq_q_q_q_q_ power |
vocatio pleading cannot beq_q_ | used to impose unnecessaryq_q_q_q_ h |
nk together like weq_q_ | used to in the old |
the progr `` technology '' isq_q_ | used to include any and |
of time is merelyq_q_ | used to increase the realisq_q_q_ |
mobil rrently , marina isq_q_ | used to indicate a municipaq_q_q_q_ ** |
w seldom they did :q_q_ | used to it , probably . <s> |
n tactics have beenq_q_ | used to justify like tacticq_q_q_q_ |
spreads Computers are beingq_q_ | used to keep branch inventoq_q_q_ < |
the new jail , weq_q_ | used to keep prisoners in |
ng cover , could beq_q_ | used to keep the wastes |
the eye . <s> Weq_q_ | used to kid him by |
ny ? ? <s> He neverq_q_ | used to like any hot |
cereal aining appliance isq_q_ | used to lock them in |
c. <s> The Presidentq_q_ | used to look at it |
by the same methodq_q_ | used to look up a |
ith , Styka . <s> Iq_q_ | used to love this country |
he coconut palm areq_q_ | used to make candles in |
as urposes -- also areq_q_ | used to make soaps , detergq_q_q_ |
of public places thatq_q_ | used to make the Jew |
zon apabilities must beq_q_ | used to maximum advantage tq_q_q_q_ . < |
jealous . <s> He 'sq_q_ | used to me bringing home |
count mimesis '' is hereq_q_ | used to mean the recallingq_q_q_q_q_ |
if it could beq_q_ | used to measure the elasticq_q_q_ |
s> Sonar can beq_q_ | used to measure the thickneq_q_q_ |
radiat ed thermocouple wasq_q_ | used to measure the upstreaq_q_q_ |
aratus will also beq_q_ | used to measure transitionq_q_q_q_ ese |
s steel screws wereq_q_ | used to minimize corrosionq_q_q_q_ e |
The DA statement isq_q_ | used to name and define |
The DC statement isq_q_ | used to name and enter |
sample ; e bio-assay methodsq_q_ | used to obtain them . <s> |
tient of mine , whoq_q_ | used to often seclude herseq_q_q_ |
s> yesterday .<s> Youq_q_ | used to paint in them , |
ly state funds wereq_q_ | used to pay for the |
as a child Iq_q_ | used to play '' . <s> He |
he corner where youq_q_ | used to play when you |
very summer . <s> Iq_q_ | used to play with the |
out ''surpluses had beenq_q_ | used to provide a private |
ce forces have beenq_q_ | used to provide defense zonq_q_q_ |
she ed aluminum plate ,q_q_ | used to provide the dryingq_q_q_q_ |
asq_ Miss Giles alwaysq_q_ | used to refer to her |
most of what weq_q_ | used to regard as the |
ntic up there , sheq_q_ | used to say , with the |
of my ewish intellectualsq_q_ | used to say . <p> <s> |
se by instinct , heq_q_ | used to say : such places |
The party that wonq_q_ | used to say something aboutq_q_q_ |
ma . <s> The methodq_q_ | used to scan the eye |
S statement must beq_q_ | used to select the major |
stem . <s> DIOCS isq_q_ | used to select the major |
b '' . <s> It isq_q_ | used to separate two or |
s> The symbol isq_q_ | used to separate two or |
foam and can beq_q_ | used to slit continuous sheq_q_q_ |
me rand , IOCSIXF , isq_q_ | used to specify the first |
I rand , IOCSIXG , isq_q_ | used to specify the secondq_q_q_q_q_ |
upstrea equency starter wasq_q_ | used to start the arc . < |
erb garden was alsoq_q_ | used to stop bleeding , andq_q_q_ |
a lock ,which isq_q_ | used to store cumulative req_q_q_q_ |
Throu was constructed andq_q_ | used to study transition prq_q_q_ |
corrosion e been successfullyq_q_ | used to suggest ways to |
than to an Americanq_q_ | used to summers in New |
pirical data can beq_q_ | used to support whatever prq_q_q_ |
sort of thing thatq_q_ | used to take place in |
e evening . <s> Sheq_q_ | used to tell me , `` When |
South nt this opportunityq_q_ | used to tell them about |
ygous Af cells wereq_q_ | used to test each sample ;q_q_q_q_ |
invento <s> Where Americansq_q_ | used to think of a |
unt of a machine-familyq_q_ | used to this very day |
he enemy-Jew can beq_q_ | used to transform the ordinq_q_q_ |
i er , Model 565 , isq_q_ | used to transport the boatq_q_q_q_ |
was a trick theyq_q_ | used to try and conceal |
San Juan , but Iq_q_ | used to work on a |
Appendix C
Copyrights and distribution:
LOB Corpus:
The corpus and accompanying manual are available at cost to bona fide researchers through the International Computer Archive of Modern English (ICAME), at the Norwegian Computing Centre for the Humanities, Bergen, Norway.The following restrictions on the use of the material must be strictly observed:
- No copies of the corpus, or parts of the corpus, are to be distributed under any circumstances without the written permission of ICAME.
- Print-outs of the corpus, or parts thereof, are to be used for bona fide research of a non-profit nature. Holders of copies of the corpus may not reproduce any texts, or parts of texts, for any purpose other than scholarly research without obtaining the written permission of the individual copyright holders, as listed in the manual ccompanying the corpus.
- Commercial publishers and other non-academic organizations wishing to make use of part or all of the corpus or a print-out thereof must obtain permission from all the individual copyright holders involved.
Brown Corpus:
The Linguistic Data Consortium grants to you a license to use this data subject to the following understandings, terms and conditions:- Permitted Uses.
- This data may only be used for linguistic research.
- Small excerpts of text or audio data from LDC-Online materials may be displayed to others or published in a scientific or technical context, solely for the purpose of describing the research and related issues. Statistics and other summaries of LDC-Online materials may also be published in the same context. Except for such publication of small excerpts or statistical summaries in scientific or technical works, neither LDC-Online materials themselves, nor access to them, may be sold or transferred to others.
- Access by Individuals.
- To access this data, you must be a staff member, consultant, or individual providing service or doing research at an organization that is a member of the LDC, and you must agree to this user agreement and its provisions. You must terminate your access when these conditions no longer apply.
- Copyright.
- Except as specifically permitted above the display, reproduction, transmission, distribution or publication of the these databases is prohibited.
- Violations of the copyright restrictions on the data may result in legal liability.
source: http://callej.org/
Copyright (c) 2002- Sean Romano Maddalena & CALL-EJ Online (ISSN 1442-438X). All rights reserved.
No comments:
Post a Comment