Language Learning & Technology
Vol. 9, No. 2, May 2005, pp. 90-110
EXPANDING ACADEMIC VOCABULARY WITH AN INTERACTIVE ON-LINE DATABASE
Université du Québec à Montréal
University students used a set of existing and purpose-built on-line tools for vocabulary learning in an experimental ESL course. The resources included concordance, dictionary, cloze-builder, hypertext, and a database with an interactive self-quizzing feature (all freely available at www.lextutor.ca). The vocabulary targeted for learning consisted of (a) Coxhead's (2000) Academic Word List, a list of items that occur frequently in university textbooks, and (b) unfamiliar words students had met in academic texts and selected for entry into the class database. The suite of tools was designed to foster retention by engaging learners in deep processing, an aspect that is often described as missing in computer exercises for vocabulary learning. Database entries were examined to determine whether context sentences supported word meanings adequately and whether entered words reflected the unavailability of cognates in the various first languages of the participants. Pre- and post-treatment performance on tests of knowledge of the words targeted for learning in the course was compared to establish learning gains. Regression analyses investigated connections between use of specific computer tools and gains.
In a 1997 review of research-informed techniques for teaching and learning L2 vocabulary, Sökmen issued the following challenge to designers of software for language learners:
There is a need for programs which specialize on a useful corpus, provide expanded rehearsal, and engage the learner on deeper levels and in a variety of ways as they practice vocabulary. There is also the fairly uncharted world of the Internet as a source for meaningful vocabulary activities for the classroom and for the independent learner. (p. 257)
The quote is interesting in a number of ways. One obvious point is that the Internet has become familiar territory for both course developers and language learners in the years since 1997. But much remains uncharted: few of the many vocabulary activities available on-line have been studied in any detail to determine their effectiveness for language learning. In this study, we take a step toward addressing this deficit. We begin by expanding on Sökmen's remarks to delineate the theoretical and research underpinnings for the design of principled computerized vocabulary activities. Then we describe a new set of tools for studying vocabulary that were designed to implement these principles. The tools are freely available to researchers, educators, learners, or anyone with access to a computer with an Internet connection, and they can be used with learners of English or French. Currently, the tools are used in English courses in 15 countries across five continents. In the second half of the paper, we examine how learners used the tools and delineate the vocabulary gains they achieved; finally, we lay out an agenda for future research.
Sökmen's (1997) criteria for designing computerized vocabulary activities reflect theoretical and research insights from several different perspectives. The first criterion, specializing in a "useful corpus," speaks to the powerful impact corpus linguistics has had on L2 vocabulary acquisition studies. During the last 20 years, ever larger corpora of materials in various genres (e.g., academic textbooks in English) have been analyzed using ever more powerful computers. This has allowed researchers to identify with a high degree of specificity which recurring words and phrases a language learner would profit most from studying, given his or her learning goals (e.g., Biber, Conrad, & Reppen, 1994; McCarthy & Carter, 1997; Schmitt, 2004; Simpson & Mendis, 2003). Lists of frequent English word families and the extent to which they offer coverage of particular genres have been explored by L2 vocabulary acquisition researchers such as Coxhead (2000), Laufer (1992), Nation and Waring (1997), Sutarsyah, Nation, and Kennedy (1994), and others in the case of English, and by Cobb and Horst (2004) in the case of French. Although the idea of using a corpus to specify a vocabulary syllabus suited to the needs of a particular learner constituency is widely accepted in the research community, the approach does not appear to have been widely implemented (yet) in computerized learning activities. We examined 50 on-line vocabulary sites designed for learners of English that were either known to us from published research or located by keyword searches of the Internet. Explicit mention of using a corpus to select the vocabulary targeted for learning was rare. Only three presented activities for learning vocabulary that occurs frequently in a specified corpus. These were the Compleat Lexical Tutor (Cobb, 2000), the Virtual Language Centre (Greaves, n.d.), and Haywood's (n.d.) Academic Word List site.
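The notion of a word list's "coverage" of a text can be made concrete with a small calculation: the proportion of running words (tokens) in the text that belong to the list. The sketch below is illustrative only. It assumes a simple lower-case tokenizer and matches exact word forms, whereas published lists such as the AWL are organized into word families; the mini-list and sentence are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(text: str, word_list: set[str]) -> float:
    """Proportion of running words (tokens) in `text` that appear in `word_list`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in word_list)
    return covered / len(tokens)

# Hypothetical mini-list standing in for a real list such as the AWL.
# Real lists group inflected and derived forms into word families;
# this sketch only matches the exact forms listed.
sample_list = {"analysis", "data", "process", "research"}
text = "The research process requires careful analysis of the data."
print(f"{coverage(text, sample_list):.1%}")  # 4 of 9 tokens covered: 44.4%
```

Counting tokens rather than distinct words reflects the reader's experience: a frequent word contributes to coverage every time it recurs.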
Sökmen (1997) also notes the importance of opportunities for cognitive engagement "on deeper levels and in a variety of ways" (p. 257). These criteria are consistent with views from cognitive psychology that emphasize the role of depth of processing (Craik & Lockhart, 1972) and the richness of initially encoded associations (Craik & Tulving, 1975) in the retention of new knowledge. The implications of this perspective become clear if we consider an example of a computer activity that offers rather limited opportunities for deep processing.
Suppose an exercise simply presents target words as multiple-choice items to be matched to basic definitions, along with feedback in the form of correct/incorrect verdicts. If the activity is used to learn new words, there is admittedly some scope for deep processing in the sense that when the L2 learner matches the target word to its correct definition, he or she engages in semantic encoding -- in contrast to the more shallow processes involved in merely pronouncing the word or attending to its written form (Ellis, 1994; Laufer & Hulstijn, 2001). But completing this definition-matching activity correctly can hardly be seen as a rich learning experience. The absence of further information about the word and the lack of further opportunities for engagement mean that the encounter is not likely to enhance the building of the elaborate network of links between old and new knowledge that is associated with high levels of retention (Hulstijn, 2001). Nor is it likely to lead to the flexible entries in the mental lexicon that theorists such as Nagy (1997) argue make it possible for words to be understood when they are met in novel contexts. Clearly, activities need to offer learners something more to study than mere words and definitions. The advantage of exposure to rich linguistic input at the learning stage was demonstrated in Cobb's (1999) study of computerized activities for vocabulary learning in two formats: one that involved participants in examining multiple sentence examples of a target word in use (a concordance), and another that offered a definition accompanied by a single sentence example. The study showed that learners were better able to transfer newly acquired knowledge of a word to a novel context if it had been studied in the concordance condition.
Rich and varied input is also crucial in providing opportunities for the "expanded rehearsal" mentioned by Sökmen (1997). Rehearsal is recognized as a key factor in explicit vocabulary learning (Ellis, 1994; Hulstijn, 2001), and computerized exercises clearly serve varying individual rehearsal needs well since learners can work on activities independently without taking up valuable class time. But if learners return to our basic multiple-choice activity to review the vocabulary they learned previously, they can only make the same word-meaning matches again. Ideally, computerized review activities would offer opportunities for expansion by presenting and testing target words in ever new contexts. A step in this direction has been taken in vocabulary sites that include a cloze-generator; this feature allows learners to enter a variety of texts and test their knowledge of words in new contexts. However, in our informal examination of 50 sites, we found only a few that offered either multiple contextualized examples of target words in use or facilities for building novel cloze passages or both. These were the Compleat Lexical Tutor (Cobb, 2000), the Virtual Language Centre (Greaves, n.d.), Haywood's (n.d.) Academic Vocabulary site, Mason's (n.d.) Culture Shock page, and Gerry's Vocabulary Database (Luton, 2000).
The on-line computer resources investigated in this paper supported vocabulary learning in an experimental course for university-bound learners of English in Canada. Both the design of the vocabulary course and the computerized support activities address Sökmen's (1997) challenge in several ways. First, the approach was corpus-based in that words targeted for learning in the course included the 800 items on the University Word List (UWL). This list of word families that occur frequently in academic writing is a composite of several frequency lists, which for the most part were derived from pre-computer analyses of large corpora (Xue & Nation, 1984). More recent sessions of the course have focused on Coxhead's (2000) updated and more streamlined Academic Word List (AWL), a list of 570 word families found to recur frequently and consistently across a range of academic texts in a corpus of 3.5 million running words. The approach was also corpus-based in another sense: the course materials included a set of academic readings chosen in part by the students themselves and vocabulary from these readings that they selected to study. The idea was to give students a role in identifying words that were important to know. It was expected that this mini-corpus, which reflected the reading and study interests of class members, would provide students with opportunities to meet UWL or AWL words in context and also serve as a useful source of infrequent and/or domain-specific words that do not occur on the UWL or AWL.
Secondly, a collaborative on-line word bank activity (see Figure 1 for sample entries) engaged students in more active processing than is usually available.
Figure 1. Data entry template and sample entries in the collaborative on-line database
Using the Word Bank (designed by the second author) involved learners in identifying important words to study, entering (i.e., typing) the words and their definitions along with example sentences into the bank, and using the gapped example sentences to review their own and their classmates' words -- all activities that engaged them in deeper processing than is needed to complete activities such as multiple-choice synonym recognition. The sound feature, which allowed students to hear the entered words and collocations, offered learners the opportunity to process the information in another modality. At the same site, the concordancing feature (originally created by Greaves, n.d., and adapted by the second author) presented learners with rich semantic, syntactic, and collocational information about a new word in the multiple sentence contexts located by the concordancer. In learning new material, the learner could attempt to guess the concordanced word's meaning, hold the hypothesis in memory, and confirm the guess by accessing the on-line dictionary that is linked to the concordance interface (see Figure 2 for an example of a concordance). Students were also encouraged to use concordancing as a technique for reviewing previously learned words.
Figure 2. First 10 lines of concordance output for the word process drawn on the Brown corpus (Francis & Kucera, 1979)
In addition to piloting the new tools, another important goal of the experimental course was challenging learners to study hundreds rather than mere dozens of new words. Academic learners need to recognize the meanings of thousands of English words in order to handle the reading requirements of university textbooks effectively (Hazenberg & Hulstijn, 1996; Laufer, 1989, 1992), and memory research reviewed by Nation (1982, 2001) suggests that learners can acquire and retain knowledge of many more new word meanings than is usually expected in language courses. These increased expectations were built into the course design; we wanted to expose students to a large number of useful new words and challenge them to increase their vocabulary size. But which words (in addition to the AWL) are most useful for a diverse group of academic learners to know? How could we identify a large number of useful words for students with differing L1 backgrounds, L2 proficiency, and academic objectives? The interactive on-line word bank software provided an answer by putting the decision in the hands of the students. This computer tool would allow them to select for themselves the words they would study in the course.
The course and the computer activities are described in more detail below. Then questions about the usefulness of the tools and the learning results are explored in a number of experiments.
The context for the research was an experimental vocabulary course for intermediate-level academic learners of English at a Canadian university. In early 2000, course designers began looking for ways to diversify the ESL curriculum that was largely devoted to developing academic writing skills. It was decided to pilot a number of alternative courses of which the vocabulary course was one. Since students were struggling with the vocabulary (and reading comprehension) sections of a mandatory in-house placement test, it was thought that a preparatory course focusing directly on academic vocabulary might be of more use than the usual integrated reading and writing course with soft-target objectives. The experimental course was offered for the first time in the fall semester of 2000 (see Horst & Cobb, 2001, for a detailed report). Over the course of subsequent sessions in 2001 and 2002, the course was revised and the software was improved and expanded to include additional activities. This paper draws on data gathered in several of these sessions. The description of the course begins with a look at how students contributed to the creation of the reading and vocabulary materials they would study. Later we turn to the activities they used to study AWL words and the items they had selected from the readings.
Figure 3. Homepage for Academic Vocabulary Development, an experimental ESL course
Building a set of academic readings for the course involved requiring students to access articles from quality magazines or newspapers on the Internet. For instance, in one session of the experimental course students were expected to read two articles of their choice each week from the Focus section of the Toronto Globe and Mail, a supplement that features essays on a variety of topics. Pieces in a recent edition compared the economic impact of floods on Canadian and Bangladeshi communities, discussed the role of grandparents in the modern family, and examined the effects of independent internet-based blogs on traditional political reporting. This range of topics is typical of the Focus section and we expected that students with varied academic interests (business, education, computer science, etc.) would consistently be able to find articles that had relevance to their fields of study and were rich in potentially useful new vocabulary. Analyses reported by Nation (2001) indicate that academic (AWL) vocabulary occurs more frequently in newspaper texts than in some other genres (e.g., fiction), so reading the articles could also be expected to offer students opportunities to meet previously studied AWL words in new contexts.
Each week students prepared summaries of the articles they had read; they also used dictionaries to look up unknown words from these readings. They then each selected unfamiliar words that they felt might also be useful for their classmates to know and entered them in the on-line Word Bank created by the second author. This provided a simple way of sharing the valuable information gleaned in the individual word quests. Figure 3 shows the homepage for the most recent session of the course. The button for Word Bank entry appears at the top of the middle column under Focus Activities. Clicking on this button brings up the Word Bank (see Figure 1). At the top of the Word Bank page is the data entry template that presents the student with spaces for entering a word, an example of the word used in context, word class information, a dictionary definition, and the contributor's name. Each week the students were required to enter five new words they had encountered in their newspaper reading in the Focus Word Bank. A sample of three Focus Word Bank entries made in the most recent course also appears in Figure 1.
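The Word Bank entry described above can be pictured as a simple record with five fields (word, example sentence, word class, definition, contributor), kept in alphabetical order as the Word Bank does. The sketch below is a hypothetical in-memory model, not the actual Word Bank software; the class names, field names, and sample definitions are assumptions, and the quota check ignores week boundaries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WordBankEntry:
    """One student contribution; field names are illustrative, not the real schema."""
    word: str
    example: str      # sentence in which the word was met
    word_class: str   # e.g. "n.", "v.", "adj."
    definition: str
    contributor: str

class WordBank:
    """Hypothetical in-memory word bank that keeps entries alphabetized by word."""

    def __init__(self) -> None:
        self._entries: list[WordBankEntry] = []

    def add(self, entry: WordBankEntry) -> None:
        self._entries.append(entry)
        self._entries.sort(key=lambda e: e.word.lower())

    def words(self) -> list[str]:
        return [e.word for e in self._entries]

    def met_weekly_quota(self, contributor: str, quota: int = 5) -> bool:
        """True if `contributor` has entered at least `quota` words
        (a sketch that ignores week boundaries)."""
        return sum(e.contributor == contributor for e in self._entries) >= quota

bank = WordBank()
bank.add(WordBankEntry("offspring", "The animals protect their offspring.", "n.",
                       "the young of a person or animal", "Student A"))
bank.add(WordBankEntry("deterrent", "Punishments which are swift and sure are the best deterrent.",
                       "n.", "something that discourages an action", "Student B"))
print(bank.words())  # ['deterrent', 'offspring']
```

Storing the example sentence alongside each word is what later makes gapped self-quizzing possible without any extra input from students.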
In addition to the Focus texts, students also read texts related to their domains of study. Students with similar study interests were grouped around domains such as business, computer studies, science, and humanities. Group members were responsible for selecting suitable subject area texts and sharing them with others in the group. Words from these readings were entered regularly into Specialist Word Banks; the links to these appear in the third column of the homepage (Figure 3). Words entered by students in the computers group such as chip, code, and port show that the Word Banks offered good opportunities to study domain-specific words; the inclusion of dynamic, estimate, and herd indicates that other, more general words in these specialist readings were also of interest.
Any claim that learning vocabulary with a collaborative on-line database is effective rests on showing that students are able to generate accurate and useful materials for their own learning. Therefore we were interested in evaluating the quality of these student-produced materials. We were also interested to see if learners provided more informative Word Bank entries for their classmates to study when an interactive feature was added. Contributing to the collaborative on-line Word Bank had been an integral part of the course from the outset, but in the summer of 2002 a new activity was built in. This was the quizzing option (described later in detail), which allows students to test their knowledge of the words entered into the Word Bank by attempting to supply missing words in randomized gapped versions of the student-entered example sentences. Our investigation of the quality of the entries focuses on the example sentences students entered before and after this addition. Thus the first research questions are as follows:
1) What was the quality of the context support for words entered in the on-line Word Bank?
2) Did the quality improve with the addition of the self-quizzing feature?
Another quality concern was the extent to which the on-line word bank served the needs of different types of learners in the group. We recognized that not every student would be interested in all of every other student's word bank entries, but we reasoned that each student would belong to a number of constituencies within the class that had common vocabulary needs. For instance, if a commerce student was interested in a word like tycoon, other students with business interests might be curious about it too. Similarly, if a French-speaking learner was unfamiliar with a word of Germanic origin like swivel, other Romance-language speaking learners in the group might be unfamiliar with it as well. To determine whether the collaborative on-line project was living up to its potential to offer instruction tailored to individual needs, we identified two distinct first language constituencies in the group, Asian versus Romance language speakers, and examined the words they entered in the on-line Word Bank. The Romance language speakers were expected to enter fewer words of Latin and Greek origin since they are able to exploit cognate knowledge for clues to meaning, a strategy not available to Asian language speakers. Thus the third research question was as follows:
3) To what extent did language background affect students' selection of items for study?
The fourth and most important question concerns learning. We were interested in the extent to which students acquired new knowledge of the many words that were targeted in the course. As outlined above, the vocabulary items learners studied came from two main sources: entries in the on-line Word Bank and the AWL. A list of AWL words as well as the collaborative Word Banks created each week were accessible to students for study on the class Web page. The research question that addresses new word learning is as follows:
4) To what extent did learners increase their knowledge of vocabulary targeted for study in the experimental course?
Figure 4. On-line concordance interface
The course familiarized students with a variety of research-based strategies for learning and retaining new vocabulary, but we limited our investigation to five activities that involved interactive on-line tools -- all available on the class Web page shown in Figure 3 (and to interested users at the Compleat Lexical Tutor site, Cobb, 2000). The five activities were as follows: examining concordance examples, consulting an on-line dictionary, reading hypertext, using the quiz feature of the on-line Word Bank, and entering texts into the cloze-passage maker.
The first three -- concordancing, consulting a dictionary, and reading hypertext -- go hand in hand and can be categorized as word discovery strategies. A student who concordances an unfamiliar word is presented with multiple examples of the word drawn from large on-line corpora. To concordance a word, the student types the word into the box labeled "Keyword(s)" as shown in Figure 4 where the word process has been entered. The learner then chooses one of 14 available corpora and clicks on "Search for concordances." The concordancer searches the corpus to find all occurrences of the selected word and displays them in a format that allows the user to see the many different instances of the word in use. A sample of concordance output drawn on the Brown corpus (Francis & Kucera, 1979) for the word process is shown in Figure 2. This million-word corpus is made up of 500 samples of English prose texts selected to represent a wide variety of topics and genres. It serves as the default corpus at the site; other corpora can be selected from the pull-down menu available at the "Select concordance" option.
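The keyword-in-context (KWIC) display that a concordancer produces can be sketched in a few lines: find every occurrence of the keyword and print a fixed window of text on either side, aligned so that the keyword forms a single column. This is a minimal illustration, not the implementation used at the site; it assumes case-insensitive whole-word matching of a single word form (so "Processes" is not returned for "process") and a fixed character window, and the corpus string is a hypothetical stand-in for a real corpus such as Brown.

```python
import re

def concordance(corpus: str, keyword: str, width: int = 30) -> list[str]:
    """Return keyword-in-context (KWIC) lines for whole-word matches of `keyword`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(corpus):
        left = corpus[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = corpus[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left.rjust(width)} {m.group(0)} {right}")
    return lines

corpus = ("The process was slow. Each step in the process matters, "
          "and they will process the forms tomorrow. Processes vary.")
for line in concordance(corpus, "process"):
    print(line)
```

Aligning the keyword in one column is what lets a learner scan the left and right contexts for recurring collocates and grammatical patterns.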
If guessing the meaning from the concordance output proves difficult, the student can access an on-line dictionary definition by requesting a definition from WordNet (see upper right corner of Figure 2). This dictionary feature is also available at the class Web site independent of the concordancer. In addition, students had the option to read class texts (all of which were available on-line) with the help of a third tool, the hypertext feature created by the second author. This tool transforms each word of any entered text into linked hypertext; clicking on any word once allows the learner to hear the pronunciation of the selected item. Clicking twice produces a concordance of the word (drawn from the Brown corpus; Francis & Kucera, 1979), which in turn links to the on-line dictionary. An example of a typical newspaper passage of the type used in the experimental course appears in hypertext format in Figure 5, along with concordance and dictionary support for the word majority.
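The core of the hypertext transformation, turning every word of a pasted text into a link, can be sketched as follows. This is a hypothetical illustration: the link target `https://example.org/concordance?word=...` is a placeholder rather than the tool's real address, and the sketch covers only the link to a concordance, not the click-once pronunciation feature. Non-word characters are HTML-escaped and passed through unchanged.

```python
import html
import re
from urllib.parse import quote

# Hypothetical placeholder endpoint; the real tool's URLs are not given in the text.
CONCORDANCE_URL = "https://example.org/concordance?word={}"

def to_hypertext(text: str) -> str:
    """Wrap every word of `text` in a link to a (hypothetical) concordance page."""
    # With one capturing group, re.split alternates non-words (even indices)
    # and captured words (odd indices).
    parts = re.split(r"([A-Za-z]+(?:'[A-Za-z]+)?)", text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            url = CONCORDANCE_URL.format(quote(part.lower()))
            out.append(f'<a href="{html.escape(url)}">{html.escape(part)}</a>')
        else:
            out.append(html.escape(part))
    return "".join(out)

print(to_hypertext("A majority agreed."))
```

Because every word is linked, learners decide for themselves which items to look up, rather than relying on a teacher's pre-glossed selection.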
Figure 5. Hypertext feature
The other two computer activities can be termed practice strategies. The first of these involves using the quiz feature designed to accompany the on-line Word Bank. Once words and accompanying definitions and examples have been entered into the Word Bank (which stores the entries in alphabetical order), students can create a personalized quiz by checking the boxes to the left of the words they wish to study and then clicking the "Quiz checked items" button. As shown in Figure 6, this produces a screen where the example sentences are randomized and presented in a gapped format. Students can fill in the sentences by choosing from a menu of answer options that consists of the selected words. Help is available in the form of the word class information and definition that accompany the entry. Once the quiz has been completed, the student is shown a score (percentage of correct answers) and information about which items need to be revisited.
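The quiz logic just described (gap each checked word in its own example sentence, shuffle the items, offer the checked words as the answer menu, then report a percentage score and the items to revisit) can be sketched as below. The function names and the fixed random seed are assumptions made for illustration; this is not the Word Bank's actual code, and the word class and definition hints are omitted.

```python
import random
import re
from typing import Optional

BLANK = "________"

def gap(sentence: str, word: str) -> str:
    """Replace whole-word occurrences of `word` in `sentence` with a blank."""
    return re.sub(r"\b" + re.escape(word) + r"\b", BLANK, sentence, flags=re.IGNORECASE)

def make_quiz(checked: dict[str, str], seed: Optional[int] = None):
    """Build a quiz from checked {word: example sentence} items.

    Returns (items, menu): items are (answer, gapped sentence) pairs in random
    order; the menu of answer options is simply the selected words.
    """
    rng = random.Random(seed)
    items = [(word, gap(sentence, word)) for word, sentence in checked.items()]
    rng.shuffle(items)
    return items, sorted(checked)

def score(items: list[tuple[str, str]], answers: list[str]) -> tuple[float, list[str]]:
    """Return the percentage of correct answers and the words to revisit.

    Unanswered items (answers shorter than items) count as wrong.
    """
    if not items:
        return 0.0, []
    to_revisit = [word for i, (word, _) in enumerate(items)
                  if i >= len(answers) or answers[i] != word]
    return 100 * (len(items) - len(to_revisit)) / len(items), to_revisit

checked = {
    "deterrent": "Punishments which are swift and sure are the best deterrent.",
    "offspring": "The animals protect their offspring from predators.",
    "shudder": "The cold wind made her shudder.",
}
items, menu = make_quiz(checked, seed=1)
for _, sentence in items:
    print(sentence)
print(menu)
```

Because the gaps come from students' own example sentences, the quiz only works well if those sentences carry enough context, which is the quality question examined later in the paper.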
Figure 6. Word Bank quiz (based on entries shown in Figure 1)
Finally, the fifth tool is the clozemaker. This feature (designed by the second author) allows a student to enter a text that is then transformed into a gapped passage where words of a selected frequency (1-1000 most frequent, 1001-2000, AWL, or off-list) are missing. As with the Word Bank quiz, the learner fills in a space by choosing the appropriate item from a menu that lists all of the deleted words. This was presented to the students as a useful way to review AWL words. Figure 7 shows an example built from the same passage about cellphones that was used to create the hypertext reading activity in Figure 5. Each of the deleted items appears in the "Target Word Info" box at the top of the exercise. Students who want to check their understanding of one of these items can click on the word; this brings up a concordance along with a link to the on-line dictionary.
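The clozemaker's basic operation, deleting every word that falls in a chosen frequency band and listing the deleted words as the answer menu, might be sketched like this. The band used here is a hypothetical three-word stand-in for the AWL; a real clozemaker would classify words against full frequency lists and word families. The assumption that each deleted word appears in the menu once, in alphabetical order, is also the sketch's own.

```python
import re

def make_cloze(text: str, band: set[str]) -> tuple[str, list[str]]:
    """Gap every word whose lower-case form belongs to `band`.

    Returns the numbered, gapped passage and an alphabetized menu of the
    deleted words (each listed once).
    """
    deleted: list[str] = []

    def replace(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() not in band:
            return word
        deleted.append(word.lower())
        return f"({len(deleted)}) ________"

    gapped = re.sub(r"[A-Za-z]+", replace, text)
    return gapped, sorted(set(deleted))

# Hypothetical stand-in for one frequency band (e.g. the AWL).
awl_band = {"analyze", "data", "percent"}
text = "Researchers analyze data on cellphone use. Ten percent of users share data."
gapped, menu = make_cloze(text, awl_band)
print(gapped)
print(menu)
```

Since any text can be pasted in, the same band of target words can be rehearsed in an open-ended series of new contexts, which is the "expanded rehearsal" the course aimed for.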
Figure 7. Clozemaker exercise with gapped AWL words
We were interested in assessing the extent to which learners used the various computer tools on offer in studying the vocabulary targeted for learning in the course. We also wanted to examine the connection between students' use of the computer tools and any eventual word learning outcomes that occurred in the course. These concerns prompted the final research questions:
5) Which of the on-line activities were used most?
6) To what extent were vocabulary gains achieved in the course associated with use of particular activities?
To summarize, the experimentation focuses on four different aspects of the computer-assisted course. First, we consider the quality of the on-line word bank entries by examining the support for meaning available in example sentences. Second, we explore the Word Bank's potential for addressing varying vocabulary needs in a diverse group of learners by looking for differences in the kinds of words entered by Asian and Romance language speakers. The third aspect is the core issue of whether new word knowledge was acquired in the experimental course; finally, we consider the participants' use of the on-line study tools and possible connections between use and vocabulary growth. In the next sections we describe the methodology and results of the various experiments conducted to answer these questions, beginning with a description of the participants.
Participants and Context
The participants were university ESL learners at two Canadian universities. The 33 students who registered for the first session of the experimental course in the autumn of 2000 represented a variety of first language backgrounds. Fourteen of the students spoke Asian languages (Chinese and Vietnamese) and 12 had a Romance language background (Quebec French, Spanish, or Portuguese). The remaining students had neither Asian nor Romance first language backgrounds (speakers of Arabic, Farsi, and Russian). Abilities in the class varied, but all of the students can be termed intermediate-level learners: all had tested into the near-pass band on the university's placement instrument and had been admitted to the university on the condition that they take ESL courses at the intermediate level to improve their English. Subsequent groups taking the experimental vocabulary course have been similar in character to the original group; however, a different group of participants, high-intermediate learners with French as their first language, tried out the new quizzing feature when it first became available in the summer of 2002.
The Montreal university where most of the data were collected has offered a single session of the experimental vocabulary course each year in the fall semester since 2000. With each session of the course, new on-line study tools have been developed and new research questions have been explored. Because each session is different, each question is explored with an intact group of participants who experienced the same version of the course, rather than by treating the participants across the various sessions as a single group; selecting random samples from the small groups available each year did not seem feasible. The characteristics of the participant groups, the study tools used, and the research questions addressed in the various sessions are outlined in Table 1.
Table 1. Participant groups, on-line tools available, and research questions addressed in the course sessions
Participants | On-line tools available | Research questions addressed
intermediate level, various L1s | Word Bank, | 3 (Asian vs. Romance)
(5 students joined group late -- no analysis of data possible this session)
high intermediate level, L1 = French | Word Bank; new Word Bank quiz | 2 (change in quality)
intermediate level, various L1s | Word Bank; Word Bank quiz | 5 (most used tools); 6 (growth/use connection)
PROCEDURES AND RESULTS
The first three research questions pertain to the Word Bank itself: the usefulness of information offered in the student-created materials and the extent to which they reflected the needs of learners with different first language backgrounds. Answering these questions involved examining sample entries in detail. Answering the remaining questions about word learning and strategy use involved administering tests of vocabulary knowledge and a questionnaire. These procedures and the experimental findings are discussed in detail below, beginning with the investigation of the context sentences students entered in the Word Bank.
Word Bank Entries -- Investigating Quality
To investigate the quality of students' example sentences and the effect of adding the self-quizzing option, we randomly selected two sets of 60 sentences that were entered into the Word Bank during an 8-week course in the summer of 2002. One set sampled entries made during Week 2 of the course, a point at which students were judged to be fully familiar with using the on-line tools to enter items into the Word Banks. The second came from entries made during Week 5, just after the self-quizzing option had been added.
To assess the extent to which an example sentence supported the meaning of the target word, we followed a method inspired by Beck, McKeown, and McCaslin (1983). First, we deleted the target words from the 120 sentences and asked four native speakers to supply the missing items. Two native-speaker raters then evaluated these responses. For instance, the four responses to the gapped version of the sentence "Punishments which are swift and sure are the best ________" were kind, answer, deterrent, and deterrent. A student had entered this sentence as a context sentence for the word deterrent. Responses that bore no clear resemblance to the meaning of the target word (kind and answer) were awarded 0 points, while the two exact matches were each awarded 1 point. In this case, the total score for the example sentence was 2 points (0 + 0 + 1 + 1 = 2). Responses that approached the meaning of the gapped word, such as children (target = offspring) or tremble (target = shudder), were awarded .5 points. Thus the supportiveness score a single rater could give an example sentence ranged from 0 points (no responses resemble the meaning of the target) to 4 points (all four responses match the target exactly). Agreement between the two raters was high (inter-rater reliability = .92). The scores assigned by the two raters were added together, resulting in a single score for each example sentence that could range from 0 to 8 points. The two sets of 60 scores (from Weeks 2 and 5) were then compared using a t-test for independent samples. We expected that the prospect of using the Word Bank to study for class tests, once the interactive quizzing option was added in Week 5, would prompt students to enter more informative sentences.
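The scoring scheme can be restated as a short calculation: each rater credits each of the four informant responses with 1 (exact match), .5 (close in meaning), or 0 (unrelated), sums them to a score from 0 to 4, and the two raters' scores are added to give a sentence score from 0 to 8. The sketch below encodes that arithmetic; the function names are hypothetical, and the combined score for the deterrent example assumes, for illustration only, that both raters judged the responses identically.

```python
# Credit per informant response under the scheme described above.
EXACT, CLOSE, UNRELATED = 1.0, 0.5, 0.0

def rater_score(judgments: list[float]) -> float:
    """One rater's score for a sentence: the sum over four informant responses (0-4)."""
    if len(judgments) != 4 or any(j not in (EXACT, CLOSE, UNRELATED) for j in judgments):
        raise ValueError("expected four judgments of 0, .5, or 1")
    return sum(judgments)

def sentence_score(rater_a: list[float], rater_b: list[float]) -> float:
    """Combined supportiveness score: the two raters' scores added (0-8)."""
    return rater_score(rater_a) + rater_score(rater_b)

# "deterrent" example: responses kind, answer, deterrent, deterrent.
deterrent = [UNRELATED, UNRELATED, EXACT, EXACT]
print(rater_score(deterrent))                # 2.0
print(sentence_score(deterrent, deterrent))  # 4.0, assuming both raters agree
```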
Results -- Question 1
As Table 2 shows, the mean rating for all 120 entries amounted to 2.46 (SD = 2.20). The general picture emerging from this analysis is one of useful example sentences that support the meaning of the target words. When the mean score of 2.46 is halved to arrive at the average score awarded by a single rater, the result (1.23) is just over the score attained when one of the informant responses matches the target exactly (1 + 0 + 0 + 0 = 1) or when two of them are similar in meaning to the target (.5 + .5 + 0 + 0 = 1). Thus the mean score indicates that the sentences typically offered clues to meaning that one or two respondents were able to exploit successfully, although there was clearly also considerable variability.
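Table 2. Mean Supportiveness Ratings of Example Sentences (0-8 Scale)
Sample          n      M      SD
Week 2          60     2.21   2.29
Week 5          60     2.70   2.14
Weeks 2 & 5     120    2.46   2.20
Week 2 vs. Week 5: t = 1.16, p > .05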
The predominance of informative sentences is confirmed in counts of successful and unsuccessful guesses. For 94 of the 120 gapped example sentences (78.33%), at least one rater awarded points; that is, at least one respondent supplied a word that either matched the target or closely approximated its meaning. Only 26 (21.67%) of the sentences were given a score of 0 points by both raters. In other words, over three quarters of the sentences offered useful clues to meaning that one or more of the respondents exploited successfully. These findings support the results of the earlier investigation of Word Bank entries (Horst & Cobb, 2001); that study also found that the quality of example sentences (and definitions) available in the student-produced on-line study materials was high. It is interesting to note that students occasionally complained about spelling or grammar errors they spotted in the Word Bank entries, but to our knowledge none have complained about the semantic information on offer.
Results -- Question 2
Table 2 shows that mean ratings amounted to 2.21 (SD = 2.29) in Week 2 of the course and 2.70 (SD = 2.14) after the new feature was added. The increase in mean ratings hints that students may have become more interested in entering examples that would serve them and their classmates well in the self-quizzing activity. However, the t-test indicated that this difference was not significant. A comparable hint of improved quality over time was found in an analysis of context sentences in the 2000 session (for details, see Horst & Cobb, 2001), but there too, differences were not statistically significant. Since students probably lift context sentences directly from the reading passages rather than carefully constructing informative sentences to support the meanings of entered words, it is not surprising that the quality of the sentences remained fairly consistent over time. Research by Zahar, Cobb, and Spada (2001) indicates that such naturally occurring sentences appear to support word meanings rather well; thus the more pertinent question in the case of the Word Bank entries may have been: Did the students supply enough of the language surrounding an entered word to offer useful clues to meaning? The results of the analysis reported under Question 1 indicate that the answer was yes.
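For readers who want to check the non-significant result against the summary statistics, the pooled-variance t for two unmatched samples can be computed directly from the reported means, SDs, and group sizes. This Python sketch is illustrative only: with the rounded values reported above it gives t of about 1.21 on 118 df, close to but not identical with the reported t = 1.16 (rounding or a slightly different computation may explain the gap). Either value falls well below the two-tailed .05 critical value of roughly 1.98 for 118 df.

```python
import math

def unmatched_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic and df for two independent samples."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df

# Week 2 vs. Week 5 summary statistics as reported in the text.
t, df = unmatched_t(2.21, 2.29, 60, 2.70, 2.14, 60)
print(f"t = {t:.2f}, df = {df}")  # t = 1.21, df = 118
```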
To determine whether students of different L1 backgrounds were using the on-line resources in different ways to meet their varying vocabulary needs, we examined words entered into the Word Bank by students of Asian and Romance language background. To compare the words that learners in the two groups entered, we prepared two sets of 300 words each. The Asian set consisted of 300 items entered in the Word Bank during the first three weeks of the course by 14 learners whose first language was Chinese or Vietnamese. The Romance set consisted of 300 items entered by 12 French, Spanish, and Portuguese speakers.1 Each set was analyzed using lexical frequency profiling software (VocabProfile adapted by Cobb, 2000, from Nation & Heatley, 1996). This program sorts the words of any entered text into the following categories: words that are on the list of the 1-1000 most frequent word families,2 words on the 1001-2000 most frequent list (West, 1953), items on the Academic Word List (Coxhead, 2000), and "off-list" words that do not occur on any of the frequency lists. Since these are categorical data, a chi-square test was used to determine whether patterns in the two data sets differed. We hypothesized that the proportion of entries from the AWL band (which contains many words of Greco-Latin origin) would be larger in the Asian group than in the Romance group.
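The sorting step that a frequency profiler performs can be illustrated with a toy version. This is a minimal sketch, not VocabProfile itself: the tiny word lists, the band assignments, and the FAMILY_HEAD map (a hypothetical stand-in for word-family lookup) are all invented for illustration and are not the actual GSL or AWL contents.

```python
from collections import Counter
from typing import Dict, Iterable, Set

ORDER = ["1-1000", "1001-2000", "AWL", "off-list"]

# Toy stand-ins for the real lists; these assignments are illustrative only
# and are not taken from the actual GSL or AWL.
BANDS: Dict[str, Set[str]] = {
    "1-1000": {"fly", "storm"},
    "1001-2000": {"height"},
    "AWL": {"facilitate", "maximize"},
}
# Hypothetical map from inflected or derived forms to family headwords.
FAMILY_HEAD: Dict[str, str] = {"flew": "fly", "storms": "storm"}

def classify(word: str) -> str:
    """Return the band of the word's family, or 'off-list' if it is on no list."""
    head = FAMILY_HEAD.get(word.lower(), word.lower())
    for name, heads in BANDS.items():
        if head in heads:
            return name
    return "off-list"

def profile(words: Iterable[str]) -> Dict[str, float]:
    """Percentage of the entered words that fall in each band."""
    counts = Counter(classify(w) for w in words)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in ORDER}
    return {name: 100 * counts[name] / total for name in ORDER}

print(profile(["flew", "storm", "height", "facilitate", "shudder"]))
```

The family lookup matters. Without it, an inflected form such as flew would fall through to "off-list" even though its family (fly) is highly frequent.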
Results -- Question 3
The results presented in Table 3 show that this hypothesis was borne out. The proportion of AWL words entered by students with Asian language backgrounds (18%) exceeded the proportion of Romance language entries in this category (11%). On the other hand, the Romance group entered more high frequency words than the Asian group. A chi-square test on these categorical data showed that the pattern of entries in the two data sets differed significantly (χ² = 13.83, df = 3, p < .05).
Table 3. Distributions by Frequency of 300 Words Entered by Two L1-Based Groups
Frequency band   % in Asian group   % in Romance group
1-1000           7                  18
1001-2000        5                  9
AWL              18                 11
Off-list         69                 61
The symmetrical differences between the two groups are especially striking if the two high frequency categories (entered words in the 1-1000 and 1001-2000 most frequent bands) are taken together, as shown in Figure 8. There we see that a total of just 12% (7 + 5) of the Asian entries were highly frequent English words, but more than twice as many of the entries made by Romance language speakers were words from this category: over a quarter (18 + 9 = 27%) of all the words they entered were on the list of the 2,000 most frequent English word families. Many of the most common English words are of Anglo-Saxon origin and have no cognate equivalents in Romance languages; this makes them more likely to be unfamiliar to Romance speakers than less frequent Latin-based English words such as facilitate or maximize. The occurrence of common words of Germanic origin like flew, storm, and height on the list of Romance entries suggests that learners in this group were indeed directing their attention to non-cognates. It is also possible that factors other than access to cognates have a role in accounting for the results (e.g., differing perceptions in the two groups as to whether entries should be totally new items or might also include familiar but only partially understood words). In any case, there is clearly reason to believe that both groups were well served by the word learning opportunities offered in the interactive on-line Word Bank. The bar chart also shows that the majority of the words students in both groups entered were in the low frequency "off-list" zone (69% in the Asian group and 61% in the Romance group); this is the category of words we expected the Word Bank would be used for.
Figure 8. Distributions of 300 entered words in two L1-based groups
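The chi-square test reported for Question 3 compares observed counts with the counts expected if band membership were independent of L1 group. The Python sketch below implements the standard Pearson statistic for any r x c table. Table 3 gives only rounded percentages, so the study's exact counts cannot be recovered; the counts in the example are placeholders, not the study's data. With two groups and four frequency bands, df = (2 - 1)(4 - 1) = 3, and the .05 critical value for 3 df is about 7.81, which the reported χ² = 13.83 exceeds.

```python
from typing import List, Tuple

def chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Placeholder counts, NOT the study's data: rows are the two L1 groups,
# columns are the bands 1-1000, 1001-2000, AWL, off-list.
stat, df = chi_square([[10, 20, 30, 40], [20, 20, 20, 40]])
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 5.33, df = 3
```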
A study of the first session (Horst & Cobb, 2001) showed that learners acquired knowledge of AWL vocabulary in the experimental course, but showed little evidence of increased knowledge of words entered in the Word Bank. The likely explanation was the use of a standardized research instrument (The Vocabulary Levels Test; Schmitt, Schmitt, & Clapham, 2001) that tested 30 AWL words but only a few of the items that students had entered into the Word Bank. A goal in the most recent session was to create and test more sensitive, purpose-built measures. Developing the new measures involved selecting three magazine texts to be read in the course (in addition to student-selected readings). Since the investigation of Romance and Asian entries reported above indicated that most of the words students selected for entry into the Word Bank were off-list items (words that occur neither on the list of the 2,000 most frequent English word families nor on the AWL), we decided to use off-list words that occurred in the magazine readings as test targets on a pre-test.3 We expected that when students eventually read the texts and entered words into the Word Bank, entries would include some of these pre-tested words. We would then be able to administer a post-test at the end of the session that would allow us to compare students' knowledge of words they had entered into the Word Bank (and studied using tools available on the class website) to their knowledge of words that had not been entered.
The procedure was as follows: At the outset of the session, the students were asked to rate their knowledge of a random sample of 150 off-list words that occurred in the magazine readings, 50 from each of the three texts. The ratings instrument presented the students with the words and required them to indicate whether they knew the meaning of an item by choosing one of three options: YES (sure I know it), NS (not sure) or NO (I don't know it), as shown in Figure 9. This is an adaptation of a technique developed and tested by Horst and Meara (1999). In later weeks, students read the pre-selected texts along with the other course readings and entered unfamiliar words into the Word Bank as usual. As it happened, 21 of the 150 pre-tested words were eventually entered into the Word Bank by students in the course and so made available for study by all. This meant that by the end of the course it was possible to ask students to rate their knowledge of the pre-tested words again and assess the learning effects of the Word Bank activities by comparing knowledge ratings for the 21 entered words to ratings for the remaining 129 words that had also been encountered in course readings but were not entered in the Word Bank. Differences in the percentages of words students rated YES in the two conditions were tested using a t-test for matched samples.
Figure 9. Sample items on ratings measure
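The matched-samples comparison described above can be sketched as follows: each learner contributes a percentage of words rated YES at each time point, and the t-test works on the per-learner differences. This Python sketch is illustrative only; the function names and the four learners' percentages are hypothetical, not the study's data.

```python
import math
from statistics import mean, stdev
from typing import List

def percent_yes(ratings: List[str]) -> float:
    """Share of items a learner rated YES (the other options are NS and NO)."""
    return 100 * sum(r == "YES" for r in ratings) / len(ratings)

def paired_t(before: List[float], after: List[float]):
    """t statistic and df for matched samples (one before/after pair per learner)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(after, before)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / se, len(diffs) - 1

# Hypothetical pre/post percentages for four learners (not the study's data).
pre = [40.0, 35.0, 50.0, 30.0]
post = [75.0, 70.0, 80.0, 70.0]
t, df = paired_t(pre, post)
```

Because each learner serves as their own control, stable differences between stronger and weaker learners drop out of the comparison.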
In addition to the ratings instrument (a self-assessment measure that leaves room for over-estimation of gains), the experiment also included an individualized end-of-course test that required students to demonstrate knowledge of words. Creating this test involved identifying, for each participant, 10 words that met the following criteria: all 10 words were items the participant had rated NO (not known) at the beginning of the course; 5 of these had eventually appeared in the Word Bank, while the remaining 5 had not. The test (based on Wesche & Paribakht's Vocabulary Knowledge Scale, 1996) required students to produce a synonym of a target word and, if possible, to also incorporate it in a meaningful sentence. Sample questions from the demonstration test are shown in Figure 10. Numbers of words that students were able to either define accurately or define accurately and use in a correct sentence (see answer formats 2 and 3 in Figure 10) were tallied. Then success rates for words that were not entered in the Word Bank were compared to those for words that had been entered. Again, t-tests for paired data were used to test the difference in means in performance on the two sets of words.
1. I don't know what this word means.
2. I am not sure. I think it means ...........................................
(Give the meaning in English, French, or your language.)
3. I know this word. It means ........................................ and I can use it in a sentence. (Write the sentence.)
Figure 10. Sample items on demonstration measure
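The item-selection criteria for the individualized demonstration test can be sketched as a simple filter. This is a minimal Python sketch under assumptions: the text does not say how the 5 words in each condition were chosen when more than 5 qualified, so random sampling here is an assumption, and the function name and data layout (a dictionary of pre-course ratings and a set of entered words) are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

def build_demo_test(pre_ratings: Dict[str, str], entered: Set[str],
                    k: int = 5, seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Pick k entered and k un-entered words that a learner rated NO at the outset.

    Random selection within each condition is an assumption of this sketch.
    """
    rng = random.Random(seed)
    unknown = sorted(w for w, rating in pre_ratings.items() if rating == "NO")
    in_bank = [w for w in unknown if w in entered]
    not_in_bank = [w for w in unknown if w not in entered]
    if len(in_bank) < k or len(not_in_bank) < k:
        raise ValueError("not enough NO-rated words in one of the two conditions")
    return rng.sample(in_bank, k), rng.sample(not_in_bank, k)

# Hypothetical learner: twelve words rated NO, one rated YES; six NO words entered.
ratings = {f"w{i}": "NO" for i in range(12)}
ratings["known"] = "YES"
entered = {f"w{i}" for i in range(6)}
bank_items, other_items = build_demo_test(ratings, entered, seed=1)
```

Restricting both sets to NO-rated words means that any word answered correctly at the end of the course can be counted as a gain, whichever condition it belongs to.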
Results -- Question 4
Pre-post comparisons of mean percentages of words rated YES indicated that all 14 participants knew more words in both entered and un-entered categories at the end of the course than they had at the beginning. Knowledge of the 129 words that students met in the selected passages but that were not entered in the Word Bank increased significantly from about 53% to 69%, a gain of roughly 16 percentage points (t = 9.21, p < .0001); this small gain is consistent with accounts of word learning through naturalistic exposure in conditions where the cognitive processing demands are relatively low (Laufer & Hulstijn, 2001). Knowledge of the 21 items that were entered in the Word Bank increased more substantially, from around 39% at the beginning of the course to about 77% by the end, an increase of over 37 percentage points and more than double the gain made on the un-entered words. This increase was also significant (t = 10.61, p < .0001). The change is especially striking since the mean knowledge level of these words was initially lower than that of the un-entered words and ended higher. These results are shown in Table 4.
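Table 4. Mean Percentages of Words Rated YES at the Beginning and End of the Course
Words                   Pre-test   Post-test   t       p
Un-entered (n = 129)    ~53%       ~69%        9.21    < .0001
Entered (n = 21)        ~39%       ~77%        10.61   < .0001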
The second test required learners to demonstrate knowledge of words they had identified as not known (i.e., rated NO) at the outset of the course. The mean percentage of words for which knowledge was successfully demonstrated amounted to 17.5% in the case of the 5 un-entered items, while the figure for the 5 entered items was nearly double at 31%. A t-test for correlated samples indicated that this difference narrowly missed significance at the .05 level (t = 2.04, p = .06). These results appear in Table 5. The findings of this demonstration test corroborate the gains registered on the ratings instrument: the roughly doubled success rate for entered words on the demonstration test mirrors the more than doubled gain for entered words on the ratings measure. There is therefore reason to assume that gains reported on the ratings measure reflect demonstrable increases in knowledge of word meanings rather than optimistic over-estimations. In sum, the results indicate that by the end of the course, learners had gained and retained knowledge of about a third of the words entered in the Word Bank, at the fairly high criterion of being able to produce accurate definitions.
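Table 5. Mean Percentages of Words for Which Knowledge Was Demonstrated
Words                         % demonstrated
Un-entered (5 per learner)    17.5
Entered (5 per learner)       31
t = 2.04, p = .06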
Keys to Success -- The Strategy Questions
Students' use of the five resources (the on-line dictionary, the concordancer, the Word Bank quiz feature, hypertext reading, and the cloze maker) was assessed in a survey administered at the end of session 4. Students were asked to indicate how often they used each tool by choosing one of the following options: "never," "once or twice," "fairly often," "very often," and "almost always." Each answer was assigned a numerical value ranging from 0 for never to 4 for almost always. Differences in mean use scores for the five tools were investigated using an ANOVA for matched (repeated-measures) samples. The possible relationship between use of a particular tool and vocabulary gains was explored using regression analysis, with use scores for the various tools as the independent variables.
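The coding of survey answers can be sketched in Python. This is an illustrative sketch, not the instrument used in the course: the function names and the sample answers are hypothetical, and the describe helper simply names the pair of verbal anchors a mean falls between, which is how a mean such as 2.43 comes to be read as "fairly often to very often" below.

```python
from statistics import mean, stdev
from typing import List, Tuple

# Answer options in order, so that each option's index is its score (0-4).
SCALE = ["never", "once or twice", "fairly often", "very often", "almost always"]

def use_score(answer: str) -> int:
    """Map a survey answer to 0 (never) ... 4 (almost always); unknown answers raise ValueError."""
    return SCALE.index(answer)

def summarize(answers: List[str]) -> Tuple[float, float]:
    """Mean and sample SD of the coded answers for one tool."""
    scores = [use_score(a) for a in answers]
    return float(mean(scores)), stdev(scores)

def describe(m: float) -> str:
    """Name the pair of scale anchors a mean score falls between."""
    lower = min(int(m), len(SCALE) - 2)
    return f"{SCALE[lower]} to {SCALE[lower + 1]}"

# Hypothetical answers from four learners about one tool.
m, sd = summarize(["fairly often", "very often", "fairly often", "once or twice"])
```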
Results -- Question 5
Mean ratings indicated that the most used strategies were consulting the on-line dictionary directly (M = 2.43, SD = .85) and using the Word Bank quiz feature (M = 2.36, SD = .84). The group means place use of these two strategies in the "fairly often" to "very often" range. Results for all five strategies are shown in Table 6. The ANOVA (df = 4) and post hoc Tukey test indicated significant differences (p < .05) between the two most used features (dictionary and Word Bank quiz) and the two least used features (concordance and hypertext). Other comparisons did not yield significant differences. The finding that the dictionary was popular is not surprising. In the weekly task of entering five words into the Word Bank, pasting in WordNet definitions was probably an appealing alternative to manually typing in definitions from a paper dictionary. The attraction of the Word Bank quiz is also clear: students presumably used this resource as they studied for midterm and final tests on Word Bank items. Mean use of the clozemaker, which approaches the "fairly often" level (M = 1.79, SD = .80), was seen as unexpectedly high by the course teacher, who reported that she had directed relatively little attention to this option in class.
Results -- Question 6
In an earlier study, a near-significant relationship was found between gains made on AWL words in the course and use of the concordancer. Even though use of this strategy was not particularly high, the multiple regression analysis suggested that concordancing made a unique contribution to variance in scores (Horst & Cobb, 2001). In this study, no significant relationships were found. Of the five variables, the one that pointed to a possible connection to word gains was use of the Word Bank quiz (r = .39, p = .09). The small size of the participant group (n = 14) may explain the lack of clear findings. Also, since students were free to study the words as they pleased, other, more traditional ways of studying may have obscured the contribution of the on-line tools.
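The sample-size point can be made concrete with the standard conversion of a Pearson correlation to a t statistic, t = r√(n − 2)/√(1 − r²), on n − 2 degrees of freedom. The Python sketch below is illustrative. The 2.179 critical value (12 df, two-tailed, .05) comes from standard t tables, and the sketch is not meant to reproduce the p = .09 obtained in the regression analysis. With n = 14, r = .39 gives t of about 1.47, and a correlation of roughly .53 would have been needed to reach the conventional two-tailed .05 level.

```python
import math

def t_for_r(r: float, n: int):
    """t statistic (df = n - 2) for testing whether a Pearson r differs from zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r * r), df

def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| whose t statistic reaches a given critical t value."""
    return t_crit / math.sqrt(df + t_crit * t_crit)

t, df = t_for_r(0.39, 14)          # about 1.47 on 12 df
r_needed = critical_r(2.179, 12)   # about .53 for two-tailed p < .05
```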
In sum, the results of our experimentation so far are positive and augur well for the further development of interactive on-line activities that offer rich input and encourage deeper processing. We can point to a number of findings:
First, the experimental course has proved its feasibility. The computer-based materials were usable and able to handle the volume of vocabulary processing that researchers have long argued was possible, but which we believe is only practical in a networked context where students share their words and not every instance of processing or rehearsal must pass through a teacher.
Second, the learners have shown themselves able to submit Word Bank entries (interesting words of general applicability, clear examples, correct part of speech, suitable creation or selection of definition) that can be used by other learners (see also Horst & Cobb, 2001). The language of their example sentences is informative, and there is no tendency to produce example contexts too short to make sense of. Further, the learners may have the capacity to provide each other with even clearer contexts, as hinted by the upward (though not statistically significant) movement in contextual support levels when the quiz option was added to the Word Bank.
Third, our process and materials seem both usable and shareable by learners with fundamentally different starting points (Romance and Asian language backgrounds) and different objectives (various specialist areas).
Fourth, it seems that many of these words, at least those that pass through the Word Banks and the numerous opportunities for further processing these provide, are not only processed but also learned, both receptively and productively.
Fifth, the learners showed good interest in deeper processing of new words on at least some occasions. For example, they could have been content to meet AWL words in word lists and banks and in self-quizzes that asked them to replace the word in the same context, but instead they took the trouble to generate novel AWL cloze passages, in which they had to restore AWL words to gaps in texts of their own choosing, close to "fairly often" (M = 1.79).
We believe that the tools investigated in this study make a promising start on the program outlined by Sökmen (1997) for computer-assisted vocabulary learning. We took as a point of departure her challenge to develop vocabulary acquisition tools that
- are based on a corpus,
- expand and vary opportunities for rehearsal, and
- engage the learner at a deep level.
We have tried to operationalize these ideas in one of the several ways this might be done. To itemize, our course syllabus includes the AWL, which is based on frequency analysis of a corpus, and our learners have direct access to corpus information via the concordancer. Our learners have numerous and varied opportunities for rehearsal, such as re-encountering words in spoken form, dictionaries, on-line Word Banks, and self-administered and teacher-administered quizzes. Deeper learning is encouraged by having learners contribute their own words, contexts, and definitions to the course materials, and by providing them with opportunities to meet words in novel contexts through the concordance and the cloze-building features. In addition, we took up Sökmen's challenge to consider the "world of the Internet as a source for meaningful vocabulary activities" (p. 257), but in our work the Internet is more than a source; it is also a medium through which to learn. A corpus approach, at least as we have realized it, is really only practicable if undertaken in a networked computer context -- the corpus access, collaboration, and general volume of our syllabus all depend on it.
Yet the deeper processing question remains far from answered even in the context of our course. We found in the earlier study (Horst & Cobb, 2001) that concordancing, while not immensely popular, appeared to be predictive of learning. Here, there was less use of concordancing, possibly because the clozemaker program offered some of the same benefits of meeting words in new contexts but in a more coherent textual scheme. However, concordancing has benefits of its own, such as the number and breadth of contexts for a given word and the possibility of offering it as a help option at an opportune moment (while working on a cloze passage, for example), that make us want to continue developing ways to make concordancing more usable.
Our future plans for this course are threefold:
- Materials. The Word Bank must be easier for teachers to use. The next round of this course will offer a new teacher-edit function, so any errors in students' Word Bank entries can quickly be cleaned up. The resources can also be expanded. Since the period of this study, a number of new on-line dictionaries have become available, including excellent advanced learner dictionaries from Cambridge (2004) and Longman (n.d.), and specialist ones such as Greaves' (n.d.) bilingualised English-Chinese lexicon. We intend to offer a menu of such resources. Finally, the search goes on for a good learner corpus to replace the Brown Corpus we are currently using. Better general and specialist corpora are needed. Ideally, a general corpus would be large enough to consistently offer 10 or more contextual examples for any of the thousands of middle-to-low frequency words that an academic learner of English might opt to look up, but would not feature the many extraneous off-list items that make the interpretation of even common words problematic in concordances based on currently available corpora. Specialist corpora for specific domains of study are less of a problem to develop, in principle, following a procedure established some years ago in Sutarsyah, Nation, and Kennedy (1994). Yet to our knowledge, none have been developed even for the most common academic disciplines.
- Learner tracking. We have begun looking at which resources learners are using (concordances, cloze passages, etc.), but we need to look more closely, as a step toward tying resource use to learning outcomes. In the next run of this course, we will track concordance use specifically. Since concordancing is available as a help option in completing cloze passages and elsewhere in the suite of activities, it may be getting use that students do not recall when asked about it separately on an end-of-course survey. It should be a fairly simple matter to link use of this and other resources to the IP (Internet Protocol) addresses of learners' most often used computers and begin to track the sources and resources of learning.
- Better testing. There appears to be no suitable standard instrument available for assessing gains in an advanced vocabulary course. The Vocabulary Levels Test serves well at the 2,000 and AWL levels, but at the 5,000-10,000 level its 30 test words represent 5,000 word families (roughly one item per 167 families), which is far too coarse a sample to register gains of this kind. Students might well learn or begin to learn scores of new words in this frequency zone without producing a ripple on such a test. In this study we have experimented with ways of developing pre-post tests more closely tied to the words actually encountered, and shall continue to pursue this avenue. We plan to draw on techniques piloted in research by Horst (2001) to test changes in levels of partial vocabulary knowledge; measures that are sensitive to incremental growth may register acquisition at the level and pace of our experimental course more clearly.
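The learner-tracking plan in the list above amounts to tallying resource use per IP address. This is a minimal Python sketch under loud assumptions: the log format (pairs of address and resource name), the resource labels, and the addresses (drawn from the 192.0.2.x documentation range) are all hypothetical. An address stands in for a learner only if each learner mostly works at one computer; shared machines would blur the mapping.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def tally_by_ip(log: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count uses of each resource per IP address from (ip, resource) log entries."""
    usage: Dict[str, Counter] = defaultdict(Counter)
    for ip, resource in log:
        usage[ip][resource] += 1
    return dict(usage)

# Hypothetical log entries: documentation-range addresses and invented labels.
log = [
    ("192.0.2.10", "concordance"),
    ("192.0.2.10", "cloze"),
    ("192.0.2.10", "concordance"),
    ("192.0.2.22", "wordbank_quiz"),
]
usage = tally_by_ip(log)
print(usage["192.0.2.10"]["concordance"])  # 2
```

Per-address counts of this kind could then be set beside each learner's test gains, in place of self-reported use from an end-of-course survey.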
To wrap up, one corpus-based approach to on-line vocabulary acquisition has shown itself viable and has passed the first experimental tests. And yet, we feel there is at least as much work in front of us as behind us. Sökmen has set ambitious goals for research and activity design; it will take a great deal of energy and dedication to meet them.
1. To ensure the validity of the comparison, we excluded from the analysis words entered by a participant who spoke both Chinese and French fluently.
2. Following guidelines by Bauer and Nation (1993), a word family is defined as a root word (e.g., produce) and its derived forms (e.g., product, production, unproductive).
3. The choice of off-list targets was based on the assumption that the subjects would be thoroughly acquainted with high frequency English words. This assumption seems somewhat questionable given the findings for learners of Romance language background. As Figure 8 shows, over a quarter of the Romance entries appear on West's (1953) list of the 2000 most frequent English word families.
This research was supported by generous funding from Fonds québécois de la recherche sur la société et la culture (FQRSC) and Concordia University. We are also grateful for the insightful comments provided by several anonymous reviewers.
ABOUT THE AUTHORS
Marlise Horst is an assistant professor at the TESL Centre of the Department of Education at Concordia University in Montreal. Her research focuses on extensive reading and computer-assisted vocabulary learning.
Tom Cobb is a professor in the département de linguistique et de didactique des langues at Université du Québec à Montréal. He specializes in developing on-line tools for vocabulary learning and research.
Ioana Nicolae recently completed an MA in Applied Linguistics at Concordia University. Her thesis research investigated grammatical and semantic aspects of word knowledge in university learners of English.
Bauer, L., & Nation, P. (1993). Word families. International Journal of Lexicography, 6(4), 253-279.
Beck, I. L., McKeown, M. G., & McCaslin, E. (1983). Vocabulary development: All contexts are not created equal. Elementary School Journal, 83, 177-181.
Biber, D., Conrad, S., & Reppen, R. (1994). Corpus-based approaches to issues in Applied Linguistics. Applied Linguistics, 15(2), 169-189.
Cambridge Advanced Learner's Dictionary. (2004). Available at http://dictionary.cambridge.org/
Cobb, T. (1999). Breadth and depth of vocabulary acquisition with hands-on concordancing. Computer Assisted Language Learning, 12(4), 345-360.
Cobb, T. (2000). The compleat lexical tutor [Web site]. Available at http://www.lextutor.ca/
Cobb, T., & Horst, M. (2004). Is there room for an AWL in French? In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language (pp. 15-38). Amsterdam: John Benjamins Publishing.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268-294.
Ellis, N. (1994). Vocabulary acquisition: The implicit ins and outs of explicit cognitive mediation. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 211-282). London: Harcourt & Brace.
Francis, W. N., & Kucera, H. (1979). A standard corpus of present-day edited American English, for use with digital computers. Providence, RI: Department of Linguistics, Brown University.
Greaves, C. (n.d.). The virtual language centre [Web site]. Available at http://vlc.polyu.edu.hk/
Haywood, S. (n.d.). Academic vocabulary [Web site]. Available at http://www.nottingham.ac.uk/~alzsh3/
Hazenberg, S., & Hulstijn, J. (1996). Defining a minimal receptive second language vocabulary for non-native university students: An empirical investigation. Applied Linguistics, 17(2), 145-163.
Horst, M. (2001). Text encounters of the frequent kind: Learning L2 vocabulary through reading. Unpublished doctoral dissertation, University of Wales, Swansea.
Horst, M., & Cobb, T. (2001). Growing academic vocabulary with a collaborative on-line database. In B. Morrison, D. Gardner, K. Keobke, & M. Spratt (Eds.), ELT perspectives on IT & multimedia; Selected papers from the ITMELT conference 2001 (pp. 189-225). Hong Kong: Hong Kong Polytechnic University.
Horst, M., & Meara, P. (1999). Test of a model for predicting second language lexical growth through reading. Canadian Modern Language Review, 56(2), 308-328.
Hulstijn, J. H. (2001). Intentional and incidental second language vocabulary learning: a reappraisal of elaboration, rehearsal and automaticity. In P. Robinson (Ed.), Cognition and second language instruction (pp. 258-286). Cambridge, England: Cambridge University Press.
Laufer, B. (1989). What percentage of lexis is necessary for comprehension? In C. Lauren & M. Norman (Eds.), From humans to thinking machines (pp. 316-323). Clevedon, England: Multilingual Matters.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In P. J. L. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 126-132). London: Macmillan.
Laufer, B., & Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: The construct of task-induced involvement. Applied Linguistics, 22(1), 1-26.
Longman Web Dictionary. (n.d.). Available at http://www.longmanwebdict.com/
Luton, G. (2000). Gerry's vocabulary teacher [Software]. Staffs, England: Creative Technologies. Available at http://www.cict.co.uk/software/gvd/features.htm
Mason, D. (n.d.). Culture shock [Web site]. Available at http://international.ouc.bc.ca/cultureshock/
McCarthy, M., & Carter, R. (1997). Written and spoken vocabulary. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 20-39). Cambridge, England: Cambridge University Press.
Nagy, W. (1997). On the role of context in first- and second-language vocabulary learning. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 64-83). Cambridge, England: Cambridge University Press.
Nation, I. (1982). Beginning to learn foreign vocabulary: A review of the research. RELC Journal, 13(1), 14-36.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge, England: Cambridge University Press.
Nation, I. S. P., & Heatley, A. (1996). Vocabprofile, word and range: Programs for processing text. Wellington, New Zealand: LALS, Victoria University of Wellington.
Nation, P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge, England: Cambridge University Press.
Schmitt, N. (Ed.). (2004). Formulaic sequences. Amsterdam: John Benjamins.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55-88.
Simpson, R., & Mendis, D. (2003). A corpus-based study of idioms in academic speech. TESOL Quarterly, 37(3), 419-441.
Sökmen, A. J. (1997). Current trends in teaching second language vocabulary. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 237-257). Cambridge, England: Cambridge University Press.
Sutarsyah, C., Nation, P., & Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based study. RELC Journal, 25(2), 34-50.
Wesche, M., & Paribakht, T.S. (1996). Assessing vocabulary knowledge: Depth versus breadth. Canadian Modern Language Review, 53(1), 13-39.
West, M. (1953). A general service list of English words. London: Longmans, Green and Co.
Xue, G., & Nation, I. S. P. (1984). A university word list. Language Learning and Communication, 3, 215-229.
Zahar, R., Cobb, T., & Spada, N. (2001). Conditions of vocabulary acquisition. Canadian Modern Language Review, 57, 541-572.