Language Learning & Technology
Vol.11, No.1, February 2007, pp. 102-108



Michael Rost
Lateral Communications


In my work as an author and teacher trainer, I have the opportunity to travel around the world and talk to teachers in a variety of settings. Though I meet teachers with a range of backgrounds and a wide disparity of resources, I find that a few common themes come up whenever I talk with teachers about language teaching and technology. One of the familiar refrains is that most of us claim to lack the technological resources we feel we need to teach effectively. There’s always something new on the horizon that we feel we just have to have. Another recurring theme is the lament that most of our students just don’t seem to take advantage of the extra learning opportunities we present them anyway! Teachers want to help, but often feel underappreciated for their efforts.

Personally, I have relished the ongoing advances in technology over the course of my teaching career. I started out as a secondary school teacher in Togo, West Africa with chalk – sometimes yellow or pink! – and a blackboard as my only teaching technology. When teachers express a sense of being overwhelmed by new technology, I sometimes talk about my own beginnings and also remind them of a few of Donald Norman’s principles of human-centered design. According to Norman (2004), for any new technology to be effective, it must be intuitively helpful and elegantly efficient. In the case of language teaching, this means the technology must – immediately and transparently – help us teach better than we do already. If it doesn’t, we simply shouldn’t use it. In addition, Norman says, for any new technology to be widely adopted, it must appeal to the emotions as well as to reason. If people don’t enjoy using a particular technology, no matter how logically useful it may be, they will tend to shun it. 

Perhaps because as language teachers we tend to favor eclecticism, we will often throw any emerging technology into the mix as a "helpful resource." As Doughty and Long (2003) point out, teachers often do not distinguish between new technological tools that are innovative but not actually helpful and those which are innovative and genuinely helpful. In my own instructional design, I have identified three "intervention phases" in the listening process: decoding, comprehension, and interpretation. Before we assume any new technology or intervention is actually going to be supportive, I believe we need to understand the learners' goals during these listening processes. What actually motivates the learners towards achieving these goals is what ultimately will be useful.


The current issue of Language Learning & Technology offers three articles that provide frameworks for evaluating technology in the teaching of listening, in that they examine some of the variables that affect quality of instruction.

In the first article, "Help options and multimedia listening," Grgurovic and Hegelheimer provide a study of input, task and feedback modifications for a recorded academic lecture. An operational goal of their study is documenting how the frequency and time of use of different help options affect learner comprehension. This study confirms the current position in much CALL research that the additional "interactions" with support options not only tend to aid text comprehension, as demonstrated by an increase in post-listening test scores, but also promote language acquisition, as inferred through the input-interaction hypothesis (that is, if interaction, specifically repair-motivated interaction, promotes comprehension, and if comprehension promotes acquisition, then interaction promotes acquisition). While the audio-video input in the study, an astronomy lecture, is not itself modified or elaborated, the opportunities for processing input are amplified through the optional use of repeated viewings, subtitles, transcripts, lexical pushdowns, and feedback on responses to comprehension questions.

The main value of the Grgurovic and Hegelheimer study, in my view, is not so much the attempt to substantiate the position that increased interaction (human-human or human-machine) tends to promote comprehension and acquisition. For me, a more pragmatic value is in the authors’ investigation of the learners' patterns of navigation for the support options. The authors suggest that navigation patterns in use of subtitles, transcripts, dictionary look-ups, and explanations are related to proficiency: More proficient learners make more use of the additional options. This is not surprising, of course. Lower proficiency students generally seek less input because if they are processing input inaccurately or incompletely, more input usually leads to more confusion. Because of this "help option dilemma", teachers and media designers need a qualitative instructional paradigm that introduces support options in ways that are intuitively supportive to the learners. For instance, Martinez (2001) suggests that different types of learners ("transforming learners," "performing learners," "conforming learners") prefer different sequencing and alternative representations of "support" during a task. If we want to assist learners as they listen, we cannot simply assume that more intervention leads to better learning outcomes.

In the second article of this volume, "Are They Watching?", Wagner provides a study of listener behavior in video-based test-taking situations. Prior to the description of his own study, the author provides a valuable survey of recent studies on the use of video to teach listening. Wagner leads up to the now axiomatic claim that multimedia experiences provide learners with richer, more authentic and more memorable encounters with the target language. The well-established arguments for teaching listening through subtraction of all but the audio channel are outweighed by the value of authentic, engaging input. (There is, however, a ceiling effect on the value of richness in multimodal input for teaching purposes, particularly concerning simultaneous presentation of graphics, text, and audio. See Clark and Mayer (2002) for a discussion of how "seductive details" in multimedia can depress learning.)

The hub of the Wagner study is an observation about viewer attention that previously had not been clearly documented. Wagner investigates the link between visual input display and attention to input. He finds that learners in his study do consistently attend to the video portion of the input in his constructed testing situations, "orienting" to the video screen on average about 70% of total time on task. This is a useful starting point for probing the notion of learner attention, though, in my view, there’s not sufficient accounting for the non-verbal version of the Observer’s Paradox. The students in the study had a video camera pointed at them from the top of each monitor during the entire test. This feature could well have stimulated additional orientation to the screen, in effect encouraging the behavior the researcher was trying to measure.

One of Wagner’s queries about his research results is especially intriguing. He wonders why learners oriented less to the video screen during dialogue input than they did during monologue input (there was about a 10% time differential). He points out that in dialogue settings a great deal of social information needed for comprehension is transmitted visually, so it would seem that listeners should use their visual channel more when processing this kind of conversational input.

My own understanding of the work on bimodal processing is that attending to multiple modes provides greater redundancy. Listeners need redundancy of all sorts. It is an essential condition for effective language processing. However, redundancy is not purely an additive-subtractive construct (Moreno & Mayer, 2002; Paivio, 1986; Reed, 2006). Once two or more modalities are combined during input processing, they cannot be separated out. The observation of this phenomenon can be traced to experimental studies of perception in the 1970s, first described by McGurk and MacDonald (1976) and later dubbed the "McGurk effect." This phenomenon demonstrates that human information processing tends to be visually dominated, but the information we perceive through hearing and through sight is coordinated, interrelated, and irreversible. In Wagner’s study, once the learners start watching and listening, and have ongoing access to both input channels, they will be utilizing both sight and hearing simultaneously. My sense is that the viewer experiences a "fused perception", which does not occur in an alternating or additive fashion, even when the viewer is seemingly ignoring the video input. Once viewers have become engaged in processing meaning from the video (images + sound), even when they temporarily turn away from the monitor, they are still "seeing" images, and when they shut out the audio input for a moment, they are still "hearing" the associated sound.

In the third article of this volume, "Using Digital Stories to Improve Listening Comprehension with Spanish Young Learners of English", Ramirez and Alonso provide a contextualized study of young Spanish learners and a relatively innovative methodology. The study involves the use of a "project-based website" that offers graded content lessons in the form of games, songs, and stories. The authors describe their experimental group as using "an internet-based technology" while the control group used the traditional textbook approach. The precise difference in treatments here is a bit difficult to ascertain, but it seems to be fundamentally the richness of the content and an "interactivity" platform that the teaching procedure provides for the experimental group. For the treatment group, in which a novel collaborative learning environment is established, teachers could easily and transparently integrate the technology: click on specific parts of the story graphic when requested, activate various segments of the story for quick replays, elicit responses in the form of video game moves that the children know and enjoy. This novel approach, with its spontaneous interactivity, is consistent with a fundamental sociocultural principle of learning: Learning = transformation of participation (Rogoff, Matusov, & White, 1996; Rowell, 2002).

To me, a key theme of the Ramirez and Alonso study is their excitement about the shift in participation patterns of the learners and their teachers. The teaching methodology required the learners to take an active role in the listening comprehension of the story, and presumably also required the teachers to take a more active role in teaching listening. As I understand the teaching procedure, each story unfolded in short chunks, with the children, as a group, deciding on the emotional state of a character or making a prediction about the next action before the story continued to the next phase. The authors conjecture that this particular form of participation, coupled with the richness of the input, "promoted concentration" and "focused the children’s attention on the oral input" (p. 96). Their conjecture (supported by teacher diaries) is consistent with recent listening strategies research: the goal of strategy training is to enhance concentration and inferencing (Graham, 2003; Rost, 2002; Vandergrift, 2003). Better listeners are more active in that they use their cognitive resources (such as inferencing) and social resources (such as asking questions) more intentionally. The basic claim in the strategy training research is that the effort to become more active in specific ways (such as predicting actions and construing a speaker’s motives) will make a learner a more effective listener. Moreover, especially in an EFL context, meta-cognitive strategy training – learning how to think about the listening process and how to participate – can lead to sustained attitude, motivation and behavior changes that improve long-term learning.


To link my commentary on these three studies back into my earlier query: What can we do as teachers that will help students learn to listen better? The default position, of course, is to do nothing – other than provide lots of "comprehensible input". Stephen Krashen (1981) is often credited with the view that comprehensible input is all that is needed to acquire a language, but I’m sure even he would agree that presentation of input alone is not enough. While we have to make sure that our learners have access to a wide range of relevant, motivating input, we also have to plan interventions that develop their skill at making the input comprehensible. Successful instructional intervention leads learners to want to make the input comprehensible, which to me is the crux of the input hypothesis.

Helpful interventions in teaching listening, then, are those that promote the listener’s motivation by advancing the listener’s goals for listening. They are interruptions in the listening process that lead to a desire to listen more closely and to listen with heightened curiosity. Providing targeted interventions that focus on the component processes of listening can allow learners to get more out of each listening encounter. In Tables 1-3 below, I provide a breakdown of the component listening processes and related listener goals (based on Rost, 2005). I suggest types of interventions that can help learners develop these processes. These tables are designed to show which listener goals, or goal-driven processes (in Column 1), may be focused upon through types of interventions, or instructional plans (in Column 2). Instructional design tools (in Column 3) are learning concepts that may be useful in planning interventions.

Tables 1-3. Component Processes, Goals, and Interventions for Teaching L2 Listening

Table 1. Component Process: Decoding

Listener goals

  • Create an adequate phonological, grammatical, and lexical map of incoming speech
  • Recognize a critical mass of lexical items
  • Retain unknown lexical items in short-term memory for possible processing later

Interventions

  • Give user control over input speed, pausing and replay functions
  • Make lexical pushdowns available; allow for "pronounce and compare" options
  • Supply elaborated and amplified input options
  • Provide subtitling options: key word, stress group, full text

Instructional design tools

  • 3-D technology to view animated speech production (see Massaro, Cohen, Tabain, Beskow, & Clark, 2005)
  • Speech recognition tools and graphics (see Chun, 2002)
  • Input Processing tasks (VanPatten, 2004)
  • Lexical pushdown options; hyperlinked annotations to target words (e.g. Al-Seghayer, 2001)
  • Online cues for noticing grammar patterns (Chapelle, 2003; 2005)
  • Automated parsers and translators (e.g. Othero, 2006; Somers and Sugita, 2003)


Table 2. Component Process: Comprehension

Listener goals

  • Identify salient propositions in discourse to anchor mental representations
  • Build internal model of developing discourse
  • Test hypotheses about meaning

Interventions

  • Use guided online summarizing tasks
  • Provide graded questions, based on listener response
  • Furnish pop-up feedback loops on listener responses

Instructional design tools

  • Pop-up explanations and cues to aid inferencing; feedback loops and "instant replays" for incorrect responses (Rost, 2003)
  • Chatterbots to simulate discussion with learner about what the learner has understood and misunderstood (e.g., see Fryer and Carpenter, 2006)

Table 3. Component Process: Interpretation

Listener goals

  • Work out relevance of discourse
  • Get necessary clarification of ideas
  • Experience validation of your role as a listener

Interventions

  • Allow for direct or simulated interactions with speaker
  • Create collaborative application and response tasks
  • Provide links for follow-up learner presentations

Instructional design tools

  • Participation in global cybercommunities working on common projects (e.g. Belz, 2002)
  • Involvement in video-mediated collaborations (Anderson, 2006)
  • Chatterbots to simulate conversation about texts using speech recognition (see Anderson, 2006)


Building on the themes of the three articles in the current edition of LLT, I have tried to provide a framework of possible interventions for teaching listening. Which of these interventions should be used? I know that all of these interventions work – in the right context, with the right input, with the right learners. This does not mean, of course, that all of them can be or should be used with every listening activity or with every group of learners. A major part of our job as teachers is to know our students – to know which aspects of listening our students tend to avoid, to know which goals are hardest for them to achieve, in short, to know what specific interventions will actually help them.

One of the very exciting aspects of teaching listening is that so many aspects of instruction, both classroom instruction and self-access instruction, can be enhanced by technology. We are now better able to offer our learners the most suitable kinds of input and provide effective forms of presentation and scaffolding. We can isolate, slow down, and manipulate listening processes in order to provide specific interventions that will actually help our learners become better – more motivated and more curious – listeners.


I would like to thank Dorothy Chun, Philip Hubbard, and Irene Thompson for inviting me to contribute this commentary and for their feedback on it. I would also like to acknowledge their ongoing efforts in making language learning technology accessible to practicing language teachers.


Michael Rost is director of Lateral Communications, an instructional design company based in San Francisco. His educational experience includes work as an ESL and EFL teacher, curriculum coordinator, program director, learning lab designer, and professor in a graduate TESOL program. His research interests focus on oral language development.



Al-Seghayer, K. (2001). The effect of multimedia annotation modes on L2 vocabulary acquisition: A comparative study. Language Learning & Technology, 5(1), 202-232.

Anderson, A. (2006). Achieving understanding in face-to-face and video-mediated multiparty interactions. Discourse Processes, 41(3), 251-287.

Belz, J. (2002). Social dimension of telecollaborative foreign language study. Language Learning & Technology, 6(1), 60-81.

Chapelle, C. (2003). English language learning and technology. Amsterdam/ Philadelphia: John Benjamins.

Chapelle, C. (2005). Computer assisted language learning. In Hinkel, E. (Ed.), Handbook of research in second language teaching and learning (pp. 743-756). Mahwah, NJ: Erlbaum.

Chun, D. (2002). Discourse intonation in L2: From theory and research to practice. Amsterdam/ Philadelphia: John Benjamins.

Clark, R., & Mayer, R. (2002). E-Learning and the science of instruction: Proven guidelines for consumers and designers of multimedia learning. San Francisco: Jossey-Bass.

Doughty, C., & Long, M. (2003). Optimal psycholinguistic environments for distance foreign language learning. Language Learning & Technology, 7(3), 50-80.

Fryer, L., & Carpenter, R. (2006). Emerging technologies: BOTS as language learning tools. Language Learning & Technology, 10(3), 8-14.

Graham, S. (2003). Learner strategies and advanced level listening comprehension. Language Learning Journal, 28, 64-69.

Krashen, S. (1981). Second language acquisition and second language learning. London: Pergamon. Retrieved January 11, 2007, from

Martinez, M. (2001). Key design considerations for personalized learning on the web. Educational Technology & Society, 4(1), 8-14.

Massaro, D. W., Cohen, M. M., Tabain, M., Beskow, J., & Clark, R. (2005). Animated speech: Research progress and applications. In Vatikiotis-Bateson, E., Bailly, G., & Perrier, P. (Eds.), Audiovisual Speech Processing. Cambridge: MIT Press.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

Moreno, R., & Mayer, R. (2002). Verbal redundancy in multimedia learning: When reading helps listening. Journal of Educational Psychology, 94(1), 156-163.

Norman, D. (2004). Emotional Design. New York: Basic Books.

Othero, G. (2006). Teoria X-barra: descrição do português e aplicação computacional (‘X-bar Theory: Description of Portuguese and Computational Implementation’). São Paulo: Contexto.

Paivio, A. (1986). Mental representation: A dual-coding approach. Oxford: Oxford University Press.

Reed, S. (2006). Cognitive architectures for multimedia learning. Educational Psychologist, 41(2), 87-98.

Rogoff, B., Matusov, B., & White, S. (1996). Models of teaching and learning: Participation in a community of learners. In Olson, D. & Torrance, N. (Eds.), The Handbook of Cognition and Human Development (pp. 388-414). Oxford: Blackwell.

Rost, M. (2002). Teaching and researching listening. London: Longman.

Rost, M. (2003). Longman English Interactive, 1-4. White Plains, NY: Longman.

Rost, M. (2005). L2 listening. In Hinkel, E. (Ed.), Handbook of research in second language teaching and learning (pp. 503-527). Mahwah, NJ: Erlbaum.

Rowell, P. (2002). Peer interactions in shared technological activity: A study of participation. International Journal of Technology and Design Education, 12, 1-22.

Somers, H., & Sugita, Y. (2003). Evaluating commercial spoken language translation software. Proceedings from the Ninth Machine Translation Summit, New Orleans, 370-377.

Vandergrift, L. (2003). Orchestrating strategy use: Toward a model of the skilled second language listener. Language Learning, 53(3), 463-496.

VanPatten, B. (2004). Input Processing in second language acquisition. In VanPatten, B. (Ed.), Processing instruction: Theory, research, and commentary (pp. 5-31). Mahwah, NJ: Lawrence Erlbaum Associates.

Copyright © 2007 Language Learning & Technology, ISSN 1094-3501.
Articles are copyrighted by their respective authors.