Language Learning & Technology
Vol. 8, No. 1, January 2004, pp. 24-28
External links valid at time of publication.




Connected Speech (North American English), 2000



Minimum Hardware Requirements

Pentium 200 MHz multimedia computer with Windows 98 or higher, microphone, headphones or speakers, sound card; network compatible


Publisher

Protea Textware Pty Ltd
PO Box 49 Hurstbridge
Victoria 3099 Australia

Support offered

Manual, immediate response to e-mails

Target language

English -- North American, British, and Australian versions

Target Audience

Users 10 years and over; three levels -- lower intermediate, upper intermediate, advanced


Price

US $120.00 for a single-user license


ISBN

0 9587330 3 1

Reviewed by Joy Egbert, Washington State University

Overview of Pronunciation in Language Learning

To speak and listen in a second language, learners clearly need more than phonemic correctness. Equally or more important seems to be the ability to comprehend and produce, in a near-native-like fashion, aspects of pronunciation such as stress, intonation, rhythm, and pacing, and to use gestures and body language appropriately; in other words, to have both linguistic and sociolinguistic competence (Celce-Murcia, Brinton, & Goodwin, 1996; Florez, 1999). In many cases, however, pronunciation teaching still focuses on discrete phonemic awareness and production. For many reasons, this approach has been relatively ineffective to date (Boku, 1998; Donahue, 1999). Fraser (1999) notes that most language learners feel that pronunciation is a crucial part of language learning. Students believe the best way to improve their pronunciation is to practice, and many pronunciation experts agree that pronunciation teaching and learning must be situated in communicative contexts (Fraser, 1999; Levis, 1999; Otlowski, 1998; Wennerstrom, 1999) and must help students to use metacognitive strategies in broader communication (Vitanova & Miller, 2002). Otlowski (1998) and Fraser (1999) concur with much of the current research that the goals of pronunciation teaching should be "developing functional intelligibility, communicability, increased self-confidence, the development of speech monitoring abilities and speech modification strategies for use beyond the classroom" (p. 3).

In order to reach these goals, Morley (1991) and Fraser (1999) call for more emphasis on individual learners' needs, supporting a learner-centered approach that involves authentic tasks and the use of peers and groups for interaction and feedback to help learners be critical listeners and develop the ability to notice and repair their own and others' errors. In this model, the role of the teacher is facilitator rather than error corrector or ultimate speech model. In the facilitator role, the teacher can offer various models, provide opportunities for practice, suggest specific techniques, and give encouragement and advice to learners as they work toward intelligibility. Morley (1991) calls this role the "speech coach." Fraser (2000) calls for "high quality, effective materials, especially computer-based materials with audio demonstrations, for learners of ESL pronunciation" (p. 2). Such materials, according to Chun (1998), would have to "present authentic speech samples within their cultural contexts and call learners' attention" (p. 73) to specific features. She also suggests that the software must support pair interaction and emphasize natural discourse.

Features of Protea's Connected Speech

Connected Speech (CS) is in many ways built on the theoretical foundation outlined above. The manual that accompanies CS states that the goal of the software is to improve clarity and accuracy of spoken communication and to help students develop effective communication skills. CS claims to do so by helping learners to identify suprasegmental features of spoken English, to reproduce them, and to evaluate how well they have done so. The software, described in detail in Darhower (2002), covers pause groups, pitch change, word and syllable stress, and linked words, and also has exercises in minimal pairs and syllable recognition. CS uses speech recognition to evaluate whether learners have produced sounds "acceptably."

CS's approach is defined as "meaningful context" in which video plays a central role. CS is theme based, incorporating video speeches on topics ranging from "butterflies as pets" to "running a marathon." According to the program's documentation, the activities are "interactive." Among other advantages, CS says that it provides opportunities for learners to control pace, choose authentic aspects, and receive help and feedback. Darhower (2002) documents technical problems with the software that may frustrate learners (and instructors), typically with the voice recording segments of the program.

Evaluating Software in Pronunciation Teaching and Learning

In order to evaluate pronunciation software, we need to assess how well it teaches, or helps us to teach, in ways that will help students improve their pronunciation. This review is based on seven criteria taken from the literature described above. This list is not all-inclusive, nor does its use imply that the software must meet all of these conditions. These focused criteria, however, can serve as a basis for a discussion of the effectiveness of a software package for pronunciation teaching and learning.


1. Present Authentic Speech Samples and Natural Discourse

In CS, nine people with different North American regional accents and ways of speaking provide stories and information to the learner in each of three levels. The learner can listen to a brief greeting from each character to get an idea of what that person sounds like. The learner works extensively with the short audio presentation from the character. Although the documentation claims that the speech samples are unscripted, the speech sounds unnaturally fluent and slow. Even in Level 3 recordings, word endings are very well pronounced, and the speech is stilted-sounding to a native speaker. In fact, the character Guillermo spoke so slowly and clearly that it was very difficult to determine links and pitch even in Level 1 exercises. His speech, however, was very easy to comprehend, and this may help learners to develop listening skills. In addition, because there is no interaction among characters in the videos (they are the ubiquitous "talking heads"), there is no chance to listen to natural discourse in this program. The characters exhibit few sociolinguistic cues, and even little facial expression, to aid in comprehension and the learning of these skills. To fill this gap, the instructor can develop communication tasks for students centered on the themes and exercises in the program; for example, they may work in pairs with Web-based materials to determine what kind of butterfly would make the best pet.

2. Focus Learners' Attention on Both Segmental and Suprasegmental Features

This is a real strength of CS. It is very complete and thorough and provides many opportunities for learners to work explicitly with both kinds of features. Learners can listen, produce, and learn about these features through a large number of exercises on any of three levels.

3. Support Social Interaction and Communication

While the ultimate goal of CS is to improve students' oral communication, there are no opportunities within the software itself for authentic communication and no real examples of such. According to the documentation, CS is intended for independent or school use, as a supplement to classroom instruction, or for pairs in cooperative groups. CS does not support social interaction between learners per se. However, instructors could supplement with activities to support discussion about topics such as how "native" speakers sound different from each other or whether anyone would like to have a butterfly for a pet. Instructors could also assign roles to learners working in dyads to ensure that each learner has a reason to focus and work during program use.

4. Focus on Intelligibility

The speech recognition features of CS provide opportunities for learners to test whether their language is intelligible, but there are several problems with this feature. First, as Darhower (2002) found, the speech recognition does not always work, and when it does it does not always work well. In addition, whether the computer can recognize an utterance may not have any relation to whether the same utterance can be recognized by other speakers. Finally, this feature can recognize intonation, stress, and other suprasegmental features of language, but it cannot determine whether the sentence is grammatical or semantically plausible. These limitations are consistent across software programs that use speech recognition in its current state; instructors may want to prepare students ahead of time to work within these restrictions.

5. Support the Development of Metacognition and Critical Listening

According to pronunciation research, metacognition and critical listening develop through real communication, which is not possible within CS. Perhaps future research with this software will demonstrate that learners can develop these skills through its use.

6. Provide Opportunities for Practice

CS does provide many opportunities for practice with both segmental and suprasegmental features. This practice is within the context of the audio clips, but because of the limitations of the technology, it is typically drill-based. Learners are generally attempting to get closer to the native models presented in the software rather than being judged on whether their communication would be successful during social conversation.

7. Provide Scaffolding and Individualized Feedback

CS provides a variety of scaffolds for learners. For example, help is present in the form of both oral and written instructions. In addition, ease of navigation supports efficient program use, and the consistent interface makes the program easy to use (once learners understand what each icon means). Learners can also replay and/or review all audio and text when the program is in learning mode. Furthermore, written scripts accompany the video segments; however, the script text does not always scroll at the same speed as the audio, and the character's mouth in the video is often slightly behind the audio. The hotwords within the script are useful for helping learners understand the clip, but the explanations vary in difficulty; for example, "to remember" is defined as to "have a picture in mind," which might be equally difficult for learners to understand.

Although these scaffolds are a strength, one major weakness of this software program is the "wrong, try again" approach. Answers are judged either to be correct or incorrect (including some cases where there can be alternate answers) with little other feedback, and answers are supplied after the third attempt with little explanation. Additional explanation would help learners to focus on their specific errors, and the addition of hints after the first and second attempts would support learners in thinking about their answers. For example, when learners are working on determining the number of spoken syllables in words, it might be more effective for some learners to be shown an answer instead of just being told "well done." The teacher (as speech coach) can work with the learners as they use the software to supply some of this important feedback.


CS has strengths and weaknesses. In addition to those listed above, strengths include

On the other hand,

Taking these comments into consideration, whether or not teachers and learners should use CS depends on programmatic goals, resources, and learner needs.


Joy Egbert is Assistant Professor of ESL and technology at Washington State University in Pullman, WA, and Director of the Training for All Teachers Grant in the College of Education. Her research and teaching interests are CALL, distance learning, teacher education, and ESL methodology.



Boku, M. (1998, October). Student-centered pronunciation practice: More than "right" or "light." The Language Teacher Online, 22(10). Retrieved November 25, 2003, from

Celce-Murcia, M., Brinton, D., & Goodwin, J. (1996). Teaching pronunciation: A reference for teachers of English to speakers of other languages. Cambridge, England: Cambridge University Press.

Chun, D. (1998, July). Signal analysis software for teaching discourse intonation. Language Learning & Technology, 2(1), 61-77. Retrieved November 25, 2003, from

Darhower, M. (2002, April). CALICO software review: Connected Speech. The CALICO Review. Retrieved November 25, 2003, from

Donahue, S. (1999). Teaching pronunciation on line. Retrieved November 25, 2003, from

Florez, M. (1999, June). Improving adult English language learners' speaking skills. National Center for ESL Literacy Education (Report No. EDO-LE-99-01). Available online at ERIC Digest

Fraser, H. (1999). ESL pronunciation teaching: Could it be more effective? Australian Language Matters, 7(4), 7-8.

Fraser, H. (2000). Coordinating improvements in pronunciation teaching for adult learners of English as a second language. Canberra: DETYA (ANTA Innovative Project).

Levis, J. (1999). Intonation in theory and practice, revisited. TESOL Quarterly, 33(1), 37-63.

Morley, J. (1991). The pronunciation component in teaching English to speakers of other languages. TESOL Quarterly, 25(1), 51-74.

Otlowski, M. (1998, January). Pronunciation: What are the expectations? The Internet TESL Journal, 4(1). Retrieved November 25, 2003, from

Vitanova, G., & Miller, A. (2002, January). Reflective practice in pronunciation learning. The Internet TESL Journal, 8(1). Retrieved November 25, 2003, from

Wennerstrom, A. (1999, October/November). Why suprasegmentals? TESOL Matters, 9(5). Retrieved November 25, 2003, from
