Language Learning & Technology
Vol. 6, No.1, January 2002, pp. 40-59

Margaret A. DuFon
California State University-Chico


In recent years increasing numbers of researchers have begun to investigate second language acquisition within the socio-cultural context in which it occurs using qualitative methods and approaches such as an ethnographic approach. This frequently entails audio and/or video recording of the participants in naturalistic contexts. Yet theoretical and methodological issues related to video recording have not yet received a great deal of attention in the second language acquisition literature. The purpose of this paper is to initiate such a discussion among SLA researchers. This is accomplished by reviewing the visual anthropology, educational anthropology, and ethnographic filmmaking literature on three questions concerning the collection of valid video recorded data: a) How should the interaction be video recorded? b) Who should be video recorded? c) Who should do the video recording? Examples from my own research are presented to illustrate the kinds of problems that might be encountered in each of these areas. Finally I present my reflections on the decisions I made when videotaping so that other SLA researchers using video recording might gain some insights that will assist them when dealing with the theoretical, methodological and practical considerations of planning and implementing their SLA studies using an ethnographic approach.


In recent years, increasing numbers of second language acquisition researchers have begun to study the process of second language acquisition within the socio-cultural context in which it occurs (Lazaraton, 1995) -- whether it be in the classroom (e.g., Duff, 1995; Ohta, 1999; Poole, 1992) or naturalistic settings outside the classroom (DuFon, 2000; Iino, 1996; Krupa-Kwiatkowski, 1998; Schecter & Bayley, 1997; Siegal, 1995), or both (Rymes, 1997) -- using qualitative theoretical and methodological approaches. One type of qualitative approach is the ethnographic approach. Many studies of second language acquisition that use an ethnographic approach require microanalysis of the speech of the learners and the input from and interaction with native speakers of the target language (e.g., DuFon, 2000; Iino, 1996) or with other learners of the target language (e.g., Duff, 1995; Willett, 1995) and with their teachers (e.g., Duff, 1995; Willett, 1995). In order to meet this requirement and to study the acquisition process in the socio-cultural context in which it occurs, linguistic data are typically obtained by audio or video recording of speech during naturalistic interactions. Yet, many times, SLA researchers are not adequately trained for the task of video recording.

The training that they receive in ESL and applied linguistics programs in the areas of ethnographic methods in general, and visual ethnography in particular, is limited when compared to that received by linguistic anthropologists. Students of ESL and applied linguistics are likely to take only one or two courses in qualitative research methods, and with all the issues that must be covered (e.g., negotiating entry, selecting informants, writing field notes, utilizing various methods of data analysis), little time is available for issues specifically related to video recording. Not only are they not trained in the technical aspects of how to video record, but the theoretical and methodological aspects of video recording are not necessarily covered in classes on qualitative research methods. Additionally, little time, if any, is devoted to providing students with practical experience in video recording for research purposes. As a result, SLA researchers are often left to self-train. They may seek training from video or film production professionals, who can help researchers with learning the technical aspects of video recording, but are often unaware of and unconcerned with the theoretical and methodological issues that must be considered by the second language researcher using an ethnographic approach. Consequently, SLA researchers often go into the field less than adequately prepared to deal with video recording and its associated theoretical, methodological, and technical issues.

The purpose of this article is to share with other second language researchers who are interested in collecting data through video recording what I have learned both through the academic literature in visual anthropology, ethnographic filmmaking, educational anthropology, and the educational uses of ethnographic and video technologies and through my own field experience. This particular article will focus on an ethnographic approach to research, which is the approach I used in the investigation and that I will use to illustrate some of the points that I make here. The theoretical and methodological constraints on videotaping will vary somewhat according to the approach that is used. Therefore, researchers using other theoretical and methodological approaches to research in SLA may not have concerns identical to those discussed here. Nevertheless, there may be some overlap, and they may find some issues discussed here to be relevant to their work.

For those who are unfamiliar with an ethnographic approach, I begin with a brief description of what it is. This will be followed by background information on the study that I conducted and that I use to illustrate the issues presented here. Next I will focus on three specific questions which deal with the issue of obtaining valid videotaped data in naturalistic settings that will enable the researcher to compose a valid account of the phenomenon under investigation: 1) How should the interaction be video recorded? 2) Who should be video recorded? and 3) Who should do the video recording? For each of these three questions, I will review the relevant academic literature from other disciplines, and then illustrate some of the problems I faced concerning the issue in question in my own research and field experiences. Through this discussion, the reader should gain some understanding of the types of problems and concerns associated with video recording in the field, as well as some of the issues that need to be considered when planning and conducting ethnographic research using video recording as a method of data collection.

This article will not deal with the technical aspects of how to film (e.g., light source, type of microphones, etc.), which are dealt with by Duranti (1997), Goodwin (1993), and Jackson (1987). Rather, the technical aspects will be discussed only as they influence the theoretical and methodological considerations. Transcription and analysis of audio and video recorded data in ethnographic research are important issues and should be addressed in the second language acquisition literature, but they are beyond the scope of this paper. For further information on transcription, the reader is referred to Asch (1988), Corsaro (1982), Duranti (1997), Edwards & Lampert (1993), Green, Franquiz, & Dixon (1997), Ochs (1988), Roberts (1997), and Schieffelin (1990). For information on various aspects of analysis of video recorded data, good sources include Corsaro (1982), Erickson (1982, 1992), Erickson and Schulz (1982), and Goldman-Segall (1993, 1995, 1998). Furthermore, this article will not deal with the ethical issues related to the collection, transcription, and presentation of recorded data. This topic is a very large one and deserves an article of its own. For more information on this topic, see Asch (1992), Besnier (1994), Biella (1988), Duranti (1997), Erickson (1992), Grimshaw (1982b), Harvey (1991, 1992), Heider (1976), Iino (1999), Punch (1986), Ruby (2000), and Watson-Gegeo, Maldonado-Guzman, & Gleason (1981). Finally, this article does not deal with telling the story crafted as a result of the research either through the video itself or through accompanying textual materials. For information related to issues of producing ethnographic videos or multimedia for public consumption, see Goldman-Segall (1998) and Heider (1976), and for writing the ethnographic text, see Golden-Biddle & Locke (1997) and Wolcott (1990).

An Ethnographic Approach

Ethnographic research focuses on the behaviors (including the linguistic behaviors) of the members of a particular community by studying them in naturally occurring, ongoing settings, typically while they participate in mundane day-to-day events. Its aim is to provide a thick description (Geertz, 1973) or a descriptive-explanatory-interpretive account of that community or some aspect of life within it, incorporating both an emic perspective, or the culturally specific framework used by the members of the community under study for interpreting and assigning meaning to their experiences, and an etic perspective, based on the academic frameworks, concepts, and categories of the researcher's discipline (Watson-Gegeo, 1988). The thick description of an ethnographic account is accomplished through a number of means. First, an ethnographic approach is holistic (Lutz, 1981), that is, the (linguistic) behaviors are investigated in the context in which people produce them and they are interpreted and explained in terms of their relationship to the entire system of which they are a part (Watson-Gegeo, 1988). Second, it involves prolonged or intensive fieldwork in the community under study, which allows time for the researcher to become socialized into the community, to build trust with the participants, to observe the phenomenon under investigation repeatedly so as to gain some idea as to its degree of typicality and its range of variation, and to test information and analysis for accuracy (Asch, 1992; Corsaro, 1982; Davis, 1995; Erickson, 1986, 1992; Heider, 1976; Lincoln & Guba, 1985; Lutz, 1981; Watson-Gegeo, 1988; Wolcott, 1995). 
Third, it involves triangulated inquiry: gathering naturalistic data using a variety of techniques -- participant-observation, field notes, audio and/or video recordings, interviews and so forth -- from different sources (Diesing, 1971; Sevigny, 1981) and checking it with various members of the community (Corsaro, 1982; Davis, 1995; Lincoln & Guba, 1985; Goldman-Segall, 1993, 1995, 1998; Tobin, Wu, & Davidson, 1989) or even with outsiders who come from other communities (e.g., Tobin, Wu, & Davidson, 1989) in order to get multiple perspectives or points of view on a particular behavior, event, or phenomenon. This triangulation in the process of interpretation of data, as well as in the collection of them, builds in layers of description, thus yielding a thicker description and increased credibility or validity (Goldman-Segall, 1995, 1998). Data analysis, which is typically done with words or textual data rather than numerical data (Miles & Huberman, 1994), begins as soon as the researcher selects a problem to study and continues throughout the project until the last word of the report is written (Fetterman, 1998). That is, data analysis is ongoing and early findings are used to guide subsequent observations in terms of what is being investigated and how that investigation is carried out.

In contrast to experimental research, the purpose of an ethnographic study is to focus on the community in question, and to gain insights regarding how it works with respect to the issues under investigation; the purpose is not to generalize beyond it to other communities. However, it is often the case that comparisons based on ethnographic studies of two or more communities can be made on a more abstract level. This kind of comparison is referred to as ethnology (Davis, 1995).

In the past, European and North American anthropologists went off to distant lands to study exotic cultures in Latin America, Africa, Asia, the Pacific Islands, and the Arctic. Therefore, the anthropologist was always an outsider with respect to the community under study. Today, there are both Western and non-Western ethnographers and both often study their own cultures; consequently, they are often cultural insiders. In the study of second language acquisition, there are at least two communities under investigation, the native language community of the learners and the target language community, though in foreign language settings, the target language community may be more imagined due to its lack of a physical presence. The ethnographer may be an insider to one community and an outsider to the other. I will discuss this further after first describing the context of the study that I draw on in this article.


The study (DuFon, 2000) I conducted using an ethnographic approach investigated the acquisition of linguistic politeness in Indonesian by foreign language learners. These learners were six undergraduate students from American universities who were studying the language during a four-month study abroad program in Malang, East Java, Indonesia during the fall semester of 1996. All were between the ages of 20 and 22. Three were male and three were female. Three were absolute beginners and three were intermediate level learners. Four were Americans, who spoke English as their native language. Two of the women were Japanese nationals, who spoke Japanese as their first language; they were also fluent speakers of English. All lived with Indonesian host families in Malang, the "second city" of East Java. They all studied in daily language classes with their teachers at IKIP-Malang (now Universitas Negeri-Malang) as well as in weekly sessions with a private tutor. All agreed to participate in the study, which involved a number of obligations including, but not limited to, the following: a) allowing me to accompany them on a tutoring session, audio taping the interaction; b) allowing me to videotape them during a naturalistic interaction of their choice once during the course of the program; c) audio taping themselves in a minimum of nine naturalistic interactions with native speakers of Indonesian during the four-month program; and d) keeping a journal on what they learned about politeness in Indonesian through their interactions with Indonesian native speakers, including those which they had audio taped.

Like the learners, I was a cultural outsider to the Indonesian community, easily identifiable as such by immutable characteristics such as my body height, skin and eye color, and nose shape, as well as behaviors such as foreign accent, posture, and gait, which with time and attention might increasingly conform to Javanese Indonesian norms. In contrast, to the learners, at least the American ones, I was to a large extent a cultural insider. I had been a student in the COTI Program (Consortium On the Teaching of Indonesian, now Consortium On the Teaching of Indonesian and Malay, COTIM), a study-abroad program for advanced learners of Indonesian, four years earlier in 1992. The 1992 COTI program was held in Malang at the same college and shared many of the same host families. Therefore, I shared with the four American learners a common national background and consequently a considerable amount of common cultural background. With all six learners, I shared a similar study-abroad experience. Yet I was not a total insider either. I was a generation older than the learners and my status was that of researcher, not student in the program. Consequently, our privileges, rights, and obligations were not the same, and our experiences were not identical. Factors such as these affected the ways in which I could participate in the two communities involved in the study -- the Indonesian native speakers and the foreign learners of Indonesian -- and the extent to which I could integrate into them.

With respect to videotaping, although I had previously conducted ethnographic research, I had used only audiotape. Videotaping was new to me, and I was not comfortable with it. Nevertheless, I found that the video recordings that I collected under good conditions yielded valuable and usable data. Videotaping was often a challenge given the conditions of the situation, and obtaining valid data on tape was not a given. In the following sections, I discuss issues related to the problem of obtaining valid video data.


There are a number of advantages to video recording in ethnographic research. One advantage is the density of data that a visual recording provides (Grimshaw, 1982a). In an ethnographic approach to research, we seek to study real people in real situations, doing real activities. Video recorded data can provide us with more contextual data than can audio recorded data (Gass & Houck, 1999; Iino, 1999). They can give us a more complete sense of who the people are, and acquaint us with the setting in which the people function and the types of activities they engage in from day-to-day as well as the nature of these activities themselves. In second language studies, not only does video recording enable us to accurately identify who is speaking, but also it provides information about posture, gestures, clothing, and proxemics, which inform us regarding native speaker norms with respect to these features and the degree to which the learners conform to them, which in turn provide us with some information concerning the extent to which the learners have been socialized into the target language community. Gestures, facial expressions, and other visual interactional cues also provide important information both on the negotiation of meaning and the negotiation of affect. Non-native speakers, especially those whose linguistic means are limited, may rely extensively on extralinguistic means, as well as linguistic and paralinguistic means, to convey both their referential message and their relational message (Gass & Houck, 1999). Furthermore this kind of visual information can help us to disambiguate verbal messages by narrowing down the possible number of accurate interpretations (Iino, 1999). Finally, the visual information in videos also provides information on directionality and intensity of attention, which can be particularly useful in determining the levels of comfort and involvement of the interlocutors (Gass & Houck, 1999). 
These kinds of visual contextual information, then, can enrich our database in many ways.

Video (as well as audio) recording also provides us with denser linguistic information than does field note taking, for ideally it allows us to record every word. When taking field notes, the researcher is limited to writing down the gist of what the interlocutors said, or recording only brief interactions consisting of a few short turns because of constraints on memory and the inherently slower speed of writing as compared with speaking (Beebe and Takahashi, 1989).

Another advantage of video recording is permanence (Grimshaw, 1982a), which allows us to experience an event repeatedly by playing it back. With each repeated viewing, we can change our focus somewhat and see things we had not seen at the time of taping or on previous viewings (Erickson, 1982, 1992; Fetterman, 1998). Replaying the event also allows us more time to contemplate, deliberate, and ponder the data before drawing conclusions, and hence serves to ward off premature interpretation of the data. Even a rare event, when captured on tape, can be replayed repeatedly for a thorough analysis so that it can still be studied intensively. Real time observation does not have this advantage (Erickson, 1992).

Nevertheless, the amount of information contained in ethnographic footage -- the unedited videotaped material of a particular event (Crawford, 1992) -- is necessarily limited, and we need to bear these limitations in mind. First, the information is limited in that the videotape itself tells us nothing about statistics, that is, how typical this event is. Is it a frequent event or an unusual event or a unique event? That kind of information must be supplemented by the ethnographer, who has spent sufficient time in the field as a participant-observer, triangulating with other methods of data collection in order to know something about the frequency (as well as other characteristics) of the event being recorded (Corsaro, 1982; Erickson, 1992; Hastrup, 1992; Heider, 1976).

Second, a video is limited because it can capture only what is observable. The unspoken thoughts and feelings of a participant cannot be seen or heard on the tape. They might be guessed at or inferred, but if a participant is successful at dissembling, the inference will not be accurate. One advantage of video, however, is that it can be played back to the participants (e.g., Corsaro, 1982; Erickson, 1975, 1982; Erickson & Schulz, 1982; Fiksdal, 1988; Iino, 1993, 1996; Watson-Gegeo et al., 1981) in order to attempt to get them to recall and describe their thoughts, feelings and reactions at different points in time during a given event, thus giving us information about the unobservable.

Third, videotaping only allows the event to be experienced vicariously. It does not allow for hypothesis testing in the way participant-observation does. With participant observation, one can test out emerging theories in the field by trying them out, thus giving an idea of what is acceptable (Erickson, 1992). Still, video data can also provide a means of hypothesis testing. By showing clips to others, both cultural insiders and outsiders, and asking them pertinent questions about what was said or done, what ought to have been said or done, and how they assess or interpret the behavior, hypotheses can be developed and tested to some extent (e.g., Tobin, Wu & Davidson, 1989). Furthermore, modern video, computer, and Web technologies have made it possible for many people to engage in collaborative theory construction in order to strengthen the findings of one researcher or a small group of researchers; this is what Goldman-Segall (1995, 1998) refers to as configurational validity. It is based on the belief that the collaborative construction of theory that results from the participation of many diverse persons in viewing and commenting on the video adds strength to a study by adding layers of interpretation and weaving a thicker description than could be accomplished by one analyst or a few analysts alone. This is because each person's interpretation is necessarily limited by his or her own experience; therefore multiple points of viewing help to offset those limitations and increase the validity or credibility of the study (Asch, 1992; Goldman-Segall, 1995, 1998; Iino, 1996).

The limits of one's perspective also affect the videotaping in another, more physical way. Although a video camera can capture a great deal of both auditory and visual information, it nevertheless confines the view. Therefore, in spite of the sense of being there that a film can provide, it does not show every observable thing that happened, but only that which was occurring within the range of the camera lens (Fetterman, 1998; Heider, 1976; Watson-Gegeo et al., 1981). This limitation can be overcome to some extent by using multiple cameras and filming the event from various perspectives simultaneously, if the research budget and human resources allow, and if it is not too intrusive for the setting. For example, Bottorff (1994) conducted a study on nurse-patient interactions using two video cameras, mounted on tripods, with remote control pan/tilt and video switcher. This enabled her to tape interactions in private hospital rooms from two perspectives with less movement and obtrusiveness and still keep both the nurse and patient in view regardless of where the nurse was standing. Even when two or more cameras are used for recording, however, only one of these perspectives will typically be seen at a time when the videotape is viewed, though split screen or picture-within-a-picture viewing is possible with modern technology. In many studies, however, only one camera is used at a time (e.g., Corsaro, 1982; DuFon, 2000; Iino, 1993, 1996, 1999; McMeekin, forthcoming), and even in those in which two or more are used (e.g., Bottorff, 1994), some data are lost because part of the activity falls outside the range of the lens. Even the human eye has a limited range of view and cannot take in everything that is happening in a scene; still, the angle of view of the human eye is wider than that of the lens of the video camera.

Given these limitations, the videotape will not be able to portray a complete picture, revealing everything there is to know about an event. Keeping this in mind, how can we obtain video data that will best enable us to construct a valid account of the phenomenon in question? This question entails a number of other questions, three of which will be addressed here: a) How should the interaction be videotaped? b) Who should be included in the videotape? and c) Who should do the videotaping? It is to these questions that we now turn.


The Literature

The question How should the interaction be videotaped? has been hotly debated in the field of visual ethnography (e.g., Biella, 1988). There are various schools of thought on this issue. Exactly how the event is portrayed on videotape depends on the purposes for which the film is being recorded. When a video is being recorded for educational or commercial purposes, cinematic and artistic concerns increase in importance (see, e.g., Heider, 1976; Rollwagon, 1988). Heider advocates putting ethnographic concerns first in any case in which one is producing an ethnographic video. Compromising the standards of ethnographic research for artistic concerns may produce an interesting and aesthetically pleasing video, but a less informative one for ethnographic purposes. For research purposes, however, when the film is going to be used only for analytical and possibly for playback purposes rather than for audience consumption, a greater degree of technical imperfection can be tolerated and I believe the way to proceed is clearer. For research purposes, it is generally advocated that one a) shoot whole events using long takes, b) with wide-angle views, and c) without manipulating the setting, the participants, or the script, that is, what the participants say (Asch, 1992; Balikci, 1988; Corsaro, 1982; Heider, 1976; Watson-Gegeo et al., 1981).

Whole events, or at least complete sequences of activities within events, are necessary if one is to determine the structure or organization of the event. Filming whole events is particularly crucial in studies focused on pragmatics and discourse because the interpretation of the meaning of any given utterance is influenced by what has come before. Having a recording of only parts of an event could make it difficult to judge the appropriateness of a comment, question or response.

What is a whole event? We can look to Blum-Kulka (1997) for an example. In this study of Israeli, American, and American-Israeli dinner talk, whole dinner conversations were videotaped from the beginning to the end. One could also videotape only part of the dinner conversation, obtaining a sample of dinner talk. Such an approach would not give a complete picture. For example, if a middle segment of a dinner were filmed, we would not know how the family comes together, whether or not they say a grace, how food is served and by whom, how the family breaks up after the meal, who leaves the table first, whether or not permission is asked to leave, and so forth. Therefore, the best option, particularly when analyzing the discourse of an event, is to have a complete record of that event on videotape from start to finish. The boundaries of a particular event, however, are not always clear (Corsaro, 1982; Erickson, 1992; Heider, 1976). Because of this, Erickson (1992) recommends that taping begin a few minutes prior to the beginning of the event and continue for a few minutes after it has ended. For example, Blum-Kulka (1997), who was already familiar with the culture, videotaped the dinner conversations from the point at which the family began to gather around the table to the point at which they left the table rather than waiting until they were all seated and eating before beginning. In this way, she was able to capture the complete event.

More context can be created for the viewer by videotaping a larger area than what will appear in front of the lens once the actual event begins. For example, to continue with the dinner event, the video camera could pan the entire dining room, perhaps even show the other public space in the house to give the viewer a better sense of the family and their environment. One might even begin videotaping in the neighborhood as one approaches the house, then videotape parts of the inside, and finally the dining room itself. Alternatively, one might videotape the food being prepared in the kitchen and then being brought out to the dining room table. Another possibility is to videotape one or more of the family members prior to and following the dinner so that we might better see how that dinner fits into their lives.

Some ethnographers (Erickson, 1992; Heider, 1976) recommend the wide-angle view because this view gives the viewer a better sense of the big picture. Furthermore, since we are concerned with the socio-cultural context, we need to view not just the learner but also the other speakers that the learner is interacting with (see Corsaro, 1982; Goodwin, 1993). The wide-angle view allows us to see all (or at least a greater number) of the participants in the speech event; furthermore, it allows us to see whole bodies, not just faces, and hence to capture body language. We can see how the participants are responding to any given speaker at any given moment in time, both linguistically and extralinguistically. There may be times when we want to zoom in closer. For example, a close-up view would enable the viewer to see facial expressions better; this would be particularly useful when the interlocutors are few in number, thus reducing the physical area that needs to be captured in the film. It is also useful to zoom in on an object or picture that is the subject of conversation in order to have a better idea of what the interlocutors are talking about. Nevertheless, because close-ups on the speaker's face can cause us to miss the non-vocal responses of his or her listeners, it is generally argued that they should be used sparingly and avoided except when they help us to attend to details that are lost in the whole. When close-ups are used, they should be preceded and followed by contextualizing wide-angle shots in order to give a better sense of the whole and the context in which the close-up expression took place (Heider, 1976). When using two or more cameras, one camera can take a wider view while the other focuses close up. However, in many cases, the ethnographic researcher has but one camera to work with and must make a choice concerning the angle of view.

Some visual anthropologists may disagree and perhaps even feel that angle of view is not an issue. Goldman-Segall (1998) relates that she initially held back and filmed from a distance, for fear of being too obtrusive, thus filming with a wider angle of view. However, after observing another ethnographer move in close with the camera, she changed her filming style. In her case, however, she was actually investigating the relationship between digital media and children's thinking. In such a case, moving in close worked well. In a study such as Bottorff's (1994) investigation of nurse-hospital patient interactions, moving in close to the center might have been far more intrusive and far less appropriate. In determining the angle of view, then, one needs to consider the purpose of the research, the role of the researcher, and the comfort and safety of the participants (see Grimshaw, 1982a).

Finally, ethnographic studies, which by definition study participants during naturally occurring events, must avoid any manipulation of the participants in terms of what they say or the physical setting in which they are interacting, as this would compromise the naturalness of the situation.

My Field Experience

Because I was interested in the interaction between native speakers and learners of Indonesian, I wanted to be able to capture both verbal and non-verbal reactions. Although I wanted to focus on linguistic politeness, I also wanted to use the videotape to give me a picture of the larger context that the learners were interacting in. I was able to create some context with the videotape, for example, by filming scenes of a village that we visited, zooming in on a particular house before entering it, and panning across the inside of the public areas of the house before focusing in on the space where the primary interaction was to take place. I also wanted the videotapes to provide me with non-verbal information that would be useful in interpreting the interactions between learners and native speakers in terms of politeness norms (both linguistic and extralinguistic). Therefore, I wanted wide-angle shots that could get whole bodies of all the participants involved in an interaction. Furthermore, I was interested in videotaping whole events in order to have a better sense of what the event consisted of, to be able to better interpret later utterances in relationship to what had gone before, and to observe learner and native speaker behavior throughout the entire event. To be less obtrusive, I tried to position the video camera at some distance from the participants. To keep the situation naturalistic, I tried not to manipulate the setting or what the participants said.

However, putting the theory into practice was not always successful. First, videotaping whole events requires using a lot of videotape. The videotape I needed for my camera was not always readily available in the city in Indonesia where I conducted my research. Therefore, I rationed what I had. Also, I had only one regular and one back-up battery. The two batteries enabled me to video record for several hours, but not an entire afternoon. Consequently, I was not always able to video record the learners the entire time that I was with them. Nevertheless, I was able to capture whole segments of events, or events within events, on tape. For example, while I might not have captured an entire visit to a village on tape, I was able to record an entire meal plus the conversation that preceded it and some that followed it while we were in the village.

The biggest difficulty I had was in obtaining wide-angle shots. Because of financial constraints, I had purchased low-end equipment, a SONY Handycam Video 8 CCD-TR330E, which did not have as wide an angle of view as some of the more expensive models. While the camera had performed adequately in the store, I found its viewing angle too limiting for the field, where I was working in very small spaces. It was not always possible to back up far enough to capture all the participants within the range of the camera lens. Therefore, I opted either to focus on the learner and those closest to him/her or to move the camera lens back and forth. Thus I was not able to simultaneously capture what the learner was saying and how others were reacting at all times. Had I bought a better camera with a greater range of view in the zoom lens, I would have been able to overcome this problem.

Since I had only one camera, I was limited to recording from one perspective at a time. I was further limited in my options regarding where to place the camera. For example, in one interaction, I video recorded the learner during a cooking lesson in the kitchen. The kitchen was small and had solid walls on three sides. That meant I could only shoot from the doorway. For most of the interaction, I faced the participants' backs as they worked at the counter and the stove. Because of the practical problems involved in finding a perspective that would work at all, I found I could devote little attention to the theoretical and methodological issues related to selecting my angle of view. With the small confining spaces in many Indonesian homes, the choice of perspective is necessarily a practical one.

Sound was another problem. Again, to save money, I had not purchased external microphones. They did not seem necessary because when I tested the camera in the store, it picked up the sound quite well. However, when I was in the field, recording participants in open spaces or at greater distances from the camera, the built-in microphone did not always pick up their voices well. I compensated somewhat for this by simultaneously audio recording with a small cassette recorder. A wider-angle lens would have allowed me to get closer to the participants and therefore get better sound quality with the built-in microphone. Ideally, external microphones would have been used.

In order to keep the interaction naturalistic, I made every effort not to manipulate the setting in any way. Again the low-end equipment forced me to make a few compromises here. For example, I once asked the participants to move closer together and to change their angle of seating so that I could get better light and sound quality. In other cases, I found that something like closing a curtain could improve the picture quality by reducing the backlighting effect. Nevertheless I was often reluctant to make even small changes such as these because they reduced the naturalistic quality. The result was sometimes a poor quality picture.


It is clear from my experience that buying a low-end camera with no external microphones was counterproductive. Buying a mid-range camera and external microphones was difficult for me to reconcile with my extremely limited budget. Nevertheless, in spite of the financial hardship, I feel it would have been worth it to invest several hundred dollars more in better equipment, as it would have paid off with much better quality in picture and sound in the end.

Ideally, I would have liked to try out the video camera in the field prior to purchasing it, buying one that guaranteed me satisfaction or a refund or exchange. However, this was not practical in my case. Because of cost and quality concerns, I needed to buy the video camera before I had the opportunity to enter the field and to see what it was actually going to be like to videotape there. Therefore I had no opportunity to pre-test the camera before I actually needed to use it. For that very reason, I should have bought a more expensive camera and external microphones because it would have given me more flexibility in less than ideal situations.

In future field studies, I would, based on previous experience in that same field, try to envision from the perspective of a videographer the situations in which I might be video recording and the conditions obtaining in those situations. I would think in detail about the size of space that might be available, the spatial arrangement (e.g., where walls and other barriers that might limit angle of view are), the characteristics of the setting that might affect the acoustics, and the lighting that might be available. Then, before leaving my home territory, I would try to videotape under similar conditions, using these simulations to test out the equipment under field conditions. When unsure of the conditions that I might encounter, I would give a strict test to the equipment. That is, I would try it out in small, cramped quarters with limited angles of view, poor acoustics, high ambient noise, low light, or highly variable light sources. If there were any possibility that I might be video recording outdoors, I would test the video recorder and the external microphones outdoors as well in order to verify their ability to pick up voices outdoors at various distances. By following such a procedure, I would be more likely to arrive in the field with a camera that would perform adequately for the needs of my research project. I would not have to concern myself as much with the practical concerns of obtaining quality footage and this would free me to deal more with the theoretical and methodological concerns of videotaping.

Furthermore, I think that I would be more flexible and feel a little bit freer to make minor adjustments in the setting rather than stick to a hard and fast rule not to manipulate the setting in any way. For example, if during a visit to someone's house, drawing a curtain would significantly change the lighting so as to cause a major improvement in the quality of the picture, I would most likely be willing to draw the curtain, expecting that it would have little, if any, effect on this type of interaction in most cases and hence would not destroy the naturalness of the interaction. In other words, the benefits of this kind of small manipulation would outweigh the costs.


The Literature

In an ethnographic study of second language acquisition using videotaping as a method of data collection, the learners and their interlocutors would of course be among those who are video recorded. However, another question to consider is whether or not the researcher should also be filmed. More specifically, should the ethnographer/videographer be video recorded while doing his/her research tasks including that of video recording? This is another area of considerable debate in the field of ethnographic filmmaking. Filmmaking involves a producer, a process, and a product. Films are categorized into three major categories -- observational, participatory, and reflexive -- according to the emphasis they give to these various dimensions.

Observational films emphasize the product, in this case, the film or videotape. In observational filmmaking, the camera acts as a "passive" recording device, meaning that it allows the viewer of the film to see the events as they unfold and to let these events "speak for themselves" (Crawford, 1992, p. 78). The researcher and videographer remain behind the camera lens and do not appear in the films themselves. Thus the viewer sees the events almost as if the camera and the researcher were not there (Rollwagon, 1988), giving the film a more "objective" quality (Ruby, 1980). In some cases, the researcher may not actually be there. For example, in his study of dinner table conversations between Western learners of Japanese and their Japanese host families, Iino (1996, 1999) used what he refers to as the "remote observation method" (Iino, 1996, p. 116). He set up the camera in the dining area and then left the scene so that his presence would not affect the interaction between the learners and the host families. He chose this approach after first trying to operate the camera himself during the dinners. However, he discovered that his presence encouraged the host families to interact with him, who like them was a native speaker of Japanese, and the language learners became mere observers at the dinners. Iino's leaving the scene encouraged the families to interact with the learners. Such an option is possible when the participants remain stationary for the duration of the event being video recorded. Obviously, when the participants are mobile, remote observation is not an option.

Participatory films focus on the producer as well as the product. In participatory film, the ethnographer and/or videographer step out in front of the lens, thus allowing themselves to be seen by the viewer, reminding their audience that their presence is having an effect on the course of events (Collier, 1988; Rollwagon, 1988). One study in second language acquisition research that used this approach is that of Blum-Kulka (1997) in her study of family dinner conversations in American Jewish families, native Israeli families, and American immigrant families in Israel. In all three cases, the researcher was invited to join the dinner table as a matter of course, and in all three cases, the presence of the researcher affected the course of the interaction; however, the ways in which it did so varied according to the particular cultural group. In Israeli society, where symbolically minimizing social distance is more highly valued, the participant-observer took a more participatory role, divulging personal information and even taking sides in conflicts, all with encouragement from the participants, whether in the homes of natives or immigrants. In contrast, in American society, with its scientific tradition in the social sciences that insists that observers be as unobtrusive as possible, the researchers felt that they should not participate any more than necessary. Likewise, those American participants being observed maintained a non-intimate relationship with the observer and did not encourage more than minimal interaction.

A third style is reflexive filmmaking, which gives attention to the process as well as to the product and the producer (Ruby, 1980, 2000). In reflexive filmmaking, ethnographers reveal not only themselves as producers and the subjects of their study in the product, but also focus on the process. In the film itself, they try to reveal their methods of inquiry. Proponents of the reflexive style of filmmaking contend that in the production of an ethnographic film, one of the most important things that is happening is the fact that the film is being produced at all. Thus it is important to allow the viewer to see not only the people and events that are the subject of the film, but also to see the researcher-producer and to show how the film was made and the data collected (Banks, 1992).

The first ethnographic film using this style is believed to be Jean Rouch and Edgar Morin's (1961) Chronicle of a Summer, which was influenced by avant-garde Soviet documentary director Dziga Vertov's (1928) Man with a Movie Camera (Ruby, 2000). Rouch and Morin explored the thoughts and feelings of the people of Paris during the Algerian War. One or the other of the filmmakers often appeared before the camera, thus making their role in the process clear to the viewers. After editing some of the footage, they played back the rough cut to some of the participants and interviewed them regarding their reactions to the film and to how they had been portrayed. These playback interviews were also filmed and added on to the rough cut. Finally, discussions between the two filmmakers in which they evaluated the film were filmed and added to the final version. Even today, Chronicle of a Summer is considered one of the best examples of reflexive ethnographic filmmaking because it was designed precisely for that purpose (Ruby, 2000).

Reflexivity is perhaps a more important issue when ethnographic footage is used to produce films for public consumption than when it is viewed strictly by the ethnographic researcher, who was present at the event and who has field notes available on the methodology. Nevertheless, a reflexive approach might be useful when, for research purposes, the film is shown to others for their points of view, as recommended by Goldman-Segall (1995, 1998), which with modern multimedia technology is a procedure that is likely to increase in frequency. A reflexive approach could also be useful when a long time has elapsed between the videotaping and the analysis of data from the videotape in that it might remind the researcher of some of the details of the data collection procedure that have since been forgotten.

Thus ethnographic SLA researchers have decisions to make regarding the extent to which they want to observe, to participate, and to focus on their methodology in their video recording. These decisions should be based on sound theoretical grounds. Nevertheless, the degree to which one chooses to participate or observe will likely be influenced not only by theory but by the other roles the ethnographer plays in the community (Grimshaw, 1982a) and by cultural preferences as well (Blum-Kulka, 1997).

My Field Experience

The extent to which I participated in any taped interaction varied according to the situation. I was always present at the events that I video recorded for this study. I did not use the remote observation method. One reason for this choice is that usually the activities that were video recorded involved some movement or change of setting from time to time; that is, the participants were not stationary for the entire duration of the event. Another factor influencing my level of observation versus participation was the pressure applied by Indonesians for me to participate. Although originally it was my intention to remain behind the camera lens, I found that the Indonesians were not always content to have me there. They frequently coaxed me to join them in their activity, to the point that it felt rude not to accept. Consequently, I often became a participant in the interactions I was observing.

As with the other second language acquisition studies discussed so far, my presence as a researcher, a more advanced learner of Indonesian, did have an effect on the interaction that occurred between Indonesian native speakers and the learners in my study. In some cases, my presence had an inhibitory effect on the learners' acquisition of Indonesian and in other cases it had a facilitative effect. When I was actively involved as a participant, the Indonesians tended to talk with me and to pitch their language to my level of comprehension rather than to that of the less fluent learner. This was particularly noticeable in one interaction. Since much of the conversation was beyond the learner's comprehension, he was less able to participate. His role, as also happened in Iino's (1996, 1999) study, was reduced to that of observer for much of the time.

On the other hand, there were positive effects that resulted from my presence. Most Indonesians are bilingual, speaking a local language as their home language and Indonesian as the language for inter-group communication. Since this study took place in Java, most people speak Javanese as their home language. Javanese is often used even in the presence of non-Javanese speakers. However, my presence added to the number of non-Javanese speaking participants present in multiparty interactions and therefore most likely increased the amount of Indonesian used as the medium for communication.

My presence as a researcher also seemed to cause some participants to make a greater effort to use Indonesian (rather than Javanese or English) as the medium of communication than was typically the case. Thus the recorded interactions were atypical in some respects; at the same time, however, they promoted language acquisition by making more input in Indonesian (some of which was comprehensible) available to the learner than might otherwise have been the case.

Another positive aspect of my presence was that it allowed the learners to observe variations in the pragmalinguistic and sociolinguistic behavior of their Indonesian interlocutors as they interacted with different non-native speakers of the language. This kind of exposure helped to promote the learners' acquisition of pragmatics and sociolinguistics in Indonesian. For example, during the playback interview, one learner commented that he noticed that his tutor frequently addressed me and an Indonesian friend of mine with vocatives using our names preceded by a kin term, whereas his tutor almost never did this with the learner or with family members. (Later, in checking the entire video recording from that day as well as the audio recordings from other days, I was able to verify that this observation was accurate.) The learner then began to speculate as to why that might be so and eventually concluded that it had something to do with the social distance between his tutor and the various interlocutors in question. While social distance is not the only factor that affects the frequency of terms of address, it is indeed a key factor (DuFon, 2000, in press).


In reflecting back on my videotaping experience, I believe that at the time I first entered the field, I was hoping to remain as "objective" and unobtrusive as possible by maintaining my distance as dictated by American scientific tradition in the social sciences (Blum-Kulka, 1997). Pressure from the participants to participate and to join into the activities, however, caused me to reconsider and to behave in a more sociable manner. Consequently, I sometimes appeared before the camera lens once I got the camera set up for any activity that did not involve frequently moving from place to place. Other pressures were more internal. On one hand, there were times when I wanted to be a participant and to join in the fun rather than be an observing researcher at a distance from the activity. On the other hand, I wanted to be a good ethnographer and not forget the task that had brought me there in the first place. Therefore, I needed to monitor myself and the effect my participation was having on the interaction of the other participants.

One step I took to monitor the effect of my participation on the interaction was to critically view the videotapes shortly after the taping. With the more proficient learners, I could see that my participation did not seem to have an adverse effect on their participation, as they could remain active participants in the conversation. With the beginning learners, however, my participation sometimes increased the level of the discourse beyond what was comprehensible for them so that they were reduced to observers for some segments of the interaction. I used this information to make adjustments in my degree of participation in the next interaction.

In future studies, I think I would spend more time planning the videotaping events, carefully considering which approach to videotaping I thought would be best to take for each videotaped event given the nature of the event itself, the learner and his or her proficiency level, and the relationship of the various participants to each other and to me. I would try to imagine how each approach might affect the outcomes in terms of the learner's access to comprehensible input and opportunity to speak during the interactions with native speakers as well as the native speakers' reactions to my behaviors in terms of politeness norms. I might even ask native speaker researchers about what they would suggest I do in various situations (e.g., when invited to eat a meal with the rest of the guests) and how they thought people might react to different responses (e.g., refusing an invitation to eat in order to continue controlling the camera). By considering each approach and its consequences beforehand, my decision would be more strongly grounded in theoretical and methodological principles rather than just practical considerations. Nevertheless, there are practical considerations and I would want to remain flexible and perhaps change my pre-decided approach in a given situation depending on circumstances. Having carefully thought about the consequences beforehand, however, would give me a better idea of what to expect as a result of my decision.

In future projects, I might check on the effect of my participation in a number of other ways. One way would be to experiment with different approaches to the same kind of interaction with a similar group of interlocutors and then view the tapes to see how my physical presence or active participation had affected the interaction and access to input. I would also consider asking one or two assistants or friends to view the recording with me and to comment on the effect that they felt I was having on the interaction.

If I were to try remote observation, I would view the videotapes as soon as possible after the taping in order to determine whether the tape was being shot as I had wished (e.g., whole events, wide-angle views), whether there were any technical problems that needed to be ironed out, and whether the participants were indeed remaining sufficiently stationary for this technique to work.


The Literature

The third question under consideration here is Who should do the filming? There are several options: the researcher, a videographer, or the participants. The first option is to have the researcher double as the videographer. This has the advantage of using a minimum number of human resources, thus cutting down on costs and intrusion by outsiders at the events being recorded. It also gives the researcher greater control over the filming process. On the other hand, while the researcher is actively occupied with operating the camera, he or she may not be able to attend to other research tasks (e.g., taking field notes) and his or her view of the event will be constricted to what can be taken in by the lens. Likewise while operating the camera, the researcher may be unable to attend to the required social tasks (e.g., properly greeting a participant) in a given situation.

A second choice is using a professional videographer. This approach brings with it technical expertise in filming and a better quality product from a technical standpoint. It likewise frees the researcher to do other activities such as note taking or to view the big picture rather than be confined to the limited view of the lens. There are several disadvantages with this approach. It can be more costly and more intrusive since more people are involved. Also the videographer may have different goals in mind than the researcher. For example, the videographer may want to produce a film that is more interesting and artistic, while the ethnographer may want something that is more complete and accurate in the story it tells (Heider, 1976).

The third choice is to have the participants of the study do the video recording. This option might be particularly enlightening when the learners in the study are from a different cultural group than the researcher because significant differences in video recording behavior have been found across cultural and sub-cultural groups (e.g., Chalfen, 1981, 1992; Collier, 1988; Faris, 1992; Hughes-Freeland, 1992; Worth & Adair, 1972). Worth and Adair (1972) are generally recognized as the innovators of this approach. They gave cameras and film to Navajos, and spent two days teaching them the basics of camera operation (i.e., how to load and unload film, and how to achieve proper exposure, good focus and so forth). They did not teach or discuss aspects related to the content of the film or the art of filming so as not to influence them in any way in terms of what they thought was a proper film. They then gave the Navajos film and sent them off to produce their movies. What they found was that there were some startling cultural differences between their films and the films of mainstream American society. One notable difference was that in films about traditional Navajo society a high proportion of the time (roughly 75%) was spent walking. In mainstream films, walking is usually viewed as a bridge between activities or places, and the amount of time that this transitioning is depicted in mainstream films is relatively short. For the Navajo, walking is not a transitioning, but rather is the activity itself. Another clear difference was the avoidance of close-ups of the head by the Navajo filmmakers. Close-ups tended to be cut off at the head, or showed the head with the face turned away from the camera. The close-ups of the face that did occur were of short duration and limited in function. For example, a certain pose "with the eyes looking slightly upward -- sort of staring inwardly" (Worth & Adair, 1972, p. 152) was used by filmmakers to indicate that the person was thinking about something. A third difference was that Navajos were extremely reluctant to film anything (e.g., horses, sheep, houses) that did not belong to them.

Chalfen (1981, 1991, 1992) conducted a number of studies comparing the photographic habits of people according to their cultural or sub-cultural group. In one study, Chalfen (1991) compared the still photos of two Japanese-American families with the photo albums of mainstream American families of European descent. He found some striking differences between the albums of Japanese-Americans and those of Anglo-Americans. For example, for the Japanese-Americans there were many more photographs of group membership, and these were broader in scope than what was typically found in Anglo-American photo albums. Also, while both groups photographed happy social events such as birthdays and weddings, the Japanese-Americans took pictures of events and situations that Anglo-Americans typically did not, such as snapshots of funerals, people at church, and people working at work or school. He concluded that these Japanese-American albums emphasized the significance of family, work, achievement, group experience, and honor.

In another study, Chalfen (1981) used an approach modeled after Worth and Adair (1972) to compare the approach to filming taken by eight groups of Philadelphia youths ages 14 to 16 in four stages of film production: planning, filming, editing, and exhibition. He found significant differences in terms of the type of material that they selected for shooting, the photographer's relationship to the material, the patterns of searching and looking for material, and the patterns of narrating and telling the story of the film. Chalfen (1981, 1992) reports that these differences were influenced more by social class than by either ethnicity or gender. The higher socio-economic groups preferred a more observational and distant approach to filmmaking, which, Chalfen notes, is associated with a position of power, that is, with someone who is calling the shots. In contrast, the lower socio-economic groups preferred a more participatory approach, which is not associated with a position of power.

From this literature, it is clear that there is significant variation across cultures and subcultures with respect to how they view the process of video recording in terms of who should video record, what should be video recorded and what should be avoided, and how one should go about the process of recording. Since the SLA researcher is by definition working across cultures, it is necessary to give some thought to who should do the video recording, as the decisions made could dramatically affect the output. For example, if the learners were from a different culture than the researcher, the events that they chose to videotape or the way in which they conducted the videotaping might be quite different from what the researcher would have chosen. This might yield some unexpected yet fruitful results. I do not know of any study in second language acquisition that has systematically investigated the effect of participants recording themselves versus their being recorded by the researcher, but it is a question that merits investigation.

My Field Experience

In my own study, I chose a middle ground between giving the learners control of the equipment and maintaining complete control myself. I controlled the camera, either by mounting it on a tripod and participating in the interaction, or by operating the camera myself from behind the tripod. Given the cost of the video camera, which, while low end, was not trivial (see Fetterman, 1998), I preferred to keep it in my possession and to be the one to operate it. However, I allowed the learners to choose the situation they wanted to be videotaped in provided that they chose a relatively private and quiet place where the camera could pick up the sound of their voices. The video recordings that I took were supplemented by audio recordings, over which the learners had greater control. I gave each of them a tape recorder and told them to record themselves approximately once every week or two in an interaction with a native speaker of Indonesian. I did give them some suggestions and guidelines so that they would have some idea of the kinds of events they might record and the conditions under which they needed to record (e.g., a relatively quiet environment without significant ambient noise).

I also exercised control in another way, in that I asked the learners to change their recording habits to some extent part way through the program. For example, I noticed that because I had instructed the learners to get permission to record before they began, I rarely got any greetings, which typically preceded the request for permission to record. I then asked the learners to begin recording before they began the interaction, then to ask permission, and to erase the tape in front of their interlocutor if permission was not granted. Nevertheless, more often than not, the learners began recording sometime after the interaction had begun and continued sometimes until the end, sometimes not. As a result, the recordings are often missing the beginnings of the interactions, and hence they document only partial events rather than whole ones (see Heider, 1976, pp. 84-85). This was unfortunate because greetings were a key feature of interest in my investigation, yet they rarely occurred in the recorded data; typically they appeared on the tapes only when another interlocutor entered the scene in the middle of an ongoing interaction.

The decisions regarding who did the taping were motivated mostly by practical rather than theoretical concerns. I had only one video camera and a small budget; thus I maintained control of the video camera. On the other hand, I had seven audio recorders -- one for each learner and one for me. I did audio record some of their interactions myself as I accompanied them to some of the events in which they participated, but I wanted the learners to be able to record whenever they had a good opportunity to do so. I could not have collected as many recordings if I had needed to be physically present at all the events recorded by the learners.

Giving the learners the freedom to choose the interactions they recorded had both advantages and disadvantages. One advantage was that individual differences in experience were taken into account. The learners had somewhat different language learning experiences in Indonesia, and in this way they were able to record what they experienced rather than what I told them to experience. A second advantage was that this diversity gave me a good range of data. One disadvantage of allowing for this diversity was that it limited comparability. For example, one of the subjects, Minako, collected almost all of her data with familiar people, namely her host family and her tutor. The only time she collected data with strangers was when she was interviewing Indonesians for her course project. Another participant, Kyle, in contrast, collected a number of tapes with strangers and acquaintances that he did not know well. He often recorded a conversation with someone he had just met. In most cases, he had only one interaction with these people, and that was the tape recorded interaction. These differences in recording habits do not necessarily reflect differences in actual experience in terms of the learners' interactions with Indonesian native speakers. Minako reported that she did have interactions with strangers but did not feel comfortable asking strangers to allow her to tape record. Kyle, on the other hand, recorded himself with strangers a number of times. He did not have any major reservations about asking them to allow him to tape. Thus the freedom given to the learners resulted in different recording patterns. These patterns may be more representative of whom the learners felt comfortable recording than of their language learning experience as a whole. Thus, the problem of statistics (Hastrup, 1992) is evident here. I had to rely on other data sources such as learner journals and interviews in order to determine how representative the learner recordings were of their language learning experience as a whole.

In my study, I opted for a middle ground. I controlled the video camera, but let the learners choose the context in which they wished to be video recorded within certain limits dictated by the limitations of my video camera. I also gave them control over the audio recording.


As I conducted this study with its procedure of videotaping, I did not feel comfortable with a number of aspects of it. I was concerned with how the people being videotaped felt and did not want to be too intrusive. I had technical problems that needed to be overcome with better equipment and more experience operating it. There were also the practical concerns of protecting the equipment from the weather and thieves. Consequently I was not ready to be too experimental in my approach to videotaping, particularly when it involved giving up control of the equipment. As I have gained some more experience in videotaping, I feel somewhat more comfortable with all of these issues. I believe one direction for future research in SLA would be to experiment with having the various participants -- the learners and members of the target culture -- as well as the researcher do the actual videotaping. By putting people from the various cultural groups, who fulfill different roles in the study, behind the camera and seeing what kind of recording they produce, we might learn something both about the cultures involved and about their use of language. It is possible that they might select events that the researcher had not even considered but which may turn out to yield very interesting and revealing data. Furthermore, such an approach might provide insights into what taboos there might be in terms of what should not be filmed. Inadvertently violating these taboos could interfere with obtaining good interactions since the participants would likely be uncomfortable and might even refuse to participate.

One concern that I have is that in giving the control of the video recording to others, I risk not getting the kind of data I am hoping for or looking for. However, there is also the possibility that this approach would produce far richer results than if I did all the videotaping myself, providing me with greater insights into the target language culture as well as the learner culture. Therefore, it seems to be an approach to videotaping worth experimenting with in ethnographic second language acquisition studies.


This paper has initiated a discussion among SLA researchers of some of the theoretical and methodological issues as well as some of the practical concerns related to the topic of video recording naturalistic interactions in investigations using an ethnographic approach. Specifically, three questions have been addressed which relate to obtaining valid video data on tape: a) How should the interactions be video recorded? b) Who should be video recorded? and c) Who should do the video recording? The academic literature in the fields of visual anthropology, educational anthropology, and ethnographic filmmaking, my field experience, and my reflections on that field experience have been presented in order to shed light on these issues for SLA researchers using video recordings as part of their data collection procedures. Finally, some directions for future research on video recording in SLA research have been suggested.


I would like to thank Aneta Pavlenko for encouraging me to write this article and Gabriele Kasper and three anonymous reviewers for their comments on earlier drafts of this article.


Margaret A. DuFon is an assistant professor in the English Department (Linguistics Program) at California State University -- Chico. Her research interests include interlanguage pragmatics and second language socialization in Indonesian as a Second Language. She is currently working with Gabriele Kasper in producing video materials for teaching Indonesian pragmatics.



Asch, T. (1988). Collaboration in ethnographic filmmaking: A personal view. In J. R. Rollwagon (Ed.), Anthropological filmmaking (pp. 1-29). Philadelphia: Harwood Academic Publishers.

Asch, T. (1992). The ethics of ethnographic filmmaking. In P. I. Crawford & D. Turton (Eds.), Film as ethnography (pp. 196-204). Manchester, UK: Manchester University Press.

Balikci, A. (1988). Anthropologists and ethnographic filmmaking: A personal view. In J. R. Rollwagon (Ed.), Anthropological filmmaking (pp. 31-45). Philadelphia: Harwood Academic Publishers.

Banks, M. (1992). Which films are the ethnographic films? In P. I. Crawford & D. Turton (Eds.), Film as ethnography (pp. 116-129). Manchester, UK: Manchester University Press.

Beebe, L. M., & Takahashi, T. (1989). "Do you have a bag?": Social status and patterned variation in second language acquisition. In S. Gass, C. Madden, D. Preston, & L. Selinker (Eds.), Variation in second language acquisition: Discourse and pragmatics (pp. 103-125). Clevedon, UK: Multilingual Matters.

Besnier, N. (1994). The truth and other irrelevant aspects of Nukulaelae gossip. Pacific Studies, 17(3), 1-39.

Biella, P. (1988). Against reductionism and idealist self-reflexivity: The Ilparakuyo Maasai film project. In J. R. Rollwagon (Ed.), Anthropological filmmaking (pp. 47-72). Philadelphia: Harwood Academic Publishers.

Blum-Kulka, S. (1997). Dinner talk: Cultural patterns of sociability and socialization in family discourse. Mahwah, NJ: Erlbaum.

Bottorff, J. L. (1994). Using videotaped recording in qualitative research. In J. M. Morse (Ed.), Critical issues in qualitative research methods (pp. 244-261). Thousand Oaks, CA: Sage.

Chalfen, R. (1981). A sociovidistic approach to children's film-making: The Philadelphia project. Studies in Visual Communication, 7(1), 2-33.

Chalfen, R. (1991). Turning leaves: The photographic collections of two Japanese American Families. Albuquerque, NM: University of New Mexico Press.

Chalfen, R. (1992). Picturing culture through indigenous imagery: A telling story. In P. I. Crawford & D. Turton (Eds.), Film as ethnography (pp. 222-241). Manchester, UK: Manchester University Press.

Collier, J., Jr. (1988). Visual anthropology and the future of ethnographic film. In J. R. Rollwagon (Ed.), Anthropological filmmaking (pp. 73-96). Philadelphia: Harwood Academic Publishers.

Corsaro, W. (1982). Something old and something new: The importance of prior ethnography in the collection and analysis of audiovisual data. Sociological Methods and Research, 11(2), 145-166.

Crawford, P. I. (1992). Film as discourse: The invention of anthropological realities. In P. I. Crawford & D. Turton (Eds.), Film as ethnography (pp. 66-82). Manchester, UK: Manchester University Press.

Davis, K. A. (1995). Qualitative theory and methods in applied linguistics research. TESOL Quarterly, 29(3), 427-453.

Diesing, P. (1971). Patterns of discovery in the social sciences. Chicago: Aldine.

Duff, P. (1995). Ethnography in a foreign language immersion context: Language socialization through EFL and history. TESOL Quarterly, 29(3), 505-537.

DuFon, M. A. (2000). The acquisition of linguistic politeness in Indonesian as a second language by sojourners in naturalistic interactions. (Doctoral dissertation, University of Hawai'i, 1999). Dissertation Abstracts International, 60(11), 3985-A.

DuFon, M. A. (in press). The acquisition of terms of address in Indonesian by foreign language learners in a study abroad program in East Java. NUSA, 51.

Duranti, A. (1997). Linguistic anthropology. Cambridge, UK: Cambridge University Press.

Edwards, J. A., & Lampert, M. D. (Eds.). (1993). Talking data: Transcription and coding in discourse research. Hillsdale, NJ: Erlbaum.

Erickson, F. (1975). Gatekeeping and the melting pot: Interaction in counseling encounters. Harvard Educational Review, 45(1), 44-70.

Erickson, F. (1982). Audiovisual records as a primary data source. Sociological Methods and Research, 11(2), 213-232.

Erickson, F. (1986). Qualitative methods in research on teaching. In M. C. Wittrock (Ed.), Handbook of research on teaching (pp. 119-161). New York: Collier-Macmillan.

Erickson, F. (1992). Ethnographic microanalysis of interaction. In M. D. LeCompte, W. L. Milroy & J. Preissle (Eds.), The handbook of qualitative research in education (pp. 201-225). New York: Academic Press.

Erickson, F., & Shultz, J. (1982). The counselor as gatekeeper: Social interaction in interviews. New York: Academic Press.

Faris, J. C. (1992). Anthropological transparency: Film, representation and politics. In P. I. Crawford & D. Turton (Eds.), Film as ethnography (pp. 171-182). Manchester, UK: Manchester University Press.

Fetterman, D. M. (1998). Ethnography: Step by step (2nd ed.). Thousand Oaks, CA: Sage.

Fiksdal, S. (1988). Verbal and nonverbal strategies of rapport in cross-cultural interviews. Linguistics and Education, 1, 3-17.

Gass, S. M., & Houck, N. (1999). Interlanguage refusals. New York: Mouton de Gruyter.

Geertz, C. (1973). Thick description: Toward an interpretive theory of culture. In The interpretation of cultures: Selected essays (pp. 3-30). New York: Basic Books.

Golden-Biddle, K., & Locke, K. D. (1997). Composing qualitative research. Thousand Oaks, CA: Sage.

Goldman-Segall, R. (1993). Interpreting video data: Introducing a "Significance Measure" to layer description. Journal of Educational Multimedia and Hypermedia, 2(3), 261-281.

Goldman-Segall, R. (1995). Configurational validity: A proposal for analyzing ethnographic multimedia narratives. Journal of Educational Multimedia and Hypermedia, 4(2/3), 163-182.

Goldman-Segall, R. (1998). Points of viewing children's thinking: A digital ethnographer's journey. Mahwah, NJ: Erlbaum.

Goodwin, C. (1993). Recording human interaction in natural settings. Pragmatics, 3(2), 181-209.

Green, J., Franquiz, M., & Dixon, C. (1997). The myth of the objective transcript: Transcribing as a situated act. TESOL Quarterly, 31(1), 172-176.

Grimshaw, A. D. (1982a). Sound-image data records for research on social interaction: Some questions and answers. Sociological Methods and Research, 11(2), 121-144.

Grimshaw, A. D. (1982b). Whose privacy? What harm? Sociological Methods and Research, 11(2), 233-247.

Harvey, P. (1991). Drunken speech and the construction of meaning: Bilingual competence in the Southern Peruvian Andes. Language in Society, 20, 1-36.

Harvey, P. (1992). Bilingualism in the Peruvian Andes. In D. Cameron, E. Frazer, P. Harvey, B. Rampton, & K. Richardson (Eds.), Researching languages: Issues of power and method. London: Routledge.

Hastrup, K. (1992). Anthropological visions: Some notes on visual and textual authority. In P. I. Crawford & D. Turton (Eds.), Film as ethnography (pp. 8-25). Manchester, UK: Manchester University Press.

Heider, K. (1976). Ethnographic film. Austin, TX: University of Texas Press.

Hughes-Freeland, F. (1992). Representations by the Other: Indonesian cultural documentation. In P. I. Crawford & D. Turton (Eds.), Film as ethnography (pp. 242-256). Manchester, UK: Manchester University Press.

Iino, M. (1993). The trap of generalization: A case of encountering a new culture. Working Papers in Educational Linguistics, 9(1), 21-45.

Iino, M. (1996). "Excellent Foreigner!": Gaijinization of Japanese language and culture in contact situations -- an ethnographic study of dinner table conversations between Japanese host families and American students. (Doctoral dissertation, University of Pennsylvania, 1996). Dissertation Abstracts International, 57(4), 1451-A.

Iino, M. (1999, March). Issues of video recording in language studies. Obirin Studies in Language and Literature, 39, 65-85.

Jackson, B. (1987). Fieldwork. Urbana, IL: University of Illinois Press.

Krupa-Kwiatkowski, M. (1998). Second language acquisition in the context of socialization: A case study of a Polish boy learning English. (Doctoral dissertation, State University of New York at Buffalo, 1997). Dissertation Abstracts International, 58(8), 2969-A.

Lazaraton, A. (1995). Qualitative research in applied linguistics: A progress report. TESOL Quarterly, 29(3), 455-472.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills: Sage.

Lutz, F. W. (1981). Ethnography: The holistic approach to understanding schools. In J. L. Green & C. Wallat (Eds.), Ethnography and language in educational settings. Advances in discourse processes 5 (pp. 51-63). Norwood, NJ: Ablex.

McMeekin, A. (forthcoming). NS-NNS Negotiation in the Study Abroad Homestay Versus Classroom. Unpublished doctoral dissertation. University of Hawai'i-Manoa.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage.

Ochs, E. (1988). Culture and language development: Language acquisition and language socialization in a Samoan village. Cambridge, UK: Cambridge University Press.

Ohta, A. S. (1999). Interactional routines and the socialization of interactional style in adult learners of Japanese. Journal of Pragmatics, 31, 1493-1512.

Poole, D. (1992). Language socialization in the second language classroom. Language Learning, 42, 593-616.

Punch, M. (1986). The politics and ethics of fieldwork. Beverly Hills, CA: Sage.

Rollwagon, J. R. (1988). The role of anthropological theory in "ethnographic" filmmaking. In J. R. Rollwagon (Ed.), Anthropological filmmaking (pp. 287-315). Philadelphia: Harwood Academic Publishers.

Roberts, C. (1997). Transcribing talk: Issues of representation. TESOL Quarterly, 31(1), 167-172.

Ruby, J. (1980). Exposing yourself: Reflexivity, film, and anthropology. Semiotica, 3, 153-179.

Ruby, J. (2000). Picturing culture: Explorations of film and anthropology. Chicago: University of Chicago Press.

Rymes, B. (1997). Second language socialization: A new approach to second language acquisition research. Journal of Intensive English Studies, 11, 143-154.

Schecter, S. R., & Bayley, R. (1997). Language socialization practices and cultural identity: Case studies of Mexican-descent families in California and Texas. TESOL Quarterly, 31(3), 513-560.

Schieffelin, B. B. (1990). The give and take of everyday life: Language socialization of Kaluli children. Cambridge, UK: Cambridge University Press.

Sevigny, M. J. (1981). Triangulated Inquiry: A methodology for the analysis of classroom interaction. In J. L. Green & C. Wallat (Eds.), Ethnography and language in educational settings. Advances in discourse processes, Vol. 5 (pp. 65-85). Norwood, NJ: Ablex.

Siegal, M. (1995). Looking east: Identity construction and white women learning Japanese. (Doctoral dissertation, University of California-Berkeley, 1994). Dissertation Abstracts International, 56(5), 1692-A.

Tobin, J. J., Wu, D. Y. H., & Davidson, D. H. (1989). Preschool in three cultures: Japan, China and the United States. New Haven, CT: Yale University Press.

Watson-Gegeo, K. A. (1988). Ethnography in ESL: Defining the essentials. TESOL Quarterly, 22(4), 575-592.

Watson-Gegeo, K. A., Maldonado-Guzman, A. A., & Gleason, J. J. (1981). Establishing research goals: The ethnographer-practitioner dialectic. Proceedings of selected research paper presentations, Theory and Research Division (pp. 670-714). Association for Educational Communications and Technology.

Willett, J. (1995). Becoming first graders in an L2: An ethnographic study of L2 socialization. TESOL Quarterly, 29(3), 473-503.

Wolcott, H. F. (1995). The art of fieldwork. Walnut Creek, CA: Altamira Press.

Wolcott, H. F. (1990). Writing up qualitative research. Newbury Park, CA: Sage.

Worth, S., & Adair, J. (1972). Through Navajo eyes: An exploration in film communication and anthropology. Bloomington, IN: Indiana University Press.
