Toward a Communicative Approach to Language Testing

 

A Critical Study of Achievement Tests for Non-Specialist Students Learning English for Specific Purposes in the

Faculty of Arabic and Social Sciences at

King Khalid University

 

ABSTRACT

 

The aim of this paper is to develop a critical awareness of the tests administered to ESP students in the Faculty of Arabic and Social Sciences, King Khalid University, so that language teachers can better reflect on their tests and the way they should be constructed.

The first part of this paper introduces the problem, describing the population of this study and their curriculum.

The second part presents the key issues in language testing, particularly with regard to the criteria involved in carrying out communicative language testing.

The third part is a detailed scrutiny of the 1418 and 1419 AH tests administered to ESP students from the different departments in the Faculty, shedding a critical light on each test and pinpointing its strengths and weaknesses.

The fourth part presents conclusions and recommendations, underscoring the finding that language teachers need to make their tests as direct as possible in terms of real-life operations if they are to measure anything of value.

 

Toward a Communicative Approach to Language Testing

 

A Critical Study of the Language Tests Given to Students Studying English for Specific Purposes

in the Faculty of Arabic Language and Social Sciences

King Khalid University

 

ABSTRACT (translated from the Arabic)

 

The aim of this research is to shed a critical light on the English language tests administered to the Faculty’s students from different departments, so that language teachers can reflect on the usefulness of their tests and on the correct way in which these tests should be carried out.

The first chapter presents the research problem, describes the prevailing traditional method of examination, and sets out some of its shortcomings.

The second chapter presents basic concepts about English language tests and pays particular attention to the criteria sought in carrying out a communicative language test.

The third chapter presents a detailed analysis of the language tests of 1418 and 1419, sheds a critical light on the test of each department in the Faculty, points out the positive and negative aspects of each test, and examines the best means of improving the tests.

The fourth chapter contains the conclusions and recommendations. It recommends that language teachers attend to communicative examinations, which address students’ everyday needs and avoid unrealistic questions. The researcher then presents a model for designing such examinations, together with its objectives and methods of evaluation.


 

I- Introduction

 

Progressive strides have recently been made toward establishing a communicative atmosphere at the Faculty of Arabic and Social Sciences, King Khalid University. The goal is to motivate students to use the target language competently. In spite of this accomplishment, quite a few students are probably evaluated unjustly because the assessment procedures are inappropriate. Instructors teaching the courses can hardly determine how effective and efficient their teaching has been, because their students’ achievement tests cannot be taken as an objective indication of the quality of the teaching received.

Analyzing classroom achievement tests, we find that they reflect the thinking of traditional approaches to testing. Most tests tend to be largely discrete-point in nature, reflecting an orientation toward the behavioristic language-learning theories. This conservative stance in classroom testing has resulted in an ever-widening gap between the description of the course goals and their testing procedures.

The present work intends to answer questions such as the following:

 

1-      Since communicative language teaching has been implemented in the Faculty, has a testing approach truly compatible with it been developed?

2-      How communicative, valid, reliable, and appropriate are the tests administered to the students?

 

          In fact, most language teachers in the Faculty of Arabic and Social Sciences have wrongly put the blame on the students by assuming that they do not do well on English tests merely because they do not study hard. If it is true that most practitioners have been fully prepared to teach well, it is also a fact that most of them have so far failed to realize that their students’ poor test scores are largely ascribable to the inappropriate test items that have been administered. Their teaching has certainly aimed at improving not only the students’ linguistic competence, but also their communicative competence. However, their tests give a good account only of the students’ ability to manipulate the grammar of the target language and to answer some comprehension questions. Although instruction has been oriented toward helping students use the language genuinely, testing has remained static; it measures only whether or not the test takers have been linguistically accurate. A language test should be dynamic, reflecting students’ communicative needs rather than being a body of passive items. As they stand, language tests in the Faculty sacrifice communicative fluency in favor of grammatical accuracy. Such a mismatch should be eliminated so that the assessment process can be reliable, valid, and authentic.

Therefore, this research will primarily focus on identifying possible problems in the tests administered to non-specialist students learning English for specific purposes. Test samples from the years 1418 and 1419 AH from several departments in the Faculty will be examined in order to determine whether or not they were constructed in accordance with the requirements of a communicative approach to language testing. Light will also be shed on the negative and positive aspects of each section of the 1418 and 1419 tests. Finally, the writer will conclude his research paper by presenting a summative language test, together with its objectives and ratings, to serve as a model on which in-service language teachers can build their classroom tests according to a communicative approach to language testing. The summative test is suggested as a possible starting point for revising classroom tests in the Faculty of Arabic and Social Sciences, King Khalid University, so that they reflect communicative language goals more directly.

 

1. Description of the Students and their English Syllabus

 

After receiving their high school diplomas, students are academically screened and then granted admission to one of the various departments in the Faculty. Throughout their studies at the Faculty, students usually acquire a fairly acceptable background in English, owing to the richness of their English syllabus and the extensive number of hours allocated to the study of English over four years. Upon completion of their studies, students graduate with a bachelor’s degree from their respective departments.

The syllabi of the Faculty of Arabic and Social Sciences at King Khalid University comply with the requirements of an Islamic society, keeping the students abreast of moral values and traditions. Textbooks are written by language specialists and revised by the professors of the English department at the Faculty. Each textbook is divided into several topics, and each topic covers a variety of subjects, teaching students grammar, history, and culture and providing many exercises to work on.

In addition to their final tests rated at 70 points, students are given homework, quizzes, and a mid-term exam which are worth 30 points. If the students do not attend their classes regularly and miss the aforementioned evaluation procedures, their final test will be rated at 100 points. By the time they take their final test, the students will have successfully completed one whole semester of intensive advanced study. Thus, they have a fair exposure to English and plenty of time to practice before their finals.
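The weighting described above can be made concrete with a minimal sketch. The function name and the idea of expressing the final-test result as a fraction of its maximum are assumptions made for illustration; the Faculty’s regulations state only the 70/30 split and the 100-point alternative.

```python
from typing import Optional


def course_total(final_fraction: float, coursework: Optional[float]) -> float:
    """Return a course mark out of 100 (hypothetical helper).

    final_fraction: share of the final test answered correctly, 0.0 to 1.0.
    coursework: points earned from homework, quizzes, and the mid-term
        (0 to 30), or None for a student who missed these procedures.
    """
    if not 0.0 <= final_fraction <= 1.0:
        raise ValueError("final_fraction must be between 0 and 1")
    if coursework is None:
        # No coursework mark: the final test alone is rated at 100 points.
        return round(final_fraction * 100, 1)
    if not 0.0 <= coursework <= 30.0:
        raise ValueError("coursework must be between 0 and 30")
    # Regular case: final test rated at 70 points plus up to 30 coursework points.
    return round(final_fraction * 70 + coursework, 1)
```

Under these assumptions, a student who answers 80 percent of the final test correctly and has earned 24 coursework points receives the same total as a student with no coursework mark who answers 80 percent of a final rated at 100 points.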

 

II- Review of Related Literature

 

This section presents an overview of testing in general, and communicative testing in particular. It addresses the following questions: What is testing and why do we test? What is communicative testing and in what ways is it different from conventional testing approaches?

 

1. Overview of Educational Testing

 

Tests are a means of obtaining systematic evidence on which to base instructional decisions. Educators see tests as motivators that stimulate individuals to do their best. If they are well designed and properly used, tests can effectively enhance the educational process (Richards, 1990). Educational testing is in fact a worldwide endeavor. In everyday life, there is a need for some device for determining people’s ability. It is difficult to imagine, for example, an organization hiring interpreters without some knowledge of their proficiency. Tests are therefore needed to provide information about the achievement of the testees, without which we cannot make decisions.

The words testing, evaluation, and measurement are closely associated. However, they should be viewed as three different terms. In some instances, evaluation is used as a synonym for the term measurement. In other cases, it is used interchangeably with the term testing. Thus, when teachers administer achievement tests, they might say that they are “testing” achievement, “measuring” achievement, or “evaluating” achievement, with little regard for these terms’ specific meanings. Smith and Adams (1972) explain that measurement, the science of obtaining a numerical description, should be objective and impersonal, whereas evaluation involves the use of information collected by the process of measurement. For example, if we use a ruler and ascertain that a table is five feet wide, that is measuring. But if we add that the same table is too large to go through a 20-inch door, this is evaluation. As Smith and Adams (1972) assert, tests given in school attempt to measure the achievement of students. Grades assigned on the basis of test results are evaluations of the students’ achievements.

The terms quiz, test, and exam also need some clarification. Hammerly (1985, p. 539) states that the differences between a quiz, a test, and an exam lie in duration and comprehensiveness. A quiz takes about five to ten minutes and covers the current materials. A test lasts from half an hour to one hour and covers one or more units. An exam is two hours or longer and covers at least half of the content of the course. Despite these distinctions, in this paper, test or testing will be used as an umbrella term to refer to any type of measurement procedure.

In any consideration of educational testing, a distinction must be drawn between teacher-made classroom tests and formal standardized tests, which are usually prepared by professional testing services to assist in decisions on students’ admission to universities. Classroom tests are generally prepared and scored by one teacher. Test objectives can be based directly on course content; the students know what is expected of them and what is likely to be covered in the test questions. Standardized tests, on the other hand, are designed to be used with hundreds of thousands of subjects throughout the world. They are prepared by a team of testing specialists without personal knowledge of the examinees. Such tests often take years to construct, as opposed to a few days for a teacher-made test (Weir, 1993).

Perhaps the most common use of educational tests is to pinpoint strengths and weaknesses in the learned abilities of the students. Linden et al. (1974) consider evaluation of students’ progress to be a major aspect of the teacher’s job. It gives a sense of where the students are, relative to the curriculum and to other students, as well as how students are progressing toward the attainment of specified objectives. A test is also a tool which teachers need in their evaluative repertoires.

 

2. Interpretation of Test Data

 

Within each category of the kinds of educational tests mentioned above, there are varieties of different techniques and procedures that can be classified according to how the results are interpreted. Two main types of techniques used to make educational decisions will be discussed along with the different types of information that each test yields.

One type of information helps us determine a student’s rank. This is accomplished by comparing the student’s performance to the performances of other students whose scores are given as the norm. A student’s score is therefore interpreted with reference to the scores of other students, rather than to an agreed criterion score. We call this technique a norm-referenced test.

A second type of information provided by tests tells us about a student’s proficiency in a set of skills. This is accomplished by comparing a student’s performance to a certain criterion, which has been agreed upon. The students must reach this level of performance to pass the test, and a student’s score is therefore interpreted with reference to the criterion score, rather than the scores of other students. We call this technique a criterion-referenced test (Bachman, 1991).
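The contrast between the two interpretations can be illustrated with a minimal sketch. The function names, the percentile-rank definition (percentage of norm-group scores strictly below the student’s score), and all scores are assumptions chosen for illustration, not data from the tests under study.

```python
from typing import Sequence


def norm_referenced(score: float, norm_scores: Sequence[float]) -> float:
    """Percentile rank: percentage of norm-group scores strictly below `score`."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def criterion_referenced(score: float, cut_score: float) -> bool:
    """Pass/fail decision: has the student reached the agreed criterion?"""
    return score >= cut_score


# Hypothetical norm group and cut score.
norm_group = [40, 55, 60, 72, 85]
rank = norm_referenced(60, norm_group)   # position relative to other students
passed = criterion_referenced(60, 70)    # position relative to the criterion
```

The same raw score of 60 thus yields two different kinds of information: a rank among peers in the first case and a pass/fail decision against a fixed standard in the second.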

 

3. Communicative Language Testing

 

There has been a growth of interest in the communicative testing approach. It considers language to be interactive, purposive, authentic, and contextualized, and holds that it should be assessed in terms of behavioral outcomes. The tests analyzed in this paper do not follow these principles.

Madsen (1983) states that language testing has evolved through three major stages, which reflect people’s attitudes towards the goals of language teaching and language learning. These stages are summarized as follows:

 

1.              The Intuitive Stage focuses on subjective testing and is dependent on personal impressions of the teachers.

2.              The Scientific Stage stresses objective evaluation focusing on language usage.

3.              The Communicative Stage emphasizes evaluation of language use rather than usage.

The communicative approach is based on the premise that language is first and foremost a tool for communication. From this perspective, tests designed to assess student proficiency can be tailored to include items that measure the students’ communicative ability at all levels of language. Brown (1987) elaborates on the characteristics of a communicative language test:

A communicative test has to meet some rather stringent criteria. It has to test for grammatical, discourse, sociolinguistic, and illocutionary competence as well as strategic competence. It has to be pragmatic in that it requires the learner to use language naturally for genuine communication and to relate to thoughts and feelings, in short, to put authentic language to use within a context. It should be direct (as opposed to indirect tests which may lose validity as they lose content validity). And it should test the learner in a variety of language functions. (p. 230)

 

An important observation in this quotation is that in testing communicative performance, test items should measure how well students are able to engage in meaningful, purposeful, and authentic communicative tasks. Students must perform well both linguistically and communicatively. That is, they must have a good command of the components involved in communication. The best exams in this communicative era, Madsen (1983) comments, are those that combine the various subskills necessary for the exchange of oral and written ideas. He asserts that communicative tests need to measure more than isolated language skills in order to indicate comprehensively how well a person can function in another language.

 

4. The Requirements of Good Language Tests

 

The common concepts needed in communicative testing include reliability, validity, practicality, and authenticity. They fall under the heading of desirable test characteristics. Marshall and Hales (1972) point out that any test that is to be used effectively as a measuring instrument should be reliable, valid, authentic, and practical. They warn that a drawback in any of these test attributes can render a test futile.

Reliability has to do with test consistency. Two tests should give evidence that they are likely to produce the same results when taken at different times by the same or similar students. That is, students who obtain high scores on one set of items also obtain high scores on other sets of equivalent items, and those who have a low score on one set of items also have a low score on other sets of items (Scannel and Tracy, 1975).

Validity in testing refers to whether the test measures what it claims to measure, and whether it measures what was taught. For example, a test which is designed to determine the extent to which a particular group of students have mastered specific algebraic concepts will not be valid when administered to a different group of students with the intent to determine their performance in Elizabethan literature. Similarly, a test of English as a Second Language (ESL) is not valid for students learning translation theory (Heaton, 1995).

Questions pertaining to the validity of a test include what the test measures, whether it measures what it is intended to measure, and whether it measures what was taught. Henning (1987) claims that a good language test should consider how relevant the language behavior being tested is to meeting communicative needs and whether or not the users of the test will accept its content and format.

Practicality or usability is the third important attribute of a good test. It involves the economical use of time and expenses in test construction, test administration, and test scoring. A test may be highly reliable and valid and yet not be practical for use in a school-testing program.

Another equally important feature of a good test is authenticity. In communicative testing, authenticity is a key element in the designing of materials and test items. It means assessing language behavior by observing it in real, or at least realistic, language-use situations which should be as authentic as possible (Gronlund, 1985).

To sum up, much has recently been written about communicative language testing. Discussions have focused on the desirability of assessing the ability that takes part in acts of communication. All parties to the discussion assume that it is communicative competence that teachers want to test. Tests should therefore assess the learner’s communicative behavior and not be based on linguistic items alone. In communicative tests, students’ performance should be measured not only in terms of formal correctness, but also, and primarily, in terms of interaction, for the concern is not how much the students know, but how well they can perform.
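One common way to quantify this kind of consistency is to correlate the scores the same students obtain on two equivalent sets of items. The sketch below uses the Pearson correlation coefficient; the choice of statistic, the function name, and the scores are assumptions for illustration, since the sources cited here do not prescribe a particular computation.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores on two equivalent test forms."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long score lists with at least 2 students")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores on each form must vary across students")
    return cov / (spread_x * spread_y)


# Hypothetical scores of four students on two equivalent forms.
form_a = [10, 20, 30, 40]
form_b = [12, 22, 32, 42]
consistency = pearson_r(form_a, form_b)
```

A coefficient near 1 indicates that students who score high on one form also score high on the other, which is the pattern described above; a coefficient near 0 would suggest that the two sets of items do not rank students consistently.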

 

III - Procedures for Analysis of 1418 and 1419 Tests

 

In order to accomplish the goals mentioned in this paper, the researcher will select some excerpts from the 1418 and 1419 tests of several departments in the Faculty of Arabic and Social Sciences. Then he will make a content analysis of them to determine whether or not they are in accordance with the requirements of a communicative language test. The analysis will proceed section by section in order to identify the strong and weak points in terms of the explicitness of the questions and the nature of the tasks. In clear terms, he will try to determine the effectiveness of these tests in relation to the characteristics of reliability, validity, authenticity, and practicality that constitute an effective communicative language test. These tests will be analyzed and evaluated on the basis of the requirements of a good test as set out in the literature review. After the analysis and evaluation, the researcher hopes to be able to point out the positive aspects of the testing system in the Faculty of Arabic and Social Sciences as well as those aspects that need to be improved. Finally, in the concluding section, he intends to suggest tentative solutions to improve and update classroom tests in the Faculty so that they suit the communicative teaching approach currently in use. Toward this end, a summative test together with its objectives and ratings will be elaborated in the hope of helping in-service teachers upgrade their strategies for testing students’ achievement.


1. Overview of the 1418 Tests

 

          The 1418 tests of various departments (see Appendix I) are divided into a number of subcategories: Reading Comprehension, General Questions, Translation, Vocabulary, Essay, and Grammar. Each division is somewhat related to the topics discussed in the students’ syllabus. The time allotted to finish the entire test is two hours. The whole course is rated at a hundred points: seventy points are given for the final exam, and thirty points are awarded during the semester. However, for those who do not complete the quizzes and homework given during the semester, the test will be rated out of one hundred and they will be given an additional half hour in their final. The distribution of the test points indicates that the reading comprehension section is the most important component of the test, with a total of twenty points. General questions come next with fifteen points, and translation, vocabulary, essay, and grammar are given the rest. The test covers two printed pages, with the text for reading comprehension and the general questions on the first, and the rest of the questions on the second.

          A general observation about this test is that it is of the type referred to as partially subjective, which has abundant writing in various forms including translation, essay, and open-ended answers based on reading comprehension and common sense. Such a test generally measures linguistic competence. It is designed according to the traditional testing approaches, as reflected by the nature of the questions, the length of the test, and the distribution of the points over the subcategories mentioned earlier. A general analysis of the tests’ content reveals the following aspects:

 

A. Reading Comprehension

 

This section includes some questions that the test takers have to answer in order to show evidence of their understanding of the reading passage. Some questions on the passage deal with information that is already common knowledge. Hence, the testee is able to answer the questions correctly without paying much attention to the reading passage. For example, the question reads: “What is the weather like in Arabia?” (The History Department). Such a strategy fails to discriminate between students at various levels of proficiency. The teacher should test students’ understanding not only of the surface meaning of a passage, but also of the author’s purpose and attitude.

A further problem with the comprehension questions is that the direction, “Read the following passage and answer the questions”, is vaguely stated. Test takers may have difficulty deciding whether they should answer the questions in accordance with the information provided in the text or whether their answers should draw on personal experience.

 

B. General Questions

 

Some of the questions are subjective, requiring the students to have advanced skills and strong background knowledge in the target language. For example, the question reads: “How do we learn the lessons of history?” (The History Department). The criteria for judging general questions require several scorers; therefore, such questions tend to be unreliable.

Moreover, some questions can be answered without reference to the textbook. For example, the question reads: “Why are traveler’s checks useful?” (The Administration Department). Because of this, it is difficult for the teacher to determine whether or not good answers indicate careful reading of the textbook. The broadness of the questions, however, gives students more latitude to draw on any means to write an appropriate answer.

 

C. Vocabulary

 

The vocabulary test requires that the students use a number of words taken from their textbooks in meaningful sentences. The question reads: “Use each of the following words in meaningful sentences of your own” (The Geography Department). The trouble with this method of vocabulary assessment, however, is that most users of a language may know the meaning of particular passive words without being able to use them properly in meaningful sentences. This is what makes such a vocabulary test a little too demanding for non-native speakers. This kind of test, however, may be useful if the students are asked to compose a sentence out of active vocabulary, that is, words that are needed to understand newspapers, periodicals, literature, and textbooks.

 

D. Grammar

 

There is a variety of grammar questions in the tests. Students are asked to do what is required between brackets (as in the example below) or according to the directions given in the test paper. Success or failure in such traditional grammar questions gives little or no account of students’ communicative ability and is therefore not an adequate measure. In addition, the directions in the grammar questions are not well stated, which may prevent students from successfully performing the required task. For example, the question reads:

 

 “Do as shown in brackets: I (be) going to travel to Jeddah. (correct)” (The Psychology Department)

 

Here, the verb between brackets can be placed in two different tenses (“am going” or “was going”) and the sentence remains acceptable.

 

E. Translation

 

Students are asked to translate from English into Arabic a portion of the reading passage or a series of sentences written in their exam paper. Translation into Arabic shifts the emphasis from demonstrating competence in English to showing the students’ skill in Arabic, and thus targets the native language. Translation from Arabic into English is more appropriate, particularly if the text to be translated presents a coherent unit and makes sense.

A common problem with translation is that very often it degenerates into interpretation. This means that students who achieve higher scores are essentially those who have succeeded in interpreting the content of the required translation. In addition, this kind of traditional translation test does not adhere to the requirements of communicative language testing.

 

F. Essay Questions

 

The goal of the essay section is to determine the students’ ability to write well. Students are asked to discuss one or two topics written in their test paper. The question allows students to compose their own relatively free and extended answers. However, the directions sometimes do not indicate how lengthy or concise students should be. This usually becomes a serious problem when essays of such different lengths are corrected. For example, the question reads: “Write a short essay on the following topic: The most important achievements done by Omar ibn Al-Khatab during his caliphate.” (The Arabic Department). An essay question like this can be regarded as both uneconomical and imprecise, requiring two scorers to make the test reliable.

To summarize, after scrutinizing each section of the 1418 exam, the following conclusions can be drawn. First, the tests have both positive and negative aspects. Their merits in relation to a communicative language test are the following:

 

                           ·                  The tests are in some sections economical; they take little paper and little time to design.

                           ·                  The tests tap into the students’ prior knowledge, and, as their titles suggest, they have at least face validity.

                           ·                  The questions measure linguistic skills.

                           ·                  The items induce students to do the thinking and the reasoning tasks. Hence, to pass the tests, students have to study their textbooks very well.

The problems with the 1418 tests are the following:

                           ·                  The results of each test cannot lead to a generalization that the passing student is good at English. Most items test only the linguistic ability of the learners. Thus, good performers on these tests may still be poor communicators.

                           ·                  Students’ ability to do the tasks relies heavily on their knowledge and memorization of their textbooks. This strategy can make students passive and not creative.

                           ·                  The vagueness of the directions may negatively affect students’ performance. As a result, they may not be able to answer well.

                           ·                  Many items are subjective. Because of this, the tests can be considered to lack reliability. Different scorers, or even the same scorer marking at different times, may give different scores to the same answers.

                           ·                  Essay questions lack practicality because they are time consuming to grade and difficult to rate. They involve at least two raters in order to have a higher interrater reliability.

                           ·                  The tests are not fully authentic. The tasks do not completely reflect some communicative activities that students are likely to come across in real-life situations.

 

2. Overview of the 1419 Tests

 

The content of these exams (see Appendix I) includes a reading comprehension text followed by a series of questions. As in the 1418 tests, the questions are grouped into distinct categories: Reading Comprehension, General Questions, Translation, Vocabulary, Essay Questions, and Grammar. Each category is rated from 10 to 20 points, and the whole test is worth 70 points for those who attend class regularly and 100 points for those who do not. Those who attend are given two hours to finish the test, and the others are given two and a half hours.

The interesting thing about the 1419 tests is that some departments have adopted new testing techniques. Some tasks, for example, take the form of True/False statements, multiple choice questions, matching questions, and fill-in-the-blank items. Some of the questions are answerable either directly from the textbook or indirectly from the students’ personal experience. Here again, as in the 1418 tests, reading comprehension and general questions are the most highly rated, with twenty points. The Vocabulary and Grammar section and the Composition section come next with 15 points each, followed by Translation with 10 points. The length of the test, as presented on the original copy, is from one to two printed pages. This is determined by the number and the diversity of the items.

Although the 1419 tests have some subjective tasks, the generalization made about the 1418 tests does not apply to them in the same way. This is due to their content and the various response types they include. The basic difference, however, is that the 1419 tests are less elaborate and have more objective items than the 1418 tests. The general analysis of the component parts of the 1419 tests is as follows:

 

A. Reading Comprehension and General Questions

 

These sections contain techniques different from those of the 1418 tests for checking understanding of the reading material. They are: True/False questions, Multiple Choice questions, questions answerable from the information in the text, and questions which relate to the students’ personal experience. The first two are referred to as objective questions while the last two are subjective. Because students are provided with the right answer and are only asked to select it from among other answers, objective questions are easier to answer and to score than subjective questions. Scoring for such questions can be done easily because it involves no judgments as to degrees of correctness. Owing to this strategy, such tests tend to have superior reliability and validity.

Concerning the subjective questions, even though such questions do not lend themselves to reliable scoring, they have, as Gronlund (1985) points out, the advantage of providing a freedom of response which is sometimes needed in measuring certain complex outcomes such as the ability to create, to organize, to integrate, to express, and to demonstrate other similar behaviors that require the production and the synthesis of ideas.

The directions for the “fill in the space” questions in the 1419 tests are, however, vaguely formulated. They read, “Complete the following: 1- Novel is ________.” (The Arabic Department). Since there is no indication of whether students should refer to the text or to their personal experiences to determine the best answer, confusion may ensue. Also, because the source is not specified, the opportunity for making inferences cannot be excluded. Thus, by making inferences and analogies, these questions can have more than one correct or best answer.

The advantage of True/False and Multiple Choice questions is that they are purely objective. Such short-answer items are very useful in classroom achievement tests: they are relatively quick to write and easy to answer. Their limitation, however, is that they can measure very little of the students’ understanding. For example, the fact that some students in the Accounting Department identify the correct answer does not necessarily mean that they have studied thoroughly; they may sometimes guess the correct answer without even having read their textbook.

The test of the Sociology Department is composed of open-ended items which relate to the student’s personal experience. Such general questions are sometimes recommended because they are interesting to answer. However, they may lead students to write longer answers than necessary, since students are usually eager to give their personal feelings on things that interest them. The more proficient students, in particular, tend to write at greater length than is required, for example in answer to the question: “What is a social worker?” Thus, open-ended questions can become more time-consuming and more elaborate than the testing situation requires.

 

B. Vocabulary

 

The third part of the 1419 tests is vocabulary, which is tested in the same way as in the 1418 tests. Students are asked to create sentences of their own to demonstrate their comprehension of words used in the textbooks. This technique of testing vocabulary does not reflect a truly communicative task. A serious problem with it is that many ESL students may be able to conceptualize the meaning of a word without being able to express it in writing.

The most useful vocabulary-testing technique that some departments adopted is matching, which represents a problem-solving task for which students use their cognitive skills. Students are given some words and asked to find their meanings in a list. This is an economical method of testing vocabulary (the History Department).

 

C. Grammar

 

Grammar items are concerned with finding out whether the students have mastered particular grammatical points. Questions are focused on specific issues. Section III in the Geography Department is about English verb tenses. It assesses students’ understanding of the simple present tense. This is a more meaningful exercise and a better technique than the one in the Accounting Department, which tells the student to study some situations using may/might followed by the correct form of the infinitive. However, the questions in the grammar sections do not cover the rules discussed in the students’ textbook. A grammar achievement test should include the full range of structures that were taught throughout the course.

 

D. Translation

 

The technique for testing translation does not differ from that of 1418. This method of testing translation does not develop fluency in communication skills. On the contrary, it may impede communicative fluency in language learning because interference between the first and the target language can take place. In addition, translation tested in this traditional way is very difficult to evaluate in an achievement test, because its scoring is highly unreliable.

 

E. Essay Questions

 

Essay questions in the 1419 tests are minimized. It is worth mentioning, however, that this kind of test is still widely used as a means of measuring the writing skill. A student’s ability to organize ideas and express them in his own words is a skill essential for real-life communication. Hence, if a more reliable means of scoring the composition could be used, controlled essay questions like the ones in the Arabic Department may be recommended.

To conclude, the examination of each section of the 1419 tests shows that, like the 1418 tests, they have both positive and negative aspects. The positive aspects of the 1419 exams include the following points:

 

                           ·                  They test many areas of language skills using a variety of new techniques and strategies.

                           ·                  There is a positive combination of both objective and subjective questions.

                           ·                  There is an attempt to test grammatical competence as well as communicative competence.

                           ·                  Some texts of reading comprehension are appealing to the students because they are about a topic on which they have high schemata or background knowledge.

                           ·                  The items are related to the content of the syllabus. Because of this, the tests can be said to have content and face validity.

                           ·            The tests contain a number of good communicative tasks, that is, those which induce students to do meaningful and purposeful activities.

                           ·                  Some of the test items are easy to score because of their objectivity.

 

The major problems with 1419 exams are:

 

                           ·                  The tests sometimes contain unclear and confusing directions.

                           ·                  The tests as a whole are not reliable because some of their items include subjective questions.

                           ·                  The layout of the test is not very good. Items, which measure the same learning outcome and language aspects, should be grouped together.

                           ·                  Generally speaking, the tests are not fully authentic.

 

In the light of the above analysis, it can be concluded that neither the 1418 nor the 1419 tests fulfil the requirements of a communicative language test. The 1418 tests adopted a traditional approach, whereas the tests administered in 1419 combine features of traditional approaches with a few communicative ones.

 

IV - Conclusions and Recommendations

 

Since efforts have been made to improve English teaching in the Faculty of Arabic and Social Sciences at King Khalid University by adopting a communicative teaching approach, there should accordingly be some efficient ways to enhance and assess the effectiveness of examinations. In other words, if communicative teaching exists, there has to be a communicative way of testing. The writer of this paper strongly believes that changes in the testing system of the Faculty must be made so that the procedures of measuring student’s achievement become more accurate and fair.

Since great efforts have been exerted to teach students to become good at communication skills, the researcher believes that it would also be appropriate to introduce the desired changes to the testing system in the Faculty. He also believes that adjusting the goals of traditional testing to those of the syllabus will allow teachers and students to better assess their own efforts and to accurately interpret the efforts of others.

As mentioned in the preceding section, neither one of the two tests under study can be viewed as completely communicative. They reflect testing principles and procedures of the traditional testing approach. The two tests (1418 and 1419) can be greatly improved to adequately reflect students’ excellence in the target language if the following points were taken into account:

 

                           ·                  The materials and the tasks should be authentic, that is, they have to reflect questions that students may encounter in real-life situations. Researchers state that test constructors should be on the alert for materials in newspapers, magazines, or picture files that could serve as the basis for test items. This makes the tasks interesting and reduces students’ anxiety.

                           ·                  The tasks ought to be entirely communicative. Rather than assigning purely grammatical tasks, teachers should create situations that engage students in meaningful activities, which can reveal their grammatical as well as their communicative performance, as in the following example:

Directions:

Your “pen pal” Ahmed is sending a brief letter telling you about himself. Rewrite his letter, adding punctuation and capitalization where needed. (The student receives one point for each of the 42 punctuation markings and capitalizations called for in the letter.)

 

dear pen pal

permit me to introduce myself my name is ahmed i am from syria i was born on september 16 1970 i like reading it is my best friend when i am lonely i just finished the book titled seeking happiness my teacher who teaches me english loves me and always tells me ahmed if you want to master punctuation you have to practice a lot i believe all of you agree on that dont you

i am looking forward to hearing from you soon

          sincerely

          ahmed

 

Such a test reflects the activities that students are likely to undertake in real-life. This procedure is attractive; it is easy to construct, to administer, and to score.

 

                           ·                  Teachers should use simple directions, avoiding verbosity and unfamiliar grammar terminology. In order to facilitate understanding of the instructions, examples should be provided with some of the items rather than leaving students to guess.

                           ·                  The reading passage should be what ordinary readers are likely to read in real-life situations, such as authentic excerpts from newspaper articles, pictures, or short stories, rather than artificial constructs designed for the mere purpose of testing. With authentic materials, students should be induced to do some skimming or scanning in order to answer questions.

                           ·                  In essay questions, students should be rated not only on their use of the grammatical structures and lexicon of the target language but also on their coherent ideas and their organization. The challenge that remains difficult in assessing an essay question is (1) eliciting the specific language constituents that the teacher wishes to test, and (2) finding a way to evaluate it reliably, (Harris, 1969).

                           ·                  Translation tests should be contextualized, allowing the teacher to test language as it is used in everyday life. In a contextualized translation test, elements of a real conversation are deleted from a dialogue. Students must attempt to restore the missing elements using a native language version of the text as their guide. Such a test is valid, reliable, and less time-consuming than full translation. Omaggio (1986, p. 328) states that: “This format [strategy] elicits specific features of the language in a controlled fashion and therefore has high diagnostic power.” The following example is for freshman students. The teacher can adapt it according to the level of his examinees.


 

Directions:

Complete the following passage on the left using the equivalent Arabic version on the right as a guide.


Mohammed: Tom! ……………….……. to Saudi Arabia.

محمد: مرحباً بك في السعودية يا توم!

Tom:              ……………..

توم: أشكرك.

Mohammed: When did ……………..………………………….…    Abha?

محمد: متى وصلت إلى أبها؟

Tom:             A week …………….

توم: منذ أسبوع.

Mohammed: Did you ……….….. your    ……………..?

محمد: هل اصطحبت عائلتك معك؟

Tom:    I ……….……………

توم: أتيت بمفردي.

 


To sum up, communicative tests must be concerned with how language is used in communication. One basic principle to be observed in designing them is that they should focus not only on the linguistic accuracy of the learner’s language, but also on precise specifications of the learner’s needs (communicative competence).

 

V - Summative Language Test

 

The subjects

 

The students I will test are Saudi freshmen students ranging from eighteen to twenty years in age. They have been studying English for one whole semester. During the semester I have given them several formative tests, separately addressing the four different language skills. My students’ competence will be assessed through the following summative test.

 



The objectives of the test

 

The test is designed to evaluate the global command of the four language skills in terms of:

 

                          1.                Listening

                         A.        Discriminating between distinctive factors of English phonology such as /p/ and /b/, /f/ and /v/, /š/ and /č/ and others.

                         B.         Identifying the supra-segmental aspects, mainly stress placement.

                         C.        Overall listening comprehension.

                          2.                Speaking

                         A.        Speaking with clear pronunciation and use of various kinds of vocabulary words, and good command of English rhythm.

                         B.         The ability to use various verb tenses (simple present, present progressive, past, future, etc.).

                         C.        The ability to hypothesize and persuade.

                          3.                Reading

                         A.        The ability to read following the given punctuation.

                         B.         The ability to recognize synonyms and antonyms.

                         C.        The ability to scan a passage and to infer.

                         D.                Overall reading comprehension.

                          4.                Writing

                         A.        The ability to write a paragraph using punctuation and capitalization.

                         B.         Mastery of the orthography of English.

                         C.        Vocabulary and grammatical structure.

 

Ratings

 

The grading scale is criterion-referenced. A grade of “1” will indicate “superior” command of the English language according to a 90% or better criterion; “2”, “above average” command according to an 80% criterion; “3”, “average” command according to a 70% criterion; and “4”, “poor” command for anything below a 70% criterion. Since speaking and writing involve more than one element, a more detailed rating of these two skills will be done according to the following:

 

Speaking: “1” (superior) will be given when a student’s speech is effortless and smooth, with the right English rhythm (native-like pronunciation) and a good command of structure and vocabulary. “2” (above average) will be awarded when a student’s speech contains no conspicuous mispronunciation but would not be taken for that of a native speaker. Errors in structure will be quite rare and the speech will be even and fluent with an occasional pause. “3” (average) will be given when there is a perceptible foreign accent and occasional mispronunciation which still do not hinder communication. Grammatical errors should not overly disturb a native listener. The speaker’s discourse should capture the gist of the topic. “4” (poor) will be awarded when the speech is halting, fragmented, and jerky, with a very heavy accent that makes understanding impossible. Vocabulary and structure are so limited as to impede listener comprehension.

Writing: “1” (superior) will be awarded when there are no or rare mistakes in grammar, spelling, punctuation, and capitalization. Writing is well-organized, beginning with a clear topic sentence with supporting sentences. The vocabulary used is appropriate to the topic, and ideas are adequately linked together. “2” (above average) will be given when the paragraph is written smoothly but not to the level of fluency characteristic of an educated native speaker. Ideas should be related to the topic and vocabulary use appropriate to the topic. Some mistakes in grammar, spelling, punctuation, and capitalization will be present, but not to the degree that meaning is obscured. “3” (average) will be awarded when the paragraph has a main idea, but contains writing that is difficult to follow. Some ideas support the topic sentence but some do not. Several mistakes in grammar, spelling, punctuation, and capitalization will be present. Sometimes the correct vocabulary word will not be used and ideas may not be linked together. “4” (poor) will be given when there are many errors in grammar, spelling, punctuation, and capitalization which make the paragraph difficult to understand. Only basic sentences are used; some sentences may not be complete. Many of the sentences will not relate to the topic sentence and much vocabulary will be used inappropriately.

 

Summative listening test

Directions: This test will be completed in the language lab. Directions for the students are: “Listen to the following sentences. Draw a circle around the word you think you hear. Example: The nurse gave him the (bill - pill). Answer: pill.” Students will have five minutes to respond to each of the two parts of the test. Each part will be worth a possible ten points, with each item worth two points. The total examination will be worth twenty points.

Examples follow.

 

Answer sheet

 

[part a: discriminating sounds]

 

Response        Stimulus materials (audio-recorded)

1. A   B        A- cold     B- gold     “Are you getting cold?”
2. A   B        A- race     B- raise    “I’ll raise you to the top.”
3. A   B        A- pear     B- bear     “She can’t eat a whole pear.”
4. A   B        A- glass    B- grass    “Please don’t walk on the grass.”
5. A   B        A- win      B- wean     “It is time to wean the child.”

 

[part b: discriminating words]

          You will hear Mary’s mother telling her to set the table. Write the number of each statement in the circle corresponding to the item she mentions. Number 1 has been done for you.

 

1-                  We will use napkins today.

2-                  Take the large plate.

3-                  Each person should have one fork.

4-                  Put the knife on the table.

5-                  Eat what is left on your plate.

6-                  Hold the cup carefully.

 

 

 

The grading scale for the test will be:

 

Points        Rating

18-20         1 (superior)
16-17         2 (above average)
14-15         3 (average)
0-13          4 (poor)
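The point bands above follow from applying the 90%, 80%, and 70% criteria to the twenty-point total. A minimal Python sketch of that arithmetic is given below. The function names `rating_bands` and `rate` are hypothetical, and rounding each cut-off down to a whole point is an assumption of the sketch (for a twenty-point test the cut-offs are whole numbers anyway).

```python
def rating_bands(total, criteria=(90, 80, 70)):
    """Return (low, high, rating) bands, from rating 1 (superior) to 4 (poor).

    ASSUMPTION: each percentage cut-off is rounded down to a whole point.
    """
    bands = []
    high = total
    for rating, pct in enumerate(criteria, start=1):
        low = total * pct // 100          # e.g. 90% of 20 points -> 18
        bands.append((low, high, rating))
        high = low - 1                    # next band ends just below this one
    bands.append((0, high, len(criteria) + 1))  # everything below 70%
    return bands


def rate(score, total):
    """Look up the rating (1-4) for a raw score out of `total` points."""
    for low, high, rating in rating_bands(total):
        if low <= score <= high:
            return rating
    raise ValueError("score must lie between 0 and the test total")


for low, high, rating in rating_bands(20):
    print(f"{low}-{high}: rating {rating}")
```

For example, `rate(17, 20)` falls in the 16-17 band and so yields a rating of 2 (above average).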

 


Speaking summative test

 

Directions: This part of the test is to be completed in the language lab. Record all of your answers on the tape. Limit your answer to 4-5 minutes per item. Four points are possible for each of the skills of pronunciation, vocabulary use, structure, and fluency. Both parts of this test, together, will be worth a possible 32 points.

 

(1) Describing a picture: Make up a story about the picture in front of you. Who are these people and what are they doing? Do you like this activity? Why? What did you do when you were the age of the students? What will you do when you are the teacher’s age? (There will be a real colorful picture in which there are elementary students drawing and their teacher watching them).

(2) Who is more important? A scientist or an artist? Who would you want to be? Justify your answer.


 

The grading scale for the test will be:

                                    

Points        Rating

28-32         1 (superior)
25-27         2 (above average)
22-24         3 (average)
0-21          4 (poor)
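The speaking total can be illustrated with a short Python sketch: four skills, each marked out of four points, on each of the two tasks, give the 32 possible points, which are then read against the scale above. The names `speaking_total` and `speaking_rating`, and the sample marks, are hypothetical and serve only as an illustration.

```python
SKILLS = ("pronunciation", "vocabulary", "structure", "fluency")
MAX_PER_SKILL = 4


def speaking_total(task_marks):
    """Sum the marks over all tasks; each task is a dict of skill -> 0..4."""
    total = 0
    for marks in task_marks:
        for skill in SKILLS:
            points = marks[skill]
            if not 0 <= points <= MAX_PER_SKILL:
                raise ValueError(f"{skill} must be marked from 0 to {MAX_PER_SKILL}")
            total += points
    return total


def speaking_rating(total):
    """Map a total out of 32 onto the 1-4 scale given in the table above."""
    for lowest, rating in ((28, 1), (25, 2), (22, 3)):
        if total >= lowest:
            return rating
    return 4


# Hypothetical marks for one student on the two speaking tasks.
picture_task = {"pronunciation": 4, "vocabulary": 3, "structure": 3, "fluency": 4}
opinion_task = {"pronunciation": 3, "vocabulary": 3, "structure": 3, "fluency": 3}

total = speaking_total([picture_task, opinion_task])
print(total, speaking_rating(total))
```

With these sample marks the student earns 14 + 12 = 26 points, which falls in the 25-27 band (above average).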


Reading summative test

 

Directions: Look at the picture below. (There will be a picture of a man and around him, there is a vacuum cleaner and its parts scattered on the floor). Read the passage that follows carefully. Then answer the questions that appear after the passage. You have thirty minutes to complete this test. Each item is worth a possible two points; the test is worth sixteen points in total.

 

Mr. Scott thought that he was very good at fixing household appliances when they broke, so when Mrs. Scott told him that she needed a new vacuum cleaner, he said, “What’s wrong with the old one? I can easily fix it.”

Mr. Scott fixed the vacuum cleaner, but the same thing happened again several times, until one day, after he had unscrewed all the parts, and had gone to have lunch, Mrs. Scott added a few extra pieces to the pile on the floor.

“Do you know,” she said to her friend Mrs. Brown, the next morning, “if I’d just taken away a few pieces, he’d have noticed that they were missing, and would have gone out and bought some more. But when he couldn’t find places for all the pieces that were on the floor, he gave up and agreed to buy me a new machine.”


 


[Part a]

 

1-             This anecdote is:

          a. humorous

          b. sad

          c. scientific

 

2-             In the last line of the story, the phrase “gave up” means:

a. surrender

b. offer

c. become angry

 

3-             Why did Mr. Scott agree to buy a new machine?

a. Because he wanted to please his wife.

b. Because he wanted to save time by not fixing it.

c. Because he did not know how to fix the old machine.

 

 [Part b]

Match each item on the left with the phrase on the right that completes it, by drawing lines between the two.

1. Mrs. Brown                  a. didn’t want to buy a new vacuum cleaner for his wife.
2. Mr. Scott                   b. had been added by Mrs. Scott.
3. Mrs. Scott                  c. needed fixing on more than one occasion.
4. The extra pieces            d. was a friend of Mrs. Scott’s.
5. The vacuum cleaner          e. was smart enough to get a new vacuum cleaner.

 


The grading scale for the reading test will be:

                                    

Points        Rating

14-16         1 (superior)
12-13         2 (above average)
11            3 (average)
0-10          4 (poor)
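Because every reading item is objective, marking reduces to comparing responses with a key and awarding two points per correct item. The Python sketch below illustrates this. The paper gives no answer key, so the key used here is an assumption inferred from the passage, and the names `score_part` and `reading_score` are hypothetical.

```python
POINTS_PER_ITEM = 2

# ASSUMED answer key, inferred from the passage (not given in the paper).
PART_A_KEY = {1: "a", 2: "a", 3: "c"}
PART_B_KEY = {1: "d", 2: "a", 3: "e", 4: "b", 5: "c"}


def score_part(key, responses):
    """Award two points for each response that matches the key."""
    correct = sum(1 for item, answer in key.items() if responses.get(item) == answer)
    return POINTS_PER_ITEM * correct


def reading_score(part_a, part_b):
    """Total reading score out of 16 (8 items x 2 points)."""
    return score_part(PART_A_KEY, part_a) + score_part(PART_B_KEY, part_b)


# Hypothetical responses from one student.
student_a = {1: "a", 2: "b", 3: "c"}                   # two of three correct
student_b = {1: "d", 2: "a", 3: "e", 4: "c", 5: "b"}   # three of five correct
print(reading_score(student_a, student_b))
```

Under the assumed key this student answers five of the eight items correctly and earns 10 of the 16 points. No judgment about degrees of correctness is involved, which is what makes such items easy to score consistently.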

 


Writing summative test

         

Directions: Choose one of the following questions. Then write a paragraph on the topic. You have fifteen minutes to write the paragraph. The total test is worth 64 points. 16 points will be possible for each of the following: (1) freedom from errors in grammar, spelling, punctuation, and capitalization, (2) how you organize the paragraph, (3) appropriateness of vocabulary use, and (4) how you link your ideas together.

                          1.                                                              Why I want to study a foreign language.

                          2.                                                              Write what you see in the picture below. Use complete sentences. (students see a picture of a car accident and an ambulance near the damaged car trying to help the injured people)


 

 

The grading scale for the writing test will be:

 

Points        Rating

57-64         1 (superior)
51-56         2 (above average)
44-50         3 (average)
0-43          4 (poor)


Conclusion

 

This criterion-referenced test measures students’ competence in the four language skills of listening, speaking, reading, and writing. Since it is possible for a student to be strong in reading, for example, and weak in speaking, it is necessary to retain separate scores for each skill tested in order to have an accurate profile of each student’s English language skills. But an overall score is also necessary to provide an appropriate method of deciding whether a student passes or fails the test as a whole. Therefore, the four subtest scores will be combined (with four points possible for each subtest) for a total possible score of 16 points. The grading scale for the overall test, then, is:


                                             

Points        Rating                 Result

14-16         1 (superior)           Passing
12-13         2 (above average)      Passing
11            3 (average)            Passing
0-10          4 (poor)               Failing


REFERENCES

 


Bachman, L. F. (1991). Fundamental Considerations in Language Testing. Oxford University Press.

Brown, H. D. (1987). Principles of Language Learning and Teaching. NJ: Prentice Hall Regents.

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. NY: Macmillan Publishing Company.

Hammerly, H. (1985). An Integrated Theory of Language Teaching and its Practical Consequences. Blaine, WA: Second Language Publications.

Harris, D. P. (1969). Testing English as a Second Language. St. Louis: McGraw-Hill Book Company.

Heaton, J. B. (1995). Writing English Language Tests. London: Longman Group Limited.

Henning, G. (1987). A Guide to Language Testing. Cambridge: Newbury House Publishers.

Linden, K. W., Kryspin, J. W., & Feldhusen, J. F. (1974). Developing Classroom Tests. Minneapolis, Minnesota: Burgess Publishing Company.

Madsen, H. S. (1983). Techniques in Testing. New York, NY: Oxford University Press.

Marshall, J. C., & Hales, L. W. (1972). Essentials of Testing. Reading, MA: Addison-Wesley Publishing Company, Inc.

Omaggio, A. C. (1986). Teaching Language in Context: Proficiency-Oriented Instruction. Boston: Heinle & Heinle Publishers.

Richards, J. C. (1990). The Language Teaching Matrix. New York, NY: Cambridge University Press.

Scannell, D. P., & Tracy, D. B. (1975). Testing and Measurement in the Classroom. Boston, MA: Houghton Mifflin Company.

Smith, F. M., & Adams, S. (1972). Educational Measurement for the Classroom Teacher. New York, NY: Harper & Row Publishers.

Weir, C. (1993). Understanding & Developing Language Tests. UK: Prentice Hall International.