Toward a Communicative Approach to Language Testing
A Critical Study of Achievement Tests for Non-Specialist Students Learning English for Specific Purposes in the Faculty of Arabic and Social Sciences at
The aim of this paper is to develop a critical awareness of the tests administered
to ESP students in the Faculty of Arabic and Social Sciences,
The first part of this paper introduces the problem, describing the population of
this study and their curriculum.
The second part presents the key issues in language testing, particularly with
regard to the criteria involved in carrying out communicative language testing.
The third part is a detailed scrutiny of the 1418 and 1419 AH tests administered to
ESP students from the different departments in the Faculty, shedding a critical
light on each test and pinpointing its positive and negative points.
The fourth part presents conclusions and recommendations, underscoring the finding
that language teachers need to make their tests as direct as possible in terms
of real-life operations if they are to measure anything of value.
Toward a Communicative Approach to Language Testing
A Critical Study of the Language Tests Given to Students Learning English for Specific Purposes in the Faculty of Arabic and Social Sciences
King Khalid University
Abstract
The aim of this research is to cast a critical light on the English language tests administered to students from the Faculty's different departments, so that language teachers can reflect on the usefulness of their tests and on the proper way in which these tests should be carried out.
The first chapter presents the research problem, describes the traditional approach prevailing in examinations, and outlines some of its shortcomings.
The second chapter presents basic concepts of English language testing, with particular attention to the criteria sought in carrying out communicative language testing.
The third chapter presents a detailed analysis of the language tests of 1418 and 1419, casts a critical light on the test of each department in the Faculty, points out the positive and negative aspects of each test, and explores the best ways of improving the tests.
The fourth chapter contains the conclusions and recommendations. It recommends that language teachers attend to communicative testing, which addresses students' everyday needs and avoids unrealistic questions. The researcher then presents a model for designing such tests, together with its objectives and methods of evaluation.
Progressive strides have recently been made toward establishing a communicative atmosphere
at the Faculty of Arabic and Social Sciences,
Analyzing classroom
achievement tests, we find that they reflect the thinking of traditional
approaches to testing. Most tests tend to be largely discrete-point in nature,
reflecting an orientation toward behaviorist language-learning theories.
This conservative stance in classroom testing has resulted in an ever-widening
gap between the description of the course goals and their testing procedures.
The present work intends to answer questions such as the following:
1- Since communicative language teaching has been implemented in the Faculty, have any testing procedures been developed that are truly compatible with this approach?
2- How communicative, valid, reliable, and appropriate are the tests administered to the students?
In fact, most language teachers in the
Faculty of Arabic and Social Sciences have wrongly put the blame on the
students by assuming that the latter do not do well on English tests merely
because they do not study hard. While most practitioners have been
fully prepared to teach well, most of them have so far
failed to realize that their students’ poor test
scores are essentially ascribable to flawed test items.
Their teaching has certainly aimed at improving not only the
students’ linguistic competence, but also their communicative competence.
However, their tests give an account only of the students’ ability to
manipulate the grammar of the target language and to answer some comprehension
questions. Although instruction has been oriented toward helping
students use the language genuinely, testing has remained static; it only
measures whether or not the test takers have been linguistically accurate. A
language test should be dynamic, reflecting students’ communicative needs
rather than being a body of passive items. Thus, language tests in the Faculty
sacrifice communicative fluency in favor of grammatical accuracy. Such a
mismatch should be eliminated so that assessment can be reliable, valid, and
authentic.
Therefore, this research will primarily focus on the
identification of possible problems that exist in the tests administered to
non-specialist students learning English for specific purposes. Test samples of
the years 1418 and 1419 AH from several departments in the Faculty will be
examined in order to determine whether or not they were constructed in
accordance with the requirements of a communicative approach to language testing.
Light will also be shed on negative and positive aspects of each section
categorized in the 1418 and 1419 tests. Finally, the writer will conclude his
research paper by presenting a summative language test, together with its
objectives and ratings, to serve as a model on which in-service language teachers
can build their classroom tests according to a communicative approach to
language testing. The summative test is suggested as a possible starting point
for revising classroom tests in the Faculty of Arabic and Social Sciences,
After receiving their high school diplomas, students are
academically screened and then granted admission to one of the various
departments in the Faculty. Throughout their studies at the Faculty, students
usually acquire fairly adequate background
knowledge owing to the richness of their English syllabus and the extensive
number of hours allocated to the study of English over four years. Upon
completion of their studies, students graduate with a bachelor’s degree from
their respective departments.
The syllabi of the Faculty of Arabic and Social Sciences
at
In addition to their final tests rated at 70 points,
students are given homework, quizzes, and a mid-term exam which are worth 30
points. If the students do not attend their classes regularly and miss the
aforementioned evaluation procedures, their final test will be rated at 100
points. By the time they take their final test, the students will have
successfully completed one whole semester of intensive advanced study. Thus,
they have a fair exposure to English and plenty of time to practice before
their finals.
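The marking scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `course_total` is hypothetical, and it is assumed that a student with no coursework marks has the final test marked directly out of 100 rather than rescaled from 70.

```python
def course_total(final_score, coursework=None):
    """Return a course total out of 100 (hypothetical helper).

    final_score: the final-test mark. It is out of 70 when coursework
                 marks exist, and out of 100 when they do not (assumption).
    coursework:  combined homework, quiz, and mid-term mark out of 30,
                 or None if the student missed these procedures.
    """
    if coursework is None:
        # No coursework: the final test alone carries all 100 points.
        if not 0 <= final_score <= 100:
            raise ValueError("final_score must be between 0 and 100")
        return final_score
    if not 0 <= final_score <= 70:
        raise ValueError("final_score must be between 0 and 70")
    if not 0 <= coursework <= 30:
        raise ValueError("coursework must be between 0 and 30")
    # Regular case: 70-point final plus 30 points earned during the semester.
    return final_score + coursework
```

Under this sketch, a regular attendee with 55 on the final and 24 in coursework totals 79, whereas a student without coursework marks receives whatever the 100-point final yields.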
This section presents an overview of testing in general,
and communicative testing in particular. It addresses the following questions:
What is testing and why do we test? What is communicative testing and in what
ways is it different from conventional testing approaches?
Tests are a means of obtaining systematic evidence on
which to base instructional decisions. Educators see tests as motivators that
stimulate individuals to do their best. If they are well designed and properly
used, tests can effectively enhance the educational process (Richards, 1990). Educational testing is in fact a worldwide
endeavor. In everyday life, it is often necessary to use some device
to determine people’s ability. It is difficult to imagine, for example, an
organization hiring interpreters without some knowledge of their proficiency.
Tests are therefore needed to provide information about the
achievement of the testees, without which we cannot make decisions.
The words testing, evaluation, and measurement are
closely associated. However, they should be viewed as three different terms. In
some instances, evaluation is used as a synonym for the term measurement. In
other cases, it is used interchangeably with the term testing. Thus, when
teachers administer achievement tests, they might say they are “testing” achievement, “measuring” achievement, or “evaluating” achievement, with
little regard for these terms’ specific meanings. Smith and Adams (1972)
explain that measurement, the science of obtaining a numerical description,
should be objective and impersonal, whereas evaluation involves the use of
information collected by the process of measurement. For example, if we use a
ruler and ascertain that a table is five feet wide, that is measuring. But if
we add that the same table is too large to go through a 20-inch door, this is
evaluation. As Smith and Adams (1972) assert, tests given in school attempt
to measure the achievement of students. Grades assigned on the basis of test
results are evaluations of the students’ achievements.
The terms quiz, test, and exam also need some
clarification. Hammerly (1985, p. 539) states that the
differences between a quiz, a test, and an exam are in duration and
comprehensiveness. A quiz takes about five to ten minutes and covers the
current materials. A test lasts from half an hour to one hour and covers one or
more units. An exam is two hours or longer and covers at least half of the
content of the course. Despite these distinctions, in this paper, test or
testing will be used as an umbrella term to refer to any type of measurement
procedures.
In any consideration of educational testing, a
distinction must be drawn between teacher-made tests of the classroom and those
formal standardized tests which are usually prepared by professional testing
services to assist students’ admission to universities. Classroom tests are
generally prepared and scored by one teacher. Test objectives can be based
directly on course content; the students know what is expected of them and what
is likely to be covered in the test questions. Standardized tests, on the other
hand, are designed to be used with hundreds of thousands of subjects throughout
the world. They are prepared by a team of testing specialists without personal
knowledge of the examinees. Such tests often take years to construct, as opposed
to a few days for a teacher-made test (Weir, 1993).
Perhaps the most common use of educational tests is to
pinpoint strengths and weaknesses in the learned abilities of the students.
Linden et al. (1974) consider evaluation of students’ progress to be a major
aspect of the teacher’s job. It gives a sense of where the students are,
relative to the curriculum and to other students, as well as how students are
progressing toward the attainment of specified objectives. A test is also a
tool which teachers need in their evaluative repertoires.
Within each category of the kinds of educational tests
mentioned above, there is a variety of techniques and procedures
that can be classified according to how the results are interpreted. Two main
types of techniques used to make educational decisions will be discussed along
with the different types of information that each test yields.
One type of information helps us determine a student’s
rank. This is accomplished by comparing the student’s performance to the
performances of other students whose scores are given as the norm. A student’s
score is therefore interpreted with reference to the scores of other students,
rather than an agreed criterion score. We call this technique a norm-referenced
test.
A second type of information provided by tests tells us
about a student’s proficiency in a set of skills. This is accomplished by
comparing a student’s performance to a certain criterion, which has been agreed
upon. The students must reach this level of performance to pass the test, and a
student’s score is therefore interpreted with reference to the criterion score,
rather than the scores of other students. We call this technique a
criterion-referenced test (Bachman, 1991).
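The contrast between the two interpretations can be made concrete with a small Python sketch. The function names, the sample norm group, and the cut scores below are all hypothetical and serve only as an illustration: the same raw score is read once against other students' scores and once against an agreed criterion.

```python
def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percentage of the norm group scoring below `score`."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced reading: has the agreed level of performance been reached?"""
    return score >= cut_score


# Hypothetical scores of a norm group of ten students.
norm_group = [42, 48, 55, 61, 67, 70, 74, 80, 85, 91]

student_score = 67
rank = percentile_rank(student_score, norm_group)   # position relative to peers
passed = meets_criterion(student_score, 60)         # position relative to the criterion
print(f"percentile rank: {rank}, criterion met: {passed}")
```

The design point is that `percentile_rank` changes whenever the norm group changes, while `meets_criterion` depends only on the cut score that has been agreed upon.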
There has been a growth of interest in the communicative
testing approach. It considers language to be interactive, purposive, authentic, and contextualized, and to be assessed in terms
of behavioral outcomes. The tests analyzed in this paper do not follow these
principles.
Madsen (1983) states that language testing has evolved
through three major stages, which reflect people’s attitudes towards the goals
of language teaching and language learning. These stages are summarized as follows:
1. The Intuitive Stage focuses on subjective testing and is dependent on personal impressions of the
teachers.
2. The Scientific Stage stresses objective evaluation focusing on language usage.
3. The Communicative Stage emphasizes evaluation of language use rather than usage.
The communicative approach is based on the premise that
language is first and foremost a tool for communication. From this perspective,
tests designed to assess student proficiency can be tailored to include items
that measure the students’ communicative ability at all levels of
language. Brown (1987) elaborates on the characteristics of a communicative
language test:
A communicative test has to meet some rather stringent criteria. It has to test
for grammatical, discourse, sociolinguistic, and illocutionary competence as
well as strategic competence. It has to be pragmatic in that it requires the
learner to use language naturally for genuine communication and to
relate to thoughts and feelings, in short, to put authentic language to
use within a context. It should be direct (as opposed to indirect
tests which may lose validity as they lose content validity). And it should
test the learner in a variety of language functions. (p. 230)
An important observation in this quotation is that in
testing communicative performance, test items should measure how well students
are able to engage in meaningful, purposeful, and authentic communicative
tasks. Students must perform well both linguistically and
communicatively. That is, they must have a good command of the components
involved in communication. The best exams in this communicative era, Madsen
(1983) comments, are those that combine the various subskills necessary for the
exchange of oral and written ideas. He asserts that communicative tests need to
measure more than isolated language skills and to indicate comprehensively how well
a person can function in another language.
The common concepts needed in communicative testing
include reliability, validity, practicality, and authenticity. They fall under
the heading of desirable test characteristics. Marshall and Hales (1972) point
out that any test that is to be used effectively as a measuring instrument
should be reliable, valid, authentic, and practical. They warn that a deficiency
in any of these test attributes can render a test futile.
Reliability has to do with test consistency. A test
should give evidence that it is likely to produce the same results when
taken at different times by the same or similar students. That is, students who
obtain high scores on one set of items also obtain high scores on other sets of
equivalent items, and those who have a low score on one set of items also have
a low score on other sets of equivalent items (Scannel and Tracy, 1975).
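One common way to quantify this kind of consistency is to correlate the scores that the same students obtain on two equivalent sets of items. The sketch below uses the Pearson correlation coefficient, which is a choice made here for illustration and is not named in the sources cited above; the function name and the sample scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of five students on two equivalent sets of items.
form_a = [12, 15, 9, 18, 14]
form_b = [13, 14, 10, 17, 15]
print(round(pearson_r(form_a, form_b), 2))
```

A coefficient close to 1 indicates that students who score high on one set also score high on the other, which is the pattern of consistency described above; a coefficient near 0 would suggest that the two sets do not rank students alike.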
Validity in testing refers to whether the test measures
what it claims to measure, and whether it measures what was taught. For
example, a test which is designed to determine the extent to which a particular
group of students have mastered specific algebraic concepts will not be valid
when administered to a different group of students with the intent to determine
their performance in Elizabethan literature. Similarly, a test of English as a Second
Language (ESL) is not valid for students learning translation theory
(Heaton, 1995).
Questions pertaining to the validity of a test include
what the test measures, whether it measures what it is intended
to measure, and whether it measures what was taught. Henning (1987) claims that
a good language test should consider how relevant the language behavior being
tested is to the meeting of communicative needs and whether or not the users of
the test will accept its content and format.
Practicality or usability is the third important
attribute of a good test. It involves the economical use of time and expenses
in test construction, test administration, and test scoring. A test may be
highly reliable and valid and yet not be practical for use in a school-testing
program.
Another equally important feature of a good test is
authenticity. In communicative testing, authenticity is a key element in the
designing of materials and test items. It means assessing language behavior by
observing it in real, or at least realistic, language-use situations, which
should be as authentic as possible (Gronlund, 1985).
To sum up, much has recently been written about
communicative language testing. Discussions have focused on the desirability of
assessing the ability to take part in acts of communication. All
of these discussions assume that it is communicative competence that teachers want to
test. Tests should therefore assess the learner’s communicative behavior and
not be based on linguistic items alone. In communicative tests,
students’ performance should be measured not only in terms of formal
correctness but also, and primarily, in terms of interaction, for the concern is not
how much the students know, but how well they can perform.
In order to accomplish the goals mentioned in this
paper, the researcher will select some excerpts from the 1418 and 1419 tests of
several departments in the Faculty of Arabic and Social Sciences. Then he will
make a content analysis of them to determine whether or not they are in
accordance with the requirements of a communicative language test. The analysis
will be section by section in order to identify the strong and weak points in
terms of explicitness of the questions and the nature of the tasks. In clear
terms, he will try to find out the effectiveness of these tests in relation to
the characteristics of reliability, validity, authenticity, and practicality
that constitute an effective communicative language test. These tests will be
analyzed and evaluated, based on the requirements of a good test as proposed by
the literature review. After the analysis and evaluation, the researcher hopes
to be able to point out the positive aspects of the testing system in the Faculty
of Arabic and Social Sciences as well as those aspects that need to be
improved. Finally, in the concluding section, he intends to suggest tentative
solutions to improve and update classroom tests in the Faculty to make them
suit the communicative teaching approach currently in use. Toward this end, a
summative test together with its objectives and ratings will be elaborated in
the hope of helping in-service teachers upgrade their strategies in testing
students’ achievement.
The 1418 tests of various departments (see appendix I) are divided into a number of
subcategories: Reading Comprehension, General Questions, Translation,
Vocabulary, Essay, and Grammar. Each division is somewhat related to the topics
discussed in the syllabus of the students. The time allotted to finish the
entire test is two hours. The whole test is rated at a hundred points: seventy
points are given for the final exam, and thirty points are awarded during the
semester. However, for those who miss the quizzes and homework assignments given
during the semester, the test will be rated out of one hundred and they will be
given an additional half hour in their final. The distribution of the test
points indicates that the reading comprehension section is the most important
component of the test and has a total of twenty points. General questions come
next with fifteen points, and translation, vocabulary, essay, and grammar are
given the rest. The layout of the test covers two printed pages with the text
for reading comprehension and the general questions on the first, and the rest
of the questions on the second.
A general observation about this test is that it is the type referred to as partially
subjective, which involves abundant writing in various forms including translation,
essay, and open-ended answers based on reading
comprehension and common sense. Generally speaking, such a test
measures linguistic competence. It is designed according to the traditional
testing approaches as reflected by the nature of the questions, the length of
the test, and the distribution of the points over the subcategories mentioned
earlier. A general analysis of the tests’ content reveals the following
aspects:
This section includes some questions that the test
takers have to answer in order to show evidence of their understanding of the
reading passage. Some questions on the passage deal with information
that is obviously known. Hence, the testee is able to answer the questions
correctly without paying much attention to the reading passage. For example,
the question reads: “What is the weather like in
However, the problem which remains with the
comprehension questions is that the direction, “Read the following passage and
answer the questions”, is vaguely stated. Test takers may have difficulty
deciding whether they should answer the questions in accordance with the
information provided in the text or whether their answers should emanate from
personal experiences.
Some of the questions are subjective, requiring the
students to have advanced skills and strong background knowledge in the target
language. For example, the question reads: “How do we learn the lessons of
history?” (The History Department). The criteria for
judging general questions require several scorers; therefore, such questions
tend to be unreliable.
Moreover, some questions can be answered without
reference to the textbook. Because of this, it is difficult for the teacher to
determine whether or not good answers indicate good reading of their textbook.
The question reads: “Why are traveler’s checks
useful?” (The Administration Department). The
broadness of the questions, however, gives the students more latitude to
use any means to write an appropriate answer.
The vocabulary test requires that the students use a
number of words taken from their textbooks in
meaningful sentences. The question reads: “Use each of the following words in
meaningful sentences of your own” (The Geography Department). The trouble with
this method of vocabulary assessment, however, is that most users of a language may
know the meaning of particular passive words without being able to use
them properly in meaningful sentences. This is what makes such a vocabulary test a
little too demanding for non-native speakers. This kind of test, however, may
be useful if the students are asked to compose sentences out of active
vocabulary, that is, words that are needed to understand newspapers, periodicals,
literature, and textbooks.
There is a variety of grammar questions in the tests.
Students are asked to do what is required between brackets (as in the example
below) or to follow the directions given in the test paper. Success or
failure in such traditional grammar questions gives little or no
account of students’ communicative ability and is therefore not an adequate
measure. In addition, the directions in the grammar questions are not
well stated, which may prevent students from successfully performing the
required task. For example, the question reads:
“Do as shown in brackets: I (be) going to travel to Jeddah. (correct)” (The Psychology Department)
Here, the verb between parentheses can be placed in two
different tenses and the sentence remains correct.
Students are asked to translate from English into Arabic
a portion of the reading passage or a series of sentences written in their exam
paper. The translation into Arabic shifts the emphasis from demonstrating
competence in English to showing the students’ skill in Arabic, and thus
targets the native language. Translation from Arabic into English is more
appropriate, especially if the text to be translated presents a coherent unit and
makes sense.
A common problem with translation is that very often it
degenerates into interpretations. This means that students who achieve higher
scores are essentially those who have succeeded in interpreting the content of
the required translation. In addition to that, this kind of traditional
translation test does not adhere to the requirements of communicative language
testing.
The goal of the essay section is to determine the
students’ ability to write well. Students are asked to discuss one or two
topics written in their test paper. The question allows students to compose
their own relatively free and extended answers. However, the directions
sometimes do not indicate how long or concise the students’ answers should be. This
usually becomes a serious problem when essays of widely different lengths are
marked. For example, the question reads: “Write a short essay on the
following topic: The most important achievements done by Omar ibn Al-Khatab
during his caliphate.” (The Arabic Department). An
essay question like this can be regarded as both uneconomical and imprecise,
requiring two scorers to make the test reliable.
To summarize, after scrutinizing each section of the
1418 exam, the following conclusions can be drawn. First, the tests have both
positive as well as negative aspects. Their merits in relation to a communicative
language test are the following:
· The tests are in some sections economical; they take little paper and little time to design.
· The tests tap into the students’ prior knowledge and, as their titles suggest, they have at least face validity.
· The questions measure linguistic skills.
· The items induce students to do the thinking and the reasoning tasks. Hence, to pass the tests, students have to study their textbooks very well.
The problems with the 1418 tests are the following:
· The results of each test cannot lead to a generalization that the passing student is good at English. Most items test only the linguistic ability of the learners. Thus, good performers on these tests may still be poor communicators.
· Students’ ability to do the tasks relies heavily on their knowledge and memorization of their textbooks. This strategy can make students passive rather than creative.
· The vagueness of the directions may negatively affect students’ performance. As a result, they may not be able to answer well.
· Many items are subjective. Because of this, the tests can be considered to lack reliability. Different scorers, or even the same scorer, may give different scores when the test is administered at different times.
· Essay questions lack practicality because they are time consuming to grade and difficult to rate. They require at least two raters in order to achieve higher inter-rater reliability.
· The tests are not fully authentic. The tasks do not completely reflect the communicative activities that students are likely to come across in real-life situations.
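The inter-rater reliability mentioned in the list above can be estimated once two raters have marked the same set of essays. The sketch below uses Cohen's kappa, an index chosen here for illustration and not one the paper itself names; it assumes that each rater assigns every essay to a band (for example 1 to 5), and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' band scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Proportion of essays on which both raters chose the same band.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's band frequencies.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical band throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means the two raters agree on every essay, while a value near 0 means they agree no more often than chance would predict, which is the situation the point about subjective items warns against.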
The content of these exams (see appendix I) includes a
reading comprehension text followed by a series of questions. As in the 1418
tests, the questions are grouped into distinct categories: Reading
Comprehension, General Questions, Translation, Vocabulary, Essay
Questions, and Grammar. Each category is rated from 10 to 20 points and the
whole test is worth 70 points for those who attend class regularly, and
100 points for those who do not. Those who attend are given two hours to finish
the test, and the others are given two and a half hours.
The interesting thing about the 1419 tests is that some
departments have adopted new techniques of testing. Some tasks, for example,
take the form of True/False statements, multiple-choice questions, matching
questions, and fill-in-the-blank items. Some of the questions are answerable either
directly from the textbook or indirectly from the students’ personal
experience. Here again, as in the 1418 tests, reading comprehension and general
questions are the most highly rated with twenty points. The Vocabulary and
Grammar section and the Composition section come next with 15 points each,
followed by Translation with 10 points. The length of the test, as presented on
the original copy, is from one to two printed pages. This is determined by the
number and the diversity of the items.
Although the 1419 tests have some subjective tasks, the
generalization made about the 1418 tests cannot be extended to them. This is due to their
content and the various response types they include. The basic difference,
however, is that the 1419 tests are less elaborate and have more objective
items than the 1418 tests. The general analysis of the component parts of the
1419 tests is as follows:
These sections contain techniques different from those of the 1418
tests for checking understanding of the reading material. They are: True/False
questions, Multiple Choice questions, questions answerable from the information
in the text, and questions which relate to the students’ personal experience.
The first two are referred to as objective questions while the last two are
subjective. Because students are provided with the right answer and are only
asked to select it from among other answers, objective questions are easier to
answer and to score than are subjective questions. Scoring for such questions
can be done easily because it involves no judgments as to the degrees of
correctness. Owing to this strategy, such tests tend to have superior
reliability and validity.
Concerning the subjective questions, even though such
questions are less reliable, they have, as Gronlund (1985) points out,
the advantage of providing a freedom of response which is sometimes needed in
measuring certain complex outcomes such as the ability to create, to organize,
to integrate, to express, and to demonstrate other similar behaviors that
require the production and the synthesis of ideas.
The directions for the “fill in the space” questions in
the 1419 tests are, however, vaguely formulated. They read, “Complete the following:
1- Novel is ________.” (The Arabic Department).
Since there is no indication of whether students should refer to the text or to
their personal experiences to determine the best answer, confusion may ensue.
Also, because the source is not specified, the opportunity for making
inferences cannot be excluded. Thus, by making inferences and analogies, these
questions can have more than one correct or best answer.
The advantage of True/False and Multiple Choice
questions is that they are pure tests. Short-answer items are very useful in
classroom achievement tests. They are relatively quick to write and easy to
answer. However, their limitation is that they can measure very little of the
students’ understanding. For example, the identification of the correct answer
by some students in the Accounting Department does not necessarily mean that
they have studied thoroughly. They may sometimes guess the correct answers even
without reading their textbook.
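The effect of guessing on such items can be estimated with elementary probability. The sketch below assumes blind guessing, with every option equally likely and items independent of one another; this is a simplifying assumption, and the function names are hypothetical. It gives the score a student could expect by chance alone and the probability of reaching a given mark purely by guessing.

```python
from math import comb


def expected_chance_score(n_items, n_options):
    """Expected number of correct answers from blind guessing."""
    return n_items / n_options


def prob_at_least(n_items, n_options, min_correct):
    """Binomial probability of at least `min_correct` correct guesses."""
    p = 1.0 / n_options
    return sum(
        comb(n_items, k) * p ** k * (1.0 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Ten True/False items (two options each): chance alone yields 5 correct on average.
print(expected_chance_score(10, 2))
# Probability of getting 6 or more of the ten right without any knowledge.
print(prob_at_least(10, 2, 6))
```

With only two options per item, a sizeable share of a student's score may reflect chance, which is why a correct answer on a True/False item is weaker evidence of understanding than a correct answer on a four-option item.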
The test of the Sociology Department is composed of open-ended
items which relate to the student’s personal experience. Such general questions
are sometimes recommended because they are interesting to answer. However, they
may cause students to write longer answers than necessary. Students are usually
eager to give their personal feelings on things that interest them. However,
the more proficient students often write at greater length
than is required, as in an answer to the question: “What is a
social worker?” Thus, open-ended questions can become more time consuming and
more elaborate than the testing situation requires.
The third part of the 1419 tests is vocabulary, which
is tested in the same way as in the 1418 tests. Students are asked to create
sentences of their own to demonstrate their comprehension of words used in the
textbooks. This technique of testing vocabulary does not reflect a truly
communicative task. A serious problem with this test is that many ESL students
may be able to conceptualize the meaning of a word without being able to
express it in writing.
The most useful vocabulary-testing technique that some
departments adopted is matching, which represents a problem-solving task for which
students use their cognitive skills. Students are given some words and asked to
find their meanings in a list. This is an economical method of testing
vocabulary (The History Department).
Grammar items are concerned with finding out if the
students have mastered some particular grammatical points. Questions are
focused on some specific issues. Section III in the Geography Department is
about English verb tenses. It assesses students’ understanding of the simple
present tense. This is a more meaningful exercise and a better technique than
the one in the Accounting Department, which tells the student to study some
situations using may/might followed by the correct form of the infinitive.
However, the questions in the grammar sections do not cover the rules discussed
in the students’ textbook. A grammar achievement test should include the full range of
structures that were taught throughout the course.
The technique for testing translation does not differ
from that of 1418. In fact, this method of testing translation does not develop
fluency in communication skills. On the contrary, it may impede communicative
fluency in language learning because interference between the first and the
target language can take place. In addition, translation tested in this
traditional way is very difficult to evaluate in an achievement test, because
such a test is highly unreliable.
Essay questions in the 1419 tests are minimized. It is
worth mentioning, however, that this kind of test is still widely used as a
means of measuring the writing skill. A student’s ability to organize ideas and
express them in his own words is a skill essential for real-life communication.
Hence, if a more reliable means of scoring the composition could be used,
controlled essay questions like the ones in the Arabic Department may be
recommended.
To conclude, an examination of each section of the 1419 tests leads to the
following conclusions. Like the 1418 tests, the 1419 tests have both positive
and negative aspects. The positive aspects of the 1419 exams include the
following points:
· They test many areas of language skills using a variety of new techniques
and strategies.
· There is a positive combination of both objective and subjective questions.
· There is an attempt to test grammatical competence as well as communicative
competence.
· Some reading comprehension texts appeal to the students because they are
about a topic on which they have well-developed schemata or background
knowledge.
· The items are related to the content of the syllabus. Because of this, the
tests can be said to have content and face validity.
· The tests contain a number of good communicative tasks, that is, those which
induce students to do meaningful and purposeful activities.
· Some of the test items are easy to score because of their objectivity.
The major problems with the 1419 exams are:
· The tests sometimes contain unclear and confusing directions.
· The tests as a whole are not reliable because some of their items include
subjective questions.
· The layout of the test is not very good. Items that measure the same
learning outcome and language aspects should be grouped together.
· Generally speaking, the tests are not fully authentic.
In the light of the above analysis, it can be concluded that neither of the
two sets of tests (1418 and 1419) fulfils the requirements of a communicative
language test. The 1418 tests adopted a traditional approach, whereas the tests
administered in 1419 combine features of traditional approaches with a few
communicative ones.
Since great efforts have been made to improve English teaching in the Faculty
of Arabic and Social Sciences and to teach students to become good at
communication skills, the
researcher believes that it would also be appropriate to introduce the desired
changes to the testing system in the Faculty. He also believes that adjusting
the goals of traditional testing to those of the syllabus will allow teachers
and students to better assess their own efforts and to interpret accurately the
efforts of others.
As mentioned in the preceding section, neither one of
the two tests under study can be viewed as completely communicative. They
reflect testing principles and procedures of the traditional testing approach.
The two tests (1418 and 1419) could be greatly improved to adequately reflect
students’ excellence in the target language if the following points were taken
into account:
· The materials and the tasks should be authentic, that is, they have to
reflect questions that students may encounter in real-life situations.
Researchers state that test constructors should be on the alert for materials
in newspapers, magazines, or picture files that could serve as the basis for
test items. This makes the tasks interesting and reduces students’ anxiety.
· The tasks ought to be entirely communicative. Rather than assigning purely
grammatical tasks, teachers should create situations that engage students in
meaningful activities, which can reveal their grammatical as well as their
communicative performance, as in the following example:
Directions: Your “pen pal” Ahmed is sending a brief letter telling you about
himself. Rewrite his letter, adding punctuation and capitalization where
needed. (The student receives one point for each of the 42 punctuation markings
and capitalizations called for in the letter.)
dear pen pal
permit
me to introduce myself my name is ahmed i am from syria i was born on september
16 1970 i like reading it is my best friend when i am lonely i just finished
the book titled seeking happiness my teacher who teaches me english loves me
and always tells me ahmed if you want to master punctuation you have to
practice a lot i believe all of you agree on that dont you
i am looking forward to
hearing from you soon
sincerely
ahmed
Such a test reflects the activities that students are
likely to undertake in real life. This procedure is attractive; it is easy to
construct, to administer, and to score.
· Teachers should use simple directions, avoiding verbosity and unfamiliar
grammar terminology. In order to facilitate understanding of the instructions,
examples should be provided with some of the items rather than leaving students
to guess.
· The reading passage should be what ordinary readers are likely to read in
real-life situations, such as authentic excerpts from newspaper articles,
pictures, or short stories, rather than artificial constructs designed for the
mere purpose of testing. With authentic materials, students should be induced
to do some skimming or scanning in order to answer questions.
· In essay questions, students should be rated not only on their use of the
grammatical structures and lexicon of the target language but also on the
coherence and organization of their ideas. The remaining challenge in assessing
an essay question is (1) eliciting the specific language constituents that the
teacher wishes to test, and (2) finding a way to evaluate them reliably
(Harris, 1969).
· Translation tests should be contextualized, allowing the teacher to test
language as it is used in everyday life. In a
contextualized translation test, elements of a real conversation are deleted
from a dialogue. Students must attempt to restore the missing elements using a
native language version of the text as their guide. Such a test is valid,
reliable, and less time-consuming than full translation. Omaggio (1986, p. 328)
states: “This format [strategy] elicits specific features of the language
in a controlled fashion and therefore has high diagnostic power.” The
following example is for freshmen students. The teacher can adapt it according
to the level of his examinees.
Directions: Complete the following passage on the left using the equivalent
Arabic version on the right as a guide.
| Mohammed: Tom! ……………….……. To | محمد: مرحباً بك في السعودية يا توم! |
| Tom: …………….. | توم: أشكرك. |
| Mohammed: When did ……………..………………………….… Abha? | محمد: متى وصلت إلى أبها؟ |
| Tom: A week ……………. | توم: منذ أسبوع. |
| Mohammed: Did you ………….….. your ……………..? | محمد: هل اصطحبت عائلتك معك؟ |
| Tom: I ……….…………… | توم: أتيت بمفردي. |
To sum up, communicative tests must be concerned with
how language is used in communication. One basic principle to be observed in
designing them is that they should focus not only on the linguistic accuracy of
the learner’s language, but also on precise specifications of the learner’s
needs (communicative competence).
The students I will test are Saudi freshmen
ranging in age from eighteen to twenty. They have been studying English
for one whole semester. During the semester I have given them several formative
tests, separately addressing the four different language skills. My students’
competence will be assessed through the following summative test.
The test is designed to evaluate the global command of
the four language skills in terms of:
1. Listening
A. Discriminating between distinctive features of English phonology such as
/p/ and /b/, /f/ and /v/, /š/ and /č/, and others.
B. Identifying the supra-segmental aspects, mainly stress placement.
C. Overall listening comprehension.
2. Speaking
A. Speaking with clear pronunciation, use of various kinds of vocabulary, and
good command of English rhythm.
B. The ability to use various verb tenses (simple present, present
progressive, past, future, etc.).
C. The ability to hypothesize and persuade.
3. Reading
A. The ability to read following the given punctuation.
B. The ability to recognize synonyms and antonyms.
C. The ability to scan a passage and to infer.
D. Overall reading comprehension.
4. Writing
A. The ability to write a paragraph using punctuation and capitalization.
B. Mastery of the orthography of English.
C. Vocabulary and grammatical structure.
Ratings
The grading scale is criterion-referenced. A grade of
“1” will indicate “superior” command of the English language according to a 90%
or better criterion, “2”, “above average” command according to an 80% criterion,
“3”, “average” command according to a 70% criterion, and “4”, “poor” command
for anything below a 70% criterion. Since speaking and writing involve more
than one element, more detailed rating of these two skills will be done
according to the following:
Speaking:
“1” (superior) will be given when a student’s speech is effortless and smooth,
with correct English rhythm (native-like pronunciation) and good command of
structure and vocabulary. “2” (above average) will be awarded when a student’s
speech contains no conspicuous mispronunciation but would not be taken for that
of a native speaker. Errors in structure will be quite rare and the speech will
be even and fluent with an occasional pause. “3” (average) will be given when
there is a perceptible foreign accent and occasional mispronunciation which
still do not hinder communication. Grammatical errors should not overly disturb
a native listener. The speaker’s discourse should capture the gist of the
topic. “4” (poor) will be awarded when the speech is halting, fragmented, and
jerky, with a very heavy accent that makes understanding impossible.
Vocabulary and structure are so limited as to impede listener comprehension.
Writing:
“1” (
Summative listening test
Directions: This test will be completed in the language lab.
Directions for the students are: “Listen to the following sentences. Draw a
circle around the word you think you hear. Example: The nurse gave him the
(bill - pill). Answer: pill.” Students will have five
minutes to respond to each of the two parts of the test. Each part will be
worth a possible ten points, with each item worth two points. The total
examination will be worth twenty points.
Examples follow.
Answer sheet
[Part a: discriminating vowels]
| Response | Stimulus materials (audio-recorded) |
| 1. A B | A- cold B- gold “Are you getting cold?” |
| 2. A B | A- race B- raise “I’ll raise you to the top.” |
| 3. A B | A- pear B- bear “She can’t eat a whole pear.” |
| 4. A B | A- glass B- grass “Please don’t walk on the grass.” |
| 5. A B | A- win B- wean “It is time to wean the child.” |
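Scoring Part a is mechanical once a key is fixed. The sketch below assumes the key implied by the recorded sentences (cold, raise, pear, grass, wean) and the two points per item stated in the directions. The paper does not print a key, and the names `PART_A_KEY` and `score_part_a` are illustrative only.

```python
# Answer key inferred from the recorded stimulus sentences (an assumption:
# the paper itself does not print a key). Item number -> correct option.
PART_A_KEY = {1: "A", 2: "B", 3: "A", 4: "B", 5: "B"}
POINTS_PER_ITEM = 2  # stated in the test directions


def score_part_a(responses):
    """Return the Part a score for a dict of item number -> circled option."""
    correct = sum(
        1
        for item, answer in PART_A_KEY.items()
        if responses.get(item, "").strip().upper() == answer
    )
    return correct * POINTS_PER_ITEM


# A student who mishears item 2 ("race" for "raise") and skips item 5.
print(score_part_a({1: "A", 2: "A", 3: "A", 4: "B"}))  # 6
```

Unanswered items simply earn no points, which matches the way a paper answer sheet would be marked.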
[Part b: discriminating words]
You will hear Mary’s mother telling her to set the table. Write the number of
each statement in the circle corresponding to the item she mentioned.
Number 1 is solved for you.
1- We will use napkins today.
2- Take the large plate.
3- Each person should have one fork.
4- Put the knife on the table.
5- Eat what is left on your plate.
6- Hold the cup carefully.
The grading scale for the test will be:
| Points | Rating |
| 18-20 | 1 (superior) |
| 16-17 | 2 (above average) |
| 14-15 | 3 (average) |
| 0-13 | 4 (poor) |
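The cut-offs in this scale follow from the 90%, 80%, and 70% criteria given under Ratings. As a rough check, the sketch below derives them for any maximum score, assuming that fractional cut-offs are rounded down to a whole point. That rounding rule is an assumption, since the paper does not state one, and the function names are illustrative.

```python
CRITERIA = (90, 80, 70)  # percent criteria for ratings 1, 2 and 3


def band_cutoffs(max_points):
    """Lowest whole-point score for ratings 1, 2 and 3 (rounded down)."""
    return tuple(max_points * pct // 100 for pct in CRITERIA)


def rating_for(points, max_points):
    """Map a raw score to a rating: 1 superior, 2 above average, 3 average, 4 poor."""
    if not 0 <= points <= max_points:
        raise ValueError("points must lie between 0 and max_points")
    for rating, cutoff in enumerate(band_cutoffs(max_points), start=1):
        if points >= cutoff:
            return rating
    return 4


print(band_cutoffs(20))    # (18, 16, 14)
print(rating_for(17, 20))  # 2
```

Under the same rule a 16-point test has cut-offs of 14, 12, and 11, so every score falls in exactly one band.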
Speaking summative test
Directions: This part of the test is to be completed in the language lab. Record all
of your answers on the tape. Limit your answer to 4-5 minutes per item. Four
points are possible for each of the skills of pronunciation, vocabulary use,
structure, and fluency. Both parts of this test, together, will be worth a
possible 32 points.
(1) Describing a picture: Make up a story about the picture in front of you.
Who are these people and what are they doing? Do you like this activity? Why?
What did you do when you were the age of the students? What will you do when
you are the teacher’s age? (There will be a real, colorful picture showing
elementary students drawing while their teacher watches them.)
(2) Who is more important, a scientist or an artist? Which would you want to
be? Justify your answer.
The grading scale for the test will be:
| Points | Rating |
| 28-32 | 1 (superior) |
| 25-27 | 2 (above average) |
| 22-24 | 3 (average) |
| 0-21 | 4 (poor) |
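The 32-point total follows from the directions: two items, each rated on four skills at up to four points per skill. A minimal sketch of that tally is given below; the names and the sample ratings are hypothetical and serve only to illustrate the arithmetic.

```python
SKILLS = ("pronunciation", "vocabulary", "structure", "fluency")
MAX_PER_SKILL = 4  # stated in the directions


def speaking_total(items):
    """Sum the skill ratings over all recorded items.

    `items` is a list with one dict per test item, mapping each skill
    name to a rating from 0 to MAX_PER_SKILL.
    """
    total = 0
    for ratings in items:
        for skill in SKILLS:
            value = ratings[skill]
            if not 0 <= value <= MAX_PER_SKILL:
                raise ValueError(f"{skill} rating out of range: {value}")
            total += value
    return total


# Illustrative ratings for the picture story and the opinion question.
picture = {"pronunciation": 3, "vocabulary": 4, "structure": 3, "fluency": 3}
opinion = {"pronunciation": 3, "vocabulary": 3, "structure": 2, "fluency": 3}
print(speaking_total([picture, opinion]))  # 24
```

A total of 24 falls in the 22-24 band of the scale above, that is, a rating of 3 (average).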
Reading summative test
Directions: Look at the picture below. (There will be a picture of a man
surrounded by a vacuum cleaner and its parts scattered on the floor.) Read the
passage that follows carefully. Then answer the questions that appear after the
passage. You have thirty minutes to complete this test. Each item is worth a
possible two points; the test is worth sixteen points in total.
Mr. Scott thought that he was very good at fixing household appliances when
they broke, so when Mrs. Scott told him that she needed a new vacuum cleaner,
he said, “What’s wrong with the old one? I can easily fix it.”
Mr. Scott fixed the vacuum cleaner, but the same thing happened again several
times, until one day, after he had unscrewed all the parts, and had gone to
have lunch, Mrs. Scott added a few extra pieces to the pile on the floor.
“Do you know,” she said to her friend Mrs. Brown, the next morning, “if I’d
just taken away a few pieces, he’d have noticed that they were missing, and
would have gone out and bought some more. But when he couldn’t find places for
all the pieces that were on the floor, he gave up and agreed to buy me a new
machine.”
[Part a]
1- This anecdote is:
a. humorous
b. sad
c. scientific
2- In the last line of the story, the expression “gave up” means:
a. surrendered
b. offered
c. became angry
3- Why did Mr. Scott agree to buy a new machine?
a. Because he wanted to please his wife.
b. Because he wanted to save time by not fixing it.
c. Because he did not know how to fix the old machine.
[Part b]
Match the words on the left to the corresponding words on the right by drawing
lines between the two.
1. Mrs. Brown             a. didn’t want to buy a new vacuum cleaner for his wife.
2. Mr. Scott              b. had been added by Mrs. Scott.
3. Mrs. Scott             c. needed fixing on more than one occasion.
4. The extra pieces       d. was a friend of Mrs. Scott’s.
5. The vacuum cleaner     e. was smart enough to get a new vacuum cleaner.
The grading scale for the reading test will be:
| Points | Rating |
| 14-16 | 1 (superior) |
| 12-13 | 2 (above average) |
| 11 | 3 (average) |
| 0-10 | 4 (poor) |
Writing summative test
Directions: Choose one of the following questions. Then write a paragraph on
the topic. You have fifteen minutes to write the paragraph. The total test is
worth 64 points. Sixteen points will be possible for each of the following:
(1) freedom from errors in grammar, spelling, punctuation, and capitalization,
(2) how you organize the paragraph, (3) appropriateness of vocabulary use, and
(4) how you link your ideas together.
1. Why I want to study a foreign language.
2. Write what you see in the picture below. Use complete sentences. (Students
see a picture of a car accident and an ambulance near the damaged car trying to
help the injured people.)
The grading scale for the writing test will be:
| Points | Rating |
| 57-64 | 1 (superior) |
| 51-56 | 2 (above average) |
| 44-50 | 3 (average) |
| 0-43 | 4 (poor) |
Conclusion
This criterion-referenced test measures students’
competence in the four language skills of listening, speaking, reading, and
writing. Since it is possible for a student to be strong in reading, for
example, and weak in speaking, it is necessary to retain separate scores for
each skill tested in order to have an accurate profile of each student’s
English language skills. But an overall score is also necessary to provide an
appropriate method of deciding whether a student passes or fails the test as a
whole. Therefore, the four subtest scores will be combined (with four points
possible for each subtest) for a total possible score of 16 points. The grading
scale for the overall test, then, is:
| Points | Rating | Result |
| 14-16 | 1 (superior) | Passing |
| 12-13 | 2 (above average) | Passing |
| 11 | 3 (average) | Passing |
| 0-10 | 4 (poor) | Failing |
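The paper does not spell out how a subtest rating becomes subtest points. One reading, assumed here, is that a rating of 1 earns four points and a rating of 4 earns one, so that four superior ratings give the 16-point maximum. Under that assumption, and taking 11 points as the lowest passing total, the overall result can be sketched as follows (all names are illustrative):

```python
SKILL_NAMES = ("listening", "speaking", "reading", "writing")
PASS_MARK = 11  # totals of 0-10 are rated "poor" and fail


def rating_to_points(rating):
    """Assumed conversion: rating 1 (superior) -> 4 points, 4 (poor) -> 1."""
    if rating not in (1, 2, 3, 4):
        raise ValueError(f"unknown rating: {rating}")
    return 5 - rating


def overall_result(ratings):
    """Return (total points, passed?) for a dict of skill -> rating."""
    total = sum(rating_to_points(ratings[skill]) for skill in SKILL_NAMES)
    return total, total >= PASS_MARK


# A student strong in reading but weak in speaking.
profile = {"listening": 2, "speaking": 4, "reading": 1, "writing": 3}
print(overall_result(profile))  # (10, False)
```

Keeping the per-skill ratings in the dictionary alongside the total preserves the separate profile that the conclusion calls for.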
Bachman, L. F. (1991). Fundamental Considerations in Language Testing.
Brown, H. D. (1987). Principles of Language Learning and Teaching. NJ: Prentice Hall Regents.
Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. NY: Macmillan Publishing Company.
Hammerly, H. (1985). An Integrated Theory of Language Teaching and its Practical Consequences.
Harris, D. P. (1969). Testing English as a Second Language.
Heaton, J. B. (1995). Writing English Language Tests.
Henning, G. (1987). A Guide to Language Testing.
Madsen, H. S. (1983). Techniques in Testing.
Marshall, J. C., & Hales, L. W. (1972). Essentials of Testing.
Omaggio, A. C. (1986). Teaching Language in Context: Proficiency-Oriented Instruction.
Richards, J. C. (1990). The Language Teaching Matrix.
Scannell, D. P., & Tracy, D. B. (1975). Testing and Measurement in the Classroom.
Smith, F. M., & Adams, S. (1972). Educational Measurement for the Classroom Teacher.
Weir, C. (1993). Understanding & Developing Language Tests.