INFORMAL ASSESSMENT IN EDUCATIONAL EVALUATION: IMPLICATIONS FOR BILINGUAL EDUCATION PROGRAMS

Cecilia Navarete; Judith Wilde; Chris Nelson; Robert Martínez; Gary Hargett

INTRODUCTION

Central to the evaluation of any educational program are the instruments and procedures used to assess that program's effects. Many programs use commercially available standardized tests to measure academic achievement or language proficiency. There are good reasons for doing so. Standardized tests usually are administered annually by school districts, providing a ready source of achievement data. Test publishers provide information about the test's validity and reliability, fulfilling another requirement of evaluation. And, standardized test scores generally have been accepted by educators and the community.

However, recent research on student achievement has focused on problems associated with over-reliance on standardized tests (e.g., Haney & Madaus 1989; Marston & Magnusson 1987; Pikulski 1990; Shepard 1989). Alternative approaches to assessing student progress have been suggested that address many of the problems associated with standardized tests (e.g., Marston & Magnusson 1987; Rogers 1989; Wiggins 1989; Wolf 1989). The purpose of this guide is to review some of the problems associated with standardized testing, describe alternative assessment approaches, and discuss how these approaches might be employed by bilingual educators to supplement the use of standardized tests.

CONCERNS WITH STANDARDIZED TESTING

Criticisms of standardized tests seem to have grown in proportion to the frequency with which, and the purposes for which, they are used (Haney & Madaus 1989). Pikulski (1990) suggests that the greatest misuse of standardized tests may be their overuse. Many districts now administer such tests at every grade level, define success or failure of programs in terms of test scores, and even link teacher and administrator salaries and job security to student performance on standardized tests. Three areas often criticized in regard to standardized tests are content, item format, and item bias. Standardized tests are designed to provide the best match possible to what is perceived to be the "typical" curriculum at a specific grade level. Because a bilingual education program is built on objectives unique to the needs of its students, many of the items on a standardized test may not measure the objectives or content of that program. Thus a standardized test may have low content validity for a specific bilingual education program. In such a situation, the test might not be sensitive to actual student progress. Consequently, the program, as measured by this test, would appear to be ineffective.

Standardized achievement tests generally rely heavily on multiple-choice items. This item format allows for greater content coverage as well as objective and efficient scoring. However, the response required by the format is recognition of the correct answer. This type of response does not necessarily match the type of responses students regularly make in the classroom, e.g., the production or synthesis of information. If students are not used to responding within the structure imposed by the item format, their test performance may suffer. On the other hand, students may recognize the correct form when it is presented as a discrete item in a test format, but fail to use that form correctly in communication contexts. In this case, a standardized test may make the student appear more proficient than performance would suggest.

Further, some tests have been criticized for including items that are biased against certain kinds of students (e.g., ethnic minorities, limited English proficient, rural, inner-city). The basis for this criticism is that the items reflect the language, culture, and/or learning style of the middle-class majority (Neill & Medina, 1989). Although test companies have attempted to write culture-free items, the removal of questions from a meaningful context has proved problematic for minority students.

Thus, there are strong arguments in favor of educators considering the use of alternative forms of assessment to supplement standardized test information. These alternate assessments should be timely, not time consuming, truly representative of the curriculum, and tangibly meaningful to the teacher and student. Techniques of informal assessment have the potential to meet these criteria as well as programmatic requirements for formative and summative evaluations. Validity and reliability are not exclusive properties of formal, norm-referenced tests. Informal techniques are valid if they measure the skills and knowledge imparted by the project; they are reliable if they measure consistently and accurately.

DEFINING INFORMAL ASSESSMENT

"Formal" and "informal" are not technical psychometric terms; therefore, there are no uniformly accepted definitions. "Informal" is used here to indicate techniques that can easily be incorporated into classroom routines and learning activities. Informal assessment techniques can be used at any time without interfering with instructional time. Their results are indicative of the student's performance on the skill or subject of interest. Unlike standardized tests, they are not intended to provide a comparison to a broader group beyond the students in the local project.

This is not to say that informal assessment is casual or lacking in rigor. Formal tests assume a single set of expectations for all students and come with prescribed criteria for scoring and interpretation. Informal assessment, on the other hand, requires a clear understanding of the levels of ability the students bring with them. Only then may assessment activities be selected that students can attempt reasonably. Informal assessment seeks to identify the strengths and needs of individual students without regard to grade or age norms.

INFORMAL ASSESSMENT TECHNIQUES

Methods for informal assessment can be divided into two main types: unstructured (e.g., student work samples, journals) and structured (e.g., checklists, observations). The unstructured methods frequently are somewhat more difficult to score and evaluate, but they can provide a great deal of valuable information about the skills of the children, particularly in the area of language proficiency. Structured methods can be reliable and valid techniques when time is spent creating the "scoring" procedures.

When informal assessment utilizes open-ended exercises reflecting student learning, teachers (and students) can infer "from the mere presence of concepts, as well as correct application, that the student possesses the intended outcomes" (Muir & Wells 1983, p. 95). Another important aspect of informal assessments is that they actively involve the students in the evaluation process; they are not just paper-and-pencil tests.

Unstructured Assessment Techniques

Unstructured techniques for assessing students can run the gamut from writing stories to playing games and include both written and oral activities. The range of possible activities is limited only by the creativity of the teacher and students. Table 1 presents several illustrative unstructured assessment techniques.

Structured Assessment Techniques

Structured assessments are planned by the teacher much more specifically than are unstructured assessments. As the examples described in Table 2 indicate, structured assessment measures are more varied than unstructured ones; indeed, some of them are tests of one kind or another. In each case, a definite "right" or "wrong," "completed" or "not completed" determination can be made. Consequently, structured assessment activities are relatively easier to score than unstructured ones.

Table 1 - Types of Unstructured Assessment Techniques

Writing Samples - When students write anything on specific topics, their products can be scored by using one of the techniques described in Table 3. Other creative writing samples that can be used to assess student progress include newspapers, newsletters, collages, graffiti walls, scripts for a play, and language experience stories.

Homework - Any written work students do alone, either in class or in the home, can be gathered and used to assess student progress. With teacher guidance, students can participate in diagnosing and remediating their own errors. In addition, students' interests, abilities, and efforts can be monitored across time.

Logs or journals - An individualized method of writing. Teachers can review entries on a daily, weekly, or quarterly basis to determine how students perceive their learning processes, as well as how they are shaping ideas and strengths for the more formal writing that occurs in other activities.

Games - Games can provide students with a challenging method for increasing their skills in various areas such as math, spelling, naming categories of objects/people, and so on.

Debates - Students' oral work can be evaluated informally in debates by assessing their oral presentation skills in terms of their ability to understand concepts and present them to others in an orderly fashion.

Brainstorming - This technique can be used successfully with all ages of children to determine what may already be known about a particular topic. Students often feel free to participate because there is no criticism or judgment.

Story retelling - This technique can be used in either oral or written formats. It provides information on a wide range of language-based abilities. Recall is part of retelling, but teachers can use it to determine whether children understood the point of the story and what problems children have in organizing the elements of the story into a coherent whole. This also can be used to share cultural heritage when children are asked to retell a story in class that is part of their family heritage.

Anecdotal - This method can be used by teachers to record behaviors and students' progress. These comments can include behavioral, emotional, and academic information. For instance, "Jaime sat for five minutes before beginning his assignment." These should be written carefully, avoiding judgmental words.

Naturalistic - Related to anecdotal records, this type of observation may take the form of notes written at the end of the day by a teacher. They may record what occurred on the playground, in the classroom, among students, or may just reflect the general classroom atmosphere.

Table 2 - Types of Structured Informal Assessments

Checklists - Checklists specify student behaviors or products expected during progression through the curriculum. The items on the checklist may be content area objectives. A checklist is considered to be a type of observational technique. Because observers check only the presence or absence of the behavior or product, checklists generally are reliable and relatively easy to use. Used over time, checklists can document students' rate and degree of accomplishment within the curriculum.

Cloze Tests - Cloze tests are composed of text from which words have been deleted randomly. Students fill in the blanks based on their comprehension of the context of the passage. The procedure is intended to provide a measure of reading comprehension.

Criterion-referenced Tests - Criterion-referenced tests are sometimes included as a type of informal assessment. This type of test is tied directly to instructional objectives, measures progress through the curriculum and can be used for specific instructional planning. In order for the test to reflect a particular curriculum, criterion-referenced tests often are developed locally by teachers or a school district. Student performance is evaluated relative to mastery of the objectives, with a minimum performance level being used to define mastery.

Rating Scales - This is an assessment technique often associated with observation of student work or behaviors. Rather than recording the "presence" or "absence" of a behavior or skill, the observer subjectively rates each item according to some dimension of interest. For example, students might be rated on how proficient they are on different elements of an oral presentation to the class. Each element may be rated on a 1 to 5 scale, with 5 representing the highest level of proficiency.

Questionnaires - A questionnaire is a self-report assessment device on which students can provide information about areas of interest to the teacher. Questionnaire items can be written in a variety of formats and may be forced-choice (response alternatives are provided) or open-ended (students answer questions in their own words). Questionnaires designed to provide alternative assessments of achievement or language proficiency may ask students to report how well they believe they are performing in a particular subject or to indicate areas in which they would like more help from the teacher. One type of questionnaire (which assumes that the student can read in the native language) requests that students check off in the first language the kinds of things they can do in English. For a questionnaire to provide accurate information, students must be able to read the items, have the information to respond to the items, and have the writing skills to respond.

Miscue Analysis - An informal assessment of strategies used by students when reading aloud or retelling a story. Typically, students read a grade-level passage (e.g., 250 words) while a judge follows along with a duplicate copy of the passage. The student may be tape recorded. Each time an error occurs, the judge circles the word or phrase. A description of the actual error can be taken from the tape after the session and analyzed for errors in pronunciation, sentence structure, vocabulary, use of syntax, etc. (see Goodman 1973).

Structured Interviews - Structured interviews are essentially oral interview questionnaires. Used as an alternative assessment of achievement or language proficiency, the interview could be conducted with a student or a group of students to obtain information of interest to a teacher. As with written questionnaires, interview questions could be forced-choice or open- ended. Because the information exchange is entirely oral, it is important to keep interview questions (including response alternatives for forced-choice items) as simple and to-the-point as possible.
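The cloze procedure described in Table 2 is mechanical enough to sketch in code. The fixed-ratio deletion below (every nth word) is one common way to build such a test; since the table says words are deleted randomly, treat the deletion scheme, the default ratio, and the function name as illustrative assumptions rather than a prescribed method.

```python
import re

def make_cloze(passage, n=5, blank="____"):
    """Replace every nth word of a passage with a blank.

    Returns the cloze text plus an answer key so the teacher can
    score fill-ins against the deleted words.
    """
    words = passage.split()
    answers = []
    for i in range(n - 1, len(words), n):
        # Keep any trailing punctuation attached to the blank.
        core = re.match(r"[\w'-]+", words[i])
        if core:
            answers.append(core.group(0))
            words[i] = blank + words[i][core.end():]
    return " ".join(words), answers

text, key = make_cloze(
    "The quick brown fox jumps over the lazy dog while the cat sleeps.")
print(text)  # The quick brown fox ____ over the lazy dog ____ the cat sleeps.
print(key)   # ['jumps', 'while']
```

In practice, blanks may be scored for exact replacement of the deleted word or for any contextually acceptable word; that choice should be fixed before administering the test.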

GUIDELINES FOR INFORMAL ASSESSMENT

In order to be effective, informal assessment activities must be carefully planned. With appropriate planning, they can be reliable and valid, and they can serve diagnostic purposes as well as formative and summative evaluation purposes within all types of bilingual education programs. General guidelines are presented here to ensure these qualities. These guidelines apply both to formal and informal assessments.

Validity and Reliability

Standardized tests often are selected because their technical manuals report validity and reliability characteristics. However, if the content of these tests does not match the instructional objectives of the project, their validity is negated. For example, many standardized tests include structural analysis skills as part of the reading or language arts sections. If a bilingual education project does not teach structural analysis skills, concentrating instead on the communicative aspects of reading/writing, such a test may not be valid for that particular project.

The validity of informal measures can be established by demonstrating that the information obtained from a given technique reflects the project's instructional goals and objectives. If, for example, the project is teaching communicative writing, a collection of holistically scored writing samples would be a valid measure. Therefore, a first step toward validating the use of informal assessment measures is a clear statement of curricular expectations in terms of goals and objectives.

Reliability, in its purest sense, refers to the ability of a measure to discriminate levels of competency among persons who take it. This is accomplished through the consistent application of scoring criteria. As with validity, the reliability of informal measures can be established by a clear statement of the expectations for student performance in the curriculum and ensuring that teachers apply consistent criteria based on those expectations. If the informal measures accurately represent students' progress, and if they accurately distinguish the differential progress made by individual students, they are reliable.

Scoring Procedures

Consideration has to be given to the reliability and validity of the scoring procedures used in assessment, both formal and informal. Among critical issues to be addressed are:

1. The validity of the judgment may be limited by the heavy dependency on the opinion of raters. To ensure high reliability, raters must be trained to meet a set criterion (e.g., when judging ten individuals, raters should rate eight of them similarly).

2. The scores must be specific to the learning situation. The scoring procedure must match the exercise or performance. To ensure this match, the purpose for assessment and the content to be assessed must first be decided. Agreement should also be sought on the descriptors developed for each scoring category to be used.

3. Scoring procedures may be time consuming. To ensure success, the commitment and support of project and school personnel must be sought. Training and practice must be offered to the raters.
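The rater-agreement criterion in point 1 can be checked with simple arithmetic: count how many students two raters scored "similarly" and compare the proportion against the eight-of-ten threshold. In this sketch, "similarly" is operationalized as ratings within one point on the scale; both that tolerance and the function name are local assumptions, since the guideline leaves the definition of similarity to the project.

```python
def agreement_rate(ratings_a, ratings_b, tolerance=1):
    """Fraction of students whom two raters scored within
    `tolerance` points of each other on the same scale."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must score the same students")
    agree = sum(abs(a - b) <= tolerance
                for a, b in zip(ratings_a, ratings_b))
    return agree / len(ratings_a)

# Two raters judging the same ten students on a 1-5 scale.
rater1 = [4, 3, 5, 2, 4, 3, 1, 5, 2, 4]
rater2 = [4, 2, 5, 4, 4, 3, 2, 5, 2, 3]
rate = agreement_rate(rater1, rater2)
print(f"agreement: {rate:.0%}")            # agreement: 90%
print("meets 8-of-10 criterion:", rate >= 0.8)
```

Raters who fall below the criterion would receive further training and recalibration before their scores are used.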

Scoring procedures utilized in unstructured assessment activities can be used to:

  • measure progress and achievement in most content areas;
  • measure literacy skills such as oral, reading, and written production;
  • develop summative and formative evaluations;
  • make an initial diagnosis of a student's learning;
  • guide and focus feedback on students' work;
  • measure students' growth over time or for specific periods;
  • determine the effectiveness of an instructional program;
  • measure group differences between project students and nonproject comparison groups;
  • analyze the performance of an individual student; and
  • correlate student outcomes with formal, standardized tests of achievement and language proficiency.

Table 3 - Scoring Assessments for Unstructured Activities

Holistic - A guided procedure for evaluating performance (oral or written) as a whole rather than by its separate linguistic, rhetorical, or informational features. Evaluation is achieved through the use of a general scoring guide which lists detailed criteria for each score. Holistic judgments are made on the closest match between the criteria and the students' work. Criteria typically are based on a rating scale that ranges from 3 to 10 points (3 = low quality level and 10 = high quality level).

Primary Trait - A modified version of holistic scoring; the most difficult of all holistic scoring procedures, its primary purpose is to assess a particular feature(s) of a discourse or a performance (oral or written) rather than the students' work as a whole. Secondary level traits also can be identified and scored using this approach.

Analytic - A complex version of holistic scoring; students' work is evaluated according to multiple criteria which are weighted based on their level of importance in the learning situation. For example, a writing sample can be assessed on organization, sentence structure, usage, mechanics, and format. Each criterion is rated on a 1 to 5 scale (1 = low and 5 = high). A weighting scheme then is applied.

For example, the organization of an essay can be weighted six times as much as the format; sentence structure five times as much as format; and so on. This procedure can be used for many purposes such as diagnostic placement, reclassification and exiting, growth measurement, program evaluation, and educational research.

Holistic Survey - Uses multiple samples of students' written work representing three of five discourse modes: expressive, narrative, descriptive, expository, and argumentative. Prior to scoring, students select topics, repeat oral directions to demonstrate understanding of the task, and have the opportunity to revise and edit their work before submitting it for evaluation. The scoring procedures used in the survey can include primary trait, analytic, or other holistic scoring devices relevant to the goals and objectives of the written assignment.

General Impression Markings - The simplest of the holistic procedures. Raters score papers by sorting them along a continuum such as excellent to poor, or acceptable to unacceptable. Critical to this approach is that raters become "calibrated" to reach consensus by reading and judging a large sample of papers.

Error Patterns - The assessment of students' written work or mathematical computations. Scoring is based on a criterion that describes the process or continuum of learning procedures that reflect understanding of the skill or concept being assessed. A minimum of three problems or written assignments are collected and assessed to ensure that a student's error is not due to chance.

Assigning Grades - The "old standard." Students are assigned a number or letter grade based on achievement, competency, or mastery levels. Grades can be pass-fail or can reflect letter grades, such as A to F. The major limitation of this scoring procedure is that grades do not provide any information on the strengths or weaknesses in a content area.
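The analytic procedure in Table 3 amounts to a weighted sum of criterion ratings. The sketch below assumes the weighting from the essay example (organization six times the format weight, sentence structure five times); the usage and mechanics weights, the 0-100 rescaling, and all names are illustrative choices, not part of the procedure as described.

```python
# Weights reflect each criterion's importance in the learning situation.
WEIGHTS = {
    "organization": 6,        # six times the format weight (per the example)
    "sentence structure": 5,  # five times the format weight (per the example)
    "usage": 4,               # illustrative
    "mechanics": 3,           # illustrative
    "format": 1,
}

def analytic_score(ratings, weights=WEIGHTS):
    """Weighted sum of 1-5 criterion ratings, rescaled to 0-100."""
    raw = sum(weights[c] * ratings[c] for c in weights)
    best = sum(5 * w for w in weights.values())  # all criteria rated 5
    return round(100 * raw / best, 1)

sample = {"organization": 4, "sentence structure": 3,
          "usage": 5, "mechanics": 4, "format": 2}
print(analytic_score(sample))  # 76.8
```

Because the weighted total depends on the chosen weights, the same rating profile can yield different scores under different weighting schemes, which is why agreement on weights and descriptors should precede scoring.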

COMBINING ASSESSMENTS FOR EVALUATION

Different methods of combining types of structured and unstructured informal assessments and associated scoring procedures appear in the literature. While these approaches have different labels and differ somewhat in philosophy, all are offered as alternatives to standardized testing and use informal assessment to measure student performance in the context of the curriculum.

1. Curriculum-based assessment uses the "material to be learned as the basis for assessing the degree to which it has been learned" (Tucker 1985, p. 199). This approach employs informal measures such as writing samples, reading samples from the basal series, and teacher-made spelling tests from the basal series. It has received a good deal of attention in the special education literature (e.g., Deno 1985; Marston & Magnusson 1987) and was developed, in part, in response to the need to address performance criteria specified in students' individualized education plans (IEPs).

2. Ecological assessment (e.g., Bulgren & Knackendoffel 1986) evaluates student performance in the context of the environment. Sources of such data include student records, student interviews, observations, and collections of student products. Ecological assessment takes into account such things as the physical arrangement of the classroom; patterns of classroom activity; interactions between the teacher and students and among students; student learning styles; and expectations of student performance by parents, peers, and teachers.

3. Performance assessment (Stiggins 1984) provides a structure for teachers to evaluate student behavior and/or products. Assessments can take any form, depending on the behavior or product of interest, and are designed according to four considerations: (1) a decision situation that defines the basic reason for conducting the assessment; (2) a test activity or exercise to which the student responds; (3) the student response; and (4) a rating or judgment of performance.

ABOUT THE AUTHORS

The authors are on the staff of the Evaluation Assistance Center (West) at the University of New Mexico.

Cecilia Navarete, Senior Research Associate, received her Ph.D. in Education from Stanford University.

Judith Wilde, Methodologist, received her Ph.D. in the Psychological Foundations of Education from the University of New Mexico.

Chris Nelson, Senior Research Associate, received her Ph.D. in Educational Psychology and Research from the University of Kansas.

Robert Martínez, Senior Research Associate, received his Ph.D. in Educational Research from the University of New Mexico.

Gary Hargett, Research Associate, is a doctoral candidate in Education at the University of Washington.
