“You should look at this section if you have only a limited idea of the key steps required in developing high quality cognitive items.”
The purpose of an assessment is to gather valid and reliable information on where students are in their learning. However, many tests and examinations include cognitive items (the technical term for questions in an assessment) that cannot collect accurate data on student learning, resulting in inaccurate and misleading conclusions.
The reason that cognitive items are often poor quality is not a lack of understanding of the subject. Most items are written by teachers who understand their subject very well. Writing good items, however, requires more than subject knowledge alone: developing high quality cognitive items takes practice and technical know-how in the principles of robust assessment.
The first document that an item writer needs to consult before starting to write items is the assessment framework. The assessment framework includes information about the purpose of the assessment, the target group to be assessed, the kind of test instrument to be developed and the test specifications or blueprint. During item writing, the assessment framework guides the item writer in deciding the key content areas to be assessed, the balance of key skills and the difficulty level.
The first step in writing an item is the development or selection of an authentic stimulus. This might be a text, a situation, a picture, a diagram or a graph. The job of the stimulus is to provide an interesting context for students, so it should be appropriate to their grade level and age. A very common mistake is to use a stimulus that is too difficult for students, for example one that uses difficult words or complex diagrams.
The second step is writing the question (referred to as the item stem). The type of item will have been defined in the assessment framework and is either closed (for example, a multiple-choice item) or open (for example, a short or long response).
If writing a multiple-choice item (referred to as an MCQ), there should be around four or five response options. One of these should be correct (the key) and the others should be incorrect (distractors). Importantly, the distractors should not be random but should be specifically created to highlight a misconception.
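The structure described above can be sketched in a few lines of code. This is purely illustrative: the item, the field names and the misconception labels are invented for the example, not a standard representation.

```python
# A minimal sketch of one MCQ whose distractors each target a known
# misconception. All names and labels here are invented for illustration.
mcq = {
    "stem": "7 × 8 =",
    "key": "56",           # the correct option
    "distractors": {       # each incorrect option maps to a misconception
        "54": "near-miss recall of the multiplication fact",
        "15": "added instead of multiplied",
        "78": "concatenated the digits",
    },
}

def score(response: str) -> int:
    """Automatic scoring: 1 point for the key, 0 for any other response."""
    return 1 if response == mcq["key"] else 0
```

Writing each distractor against a named misconception, as in the sketch, also makes it easy later to analyse which errors students actually made.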
As mentioned above, the creation of high quality test items requires a great deal of skill and it is very important that item writers receive training. It is also very important that all test items go through a detailed quality assurance process. This includes:
- Panelling (when a group of item writers review, discuss and critique items);
- Cognitive laboratories (also called think-alouds) in which small numbers of students work slowly through items and say what they are thinking; and
- Piloting, in which a group of students complete the items under test conditions.
All of these processes are likely to lead to revisions to cognitive items, making sure that they are as high quality as possible. This means that they will be able to collect accurate information on student skills and knowledge.
To find out more about developing cognitive items, move to #Intermediate.
“You should look at this section if you already know the key steps in developing good quality items but would like to deepen your understanding.”
All cognitive items are developed to map to an assessment framework. This document defines the purpose of the assessment, the target group to be assessed, the kind of test instrument to be developed and the test specifications or blueprint. A key element in the assessment framework is to specify which response formats should be used. There are a number of options, each of which has advantages and disadvantages.
Perhaps the most common response format is the multiple-choice question (MCQ). MCQs are particularly popular in large scale assessment as there is no subjectivity involved in scoring them (and with digital assessment, scoring is done automatically). This makes them ideal for comparisons of performance. They are also quick for students to respond to, which means that more content can be covered in less time. They can also be written so that incorrect options (distractors) highlight misconceptions. This information can be used to better target teaching and learning activities.
The drawback of MCQs is that they do not allow students to express themselves. Also, the skill involved in writing good MCQs is high, and this means that many MCQs are poor quality, confusing students rather than collecting useful data on their performance.
Another common type of response format is known as open response (or constructed response) where the students are expected to construct or write their own answers in response to an item. This might be as little as one word or number (in a closed constructed response) or a short explanation, equation or diagram (in a short constructed response).
For both of these response types, it is essential to have a robust marking guide (also known as a scoring guide or rubric). This identifies what is, and is not, an acceptable response, and may include a range of score points. For example, if the item stem is ‘12 ÷ 3 =’ then the marking guide should indicate whether ‘4’, ‘four’ and ‘12/3’ are all acceptable responses. It is common to give two points for a fully correct response and one point for a partially correct response (for example, where the correct response is given but spelled incorrectly, or where the correct equation is used but the answer is wrong).
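The marking-guide decisions just described can be sketched as a small scoring function. The accepted forms and the two-point scale come from the ‘12 ÷ 3 =’ example in the text; the misspellings used for partial credit are hypothetical, and a real marking guide would list them explicitly.

```python
# Sketch of a marking guide for the item stem '12 ÷ 3 ='.
# Fully acceptable forms earn 2 points; a recognisable but flawed
# response (here, a misspelling of the correct word) earns 1 point;
# anything else earns 0. The misspellings are invented examples.
FULL_CREDIT = {"4", "four", "12/3"}
PARTIAL_CREDIT = {"fuor", "foru"}

def mark(response: str) -> int:
    answer = response.strip().lower()   # tolerate spacing and capitals
    if answer in FULL_CREDIT:
        return 2
    if answer in PARTIAL_CREDIT:
        return 1
    return 0
```

Writing the guide down this precisely, whether in code or on paper, is what allows different markers to reach the same score for the same response.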
The benefit of using constructed response items is that students are free to provide their own responses. This can help to highlight their thought processes or ability to express themselves. The disadvantage is that marking may not be done consistently, making it difficult to compare data. Ensuring consistent marking means not only creating a marking guide, but also training markers, monitoring their work and – often – having a proportion of responses marked by two different people. This can be time-consuming and expensive.
In many assessment activities the use of extended response items is common. These may require students to write essays, describe a scientific experiment or show lengthy mathematical problem solving. These are valuable in allowing students to demonstrate their skills and knowledge in a free format. The challenge of marking these consistently, however, is much greater (even with a very detailed marking guide). This means that extended response items are often not used in large scale assessment.
Whatever items are developed, the key quality assurance processes of panelling, cognitive laboratories and piloting should not be missed – these are important steps in ensuring that only high quality items are included in an assessment instrument.
To find out more about developing cognitive items, move to #Advanced.
“You should look at this section if you are already familiar with the process of developing cognitive items but would like to know more.”
A key challenge in assessing students lies in developing good quality items that adhere to international good practice. Unfortunately, this is often not the case: many items in national and regional assessment instruments do not meet the criteria for a good item. These criteria include that items must:
- Map to the assessment framework;
- Contribute to the purpose of the assessment;
- Measure what they intend to measure;
- Not contain any bias that would favour some students over others;
- Not contain any irrelevant information; and
- Stimulate the interest of students to perform as well as they can.
It is a good start if all items in an assessment instrument meet these standards. But many other components are also required before an item can be considered to be high quality.
Items are often accompanied by stimulus material. This may be a text, image, diagram, table or chart. Stimuli should be appropriate for the students being assessed. For example, they should contain words and images that are relevant to that grade and subject. Moreover, the stimulus used should not impose a heavy reading load on students. Equally, however, an approach such as removing labels from diagrams is not appropriate – a stimulus should include all necessary information for students to respond to items appropriately.
For multiple choice items (MCQs) a common mistake is to include incorrect responses (known as distractors) that are not plausible. This means that one or more of the options appear so unlikely that a student can guess the correct response. Instead, distractors should be plausible, should pose a challenge to students and should make students think.
In addition – and to the extent possible – distractors should be written to highlight misconceptions. For example, if assessing students on a mathematical skill, distractors could be written to reflect the most common errors that students make on that mathematical skill. This can ensure that assessment data is able to shed light on areas of the curriculum or particular learning outcomes that require more focus in the classroom.
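One way to act on misconception-coded distractors, sketched below with an invented item and invented sample responses, is simply to tally which coded error each student's chosen distractor represents:

```python
from collections import Counter

# Hypothetical MCQ '12 ÷ 3 =' with distractors coded by the error
# each one is designed to catch. Labels are invented for illustration.
MISCONCEPTIONS = {
    "9": "subtracted instead of divided",
    "36": "multiplied instead of divided",
    "15": "added instead of divided",
}

# Invented sample data: one response per student ("4" is the key).
responses = ["4", "36", "9", "36", "4", "36"]

# Count how often each misconception was triggered.
tally = Counter(MISCONCEPTIONS[r] for r in responses if r in MISCONCEPTIONS)

# The most frequent entry points to the error that teaching should target.
most_common_error, count = tally.most_common(1)[0]
```

Aggregated over a whole cohort, a tally like this is what turns distractor choices into the kind of classroom-level evidence described above.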
For constructed response tasks, the challenging part lies less in drafting the item than in defining how it should be marked in the marking guide (also known as a scoring guide). This is because the more open a response (for example, an essay rather than a one-word response), the more room there is for subjective interpretation. Hence a robust marking guide should be developed that includes information on all possible responses: those that are regarded as fully acceptable (for full credit), those that are judged partially acceptable (for partial credit) and common errors that are likely to be encountered.
Finally, an essential part of developing good items is the incorporation of quality assurance steps, including panelling, cognitive laboratories and piloting. Drafting items is only the first step in a lengthy process and many months of work may be required before items have reached the standards necessary for inclusion in an assessment instrument.