Designing Assessment Instruments
INTRODUCTORY
“You should look at this section if you have only a limited idea of how to design assessment instruments.”
An assessment is designed to measure student performance in one or more subjects or domains. The design of the assessment instrument largely determines whether it can collect valid and reliable information on students' skills and knowledge. This involves much more than simply compiling some assessment items.
The two biggest constraints in the design of assessment instruments are time and coverage. Including enough items from every part of a domain to collect valid data would produce a very long instrument that takes students a long time to complete. On the other hand, reducing testing time means that there are not enough items to measure the whole domain. The art of assessment instrument design is to balance these two demands to meet the purpose of the assessment programme.
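The tension between time and coverage comes down to simple arithmetic. The sketch below uses hypothetical values (eight content areas, six items per area, about 90 seconds per item and a 40-minute test) to compare the number of items needed for coverage with the number one student can answer; `coverage_gap` is an illustrative helper, not part of any assessment tool.

```python
def coverage_gap(areas, items_per_area, seconds_per_item, testing_minutes):
    """Return (items needed for full coverage, items one student can answer)."""
    items_needed = areas * items_per_area
    # Count whole items only: a partly answered item does not add coverage.
    items_that_fit = (testing_minutes * 60) // seconds_per_item
    return items_needed, items_that_fit


# Hypothetical figures for illustration only.
needed, fit = coverage_gap(areas=8, items_per_area=6, seconds_per_item=90, testing_minutes=40)
print(f"Items needed for coverage: {needed}")  # 48
print(f"Items that fit in the time: {fit}")    # 26
```

With these assumed figures, either the test becomes far longer than students can concentrate for, or nearly half of the intended items must be dropped.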
The fundamental components of assessment instrument design include:
Assessment framework–Instruments must be designed to map to the assessment framework, which should define exactly how the instruments are to be built, including elements such as the number and balance of items and the amount of testing time.
Testing time–How long it is appropriate to test students depends on their age. Students should not be tested for longer than they can reasonably be expected to concentrate. For young students this may be only around 20 minutes; beyond this they become tired, and the assessment can no longer collect accurate information on their skills and knowledge.
Test booklets–In simple assessments, such as those used in classrooms and schools, it is common to have just one assessment booklet, meaning that all students respond to exactly the same items. In large-scale assessments, however, it is common to use multiple booklets. With a sophisticated approach to data analysis, all students can still have their results reported on the same performance scale. The advantage of multiple booklets is that, between them, they can include more assessment items and hence cover more of the domain.
Linking–While it is possible to use a number of test booklets in an assessment programme, these cannot be completely different. Instead, the booklets must be linked together, which requires a minimum number of items to appear in more than one booklet. This is known as horizontal linking. Equally, if student performance between grade levels or years is to be reported on the same scale, vertical linking is required between booklets.
Clusters–Assessment items tend to be developed, and used, in clusters. For example, there might be a stimulus with four or five items that link to it. It is common for clusters to be rotated across booklets, so that the position of a cluster changes from one booklet to another and some students take it at the beginning of the test while others take it at the end.
Layout–As well as adhering to the assessment framework, the layout of the test should suit the target students, including their age. For example, a small font and a crowded test form will make the assessment harder for students, so it may not collect accurate information on their performance.
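The Linking point above can be checked mechanically. The sketch below is a hypothetical helper, not part of any particular analysis package: it treats each booklet as a set of cluster labels and confirms that every booklet can be reached from every other through a chain of shared clusters. Connectivity is only a minimum condition; how many anchor items are needed is a separate psychometric decision.

```python
from collections import deque


def booklets_are_linked(booklets):
    """Return True if all booklets are joined by chains of shared clusters.

    `booklets` maps a booklet name to the set of cluster labels it contains.
    """
    names = list(booklets)
    if not names:
        return True
    seen = {names[0]}
    queue = deque([names[0]])
    # Breadth-first search: two booklets are neighbours if they share a cluster.
    while queue:
        current = queue.popleft()
        for other in names:
            if other not in seen and booklets[current] & booklets[other]:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(names)


# Hypothetical designs; cluster labels are illustrative only.
linked = {"Booklet 1": {"A", "B"}, "Booklet 2": {"B", "C"}, "Booklet 3": {"C", "D"}}
unlinked = {"Booklet 1": {"A", "B"}, "Booklet 2": {"C", "D"}}
print(booklets_are_linked(linked))    # True
print(booklets_are_linked(unlinked))  # False
```

In the first design, Booklets 1 and 3 share no cluster directly but are still linked through Booklet 2; in the second, results from the two booklets could not be placed on one scale.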
To find out more about designing assessment instruments, move to the Intermediate section.
INTERMEDIATE
“You should look at this section if you already know about the basic design of an assessment instrument but would like to know more details about its components.”
The design of assessment instruments is an important part of any assessment programme. If assessment instruments are poorly designed, they may not be able to collect accurate information on students' skills and knowledge. The key parameters for assessment instrument design are coverage and time, and these need to be carefully balanced to meet the needs of the assessment programme.
Assessment instrument design always starts with the assessment framework, which should specify key elements of the instrument such as the number of items, the balance of items, the types of items and the testing time. Several other considerations are also important, including:
Number of items–The most common mistakes in assessment instrument design are using too few items to measure a particular construct and/or having an imbalance of items. For example, a single item cannot provide a valid indication of whether or not a student has achieved a learning outcome.
Similarly, if, for example, algebra and geometry are equally important in a mathematics curriculum, an assessment instrument that includes 5 items on algebra and 20 on geometry will not be a valid measure of mathematical performance. Instead, the number of items should be defined in the assessment framework, based on careful analysis of the relevant curriculum and/or learning outcomes.
Layout–To save printing costs, it is common for as many assessment items as possible to be squeezed onto each page of an assessment instrument. Items are often close together, use a small font and have no illustrations, and sometimes they are printed very poorly. While this may save money, it makes it very difficult for the assessment to collect accurate information about students' skills and knowledge, so it is not a real cost saving.
Instead, the instrument should be designed to support students in responding, not to hinder them. This means using a font size appropriate to students' age: a much bigger font is needed for younger students than for older ones. It also means using attractive, informative diagrams and illustrations and clear print. In digital assessments, it is desirable for students to be able to make the font bigger or smaller to suit them, particularly students with vision limitations.
Linking booklets–While a single test booklet that all students take is fine for some types of assessment, such as at the school or classroom level, it limits the number of constructs that can be assessed and the depth of coverage. Large-scale assessments therefore commonly use multiple test booklets that are linked together by anchor items. This allows students to respond to different items while still having their performance compared, but it requires very careful design of components such as the number of anchor items to use.
Clusters–It is quite unusual for items to be placed into assessment instruments individually. Instead, they are grouped in clusters. These clusters may share the same stimulus or have something else in common. Designing assessment instruments includes deciding on the order of clusters. It is good practice to rotate them so that, for example, students do cluster A first and cluster B last in one booklet, and the reverse in another. This helps to balance the fatigue effect, in which students tend to perform least well on items at the end of the test, so that no single cluster is always affected by it.
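The Number of items point above can be made concrete with a small calculation. The helper below is hypothetical and assumes the assessment framework expresses the importance of each content area as a relative weight; it splits a fixed number of items in proportion to those weights using the largest-remainder method, so the counts always add up to the total.

```python
import math


def allocate_items(total_items, weights):
    """Split total_items across content areas in proportion to their weights.

    Uses the largest-remainder method so the counts always sum to total_items.
    """
    weight_sum = sum(weights.values())
    exact = {area: total_items * w / weight_sum for area, w in weights.items()}
    counts = {area: math.floor(x) for area, x in exact.items()}
    shortfall = total_items - sum(counts.values())
    # Give any leftover items to the areas with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for area in by_remainder[:shortfall]:
        counts[area] += 1
    return counts


# Hypothetical weights: algebra and geometry equally important.
print(allocate_items(24, {"algebra": 1, "geometry": 1}))
# {'algebra': 12, 'geometry': 12}

# Hypothetical weights: number twice as important as each of the others.
print(allocate_items(25, {"number": 2, "algebra": 1, "geometry": 1}))
# {'number': 13, 'algebra': 6, 'geometry': 6}
```

Rounding each share independently can leave the total one or two items short or over; the largest-remainder step avoids that.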
To find out more about designing assessment instruments, move to the Advanced section.
ADVANCED
“You should look at this section if you are already familiar with how to design an assessment instrument and would like to know more details.”
Every assessment instrument must assess what it is designed to assess; that is, it should include items that collect data to meet the purpose of the assessment. This means it should include an appropriate balance of items that takes account of time and length constraints while still providing a robust measure of the construct being assessed.
An assessment instrument should be designed to help students, not to make things difficult for them. This does not mean that it should include only easy items. Instead, its layout, appearance, font size, use of graphics, balance of items and organisation of items should support and encourage student performance, not hinder it.
Assessment instrument design tends to involve a range of experts including subject experts, graphic designers, statisticians and language experts. Key elements include the clarity of instructions given to students, the logical progression from one section to another, clear information on the time available for different sections and details about the weighting of different items. All of these components are equally important in paper and digital formats.
The inclusion of different item types is one key consideration that should be defined in the assessment framework. Determining the order in which they appear in the assessment instrument is an important part of instrument design. For example, if a language test includes both reading and writing, which should come first, or should they be interspersed? How many words are students expected to write, and in how much time? Each option has advantages and disadvantages that need to be weighed up carefully.
Linked booklets with rotated clusters of items are common in large-scale assessment design. Getting this right, so that it meets the requirements of psychometric analysis and does not unfairly penalise any students, requires very careful thought. An example of cluster rotation design for test booklets is shown below.
This example uses a Youden square design, but many other designs can also be used. In this case each cluster is used in three booklets (A has been highlighted for illustration) and there are seven booklets in total. The clusters are also used in different positions: for example, cluster A is first in booklet 1, in the middle in booklet 7 and at the end in booklet 5.
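The booklet layout for such a design can be generated rather than drawn by hand. As a sketch, one standard cyclic construction for seven clusters in seven three-cluster booklets assumes clusters labelled A to G and gives booklet b the clusters b, b + 1 and b + 3, counting modulo 7 (the offsets 0, 1, 3 are an assumption of this sketch). It places cluster A first in booklet 1, in the middle in booklet 7 and last in booklet 5, as described, but it may not match the original figure booklet for booklet. The code also checks two balance properties: every cluster appears exactly once in each position, and every pair of clusters shares exactly one booklet.

```python
from itertools import combinations
from string import ascii_uppercase

NUM_CLUSTERS = 7
OFFSETS = (0, 1, 3)  # assumed: a perfect difference set modulo 7


def youden_booklets(n=NUM_CLUSTERS, offsets=OFFSETS):
    """Return one list of cluster labels per booklet, in administration order.

    Booklet b (counting from 0) holds clusters b + offset (mod n).
    """
    return [[ascii_uppercase[(b + off) % n] for off in offsets] for b in range(n)]


booklets = youden_booklets()
for number, clusters in enumerate(booklets, start=1):
    print(f"Booklet {number}: {' '.join(clusters)}")

# Position balance: each cluster appears exactly once in each position.
for position in range(len(OFFSETS)):
    assert sorted(b[position] for b in booklets) == list(ascii_uppercase[:NUM_CLUSTERS])

# Pair balance: each of the 21 possible pairs of clusters shares exactly one booklet.
pair_counts = {}
for clusters in booklets:
    for pair in combinations(sorted(clusters), 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1
assert len(pair_counts) == 21 and set(pair_counts.values()) == {1}
```

Running it prints booklets 1 to 7 as A B D, B C E, C D F, D E G, E F A, F G B and G A C.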
A rotated-cluster design like this can also be used for vertical linking of assessment instruments across grade levels. For example, if cluster A is at grade 5 level, it might be included in a grade 3, a grade 5 and a grade 7 assessment instrument, so that the performance of students at these grades can be shown on the same performance scale. A different design can be used to link assessment instruments across different years, for example 2019 and 2020, which allows the performance of cohorts of students to be tracked over time.
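To illustrate how shared items allow different grades to be reported on one scale, the sketch below uses mean-mean linking, one of the simplest linking methods, under the assumption that item difficulties have been estimated separately for each grade on a Rasch (logit) scale. The item names and difficulty values are hypothetical, and real programmes may use other methods, such as concurrent calibration. Because the anchor items (here, from cluster A) were calibrated in both the grade 5 and grade 7 analyses, the average difference in their difficulty estimates gives the shift that moves every grade 7 item onto the grade 5 scale.

```python
from statistics import mean


def mean_mean_shift(reference, new, anchors):
    """Shift that places the new calibration on the reference scale,
    estimated from the anchor items' difficulty estimates."""
    return mean(reference[i] for i in anchors) - mean(new[i] for i in anchors)


# Hypothetical Rasch difficulty estimates (logits) from two separate calibrations.
grade5 = {"A1": -0.50, "A2": 0.10, "A3": 0.70}                 # reference scale
grade7 = {"A1": -1.30, "A2": -0.70, "A3": -0.10, "X1": 0.40}   # to be linked
anchors = ["A1", "A2", "A3"]  # hypothetical cluster A items shared by both instruments

shift = mean_mean_shift(grade5, grade7, anchors)
grade7_on_grade5_scale = {item: d + shift for item, d in grade7.items()}
print(round(shift, 2))                         # 0.8
print(round(grade7_on_grade5_scale["X1"], 2))  # 1.2
```

The anchor items look easier to grade 7 students, so the grade 7 calibration is shifted up by 0.8 logits; after the shift, the grade 7 item X1 can be compared directly with grade 5 items.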