Saturday, March 14, 2020

short paper test development Essays - Education, Psychometrics

Test Development: What Makes a Good Test
Bonnie Perry
January 6, 2019

Various experts claim that a "good test" does not exist. Yet a world devoid of tests would lack the diagnostics needed to judge the suitability of professionals such as doctors, lawyers, or politicians (Cohen, 2017). According to Cohen (2017), a good test is not only reliable but also valid. The search for valid, fair, and reliable tests is an intricate, iterative process that keeps changing and improving as time passes and as society, technology, and cultural integration develop. Test developers use a sequence of steps to create "good" tests; commonly, this involves five basic stages (Jacobs, 2004): conceptualization, construction, try-out, analysis, and finalization (Cohen, 2017).

Making a Good Test

A good test begins with the conceptualization stage, in which the developers identify the objectives and outline the purpose of testing (DeVellis, 2016). At this point they also determine whether the test is needed at all (Cohen, 2017): they decide what is being tested and why, and whether designing the test would be beneficial. In the conceptualization stage, developers characterize the purpose of and need for the test and specify the construct to be measured and the population to be tested (DeVellis, 2016), as well as the possible formats the test may take (Cohen, 2017). Past and present research relating to the construct is examined so that the existing data and observations about what is being measured are described accurately. Next comes the construction stage, which consists of producing a pool of items (known as an item bank), choosing a format for the test, and sorting the items in the item bank into the preferred structure of the test (Cohen, 2017).
Also during the construction phase, developers define what type of scale will be used to score the test (DeVellis, 2016). Essentially, this phase establishes all the characteristics of the test's final form. Once the construction phase is complete, developers working to produce a good test move on to the try-out and analysis stages. In the try-out stage, the test is administered to a sample comparable to the population for which the test is intended (Cohen, 2017). In the analysis stage, once the test has been scored on the selected scale, the scores are assessed and each item is judged to be good, in need of revision, or to be rejected outright (Cohen, 2017). A good test item is reliable and valid, and it helps to discriminate among test takers (Cohen, 2017). A good test item is usually answered correctly by most high scorers and by relatively few low scorers (Cohen, 2017). When the original test has been constructed, the data analyzed, and the items revised in line with the data, the test is finalized (Cohen, 2017). Developers shape the original draft into the final version, editing the format to increase its overall usefulness (DeVellis, 2016). The revised version is administered to a new, comparable group of test takers, and the results are re-analyzed. This process is repeated until the developers are satisfied that the test has reached its maximum validity and reliability (Jacobs, 2004).

Validity and Reliability

What does a test developer do to confirm that a test is valid, reliable, and fair? To answer that question, it is essential to distinguish among validity, reliability, and fairness as they relate to testing. A test is valid if it measures exactly what it is intended to measure, and it is reliable if it measures with a high degree of precision and consistency (Cohen, 2017).
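The idea that a good item is answered correctly by most high scorers and few low scorers is often quantified in classical item analysis with two statistics: item difficulty p (the proportion of all test takers who answer the item correctly) and the upper-lower discrimination index d = (U - L) / n, where U and L are the numbers of correct answers in the highest- and lowest-scoring groups and n is the size of each group. The sketch below is a minimal illustration, not the author's procedure; the function names, the 0/1 response coding, the 27% group size (a common convention), and the sample data are all hypothetical assumptions.

```python
from typing import Sequence


def item_difficulty(item_responses: Sequence[int]) -> float:
    """Item difficulty p: proportion of test takers answering correctly (1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(total_scores: Sequence[float],
                        item_responses: Sequence[int],
                        fraction: float = 0.27) -> float:
    """Upper-lower discrimination index d = (U - L) / n.

    Test takers are ranked by total test score. U and L count correct answers to
    this item in the top and bottom `fraction` of the group (assumed 27% here),
    and n is the number of people in each of those two groups.
    """
    if len(total_scores) != len(item_responses):
        raise ValueError("need exactly one item response per test taker")
    n = max(1, round(len(total_scores) * fraction))
    # Indices of test takers ordered from lowest to highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    u = sum(item_responses[i] for i in upper)  # correct answers among high scorers
    l = sum(item_responses[i] for i in lower)  # correct answers among low scorers
    return (u - l) / n


# Hypothetical try-out data for ten test takers and one item.
totals = [95, 90, 88, 80, 75, 70, 60, 55, 50, 40]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

print(item_difficulty(item))                            # 0.6
print(round(item_discrimination(totals, item), 3))      # 0.667
```

Here all three top scorers but only one of the three bottom scorers answer correctly, so d is positive. The index ranges from -1 to +1: values near +1 suggest the item separates high from low scorers well, while values near zero or below flag it for revision or rejection during the analysis stage.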
To establish a test's validity, developers must ensure that (a) the sample of test items is adequate for the range of test takers and for measuring the test objective; (b) scoring accurately reflects the behavior of the test takers; and (c) test scores are comparable to those of other tests that measure