Research shows using assessments with precise measurement leads to better hiring practices. So why are some organizations still resistant?
By: James A. Scrivani, Ph.D., and Katey Foster, Ph.D.
Deciding who to select for a job or promotion is as complex as ever. Companies are facing an increase in the sheer volume of candidates due to evolving remote work opportunities and the Great Resignation, a booming HR assessment industry offering proprietary solutions, and ever-present scrutiny of hiring from employers, plaintiffs’ law firms, and federal (as well as state and local) government. What can talent management professionals do to ensure the right people are being chosen for the right role at the right time?
Assessments are regularly used by companies to predict who will be most successful on the job. They vary in their level of precision, but more precision is typically better. Despite the benefits of assessments with precise measurement, we’re often confronted with a fallacy from hiring managers, business leaders, and sometimes even legal counsel: that standardized assessments actually invite legal scrutiny rather than deter it. This isn’t necessarily true. Imprecise measurement and unstandardized hiring processes are more likely to invite legal challenges (Highhouse, Doverspike, & Guion, 2016; Williams, Schaffer, & Ellis, 2013; Terpstra, Mohamed, & Kethley, 1999). Here, we share best practices on the importance of precision in the selection process while debunking a few popular myths.
3 Components of Assessment Tools
First, let’s briefly cover the fundamentals. The foundation of fair and effective hiring assessments rests on the first two of these components: reliability and validity.
Reliable assessments are consistent, meaning they produce the same or similar results time after time. For instance, when we take our temperature with a thermometer, we expect the reading to be very close or identical when we take it again moments later. In this way, the tool is reliable. Valid assessments, meanwhile, measure what they’re designed to measure. Using the same example, if our temperature is actually 98.6 degrees Fahrenheit, a valid thermometer will display 98.6 (and an invalid one would display something else). An assessment can’t measure the right thing (be valid) without being consistent (reliable). When assessments used in a hiring process are both reliable and valid, they provide a precise measurement that ensures organizations (a) hire the best candidates, and (b) hire them in a legally compliant manner.
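To make the reliability idea concrete: test-retest reliability is commonly estimated as the correlation between two administrations of the same assessment. The sketch below is purely illustrative—the helper function and candidate scores are invented, not drawn from any real assessment.

```python
# Hypothetical sketch: estimating test-retest reliability as the
# correlation between two administrations of the same assessment.
# The scores below are invented for illustration only.

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# The same five candidates tested twice, one week apart.
time1 = [78, 85, 62, 90, 71]
time2 = [80, 84, 60, 92, 70]

reliability = pearson(time1, time2)
print(round(reliability, 2))  # a value near 1.0 indicates a consistent tool
```

Validity can be estimated the same way, by correlating assessment scores with a job performance criterion rather than with a retest.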
A third critical component to assessment is standardization, which occurs when every candidate is subject to the exact same experience. This ensures candidates are held to the same expectations and evaluated in a similar way. Standardized processes help recruiters and hiring managers avoid conscious and unconscious biases. For instance, highly structured interviews minimize the possible influence of applicant demographic characteristics on hiring decisions (McCarthy, Van Iddekinge, & Campion, 2010).
To reap the benefits of reliable, valid, and standardized assessment tools, precise measurement is key. Assessments with precise measurement are constructed to measure the set of competencies and experiences needed to succeed in a job, giving recruiters and hiring managers the data they need to select the best candidates. These assessments are designed by conducting a job analysis to identify the critical knowledge, skills, abilities, and other characteristics needed to succeed in the job (Highhouse et al., 2016), which ensures that candidates are evaluated only on job-relevant criteria. When that is the case, performance on the assessment(s) can be used to identify the best candidates.
The Perks of Precise Measurement
Let’s examine the benefits of precise measurement and standardized hiring processes with a real-world example. Taylor, a hiring manager, has posted several openings for a job that attracts a high volume of candidates. In the past, the hiring process was cumbersome and involved many staffing hours to interview a large pool of candidates, some of whom were not qualified for the job. To ease that burden, Taylor wants to use a test to help inform the hiring decision. The test could measure physical abilities in the construction industry or coding skills in the tech space, or it could be a personality test used across industries. Regardless of industry or type of test, Taylor has some options to consider.
Option #1: The test is an “additional data point” (less precise).
One of Taylor’s concerns with implementing a hiring test is that qualified candidates may be improperly screened out by it. Therefore, Taylor wants to use the test only as an “additional data point” for making the decision—that is, not setting a cut score (passing score) to weed out lower scoring applicants. For example, whether a candidate lifted 5 pounds or 50 pounds, scored 10% or 100% on a coding test, or scored a particular way on a personality test, the candidate would not be removed from the applicant pool. Taylor would like the test results to be considered by hiring managers in conjunction with all other data points, such as interviews. Taylor, while well-meaning, incorrectly thinks this approach will drive inclusion in the process and protect the company from legal risk because no official hard lines are drawn on whom to select.
Option #2: The test is an established hurdle (more precise).
Taylor implements a valid test with a reasonable passing score based on appropriate research. Taylor uses this test early in the selection process as a gating technique or hurdle—i.e., only candidates who meet or exceed the passing score will advance to the next stage of the assessment process, while those who don’t meet that standard are removed from the process (see Figure 1).
This results in data for the hiring manager to consider on all candidates so the qualified candidates can be clearly identified. This approach should ultimately result in an improved quality of hire (and less concern for Taylor), assuming the test has been appropriately developed and validated for use with the target job, candidates are evaluated in the same standardized way, and the results are used consistently by hiring managers. Taylor’s concerns about setting an official cut score could be alleviated by knowing that the test was determined to be measuring critical capabilities for the target job in a reliable way.
Figure 1: Talent Acquisition Funnel
So, what can you learn from Taylor’s situation?
Takeaway #1: Precision and validation go hand in hand.
Consult I-O psychologists and employment lawyers to decide how best to establish the validity of an assessment in accordance with professional and legal guidelines and standards*. Consider outside counsel if you don’t have the resources internally, as the legal landscape is ever-evolving. Any expert worth their salt will tell you that a thorough job analysis identifying the knowledge, skills, abilities, and other characteristics (KSAOs) needed to perform the job effectively should be the first step in any assessment engagement. A proper job analysis provides the foundation for conducting a validation study according to professional and legal guidelines and standards. Validation studies evaluate and document whether the assessment predicts job performance and/or measures important and requisite KSAOs, in addition to establishing an appropriate scoring methodology for the test.
* Professional and legal standards and guidelines include the EEOC’s Uniform Guidelines on Employee Selection Procedures (1978), the Society for Industrial and Organizational Psychology’s (SIOP) Principles for the Validation and Use of Personnel Selection Procedures (2018), and the Standards for Educational and Psychological Testing (2014), published jointly by AERA, APA, and NCME.
Takeaway #2: More precision helps you communicate more effectively with the business.
Helping the business understand the benefits of valid assessment may take some time, but ultimately it will pay off. Precision allows you to pinpoint each step of the selection process to determine where, and potentially why, candidates are falling out of the funnel. This allows for better and faster course corrections. When talking to the business, speed and responsiveness are critical, so be ready to guide them and be a part of this conversation (Church, Scrivani, & Paynter, 2019). Remember that assessments add cost, time, and complexity to a business process—therefore, work with your leaders on finding the right balance for your organization.
Takeaway #3: More precision can drive inclusion.
This may be surprising, because a common preconceived notion on this topic is that setting an official passing score for a test will result in advancing only the few “perfect” candidates who fit the preexisting cookie-cutter mold of people who were successful at an organization in the past. There are two issues with this assumption.
The first is that this is a potential case of conflating minimum job requirements with preferred job requirements. Minimum job requirements are the critical knowledge, skills, abilities, and other characteristics a candidate needs to do the job on day one without training. Using an assessment to measure minimum qualifications will result in a wider variety of candidates who can do the job. Second, how you score the test is just as important as the test itself. For example, compensatory scoring (i.e., combining and weighting assessment components so that strength in one area can offset weakness in another, maximizing validity while minimizing adverse impact) can be implemented where appropriate to “screen in” more candidates rather than “screen out.”
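As a minimal sketch of what compensatory scoring can look like—the component weights and candidate scores below are invented for illustration; real weights would come from a validation study:

```python
# Minimal sketch of compensatory scoring: components are weighted and
# combined, so strength on one component can offset weakness on another.
# Weights and scores are hypothetical, not from any validation study.

WEIGHTS = {"cognitive": 0.5, "personality": 0.3, "experience": 0.2}

def composite_score(scores):
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# A modest personality score is offset by strong cognitive and experience
# scores, so this candidate still clears an overall cut score of, say, 75.
candidate = {"cognitive": 90, "personality": 55, "experience": 80}
print(composite_score(candidate) >= 75)
```

Under a non-compensatory (multiple-hurdle) rule requiring 75 on every component, the same candidate would be screened out on personality alone; the compensatory composite “screens in” instead.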
A great example of this approach is personality tests, as Church and Ezama point out in “6 Truths About Using Personality Data for Talent Decisions” (2019), where many combinations of assessment components can result in a favorable outcome. Furthermore, and related to Takeaway #1, any assessment implementation must include an analysis of adverse impact on protected groups. Precision allows us to pinpoint any areas that may produce adverse impact prior to implementation. This, in turn, allows us to examine possible ways of mitigating it before the test is actually used.
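One widely used screen for adverse impact is the “four-fifths” (80%) rule of thumb from the EEOC Uniform Guidelines: a selection rate for any group below 80% of the highest group’s rate is generally regarded as evidence of adverse impact. A sketch with invented applicant counts:

```python
# Hypothetical adverse-impact screen using the EEOC "four-fifths" rule
# of thumb. The applicant and selection counts below are invented.

def selection_rate(n_selected, n_applicants):
    return n_selected / n_applicants

rate_group_a = selection_rate(48, 100)  # 0.48
rate_group_b = selection_rate(30, 100)  # 0.30

impact_ratio = min(rate_group_a, rate_group_b) / max(rate_group_a, rate_group_b)
flagged = impact_ratio < 0.8  # 0.30 / 0.48 = 0.625, below the 80% threshold
print(round(impact_ratio, 3), flagged)
```

A flagged ratio is a prompt for further investigation (and, where needed, mitigation) before the test goes live, not a legal conclusion in itself.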
Takeaway #4: More precision means acknowledging and establishing cut scores up front.
Cut scores exist even if you haven’t officially established one. Using a test score solely as “an additional data point” doesn’t mean the test or hiring decision is free of adverse impact. Instead, the score of the lowest scoring selected candidate becomes an effective passing score for examining adverse impact. Imagine that Taylor used the “additional data point” approach and ultimately selected the candidate who scored highest on the test. No other candidate met this passing standard, even though its application wasn’t intended. Like Taylor, you may be using information that is inadvertently weeding out qualified people. For example, using GPA to screen campus candidates is an assessment with a cut score! Don’t be afraid of assessments; instead, build a process to use them consistently and monitor them properly.
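This dynamic is easy to sketch. The candidate names and scores below are invented for illustration:

```python
# Sketch of the de facto cut score that emerges from the "additional
# data point" approach: if only the top scorer is hired, the lowest
# selected score acts as a passing score for everyone else.
# Names and scores are invented for illustration.

scores = {"Alex": 92, "Blake": 88, "Casey": 75, "Drew": 81}
selected = ["Alex"]  # the hiring manager picks the top scorer

# The effective cut score is the lowest score among selected candidates...
effective_cut = min(scores[name] for name in selected)

# ...so every candidate below it was screened out, intended or not.
screened_out = sorted(name for name, s in scores.items() if s < effective_cut)
print(effective_cut, screened_out)  # 92 ['Blake', 'Casey', 'Drew']
```

Any adverse impact analysis of this process would have to treat 92 as the operating cut score, even though no one ever declared one.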
Takeaway #5: More precision in measurement is better than less.
While this takes more work up front, as described above, you’ll be able to make better decisions and ultimately get better outcomes. The efficacy of assessments depends on their consistent use and precision. For instance, personality assessments are better at predicting job performance when they measure traits relevant to the job for which they’re used to select candidates (Tett & Christiansen, 2007)—e.g., conscientious accountants, extraverted salespeople.
Other assessments, like situational judgment tests, can vary in their ability to predict performance based on how instructions are written (McDaniel, Hartman, Whetzel, & Grubb, 2007) or what constructs they were designed to measure (Christian, Edwards, & Bradley, 2010). Interviews designed with a high degree of standardization and structure (e.g., all candidates are asked the same job-relevant questions and are evaluated using a defined anchored rating scale reflecting the knowledge, skills, abilities, and other characteristics needed on the job) greatly increase the ability to predict performance (Huffcutt & Culbertson, 2011).
For all the reasons above, it’s wise for organizations to adopt a mindset of aiming for a high level of measurement precision when using assessments that have been established as valid and reliable measures of job-related characteristics. Doing so in an appropriate way is likely to pay dividends in the quality of an organization’s hires.
James A. Scrivani, Ph.D., is the Global Head of Assessment & Development at Novartis.
Katey Foster, Ph.D., is a Director of Solutions Delivery at APTMetrics where she works with Fortune® 100 clients across industries to design and implement fair, valid, and legally defensible HR processes.
References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Church, A. H., & Ezama, S. (2019). 6 truths about using personality data for talent decisions. Talent Quarterly.
Church, A. H., Scrivani, J. A., & Paynter, B. M. (2019). When external trends and internal practice collide: Is there an app for that? Organization Development Review, 51(2), 48-54.
Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs assessed and a meta-analysis of their criterion-related validities. Personnel Psychology, 63(1), 83-117.
Highhouse, S., Doverspike, D., & Guion, R. M. (2016). Essentials of personnel assessment and selection (2nd ed.). Routledge.
Huffcutt, A. I., & Culbertson, S. S. (2011). Interviews. In S. Zedeck (Ed.), APA handbook of industrial and organizational psychology, Vol. 2. Selecting and developing members for the organization (pp. 185-203). American Psychological Association.
McCarthy, J. M., Van Iddekinge, C. H., & Campion, M. A. (2010). Are highly structured job interviews resistant to demographic similarity effects? Personnel Psychology, 63(2), 325-359.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb III, W. L. (2007). Situational judgment tests, response instructions, and validity: A meta-analysis. Personnel Psychology, 60(1), 63-91.
Society for Industrial and Organizational Psychology. (2018). Principles for the validation and use of personnel selection procedures (5th ed.). Bowling Green, OH: Society for Industrial and Organizational Psychology.
Terpstra, D. A., Mohamed, A. A., & Kethley, R. B. (1999). An analysis of federal court cases involving nine selection devices. International Journal of Selection and Assessment, 7(1), 26-34.
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60(4), 967-993.
U.S. Equal Employment Opportunity Commission, Civil Service Commission, Department of Justice, & Department of Labor (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38290-38315.
Williams, K. Z., Schaffer, M. M., & Ellis, L. E. (2013). Legal risk in selection: An analysis of processes and tools. Journal of Business and Psychology, 28, 401-410.