Construct Validity and Interrater Agreement

Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people's intuitions about human behavior, which are frequently wrong. Moreover, many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of its 567 statements applies to them, and many of those statements have no obvious relationship to the construct they measure. For example, the items "I like detective or mystery stories" and "The sight of blood doesn't scare me or make me sick" both measure the suppression of aggression. In this case, it is not the participants' literal answers to these questions that are of interest, but whether the pattern of their responses to a series of questions matches that of individuals who tend to suppress their aggression. Criterion validity, by contrast, rests on empirical relationships: when assessing the criterion validity of a measure, a stronger relationship between scores on the measure and the criterion indicates greater validity.

In statistics, inter-rater reliability (also known by various similar names, such as inter-rater agreement, inter-rater concordance, and inter-observer reliability) is the degree of agreement among raters. It measures how much homogeneity, or consensus, there is in the ratings given by different judges.
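To make the criterion-validity point concrete, the strength of the relationship between a measure and its criterion is commonly summarized with a Pearson correlation coefficient. The following is a minimal Python sketch, assuming paired scores from the same participants; the function name `pearson_r` and the example scores are hypothetical and purely illustrative. (Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.)

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between measure scores x and criterion scores y.

    Both sequences must hold paired scores for the same participants.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to each variance).
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    # The common 1/(n - 1) factors cancel, leaving r in [-1, 1].
    return cross / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical scores on a new measure and on a criterion variable.
    measure = [12, 15, 11, 18, 14]
    criterion = [30, 34, 29, 40, 33]
    print(f"criterion validity coefficient r = {pearson_r(measure, criterion):.2f}")
```

Values of r near +1 or -1 indicate a strong relationship with the criterion; values near 0 indicate little or none.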

When they created the Need for Cognition Scale, Cacioppo and Petty also provided evidence of discriminant validity by showing that people's scores were not correlated with certain other variables. For example, they found only a weak correlation between people's need for cognition and a measure of their cognitive style: the extent to which they tend to think analytically, by breaking ideas into smaller parts, or holistically, in terms of "the big picture." They also found no correlation between need for cognition and measures of test anxiety and of the tendency to respond in socially desirable ways. All these low correlations provide evidence that the measure reflects a conceptually distinct construct. Criteria can also include other measures of the same construct. For example, one would expect new measures of test anxiety or physical risk taking to be positively correlated with existing measures of the same constructs. This is known as convergent validity.

Validity is the extent to which the scores from a measure represent the variable they are intended to represent. But how do researchers make this judgment? We have already considered one factor that they take into account: reliability. When a measure has good test-retest reliability and internal consistency, researchers can be more confident that its scores represent what they are supposed to.
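Internal consistency is often quantified with Cronbach's alpha, a common index that the text itself does not name: alpha = k / (k - 1) × (1 - sum of the item variances / variance of the total scores), where k is the number of items. The minimal Python sketch below assumes a response table with one row per respondent and one column per item; the function name `cronbach_alpha` and the ratings are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every row must contain the same number (>= 2) of item scores")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from four respondents on three items.
    data = [
        [4, 5, 4],
        [2, 2, 3],
        [5, 4, 5],
        [1, 2, 1],
    ]
    print(f"Cronbach's alpha = {cronbach_alpha(data):.2f}")
```

When items rise and fall together across respondents, the total-score variance is large relative to the summed item variances and alpha approaches 1; sample and population variances give the same alpha as long as one is used consistently.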

However, there has to be more to it, because a measure can be extremely reliable but have no validity whatsoever. As an absurd example, imagine someone who believes that the length of people's index fingers reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people's index fingers. Although this measure would have extremely good test-retest reliability, it would have no validity at all. The fact that one person's index finger is a centimeter longer than another's would indicate nothing about which one had higher self-esteem. External validity is an umbrella term that describes the extent to which research findings can be generalized to other people, situations, and time periods. Each of these three components can be treated as its own form of validity.

Among measures of inter-rater reliability, the joint probability of agreement is the simplest and least robust. It is estimated as the percentage of the time the raters agree in a nominal or categorical rating system, and it is least robust because it does not account for agreement that would occur by chance alone.
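The joint probability of agreement reduces to simple arithmetic: count the cases on which two raters assign the same category and divide by the number of cases. A minimal Python sketch for two raters follows; the function name and the category labels are hypothetical.

```python
from typing import Sequence


def joint_probability_of_agreement(
    rater_a: Sequence[object], rater_b: Sequence[object]
) -> float:
    """Proportion of cases on which two raters assign the same category."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same, non-empty set of cases")
    # Count exact category matches, case by case.
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)


if __name__ == "__main__":
    # Hypothetical codes from two observers rating the same five behaviors.
    observer_1 = ["aggressive", "aggressive", "neutral", "neutral", "aggressive"]
    observer_2 = ["aggressive", "neutral", "neutral", "neutral", "aggressive"]
    p = joint_probability_of_agreement(observer_1, observer_2)
    print(f"raters agree on {p:.0%} of cases")  # prints "raters agree on 80% of cases"
```

Cohen's kappa is the usual chance-corrected alternative when this simple proportion is not enough.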