Brief Lecture Notes for Unit 4
This unit has to do with the assessment of abnormality, both in research and in clinical contexts. First, we'll examine data types in general, then we'll take a detailed look at assessment tools and methods for the measurement and classification of personality and behavior, whether normal or abnormal.
Types of data
Four types of data utilized by psychological researchers are L-data, O-data, T-data, and S-data. Note how these form the convenient acronym LOTS, since they enable us to collect LOTS of data.
1. L-data (or Life-data) are physical artifacts of a person's life history... the kinds of tangible evidence that (say) a private detective, biographer, or even a historian or archaeologist could uncover. Examples might be physical possessions, public documents (birth certificates, school records, medical records, tax records), journals or diaries, newspaper clippings about a person.
2. O-data (or Other-data) represent evaluations, impressions, perceptions, observations, and ratings provided or made by third parties who know the individual personally in a real-world context... such as parents, siblings, teachers, peers. Formal observations made by trained scientific researchers as part of the formal research enterprise are not O-data, but represent a form of T-data, below.
3. T-data (or Test-data) represent formal scientific observations of a person's behavior (whether in a lab setting or a real-life field setting) by trained scientific observers utilizing objective standards of measurement and data recording. Results of objective tests (meaning those for which right and wrong answers exist, like tests of skill, aptitude, competence, or knowledge) also count as T-data. However, results of subjective measures like paper-and-pencil personality tests, for which there are no right or wrong answers as such, do not comprise T-data, but represent a form of S-data, below.
4. S-data (or Self-data) represent a person's own self-evaluations, self-assessments, self-ratings, or self-perceptions, including formalized self-audits or self-reports such as might be obtained by structured interviews or structured personality inventories or questionnaires. Unstructured self-reports generated in the past, such as diaries and journals, are usually regarded as forms of L-data, however.
Why do we need more than one form of data? Because different kinds of data give us different kinds of insights into the person, since they are generated in different contexts, by different people, in different ways. They allow us to "triangulate" our impressions of the subjects of our research to get a well-rounded, full-orbed view of the persons we are interested in. Each form of data has some strengths and weaknesses associated with it, so a lopsided overreliance on one form of data collection to the exclusion of the others is usually, in the long run, a bad idea.
Most of the assessment methods and tools outlined below involve the collection of S-data or T-data. However, L-data and O-data, particularly the latter, have their place in clinical contexts as well.
A taxonomy of assessment tools and methods
We can classify assessment tools, methods, and techniques in two different ways. Each provides three distinct types or categories. The two dimensions are independent or orthogonal, yielding a 3 x 3 matrix (nine possible types of assessment devices). The first dimension has to do with how the tool is developed. The second has to do with how the resulting tool is structured or organized.
Three types of tool development
1. The rational-theoretical approach
In this approach, which might be characterized as the "armchair approach", we start with a theory, a model, a concept, an idea, or a set of expectations about how we think people might think, feel, or behave. We then generate items or elements that make sense on the basis of our theoretical expectations -- in practice, making up items out of one's own head. This is certainly the simplest way to generate an assessment tool. It places an emphasis on conceptual/theoretical consistency and the use of logical reasoning to devise meaningful elements. A good example of a nonclinical personality inventory developed in this way is the Myers-Briggs Type Indicator.
2. The empirical keying approach
In this approach, which might be characterized as the "data driven approach", we begin by identifying appropriate criterion groups. For instance, if we are trying to develop a personality inventory to measure schizophrenic tendencies, we might identify a group of hospitalized schizophrenics and compare them to two different kinds of control groups: hospitalized individuals with no history of schizophrenia who are suffering from unrelated disorders (such as major clinical depression), and individuals with no identified mental illness who are generally comparable in age, gender, ethnicity, educational level, and so forth. We then present a wide range of possible response items and see which of those items reliably discriminate between the schizophrenics and the other groups. The emphasis is on empirical discriminability and the elimination of statistical classification errors. A good example of a nonclinical personality inventory developed in this way is the California Psychological Inventory.
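The item-selection step in empirical keying can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response data are invented, the function names (endorsement_rate, select_items) and the 0.5 cutoff are hypothetical, and real inventory development would rely on significance tests and cross-validation rather than a raw difference in endorsement rates.

```python
# Hypothetical yes/no responses (1 = item endorsed), invented for illustration.
# Each inner list is one person's answers to four candidate items.
criterion_group = [  # e.g., people who carry the diagnosis of interest
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
]
control_group = [  # comparison group without the diagnosis
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]


def endorsement_rate(group, item):
    """Proportion of people in the group who endorsed the given item."""
    return sum(person[item] for person in group) / len(group)


def select_items(criterion, control, min_gap=0.5):
    """Keep items whose endorsement rates differ between groups by at least min_gap."""
    keyed = []
    for item in range(len(criterion[0])):
        gap = endorsement_rate(criterion, item) - endorsement_rate(control, item)
        if abs(gap) >= min_gap:  # the cutoff is an arbitrary illustrative choice
            keyed.append(item)
    return keyed


keyed_items = select_items(criterion_group, control_group)
print(keyed_items)  # indices of the items that discriminate between the groups
```

Note that the third item (index 2) is endorsed by everyone in both groups, so it is dropped: under empirical keying an item earns its place by discriminating between groups, not by how relevant its content appears.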
3. The factor analytic approach
In this approach, which might be characterized as the "number crunching approach", statistical methods are used to isolate groups of items or elements that are intercorrelated (that measure closely related constructs). The emphasis is on internal cohesion of the resulting assessment tool. A good example of a nonclinical personality inventory developed in this way is the Sixteen Personality Factor Questionnaire.
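To make "intercorrelated items" concrete, here is a rough Python sketch. It is not a factor analysis; it only computes the item intercorrelations that a factor analysis takes as its starting point and lists the pairs that hang together. The item names, the ratings, and the 0.7 threshold are all invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical 1-5 ratings from six respondents on four inventory items.
items = {
    "talkative": [5, 4, 2, 1, 4, 2],
    "outgoing":  [5, 5, 1, 2, 4, 1],
    "anxious":   [2, 5, 4, 1, 1, 5],
    "tense":     [1, 5, 5, 2, 1, 4],
}

# Item pairs whose correlation meets an (arbitrary) threshold of 0.7.
cohesive_pairs = [
    (a, b)
    for a, b in combinations(items, 2)
    if pearson(items[a], items[b]) >= 0.7
]
print(cohesive_pairs)
```

With these made-up data, "talkative" pairs with "outgoing" and "anxious" pairs with "tense", while the cross-pairings correlate weakly. A real factor analysis would go further and estimate how strongly each item loads on each underlying factor.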
Three types of resulting tools
1. Aptitude and achievement tests
These measure defined skills, abilities, capabilities, or aptitudes, and hence represent T-data. An IQ test is an example of this sort of test. Technically, the term "aptitude test" (as opposed to "achievement test") refers to an attempt to measure the capacity for future learning (as opposed to the measurement of learning after the fact). Thus, the SAT is an aptitude test (it attempts to measure a student's potential for doing college work), while the final exam in this course will be an achievement test. As both involve T-data, there are right and wrong answers. Of the three types of tools, aptitude and achievement tests play the least significant role in clinical contexts, though the measurement of intellectual and cognitive capabilities is often a part of a full-orbed intake assessment.
2. Self-report inventories
These utilize S-data in a structured format (e.g., multiple choice), for instance, personality inventories that ask questions like:
If the pay were the same, I would rather have a job as a:
a. salesperson
b. research scientist
As can be seen from the above example, a forced choice or constrained format is typically used, though sometimes a middle of the road ("not sure" or "can't decide") response alternative is offered. Overuse of this category is discouraged, however, since a person who answers "not sure" to all items is, by definition, revealing nothing useful about his or her personality. Note the inherent ambiguity in the use of these responses: if a person says "not sure", does this mean that she would love both jobs? Or that she would hate both jobs? Or that she has no feelings either way about either job? Or that she doesn't understand the question? Or that she isn't interested enough in the question to bother thinking about it?
3. Projective measures
Projective measures also use self-report, but in an opposite fashion from the self-report inventory. Responses are unstructured and unconstrained, and items are often deliberately ambiguous. For instance, a subject is shown a picture and asked to tell a story about it; or is shown an inkblot and asked to describe what she sees; or is asked to complete an open-ended statement such as:
When I fail at a task, I usually...
Most people are...
Something I am ashamed of in my life is...
The biggest mistake anyone can make is...
Many "psychologically oriented" job interview questions are really disguised projective items. For instance, "Tell me about someone you admire" is really an indirect way of asking about your values, priorities, and self-perceptions (your ideal self).
On the exam, you will be expected to classify assessment tools on both dimensions. For instance, the final exam in this course will be an achievement test developed using rational-theoretical methods (do you see why?).
Psychometric properties
There are literally thousands of "personality inventories" in existence, not counting the "self quizzes" that flood the pages of magazines like Seventeen. (Inventories are often wrongly called "tests"; as defined above, a "test" involves right or wrong answers, and hence measures skills or knowledge.) Well over 95% of them are not worth the paper on which they are printed. What makes the difference between a "good" (scientifically respectable) inventory and a poor one? Good inventories have certain kinds of defined psychometric properties, including the following.
1. Reliability
A measurement tool is reliable if, when used to assess an individual, it provides similar information from one time to another. (Technically, this is one specific kind of reliability, called test-retest reliability. There are other forms of reliability: if psychometrics excite you, sign up for PSY 210.) See the case study (hyperlinked below) for sample data that illustrate the concept of test-retest reliability.
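Test-retest reliability is usually summarized as the correlation between scores from two administrations of the same tool to the same people. The sketch below assumes five hypothetical respondents with invented scores; the helper name correlation is ours, not part of any testing package.

```python
from statistics import mean, pstdev


def correlation(x, y):
    """Pearson r, computed as the average product of paired z-scores."""
    mean_x, mean_y = mean(x), mean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)
    products = [((a - mean_x) / sd_x) * ((b - mean_y) / sd_y) for a, b in zip(x, y)]
    return mean(products)


# Hypothetical scale scores for the same five people, tested a month apart.
first_administration = [12, 18, 25, 30, 22]
second_administration = [14, 17, 27, 29, 21]

test_retest_r = correlation(first_administration, second_administration)
print(round(test_retest_r, 2))  # a value near 1 means people kept their relative standing
```

A coefficient close to 1 indicates that the tool ranks people in much the same order on both occasions; a coefficient near 0 would mean that the second set of scores tells us little about the first.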
2. Validity
A measurement tool is valid if it really measures what it claims or appears to measure: if the results really do predict or correlate with real-world behavior. (Technically, this is one specific kind of validity, called predictive validity. There are other forms of validity: see the above comment about PSY 210.) See the case study for evidence of predictive validity.
3. Standardization
A measurement tool is standardized if it can be administered in precisely the same way (e.g., using preset instructions and methods) to all subjects. Since the social context of assessment can strongly influence the results or outcome, standardization is important.
4. Norming
A measurement tool is normed if an individual's responses can be compared (statistically benchmarked) to those of an appropriate group of others: to a so-called norm group such as a random sample of American adults, a sample of middle managers, or a sample of hebephrenic schizophrenics. (In this class, comparing your exam results to the best score in the class is a rough attempt at norming, based on the assumption that if the highest score in the class is lower than expected, the exam was too hard, and if higher than expected, the exam was too easy.) See the case study for examples of norming.
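The idea of benchmarking a raw score against a norm group can be sketched as follows. The norm-group scores and the individual's raw score are invented, and the two helper functions are illustrative names rather than anything taken from a published inventory.

```python
from statistics import mean, stdev

# Hypothetical raw scores from a ten-person norm group (real norm groups are far larger).
norm_group_scores = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]


def z_score(raw, norms):
    """How many standard deviations the raw score lies above the norm-group mean."""
    return (raw - mean(norms)) / stdev(norms)


def percentile_rank(raw, norms):
    """Percentage of the norm group scoring strictly below the raw score."""
    return 100 * sum(score < raw for score in norms) / len(norms)


individual_raw_score = 58
z = z_score(individual_raw_score, norm_group_scores)
rank = percentile_rank(individual_raw_score, norm_group_scores)
print(round(z, 2), rank)
```

The same raw score of 58 would translate into a different z-score and percentile rank against a different norm group, which is why the choice of an appropriate norm group matters so much.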
5. Response bias and social desirability checks
People are often motivated to disguise or distort their true feelings, attitudes, and traits, either in a positive direction (e.g., if applying for a job or trying to gain admission to graduate school), or -- less frequently -- in a negative direction (e.g., if trying to plead not guilty by reason of insanity when arrested on a felony charge). Hence, one (controversial and debatable) approach to this problem is to include items on the inventory that are designed to statistically distinguish between honest and dishonest responders, or even to statistically correct for response biases. Note that not all response biases are deliberate or conscious; they may represent a habitual, characteristic proneness to "oversell" or "undersell" oneself, to the self as much as to others. See the case study for an example of response bias checking.
6. Profile analysis information
In inventories that have multiple scales, more important than the individual scale scores may be the interaction between the scores, or the meaning of individual profiles (combinations of scores). For instance, 75% to 80% of college professors are low in Extraversion and also high on Openness on the Big Five; either trait in isolation probably adds little to a person's effectiveness in the academic role unless accompanied by the other. See the case study for an example of profile analysis.
For a sample case study illustrating the above concepts, click here.
The DSM-IV-R
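The Diagnostic and Statistical Manual of Mental Disorders (4th edition, revised), or DSM-IV-R for short, is the official basis for psychodiagnostic classification in the clinical community. In this approach, an individual receives diagnoses on five different dimensions or axes:
Axis I: Clinical disorders
Axis II: Personality disorders and mental retardation
Axis III: General medical conditions
Axis IV: Psychosocial and environmental problems
Axis V: Global assessment of functioning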
Thus, a complete diagnosis might look something like this:
In this course, our primary emphasis, as expected, will be on Axes I and II, but remember that a complete diagnosis includes all five axes of the total diagnostic system.
Study Guide
1. How do L-, O-, T-, and S-data differ? Be able to recognize, or generate, examples of each.
2. How do the rational-theoretical, empirical keying, and factor analytic approaches to tool development differ? Be able to recognize examples of each.
3. How do aptitude/achievement tests, self-report inventories, and projective measures differ? Be able to recognize examples of each.
4. Explain each of the following psychometric properties and their importance or significance: reliability, validity, standardization, norming, response bias or social desirability checks, profile analysis. Be able to recognize and comment on them in the context of a case study such as that provided above.
5. Explain the five axes of the DSM-IV-R and how they are utilized to make a complete diagnosis.