History of the WISC IV
Richard Niolon, Ph.D.


You'll recall the development of adult intelligence tests from earlier in the semester, I hope. Basically, the WISC came out in 1949 as a downward extension of an adult IQ test, the Wechsler Bellevue, and was revised in 1974 (WISC R) by Wechsler. The Psychological Corporation revised it again in 1991 (WISC III), and again in 2003 (WISC IV). While the WISC R was in use, some researchers discovered that the two main factors, the Verbal IQ and the Performance IQ, might be supplemented by a third, labeled Freedom From Distractibility. The WISC III revisions attempted to strengthen this factor, and created a fourth one too.

Thus, the WISC III offered a FSIQ as a measure of g, a Verbal and a Performance IQ for those used to the WISC R, and four new factors. These were the Verbal Comprehension Index, Perceptual Organization Index, Freedom from Distractibility Index, and Processing Speed Index. Of note, while the Psychological Corporation believed the factor structure supported these four factors, Sattler had some doubts.


Basic Information

The WISC IV is an update of the WISC III. It contains 10 core subtests and 5 additional subtests, which can be summed into four indexes and one Full Scale IQ. The FSIQ can range from 40 at the lowest to 160 at the highest. Three subtests can be given in modified forms to allow for additional examination of processing abilities. Children tested with the WISC III and retested with the WISC IV show about a 5 point drop in FSIQ, likely because of the new aspects of the test, and the novelty of some of the new items and subtests.

The test takes between 65 and 80 minutes to administer to most children, with more time required if the additional subtests are given or if the client is more intelligent, and less time required for clients suffering from Mental Retardation. It can be given to children as young as 6,0 and as old as 16,11. The test overlaps with the Wechsler Preschool and Primary Scale of Intelligence -- Third Edition for children between the ages of 6,0 and 7,3. Page 9 in the manual explains how to decide which test to give for this age range, but Sattler recommends always using the WISC IV. Similarly, the test overlaps with the Wechsler Adult Intelligence Scale -- Third Edition for children between the ages of 16,0 and 16,11. Pages 9 and 10 in the manual explain how to decide which test to give for this age range, but Sattler acknowledges they are probably equal, with the WISC IV having a slight advantage. The FSIQ dropped 3 points for those tested with the WAIS III and then the WISC IV.

From the FAQ at wisc-iv.com….

I retested a gifted student using WISC–IV and the scores were lower than previously reported on WISC–III. Why is this?

While research has not been collected to determine the effect of practice on re-examination, research based on the WISC III indicated that one year was needed between administrations to avoid significant practice effects. Over the WISC IV test-retest period of 32 days, the VCI gained 2-3 points, PRI gained 3-6 points, WMI gained 1-5 points, and PSI gained 5-11 points. As a result, the FSIQ was 6 points higher overall, ranging from 4-8 depending on the age range.

While I do not recall guidelines for this in the previous version of the test, the WISC IV manual now specifies that if testing must be broken up into two sessions, they should be no more than one week apart. Use the first date of testing to compute the child's age.

Pages 9-18 discuss special considerations to take into account when testing for neuropsychological purposes and for testing children with special needs.

Pages 22-24 discuss the importance of rapport in testing children. While I have not seen research examining this in the WISC IV, in the previous version of the test, research showed that the difference between good and poor rapport could mean a difference of 10 points in final IQ scores.


A number of changes were built into the WISC IV, mainly based on new neurological models of cognitive functioning:

Fluid Reasoning is better assessed

The Perceptual Organization Index was renamed Perceptual Reasoning Index to reflect this greater incorporation of Fluid Reasoning, and to reflect the decreased reliance on speed, which better disentangles the PRI from Processing Speed

Working Memory is better assessed through changes made to one test and the addition of a new subtest. While this used to be called Freedom From Distractibility, the name of the index has been changed to be more consistent with the adult version of the test and current research.

Processing Speed is a cleaner factor, since the two main subtests are joined and a third new subtest is added.

Other updates include:

Updating of norms to account for population changes in IQ

Updated art

New content (about 44%); items showing cultural, SES, or regional bias were eliminated or reworded prior to the test publication

Updated items and items that were more age appropriate

Simplified instructions and teaching efforts, easier to use

Revised queries and language samples for answers

A WISC IV Integrated version that allows some multiple choice testing of children to see what they know but cannot express


The WISC IV normative sample is based on 2,200 children from 11 age groups (each one year wide), with an equal number of males and females in each group, and an ethnic breakdown that matches the March 2000 US Census data very closely. There were 5 levels of parental education, and 4 geographical areas covering the whole United States, including Hawaii. Sattler classifies the sampling method as "excellent." Of note, there were only 1,100 in the norming sample for Arithmetic.

Below is some information about the Indexes of the test:

Indexes of the WISC IV

Verbal Comprehension Index

The VCI is a measure of verbal concept formation. It assesses children's ability to listen to a question, draw upon learned information from both formal and informal education, reason through an answer, and express their thoughts aloud. It can tap preferences for verbal information, a difficulty with novel and unexpected situations, or a desire for more time to process information rather than decide "on the spot." It's a good predictor of readiness for school and achievement orientation, but can be influenced by background, education, and cultural opportunities.

Perceptual Reasoning Index

The PRI is a measure of non-verbal and fluid reasoning. It assesses children's ability to examine a problem, draw upon visual-motor and visual-spatial skills, organize their thoughts, create solutions, and then test them. It can also tap preferences for visual information, comfort with novel and unexpected situations, or a preference to learn by doing.

Working Memory Index

The WMI is a measure of working memory. It assesses children's ability to memorize new information, hold it in short-term memory, concentrate, and manipulate that information to produce some result or reasoning process. It is an important component of higher-order thinking, learning, achievement, and the ability to self-monitor. It can tap concentration, planning ability, cognitive flexibility, and sequencing skill, but is sensitive to anxiety too.

Processing Speed Index

The PSI is a measure of processing speed. It assesses children's abilities to focus attention and quickly scan, discriminate between, and sequentially order visual information. It requires persistence and planning ability, but is also sensitive to motivation, difficulty working under time pressure, and motor coordination. Cultural factors seem to have little impact on it. It is related to reading performance and development. It is also related to Working Memory, in that increased processing speed can decrease the load placed on working memory, while decreased processing speed can impair the effectiveness of working memory.

Something to keep in mind:

VCI accounts for 62% of variance in g

PRI accounts for 45% of variance in g

WMI accounts for 43% of variance in g

PSI accounts for 23% of variance in g
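If you are more used to thinking in correlations than in percentages of variance, the figures above can be converted. This is a rough sketch that assumes each percentage is a squared correlation between the index and g, so the square root gives the implied loading; the variable and function names are made up for illustration.

```python
import math

# Proportion of variance in g accounted for by each index (from the list above).
variance_in_g = {"VCI": 0.62, "PRI": 0.45, "WMI": 0.43, "PSI": 0.23}


def implied_loading(proportion):
    """Square root of a proportion of variance.

    Assumption: the proportion is a squared correlation with g.
    """
    return math.sqrt(proportion)


loadings = {name: round(implied_loading(p), 2) for name, p in variance_in_g.items()}

for name, loading in loadings.items():
    print(f"{name}: implied loading on g of about {loading}")
```

Under that assumption the VCI comes out near .79 and the PSI near .48, which is another way of seeing that Processing Speed is the weakest reflection of g among the four indexes.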


Reliability for the WISC IV was examined by computing internal consistency values (split half correlations) or test-retest reliability. Two subtests improved in reliability compared to the older version of the test, likely due to the updates to the test. All the subtest values were recomputed as well using 16 special groups (ADHD, LDs, LD/ADHD, Mild and Mod MR…) and values were comparable. Some subscales dropped below .79, but 94% maintained .79 or better, with many increasing to .90 or better.
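A note on split-half values: correlating two halves of a subtest gives the reliability of a half-length test, so a correction is normally applied to estimate the reliability of the full-length subtest. The sketch below uses the standard Spearman-Brown formula; the text above does not say which correction was used, so treat this as an assumption, and the input of .80 is an arbitrary illustration rather than a WISC IV value.

```python
def spearman_brown(half_test_r):
    """Estimate full-length reliability from the correlation between two halves.

    Standard Spearman-Brown formula for doubling test length:
    r_full = 2 * r_half / (1 + r_half)
    """
    return 2 * half_test_r / (1 + half_test_r)


# Arbitrary illustrative input, not a value from the WISC IV manual.
print(round(spearman_brown(0.80), 2))
```

The corrected value is always at least as high as the half-test correlation (for positive correlations), which is why the correction matters when comparing split-half figures to test-retest figures.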

The Standard Error of Measurement was used to compute Confidence Intervals, to tell us the error range at 95% certainty associated with various scores.

For the subscales, it is 1 point generally, and 1-2 points for Coding and Cancellation

For the VCI, PRI, and WMI, it is 4 points, for PSI it is 5 points

For the FSIQ, it is 3 points
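To see how an SEM turns into a confidence interval, here is a minimal sketch. It assumes the point values listed above are SEMs and that the interval is simply the obtained score plus or minus 1.96 SEMs; the intervals printed in the manual may be built differently, so use the manual's tables for actual reports. The function name is hypothetical.

```python
Z_95 = 1.96  # z value for a two-sided 95% interval under a normal distribution


def confidence_interval(score, sem, z=Z_95):
    """Return (low, high) around an obtained score, rounded to whole points.

    Assumption: a symmetric interval of score +/- z * SEM.
    """
    margin = z * sem
    return (round(score - margin), round(score + margin))


# SEMs assumed from the list above: FSIQ 3, VCI/PRI/WMI 4, PSI 5.
print(confidence_interval(100, 3))
print(confidence_interval(100, 4))
print(confidence_interval(100, 5))
```

With these assumptions, an FSIQ of 100 carries a band of roughly 6 points either side, while a PSI of 100 carries a band of roughly 10 points either side.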

Test-retest reliability was computed based on 243 children across the 11 age groups, tested twice, 32 days apart on average (range 13 to 63 days). Results were at minimum .76, but most were in the .80s. Of note, these reliabilities are based on the whole sample, and at specific age ranges, Sattler points out, the numbers are sometimes shakier. Thus, treat subscale scores as less stable compared to index scores and the FSIQ. Interscorer reliability by experts was generally .98, with Comprehension dipping to .95. However, carelessness can drop this number drastically. Sattler quotes studies of graduate students and master's level clinicians finding about 8 errors per protocol, and errors in 34% to 42% of all protocols.


Validity for the WISC IV was assessed in a number of ways. Content Validity was established by reviewers and experts, as well as creating content similar to other, established tests to expand the evaluation base of the WISC IV. The response process was examined as well with multiple choice formats to detect common errors, having children explain their responses to highlight alternate acceptable answers, and altering stimuli as a result.

Convergent Validity was established by examining the subtest inter-correlations. While all subtests should tap g to some extent, and thus correlate positively to some extent, subscales on the same index should correlate most strongly with each other. After this, subtests from different scales should have stronger correlations if they both are strongly related to g. While you can check these intercorrelations in the Administration Manual, the Technical and Interpretive Manual sums intercorrelations across age groups in Table 5.1.

Discriminant Validity was established by conducting an updated factor analysis. While results are available by age, the Technical and Interpretive Manual sums the results for the whole norming sample in Table 5.3. Each of the subtests on an index has a factor loading above .60 on its own index and far lower loadings on the others. The only exception was Picture Concepts. Picture Concepts loads .45 on PRI, with .19 being the next highest loading anywhere else, and it typically shows the weakest loadings of any subtest on an index across the age groups. Sattler generally agrees with the new factor structure.

For Convergent Validity, correlations between the WISC IV and other Wechsler Tests seem most appropriate. Results of this are below:

WISC IV Validity Correlations

[Table not recoverable from the source. Surviving row labels: FSIQ - 4, FSIQ - 2.]

Sattler points out that the WPPSI III and WISC IV share some items, and so this could have inflated the correlation between them. Thus, the WISC IV and WISC III, despite the differences introduced, still correlate well. Point for Discussion: Why aren't the FSIQ correlations better?

Another way to establish Discriminant Validity is to correlate WISC IV scores with several achievement measures. This presumes IQ scores and achievement scores are related but still different things, which some debate. Tables 5.15 to 5.20 in the Technical and Interpretive Manual summarize this data. Again, WISC IV scores correlate .6 to .8 with another Wechsler test (the WIAT II), but not as well with other achievement tests (.20 to .50 is common). Sattler notes that the Psychological Corporation provides this data, and only correlated the WISC IV with tests it publishes, and so further research is needed.

Another way to establish Convergent Validity is to study the scores of different groups of children who are expected to obtain very different scores. Some of this data is below:

Average Index and IQ Scores for Various Groups

[Table not recoverable from the source. Surviving row labels: Mild MR, Moderate MR, Reading LD, Reading and Written LD, Mathematics LD.]

Thus, Gifted kids score higher than normal (1.6 SD higher), Mild and Moderate MR kids score lower (2 to 3 SDs lower), and LD kids have lower indexes (about .67 of a SD) and are on the borderline of average to low average. Several other groups are compared in Tables 5.31 to 5.37 of the Technical and Interpretive Manual, which you can review if you like.
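To translate those SD figures back into standard scores, recall that Wechsler IQ and index scores have a mean of 100 and a standard deviation of 15. The short sketch below does the arithmetic; the function name is made up for illustration, and the results are group averages implied by the SD differences quoted above, not values copied from the manual.

```python
MEAN = 100  # mean of Wechsler IQ and index scores
SD = 15     # standard deviation of Wechsler IQ and index scores


def score_from_sd(sd_units):
    """Convert a distance from the mean in SD units to a standard score."""
    return round(MEAN + sd_units * SD)


print("Gifted (+1.6 SD):", score_from_sd(1.6))
print("Mild to Moderate MR (-2 to -3 SD):", score_from_sd(-2), "to", score_from_sd(-3))
print("LD (about -.67 SD):", score_from_sd(-0.67))
```

So 1.6 SDs above the mean is about 124, 2 to 3 SDs below is 70 down to 55, and .67 of a SD below is about 90, which is right at the boundary between the average and low average ranges.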

Sattler presents some data on demographic differences in scores:

Boys scored about 5 points higher than girls on Processing Speed

Euro-American children scored 11.5 points higher than African-American children

Euro-American children scored 10 points higher than Hispanic-American children

Asian-American children scored 3 points higher than Euro-American children

FSIQs of children of college-educated parents were 20 points higher than children of parents with only a grade school education, with VCI taking the lion's share of this (22 points), WMI and PRI taking large pieces (15 points), and PSI the smallest (7 points)

Children from the Northeast and Midwest scored about 4 points higher than children from the South and West


WMI and PSI are 4-5 points higher than VCI and PRI for African-American children

PRI, WMI, and PSI are 3-6 points higher than VCI for Hispanic-American children

PRI and PSI are 5 points higher than VCI and WMI for Asian-American children