Most people encounter the concept of an IQ score long before they understand what one actually means. The number gets invoked in casual conversation, academic contexts, and popular culture with a confidence that often outpaces its theoretical complexity. So: what does it actually measure, how is it calculated, and what are we really looking at when a score comes back?
A Brief History of Cognitive Testing
The formal history of intelligence testing begins in the early twentieth century with Alfred Binet, a French psychologist commissioned by the Paris school system to identify students who might need additional educational support. Binet, working with Théodore Simon, developed a series of tasks designed to measure a child's reasoning, memory, and problem-solving skills relative to age-based developmental norms.
Binet himself was characteristically cautious about what his test could show. He explicitly warned against treating the score as a fixed, inherited quantity — a warning that was largely ignored as the concept migrated to the United States. American psychologists, particularly those interested in mass testing during World War I, adapted these tools into the Army Alpha and Beta tests, which were administered to nearly two million recruits. The results were often misinterpreted in ways that reinforced pre-existing prejudices, a cautionary history worth keeping in mind.
Over the following decades, the tools were refined substantially. The Wechsler scales — developed by David Wechsler beginning in the 1930s — moved away from the single-score model and introduced subtests covering distinct cognitive domains. Today, professionally administered tests like the WAIS-IV and the Stanford-Binet 5 remain the gold standard for clinical cognitive assessment, involving trained examiners and standardized conditions.
"The test instrument is only as informative as the context in which its results are interpreted."
What Modern IQ Tests Actually Measure
Contemporary IQ tests are not trying to measure something called "intelligence" in any singular, metaphysical sense. They're measuring performance on a set of well-defined cognitive tasks — and inferring from that performance something about underlying mental abilities.
A modern test like the Wechsler Adult Intelligence Scale divides performance into four main index scores: Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed. Each index is composed of several subtests. Verbal comprehension tasks might include vocabulary questions and verbal analogies. Perceptual reasoning tasks typically involve matrix completion and block design. Working memory is assessed through digit span tasks and letter-number sequencing. Processing speed is measured via symbol search and coding exercises.
These indices are themselves derived from decades of factor-analytic research — statistical techniques that identify clusters of related performance. The underlying theory, broadly, is that performance on these distinct tests correlates in ways that suggest shared cognitive infrastructure, often discussed under the concept of "g" (general cognitive ability) first proposed by Charles Spearman.
How Scores Are Calculated
The number you receive is not a count of correct answers. It is a derived, scaled score that positions your raw performance relative to a reference population. This is a fundamentally important distinction that most popular discussions of IQ scores fail to emphasize clearly.
The process works roughly as follows. A large standardization sample is tested — typically thousands of people selected to be demographically representative of the population for which the test is designed. The raw scores from this sample are then used to establish a normative distribution. By convention, IQ tests are scaled so that the mean is 100 and the standard deviation is 15. This means that a score of 100 represents exactly average performance within the reference sample, a score of 115 represents one standard deviation above the mean, and a score of 85 represents one standard deviation below.
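The scaling step described above can be sketched in a few lines. The sample statistics here are invented for illustration (real norms come from large standardization samples and, typically, age-banded tables), but the arithmetic — convert the raw score to a standard score within the reference sample, then rescale to mean 100 and SD 15 — is the core of the deviation-IQ convention:

```python
def raw_to_iq(raw_score: float, sample_mean: float, sample_sd: float) -> float:
    """Convert a raw score to a deviation IQ (mean 100, SD 15)."""
    z = (raw_score - sample_mean) / sample_sd  # position within the reference sample
    return 100 + 15 * z

# Hypothetical norms: suppose the standardization sample averaged
# 36 correct answers with a standard deviation of 8.
print(raw_to_iq(36, 36, 8))  # exactly average -> 100.0
print(raw_to_iq(44, 36, 8))  # one SD above the sample mean -> 115.0
```

Note that the same raw score would map to a different IQ under a different normative sample — which is precisely why scores are relative, as discussed below.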
The shape of the distribution follows what's known as the normal distribution — the familiar bell curve. Under this model, approximately 68% of scores fall between 85 and 115, roughly 95% fall between 70 and 130, and scores above 130 or below 70 each represent a little over 2% of the population.
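These proportions follow directly from the normal model rather than from any property of the test itself. A quick check using Python's standard library (the mean of 100 and SD of 15 are the conventional scaling, not data):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # conventional IQ scaling

within_1sd = iq.cdf(115) - iq.cdf(85)   # share of scores within one SD
within_2sd = iq.cdf(130) - iq.cdf(70)   # share within two SDs
above_130 = 1 - iq.cdf(130)             # the upper tail

print(f"{within_1sd:.1%}, {within_2sd:.1%}, {above_130:.1%}")
# -> 68.3%, 95.4%, 2.3%
```

The tail value is why "IQ above 130" corresponds to roughly the top 2% of a reference population under this model.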
Critically, this means your score is always relative. If a new version of a test is introduced with a new standardization sample, score distributions are recalibrated. This is one of the reasons the same person can receive different scores on different tests — the normative samples differ, the subtests differ, and the measurement instrument itself is simply different.
Validity and What Tests Actually Predict
A common question is: what do IQ scores actually predict? The research here is clearer than critics sometimes acknowledge — but also narrower than proponents sometimes claim.
Cognitive ability scores are among the strongest known predictors of academic performance across educational levels. The correlation between IQ and academic achievement is consistently reported in the range of .50 to .70 across large meta-analyses, which represents a substantial relationship by psychological research standards. Similar correlations have been found with job performance, particularly in complex, cognitively demanding roles.
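One rough way to read correlations of this size — a convention in psychological research, not something specific to these studies — is to square r, which gives the share of variance in the outcome statistically accounted for by the predictor:

```python
# Squaring a correlation coefficient gives the proportion of
# shared variance between predictor and outcome.
for r in (0.50, 0.70):
    print(f"r = {r:.2f} -> about {r**2:.0%} of variance shared")
# r = 0.50 -> about 25% of variance shared
# r = 0.70 -> about 49% of variance shared
```

Even at the high end of the reported range, half the variance in achievement is left unexplained — consistent with the point below about what the scores do not capture.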
What IQ scores predict less reliably are outcomes shaped by factors the tests don't directly measure: personality traits, motivation, creativity in open-ended contexts, social and emotional competency, access to resources and opportunity, and the accumulated advantages or disadvantages that shape cognitive development itself. A score captures a cross-section of performance at a single point in time. It does not capture potential, identity, or future trajectory.
Online Versus Professionally Administered Tests
There are meaningful differences between professionally administered psychometric instruments and online assessments, including those offered by platforms like Zarmiquo. It's important to be honest about this gap.
Clinical tests are administered by trained psychologists in controlled environments, with standardized instructions, timed subtests, and — in some formats — examiners who observe and record qualitative behavioral data alongside quantitative scores. The validation research behind these instruments is extensive, and the norms are derived from carefully constructed samples.
Online assessments operate under different conditions. Testing environments vary, internet connectivity can affect performance, the examiner is absent, and standardization across platforms is inconsistent. These factors introduce measurement error that is difficult to quantify. Additionally, online tests vary considerably in how their normative samples were constructed — some with scientific rigor, others with minimal attention to validity.
This does not mean online assessments have no value. Used appropriately — as exploratory tools for personal understanding, for educational engagement, or for gaining familiarity with the types of cognitive tasks involved — they can be genuinely informative. What they should not be used for is making clinical judgments, supporting diagnoses, or drawing conclusions about fixed cognitive ability with high certainty.
Interpreting Your Results
If you've taken a cognitive assessment and received a score, the most important thing to understand is that the number represents a snapshot, not a ceiling. Research consistently demonstrates that cognitive performance is influenced by a wide range of modifiable factors: sleep quality, stress levels, nutrition, education, familiarity with test formats, and the specific test used.
Beyond the score itself, it's worth paying attention to any domain-level breakdown your assessment provides. Understanding that you score particularly well on spatial reasoning tasks but find working memory exercises more demanding tells you something more textured and actionable than a single composite number.
Finally: a score is not a label. The decades of research into cognitive ability have generated useful insights about how human minds work in general — but they have also consistently failed to produce anything like a complete description of any individual mind. Use the tool for what it's designed for. Don't ask it to tell you who you are.
This article was reviewed by our Assessment Research Lead and Content Development Specialist to ensure accuracy and responsible framing.