SAT Problem Solving and Data Analysis: The Complete Guide
Problem-Solving and Data Analysis is the SAT Math domain that most directly mirrors the quantitative reasoning you will use in college and in everyday life. While Algebra and Advanced Math test your ability to manipulate equations and functions, this domain tests your ability to work with real-world data: interpreting charts and graphs, calculating percentages and rates, understanding statistical measures, evaluating probability, and drawing valid conclusions from studies and surveys. These are the skills you will use when reading a news article about a scientific study, evaluating a financial decision, or interpreting data in any professional context.
This domain accounts for approximately 5 to 7 of the 44 total math questions on the Digital SAT. While this is fewer questions than Algebra or Advanced Math, the questions in this domain have a unique character that makes them worth dedicated preparation. They tend to be more reading-intensive than other math questions, requiring you to extract information from tables, graphs, and verbal descriptions before performing any calculations. Many students who are strong at pure algebra lose points here because they misread a graph, confuse correlation with causation, or set up a percentage calculation with the wrong base value.

This guide covers every topic in the Problem-Solving and Data Analysis domain in exhaustive detail. For each topic, you will find clear concept explanations, worked examples at multiple difficulty levels, common mistakes to avoid, trap answer patterns, and the fastest solution approaches. Master this guide and you will have the tools to handle every question this domain can throw at you.
Table of Contents
- Why This Domain Matters Despite Having Fewer Questions
- Ratios and Proportional Relationships
- Setting Up and Solving Proportions
- Part-to-Part and Part-to-Whole Ratios
- Scaling and Ratio Tables
- Rates and Unit Rates
- Unit Rate Calculations
- Comparing Rates
- Speed, Distance, and Time Using Rates
- Unit Conversions
- Percentages
- Basic Percentage Calculations
- Percent Increase and Percent Decrease
- Successive Percentage Changes
- Percentage Word Problems in Context
- Linear and Exponential Growth in Data Contexts
- Statistics: Measures of Center
- Mean
- Median
- Mode
- Choosing the Appropriate Measure
- The Effect of Outliers
- Statistics: Measures of Spread
- Range
- Standard Deviation (Conceptual)
- Comparing Distributions
- Probability
- Basic Probability
- Probability From Data
- Conditional Probability
- Independence
- Two-Way Frequency Tables
- Reading Two-Way Tables
- Joint, Marginal, and Conditional Frequencies
- Relative Frequency
- Data Displays: Reading and Interpreting
- Bar Graphs and Histograms
- Line Graphs
- Dot Plots
- Box Plots (Box-and-Whisker Plots)
- Scatter Plots and Lines of Best Fit
- Tables
- Data Collection and Study Design
- Random Sampling and Generalization
- Random Assignment and Causation
- Observational Studies vs Experiments
- Bias in Data Collection
- Margin of Error
- Common Traps in This Domain
- Desmos and Calculator Strategies
- Score-Level Strategies
- The Complete Study Plan
- Frequently Asked Questions
Why This Domain Matters Despite Having Fewer Questions
With only 5 to 7 questions, you might be tempted to deprioritize Problem-Solving and Data Analysis in favor of the higher-volume Algebra and Advanced Math domains. That would be a strategic mistake for three reasons.
First, these questions are often among the most time-consuming on the test because they require careful reading and interpretation of data displays. If you are unprepared, you might spend three or four minutes on a single question trying to decipher a two-way table or a complex scatter plot. That wasted time hurts your performance on every other question in the module.
Second, many of the skills in this domain overlap with other domains. Ratio and percentage skills appear in Algebra word problems. Growth model interpretation connects to the exponential functions in Advanced Math. Data display reading is required for some Reading and Writing questions as well. Mastering this domain has ripple effects.
Third, the questions in this domain are highly learnable. The concepts are concrete and practical, the question types follow predictable patterns, and the skills improve quickly with practice. A focused study period of two to three weeks can bring you from struggling with these questions to answering them confidently and accurately.
Ratios and Proportional Relationships
Ratios express the relationship between two quantities. They appear throughout the SAT, both in dedicated ratio questions and as components of word problems in other domains.
Setting Up and Solving Proportions
A proportion is an equation stating that two ratios are equal: a/b = c/d. To solve, cross-multiply: ad = bc.
Worked Example (Basic Proportion):
If 5 notebooks cost $12, how much do 8 notebooks cost?
Set up the proportion: 5/12 = 8/x
Cross-multiply: 5x = 96
x = 19.20
Eight notebooks cost $19.20.
Worked Example (Scale Drawing):
On a map, 2 inches represents 35 miles. If two cities are 5.5 inches apart on the map, what is the actual distance between them?
Proportion: 2/35 = 5.5/x
Cross-multiply: 2x = 192.5
x = 96.25 miles
Worked Example (Recipe Scaling):
A recipe for 4 servings requires 2.5 cups of flour. How much flour is needed for 10 servings?
Proportion: 4/2.5 = 10/x
Cross-multiply: 4x = 25
x = 6.25 cups
Common Mistake: Setting up the proportion with mismatched units. In the proportion a/b = c/d, the units must be consistent: if a and c represent the same type of quantity (like notebooks), then b and d must also represent the same type (like dollars). A common error is flipping one of the ratios, which produces a wrong answer. In the notebook example, writing 12/5 = 8/x gives x ≈ 3.33, which is clearly too low for 8 notebooks when 5 notebooks cost $12.
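The cross-multiplication step used in the worked examples above can be sketched in Python. This is a minimal illustration, not part of the SAT itself; the function name solve_proportion is made up for this example, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def solve_proportion(a, b, c):
    """Solve a/b = c/x for x by cross-multiplying: a*x = b*c."""
    return Fraction(b) * Fraction(c) / Fraction(a)

# 5 notebooks / 12 dollars = 8 notebooks / x dollars
notebooks_cost = solve_proportion(5, 12, 8)

# 2 inches / 35 miles = 5.5 inches / x miles
map_distance = solve_proportion(2, 35, Fraction("5.5"))

# 4 servings / 2.5 cups = 10 servings / x cups
flour_cups = solve_proportion(4, Fraction("2.5"), 10)

print(float(notebooks_cost), float(map_distance), float(flour_cups))
```

Keeping the same quantity type in the first and third arguments (notebooks, inches, servings) is what keeps the units consistent.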
Part-to-Part and Part-to-Whole Ratios
A ratio can express a part-to-part relationship or a part-to-whole relationship, and the SAT tests whether you can distinguish between them and convert from one to the other.
Part-to-part: The ratio of boys to girls is 3:5. This means for every 3 boys, there are 5 girls.
Part-to-whole: The ratio of boys to total students is 3:8 (since 3 + 5 = 8). Boys make up 3/8 of the total.
Worked Example:
In a class, the ratio of students who prefer math to students who prefer science is 4:7. If there are 44 students in the class, how many prefer math?
Total parts: 4 + 7 = 11
Each part represents: 44 / 11 = 4 students
Math preference: 4 * 4 = 16 students
Science preference: 7 * 4 = 28 students
Check: 16 + 28 = 44. Correct.
Worked Example (Three-Part Ratio):
A fruit basket contains apples, oranges, and bananas in the ratio 2:3:5. If there are 60 fruits total, how many oranges are there?
Total parts: 2 + 3 + 5 = 10
Each part: 60 / 10 = 6
Oranges: 3 * 6 = 18
Common SAT Pattern: The question gives you a part-to-part ratio but asks for a part-to-whole quantity (or vice versa). Students who do not convert between the two types will use the wrong denominator.
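The "total parts" method from the two worked examples above can be written as a short Python sketch. The helper name split_by_ratio is hypothetical and chosen only for illustration.

```python
def split_by_ratio(total, parts):
    """Divide `total` among groups given in a part-to-part ratio."""
    unit = total / sum(parts)  # size of one ratio part (the part-to-whole step)
    return [p * unit for p in parts]

# Math : science = 4 : 7 with 44 students in total
math_science = split_by_ratio(44, [4, 7])

# Apples : oranges : bananas = 2 : 3 : 5 with 60 fruits in total
fruit = split_by_ratio(60, [2, 3, 5])

print(math_science, fruit)
```

Dividing by sum(parts) rather than by a single part is exactly the conversion from part-to-part to part-to-whole.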
Scaling and Ratio Tables
Ratio tables organize proportional relationships and help you find equivalent ratios quickly.
Worked Example:
A paint mixture uses red and blue paint in a 3:2 ratio. Complete the table:
Red: 3, 6, 9, 12, 15
Blue: 2, 4, 6, 8, 10
If you need 21 cups of total paint, how much of each color?
Total ratio parts: 3 + 2 = 5. Each part = 21/5 = 4.2 cups. Red = 3(4.2) = 12.6 cups. Blue = 2(4.2) = 8.4 cups.
Rates and Unit Rates
A rate is a ratio that compares two quantities with different units. Common examples include speed (miles per hour), price (dollars per pound), density (people per square mile), and productivity (items per hour).
Unit Rate Calculations
A unit rate expresses the rate per one unit of the denominator quantity. To find a unit rate, divide the numerator quantity by the denominator quantity.
Worked Example:
A car travels 240 miles using 8 gallons of gas. What is the car’s fuel efficiency in miles per gallon?
Unit rate: 240 / 8 = 30 miles per gallon.
Worked Example (Price Comparison):
Store A sells 5 pounds of rice for $8.75. Store B sells 3 pounds of rice for $5.10. Which store offers a better price per pound?
Store A: $8.75 / 5 = $1.75 per pound
Store B: $5.10 / 3 = $1.70 per pound
Store B is cheaper per pound.
Worked Example (Work Rate):
Machine A produces 150 widgets in 3 hours. Machine B produces 200 widgets in 5 hours. Which machine is faster?
Machine A rate: 150 / 3 = 50 widgets per hour
Machine B rate: 200 / 5 = 40 widgets per hour
Machine A is faster.
If both machines work together, how many widgets do they produce per hour? 50 + 40 = 90 widgets per hour.
How long would it take both machines working together to produce 450 widgets? 450 / 90 = 5 hours.
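The work-rate example above reduces to a few divisions, which the following Python sketch mirrors. The variable names are illustrative only.

```python
# Unit rates: widgets per hour for each machine
rate_a = 150 / 3
rate_b = 200 / 5

# Rates measured in the same unit add when the machines run together
combined_rate = rate_a + rate_b

# time = amount of work / rate
hours_for_450 = 450 / combined_rate

print(rate_a, rate_b, combined_rate, hours_for_450)
```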
Comparing Rates
The SAT frequently asks you to compare rates from different contexts or to determine which of several options is the best value. The approach is always to convert to the same unit rate and then compare.
Worked Example:
Three phone plans offer different data rates:
Plan A: 5 GB for $30 per month
Plan B: 8 GB for $44 per month
Plan C: 12 GB for $60 per month
Which plan offers the best value per GB?
Plan A: $30 / 5 = $6.00 per GB
Plan B: $44 / 8 = $5.50 per GB
Plan C: $60 / 12 = $5.00 per GB
Plan C offers the best value per gigabyte.
However, if the question asks which plan is cheapest for someone who only needs 5 GB, Plan A is cheapest at $30 (Plans B and C cost more in total even though their per-GB rates are lower). This distinction between “best rate” and “lowest total cost for a specific quantity” is a common SAT trap.
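The difference between "best rate" and "lowest total cost" can be made concrete with a small Python sketch built on the three plans above. The dictionary layout and the helper name cheapest_plan are assumptions made for this illustration.

```python
# name: (gigabytes included, dollars per month)
plans = {"A": (5, 30), "B": (8, 44), "C": (12, 60)}

# Best value: lowest cost per gigabyte
best_rate = min(plans, key=lambda name: plans[name][1] / plans[name][0])

def cheapest_plan(needed_gb):
    """Lowest total monthly cost among plans that include enough data."""
    eligible = {name: cost for name, (gb, cost) in plans.items() if gb >= needed_gb}
    return min(eligible, key=eligible.get)

print(best_rate, cheapest_plan(5), cheapest_plan(6))
```

The two questions give different answers for the same data, which is why reading what the question actually asks matters more than the arithmetic.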
Speed, Distance, and Time Using Rates
The fundamental relationship d = rt (distance equals rate times time) appears frequently. The three rearrangements are: d = rt, r = d/t, and t = d/r.
Worked Example:
A runner completes a 10-kilometer race at an average speed of 8 kilometers per hour. How long does the race take?
t = d/r = 10/8 = 1.25 hours = 1 hour 15 minutes.
Worked Example (Average Speed for a Round Trip):
A driver goes from City A to City B (120 miles) at 60 mph and returns at 40 mph. What is the average speed for the entire round trip?
This is a common trap. The average speed is NOT (60 + 40)/2 = 50 mph. You must calculate total distance divided by total time.
Time going: 120/60 = 2 hours. Time returning: 120/40 = 3 hours.
Total distance: 240 miles. Total time: 5 hours. Average speed: 240/5 = 48 mph.
The average speed is 48 mph, not 50 mph. The driver spends more time at the slower speed, which pulls the average below the simple midpoint.
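The round-trip calculation above follows the rule "total distance divided by total time." A minimal Python sketch, with a hypothetical helper named average_speed, shows why the result is not the midpoint of the two speeds.

```python
def average_speed(legs):
    """legs is a list of (distance, speed) pairs; returns total distance / total time."""
    total_distance = sum(d for d, _ in legs)
    total_time = sum(d / s for d, s in legs)
    return total_distance / total_time

# 120 miles out at 60 mph, 120 miles back at 40 mph
round_trip = average_speed([(120, 60), (120, 40)])
naive_midpoint = (60 + 40) / 2

print(round_trip, naive_midpoint)
```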
Unit Conversions
Unit conversion questions require you to multiply by conversion factors to change from one unit to another. The key principle is that multiplying by a conversion factor (like 12 inches / 1 foot) changes only the units, not the underlying quantity, because the numerator and denominator represent the same amount.
Worked Example (Single Conversion):
Convert 5 miles to feet. (1 mile = 5,280 feet)
5 miles * 5,280 feet/mile = 26,400 feet.
Worked Example (Multi-Step Conversion):
A car travels at 60 miles per hour. What is its speed in feet per second?
60 miles/hour * 5,280 feet/mile * 1 hour/3,600 seconds = 60 * 5,280 / 3,600 = 88 feet per second.
Worked Example (Metric Conversion):
A solution contains 250 milligrams per liter. What is the concentration in grams per liter?
250 mg/L * 1 g/1,000 mg = 0.25 g/L.
SAT Conversion Tip: The SAT provides any non-standard conversion factors within the question. You do not need to memorize obscure conversions. However, you should know basic relationships: 1 hour = 60 minutes = 3,600 seconds, 1 foot = 12 inches, 1 mile = 5,280 feet, 1 kilogram = 1,000 grams, 1 liter = 1,000 milliliters.
When setting up unit conversions, write out the units explicitly and cancel them. This prevents the common error of multiplying when you should divide or vice versa. If you want feet per second and you start with miles per hour, the miles must cancel (miles in numerator and denominator) and the hours must cancel, leaving feet in the numerator and seconds in the denominator.
Dimensional Analysis: The Foolproof Conversion Method
Dimensional analysis is a systematic approach that guarantees correct unit conversions. You set up a chain of fractions where each fraction equals 1 (the numerator and denominator represent the same quantity in different units), and the units you want to eliminate appear in both the numerator of one fraction and the denominator of another.
Worked Example (Complex Conversion):
A factory produces 480 widgets per hour. Express this rate in widgets per minute.
480 widgets/hour * 1 hour/60 minutes = 480/60 = 8 widgets per minute.
The “hour” in the denominator of the first fraction cancels with “hour” in the numerator of the conversion factor.
Worked Example (Area Conversion):
A room is 12 feet by 15 feet. What is the area in square inches?
Area in square feet: 12 * 15 = 180 square feet.
Convert: 180 ft^2 * (12 in/1 ft)^2 = 180 * 144 = 25,920 square inches.
Note that when converting area (square units), you must square the conversion factor. When converting volume (cubic units), you must cube it. This is a common error on the SAT: students multiply by 12 instead of 144 when converting square feet to square inches.
Worked Example (Density Conversion):
A substance has a density of 2.5 grams per cubic centimeter. Convert this to kilograms per cubic meter.
2.5 g/cm^3 * (1 kg/1000 g) * (100 cm/1 m)^3
= 2.5 * (1/1000) * (1,000,000)
= 2.5 * 1000
= 2,500 kg/m^3
The cubic centimeter conversion requires cubing the factor: (100 cm/m)^3 = 1,000,000 cm^3/m^3.
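The chained conversions above can be mirrored in Python by multiplying and dividing by named conversion constants. This is a sketch under the conversion factors already stated in the text; the constant names are chosen for readability only.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600
INCHES_PER_FOOT = 12
CM_PER_M = 100
G_PER_KG = 1000

# 60 miles/hour -> feet/second: miles and hours cancel
feet_per_second = 60 * FEET_PER_MILE / SECONDS_PER_HOUR

# 180 square feet -> square inches: the length factor is squared
square_inches = 180 * INCHES_PER_FOOT ** 2

# 2.5 g/cm^3 -> kg/m^3: the length factor is cubed
density_kg_m3 = 2.5 * CM_PER_M ** 3 / G_PER_KG

print(feet_per_second, square_inches, density_kg_m3)
```

Writing the exponent explicitly (** 2 for area, ** 3 for volume) guards against the multiply-by-12-instead-of-144 error described above.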
Currency and Rate Conversions
The SAT occasionally includes problems involving currency exchange rates or converting between different rate units.
Worked Example:
If 1 euro = 1.08 US dollars, how many euros can you get for $500?
$500 * (1 euro / $1.08) = 500/1.08 ≈ 462.96 euros.
Common Trap: Multiplying instead of dividing (or vice versa). Ask yourself: should the answer be larger or smaller than 500? Since each euro costs more than a dollar, you should get fewer euros than the number of dollars you have, so the answer should be less than 500. This sanity check prevents the common error of computing 500 * 1.08 = 540.
Percentages
Percentage questions are among the most common in this domain and appear in many forms. The underlying concept is that a percentage is a fraction out of 100: 25% means 25/100 = 0.25.
Basic Percentage Calculations
There are three fundamental percentage calculations, and you should be able to perform all three quickly.
Finding a percentage of a number: What is 15% of 80? Multiply: 0.15 * 80 = 12.
Finding what percentage one number is of another: 12 is what percent of 80? Divide and multiply by 100: (12/80) * 100 = 15%.
Finding the whole given a percentage and a part: 12 is 15% of what number? Divide: 12 / 0.15 = 80.
Worked Example (Applied Context):
A shirt originally priced at $45 is on sale for 20% off. What is the sale price?
Discount: 0.20 * $45 = $9
Sale price: $45 - $9 = $36
Alternatively: Sale price = 0.80 * $45 = $36 (multiplying by 1 minus the discount rate).
Worked Example (Tax Calculation):
A meal costs $32.50 before tax. If the tax rate is 8.5%, what is the total cost?
Tax: 0.085 * $32.50 = $2.7625 ≈ $2.76
Total: $32.50 + $2.76 = $35.26
Alternatively: Total = 1.085 * $32.50 = $35.2625 ≈ $35.26.
Percent Increase and Percent Decrease
Percent change measures how much a quantity has changed relative to its original value. This is one of the most commonly tested percentage concepts on the SAT.
The Formula: Percent change = ((new value - original value) / original value) * 100
For percent increase, the new value is larger, and the result is positive.
For percent decrease, the new value is smaller, and the result is negative (or you express it as a positive decrease).
The Critical Detail: The denominator is always the ORIGINAL value, not the new value and not the average. This is the most common error students make on percentage change questions, and the SAT deliberately includes the answer you would get by using the wrong denominator.
Worked Example (Percent Increase):
A store’s revenue increased from $200,000 to $250,000. What was the percent increase?
Percent increase = (250,000 - 200,000) / 200,000 * 100 = 50,000/200,000 * 100 = 25%
Worked Example (Percent Decrease):
A population decreased from 15,000 to 12,000. What was the percent decrease?
Percent decrease = (15,000 - 12,000) / 15,000 * 100 = 3,000/15,000 * 100 = 20%
Common Trap: If asked “the population decreased from 15,000 to 12,000, what is the percent decrease?” the trap answer is 25% (calculated as 3,000/12,000 * 100 using the new value as the denominator instead of the original). Always use the original value as the base.
Worked Example (Finding the Original Value):
After a 30% increase, a stock is worth $91. What was its original value?
Original * 1.30 = 91. Original = 91 / 1.30 = $70.
Worked Example (Finding the New Value):
A population of 8,000 decreased by 15%. What is the new population?
New = 8,000 * 0.85 = 6,800.
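The percent change formula and the wrong-denominator trap described above can be compared side by side in a short Python sketch. The function name percent_change is illustrative.

```python
def percent_change(original, new):
    """Percent change relative to the ORIGINAL value (negative means a decrease)."""
    return (new - original) / original * 100

revenue_change = percent_change(200_000, 250_000)    # a 25% increase
population_change = percent_change(15_000, 12_000)   # a 20% decrease

# Trap: dividing by the new value instead of the original
wrong_base = (15_000 - 12_000) / 12_000 * 100

# Working backward: original * 1.30 = 91
original_stock = 91 / 1.30

print(revenue_change, population_change, wrong_base, original_stock)
```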
Successive Percentage Changes
When multiple percentage changes are applied sequentially, you multiply the growth/decay factors rather than adding the percentages. This is counterintuitive and is a frequent source of errors.
Worked Example:
A price increases by 20% and then decreases by 20%. Is the final price equal to the original?
No. Let the original price be $100. After a 20% increase: $100 * 1.20 = $120. After a 20% decrease: $120 * 0.80 = $96.
The final price is $96, which is 4% less than the original $100. The overall change factor is 1.20 * 0.80 = 0.96, representing a 4% decrease.
Why this happens: The 20% increase is applied to a base of $100, adding $20. But the 20% decrease is applied to a base of $120 (the increased price), subtracting $24. The decrease removes more than the increase added because it operates on a larger base.
Worked Example (Multiple Increases):
A population grows by 10% in the first period and 15% in the second period. What is the overall percentage growth?
Overall factor: 1.10 * 1.15 = 1.265
Overall percentage growth: 26.5% (not 25%, which would be the incorrect result of simply adding 10% + 15%).
Worked Example (Discount Then Tax):
An item is originally $80. A 25% discount is applied, followed by 9% sales tax. What is the final price?
After discount: $80 * 0.75 = $60. After tax: $60 * 1.09 = $65.40.
Overall factor: 0.75 * 1.09 = 0.8175. Final price: $80 * 0.8175 = $65.40.
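Multiplying growth and decay factors, as in the three examples above, can be sketched in Python as follows. The helper name overall_factor is made up for this illustration, and math.prod requires Python 3.8 or later.

```python
from math import prod

def overall_factor(percent_changes):
    """Multiply the factors (1 + p/100) for a sequence of percent changes."""
    return prod(1 + p / 100 for p in percent_changes)

up_then_down = overall_factor([20, -20])             # about 0.96, a 4% net decrease
two_increases = overall_factor([10, 15])             # about 1.265, a 26.5% net increase
discount_then_tax = 80 * overall_factor([-25, 9])    # about 65.40 dollars

print(round(up_then_down, 4), round(two_increases, 4), round(discount_then_tax, 2))
```

Because multiplication is commutative, applying the tax before the discount would give the same final price.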
Percentage Word Problems in Context
The SAT embeds percentage calculations in real-world scenarios that require careful reading.
Worked Example (Survey Data):
In a survey of 500 people, 65% said they prefer coffee. Of those who prefer coffee, 40% drink it black. How many people drink black coffee?
Coffee drinkers: 0.65 * 500 = 325
Black coffee drinkers: 0.40 * 325 = 130
Worked Example (Percent of a Percent):
A school has 1,200 students. 60% participate in extracurricular activities. Of those who participate, 25% are in sports. What percentage of all students are in sports?
Sports students: 0.60 * 0.25 = 0.15 = 15% of all students.
Number: 0.15 * 1,200 = 180 students.
Worked Example (Tip Calculation):
A restaurant bill before tax is $85. Tax is 7%, and the customer wants to leave a 20% tip on the pre-tax amount. What is the total amount paid?
Tax: $85 * 0.07 = $5.95. Tip: $85 * 0.20 = $17.00. Total: $85 + $5.95 + $17.00 = $107.95.
Note that the tip is calculated on the pre-tax amount ($85), not on the after-tax amount. The SAT might test whether you apply the tip to the correct base.
Linear and Exponential Growth in Data Contexts
This topic bridges Problem-Solving and Data Analysis with the function concepts from Algebra and Advanced Math. In this domain, the focus is on recognizing growth patterns in data and interpreting growth models in real-world contexts.
Linear growth in data: When values increase by a constant amount per equal time interval, the pattern is linear. In a table, the first differences (differences between consecutive values) are constant. In a graph, the data follows a straight line.
Exponential growth in data: When values increase by a constant percentage per equal time interval, the pattern is exponential. In a table, the ratios between consecutive values are constant. In a graph, the data follows a curve that accelerates upward (for growth) or levels off toward zero (for decay).
Worked Example (Identifying Growth Type From a Table):
Period: 0, 1, 2, 3, 4
Value: 100, 115, 130, 145, 160
First differences: 15, 15, 15, 15 (constant). This is linear growth with a rate of 15 per period.
Period: 0, 1, 2, 3, 4
Value: 100, 120, 144, 172.8, 207.36
Ratios: 1.2, 1.2, 1.2, 1.2 (constant). This is exponential growth with a factor of 1.2 (20% growth per period).
Common SAT Question: A table or graph is presented, and you must determine whether the relationship is linear, exponential, or neither, and then identify the equation that models the data.
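The first-difference and ratio tests described above can be automated with a short Python sketch. The function classify_growth is hypothetical; it assumes the values are listed at equal intervals and, for the ratio test, that none of them is zero.

```python
def classify_growth(values, tol=1e-9):
    """Label equally spaced data values as linear, exponential, or neither."""
    pairs = list(zip(values, values[1:]))

    # Linear: constant first differences
    diffs = [b - a for a, b in pairs]
    if all(abs(d - diffs[0]) <= tol for d in diffs):
        return "linear"

    # Exponential: constant ratios between consecutive values (no zeros assumed)
    ratios = [b / a for a, b in pairs]
    if all(abs(r - ratios[0]) <= tol for r in ratios):
        return "exponential"

    return "neither"

print(classify_growth([100, 115, 130, 145, 160]))
print(classify_growth([100, 120, 144, 172.8, 207.36]))
print(classify_growth([1, 4, 9, 16, 25]))
```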
Interpreting Growth Models in Real-World Contexts
The SAT presents growth models within contexts like population growth, depreciation, investment returns, and biological processes. You need to connect the mathematical model to the real-world scenario.
Worked Example (Linear Model Interpretation):
A city’s population is modeled by P(t) = 50,000 + 1,200t, where t is the number of periods since measurement began.
What was the initial population? 50,000 (the constant term, which is the value when t = 0).
What is the rate of growth? 1,200 people per period (the coefficient of t, which is the slope).
How many periods will it take for the population to reach 80,000?
80,000 = 50,000 + 1,200t. So 30,000 = 1,200t. t = 25 periods.
Worked Example (Exponential Model Interpretation):
A car’s value is modeled by V(t) = 28,000(0.82)^t, where t is the number of periods since purchase.
What was the purchase price? $28,000 (the coefficient, the value when t = 0).
What is the depreciation rate? 18% per period (because 1 - 0.82 = 0.18, meaning the car retains 82% of its value each period and loses 18%).
What is the value after 5 periods? V(5) = 28,000(0.82)^5 = 28,000(0.3707) ≈ $10,381.
When will the car’s value drop below $5,000? Solve 28,000(0.82)^t = 5,000. This gives (0.82)^t = 5,000/28,000 = 0.1786. Taking logarithms: t = ln(0.1786)/ln(0.82) ≈ 8.7 periods. On the SAT, you would use Desmos to graph the function and find where it crosses 5,000.
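The depreciation calculations above can be checked numerically. The following Python sketch evaluates the model V(t) = 28,000(0.82)^t and solves for t with logarithms; the function name car_value is an illustrative choice.

```python
import math

def car_value(t):
    """Model from the example: V(t) = 28,000 * 0.82^t."""
    return 28_000 * 0.82 ** t

value_after_5 = car_value(5)

# Solve 28,000 * 0.82^t = 5,000 for t: t = ln(5,000 / 28,000) / ln(0.82)
t_below_5000 = math.log(5_000 / 28_000) / math.log(0.82)

print(round(value_after_5), round(t_below_5000, 1))
```

Evaluating car_value(8) and car_value(9) brackets the answer the same way a graph would: the value is still above $5,000 after 8 periods and below it after 9.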
Worked Example (Comparing Linear and Exponential):
Company A’s revenue grows linearly: R_A(t) = 100,000 + 20,000t
Company B’s revenue grows exponentially: R_B(t) = 100,000(1.12)^t
Both start at $100,000. In the short term, Company A grows faster ($20,000 per period versus about $12,000 in the first period for Company B). But exponential growth accelerates over time. Eventually Company B surpasses Company A.
At what point does Company B surpass Company A? Set them equal: 100,000 + 20,000t = 100,000(1.12)^t. This cannot be solved algebraically (it mixes linear and exponential terms), but graphing both functions in Desmos reveals the intersection.
This type of comparison question tests whether you understand the fundamental difference between linear and exponential growth: linear growth adds a constant amount, while exponential growth multiplies by a constant factor, so exponential growth eventually dominates regardless of the initial rates.
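As a numerical stand-in for graphing the two revenue models, the Python sketch below scans whole periods and then narrows the crossover with bisection. Bisection is a substitute technique chosen for this illustration, not something the SAT expects, and the function names are hypothetical.

```python
def revenue_a(t):
    return 100_000 + 20_000 * t      # linear: adds 20,000 per period

def revenue_b(t):
    return 100_000 * 1.12 ** t       # exponential: multiplies by 1.12 per period

# First whole period in which Company B is ahead
first_period_b_ahead = next(t for t in range(1, 200) if revenue_b(t) > revenue_a(t))

# Narrow the crossover between the last "behind" and first "ahead" periods
lo, hi = first_period_b_ahead - 1.0, float(first_period_b_ahead)
for _ in range(50):
    mid = (lo + hi) / 2
    if revenue_b(mid) > revenue_a(mid):
        hi = mid
    else:
        lo = mid
crossover = (lo + hi) / 2

print(first_period_b_ahead, round(crossover, 2))
```

The scan shows Company B first leading in period 10, with the curves crossing a little after period 9.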
Recognizing Growth Patterns in SAT Questions
The SAT signals which type of growth to expect through specific language:
- “Increases by [amount] per period” signals linear growth (constant additive change).
- “Increases by [percent] per period” signals exponential growth (constant multiplicative change).
- “Doubles every [time period]” signals exponential growth with base 2.
- “Half-life of [time period]” signals exponential decay with base 1/2.
- “Constant rate of change” signals linear.
- “Constant percent change” signals exponential.
Recognizing these signals allows you to set up the correct model immediately without testing whether the data is linear or exponential.
Statistics: Measures of Center
Statistics questions on the SAT focus on your understanding of how data is summarized and what different statistical measures tell you. You will never need to perform complex calculations; the emphasis is on interpretation and conceptual understanding.
Mean
The mean (average) is calculated by summing all values and dividing by the number of values.
Mean = (sum of all values) / (number of values)
Worked Example:
Find the mean of: 12, 15, 18, 22, 33
Sum: 12 + 15 + 18 + 22 + 33 = 100
Mean: 100 / 5 = 20
Worked Example (Finding a Missing Value):
The mean of five numbers is 24. Four of the numbers are 18, 22, 26, and 30. What is the fifth number?
Sum of all five = 24 * 5 = 120. Sum of known four = 18 + 22 + 26 + 30 = 96.
Fifth number = 120 - 96 = 24.
Worked Example (Weighted Mean):
A student’s grade is based on tests (60% weight) and homework (40% weight). If the test average is 82 and the homework average is 95, what is the overall grade?
Weighted mean: 0.60(82) + 0.40(95) = 49.2 + 38.0 = 87.2
The overall grade is 87.2, not the simple average of 82 and 95 (which would be 88.5). The weighted mean accounts for the different importance of each component.
Worked Example (Combining Groups):
Class A has 20 students with a mean test score of 75. Class B has 30 students with a mean test score of 85. What is the combined mean for all 50 students?
Total score for Class A: 20 * 75 = 1,500
Total score for Class B: 30 * 85 = 2,550
Combined total: 1,500 + 2,550 = 4,050
Combined mean: 4,050 / 50 = 81
Note that the combined mean (81) is NOT the simple average of 75 and 85 (which would be 80). It is closer to 85 because Class B has more students, so it contributes more weight to the combined average.
SAT Question Pattern: “If the mean of n numbers is m, what is the sum?” The sum equals n * m. This relationship is used in many questions where you know the mean and need to work backward to find a missing value or a total.
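The weighted-grade and combined-class examples above are both weighted means, which a single Python helper can express. The name weighted_mean is illustrative; the last lines apply the sum = n * m relationship to the missing-value example.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Tests 60%, homework 40%
overall_grade = weighted_mean([82, 95], [0.60, 0.40])

# Class means weighted by class size
combined_mean = weighted_mean([75, 85], [20, 30])

# Missing value: the total must be n * m = 5 * 24
fifth_number = 5 * 24 - sum([18, 22, 26, 30])

print(round(overall_grade, 1), combined_mean, fifth_number)
```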
Worked Example (Effect of Adding a Value):
The mean of 10 numbers is 15. If the number 26 is added to the set, what is the new mean?
Original sum: 10 * 15 = 150. New sum: 150 + 26 = 176. New count: 11. New mean: 176 / 11 = 16.
The mean increased from 15 to 16 because the added value (26) is above the original mean.
General Principle: Adding a value above the mean increases the mean. Adding a value below the mean decreases the mean. Adding a value exactly equal to the mean does not change it. The SAT tests this principle directly.
Worked Example (Removing a Value):
The mean of 8 numbers is 20. If one number (the value 36) is removed, what is the new mean?
Original sum: 8 * 20 = 160. New sum: 160 - 36 = 124. New count: 7. New mean: 124 / 7 ≈ 17.7.
Removing a value above the mean decreases the mean. Removing a value below the mean increases the mean.
Computing Mean From Frequency Tables and Histograms
The SAT often presents data in a frequency table or histogram rather than as a list of values. Computing the mean from these formats requires a specific approach.
Worked Example:
Score: 60, 70, 80, 90, 100
Frequency: 3, 5, 8, 6, 3
Mean = (60*3 + 70*5 + 80*8 + 90*6 + 100*3) / (3 + 5 + 8 + 6 + 3)
= (180 + 350 + 640 + 540 + 300) / 25
= 2,010 / 25 = 80.4
Multiply each value by its frequency, sum the products, and divide by the total frequency.
For histograms, use the midpoint of each interval as the representative value, multiply by the frequency (height of the bar), and proceed as above. This gives an approximation since you do not know the exact values within each interval.
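The multiply-sum-divide procedure for a frequency table can be sketched in Python using the score data from the example above. The variable names are illustrative.

```python
scores = [60, 70, 80, 90, 100]
freqs = [3, 5, 8, 6, 3]

# Multiply each value by its frequency and add the products
total_points = sum(score * freq for score, freq in zip(scores, freqs))

# Total number of data points
count = sum(freqs)

mean_score = total_points / count
print(total_points, count, mean_score)
```

For a histogram, the same code would apply with each interval's midpoint standing in for the score, which yields an approximation rather than an exact mean.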
Median
The median is the middle value when data is arranged in order. If the number of values is odd, the median is the single middle value. If the number is even, the median is the average of the two middle values.
Worked Example (Odd Number of Values):
Find the median of: 7, 3, 12, 5, 9
Arrange in order: 3, 5, 7, 9, 12
Middle value: 7
Median = 7
Worked Example (Even Number of Values):
Find the median of: 4, 8, 11, 15, 20, 26
Arrange in order (already sorted). Two middle values: 11 and 15.
Median = (11 + 15) / 2 = 13
Worked Example (From a Frequency Table):
A survey of 25 households recorded the number of pets per household:
Pets: 0, 1, 2, 3, 4
Frequency: 5, 8, 6, 4, 2
Total households: 5 + 8 + 6 + 4 + 2 = 25. The median is the 13th value (the middle of 25).
Count from the beginning: positions 1-5 have 0 pets, positions 6-13 have 1 pet. The 13th value is 1.
Median = 1 pet.
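The counting procedure above, walking through cumulative frequencies until the middle position is reached, can be sketched in Python. The helper median_from_frequencies is hypothetical and, to stay short, assumes an odd total count and values listed in ascending order.

```python
from statistics import median

def median_from_frequencies(values, freqs):
    """Median of a frequency table (odd total count, values sorted ascending)."""
    middle_position = (sum(freqs) + 1) // 2   # 13th value when there are 25
    running = 0
    for value, freq in zip(values, freqs):
        running += freq                       # cumulative count so far
        if running >= middle_position:
            return value

pets = [0, 1, 2, 3, 4]
households = [5, 8, 6, 4, 2]
median_pets = median_from_frequencies(pets, households)

# Cross-check by expanding the table into a full list of 25 values
expanded = [p for p, f in zip(pets, households) for _ in range(f)]
print(median_pets, median(expanded))
```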
Mode
The mode is the value that appears most frequently. A dataset can have no mode (all values appear once), one mode, or multiple modes.
The SAT rarely asks for the mode directly but might reference it in the context of comparing statistical measures.
Choosing the Appropriate Measure
The SAT tests whether you understand when each measure of center is most appropriate.
The mean is best when the data is roughly symmetric and has no extreme outliers. It uses every data point and is the most commonly cited average.
The median is best when the data is skewed or has extreme outliers. The median is resistant to outliers because it depends only on the middle position, not on the actual values of extreme data points.
Worked Example:
The salaries at a small company are: $35,000, $38,000, $40,000, $42,000, $45,000, and $500,000 (the CEO).
Mean: ($35,000 + $38,000 + $40,000 + $42,000 + $45,000 + $500,000) / 6 = $700,000 / 6 ≈ $116,667
Median: ($40,000 + $42,000) / 2 = $41,000
The mean ($116,667) is heavily influenced by the CEO’s salary and does not represent a typical employee’s salary. The median ($41,000) is a much better representation of a typical salary at this company.
SAT Question Pattern: “Which measure of center best represents the typical value in this dataset?” If the data has outliers or is skewed, the answer is the median. If the data is roughly symmetric, the mean and median will be similar, and either is appropriate.
The Effect of Outliers
An outlier is a data point that is significantly different from the rest of the data. The SAT tests your understanding of how adding or removing an outlier affects the mean and median.
Effect on the mean: Adding a high outlier increases the mean. Adding a low outlier decreases the mean. The effect can be substantial because the mean incorporates every data value.
Effect on the median: Adding an outlier has minimal or no effect on the median because the median depends on position, not value. In a dataset of 20 values, the median is determined by the 10th and 11th values; changing the largest or smallest value does not affect those positions.
Worked Example:
Dataset: 10, 12, 14, 16, 18. Mean = 14. Median = 14.
Add outlier 100: Dataset becomes 10, 12, 14, 16, 18, 100. New mean = 170/6 ≈ 28.3. New median = (14 + 16)/2 = 15.
The mean jumped from 14 to 28.3 (a massive change), while the median only shifted from 14 to 15 (a minimal change). This illustrates why the median is resistant to outliers.
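The outlier example above can be reproduced with Python's standard statistics module. This is a minimal sketch using the same dataset.

```python
from statistics import mean, median

data = [10, 12, 14, 16, 18]
with_outlier = data + [100]

mean_before, median_before = mean(data), median(data)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean shifts by roughly 14 points; the median shifts by only 1
print(mean_before, median_before)
print(round(mean_after, 1), median_after)
```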
Statistics: Measures of Spread
Measures of spread describe how dispersed the data is. The SAT tests two measures: range and standard deviation.
Range
The range is the difference between the maximum and minimum values: range = max - min.
Worked Example:
Dataset: 5, 8, 12, 15, 22. Range = 22 - 5 = 17.
The range is simple to calculate but sensitive to outliers. A single extreme value can dramatically increase the range without changing the overall distribution of the data.
Standard Deviation (Conceptual)
The SAT never asks you to calculate a standard deviation. Instead, it tests your conceptual understanding of what standard deviation measures and how it compares between datasets.
Standard deviation measures how spread out the data values are from the mean. A larger standard deviation means the data points are more dispersed. A smaller standard deviation means the data points are more clustered around the mean.
Key Conceptual Points:
If all values in a dataset are identical (like 5, 5, 5, 5, 5), the standard deviation is 0 because there is no spread.
Adding a constant to every value in the dataset shifts the mean but does not change the standard deviation (the spread remains the same).
Multiplying every value by a constant multiplies the standard deviation by the absolute value of that constant.
If a dataset has values tightly clustered around the mean, its standard deviation is small. If the values are widely spread, the standard deviation is large.
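Although the SAT never asks for the calculation, the key points above can be verified numerically. The sketch below assumes the population standard deviation (statistics.pstdev); the same shift and scale behavior holds for the sample version. The dataset and the constants 10 and -3 are arbitrary choices for illustration.

```python
from math import isclose
from statistics import pstdev

data = [5, 8, 12, 15, 22]
shifted = [x + 10 for x in data]  # add a constant to every value
scaled = [-3 * x for x in data]   # multiply every value by a constant

sd = pstdev(data)
zero_spread = pstdev([5, 5, 5, 5, 5])               # identical values
shift_unchanged = isclose(pstdev(shifted), sd)      # spread is unchanged
scale_matches = isclose(pstdev(scaled), abs(-3) * sd)  # spread scales by |c|

print(zero_spread, shift_unchanged, scale_matches)
```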
Comparing Distributions
The SAT might present two datasets (in tables, dot plots, histograms, or verbal descriptions) and ask you to compare their centers and spreads.
Worked Example:
Dataset A: 48, 49, 50, 51, 52 (mean = 50, tightly clustered)
Dataset B: 30, 40, 50, 60, 70 (mean = 50, widely spread)
Both datasets have the same mean (50), but Dataset B has a larger standard deviation because the values are more spread out from the mean.
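To see the size of the difference, here is a short sketch that computes both values. It assumes the population standard deviation; the SAT itself only expects the visual comparison.

```python
from statistics import mean, pstdev

dataset_a = [48, 49, 50, 51, 52]
dataset_b = [30, 40, 50, 60, 70]

mean_a, mean_b = mean(dataset_a), mean(dataset_b)
sd_a, sd_b = pstdev(dataset_a), pstdev(dataset_b)

# Same center, very different spread.
print(f"A: mean={mean_a}, sd={sd_a:.2f}")
print(f"B: mean={mean_b}, sd={sd_b:.2f}")
```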
Worked Example (Visual Comparison):
Two dot plots are shown. Plot A has most dots clustered between 8 and 12 with a few at 5 and 15. Plot B has dots spread fairly evenly from 2 to 18.
Plot B has a larger standard deviation because the data is more evenly dispersed across a wider range.
SAT Question Pattern: “Which dataset has a greater standard deviation?” Look at the spread of the data, not the center. Two datasets can have the same mean but very different standard deviations, or different means but the same standard deviation.
Probability
Probability questions on the SAT are based on counting and ratios, not complex probability theory. The fundamental concept is:
Probability = (number of favorable outcomes) / (total number of possible outcomes)
Probability is always between 0 and 1 (or 0% and 100%). A probability of 0 means the event is impossible. A probability of 1 means the event is certain.
Basic Probability
Worked Example:
A bag contains 4 red marbles, 3 blue marbles, and 5 green marbles. What is the probability of randomly selecting a blue marble?
Total marbles: 4 + 3 + 5 = 12
P(blue) = 3/12 = 1/4 = 0.25
Worked Example (Complement):
Using the same bag, what is the probability of NOT selecting a green marble?
P(not green) = 1 - P(green) = 1 - 5/12 = 7/12
The complement rule (P(not A) = 1 - P(A)) is useful when it is easier to calculate the probability of the event not happening.
Worked Example (Multiple Conditions):
A bag contains 4 red, 3 blue, and 5 green marbles. If two marbles are selected one after another without replacement, what is the probability that both are red?
P(first red) = 4/12 = 1/3
P(second red | first was red) = 3/11 (one red marble has been removed)
P(both red) = (4/12) * (3/11) = 12/132 = 1/11
Worked Example (“At Least One” Problems):
What is the probability of getting at least one head when flipping a coin three times?
The complement of “at least one head” is “no heads” (all tails).
P(all tails) = (1/2)^3 = 1/8
P(at least one head) = 1 - 1/8 = 7/8
The “at least one” pattern is important: it is almost always easier to calculate the complement (none) and subtract from 1 than to calculate all the individual cases directly.
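Both multi-step examples above can be confirmed with exact fractions. The sketch below is illustrative only: it multiplies the two draws for the marble problem, then checks the coin problem two ways, once with the complement and once by listing all eight equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Two reds in a row without replacement: 4 of 12, then 3 of 11.
p_both_red = Fraction(4, 12) * Fraction(3, 11)

# "At least one head" in three flips, using the complement (no heads).
p_complement = 1 - Fraction(1, 2) ** 3

# The same probability by direct enumeration of HHH, HHT, ..., TTT.
outcomes = list(product("HT", repeat=3))
favorable = [o for o in outcomes if "H" in o]
p_enumerated = Fraction(len(favorable), len(outcomes))

print(p_both_red, p_complement, p_enumerated)
```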
Worked Example (Probability From a Graph):
A histogram shows the distribution of scores on a quiz:
Score 1-2: 5 students
Score 3-4: 12 students
Score 5-6: 18 students
Score 7-8: 10 students
Score 9-10: 5 students
Total students: 50
P(score 5 or above) = (18 + 10 + 5) / 50 = 33/50 = 0.66
P(score below 5) = (5 + 12) / 50 = 17/50 = 0.34
Alternatively: P(below 5) = 1 - P(5 or above) = 1 - 0.66 = 0.34.
Probability From Data
Many SAT probability questions present data in a table or chart and ask you to calculate the probability of a randomly selected item meeting certain criteria.
Worked Example:
A survey of 200 people recorded their favorite season:
Spring: 45, Summer: 70, Fall: 55, Winter: 30
If a person is selected at random from the survey, what is the probability they prefer Fall?
P(Fall) = 55/200 = 11/40 = 0.275
If a person is selected at random and they do not prefer Summer, what is the probability they prefer Spring?
Non-Summer total: 200 - 70 = 130
P(Spring | not Summer) = 45/130 = 9/26 ≈ 0.346
This is a conditional probability: given the condition “not Summer,” we restrict our denominator to only the 130 people who do not prefer Summer.
Worked Example (Expected Value Concept):
A game costs $5 to play. You roll a standard die: if you roll a 6, you win $20. Otherwise, you win nothing. Should you play?
P(win) = 1/6, P(lose) = 5/6
Expected value of playing: (1/6)($20) + (5/6)($0) - $5 = $3.33 - $5 = -$1.67
The expected value is negative, meaning on average you lose $1.67 per game. You should not play if you want to maximize your money over many games.
While the SAT does not use the term “expected value” formally, it sometimes presents scenarios where you need to weigh probabilities against outcomes to make a decision.
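The die game above reduces to a weighted sum. A minimal sketch, using exact fractions so the result is not affected by rounding; the list-of-pairs layout is just one convenient way to organize the outcomes.

```python
from fractions import Fraction

cost = 5
# Each pair is (probability, prize in dollars).
outcomes = [(Fraction(1, 6), 20), (Fraction(5, 6), 0)]

expected_prize = sum(p * prize for p, prize in outcomes)
expected_net = expected_prize - cost  # average result per game after paying

print(f"{float(expected_net):.2f}")
```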
Conditional Probability
Conditional probability is the probability of an event given that another event has already occurred. It is denoted P(A | B), read as “the probability of A given B.”
P(A | B) = P(A and B) / P(B)
In practice on the SAT, conditional probability questions usually involve two-way tables, where you restrict your attention to a specific row or column.
Worked Example:
A two-way table shows students’ grade (10th or 11th) and lunch choice (pizza or salad):
| | Pizza | Salad | Total |
| 10th | 45 | 30 | 75 |
| 11th | 35 | 40 | 75 |
| Total | 80 | 70 | 150 |
What is the probability that a randomly selected student chose pizza, given that they are in 10th grade?
P(Pizza | 10th grade) = 45/75 = 3/5 = 0.60
Notice that the denominator is 75 (the total number of 10th graders), NOT 150 (the overall total). The “given” condition restricts you to only the 10th graders.
What is the probability that a randomly selected student is in 11th grade, given that they chose salad?
P(11th | Salad) = 40/70 = 4/7 ≈ 0.571
The denominator is 70 (total salad choosers), because the given condition is “chose salad.”
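The "restrict to a row or column" idea can be sketched in code. The nested dictionary and the two helper functions below are hypothetical names chosen for illustration; the point is only that the denominator is the row total for a "given grade" question and the column total for a "given choice" question.

```python
from fractions import Fraction

# Counts from the grade and lunch-choice table above.
table = {
    "10th": {"Pizza": 45, "Salad": 30},
    "11th": {"Pizza": 35, "Salad": 40},
}

def p_choice_given_grade(choice, grade):
    """Condition on a grade: the denominator is that grade's row total."""
    row = table[grade]
    return Fraction(row[choice], sum(row.values()))

def p_grade_given_choice(grade, choice):
    """Condition on a lunch choice: the denominator is that column's total."""
    column_total = sum(row[choice] for row in table.values())
    return Fraction(table[grade][choice], column_total)

print(p_choice_given_grade("Pizza", "10th"))
print(p_grade_given_choice("11th", "Salad"))
```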
Independence
Two events are independent if the occurrence of one does not affect the probability of the other. Mathematically, A and B are independent if P(A | B) = P(A), or equivalently, P(B | A) = P(B).
Worked Example (Testing for Independence):
Using the table above, is grade independent of lunch choice?
P(Pizza) = 80/150 ≈ 0.533
P(Pizza | 10th) = 45/75 = 0.60
Since P(Pizza | 10th) is not equal to P(Pizza), the events are not independent. Being in 10th grade affects the probability of choosing pizza.
If the events were independent, every conditional probability would equal the corresponding marginal probability. On the SAT, you demonstrate independence by showing this equality holds, or demonstrate dependence by showing it does not.
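The independence check is a single comparison between a marginal and a conditional probability. The sketch below repeats the counts from the grade and lunch table so that it runs on its own; the dictionary keyed by (grade, choice) is just one possible layout.

```python
from fractions import Fraction

counts = {
    ("10th", "Pizza"): 45, ("10th", "Salad"): 30,
    ("11th", "Pizza"): 35, ("11th", "Salad"): 40,
}
grand_total = sum(counts.values())

pizza_total = sum(n for (_, choice), n in counts.items() if choice == "Pizza")
tenth_total = sum(n for (grade, _), n in counts.items() if grade == "10th")

p_pizza = Fraction(pizza_total, grand_total)                       # marginal
p_pizza_given_10th = Fraction(counts[("10th", "Pizza")], tenth_total)  # conditional

# Independent only if the two probabilities are exactly equal.
independent = p_pizza == p_pizza_given_10th
print(p_pizza, p_pizza_given_10th, independent)
```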
Two-Way Frequency Tables
Two-way tables are one of the most frequently tested data display types on the SAT. They organize data into categories along two dimensions.
Reading Two-Way Tables
Worked Example (Complete Table):
A survey asked 300 adults about their exercise habits and sleep quality:
| | Good Sleep | Poor Sleep | Total |
| Exercise Regularly | 120 | 30 | 150 |
| Don’t Exercise | 60 | 90 | 150 |
| Total | 180 | 120 | 300 |
From this table, you can extract many pieces of information:
120 people exercise regularly and report good sleep.
90 people do not exercise and report poor sleep.
150 people exercise regularly (regardless of sleep quality).
180 people report good sleep (regardless of exercise).
Joint, Marginal, and Conditional Frequencies
Joint frequency: The count in a specific cell (the intersection of a row and column). Example: 120 people exercise regularly AND have good sleep.
Marginal frequency: The total for a row or column (found in the margins of the table). Example: 150 people exercise regularly (total for that row).
Conditional frequency: The frequency within a specific subgroup. Example: Of the 150 people who exercise regularly, 120 have good sleep. The conditional frequency of good sleep among regular exercisers is 120/150 = 80%.
Relative Frequency
Relative frequency expresses a count as a proportion of a total. It can be calculated relative to the grand total, a row total, or a column total.
Relative to grand total: 120/300 = 40% of all respondents exercise and have good sleep.
Relative to row total: 120/150 = 80% of regular exercisers have good sleep.
Relative to column total: 120/180 = 66.7% of those with good sleep are regular exercisers.
The SAT tests whether you can identify the correct total to use as the denominator. A question asking “What proportion of people with good sleep exercise regularly?” uses the column total (180) as the denominator, not the grand total or the row total.
Common Two-Way Table Question Types
The SAT asks several distinct types of questions about two-way tables. Knowing what each type asks helps you identify the correct calculation immediately.
Type 1: Simple probability from the table. “If a person is selected at random, what is the probability they exercise regularly?” This uses the grand total as the denominator: 150/300 = 0.50.
Type 2: Conditional probability. “If a person is selected at random from those who have good sleep, what is the probability they exercise regularly?” This restricts to the “Good Sleep” column: 120/180 = 2/3. The denominator is the column total, not the grand total.
Type 3: Joint probability. “What is the probability that a randomly selected person exercises regularly AND has poor sleep?” This uses a specific cell divided by the grand total: 30/300 = 1/10.
Type 4: Relative frequency within a row. “What fraction of people who exercise regularly have good sleep?” This restricts to the “Exercise Regularly” row: 120/150 = 4/5. The denominator is the row total.
Type 5: Comparing groups. “Is a higher proportion of exercisers or non-exercisers reporting good sleep?” Exercisers with good sleep: 120/150 = 80%. Non-exercisers with good sleep: 60/150 = 40%. A higher proportion of exercisers report good sleep.
The critical skill is identifying which total serves as the denominator. The words “given that,” “among,” “of those who,” and “if selected from” all signal conditional probability, where the denominator is a subgroup total rather than the grand total.
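The first four question types differ only in which total goes in the denominator. The following sketch makes that explicit using the exercise and sleep counts; the helper names row_total and col_total are hypothetical.

```python
from fractions import Fraction

counts = {
    ("Exercise", "Good"): 120, ("Exercise", "Poor"): 30,
    ("No exercise", "Good"): 60, ("No exercise", "Poor"): 90,
}
grand_total = sum(counts.values())

def row_total(row):
    return sum(n for (r, _), n in counts.items() if r == row)

def col_total(col):
    return sum(n for (_, c), n in counts.items() if c == col)

# Type 1: simple probability, denominator = grand total.
p_exercise = Fraction(row_total("Exercise"), grand_total)
# Type 2: conditional on good sleep, denominator = column total.
p_exercise_given_good = Fraction(counts[("Exercise", "Good")], col_total("Good"))
# Type 3: joint probability, one cell over the grand total.
p_exercise_and_poor = Fraction(counts[("Exercise", "Poor")], grand_total)
# Type 4: relative frequency within a row, denominator = row total.
good_among_exercisers = Fraction(counts[("Exercise", "Good")], row_total("Exercise"))

print(p_exercise, p_exercise_given_good, p_exercise_and_poor, good_among_exercisers)
```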
Constructing Two-Way Tables From Information
Some SAT questions provide verbal information and ask you to construct or complete a two-way table, then answer questions from it.
Worked Example:
A school has 400 students. 60% are in the science club. Of those in the science club, 75% also participate in math club. Of those NOT in the science club, 40% participate in math club. Construct the two-way table.
Science club: 0.60 * 400 = 240 students
Not in science club: 400 - 240 = 160 students
Science AND math: 0.75 * 240 = 180
Science but NOT math: 240 - 180 = 60
Not science AND math: 0.40 * 160 = 64
Not science and NOT math: 160 - 64 = 96
| | Math Club | Not Math | Total |
| Science Club | 180 | 60 | 240 |
| Not Science | 64 | 96 | 160 |
| Total | 244 | 156 | 400 |
Now you can answer any question about this table. For example: What fraction of math club members are also in the science club? 180/244 = 45/61 ≈ 73.8%.
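The construction steps above translate directly into arithmetic. This sketch uses whole-number percentages with integer division, which assumes every percentage yields a whole number of students (true for this example).

```python
from fractions import Fraction

total = 400
science = total * 60 // 100                 # 60% of all students
not_science = total - science
science_and_math = science * 75 // 100      # 75% of science club members
science_not_math = science - science_and_math
math_not_science = not_science * 40 // 100  # 40% of non-members
neither = not_science - math_not_science

math_total = science_and_math + math_not_science
share_of_math_in_science = Fraction(science_and_math, math_total)

print(science_and_math, science_not_math, math_not_science, neither)
print(math_total, share_of_math_in_science)
```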
Missing Value Problems
The SAT sometimes presents a partially completed two-way table and asks you to find missing values using the relationships between cells, row totals, and column totals.
Strategy: Remember that every row total equals the sum of the cells in that row, every column total equals the sum of the cells in that column, and the grand total equals the sum of all row totals (or all column totals). Use these relationships to set up equations and solve for missing values.
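As an illustration of this strategy, suppose the exercise and sleep table from earlier were shown with only four entries filled in. This partially completed version is a hypothetical constructed for this sketch, not an actual SAT item. Each missing value follows from one subtraction.

```python
# Known entries (hypothetical partially completed table).
exercise_good = 120
exercise_total = 150
good_total = 180
grand_total = 300

# Row relationship: cells in a row sum to the row total.
exercise_poor = exercise_total - exercise_good
# Grand total relationship: row totals sum to the grand total.
no_exercise_total = grand_total - exercise_total
# Column relationship: cells in a column sum to the column total.
no_exercise_good = good_total - exercise_good
poor_total = grand_total - good_total
no_exercise_poor = no_exercise_total - no_exercise_good

# Cross-check the last cell using its column instead of its row.
assert no_exercise_poor == poor_total - exercise_poor
print(exercise_poor, no_exercise_good, no_exercise_poor)
```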
How This Domain Connects to Real-World Decision Making
Problem-Solving and Data Analysis is the most practically applicable SAT domain. The skills you build here transfer directly to everyday decision-making and professional contexts.
Understanding percentages helps you evaluate financial products (comparing interest rates, understanding loan terms, calculating discounts and taxes), interpret news (unemployment rose by 2 percentage points versus 2 percent, which are very different things), and make informed purchasing decisions.
Understanding statistics helps you evaluate claims in media and advertising (is that “clinically proven” product backed by rigorous evidence or a small, biased study?), interpret medical information (what does it mean when a test has a 5% false positive rate?), and make data-driven decisions at work.
Understanding probability helps you assess risk (should you buy insurance for a specific scenario?), evaluate games and strategies (is this investment worth the risk?), and understand randomness (why does a “hot streak” in basketball not prove a player’s shot percentage has changed?).
Understanding study design helps you be a critical consumer of information. When you read that “a study found coffee prevents cancer,” you can ask: Was it an observational study or an experiment? How was the sample selected? Were confounding variables controlled? This critical thinking is one of the most valuable intellectual skills the SAT tests.
Strategies for Reading Data Displays Accurately
Data interpretation errors are among the most preventable mistakes on the SAT. Here is a systematic approach for reading any data display.
Step 1: Read the title. The title tells you what the data represents. Do not skip it.
Step 2: Read the axis labels and units. Identify what each axis measures and what units are used. Pay special attention to labels like “in thousands” or “per capita” that change the scale.
Step 3: Check the scale. Does the axis start at zero or at some other value? A y-axis that starts at 50 instead of 0 can make small differences appear dramatic.
Step 4: Identify what the question asks. Before reading specific values, know exactly what information you need.
Step 5: Read the specific values carefully. Use a finger or pencil to trace from the data point to the axis to read values accurately.
Step 6: Sanity-check your answer. Does the answer make sense in context? If you calculated that 110% of respondents prefer coffee, something is wrong.
This systematic approach adds about 10 seconds per question but dramatically reduces misreading errors, which are among the most frustrating types of mistakes because you knew the math but read the graph wrong.
Data Displays: Reading and Interpreting
The SAT presents data in multiple formats. For each type, you need to be able to read values accurately, identify trends, and answer questions that require interpretation.
Bar Graphs and Histograms
Bar graphs display categorical data as rectangular bars whose heights represent frequencies or values. Each bar represents a distinct category.
Histograms look similar to bar graphs but display continuous numerical data grouped into intervals (bins). The bars touch each other because the data is continuous. The height of each bar represents the frequency of values in that interval.
Key Differences:
Bar graphs: categories on the x-axis, bars do not touch, order of bars can change without affecting meaning.
Histograms: numerical intervals on the x-axis, bars touch, order is fixed (numerical sequence).
Common Mistakes:
Reading the wrong axis (confusing the category axis with the frequency axis).
Misreading the scale (if the y-axis starts at a value other than zero, differences between bars appear exaggerated).
For histograms, confusing the height of a bar with the individual data values (the height represents how many values fall in the interval, not the values themselves).
Line Graphs
Line graphs display data points connected by lines, typically showing how a quantity changes over time. The x-axis usually represents time and the y-axis represents the measured quantity.
When interpreting line graphs, focus on the overall trend (increasing, decreasing, or flat), the rate of change (steepness of the line), and specific values at particular points.
SAT Question Pattern: “Between which two consecutive time periods did the value increase the most?” Look for the steepest upward segment of the line.
Dot Plots
Dot plots display individual data points as dots above a number line. Each dot represents one observation. They are useful for small datasets and allow you to see the distribution shape, identify the mode, and spot outliers.
Box Plots (Box-and-Whisker Plots)
Box plots summarize a dataset using five key values: the minimum, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum.
The “box” extends from Q1 to Q3, and the line inside the box marks the median. The “whiskers” extend from Q1 to the minimum and from Q3 to the maximum.
Key Values You Can Read From a Box Plot:
Minimum: The left end of the left whisker.
Q1: The left edge of the box.
Median: The line inside the box.
Q3: The right edge of the box.
Maximum: The right end of the right whisker.
Interquartile Range (IQR): Q3 - Q1, the width of the box.
What You Cannot Read From a Box Plot:
The mean (box plots do not show the mean).
The exact data values (you know the five-number summary but not individual points).
The number of data points (unless separately stated).
Worked Example:
A box plot shows: minimum = 20, Q1 = 35, median = 50, Q3 = 65, maximum = 90.
What is the IQR? 65 - 35 = 30.
What is the range? 90 - 20 = 70.
What percentage of the data falls between 35 and 65? 50% (the interval from Q1 to Q3 always contains the middle 50% of the data).
What percentage of the data is below 50? 50% (by definition, the median splits the data in half).
Scatter Plots and Lines of Best Fit
Scatter plots display two-variable data as points on a coordinate plane. They show the relationship (if any) between the two variables.
Correlation Types:
Positive correlation: as one variable increases, the other tends to increase (points trend upward from left to right).
Negative correlation: as one variable increases, the other tends to decrease (points trend downward from left to right).
No correlation: no clear trend between the variables (points are randomly scattered).
Lines of Best Fit:
A line of best fit (also called a regression line or trend line) is the straight line that best represents the general trend of the data. The SAT tests your ability to interpret the slope and y-intercept of this line in context.
Worked Example:
A scatter plot shows the relationship between hours studied (x-axis) and exam score (y-axis). The line of best fit is y = 5.8x + 42.
Slope interpretation: For each additional hour studied, the predicted exam score increases by 5.8 points.
Y-intercept interpretation: A student who studies 0 hours would be predicted to score 42 on the exam.
Prediction: What score would be predicted for a student who studies 7 hours? y = 5.8(7) + 42 = 40.6 + 42 = 82.6.
Residuals (Conceptual): A residual is the difference between an actual data point and the predicted value from the line of best fit. Residual = actual - predicted. If a student who studied 7 hours actually scored 88, the residual is 88 - 82.6 = 5.4 (the student scored 5.4 points above the prediction).
A positive residual means the actual value is above the line. A negative residual means the actual value is below the line. A residual of zero means the point is exactly on the line.
Correlation Strength: The SAT might describe correlation as strong, moderate, or weak. A strong correlation means the data points cluster tightly around the trend line. A weak correlation means the points are more scattered. The correlation coefficient (r) quantifies this: r close to 1 or -1 indicates strong correlation, r close to 0 indicates weak or no correlation. You will not need to calculate r, but you should be able to identify strong versus weak correlation visually.
Nonlinear Trends: Not all scatter plots show linear relationships. Some show curves (exponential, quadratic, logarithmic). If a scatter plot clearly follows a curve, a linear line of best fit would be inappropriate. The SAT might present such a scatter plot and ask whether a linear model is appropriate, or which type of model (linear, exponential, quadratic) best fits the data. Look at the overall shape: if the data curves upward at an increasing rate, exponential is likely. If it follows a U-shape or inverted U, quadratic is likely.
Worked Example (Scatter Plot Interpretation):
A scatter plot shows the relationship between temperature (x-axis) and ice cream sales (y-axis). The line of best fit has the equation y = 15x - 200, where y is sales in dollars and x is temperature in degrees.
What does the slope mean? For each 1-degree increase in temperature, ice cream sales increase by $15.
What does the y-intercept mean? At a temperature of 0 degrees, the model predicts sales of -$200. Since negative sales do not make sense, the y-intercept is not meaningful in this context. This is a common SAT observation: the y-intercept of a regression line is sometimes outside the range of meaningful data and should not be interpreted literally.
If the temperature is 80 degrees, what are the predicted sales? y = 15(80) - 200 = 1200 - 200 = $1,000.
If actual sales at 80 degrees were $1,150, what is the residual? 1,150 - 1,000 = $150 (positive residual, above the prediction).
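Predictions and residuals for both regression examples in this section follow the same two formulas. The sketch below wraps them in small helper functions; make_linear_model and residual are hypothetical names used only for illustration.

```python
def make_linear_model(slope, intercept):
    """Return a function that predicts y from x using y = slope * x + intercept."""
    def predict(x):
        return slope * x + intercept
    return predict

def residual(actual, predicted):
    """Residual = actual - predicted (positive means above the line)."""
    return actual - predicted

exam_score = make_linear_model(5.8, 42)        # hours studied -> exam score
ice_cream_sales = make_linear_model(15, -200)  # temperature -> sales in dollars

predicted_score = exam_score(7)
predicted_sales = ice_cream_sales(80)

print(f"{predicted_score:.1f}, residual {residual(88, predicted_score):.1f}")
print(f"{predicted_sales}, residual {residual(1150, predicted_sales)}")
```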
Advanced Box Plot Interpretation
The SAT sometimes presents two box plots side by side and asks you to compare the distributions.
Worked Example:
Box Plot A: min=20, Q1=30, median=45, Q3=55, max=70
Box Plot B: min=10, Q1=35, median=50, Q3=60, max=90
Comparing centers: Distribution B has a higher median (50 vs 45).
Comparing spreads: Distribution B has a larger range (80 vs 50), but the two IQRs are equal (both 25). The larger range in B comes from its more extreme minimum and maximum values, not from greater spread in the middle 50% of the data.
What percentage of values in Distribution A fall between 30 and 55? About 50% (the interval from Q1 to Q3 contains the middle half of the data).
What percentage of values in Distribution A are above 45? About 50% (the median splits the data in half).
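A side-by-side comparison like this one only needs the five-number summaries. The sketch below stores each summary in a small named tuple (a hypothetical structure chosen for this illustration) and derives the IQR and range from it.

```python
from typing import NamedTuple

class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self) -> float:
        """Width of the box: Q3 - Q1."""
        return self.q3 - self.q1

    @property
    def value_range(self) -> float:
        """Whisker tip to whisker tip: max - min."""
        return self.maximum - self.minimum

plot_a = FiveNumberSummary(20, 30, 45, 55, 70)
plot_b = FiveNumberSummary(10, 35, 50, 60, 90)

for name, plot in (("A", plot_a), ("B", plot_b)):
    print(name, "median:", plot.median, "IQR:", plot.iqr, "range:", plot.value_range)
```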
Common Box Plot Mistake: Assuming that a wider box means more data points. The width of the box (IQR) represents the spread of the middle 50%, not the quantity of data. Two box plots can have the same number of data points but very different IQRs.
Another Common Mistake: Trying to determine the mean from a box plot. Box plots show the median, not the mean. If a distribution is skewed (the median is not centered in the box), the mean and median differ, and the box plot cannot tell you the mean.
Interpreting Histograms in Detail
Histograms are particularly important because they reveal the shape of a distribution, which informs decisions about appropriate statistical measures.
Distribution Shapes:
Symmetric (bell-shaped): The histogram is roughly mirror-image on both sides of the center. Mean and median are approximately equal. Both are appropriate measures of center.
Right-skewed (positively skewed): The histogram has a long tail extending to the right. The mean is pulled above the median by the extreme high values. The median is a better measure of center.
Left-skewed (negatively skewed): The histogram has a long tail extending to the left. The mean is pulled below the median by the extreme low values. The median is a better measure of center.
Uniform: All bars are approximately the same height. Values are evenly distributed across the range.
Bimodal: The histogram has two distinct peaks. This might indicate two subgroups within the data.
Worked Example (Histogram Analysis):
A histogram of test scores shows:
50-59: 2 students
60-69: 5 students
70-79: 12 students
80-89: 15 students
90-99: 6 students
Total: 40 students. The distribution is left-skewed (the longer tail extends toward lower scores, and most values are clustered at the higher end).
Approximately what percent scored below 70? (2 + 5)/40 = 7/40 = 17.5%.
What is the modal class (the interval with the highest frequency)? 80-89, with 15 students.
Can you determine the exact median from this histogram? No, only an approximate range. The median is the average of the 20th and 21st values. Counting cumulatively: positions 1-2 are in 50-59, positions 3-7 are in 60-69, positions 8-19 are in 70-79, positions 20-34 are in 80-89. Both the 20th and 21st values fall in the 80-89 interval, so the median is in this interval (but we cannot determine the exact value without the individual data points).
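The cumulative counting used to locate the median can be sketched as a short loop. The function name bin_containing is hypothetical, and the middle-position logic below assumes an even number of data points, as in this example.

```python
bins = [("50-59", 2), ("60-69", 5), ("70-79", 12), ("80-89", 15), ("90-99", 6)]
total = sum(count for _, count in bins)

def bin_containing(position):
    """Return the label of the bin holding the value at a 1-based sorted position."""
    running = 0
    for label, count in bins:
        running += count
        if position <= running:
            return label
    raise ValueError("position exceeds the number of data points")

# With 40 values, the median averages the 20th and 21st values.
lower_middle, upper_middle = total // 2, total // 2 + 1
median_bins = {bin_containing(lower_middle), bin_containing(upper_middle)}

percent_below_70 = 100 * (2 + 5) / total
print(total, median_bins, percent_below_70)
```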
Tables
Tables present data in rows and columns. Reading tables accurately requires identifying the correct row and column for the information you need and paying attention to units and labels.
Common Mistakes:
Reading the wrong row or column.
Confusing the header labels.
Misinterpreting the units (the table might show values in thousands, meaning 45 in the table represents 45,000).
Data Collection and Study Design
The SAT tests basic concepts about how data is collected and what conclusions can be drawn from different types of studies.
Random Sampling and Generalization
Random sampling means every member of the population has an equal chance of being selected for the study. When a sample is randomly selected from a population, the results can be generalized to that population.
If the sample is not random (for example, if only volunteers respond to a survey), the results may not be representative of the broader population. This limitation on generalizability is frequently tested.
Worked Example:
A researcher wants to understand the eating habits of college students in a state. The researcher surveys 200 students at a single university. Can the results be generalized to all college students in the state?
No. The sample was drawn from only one university, not randomly from all colleges in the state. The results can only be generalized to students at that university (and even then, only if the 200 students were randomly selected within the university).
Random Assignment and Causation
Random assignment means participants in an experiment are randomly placed into treatment and control groups. This is different from random sampling (which is about how participants are selected from the population).
When random assignment is used, differences in outcomes between groups can be attributed to the treatment (cause and effect). Without random assignment, you can only identify associations, not causal relationships.
Worked Example:
A study randomly assigns 100 participants to either a new study method or a traditional study method. The new method group scores 15% higher on a subsequent test. Can we conclude the new method caused the improvement?
Yes. Because participants were randomly assigned, the groups should be similar in all other respects. Any difference in outcomes can be attributed to the different study methods. This is a true experiment with random assignment.
If instead, participants chose which method to use (no random assignment), perhaps more motivated students chose the new method, and the 15% difference could be due to motivation rather than the method itself. Without random assignment, we can only say the new method is associated with higher scores, not that it caused them.
Observational Studies vs Experiments
An observational study collects data without manipulating any variables. The researcher simply observes and records what happens. An experiment involves the researcher deliberately manipulating a variable (the treatment) and observing the effect.
Observational studies can identify associations but not causation. Experiments with random assignment can establish causation.
SAT Question Pattern: A study description is provided, and you must identify which conclusion is valid. The trap answers typically overstate what the study can conclude (claiming causation from an observational study or generalizing beyond the sampled population).
Evaluating Conclusions: A Systematic Approach
The SAT presents study descriptions and asks you to evaluate four possible conclusions. Here is a systematic approach for these questions.
Step 1: Identify the sampling method. Was the sample randomly selected from the population? If yes, results can be generalized to that population. If no, results apply only to the specific group studied.
Step 2: Identify the assignment method. Were participants randomly assigned to groups? If yes, the study can establish cause-and-effect. If no, it can only establish association.
Step 3: Evaluate each answer choice against these two criteria. Eliminate any choice that claims generalization beyond the sampled population or causation without random assignment.
Worked Example:
A researcher randomly selects 200 adults from a city and surveys them about their coffee consumption and sleep quality. The researcher finds that people who drink more coffee report poorer sleep quality.
Which conclusion is valid?
A) Coffee consumption causes poor sleep quality among adults in this city.
B) There is an association between coffee consumption and sleep quality among adults in this city.
C) Coffee consumption causes poor sleep quality among all adults.
D) There is an association between coffee consumption and sleep quality among all adults.
Analysis: The sample was randomly selected from the city, so results can be generalized to adults in that city. However, this is an observational study (no random assignment), so causation cannot be established. The correct answer is B.
Answer A is wrong because there is no random assignment (cannot claim causation). Answer C is wrong for two reasons (no random assignment AND cannot generalize beyond the city). Answer D is wrong because the sample is from one city, not all adults.
Worked Example (True Experiment):
A researcher randomly selects 100 college students from a university and randomly assigns them to two groups. Group A uses a new study technique for two weeks; Group B uses their usual technique. Group A scores significantly higher on a subsequent exam.
Which conclusion is valid?
A) The new technique causes higher exam scores for all college students.
B) The new technique causes higher exam scores for students at this university.
C) The new technique is associated with higher exam scores for all college students.
D) The new technique is associated with higher exam scores for students at this university.
Analysis: The sample was randomly selected from the university, so the results generalize to students at that university but not to all college students. Random assignment was used, so a causal claim is justified. The correct answer is B.
The Three-Step Framework Summary:
Random sampling + random assignment: Causal conclusions generalizable to the population.
Random sampling + no random assignment: Associations generalizable to the population.
No random sampling + random assignment: Causal conclusions limited to the studied group.
No random sampling + no random assignment: Associations limited to the studied group.
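The four combinations above amount to two independent yes/no checks. A minimal sketch of that decision logic follows; the function name and the wording of the returned strings are illustrative choices, not SAT terminology.

```python
def valid_conclusion(random_sampling: bool, random_assignment: bool) -> str:
    """Describe the strongest conclusion a study design supports."""
    # Random assignment decides between causation and association.
    claim = "causal conclusion" if random_assignment else "association only"
    # Random sampling decides how far the result generalizes.
    scope = (
        "generalizable to the sampled population"
        if random_sampling
        else "limited to the group studied"
    )
    return f"{claim}, {scope}"

# The coffee survey: random sample, no random assignment.
print(valid_conclusion(random_sampling=True, random_assignment=False))
# The study-technique experiment: random sample and random assignment.
print(valid_conclusion(random_sampling=True, random_assignment=True))
```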
Confounding Variables
A confounding variable is a third variable that influences both the independent and dependent variables, creating a false impression of a direct relationship. The SAT tests your understanding of confounding.
Worked Example:
A study finds that cities with more libraries also have lower crime rates. Does this mean libraries reduce crime?
Not necessarily. Wealthier cities tend to have both more libraries and lower crime rates. Wealth is a confounding variable that explains both observations. The libraries themselves may have nothing to do with crime reduction.
Worked Example:
Students who eat breakfast score higher on tests. Does breakfast cause higher scores?
Possibly, but confounding variables could include: overall health habits (students who eat breakfast may also get more sleep), socioeconomic status (families that can provide breakfast may also provide other educational advantages), and motivation (students who wake up early enough for breakfast may be more disciplined overall).
Without controlling for these variables through random assignment, we cannot attribute the higher scores to breakfast alone.
Sample Size and Confidence
The SAT tests the conceptual relationship between sample size and the reliability of results.
A larger sample size generally produces more reliable estimates (smaller margin of error, more precise confidence intervals). A smaller sample size produces less reliable estimates (larger margin of error).
However, increasing sample size does not fix bias. A biased sample of 10,000 people is no more representative than a biased sample of 100. The sampling method matters as much as the sample size.
Worked Example:
Poll A surveys 500 randomly selected voters: 52% support the measure (margin of error 4.4%).
Poll B surveys 2,000 randomly selected voters: 49% support the measure (margin of error 2.2%).
Which poll is more reliable? Poll B, because it has a larger sample size and smaller margin of error.
Can we conclude the measure will definitely pass or fail? No. Poll A gives a range of 47.6% to 56.4%. Poll B gives a range of 46.8% to 51.2%. Both ranges include values on both sides of 50%, so neither poll can conclusively predict the outcome.
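The interval check in this example can be sketched in a few lines. The helper names `interval` and `is_conclusive` are hypothetical, chosen only for this illustration.

```python
def interval(percent, margin):
    """Return (low, high) for a poll result and its margin of error, in percent."""
    return (round(percent - margin, 1), round(percent + margin, 1))


def is_conclusive(percent, margin, threshold=50.0):
    """True only if the whole interval lies on one side of the threshold."""
    low, high = interval(percent, margin)
    return low > threshold or high < threshold


poll_a = interval(52, 4.4)  # Poll A: 52% with a 4.4% margin of error
poll_b = interval(49, 2.2)  # Poll B: 49% with a 2.2% margin of error
print(poll_a, is_conclusive(52, 4.4))
print(poll_b, is_conclusive(49, 2.2))
```

Both intervals straddle 50%, so `is_conclusive` returns `False` for each poll.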
Bias in Data Collection
Bias occurs when the data collection method systematically favors certain outcomes over others. The SAT tests several types of bias:
Selection bias: The sample does not represent the population (e.g., surveying only people at a gym about exercise habits).
Response bias: The way questions are worded or the way data is collected influences the responses (e.g., a question like “Don’t you agree that exercise is important?” encourages positive responses).
Nonresponse bias: People who do not respond to a survey may differ systematically from those who do (e.g., a mail survey about internet usage may underrepresent heavy internet users who rarely check or return paper mail).
Voluntary response bias: When people choose to participate, those with strong opinions are overrepresented (e.g., online product reviews tend to be either very positive or very negative).
Margin of Error
The margin of error quantifies the uncertainty in a survey result. The SAT tests it at a conceptual level: you will not calculate a margin of error but need to understand what it means.
If a poll finds that 55% of respondents support a proposal with a margin of error of 3%, this means the true population proportion is likely between 52% and 58%.
Key Conceptual Points:
A larger sample size produces a smaller margin of error (more data means more precision).
A smaller sample size produces a larger margin of error (less data means more uncertainty).
The margin of error describes uncertainty about the population from which the sample was drawn. It says nothing about other populations.
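For background only, since the SAT does not ask you to compute this: the link between sample size and margin of error can be illustrated with a common textbook approximation for a proportion. The sketch below assumes a simple random sample, a 95% confidence level (z of about 1.96), and the worst-case proportion p = 0.5. These assumptions are not stated in this guide, and the function name is hypothetical.

```python
import math


def approx_margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error, in percentage points, for a sample proportion.

    Assumes a simple random sample of size n and a 95% confidence level.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (100, 500, 2000):
    print(n, round(approx_margin_of_error(n), 1))
```

Under these assumptions, samples of 500 and 2,000 give roughly 4.4 and 2.2 percentage points, which matches the two polls in the sample size example. Quadrupling the sample size only halves the margin of error.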
Worked Example:
A poll of 500 voters finds that 48% support Candidate A with a margin of error of 4%. Does this mean Candidate A is losing?
Not necessarily. The true support is likely between 44% and 52%. Since this range includes values both below and above 50%, the poll cannot conclusively determine whether Candidate A is winning or losing. The result is within the margin of error.
Common Traps in This Domain
The wrong denominator trap. Percentage change questions that use the new value instead of the original value as the denominator. Conditional probability questions that use the grand total instead of the subgroup total. Always identify the correct base before calculating.
The correlation-causation trap. A scatter plot shows a positive correlation between ice cream sales and drowning incidents. Does ice cream cause drowning? No. Both are caused by a third variable (hot weather). The SAT tests whether you can distinguish between association and causation.
The misread axis trap. Reading values from the wrong axis or misinterpreting the scale. Always check axis labels and units before reading values.
The “sounds reasonable” trap. In study design questions, an answer choice might sound like a reasonable conclusion in the real world but is not supported by the specific study described. The SAT tests whether you limit your conclusions to what the study’s methodology actually supports.
The outlier effect trap. Questions about how adding or removing a data point affects the mean, median, or range. Students often forget that the median is resistant to outliers while the mean is sensitive to them.
The successive percentage trap. Assuming that a 20% increase followed by a 20% decrease returns to the original value. It does not: 1.20 * 0.80 = 0.96, so the result is 4% below the original.
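Three of these traps are easy to check numerically. The numbers below are made up for illustration and do not come from an actual SAT question.

```python
from statistics import mean, median

# Wrong denominator trap: a value rises from 80 to 100.
old, new = 80, 100
correct_change = (new - old) / old * 100  # base is the original value
trap_change = (new - old) / new * 100     # trap: base is the new value
print(correct_change, trap_change)

# Outlier effect trap: one extreme value moves the mean far more than the median.
data = [10, 12, 13, 15, 20]
with_outlier = data + [90]
print(mean(data), median(data))
print(round(mean(with_outlier), 1), median(with_outlier))

# Successive percentage trap: +20% then -20% does not return to the start.
start = 100
after = round(start * 1.20 * 0.80, 2)
print(after)
```

The correct percent change is 25%, while the trap answer is 20%. The outlier lifts the mean from 14 to about 26.7 but moves the median only from 13 to 14. The value of 100 ends at 96.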
Desmos and Calculator Strategies
For this domain, the standard calculator is generally more useful than the Desmos graphing calculator. Most questions involve arithmetic operations (percentages, proportions, averages) rather than graphing.
Use the calculator for multi-step percentage calculations, unit conversions with large numbers, finding means from large datasets, and verifying proportion setups through cross-multiplication.
Desmos is useful for scatter plot questions where you need to enter data and find a line of best fit, or for visualizing the relationship between two variables. However, this situation is less common than in the Algebra and Advanced Math domains.
Specific Calculator Tips for This Domain:
When computing means from large datasets, use the calculator to sum products (value times frequency) and divide by total count. This is faster and less error-prone than adding long lists of numbers mentally.
For multi-step percentage problems (like successive discounts or tax-then-tip calculations), compute the chain of multiplication factors on the calculator in one continuous calculation rather than computing intermediate values. For example, for a 20% discount then 8% tax on $85: enter 85 * 0.80 * 1.08 = $73.44 in one step rather than finding $68 first and then computing 8% of $68.
For unit conversion chains, enter the entire chain as one calculation. For example, to convert 60 miles per hour to feet per second: 60 * 5280 / 3600 = 88 feet per second. This prevents rounding errors that accumulate when you compute intermediate values.
For probability questions involving fractions, use the calculator to verify that your fraction simplifies correctly or to convert between fraction and decimal form for comparison with answer choices.
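The same one-pass calculations can be written out as a short sketch. The frequency table is invented for illustration. The discount and unit conversion figures are the ones used in the tips above.

```python
# Mean from a frequency table: sum (value * frequency), then divide by the count.
# These scores and frequencies are hypothetical.
frequencies = {70: 3, 80: 5, 90: 2}
total = sum(value * count for value, count in frequencies.items())
n = sum(frequencies.values())
mean_score = total / n

# 20% discount, then 8% tax on $85, as one chain of multipliers.
final_price = round(85 * 0.80 * 1.08, 2)

# 60 miles per hour to feet per second (5280 feet per mile, 3600 seconds per hour).
feet_per_second = 60 * 5280 / 3600

print(mean_score, final_price, feet_per_second)
```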
Score-Level Strategies
Below 550
Focus on ratios, basic percentages, and reading simple data displays (bar graphs, tables). Master the three basic percentage calculations. Practice calculating means and medians. Skip study design and margin of error questions initially. Build comfort with reading graphs and tables accurately by practicing the systematic reading approach described above.
At this level, the most impactful skill to develop is reading data displays without errors. Many below-550 students lose points not because they cannot do the math but because they misread a graph or use the wrong number from a table. Slow down, label what you are reading, and double-check before calculating.
550 to 650
Add percent change (with correct denominator), unit conversions, two-way tables, and basic probability. Practice conditional probability with two-way tables. Learn the conceptual understanding of standard deviation. Begin studying bias and study design concepts.
At this level, focus on the denominator question: for every percentage, probability, or proportion question, consciously identify what the denominator should be before computing. This single habit eliminates the most common error type in this domain.
650 to 750
Master all topics including successive percentage changes, weighted means, conditional probability, box plots, scatter plot interpretation, and study design. Focus on avoiding the traps described above. Practice timed question sets to build speed on data interpretation questions.
At this level, study design questions become important. Learn the three-step framework for evaluating conclusions (sampling method, assignment method, conclusion validity) and practice applying it to novel scenarios. These questions appear in the harder portions of each module and are worth significant points.
750 to 800
These questions should be automatic. Focus on precision: reading axes correctly, using the right denominator, and limiting conclusions to what the data supports. The points you lose here are almost always due to misreading or carelessness, not conceptual gaps.
At this level, develop the habit of spending an extra 5 seconds on every data display question to verify that you have read the correct values. This small time investment pays for itself by preventing the one or two misreading errors per test that can cost 10 to 20 points.
Comprehensive Multi-Step Worked Examples
The hardest questions in this domain combine multiple concepts into a single problem. Practicing multi-step problems builds the integration skills needed for Module 2.
Worked Example (Combining Percentages and Statistics):
A company has 50 employees. The mean salary is $60,000. The company hires 10 new employees at a mean salary of $45,000. What is the new mean salary for all 60 employees?
Total salary before hiring: 50 * $60,000 = $3,000,000
Total salary of new hires: 10 * $45,000 = $450,000
Combined total: $3,000,000 + $450,000 = $3,450,000
New mean: $3,450,000 / 60 = $57,500
If the question then asks: “What is the percent decrease in mean salary?” the answer is (60,000 - 57,500)/60,000 * 100 = 2,500/60,000 * 100 ≈ 4.17%. The base for percent decrease is the original mean ($60,000), not the new mean.
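The combined-mean calculation generalizes to any number of groups. A minimal sketch, where `combined_mean` is a hypothetical helper name:

```python
def combined_mean(groups):
    """groups: list of (count, mean) pairs. Returns the mean of everyone combined."""
    total = sum(count * group_mean for count, group_mean in groups)
    people = sum(count for count, _ in groups)
    return total / people


old_mean = 60_000
new_mean = combined_mean([(50, 60_000), (10, 45_000)])

# Percent decrease uses the ORIGINAL mean as the base.
percent_decrease = (old_mean - new_mean) / old_mean * 100

print(new_mean, round(percent_decrease, 2))
```

The key point is that the new mean is weighted by group size. Simply averaging $60,000 and $45,000 would give $52,500, which is wrong.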
Worked Example (Combining Probability and Two-Way Tables):
A survey of 500 people categorized respondents by age group (under 30 or 30+) and preferred news source (online or print).
| | Online | Print | Total |
| Under 30 | 180 | 45 | 225 |
| 30+ | 120 | 155 | 275 |
| Total | 300 | 200 | 500 |
Part A: What proportion of respondents prefer online news? 300/500 = 0.60
Part B: Among respondents under 30, what proportion prefer online? 180/225 = 0.80
Part C: Among online news preferrers, what proportion are under 30? 180/300 = 0.60
Part D: Are age group and news preference independent? P(online) = 0.60. P(online | under 30) = 0.80. Since 0.80 does not equal 0.60, the events are not independent. Age group is associated with news source preference.
Part E: If a respondent is selected at random from the 30+ age group, what is the probability they prefer print? 155/275 ≈ 0.564.
This single table generates five different questions, each testing a distinct aspect of data analysis. On the SAT, you might see only one question from such a table, but being prepared for all possible question types ensures you can handle whatever is asked.
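All five parts come down to choosing the right denominator, which a short sketch makes explicit. The dictionary layout and variable names are assumptions made for this illustration.

```python
import math

# Counts from the survey table (rows: age group, columns: preferred news source).
table = {
    "under_30": {"online": 180, "print": 45},
    "30_plus": {"online": 120, "print": 155},
}

grand_total = sum(sum(row.values()) for row in table.values())
online_total = sum(row["online"] for row in table.values())
under_30_total = sum(table["under_30"].values())
plus_30_total = sum(table["30_plus"].values())

p_online = online_total / grand_total                               # Part A
p_online_given_under_30 = table["under_30"]["online"] / under_30_total  # Part B
p_under_30_given_online = table["under_30"]["online"] / online_total    # Part C
independent = math.isclose(p_online_given_under_30, p_online)       # Part D
p_print_given_30_plus = table["30_plus"]["print"] / plus_30_total   # Part E

print(p_online, p_online_given_under_30, p_under_30_given_online)
print(independent, round(p_print_given_30_plus, 3))
```

Parts B and C use the same numerator (180) but different denominators (225 and 300). That difference is exactly what conditional probability questions test.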
Worked Example (Combining Rates and Unit Conversions):
A machine produces 840 widgets per hour. Each widget weighs 0.25 kilograms. The widgets are packed into boxes that hold 20 kilograms each.
How many widgets fill one box? 20 kg / 0.25 kg per widget = 80 widgets.
How many boxes does the machine fill per hour? 840 widgets/hour / 80 widgets/box = 10.5 boxes per hour.
If the machine runs for 8 hours per day, how many full boxes are produced daily? 10.5 * 8 = 84 boxes per day.
If each box sells for $150, what is the daily revenue? 84 * $150 = $12,600.
This problem chains four calculations together. Each step is simple, but the overall problem requires careful tracking of units and intermediate values. On the SAT, you might be given the first few pieces of information and asked for the final value.
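Writing the chain with the units in the variable names is one way to keep track of each step. This is a sketch of the same four calculations, and the variable names are illustrative.

```python
import math

widgets_per_hour = 840
kg_per_widget = 0.25
kg_per_box = 20
hours_per_day = 8
dollars_per_box = 150

widgets_per_box = kg_per_box / kg_per_widget         # kg/box divided by kg/widget
boxes_per_hour = widgets_per_hour / widgets_per_box  # widgets/hour divided by widgets/box
boxes_per_day = boxes_per_hour * hours_per_day       # boxes/hour times hours/day
full_boxes_per_day = math.floor(boxes_per_day)       # only completely filled boxes count
daily_revenue = full_boxes_per_day * dollars_per_box

print(widgets_per_box, boxes_per_hour, full_boxes_per_day, daily_revenue)
```

Here the daily total happens to be a whole number, so rounding down changes nothing. With different inputs, a question asking for "full boxes" would require it.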
Worked Example (Combining Scatter Plots and Prediction):
A scatter plot shows the relationship between advertising spending (x, in thousands of dollars) and monthly sales (y, in thousands of units). The line of best fit is y = 0.8x + 2.
Part A: If the company spends $15,000 on advertising (x = 15), what is the predicted monthly sales? y = 0.8(15) + 2 = 14. Predicted sales: 14,000 units.
Part B: If actual sales were 16,000 units when spending $15,000, what is the residual? 16 - 14 = 2 (thousand units). The actual sales exceeded the prediction by 2,000 units.
Part C: According to the model, how much additional advertising spending is needed to increase sales by 4,000 units (4 thousand)? Since the slope is 0.8, each additional thousand dollars of advertising produces 0.8 thousand additional sales. To get 4 thousand more sales: 4/0.8 = 5 thousand dollars of additional advertising ($5,000).
Part D: Is it reasonable to use this model to predict sales if advertising spending is $100,000 (x = 100)? The prediction would be y = 0.8(100) + 2 = 82 thousand units. However, this extrapolation far beyond the range of the original data may not be valid. The linear relationship observed within the data range may not hold at extreme values. On the SAT, the correct answer would note that the prediction is unreliable because it requires extrapolation.
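Parts A through D can be reproduced with a small sketch of the model y = 0.8x + 2. The function name `predict` is an assumption for this example, and both x and y are in thousands, as in the problem.

```python
SLOPE = 0.8
INTERCEPT = 2


def predict(x):
    """Predicted monthly sales (thousands of units) for spending x (thousands of dollars)."""
    return SLOPE * x + INTERCEPT


predicted = round(predict(15), 2)         # Part A
residual = round(16 - predicted, 2)       # Part B: actual minus predicted
extra_spending = round(4 / SLOPE, 2)      # Part C: desired change in y divided by slope
extrapolated = round(predict(100), 2)     # Part D: computable, but far outside the data

print(predicted, residual, extra_spending, extrapolated)
```

The model returns a number for x = 100 just as readily as for x = 15. Nothing in the arithmetic warns you about extrapolation, so that judgment has to come from comparing x with the range of the original data.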
The Complete Study Plan
Week 1: Ratios, Rates, and Percentages
Study proportions, unit rates, percentage calculations, percent change, and successive percentage changes. Practice 20+ questions emphasizing correct setup and avoiding common traps.
Week 2: Statistics
Study mean, median, mode, range, standard deviation (conceptual), outlier effects, and comparing distributions. Practice with dot plots, histograms, and box plots. 20+ questions.
Week 3: Probability and Two-Way Tables
Study basic probability, conditional probability, independence, and all aspects of two-way tables. Practice extracting information from tables and calculating conditional frequencies. 20+ questions.
Week 4: Data Displays and Study Design
Study all data display types, scatter plots, lines of best fit, residuals, random sampling, random assignment, bias, and margin of error. Practice interpreting graphs and evaluating study conclusions. 20+ questions.
Week 5: Integration and Timed Practice
Take mixed question sets under timed conditions. Focus on reading accuracy and avoiding traps. Analyze every error and revisit weak areas.
Ongoing Maintenance
After completing the five-week study plan, continue practicing Problem-Solving and Data Analysis as part of your broader SAT preparation. Include 5 to 8 questions from this domain in each mixed practice session to maintain fluency with data interpretation, percentage calculations, and probability. Pay particular attention to two-way tables and study design questions, as these are the topics most likely to fade without regular practice.
When reviewing practice tests, pay special attention to any data interpretation errors. These errors often feel especially frustrating because the student understood the math but misread the data. Track how often you make reading errors versus calculation errors. If reading errors are your primary weakness, dedicate practice time specifically to the systematic reading approach described in this guide: reading titles, checking axes, verifying scales, and identifying the specific information needed before calculating.
Remember that this domain, despite its smaller question count, offers some of the most learnable points on the SAT. A student who invests focused preparation time in percentages, two-way tables, and data display reading can reliably answer all 5 to 7 questions correctly, adding 50 to 70 points to their Math score with a relatively modest study investment.
Frequently Asked Questions
How many Problem-Solving and Data Analysis questions are on the SAT? Approximately 5 to 7 of the 44 total Math questions come from this domain.
What is the most commonly tested topic in this domain? Percentages (especially percent change) and two-way frequency tables are the most frequently appearing topics.
Do I need to calculate standard deviation on the SAT? No. The SAT only tests standard deviation at a conceptual level. You need to understand what it measures and be able to compare the spread of two datasets, but you will never need to compute it.
What is the most common mistake on percentage change questions? Using the wrong denominator. Percent change uses the original value as the denominator, not the new value. The SAT includes the incorrect answer calculated with the wrong denominator among the choices.
How do I handle conditional probability questions? Identify the “given” condition and restrict your denominator to only the group that satisfies that condition. If the question asks “probability of A given B,” the denominator is the count of B, not the total.
What is the difference between a bar graph and a histogram? Bar graphs display categorical data (separate bars for each category). Histograms display continuous numerical data grouped into intervals (bars touch). Bar order can change; histogram order is fixed.
Can observational studies establish causation? No. Observational studies can identify associations but cannot establish cause-and-effect relationships. Only experiments with random assignment can establish causation.
What does margin of error mean? It quantifies the uncertainty in a survey result. A margin of error of 3% means the true population value is likely within 3 percentage points above or below the survey result.
How do I find the median from a frequency table? Add the frequencies to find the total count. For an odd total, the median is the value at position (total + 1)/2. For an even total, it is the average of the values at positions total/2 and total/2 + 1. With the values in order, count through the frequencies cumulatively until you reach the median position.
What makes a sample biased? A sample is biased when its composition systematically differs from the population. This can happen through selection bias, voluntary response, nonresponse, or non-random sampling.
How do I interpret the slope of a line of best fit? The slope represents the predicted change in the y-variable for each unit increase in the x-variable. Interpret it in the context of the specific variables being measured.
What is a residual? A residual is the difference between an actual data point and the predicted value from the line of best fit (residual = actual - predicted). Positive residuals are above the line; negative residuals are below.
Should I use the mean or median to describe a typical value? Use the median when the data has outliers or is skewed. Use the mean when the data is roughly symmetric with no extreme values.
How do I compare the spread of two datasets? Look at the range and the general dispersion of values. A dataset with values clustered tightly around the center has a smaller standard deviation than one with values spread widely.
What is the complement rule in probability? P(not A) = 1 - P(A). This is useful when it is easier to calculate the probability of the event not happening than the event happening.
How do I know if two events are independent? Two events are independent if P(A|B) = P(A). If the probability of A changes depending on whether B has occurred, the events are not independent.
What is the interquartile range (IQR)? IQR = Q3 - Q1. It measures the spread of the middle 50% of the data. It is more resistant to outliers than the range.
How important is this domain for my overall score? With 5 to 7 questions, it contributes approximately 50 to 70 points to your Math score. These points are highly learnable, making this domain one of the most efficient areas to study for quick improvement.
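As a closing illustration, the median-from-a-frequency-table procedure described in the FAQ above can be sketched as follows. The function name and the sample tables are hypothetical.

```python
def median_from_frequencies(freq):
    """freq: dict mapping value -> frequency. Returns the median of the data."""
    total = sum(freq.values())
    # 1-based positions of the middle value (odd total) or two middle values (even total).
    if total % 2 == 1:
        positions = [(total + 1) // 2]
    else:
        positions = [total // 2, total // 2 + 1]

    found = []
    for position in positions:
        cumulative = 0
        for value in sorted(freq):
            cumulative += freq[value]
            if cumulative >= position:  # this value covers the target position
                found.append(value)
                break
    return sum(found) / len(found)


even_table = {1: 2, 2: 3, 3: 4, 4: 1}  # 10 data points: positions 5 and 6
odd_table = {1: 1, 2: 3, 3: 1}         # 5 data points: position 3
print(median_from_frequencies(even_table), median_from_frequencies(odd_table))
```

In the even-sized table, the 5th value is 2 and the 6th value is 3, so the median is 2.5. In the odd-sized table, the 3rd value is 2.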