A new artificial intelligence system can solve SAT geometry questions as well as the average American 11th-grade student.
This system, called GeoS, combines computer vision to interpret diagrams, natural language processing to read and understand text, and a geometric solver. Together, these components achieve 49 percent accuracy on official SAT test questions.
If these results were extrapolated to the entire Math SAT test, the computer would achieve an SAT score of roughly 500 (out of 800), the average test score for 2015.
Researchers from the Allen Institute for Artificial Intelligence (AI2) and the University of Washington computer science and engineering department shared a paper on the findings at the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) in Lisbon, Portugal.
How GeoS Works
To achieve the results, GeoS solved unaltered SAT questions that it had never seen before and that required an understanding of:
- Implicit relationships
- Ambiguous references
- The relationships between diagrams and natural-language text
“Unlike the Turing Test, standardized tests such as the SAT provide us today with a way to measure a machine’s ability to reason and to compare its abilities with that of a human,” says Oren Etzioni, CEO of AI2.
“Much of what we understand from text and graphics is not explicitly stated, and requires far more knowledge than we appreciate. Creating a system to be able to successfully take these tests is challenging, and we are proud to achieve these unprecedented results.”
“We are excited about GeoS’s performance on real-world tasks,” says Ali Farhadi, senior research manager for Vision at AI2 and UW assistant professor of computer science and engineering. “Our biggest challenge was converting the question to a computer-understandable language. One needs to go beyond standard pattern-matching approaches for problems like solving geometry questions that require in-depth understanding of text, diagram, and reasoning.”
GeoS is the first end-to-end system that solves SAT plane geometry problems. It first interprets a geometry question, using the diagram and text in concert to generate the best possible logical expressions of the problem, and sends these expressions to a geometric solver. It then compares the solver’s answer to the multiple-choice answers for that question.
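The three stages described above (interpret, solve, match against the choices) can be illustrated with a deliberately tiny sketch. This is not GeoS’s code: the function names, the dictionary of facts, and the single hard-coded rule are all hypothetical stand-ins chosen for illustration. The point it shows is that the text alone supplies the radius, the diagram alone supplies the fact that AB is a diameter, and only the combination is solvable.

```python
import math
import re
from typing import Optional


def parse_text(question: str) -> dict:
    """Toy text interpreter (hypothetical): pull 'radius of N' out of the question."""
    match = re.search(r"radius of (\d+(?:\.\d+)?)", question)
    return {"radius": float(match.group(1))} if match else {}


def parse_diagram(diagram: dict) -> dict:
    """Toy diagram interpreter (hypothetical): the diagram is assumed to be
    pre-digested into a dict of detected visual facts."""
    return {"ab_is_diameter": bool(diagram.get("AB_through_center", False))}


def solve(facts: dict) -> Optional[float]:
    """Toy solver with one rule: a diameter is twice the radius.
    Returns None when the facts are insufficient."""
    if "radius" in facts and facts.get("ab_is_diameter"):
        return 2 * facts["radius"]
    return None


def pick_choice(value: Optional[float], choices: dict) -> Optional[str]:
    """Return the label of the choice that matches the solved value, if any."""
    if value is None:
        return None
    for label, candidate in choices.items():
        if math.isclose(value, candidate, abs_tol=1e-6):
            return label
    return None


def answer(question: str, diagram: dict, choices: dict) -> Optional[str]:
    """Combine text and diagram facts, solve, then match to a choice."""
    facts = {**parse_text(question), **parse_diagram(diagram)}
    return pick_choice(solve(facts), choices)


question = "In the figure, circle O has a radius of 5. What is the length of AB?"
choices = {"A": 5.0, "B": 10.0, "C": 25.0}
print(answer(question, {"AB_through_center": True}, choices))
print(answer(question, {}, choices))
```

In this sketch the second call returns no answer, because without the diagram nothing establishes that AB passes through the center.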
This process is complicated by the fact that SAT questions contain many unstated assumptions.
For example, one sample SAT problem leaves several things unstated: that lines BD and AC intersect at E, that “circle O has a radius of 5” means the same as “circle O radius equals 5,” and that the drawing may or may not be to scale.
GeoS had a 96 percent accuracy rate on the questions it was confident enough to answer; knowing when it can answer reliably is an important dimension of learning. Today, GeoS can solve plane geometry questions; AI2 aims to extend it to the full set of SAT math questions within the next three years.
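Accuracy on attempted questions and accuracy over all questions are different measurements, which is why a system can score very high on one and much lower on the other. The sketch below shows the distinction with made-up toy data, not GeoS results. The `evaluate` function and the confidence threshold are hypothetical, and it assumes that an unanswered question simply counts as not solved.

```python
def evaluate(predictions, threshold):
    """Compute (accuracy on attempted questions, accuracy over all questions).

    predictions: list of (confidence, is_correct) pairs, one per question.
    A question is attempted only if its confidence meets the threshold;
    skipped questions are assumed to count as not solved.
    """
    attempted = [is_correct for confidence, is_correct in predictions
                 if confidence >= threshold]
    correct = sum(attempted)
    accuracy_attempted = correct / len(attempted) if attempted else 0.0
    accuracy_overall = correct / len(predictions) if predictions else 0.0
    return accuracy_attempted, accuracy_overall


# Toy data for illustration only: six questions, three above the threshold.
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.3, True), (0.2, False), (0.1, False)]
attempted_acc, overall_acc = evaluate(toy, threshold=0.5)
print(f"attempted: {attempted_acc:.2f}, overall: {overall_acc:.2f}")
```

Raising the threshold generally trades coverage for precision: fewer questions are attempted, but a larger share of the attempts are right.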
As part of AI2’s commitment to sharing its research for the common good, all data sets and software are available for other researchers to use.