BT2450 – The Test

LEGEND
  Correct answer by Chess Tiger
  Correct answer by ChessGenius (ARM)
  Correct answer by ChessGenius (68K)
  Correct answer by PocketChess
  Same & incorrect answer by multiple programs
  ??? = Program crashed while analyzing

             Chess Tiger         ChessGenius (ARM)   ChessGenius (68K)   PocketChess
(BT) ELO     2114                2244                1812                1647

 #  ANSWER   EVAL   TIME         EVAL   TIME         EVAL   TIME         EVAL   TIME
 1  Nxg7     Nxg7    117         Nxg7     36         Qb3     900         Qb3     900
 2  Bxb6     Bxb6    342         Bxb6     53         Bxb6     62         Be3     900
 3  Re6      Re6     331         Re6      28         Re6      87         Nh3     900
 4  Qf7      Rb1     900         Qf7      35         Rf1     900         Qb1     900
 5  Ka6      Ka6      69         Kc5     900         Kb4     900         Bf3+    900
 6  e3       e3      347         e3        0         e3       81         Re8     900
 7  Rd6      Rd6      53         Rd6       0         Rc2     900         Nc4     900
 8  Rxc6+    Rxc6    102         Rxc6+    23         Rxc6+     2         Rxc6+   236
 9  g5       g5        1         cxd4    900         Rf8     900         Bxe5    900
10  Rxg7+    Bh6     900         Rxg7+     1         c4      900         c4      900
11  Qxh2+    Qxh2    536         Qxh2+     1         Qh5     900         Qh5     900
12  Qe4      Qe4      44         Qf5     900         Qf5     900         b6      900
13  Nb4      Ne3     900         Nb4      63         Ne3     900         Qc5     900
14  Rxh7     Rxh7     93         Rxh7      1         Qxb5    900         Qxb5    900
15  Rg6      Rg6       1         Rg6       0         Rg6       1         Rg6     332
16  g6       g6        7         g6        0         g6      327         ???     900
17  Qxf4     Qxf4     68         Qxf4      3         Rxf4    900         Rxf4    900
18  d6       a5      900         d6       18         d6      380         Kxd4    900
19  f3       Bxf1    900         f3       33         Bxf1    900         Bxf1    900
20  Ra2      Ra2     106         Ra2      35         Ng5     900         Ng5     900
21  Re6      Re6     226         Re6       4         Qe4     900         Qe4     900
22  a3       a3       13         a3       35         cxd5    900         Bxf6    900
23  Qf6      Qf6      63         Qf6       2         Qf6     192         Bh6     900
24  g6       Ra1     900         Ra1     900         Ra5     900         Ra5     900
25  Nd3      Kc5     900         Nd3     289         Kc5     900         g2      900
26  f5       f5        2         Re5     900         Qe2     900         f5      546
27  e6       e6      356         e6        2         Qxe4    900         ???     900
28  Ne4      Ne4       1         Ne4       0         Ne4       1         Ne4     238
29  Ke1      Ke1      17         Qd8     900         Ke1       1         Ke1     240
30  f4       d4      900         f4      125         Rg8+    900         Rg8+    900

Hubert Bednorz and Fred Toennissen developed two test suites to measure the tactical capability of chess engines: BT2450 and its predecessor, BT2630.

Each test suite contains 30 positions, and the BT2450 test can be seen here. A chess engine is given 15 minutes (900 seconds) to analyze each position.

If a position is solved, the solution time is recorded in seconds. It doesn't count as a solution if the engine finds the move and then changes its mind. If the engine finds the move, changes its mind, and then finds the move again, the second time is used. Any position that is not solved scores as 900 seconds. The 30 times are averaged, and that average is subtracted from 2450 to give the (BT) ELO rating.
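
The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a tool used for these tests: the function names, the `TIME_LIMIT` and `BASE_RATING` constants, and the idea of a log of (seconds, move) pairs are all assumptions made for the example, and the sample log is made up. Moves are compared as plain strings, so "Rxc6" and "Rxc6+" would not match without extra handling.

```python
TIME_LIMIT = 900    # seconds allowed per position (15 minutes)
BASE_RATING = 2450  # value the average time is subtracted from


def solution_time(best_moves, answer, limit=TIME_LIMIT):
    """Return the time credited for one position.

    best_moves is a hypothetical log of (seconds, move) pairs, one entry
    each time the engine switches its preferred move, in time order.
    """
    credited = limit  # unsolved unless the log ends on the answer
    previous = None
    for seconds, move in best_moves:
        if seconds >= limit:
            break  # anything at or after the limit is ignored
        if move == answer and previous != answer:
            credited = seconds  # found (or re-found) the answer
        elif move != answer:
            credited = limit    # changed its mind: earlier find is void
        previous = move
    return credited


def bt_elo(times, base=BASE_RATING):
    """Subtract the average solution time from the base rating."""
    return base - sum(times) / len(times)


# Made-up log: finds Nxg7 at 40 s, drops it at 90 s, finds it again at 117 s.
log = [(5, "Qb3"), (40, "Nxg7"), (90, "Qb3"), (117, "Nxg7")]
print(solution_time(log, "Nxg7"))  # 117

# Chess Tiger's 30 times, copied from the results table.
chess_tiger = [117, 342, 331, 900, 69, 347, 53, 102, 1, 900,
               536, 44, 900, 93, 1, 7, 68, 900, 900, 106,
               226, 13, 63, 900, 900, 2, 356, 1, 17, 900]
print(bt_elo(chess_tiger))  # 2113.5
```

Chess Tiger's times sum to 10095 seconds, an average of 336.5, so the result is 2113.5, which rounds to the 2114 shown in the table.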

Again, these tests were constructed to measure tactical capability, not necessarily positional capability, so an engine's (BT) ELO rating may not accurately represent its true ELO rating.

Side Note

ChessGenius doesn't have a custom "Time per move" option, but it does have an "Analyze Game" setting. So I set the time controls to "Game in 60 minutes" and got the stopwatch ready. I used the small board view so I could watch the main line of thinking. As soon as it locked on to the right answer, I marked that time.

PocketChess, on the other hand, does have a custom "Time per move" option. So after loading each position, I reset the time controls to 15 minutes per move and then asked for a Hint. Unlike ChessGenius, PocketChess doesn't show its main line. So I timed the little progress clock and presumed (maybe erring to its benefit, but probably erring slightly to its detriment) that it had reached its final decision when that clock hit 12:00 (completely filled). If a "Main line" option is added sometime in the future, I'll be happy to re-test it and hopefully get more accurate results.

The Results

After running these tests, I was surprised by a couple of things. First, I was surprised to see the difference in ratings. After seeing the games and ratings against Chessmaster, I expected ChessGenius to do a little better. There were a couple of positions I fully expected ChessGenius to solve, but it didn't happen. Either way, Chess Tiger cleaned house, solving 22 of the 30 problems.

My second surprise came from PocketChess. On two of these tests, PocketChess crashed: for reasons unknown to me, it gave a "Fatal Exception" that caused a reset. On #16, it thought for around 6 minutes before crashing; on #27, it thought for around 8.5 minutes before crashing. Tinyware is aware of this problem.