From some of my interactions with Claude Code [1] so far, I’ve noticed that it seems to have a pretty glaring issue: it doesn’t have a good sense of time built into it. Because of this, Claude [2] misses choices that would help it finish some tasks significantly faster.
Those interactions generally look like this:
- There is an executable that performs a sequence of independent operations, and Claude is prompted to add a new operation to that sequence (or to fix an existing one).
- Claude’s first attempt at adding the new operation fails with a runtime error.
- Claude’s second attempt at adding the new operation fails with a runtime error.
- And so on until Claude’s n-th attempt, when all errors have been fixed and the thing runs properly.
- The problem: between the first and last attempt, Claude never thinks of skipping the head of the sequence and starting with its newly added operation, even though it can see that the head of the sequence takes a long time to succeed and that the new operation fails very fast once it’s finally reached. You can get this behavior with various prompts, but this kind of (basic) time awareness should be in Claude Code by default.
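The behavior I’m asking for in the last bullet can be sketched in a few lines of C++. This is only an illustration of the idea, not a description of how Claude Code works internally: `Operation` and `run_from` are hypothetical names, and the sketch assumes the operations really are independent, since skipping the head is only valid in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for one step of the executable's sequence.
struct Operation {
    std::string name;
    std::function<bool()> run;  // returns true on success
};

// Runs ops[start], ops[start + 1], ... in order and stops at the first failure.
// Returns the index of the failing operation, or ops.size() if all of them passed.
// Starting at `start` instead of 0 skips the (slow, already passing) head.
std::size_t run_from(const std::vector<Operation>& ops, std::size_t start) {
    assert(start <= ops.size());
    for (std::size_t i = start; i < ops.size(); ++i) {
        if (!ops[i].run()) {
            std::cerr << ops[i].name << " failed\n";
            return i;
        }
    }
    return ops.size();
}
```

While debugging a new operation at index `k`, you would call `run_from(ops, k)` until it stops failing there, and only then call `run_from(ops, 0)` once to confirm that the whole sequence passes.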
To give a concrete example, consider this unit-test-like scenario. You have the following source file which runs a bunch of tests:
[Code listing: main.cpp]
The header:
[Code listing: ../hidden/tests.hpp]
Example output:
[Example output of ./a.out]
So every test except the 9th passes after 10 seconds. The 9th test passes after one second if you give it the correct number; otherwise it fails.
The goal is to have Claude find the correct number (i.e. 4200).
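The listings themselves are not reproduced here, so the following is a hypothetical stand-in that matches the description above; it is not the original `main.cpp` or `tests.hpp`. The names `slow_test` and `guess_test`, the `sketch` namespace, and the configurable delay are all assumptions, and the guessing test is assumed to fail immediately on a wrong number.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the hidden header. Names and structure are
// assumptions made for illustration only.
namespace sketch {

using Delay = std::chrono::milliseconds;

// A slow test: always passes, but only after `delay` has elapsed
// (10 seconds in the scenario described in the post).
inline bool slow_test(Delay delay) {
    assert(delay.count() >= 0);
    std::this_thread::sleep_for(delay);
    return true;
}

// The guessing test: assumed to fail right away on a wrong number, and to
// pass after `delay` (1 second in the post) when given the right one.
inline bool guess_test(int guess, Delay delay) {
    constexpr int secret = 4200;  // the number Claude has to find
    if (guess != secret) {
        return false;
    }
    std::this_thread::sleep_for(delay);
    return true;
}

}  // namespace sketch
```

Making the delay a parameter keeps the sketch quick to try out; passing `std::chrono::seconds(10)` and `std::chrono::seconds(1)` gives the timings described above.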
Here’s the prompt (using Opus 4.6 Max):
Make the test suite in main.cpp pass. You can run the tests with this command: g++ main.cpp -std=c++17 && ./a.out.
You don’t have permission to inspect the binary or the ../hidden/tests.hpp header. You can only compile and run the ./a.out binary and make changes to main.cpp until all tests pass. Don’t try to cause compilation errors to find out more about the header. Don’t assume that the process is hanging if it takes a long time to finish. It just takes a while to complete.
Do all ./a.out invocations in the background and think while they’re running.
Note: the tests are independent and can be run in any order.
(These restrictions are in the prompt to prevent Claude from ‘cheating’ and finding the answer in one go by looking at all of the conditions directly, instead of finding it by running the tests and handling each error it encounters, like it would have to do in practice.)
Even though the last line basically spells it out as ‘Hey, you can just run test_9() first if the other tests take a significant amount of time’, Claude doesn’t budge. It keeps blindly running all of the tests in the same sequence, and the only thing it changes from one attempt to the next is the value passed to test_9().
What this means is that Claude takes around 9-10 minutes to finish what should be a 1-minute task at most.
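A rough cost model shows where that gap comes from. Everything in it is an assumption made for illustration, not a measurement: the suite is assumed to stop at the first failing test, the failing test is assumed to fail in negligible time, and compilation and thinking time are ignored. `Scenario` and the function names are hypothetical.

```cpp
#include <cassert>

// All durations are in seconds.
struct Scenario {
    int slow_before;          // slow tests that run before the target test
    int slow_after;           // slow tests that run after the target test
    int slow_seconds;         // duration of each slow test
    int target_pass_seconds;  // duration of the target test when it passes
};

// One complete run in which every test passes.
int full_run_seconds(const Scenario& s) {
    return (s.slow_before + s.slow_after) * s.slow_seconds + s.target_pass_seconds;
}

// Strategy 1: rerun the whole suite on every attempt. Each failed attempt
// pays for the slow head again before reaching the target test.
int rerun_everything_seconds(const Scenario& s, int failed_attempts) {
    assert(failed_attempts >= 0);
    const int head = s.slow_before * s.slow_seconds;
    return failed_attempts * head + full_run_seconds(s);
}

// Strategy 2: iterate on the target test alone (failed attempts cost ~0 s),
// then do a single full run to confirm that everything passes.
int isolate_first_seconds(const Scenario& s) {
    return s.target_pass_seconds + full_run_seconds(s);
}
```

With eight slow tests before test_9(), one after it (an assumed suite size of ten) and five failed guesses (also assumed), `rerun_everything_seconds` gives 491 seconds while `isolate_first_seconds` gives 92 seconds, and the second figure doesn’t grow with the number of wrong guesses.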
On the other hand, GPT-5.4-xhigh does a better job. It has some decent time awareness built into it and while it runs the tests, it has thoughts like this, even without the part of the prompt that explicitly says the tests are independent and can be run in any order:
I’m thinking about how to streamline our testing process by temporarily focusing on just test_9 […]
GPT-5.4-xhigh isolates test_9() after a few full attempts, though, so it still takes
around 4-5 minutes to finish.
There is a limit to Claude’s patience, however. For example, if you add 20
tests in total and make test_19() the one that tries to guess the number,
after about 10 minutes of trying to run all of the tests in each ./a.out
invocation, it finally realizes that it can temporarily delete the first 18
tests and focus on test_19() until it passes. Still, this should be way more
aggressive. Given the crazy ad-hoc things that Claude does and the creative
ways in which it tries to bypass your instructions sometimes, it’s surprising
that something as basic as this doesn’t immediately trigger a ‘gotta go fast’
sense.
Another change that makes it focus on the failing test is this (coupled with
the appropriate renaming in ../hidden/tests.hpp, which I’ll omit):
[Code listing: modified main.cpp]
As someone who isn’t familiar with LLM internals, my first guess would be that using larger numbers for those last tests can make the sequence seem ‘longer’ even though the number of tests is the same. That, in turn, may make Claude realize that it can focus on a shorter sequence, as suggested by the note in the original prompt.
Still, even if using 20-digit numbers for my test names would make Claude more likely to iterate faster, this isn’t something that I should need to gamble on.
How hard is it to add some form of robust time-tracking so that these types of scenarios can be dealt with in a more human-like manner (preferably mimicking a human who wants to iterate really fast while solving the error)? Also – is GPT’s time-tracking ability really as good as such an ability can get? Very unlikely.
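To make ‘time-tracking’ a bit more concrete, here is a minimal sketch of the kind of signal I have in mind: time each step, then report how much time was spent on steps that had already passed before the first failure. This is not how Claude Code or GPT works; `TimedResult`, `run_timed` and `time_before_first_failure` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

using Duration = std::chrono::steady_clock::duration;

// Outcome of one step, together with how long it took.
struct TimedResult {
    std::string name;
    bool passed;
    Duration elapsed;
};

// Runs a single step and records its wall-clock duration.
TimedResult run_timed(const std::string& name, const std::function<bool()>& step) {
    const auto start = std::chrono::steady_clock::now();
    const bool passed = step();
    return {name, passed, std::chrono::steady_clock::now() - start};
}

// Sums the time spent on passing steps that ran before the first failure.
// If nothing failed, this is simply the total time of the run.
Duration time_before_first_failure(const std::vector<TimedResult>& results) {
    Duration total = Duration::zero();
    for (const auto& r : results) {
        if (!r.passed) {
            break;
        }
        total += r.elapsed;
    }
    return total;
}
```

If that sum is large compared to the time the failing step itself takes, the next attempt should start with the failing step instead of repeating everything before it.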
I’m looking forward to seeing how far this capability can be improved.
[1] Version 2.1.77 was used for the prompts in this post.
[2] Specifically referring to Claude Code here, not Claude in general. If you’re using Anthropic’s API directly, you can plug some kind of time awareness into the system. But my point is that this seems like it should be a fundamental ability of Claude Code, not something that you need to add.