Claude Code's poor time awareness

From some of my interactions with Claude Code[1] so far I’ve noticed that it seems to have a pretty glaring issue: it doesn’t really have a good sense of time-tracking built into it. Because of this, Claude[2] misses good choices that would help it finish some tasks significantly faster.

The general form of those specific interactions of mine is essentially this:

To give a concrete example, consider this unit-test-like scenario. You have the following source file which runs a bunch of tests:

#include "../hidden/tests.hpp"

int main() {
    test_1();
    test_2();
    test_3();
    test_4();
    test_5();
    test_6();
    test_7();
    test_8();
    test_9(1);
    test_10();
}

The header:

// ../hidden/tests.hpp

#include <cstdio>
#include <stdexcept>
#include <unistd.h>

template<int LABEL>
void pass_after_10_seconds() {
    fprintf(stderr, "started test %d ...\n", LABEL);
    sleep(10);
    fprintf(stderr, "test %d succeeded\n", LABEL);
}

void (*test_1)() = &pass_after_10_seconds<1>;
void (*test_2)() = &pass_after_10_seconds<2>;
void (*test_3)() = &pass_after_10_seconds<3>;
void (*test_4)() = &pass_after_10_seconds<4>;
void (*test_5)() = &pass_after_10_seconds<5>;
void (*test_6)() = &pass_after_10_seconds<6>;
void (*test_7)() = &pass_after_10_seconds<7>;
void (*test_8)() = &pass_after_10_seconds<8>;

void test_9(int input) {
    fprintf(stderr, "started test 9 ...\n");
    sleep(1); // Sleep only for one second

    if (input <= 0)       throw std::logic_error("test_9: argument must be positive");
    if (input < 1000)     throw std::logic_error("test_9: argument must be at least 1000");
    if (input >= 10000)   throw std::logic_error("test_9: argument must be less than 10000");
    if (input % 2 != 0)   throw std::logic_error("test_9: argument must be even");
    if (input % 3 != 0)   throw std::logic_error("test_9: argument must be divisible by 3");
    if (input % 7 != 0)   throw std::logic_error("test_9: argument must be divisible by 7");
    if (input % 100 != 0) throw std::logic_error("test_9: argument must be a multiple of 100");
    if (input % 9 == 0)   throw std::logic_error("test_9: argument must not be divisible by 9");
    if (input <= 4000)    throw std::logic_error("test_9: argument must be greater than 4000");
    if (input >= 5000)    throw std::logic_error("test_9: argument must be less than 5000");

    fprintf(stderr, "test 9 succeeded\n");
}

void (*test_10)() = &pass_after_10_seconds<10>;

Example output:

started test 1 ...
test 1 succeeded
started test 2 ...
test 2 succeeded
...
started test 9 ...
terminate called after throwing an instance of 'std::logic_error'
  what():  test_9: argument must be at least 1000
Aborted                    (core dumped) ./a.out

So every test except the ninth passes after 10 seconds. The ninth test passes after one second if you give it the correct number; otherwise it fails.

The goal is to have Claude find the correct number (i.e. 4200).
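As a quick sanity check on that number, here is a minimal sketch that brute-forces the accepted inputs. The helper names satisfies_test_9 and accepted_inputs are made up for this illustration; the conditions are copied from test_9() in the header above, with the sleep and the exceptions left out.

```cpp
#include <vector>

// Mirrors the checks in test_9() from the header, in the same order, but
// returns false instead of throwing (hypothetical helper, not in the header).
bool satisfies_test_9(int input) {
    if (input <= 0)       return false;
    if (input < 1000)     return false;
    if (input >= 10000)   return false;
    if (input % 2 != 0)   return false;
    if (input % 3 != 0)   return false;
    if (input % 7 != 0)   return false;
    if (input % 100 != 0) return false;
    if (input % 9 == 0)   return false;
    if (input <= 4000)    return false;
    if (input >= 5000)    return false;
    return true;
}

// Collects every value in [lo, hi] that passes all of the checks.
std::vector<int> accepted_inputs(int lo, int hi) {
    std::vector<int> result;
    for (int i = lo; i <= hi; ++i) {
        if (satisfies_test_9(i)) result.push_back(i);
    }
    return result;
}
```

Calling accepted_inputs(1, 9999) should return a single element, 4200: the value has to be a multiple of 2100 strictly between 4000 and 5000, and 4200 is not divisible by 9.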

Here’s the prompt (using Opus 4.6 Max):

Make the test suite in main.cpp pass. You can run the tests with this command: g++ main.cpp -std=c++17 && ./a.out.

You don’t have permission to inspect the binary or the ../hidden/tests.hpp header. You can only compile and run the ./a.out binary and make changes to main.cpp until all tests pass. Don’t try to cause compilation errors to find out more about the header. Don’t assume that the process is hanging if it takes a long time to finish. It just takes a while to complete.

Do all ./a.out invocations in the background and think while they’re running.

Note: the tests are independent and can be run in any order.

(These restrictions are in the prompt to prevent Claude from ‘cheating’ and finding the answer in one go by looking at all of the conditions directly, instead of finding it by running the tests and handling each error it encounters, like it would have to do in practice.)

Even though the last line basically spells it out as ‘Hey, you can just run test_9() first if the other tests take a significant amount of time’, Claude doesn’t quite budge. It keeps blindly running all of the tests in the same sequence, and the only thing it modifies from one attempt to the next is the value being passed to test_9().

What this means is that Claude takes around 9-10 minutes to finish what should be a 1-minute task at most.
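To make the gap concrete, here is a rough cost model, sketched under a few assumptions: compile time is ignored, every slow test takes exactly 10 seconds, the guessing test takes exactly 1 second, and the number of failed guesses is a made-up input rather than something I measured. The function names are hypothetical.

```cpp
// Assumed durations, taken from the sleep() calls in the header.
const int kSlowTestSeconds  = 10;
const int kGuessTestSeconds = 1;

// Running the whole suite on every attempt: each failed attempt pays for the
// slow tests that come before the guessing test and then aborts; the final
// passing run pays for every test in the suite.
int full_suite_seconds(int slow_before, int slow_after, int failed_attempts) {
    int failed_run  = slow_before * kSlowTestSeconds + kGuessTestSeconds;
    int passing_run = failed_run + slow_after * kSlowTestSeconds;
    return failed_attempts * failed_run + passing_run;
}

// Guessing with only the guessing test left in main(): every attempt,
// including the one that finally passes, costs about one second.
int isolated_guess_seconds(int failed_attempts) {
    return (failed_attempts + 1) * kGuessTestSeconds;
}
```

For the 10-test layout above (8 slow tests before test_9(), 1 after) and a hypothetical 6 failed guesses, full_suite_seconds(8, 1, 6) comes out at 577 seconds, whereas isolated_guess_seconds(6) is 7 seconds. The isolated approach would still want one full run at the end to confirm that the whole suite passes.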

On the other hand, GPT-5.4-xhigh does a better job. It has some decent time awareness built into it and while it runs the tests, it has thoughts like this, even without the part of the prompt that explicitly says the tests are independent and can be run in any order:

I’m thinking about how to streamline our testing process by temporarily focusing on just test_9 […]

GPT-5.4-xhigh isolates test_9() after a few full attempts, though, so it still takes around 4-5 minutes to finish.

There is a limit to Claude’s patience, however. For example, if you use 20 tests in total and make test_19() the one that has to guess the number, then after about 10 minutes of running all of the tests in each ./a.out invocation, Claude finally realizes that it can temporarily delete the first 18 tests and focus on test_19() until it passes. Still, it should be far more aggressive about this. Given the crazy ad-hoc things that Claude does and the creative ways in which it sometimes tries to bypass your instructions, it’s surprising that something as basic as this doesn’t immediately trigger a ‘gotta go fast’ sense.

Another change that makes it focus on the failing test is this (coupled with the appropriate renaming in ../hidden/tests.hpp, which I’ll omit):

diff --git a/src/main.cpp b/src/main.cpp
index 4d02c69..3acbd7b 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -9,6 +9,6 @@ int main() {
     test_6();
     test_7();
     test_8();
-    test_9(1);
-    test_10();
+    test_19(1); // Give the failing test a bigger label
+    test_20();  // This one too.
 }

As someone who isn’t familiar with LLM internals, my first guess would be that using larger numbers for those last tests can make the sequence seem ‘longer’ even though the number of tests is the same. And, in turn, it seeming longer can make Claude realize that it can focus on a shorter sequence, as per the description in the original prompt.

Still, even if using 20-digit numbers for my test names would make Claude more likely to iterate faster, this isn’t something that I should need to gamble on.

How hard is it to add some form of robust time-tracking so that these types of scenarios can be dealt with in a more human-like manner (preferably mimicking a human who wants to iterate really fast while solving the error)? Also, is GPT’s time-tracking ability really as good as such an ability can get? Very unlikely.

I’m looking forward to seeing how far this capability can be improved.


  1. Version 2.1.77 was used for the prompts in this post.

  2. Specifically referring to Claude Code here, not Claude in general. If you’re using Anthropic’s API directly, you can plug some kind of time awareness into the system. But my point is that this seems like it should be a fundamental ability of Claude Code, not something that you need to add.