Coding Agents Grow Up

For years, programming something new followed a predictable, exhausting rhythm: write some code, hit a wall, and then disappear into a forest of documentation and Stack Overflow tabs to find the trick to get it working. In 2025, that era ended for me.

Today, AI code assistants take so much drudgery out of development and debugging that the work has become mostly gratifying and rarely frustrating — the opposite of how programming was in the Before Times.

Between my work and my side interests, I often feel like I live in Visual Studio Code (VSCode — a popular open-source development environment). Early in 2025 I subscribed to GitHub Copilot, which integrates AI coding assistants into VSCode. At $10/month it’s a phenomenal bargain that offers software developers an easy way to loop in the latest models from OpenAI, Google, and Anthropic. Now when I’m trying something new (like this little project I did for fun) I can mostly stay in VSCode and work with an AI assistant that has mastered all of the documentation.

As mentioned elsewhere, my most significant side project in 2025 was helping Ukrainians develop an open-source ballistic calculator (pyballistic) in Python and Cython. I polished that off at the end of September. By that point I had begun to spend more time with GitHub Copilot, and the capabilities of its latest models gave me enough confidence and support to tackle what would previously have been an absurdly ambitious project for my day job: an Excel Real-Time Data (RTD) server for the Interactive Brokers API. (This RTD server feeds live market data, positions, and orders directly into Excel using native Excel formulas.) After two months of working seven days a week on that, I had a beautiful piece of software so solid (every build is validated by more than 800 unit tests) that I had begun to use it in live trading operations.

Early Childhood Development

Watching these models mature over the last year has been like watching a child grow up.

Tell a child, “Clean your room.” First they’ll spend more time arguing than it would take to just do it. When they finally declare the task “done,” you might find a few toys picked up but most of the mess still there. Emphasize that “clean your room” means everything and you might find the floor clean but everything shoved under the bed.

Claude v3 was notorious for hacking shortcuts. Ask it to fix a failing test and it might just replace the test logic with a “return true;” statement. Claude v3.5 wouldn’t be so brazen, but it was still prone to hack the example rather than the task. GPT-4 and Gemini v2 would enthusiastically announce completion without checking their work. Like the child who picks up two toys and concludes that his room must be clean, even though the mess is visible from outside the door.

The teens came quickly: Claude v3.7 and its contemporaries would often spend more effort arguing that their failure was actually success than it would have taken to do the work correctly.

More recent models have become more likely to keep checking and working until they succeed. Performance of the latest models is still wildly variable: a model that astonishes me with its apparent skill one day may choke on something relatively simple the next. But they are getting more consistent. And they are definitely getting more intelligent.

What is intelligence?  It becomes easy to see when you’re doing hard work with different models. One of the neat things about Copilot is that you can choose to watch the model at work. They all think “out loud,” meaning you can read their chain of thought to understand how and why they do things. When it’s not having an off day, Claude Opus is intelligent.  Given a problem:

  • It can more reliably identify what matters.
  • It has a better sense of what to look at and what to ignore.
  • It produces better assessments of what’s possible and makes better plans to get there.
  • It knows when to persist and when to change directions.

These are some of the things that separate a junior developer from a more experienced one. They are also qualities that characterize more intelligent people.

Let me show you. Have you ever wondered what it’s like debugging software? Well, debugging is one thing the newer models can usually do as well as a good human programmer. In fact, they can do it better because they can run the process faster and interact with the code more directly. Below I pasted a transcript of Claude working to find and fix a tricky bug in my RTD server. This could just as well have been a transcript of my thoughts if I had to debug it. But whereas this would have been a draining hour+ distraction for me, the Claude instance cranked this out in minutes.


Transcript of Claude Debugging in VSCode

… Now I see the issue. Let me check how the order list is populated – specifically the GetOrAddOrderListTopic and HandleOpenOrderEnd methods:

Read file:OrderManager.cs, lines 165 to 225

I see – when a new order list topic is created, it seeds with BuildOrderListCsv() only if m_ordersSnapshotCompleted is true. Let me check what HandleOpenOrderEnd does:

Searched for text HandleOpenOrderEnd, 7 results. Read file:OrderManager.cs, lines 410 to 490

Now let me look at PublishOrderListsIfAny and BuildOrderListCsv:

Read file:OrderManager.cs, lines 880 to 960

Now I understand the flow. Let me check the test more closely to understand the potential race condition:

Read OrderBehaviorTests.cs, lines 1795 to 1845

I see a potential issue! The test subscribes to the order list AFTER calling HandleOpenOrder, but BEFORE calling HandleOpenOrderEnd. This is a problematic sequence because:

  1. HandleOpenOrder adds orders to m_orders
  2. GetOrAddOrderListTopic is called – it sees m_ordersSnapshotCompleted = false and m_ordersRequested = false, so it seeds with empty string and calls EnsureOrdersRequested()
  3. HandleOpenOrderEnd sets m_ordersSnapshotCompleted = true and calls PublishOrderListsIfAny()

But wait – the issue is m_ordersRequested. Let me check:

Searched for text m_ordersRequested, 6 results. Read OrderManager.cs, lines 193 to 223

I see. When GetOrAddOrderListTopic is called:

  • m_ordersSnapshotCompleted = false (orders have been added but HandleOpenOrderEnd hasn’t been called)
  • m_ordersRequested = false (we haven’t called EnsureOrdersRequested yet)

Since !m_ordersRequested is true, the condition if (m_ordersSnapshotCompleted || !m_ordersRequested) is TRUE, so it calls BuildOrderListCsv() immediately!

So the test should work… Let me think about this more carefully.

Actually, looking at the test, the issue is:

  1. HandleOpenOrder is called 3 times for different orders
  2. GetOrAddOrderListTopic is called – it seeds with BuildOrderListCsv() because !m_ordersRequested is true
  3. HandleOpenOrderEnd is called – it calls PublishOrderListsIfAny() which updates the list again

The problem is that BuildOrderListCsv() checks ShowFilledOrders which uses m_showFilledOrdersOverride ?? ShowFilledOrdersConfig. The test uses showFilledOrdersOverride: false, so that should be fine.

Wait – the error message says the string is EMPTY! That means BuildOrderListCsv returned an empty string. This could happen if:

  1. m_orders is empty (but we just added 3 orders)
  2. All orders are filtered out

Let me look at HandleOpenOrder:

Searched for text public List<int> HandleOpenOrder, 2 results. Read OrderManager.cs, lines 260 to 330

I found it! Look at lines 314-316:

// Mark list as changed only if this is a new order
if (isNewOrder)
{
    // Update order list topics immediately (don't wait for next polling cycle)
    updated.AddRange(PublishOrderListsIfAny());
}

When HandleOpenOrder adds a NEW order, it calls PublishOrderListsIfAny(). But when the test calls HandleOpenOrder for the first time, there are NO order list topics yet (m_orderListTopics is empty), so PublishOrderListsIfAny() does nothing.

Then when GetOrAddOrderListTopic is called, if !m_ordersRequested is true, it seeds with BuildOrderListCsv(). This should work…

But wait – the intermittent failure suggests a race condition or ordering issue. Let me re-read line 198:

if (m_ordersSnapshotCompleted || !m_ordersRequested)
{
    var csv = BuildOrderListCsv(descriptor);

So it seeds if EITHER:

  1. m_ordersSnapshotCompleted is true (we’ve finished the initial snapshot), OR
  2. !m_ordersRequested is true (we haven’t requested orders yet – implies we’re using pre-existing data)

The test sequence:

  1. Create OrderManager – m_ordersRequested = false, m_ordersSnapshotCompleted = false
  2. Call HandleOpenOrder 3 times – orders are added to m_orders
  3. Call GetOrAddOrderListTopic – m_ordersSnapshotCompleted = false, m_ordersRequested = false
    • Condition !m_ordersRequested is TRUE → calls BuildOrderListCsv() → should return “12345” (only open order)
    • Then calls EnsureOrdersRequested() which… let me check what that does

Read OrderManager.cs, lines 775 to 835

I see! EnsureOrdersRequested() sets m_ordersRequested = true and m_currentPollCycleUtc = now.

So the sequence in the test:

  1. Create OrderManager – m_ordersRequested = false, m_ordersSnapshotCompleted = false, m_currentPollCycleUtc = DateTime.MinValue
  2. Call HandleOpenOrder 3 times – orders added to m_orders, but m_currentPollCycleUtc is still DateTime.MinValue so orders are NOT marked as “seen in poll”
  3. Call GetOrAddOrderListTopic:
    • Condition !m_ordersRequested is TRUE → calls BuildOrderListCsv() → should return “12345”
    • Then calls EnsureOrdersRequested() → sets m_ordersRequested = true, m_currentPollCycleUtc = now
  4. Call HandleOpenOrderEnd:
    • Sets m_ordersSnapshotCompleted = true
    • Check for disappeared orders: if (m_currentPollCycleUtc > DateTime.MinValue) is TRUE (just set by EnsureOrdersRequested!)
    • For each order, check if (orderInfo.LastSeenInPollUtc < m_currentPollCycleUtc)
      • All orders have LastSeenInPollUtc = FirstSeenUtc (set in constructor to DateTime.UtcNow)
      • But those were set BEFORE m_currentPollCycleUtc was set!
    • So ALL orders appear to have “disappeared” and get marked as needing completion query
    • The “Submitted” order gets its status changed to… wait, let me check MarkAsNeedsCompletionQuery

Read OrderInfo.cs, lines 180 to 220

Read OrderInfo.cs, lines 215 to 245

Now I understand the race condition! Here’s what happens:

  1. Orders are added via HandleOpenOrder. Their LastSeenInPollUtc is set to the time of creation.
  2. When GetOrAddOrderListTopic is called, it:
    • Seeds the list with BuildOrderListCsv() (returns “12345” correctly)
    • Then calls EnsureOrdersRequested() which sets m_currentPollCycleUtc = DateTime.UtcNow
  3. When HandleOpenOrderEnd is called:
    • It sees m_currentPollCycleUtc > DateTime.MinValue (true!)
    • For the “Submitted” order (permId 12345), it checks orderInfo.LastSeenInPollUtc < m_currentPollCycleUtc
    • Due to timing, LastSeenInPollUtc (set during HandleOpenOrder) may be BEFORE m_currentPollCycleUtc (set during EnsureOrdersRequested)
    • So the order appears to have “disappeared” and MarkAsNeedsCompletionQuery is called
    • Since Remaining == 0 (not set in test), it sets NeedsCompletionQuery = true
    • Wait, actually the order has Remaining > 0 (100 shares, 0 filled), so it just sets NeedsCompletionQuery = true without changing status

Hmm, that doesn’t change the status. But the test is checking if “12345” appears in the list after HandleOpenOrderEnd, and the assertion is failing with an empty string!

Let me check what happens in PublishOrderListsIfAny():

Read OrderManager.cs, lines 865 to 895

Let me think about this more carefully. The test:

  1. Seeds the list topic with BuildOrderListCsv() which returns “12345”
  2. Calls HandleOpenOrderEnd() which calls PublishOrderListsIfAny()
  3. PublishOrderListsIfAny() calls BuildOrderListCsv() again – this should still return “12345”

Unless… there’s something odd happening with the order processing. Let me look at the test constructor call again:

var order1 = new Order { OrderId = 12345, PermId = 12345, Account = "TEST", Action = "BUY", TotalQuantity = 100 };
orderManager.HandleOpenOrder(12345, contract, order1, new OrderState { Status = "Submitted" });

Wait – let me check if there’s an issue with empty strings in the initial seeding. The test reads listTopic.TopicValue?.ToString() ?? "" AFTER calling HandleOpenOrderEnd().

Actually, I think I found the issue! Let me re-read the GetOrAddOrderListTopic flow:

var entry = m_orderListTopics.GetOrAdd(key, _ =>
{
    wasAdded = true;
    return new OrderListTopic(descriptor, topicId);
});

created = wasAdded;

if (!wasAdded)
{
    entry.Topic.AddTopicId(topicId);
}
else
{
    // Seed with current order list
    if (m_ordersSnapshotCompleted || !m_ordersRequested)
    {
        var csv = BuildOrderListCsv(descriptor);
        entry.Topic.SetTopicValue(csv);
        entry.LastValue = csv;
    }
    else
    {
        entry.Topic.SetTopicValue(string.Empty);
        entry.LastValue = string.Empty;
    }
    
    EnsureOrdersRequested();
}

If m_ordersRequested has been set to true by some other code path before this call, then the condition !m_ordersRequested is FALSE, and m_ordersSnapshotCompleted is also FALSE, so we go to the else branch and seed with empty string!

But when would m_ordersRequested be set before the test calls GetOrAddOrderListTopic? Let me check if there’s something in the test environment or if another test could affect this…

Actually, this is a state leakage issue between tests! If a previous test left m_ordersRequested = true somehow… but wait, each test creates a NEW OrderManager instance, so that’s not it.

Let me look for something else. Is there perhaps a Configuration issue? Let me check what happens with ShowFilledOrdersConfig:

Read OrderManager.cs, lines 60 to 90

The test uses showFilledOrdersOverride: false so that’s not the issue.

Let me check IsTerminalStatus:

Searched for text private static bool IsTerminalStatus, 1 result. Read OrderManager.cs, lines 855 to 865

OK, “Submitted” is NOT a terminal status. Let me think about this…

Actually, I wonder if the flakiness is related to test parallelism. Let me check if there’s anything that could be shared:

Wait – I just realized something! The test reads listTopic.TopicValue which is set by SetTopicValue. But the test doesn’t call anything that would UPDATE the topic value after the initial seeding.

Actually no – HandleOpenOrderEnd calls PublishOrderListsIfAny() which should update it.

Let me check if GetOrAddOrderListTopic returns the wrong topic for some reason. Actually, let me create a more detailed trace of what should happen:

  1. new OrderManager() – fresh instance
  2. HandleOpenOrder(12345, contract, order1, new OrderState { Status = "Submitted" }) – order1 added
  3. HandleOpenOrder(12346, contract, order2, new OrderState { Status = "Filled" }) – order2 added
  4. HandleOpenOrder(12347, contract, order3, new OrderState { Status = "Cancelled" }) – order3 added
  5. GetOrAddOrderListTopic(...) – creates new entry, seeds with BuildOrderListCsv() → “12345”
  6. HandleOpenOrderEnd() – calls PublishOrderListsIfAny() which calls BuildOrderListCsv() → should still be “12345”

The issue must be that BuildOrderListCsv() is returning empty string at some point. Let me look at BuildOrderListCsv again:

foreach (var kvp in m_orders)
{
    var permId = kvp.Key;
    var orderInfo = kvp.Value;
    
    // Skip terminal orders unless ShowFilledOrders is enabled
    if (!ShowFilledOrders && IsTerminalStatus(orderInfo.Status))
    {
        continue;
    }
    ...
    permIds.Add(permId);
}

So the loop should add permId 12345 (Submitted = not terminal). Unless… wait, I wonder if there’s an issue with Configuration.GetCurrent().

When running tests in parallel, if one test uses Configuration.Override() but doesn’t properly dispose it, it could affect another test. But the override is [ThreadStatic], so it should be per-thread…

Actually, looking at the test again, it uses showFilledOrdersOverride: false which is a constructor parameter, NOT a Configuration override. So it should work correctly.

Let me check if there’s something in HandleOpenOrderEnd that might reset the orders:

Read OrderManager.cs, lines 415 to 495

I see – HandleOpenOrderEnd calls MarkAsNeedsCompletionQuery() on orders that appear to have disappeared. But that doesn’t change their Status unless Remaining==0.

Actually wait – I just realized the issue! Look at lines 440-441:

if (orderInfo.LastSeenInPollUtc < m_currentPollCycleUtc)

The orders have LastSeenInPollUtc set to their FirstSeenUtc (in the OrderInfo constructor). But m_currentPollCycleUtc is set by EnsureOrdersRequested() which is called AFTER the orders are created.

Due to timing, LastSeenInPollUtc (time when HandleOpenOrder was called) could be BEFORE m_currentPollCycleUtc (time when EnsureOrdersRequested was called, milliseconds later).

But this still doesn’t explain the empty string! MarkAsNeedsCompletionQuery sets NeedsCompletionQuery = true but doesn’t change the Status unless Remaining == 0.

Let me look at what the test actually sets:

var order1 = new Order { OrderId = 12345, PermId = 12345, Account = "TEST", Action = "BUY", TotalQuantity = 100 };
orderManager.HandleOpenOrder(12345, contract, order1, new OrderState { Status = "Submitted" });

The Order has TotalQuantity = 100 but no explicit Filled or Remaining values. Let me check what the OrderInfo constructor does with these:

Looking at OrderInfo constructor:

public OrderInfo(int orderId, Contract contract, Order order, OrderState orderState)
{
    OrderId = orderId;
    Contract = contract;
    Order = order;
    OrderState = orderState;
    Status = orderState?.Status ?? string.Empty;
    PermId = order?.PermId ?? 0;
    FirstSeenUtc = DateTime.UtcNow;
    LastUpdateUtc = FirstSeenUtc;
    LastSeenInPollUtc = FirstSeenUtc;
}

It doesn’t set Filled or Remaining – those default to 0. So in MarkAsNeedsCompletionQuery:

if (Remaining == 0)
{
    Status = "Filled";

This changes the Status from “Submitted” to “Filled”!!! That’s the bug!

When the timing is such that EnsureOrdersRequested sets m_currentPollCycleUtc to a time AFTER the orders were created, the orders appear to have “disappeared” and get marked. Since Remaining == 0 (default), the Status is changed to “Filled”.

Then BuildOrderListCsv skips it because ShowFilledOrders = false and IsTerminalStatus("Filled") = true.

This is a race condition in the test setup. The fix is to ensure orders are marked as “seen” in the poll cycle BEFORE HandleOpenOrderEnd is called.
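
The failure mode the transcript arrives at can be reproduced in a few lines. What follows is a simplified Python model, not the RTD server's C# code: the class and method names loosely mirror those in the transcript, while the injected clock, the single list value standing in for the order list topic, and the assumption that "Cancelled" counts as a terminal status are illustrative choices. The reordered sequence at the end is only one possible way to satisfy "mark orders as seen before HandleOpenOrderEnd"; the transcript does not show which fix was actually applied.

```python
from datetime import datetime, timedelta

# Assumption: "Filled" and "Cancelled" are both terminal (the transcript confirms only "Filled").
TERMINAL_STATUSES = {"Filled", "Cancelled"}


class OrderInfo:
    def __init__(self, perm_id, status, now):
        self.perm_id = perm_id
        self.status = status
        self.remaining = 0  # never set by the test, so it keeps its default
        self.needs_completion_query = False
        self.last_seen_in_poll = now

    def mark_as_needs_completion_query(self):
        self.needs_completion_query = True
        if self.remaining == 0:
            self.status = "Filled"  # the surprising status change


class OrderManager:
    def __init__(self, clock):
        self._clock = clock  # injected so timing can be controlled
        self._orders = {}
        self._orders_requested = False
        self._snapshot_completed = False
        self._current_poll_cycle = datetime.min
        self.list_value = None  # stand-in for the order list topic value

    def handle_open_order(self, perm_id, status):
        self._orders[perm_id] = OrderInfo(perm_id, status, self._clock())

    def get_or_add_order_list_topic(self):
        if self._snapshot_completed or not self._orders_requested:
            self.list_value = self.build_order_list_csv()
        else:
            self.list_value = ""
        self.ensure_orders_requested()

    def ensure_orders_requested(self):
        if not self._orders_requested:
            self._orders_requested = True
            self._current_poll_cycle = self._clock()

    def handle_open_order_end(self):
        self._snapshot_completed = True
        if self._current_poll_cycle > datetime.min:
            for order in self._orders.values():
                if order.last_seen_in_poll < self._current_poll_cycle:
                    order.mark_as_needs_completion_query()
        self.list_value = self.build_order_list_csv()

    def build_order_list_csv(self):
        open_ids = [str(perm_id) for perm_id, order in self._orders.items()
                    if order.status not in TERMINAL_STATUSES]
        return ",".join(open_ids)


def make_clock(step):
    """Fake clock that advances by `step` on every read."""
    current = [datetime(2025, 1, 1)]

    def clock():
        now = current[0]
        current[0] = now + step
        return now

    return clock


def add_three_orders(manager):
    manager.handle_open_order(12345, "Submitted")
    manager.handle_open_order(12346, "Filled")
    manager.handle_open_order(12347, "Cancelled")


def run_test_sequence(step):
    """Original test order: add orders, subscribe, then end the snapshot."""
    manager = OrderManager(make_clock(step))
    add_three_orders(manager)
    manager.get_or_add_order_list_topic()
    seeded = manager.list_value
    manager.handle_open_order_end()
    return seeded, manager.list_value


def run_reordered_sequence(step):
    """Subscribe first, so the poll cycle starts before any order is seen."""
    manager = OrderManager(make_clock(step))
    manager.get_or_add_order_list_topic()
    add_three_orders(manager)
    manager.handle_open_order_end()
    return manager.list_value


print(run_test_sequence(timedelta(0)))                    # ('12345', '12345')
print(run_test_sequence(timedelta(milliseconds=1)))       # ('12345', '')
print(run_reordered_sequence(timedelta(milliseconds=1)))  # 12345
```

Injecting the clock is what makes the intermittent failure deterministic here. When every clock read returns the same instant, no order looks older than the poll cycle and the list survives. As soon as the clock advances between calls, the open order is treated as having disappeared, is relabeled "Filled" because its remaining quantity defaults to 0, and drops out of the list.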
