#10 — What Changed in Our Daily AI Workflow - B.O.R.I.S

Summary

In a less-structured episode, the hosts of Agentic AI in DevOps compare day-to-day workflows: what they actually run, what they have stopped doing, and how their habits have shifted since the early autumn. Gonçalves keeps the classical engineering ritual — linting, tests, coverage targets — and bakes it into a skill so the AI cannot skip past quiet bugs. Samoylov has rewired the harness with hooks for audit logs, command-failure journals, and a daily “dead code” sweep, and tracks a new personal metric: average working hours without a human. Devyatkin argues that the second shift at 9 PM is the wrong time to drive an agent — the sharpest decisions belong to a fresh 5 AM mind — and shells Claude Code out to Codex to cross-review its own plans. The closing exchange wanders into whether AI is “the last thing humans will invent,” and the hosts disagree: imagination, accidents, and champagne all live outside the training data.

Key Topics

Classical engineering practice as AI guardrails

Gonçalves’ headline change is that he checks AI-generated code less by hand — not because the output got better, but because he stopped trusting raw output and built a skill around it. The skill writes the code, runs the linter, fixes what comes back, writes the tests, runs them, and chases a coverage target (he names 98% as his personal bar). The point he keeps hammering: a lot of teams dropped these practices the moment AI started shipping code that looked plausible on screen. Plausible is not the same as correct; he describes early surprises where the agent moved on while leaving hidden bugs behind.
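The order of operations in that skill can be sketched as a small quality gate. This is a minimal Python illustration, not the skill itself: the names `run_quality_gate`, `StepResult`, and `gate_passed` are hypothetical, the steps are plain callables standing in for whatever linter and test runner a project uses, and only the 98% default comes from the episode.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    detail: str = ""


def run_quality_gate(
    steps: List[Tuple[str, Callable[[], bool]]],
    measure_coverage: Callable[[], float],
    coverage_target: float = 98.0,  # the personal bar named in the episode
) -> List[StepResult]:
    """Run each step in order and stop at the first failure."""
    results: List[StepResult] = []
    for name, step in steps:
        ok = step()
        results.append(StepResult(name, ok))
        if not ok:
            # Do not move on past a failing step; that is how quiet bugs slip by.
            return results
    coverage = measure_coverage()
    results.append(
        StepResult(
            "coverage",
            coverage >= coverage_target,
            f"{coverage:.1f}% (target {coverage_target:.1f}%)",
        )
    )
    return results


def gate_passed(results: List[StepResult]) -> bool:
    """The gate passes only if every recorded step succeeded."""
    return all(r.ok for r in results)
```

The design choice worth noting is the early return: a failed lint or test run ends the gate instead of letting later steps paper over it.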

When a session starts spinning — going in circles, repeating itself — he treats that as a context problem rather than a model problem: stop, clean up, hand the same task to a fresh session.

Hooks as audit log and learning loop

Samoylov runs a set of hooks that turn the AI harness into something more accountable. The simplest is a post-command hook that captures every bash invocation: which agent, what command, when. The motivation is twofold — investigation evidence when an agent goes off the rails, and a way to disambiguate blame when multiple people and multiple agents share a repo. “Whose agent actually made the mistake” is a real question once teams scale this up.
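A rough sketch of what such an audit record could look like, in Python. It assumes the hook receives a JSON payload carrying a `session_id` and a `tool_input.command` field; that payload shape, the `agent` label, and the function names are assumptions made for illustration, not details given in the episode.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(payload: dict, agent: str, now: Optional[datetime] = None) -> dict:
    """Build one audit entry: which agent, what command, when.

    The payload field names used here are assumed for this sketch.
    """
    now = now or datetime.now(timezone.utc)
    tool_input = payload.get("tool_input") or {}
    return {
        "timestamp": now.isoformat(),
        "agent": agent,  # hypothetical label, e.g. taken from an environment variable
        "session": payload.get("session_id", "unknown"),
        "command": tool_input.get("command", ""),
    }


def append_record(path: str, record: dict) -> None:
    """Append the entry as one JSON line so the log stays easy to grep."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

A hook script built on this would read the payload from standard input, call `audit_record`, and pass the result to `append_record`. An append-only JSON Lines file keeps entries from several agents in one place without any of them rewriting earlier history.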

The second hook fires on command failure. Failures land in a learning journal, and a daily routine folds the recurring ones back into the relevant skill. He cites the bash-on-macOS-vs-Linux drift as a typical case: the agent reaches for the GNU flag and bounces off BSD, and rather than re-explain that in every session, the skill gets patched once.
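The daily fold-back step can be approximated by counting failure signatures in the journal. The sketch below is a Python illustration under assumptions: journal entries are dicts with `command` and `stderr` keys, a signature is the program name plus the first line of the error, and the threshold of three is arbitrary. None of these specifics come from the episode.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Signature = Tuple[str, str]


def signature(entry: dict) -> Signature:
    """Reduce a failure to (program, first line of stderr)."""
    parts = entry.get("command", "").split()
    program = parts[0] if parts else ""
    stderr = entry.get("stderr", "").strip()
    first_line = stderr.splitlines()[0] if stderr else ""
    return program, first_line


def recurring_failures(
    entries: Iterable[dict], min_count: int = 3
) -> List[Tuple[Signature, int]]:
    """Return signatures seen at least min_count times, most frequent first."""
    counts = Counter(signature(e) for e in entries)
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]
```

Whatever comes back from `recurring_failures` is the candidate list for a one-time skill patch, which is the point of the journal: a failure seen once is noise, a failure seen repeatedly is a missing instruction.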

Daily routines: dead code, docs, dependencies

The same loop powers a morning routine that runs before he starts working: scan for dead code with language-specific tools (he notes AI tends to write functions that nothing calls), check library and SDK versions, refresh documentation, security-check the diff from the previous day. By the time he sits down, the project is already in a known state — he can see where things actually are rather than re-deriving it from a cold read.
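To make the dead-code step concrete, here is a deliberately simple Python heuristic built on the standard `ast` module. It is a stand-in for the language-specific tools mentioned above, not one of them: it looks at a single source string, matches by name only, and the function name `unreferenced_functions` is made up for this sketch.

```python
import ast
from typing import List


def unreferenced_functions(source: str) -> List[str]:
    """List functions that are defined in `source` but never referenced by name.

    Limitations of this sketch: single file only, name-based matching, and a
    function that only calls itself still counts as referenced.
    """
    tree = ast.parse(source)
    defined = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)  # plain references such as helper()
        elif isinstance(node, ast.Attribute):
            used.add(node.attr)  # attribute references such as obj.helper()
    return sorted(defined - used)
```

A morning routine would run something like this across the files changed the previous day and report the names for a human or an agent to confirm before deleting anything.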

Repository uplift as living documentation

Devyatkin frames a problem most platform engineers will recognise: a zoo of repositories, each with Dockerfiles, Terraform, Helm in slightly different shapes, and “how we do things” being a moving target with little written down. Before AI, the response was to walk through repos by hand and lift them to the current standard. Writing that standard up from scratch is the hard part — “you don’t really know from which end to hold the stick.”

His current approach: walk through the first repository the way he would have walked through it himself, but instruct the AI to run each step. Check Terraform module versions. Locate the private modules. Look at the Dockerfiles: are there standardised base images, and where do they live? Then turn that walk into a skill, and at the end of every later run, ask the agent: did you learn anything new here? If so, contribute it back. Because the skill lives on the file system rather than being pulled from a marketplace, edits are immediate.

Samoylov takes the slicing further — one big skill is too coarse. Dockerfiles get their own sub-skill, with checks for things like non-root user; Terraform gets another. Gonçalves agrees and adds the angle the team keeps returning to: this is also how you bring a junior engineer up to a senior baseline without forcing them to memorise tribal knowledge.
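The non-root check is the kind of rule a Dockerfile sub-skill could carry as a small script. The Python sketch below is one possible reading of that check, with a hypothetical function name. It assumes that a final build stage without any `USER` instruction should fail the check, and it does not handle line continuations or variables in the `USER` value.

```python
def runs_as_non_root(dockerfile_text: str) -> bool:
    """Return True if the final build stage ends with a non-root USER.

    Assumption for this sketch: no USER instruction in the final stage
    counts as a failure, since the base image default is unknown here.
    """
    user = None
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = None  # a new stage starts; forget USER from earlier stages
        elif instruction == "USER" and len(parts) == 2:
            user = parts[1].strip()
    if user is None:
        return False
    name = user.split(":", 1)[0]  # drop an optional ":group" suffix
    return name not in ("root", "0")
```

Keeping each rule this small is what makes the sub-skill split practical: a Terraform sub-skill can carry its own checks without either one growing into the coarse single skill the hosts warn against.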

The personal reference library

Gonçalves describes a separate technique for personal style. His development skill points at a references/ folder full of snippets: this is how he structures Terraform, this is how he handles Docker, this is a piece of an old project that does the pattern he wants reused. The skill’s instructions are mostly references — “if you’re working with Docker, look at this file.” The effect is that AI-generated code stops feeling like a stranger’s project on his disk: he opens the file a year later and the variable naming and structure are still where he expects them.

Don’t forget to commit

A small but recurring failure mode: with AI moving fast, people stop committing. They build, build, build, then write one giant commit at the end. Worse, the agent edits the same file repeatedly and earlier good state is lost. Gonçalves bakes the commit step into the developer skill — once the work is done, commit, with the commit gated on human approval. By that point the linter and tests have already run, so the approval is a final eyeballing rather than a full review.
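The gating logic can be sketched in a few lines of Python. This is an illustration with hypothetical names (`checkpoint`, `approve`), assuming the lint and test results are already known and that approval is a callable which asks the human. The `run` parameter exists only so the git calls can be swapped out when trying the sketch.

```python
import subprocess
from typing import Callable


def checkpoint(
    message: str,
    checks_passed: bool,
    approve: Callable[[str], bool],
    run: Callable = subprocess.run,
) -> bool:
    """Commit the current work only if checks passed and a human approved."""
    if not checks_passed:
        return False  # never checkpoint work that failed lint or tests
    if not approve(message):
        return False  # the human has the final say
    run(["git", "add", "--all"], check=True)
    run(["git", "commit", "-m", message], check=True)
    return True
```

In interactive use, `approve` could be as simple as a prompt that shows the commit message and waits for a yes or no.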

The hosts agree this explains a chunk of the “I burned a whole day on AI and then had to revert everything” complaints they hear from skeptical developers: the agent did work, but it was never checkpointed.

Second shift vs 5 AM

Devyatkin spends evenings — roughly 9 PM to midnight — on what he calls a second shift: kids in bed, no meetings, three calm hours to make something happen. AI lets him do more in those three hours, which feels like the right move. He argues it isn’t. Driving an agent demands rapid, high-quality decisions, and a tired mind processes options worse than a rested one.

The plan he is testing is to flip the schedule: go to bed earlier, wake at 5 AM, and use the two hours before the kids get up for decision-heavy work with AI — at the top of mental capacity, when batching attention (he namechecks Tim Ferriss’ batch-your-email idea) actually pays off. He admits it has not stuck yet because he has been ill for three weeks and is sleeping nine hours instead. Samoylov: “Sleep nine hours in this economy, it’s good.”

Grill Me and cross-model review

If the decisions need to come up front, the AI has to extract them. Devyatkin uses the “Grill Me” pattern — a skill that turns the agent into a hard interviewer, peppering you with questions until there are no implicit assumptions left. He recommends trying the community version before writing your own.

His variation: from inside Claude Code, shell out to Codex mid-interview. Codex spawns researchers to widen the option space, then is asked for an independent take on the plan. The point is to use a second harness the way you would use a peer reviewer — fresh context, different model, no shared blind spots. He concedes the process is exhausting, but the output needs less checking afterwards.

Samoylov adds a sibling pattern: an “argue” mode where the AI is told to disagree instead of agree. “If you try to push some of your ideas through argue mode, it’s not possible — you can’t win.” Useful for stress-testing a position; less useful if you wanted validation.

Average working hours without a human

Samoylov has started tracking a personal metric: average working hours without a human in the loop. The trigger was noticing how often the agent was pinging him for approval — “who is actually controlling who?” He started at roughly one minute. He is now at about one and a half hours, mostly by front-loading the plan, preparing the environment, and adding the tests Gonçalves described. The stretch goal is a full eight-hour day. Devyatkin reminds him the dangerous-permissions flag exists; the room treats that as a separate, more careful conversation.
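The episode does not say how the metric is computed. One plausible definition, sketched in Python below, is the mean length of the stretches between human interventions within a session. The function name and the choice to count the stretches before the first and after the last intervention are assumptions of this sketch.

```python
from datetime import datetime, timedelta
from typing import List


def average_autonomous_stretch(
    start: datetime, interventions: List[datetime], end: datetime
) -> timedelta:
    """Mean time the agent ran between human interventions.

    Assumes every intervention timestamp falls between start and end.
    """
    points = [start] + sorted(interventions) + [end]
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    return sum(gaps, timedelta()) / len(gaps)
```

For a session from 09:00 to 12:00 with approvals at 09:30 and 11:00, the stretches are 30, 90, and 60 minutes, so the average is one hour. Fed from an audit log of approval prompts, the same calculation would show whether front-loading the plan actually moves the number.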

Backlog as markdown, engineer as agent manager

Gonçalves keeps a backlog as markdown inside the project, managed by a skill that behaves like a product owner. A late-night idea becomes a backlog item with status needs refinement. Refinement happens in the morning — a deeper skill goes through each item, maps dependencies, surfaces conflicts. Once items are refined, agents pick them up in parallel git worktrees, each acting like a teammate with a contract.
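As an illustration of the hand-off from refined items to parallel worktrees, the Python sketch below parses a made-up backlog format and builds one `git worktree add` command per refined item. The `- [status] title` line format, the `agent/` branch prefix, and the `../worktrees` directory are all hypothetical; the episode only says the backlog is markdown with statuses such as "needs refinement".

```python
import re
from typing import List, Tuple

# Hypothetical line format: "- [status] title"
ITEM = re.compile(r"^- \[(?P<status>[^\]]+)\]\s+(?P<title>.+)$")


def parse_backlog(markdown: str) -> List[Tuple[str, str]]:
    """Return (status, title) pairs for every backlog line that matches."""
    items = []
    for line in markdown.splitlines():
        match = ITEM.match(line.strip())
        if match:
            items.append((match["status"].strip().lower(), match["title"].strip()))
    return items


def slug(title: str) -> str:
    """Turn a title into a branch-safe name."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def worktree_commands(markdown: str, root: str = "../worktrees") -> List[List[str]]:
    """Build one `git worktree add` command per refined backlog item."""
    commands = []
    for status, title in parse_backlog(markdown):
        if status == "refined":
            name = slug(title)
            commands.append(
                ["git", "worktree", "add", "-b", f"agent/{name}", f"{root}/{name}"]
            )
    return commands
```

Items still marked "needs refinement" produce no command, which mirrors the workflow described above: nothing reaches an agent until the morning refinement pass has mapped its dependencies.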

This pushes the day toward project-management work. Devyatkin’s wry framing: “We might have to rename the podcast — Agentic for Project Managers.” The role of the senior engineer is drifting from “writing the Terraform” toward managing a team of agents — fun, but not what most of them signed up for. The hosts note that engineers historically did not love project-management work; that bargain may need to change.

Gonçalves layers it further: a spec generator skill at the top, then specialised sub-skills per architectural layer (data, API, agent), so new projects can be scaffolded and existing ones mapped through the same vocabulary.

Will AI invent?

The episode drifts into a question Samoylov raises: someone called AI “the last thing humans will invent” — meaning the next invention comes from AI itself. Devyatkin disagrees. “We haven’t invented the cure for cancer yet, just saying.” He recalls a test where a model was trained only on data available before 1918 and then asked to invent things that came later; he can’t remember the exact result, but it wasn’t a breakthrough.

Their shared framing: today’s generative models extrapolate from what already exists. Imagination, accidents, and lived experience sit outside that distribution. Samoylov notes how many real inventions came from mistakes — champagne being the running example. AI is currently a new capability layered onto human research, not a replacement for the part of research that surprises everyone.

Resources

Claude Code — the agentic coding harness Gonçalves and the team build their skills, hooks, and routines around.
OpenAI Codex CLI — the second harness Devyatkin shells out to for parallel research and cross-review of plans generated inside Claude Code.
Cursor — the AI-first editor Gonçalves started with before switching to Claude Code.
Claude Code skills documentation — the skill primitive the hosts use to encode workflows, references, and per-language sub-skills.
Claude Code hooks reference — the mechanism behind Samoylov’s audit logs and command-failure learning journal.
Git worktrees — the Git feature that lets multiple agents work in parallel branches of the same repo without stepping on each other, as in Gonçalves’ refined-backlog workflow.
The 4-Hour Workweek by Tim Ferriss — the batching idea Devyatkin borrows when arguing AI work belongs in a focused morning block, not a tired evening one.
Agentic AI in DevOps — insights archive — earlier episodes on review patterns, context management, and what agents can actually do, useful background for the workflows discussed here.
