The problem isn't the tools. It's what's under them.

Developed by engineering leaders. Run with any AI assistant in 25 minutes.

Works with: Claude · ChatGPT · Gemini · Any LLM
Begin your assessment_
assessment-session

Assessment

How are automated tests managed on your team? What does your test suite cover, and how much do you trust it?

We have some Jest tests on the backend — maybe 15 or so — but mostly we test by running through things on our phones before a deploy.

Got it — so manual phone testing is the primary verification before production. Those 15 Jest tests — do they cover critical flows like authentication or payment processing, or are they more utility/helper tests?

Mostly the payment flow. The auth stuff isn’t really tested.

31 questions · ~25 minutes · scored against a calibrated rubric


// 01 — HOW_IT_WORKS

Three steps. One prompt. One clear picture.

(01)

Start the conversation

Paste the assessment prompt into your AI of choice. It walks you through 31 structured questions about your engineering practices, following up, pressing for specifics, and surfacing gaps you might have dismissed.

(02)

Get your report

Your responses are scored against our rubric, producing a detailed markdown report that covers red flags, critical minimums, and a breakdown of each section. A rough shape of that report is sketched after step (03).

(03)

See where you stand

Your scored report surfaces the gaps, including the ones your team didn’t know to look for. Share it, sit with it, or bring us in to help you act on it.
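
To make step (02)'s output concrete, here is a minimal Python sketch of how scored answers could be rendered into that markdown report. Every name, the 1–5 scale, and both thresholds are illustrative assumptions, not the assessment's actual internals.

```python
# A minimal sketch of rendering scored answers to a markdown report.
# RED_FLAG_THRESHOLD and CRITICAL_MINIMUMS are assumed for illustration;
# the real report format and scale are the assessment's own.
RED_FLAG_THRESHOLD = 2                               # assumed 1-5 scale
CRITICAL_MINIMUMS = {"Testing": 3, "Security": 3}    # assumed per-area floors

def render_report(scores: dict[str, int]) -> str:
    lines = ["# Assessment Report", "", "## Red flags"]
    lines += [f"- {a} ({s}/5)" for a, s in scores.items() if s <= RED_FLAG_THRESHOLD]
    lines += ["", "## Critical minimums"]
    for area, floor in CRITICAL_MINIMUMS.items():
        status = "met" if scores.get(area, 0) >= floor else "NOT met"
        lines.append(f"- {area}: {status} (needs {floor}/5)")
    lines += ["", "## Breakdown"]
    lines += [f"- {a}: {s}/5" for a, s in sorted(scores.items())]
    return "\n".join(lines)

print(render_report({"Testing": 2, "Security": 4, "Reviews": 3}))
```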

// 02 — THE_METHODOLOGY

A structured lens on what actually matters for Agentic Software Development.

4 categories · 13 areas · 31 assessment points

Artifacts

The code, docs, and tests your team produces

Standards · Documentation · Requirements · Testing · Architecture

Process

How work gets reviewed, secured, and measured

Reviews · Tech Debt · Security · Metrics

Tooling

The pipeline and environment developers work in

CI/CD · IDE

Culture

How your team owns decisions and shares knowledge

Ownership · Decisions

Each area is calibrated against our observations and Agentic Software Development best practices. The scoring rubric distinguishes between gaps you know about and blind spots you don’t — because “we’re weak here” and “we don’t even know” require different responses.
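
As a rough illustration of how that structure and that distinction might be modeled: the sketch below mirrors the 4-category / 13-area map above, but the field names, the 1–5 scale, and the triage logic are assumptions, not the tool's internals.

```python
from dataclasses import dataclass

# Hypothetical model of the 4-category / 13-area map above.
# Names, fields, and the 1-5 scale are assumptions, not the tool's internals.
AREAS = {
    "Artifacts": ["Standards", "Documentation", "Requirements", "Testing", "Architecture"],
    "Process": ["Reviews", "Tech Debt", "Security", "Metrics"],
    "Tooling": ["CI/CD", "IDE"],
    "Culture": ["Ownership", "Decisions"],
}

@dataclass
class ScoredArea:
    area: str
    score: int          # assumed 1-5 against the rubric
    acknowledged: bool  # did the team already know this was weak?

def triage(result: ScoredArea) -> str:
    """Known weaknesses and blind spots call for different responses."""
    if result.score >= 4:
        return "healthy"
    return "known gap" if result.acknowledged else "blind spot"

print(triage(ScoredArea("Testing", score=2, acknowledged=False)))  # -> blind spot
```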

// 05 — WHAT_COMES_NEXT

Go as deep as you need.

Self-serve assessment

Free

Run the prompt with your team. Get a scored report with prioritized recommendations. No setup required. Works with Claude, ChatGPT, Gemini, or any major LLM. Your data stays in your conversation.

Guided session

Recommended

Paid

We facilitate the assessment with your team and walk you through a prioritized roadmap. Best when your report surfaces uncomfortable findings and you're not sure where to start, or when you need someone to make the case to leadership.

Transformation

Enterprise

Full AI adoption strategy. Implementation support. Ongoing advisory. For teams that want hands-on guidance from assessment through rollout — including custom rubric calibration, team workshops, and quarterly progress reviews.

// 06 — QUESTIONS

Frequently asked

Which LLMs does this work with?

The prompt works with any AI model that supports long system prompts and sustained multi-turn conversation. It’s been tested with Claude (Opus or Sonnet), ChatGPT (GPT-5.4 Thinking or GPT-5.4 Pro), and Gemini (3.1 Pro). Models with reasoning or thinking modes tend to produce better probing and more nuanced scoring. Results may vary with smaller models or modes optimized for speed over depth — ChatGPT’s Instant mode, for example, tends to rush through probing.

How long does the assessment take?

About 25 minutes for one person. For richer data, have 2–3 team members run it independently and synthesize the results; comparing their reports reveals alignment gaps.
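
To see why independent runs are worth synthesizing, here is a minimal sketch that flags areas where teammates' scores diverge. The data shape, the 1–5 scale, and the divergence threshold are assumptions for illustration.

```python
# A sketch of synthesizing 2-3 independent runs: where teammates'
# scores diverge, there is an alignment gap worth discussing.
# Data shape and threshold are assumed for illustration.
runs = {
    "alice": {"Testing": 2, "Reviews": 4, "Ownership": 3},
    "bob":   {"Testing": 4, "Reviews": 4, "Ownership": 1},
}

def alignment_gaps(runs: dict[str, dict[str, int]], threshold: int = 2) -> list[str]:
    areas = set().union(*(r.keys() for r in runs.values()))
    gaps = []
    for area in sorted(areas):
        scores = [r[area] for r in runs.values() if area in r]
        if max(scores) - min(scores) >= threshold:
            gaps.append(f"{area}: scores range {min(scores)}-{max(scores)}")
    return gaps

print(alignment_gaps(runs))  # -> ['Ownership: scores range 1-3', 'Testing: scores range 2-4']
```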

Is my assessment data shared with anyone?

No. Everything stays in your conversation. We never see your answers, your scores, or your report unless you share them with us.

How accurate is the scoring?

Your answers are calibrated against our rubric of software engineering best practices. The LLM applies that rubric consistently, but the guided session exists for teams that want expert interpretation of the results.

What if different team members get different scores?

That's a feature, not a bug. Different people see different realities. Those gaps are just as valuable as the scores themselves.

Find out where your team stands.

Free. 25 minutes. Any AI assistant.

Begin your assessment_
Or book a guided session_