gtrak · gtrak · Feb 24, 2023 · Feb 26, 2023 · Feb 26, 2023 · Feb 26, 2023
diff --git a/blog/.astro/types.d.ts b/blog/.astro/types.d.ts
@@ -74,6 +74,20 @@ declare module 'astro:content' {
   collection: "blog",
   data: InferEntrySchema<"blog">
 },
+"2023-02-24-robust-risk-reduction-2.md": {
+  id: "2023-02-24-robust-risk-reduction-2.md",
+  slug: "2023-02-24-robust-risk-reduction-2",
+  body: string,
+  collection: "blog",
+  data: InferEntrySchema<"blog">
+},
+"2023-02-26-robust-risk-reduction-3.md": {
+  id: "2023-02-26-robust-risk-reduction-3.md",
+  slug: "2023-02-26-robust-risk-reduction-3",
+  body: string,
+  collection: "blog",
+  data: InferEntrySchema<"blog">
+},
 "first-post.md": {
   id: "first-post.md",
   slug: "first-post",

diff --git a/blog/src/content/blog/2023-02-21-robust-risk-reduction-1.md b/blog/src/content/blog/2023-02-21-robust-risk-reduction-1.md
@@ -1,9 +1,11 @@
 ---
-title: "Robust Risk Reduction - Part 1/-"
+title: "Robust Risk Reduction - Part 1/3"
 description: "Define the problem we're trying to solve"
 pubDate: "Feb 21 2023"
 ---
 
+This is part 1 in a series of posts on mitigating tech debt.
+
 # Why
 
 A common issue that many engineering teams face is tension between the priority of tech debt and the priority of

diff --git a/blog/src/content/blog/2023-02-24-robust-risk-reduction-2.md b/blog/src/content/blog/2023-02-24-robust-risk-reduction-2.md
@@ -0,0 +1,85 @@
+---
+title: "Robust Risk Reduction - Part 2/3"
+description: "What are some possible strategies?"
+pubDate: "Feb 24 2023"
+---
+
+This is part 2 in a series of posts on mitigating tech debt, starting [here](/blog/2023-02-21-robust-risk-reduction-1).
+
+# What are some possible strategies to mitigate tech debt?
+
+Let's assume your system has accumulated a lot of tech debt with these characteristics:
+
+- There is general consensus on the root of the problem. (If not, do the work to socialize the problem first.)
+- Proposed solutions are too big, risky, and intrusive for one person to just do. (Believe it or not, it can still happen,
+  given ample tests and type systems.)
+- It's actively hurting the product pipeline, and you can see how much better things would be if the problem were alleviated.
+  (Engineers generally play up the benefits and downplay the costs, but knowing the benefits might not be enough to help you.)
+
+What are some approaches to tackle it?
+What can go wrong with them?
+
+## The Traditional Project Plan (top-down)
+
+In this scenario, someone writes a document following your normal process. It might be an RFC, an ADR, or a _Formal Technical Specification_
+providing _Context_ for a _Decision_ by _Key Stakeholders_.
+
+There are some challenges with this approach.
+
+### It's likely the decision will be _no_
+
+We already said the work is risky, large, and different from the normal work your organization plans and executes. All of these
+make it more likely that a reader says "WTF am I looking at?" If you go through the trouble and the effort is dead on arrival once it
+hits a decision stage, you've wasted a good bit of time, you're likely a bit demoralized, and you'll try less hard to do something
+similar going forward.
+
+Like the honey badger, the tech debt doesn't care. It still grows. Fixing the problem later will likely be harder, not easier.
+
+### If the decision is _yes_
+
+It's often because not all the risks have been discovered or understood. Discovering them during implementation adds project scope. After
+a few weeks or months, you might be wondering whether the "Sunk Cost Fallacy" applies to you. Continuing and turning back are both
+_not great_ outcomes.
+
+If you continue, you might eventually alleviate the tech debt, but the overruns will make it harder to get the next effort through the normal process.
+
+### The existing system still exists
+
+Until the new system can replace the existing functionality, all inbound work carries extra conversations about whether it belongs in the new
+system (more correct) or the old system (more expedient). These conversations will slow down very important things. If the decision is
+made to add to the old system, your new-system plan from 3 months ago won't have accounted for the new functionality, so scope and
+timeline will grow as a result.
+
+## Forgiveness Instead of Permission (bottom-up)
+
+You think you've got this. You might have one or two engineers on board with a specific direction, but you know the standard
+process will never prioritize this kind of work. Why not skip it this time?
+
+You build a quick-win proof of concept and pick all the low-hanging fruit. You get your one or two PR approvals and just YOLO-merge
+(cue the "Look at me. I'm the captain now." meme). Over time it grows organically into _the new way to do things_, which is _clearly superior_.
+Even when it looks good (often it doesn't, and you have to throw it out), you might run into a few issues.
+
+### Why are you working on this?
+
+Occasionally, a piece of work you didn't anticipate might grow big enough to warrant its own ticket or some other real discussion. An
+event like that can make people aware of work you intended to keep under wraps, and it may cause some alarm, or at least some intrusive
+questions. When it happens, you are likely unprepared for the conversation, because you couldn't know when it would come.
+
+Don't surprise your manager.
+
+### Feature creep
+
+Over time, you will be tempted, or asked, to add more functionality to the new code, often before it has completely replaced the old
+code. Those features frequently won't quite fit your initial vision. Eventually, you will have two old things to maintain
+instead of one.
+
+### Local minimum
+
+Over time, you will realize the shortcomings of your approach, and maybe you will think of a better one. You might feel that
+if you had treated the problem with the upfront respect it was due, you could have skipped some steps and landed on the later
+solution more directly. On the other hand, the approach you took probably provided value along the way, even in an incomplete state. It's not
+always so clear.
+
+## Is there another approach?
+
+In part 3, I'll attempt to propose one.
diff --git a/blog/src/content/blog/2023-02-26-robust-risk-reduction-3.md b/blog/src/content/blog/2023-02-26-robust-risk-reduction-3.md
@@ -0,0 +1,138 @@
+---
+title: "Robust Risk Reduction - Part 3/3"
+description: "What's a better strategy?"
+pubDate: "Feb 26 2023"
+---
+
+This is part 3 in a series of posts on mitigating tech debt, starting [here](/blog/2023-02-21-robust-risk-reduction-1).
+
+# What's a better strategy?
+
+We've seen in the previous post how tech debt by its nature might confound efforts to alleviate it. We struggled to come up with
+a viable way to address it that creates value without a lot of extra risk.
+
+Let's work out together what a good strategy might look like.
+
+## What are some characteristics of any good strategy to alleviate tech debt?
+
+### It's dynamic to changing circumstances
+
+This series of posts includes the word 'robust' in the title for this reason. A robust mechanism is something with a bit of flexibility.
+Think metal, not porcelain. It bends, and it doesn't break, or at least it breaks much, much later than something brittle would. There
+should be no circumstance that will cause you to fully abandon the strategy, and it should stay relevant as variables (people, requirements)
+change.
+
+### It's self-reinforcing
+
+A good, sustainable strategy to alleviate tech debt shouldn't require a lot of extra effort to get going or keep alive. Said another way,
+a self-reinforcing strategy will _cause_ developers and stakeholders to do the 'right thing' with minimal touchpoints, minimal confusion,
+and maximal clarity, and it will provide positive feedback that makes choosing the right path even more likely next time. Once the strategy is
+set in motion, a successful outcome should be a foregone conclusion, although you might not know the exact timeline.
+
+A strategy becomes sustainable when its incentives are aligned with developers' incentives in the short term as well as the long term, or when its
+feedback can be automated. A relatable example for engineers is how PR comments change before and after the introduction of linters
+and formatters. After the initial discussions and work to get the codebase to conform, further iteration is automatic and low effort.
+The tools greatly reduce the amount of style nitpicking in code reviews.
+
+Pretty soon after the initial work, all code lands faster and you realize the benefits of the automation. This automation is
+valuable for small teams and large ones.
+
+### It's easy to understand
+
+In order to reduce risk, it's critical to get everyone on the same page. It's likely that solving your architectural problem doesn't require
+you to create new fundamental concepts that are challenging to communicate, at least not until you've exhausted the set of well-known solutions. Try
+really hard to avoid the temptation to push the frontier too early.
+
+## What's an example strategy?
+
+Let me start this section by stating the obvious: I don't know the answer, and I'm writing this to attempt to work through my past
+experiences and possibly find a strategy that will work better next time. That said, I think there's value in showing my work. I can
+offer a few tips and guidelines to the next person and my future self.
+
+### Let's start from first principles
+
+How did we get in this situation? Fundamentally, tech debt is the result of a living, breathing system including your current codebase, its
+dependencies, your company's requirements, and your changing team's skills and knowledge. A systems approach is well suited to analyzing it and
+make recommendations. Although I am by no means an expert in systems theory, I think I can apply some basic ideas to the problem.
+
+#### Create a model of the system
+
+A picture is worth a thousand words. It would be a useful exercise to model your dev process and artifacts with simple squares and arrows.
+
+We want to identify the inputs and outputs and general interactions between different parts. Once you have the relationships mapped out,
+you can have conversations about them. At a fundamental level, tech debt accumulates when 'the wrong thing' grows faster than it is
+addressed. Your systems diagram can be used to frame a conversation about key indicators. Once you have the list of what contributes
+to the growth of some specific tech debt, there are some techniques we can apply to fix it at different levels.
+
+#### Collect key metrics
+
+You might have a hunch about the nature of your tech debt. What would you need to validate and support it? What would disprove
+it or weaken it? Confirmation bias makes it easy to find evidence for an early conclusion, so it's worth fighting against.
+Statistics offers a useful habit here: state a _null hypothesis_, roughly the inverse of your hunch, and look for evidence strong enough to reject it,
+focusing on all the things that might break your assumptions. If the evidence lets you reject the null hypothesis, your initial idea is much better supported.
+
+Either way, it's likely the information you need to communicate is not totally clear and organized, and you might need to do some early work. It
+should be relatively uncontroversial to take some extra time during your work day to add automated metrics or do a one-off analysis. Some
+examples of useful information I have collected in the past include:
+
+- What projects are changed in the same PR? (scripts over git)
+- What kinds of tickets have the most PRs or largest diffs? (scripts over git)
+- What are the worst-case and common case CI times?
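+
+Here is one way such a script might look for the first question: a minimal, hypothetical Python sketch that counts how often pairs of
+top-level directories change in the same commit. It assumes a monorepo where each top-level directory is a "project" and a squash-merge
+workflow, so one commit on the main branch roughly equals one PR. All function names are made up for illustration.
+
+```python
+import subprocess
+from collections import Counter
+from itertools import combinations
+
+# Marker printed once per commit so the log can be split into commits.
+COMMIT_MARKER = "__commit__"
+
+
+def project_of(path: str) -> str:
+    # Hypothetical convention: the first path segment names the project.
+    return path.split("/", 1)[0]
+
+
+def parse_log(log_text: str) -> list[set[str]]:
+    """Turn `git log --name-only` output into one set of projects per commit."""
+    commits: list[set[str]] = []
+    for raw in log_text.splitlines():
+        line = raw.strip()
+        if line == COMMIT_MARKER:
+            commits.append(set())
+        elif line and commits:
+            commits[-1].add(project_of(line))
+    return commits
+
+
+def co_change_counts(commits: list[set[str]]) -> Counter:
+    """Count how often each pair of projects changes in the same commit."""
+    pairs: Counter = Counter()
+    for projects in commits:
+        for pair in combinations(sorted(projects), 2):
+            pairs[pair] += 1
+    return pairs
+
+
+def read_git_log(repo: str = ".") -> str:
+    """Run git and return changed file names, each commit preceded by the marker."""
+    result = subprocess.run(
+        ["git", "-C", repo, "log", "--name-only", f"--pretty=format:{COMMIT_MARKER}"],
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    return result.stdout
+```
+
+To try it, you might run `co_change_counts(parse_log(read_git_log("path/to/repo"))).most_common(10)`. Pairs near the top hint at
+projects that may be more coupled than their boundaries suggest.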
+
+#### Consider some common kinds of general solutions
+
+I can offer some general patterns.
+
+- Add friction to processes that accumulate tech debt
+- Reduce friction where you want an alternative to win
+- Create or extend a data model to reduce special cases. You know you are doing this right if you end up deleting much more code
+  than you add.
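+
+To make the first pattern concrete, one common way to add friction is a "ratchet" check in CI: count uses of something you want to
+shrink, and fail the build if the count goes up. The minimal Python sketch below assumes a hypothetical deprecated module called
+`legacy_db` and a baseline count kept in the repo; the pattern and function names are illustrative only.
+
+```python
+import re
+from pathlib import Path
+
+# Hypothetical pattern: calls into an old `legacy_db` module we want to shrink.
+DEPRECATED = re.compile(r"\blegacy_db\.")
+
+
+def count_uses(root: Path, glob: str = "**/*.py") -> int:
+    """Count matches of the deprecated pattern in files under `root`."""
+    total = 0
+    for path in root.glob(glob):
+        if path.is_file():
+            text = path.read_text(encoding="utf-8", errors="ignore")
+            total += len(DEPRECATED.findall(text))
+    return total
+
+
+def check_ratchet(current: int, baseline: int) -> tuple[bool, int]:
+    """Return (passed, new_baseline).
+
+    Going above the baseline fails the build, adding friction to the old path.
+    Going below it tightens the baseline, so progress can't silently regress.
+    """
+    if current > baseline:
+        return False, baseline
+    return True, current
+```
+
+In CI you might compare `count_uses(Path("src"))` against the stored baseline and commit the lower number whenever `check_ratchet`
+reports progress. The check costs nothing for people already on the new path, which is part of what makes it self-reinforcing.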
+
+#### Consider alternatives and risks
+
+You'll want to anticipate people's questions before having a larger conversation. The solution might involve work at different levels.
+
+If the solution is at the level of code or automation, you'll need to think carefully about a migration plan and how to enforce
+common conventions across code repositories and teams. Not everyone will welcome extra change or work on their side, and you likely don't
+want to do it all yourself. The changes might offer less value in circumstances different from your own, which might mean
+they never get implemented consistently.
+
+If the solution involves team processes, adding new steps often requires a lot of discipline and agreement, and process
+changes might have unintended consequences. For example, if you add more conversation to incoming tickets, planning meetings will grow and
+your meetings are likely to get through fewer tickets. An example of a process change I've tried in the past
+is adding a 'tax' on velocity so that developers work on at least one unfamiliar part of the system. Over time, it would spread
+knowledge and reduce the bus factor, but it took some extra discipline. A key question to ask about a process change is 'when' or 'how often'
+it applies, and what kinds of scenarios it would miss.
+
+## How do you get there from here?
+
+There are some activities that should be done incrementally, but be careful not to fall into full incrementalism.
+
+### Keep notes
+
+The knowledge-gathering part at the front end of any tech debt or risk reduction effort is critical, and it can happen over time at low
+cost. I encourage you to take plentiful notes and organize them in a kind of second-brain system. It doesn't have to be super high-tech.
+Early on in my career I used 'one big text file' to track everything, but more recently I have started using Notion. What matters is
+that it's easy to capture input, easy to search what you're looking for, and that it has a place to put more far-out ideas so you can
+start to notice common pain points and patterns.
+
+### Lower the activation energy for larger efforts
+
+In order to reduce the risk from a larger effort, you can chip away at debt over time as part of your normal work. For me, this looks like
+constant refactoring at PR time. Each individual PR might be a bit bigger until sections of code settle into a new pattern. Over time, quality
+improves. When working like this, it's key to have some agreement so people aren't working in different directions. You can even have
+_an actual conversation_ before falling into the 'local minimum' or 'feature creep' traps. Hopefully you like your coworkers.
+
+To avoid the 'why are you working on this' trap, it helps to have some kind of past artifact or set of agreed-upon guidelines to reference.
+
+Because these changes are applied incrementally as part of your normal work, areas of the code that are changed more often will become better, faster.
+The best predictor of the future is the past, and it's likely some specific kinds of work will happen much more often than others. You can take
+advantage of this tendency to understand the impact of a change in terms of when and how often it will apply.
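+
+One rough way to see where incremental refactoring would pay off soonest is to rank files by how often they changed recently. This
+hypothetical Python sketch counts commits per path from `git log --name-only --pretty=format:` output; the six-month window and the
+function names are arbitrary assumptions.
+
+```python
+import subprocess
+from collections import Counter
+
+
+def change_counts(log_text: str) -> Counter:
+    """Count how many commits touched each path.
+
+    Expects `git log --name-only --pretty=format:` output, where each commit
+    contributes one line per changed file and commits are separated by blank lines.
+    """
+    return Counter(line.strip() for line in log_text.splitlines() if line.strip())
+
+
+def read_changed_paths(repo: str = ".", since: str = "6 months ago") -> str:
+    """Run git log with an empty format so only file names are printed."""
+    result = subprocess.run(
+        ["git", "-C", repo, "log", f"--since={since}", "--name-only", "--pretty=format:"],
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    return result.stdout
+```
+
+For example, `change_counts(read_changed_paths()).most_common(20)` lists the twenty most frequently changed files. A new pattern that
+settles into those files first should reach most incoming work soonest.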
+
+Once there is enough clarity to be able to define and finish a larger piece of work (but not too large), write it up in your normal process, then get
+it done in one shot.
+
+## Good luck!
+
+Seriously, it's hard out there, and I hope this helps. I want to live in a world where it's safe to assume we all want to build better. If these posts
+help, or you know of a better way, let me know!
diff --git a/blog/src/content/blog/first-post.md b/blog/src/content/blog/first-post.md
@@ -1,7 +1,7 @@
 ---
 title: "First post"
 description: "Lorem ipsum dolor sit amet"
-pubDate: "Feb 21 2023"
+pubDate: "Feb 20 2023"
 ---
 
 ## Hello World!