14 changes: 14 additions & 0 deletions blog/.astro/types.d.ts
@@ -74,6 +74,20 @@ declare module 'astro:content' {
collection: "blog",
data: InferEntrySchema<"blog">
},
"2023-02-24-robust-risk-reduction-2.md": {
id: "2023-02-24-robust-risk-reduction-2.md",
slug: "2023-02-24-robust-risk-reduction-2",
body: string,
collection: "blog",
data: InferEntrySchema<"blog">
},
"2023-02-26-robust-risk-reduction-3.md": {
id: "2023-02-26-robust-risk-reduction-3.md",
slug: "2023-02-26-robust-risk-reduction-3",
body: string,
collection: "blog",
data: InferEntrySchema<"blog">
},
"first-post.md": {
id: "first-post.md",
slug: "first-post",
4 changes: 3 additions & 1 deletion blog/src/content/blog/2023-02-21-robust-risk-reduction-1.md
@@ -1,9 +1,11 @@
---
title: "Robust Risk Reduction - Part 1/-"
title: "Robust Risk Reduction - Part 1/3"
description: "Define the problem we're trying to solve"
pubDate: "Feb 21 2023"
---

This is post 1 in a series on mitigating tech debt.

# Why

A common issue that many engineering teams face is tension between the priority of tech debt and the priority of
85 changes: 85 additions & 0 deletions blog/src/content/blog/2023-02-24-robust-risk-reduction-2.md
@@ -0,0 +1,85 @@
---
title: "Robust Risk Reduction - Part 2/3"
description: "What are some possible strategies?"
pubDate: "Feb 24 2023"
---

This is part 2 in a series of posts on mitigating tech debt, starting [here](/blog/2023-02-21-robust-risk-reduction-1).

# What are some possible strategies to mitigate tech debt?

Let's assume there exists a lot of accumulated tech debt in your system. It has these characteristics:

- There is general consensus on the root of the problem. (If not, do the work to socialize the problem.)
- Proposed solutions are too big, risky, and intrusive for one person to just do. (It can still happen with
ample tests and type systems, believe it or not.)
- It's actively hurting the product pipeline, and you can see how much better it would be if the problem were alleviated. (In general,
engineers play up the benefits and downplay the costs, but knowing that might not be enough to help you.)

What are some approaches to tackle it?
What can go wrong with them?

## The Traditional Project Plan (top-down)

In this scenario, someone writes a document in your normal processes. It might be an RFC, an ADR, or a _Formal Technical Specification_
providing _Context_ for a _Decision_ by _Key Stakeholders_.

There are some challenges with this approach.

### It's likely the decision will be _no_

We already said the work was risky, large, and different from the normal work your organization plans and executes. All these issues
contribute to the likelihood of a reader saying "WTF am I looking at?" If you go through the trouble and the effort is dead
on arrival once it hits the decision stage, you've wasted a good bit of time, you're likely a bit demoralized, and you'll try less
hard to do something similar going forward.

Like the honey badger, the tech debt doesn't care. It still grows. Fixing the problem later will likely be harder, not easier.

### If the decision is yes,

it's often because not all the risks have been discovered or understood. Discovering them during implementation adds project scope. After
a few weeks or months, you might be wondering if the "Sunk Cost Fallacy" applies to you. Continuing or turning back are both
_not great_ outcomes.

If you continue, you might eventually alleviate the tech debt, but it will be harder to use the normal process next time.

### The existing system still exists

Until you can replace existing functionality, all inbound work has extra conversations about whether it should be in the new
system (more right) or the old system (more expedient). These conversations will slow down very important things. If the decision is
made to add to the old system, your new-system plan from 3 months ago won't have taken that functionality into account, and its
scope and timeline will grow as a result.

## Forgiveness Instead of Permission (bottom-up)

You think you've got this. You might have one or two engineers onboard with a specific direction, but you know the standard
process will never prioritize this kind of work. Why not skip it this time?

You do a quick-win proof of concept and hit all the low-hanging fruit. You get your one or two PR approvals and just YOLO-merge
<look at me, I'm the captain now meme>. Over time it grows organically into _the new way to do things_, which is _clearly superior_.
Even when it's looking good (often it isn't, and you have to throw it out), you might be subject to a few issues.

### Why are you working on this

Occasionally, a piece of work you didn't anticipate might get big enough to warrant its own ticket or some other actual discussion. An
event like that can make people aware of work you intended to keep under wraps and possibly cause some alarm, or at least some intrusive
questions. You likely won't be prepared for that conversation, because you can't know when it will come.

Don't surprise your manager.

### Feature creep

You are going to be tempted or asked to add more functionality to the new code over time, often before it completely replaces the old
code. Often those features will not quite fit with your initial vision. Eventually, you will have two old things to maintain
instead of one.

### Local minimum

Over time, you will realize the shortcomings of your approach, and maybe you will think of a better one. You might feel that
if you had treated the problem with the upfront respect it was due, you could have skipped some steps and landed on the later
solution more directly. On the other hand, your new approach probably provided value along the way even in an incomplete state. It's not
always so clear.

## Is there another approach?

In part 3, I'll attempt to propose one.
138 changes: 138 additions & 0 deletions blog/src/content/blog/2023-02-26-robust-risk-reduction-3.md
@@ -0,0 +1,138 @@
---
title: "Robust Risk Reduction - Part 3/3"
description: "What's a better strategy?"
pubDate: "Feb 26 2023"
---

This is part 3 in a series of posts on mitigating tech debt, starting [here](/blog/2023-02-21-robust-risk-reduction-1).

# What's a better strategy?

We've seen in the previous post how tech debt by its nature might confound efforts to alleviate it. We struggled to come up with
a viable way to address it that creates value without a lot of extra risk.

Let's work through together what a good strategy might look like.

## What are some characteristics of any good strategy to alleviate tech debt?

### It's dynamic to changing circumstances

This series of posts includes the word 'robust' in the title for this reason. A robust mechanism is something with a bit of flexibility.
Think metal, not porcelain. It bends, and it doesn't break, or at least it breaks much, much later than something brittle would. There
should be no circumstance that will cause you to fully abandon the strategy, and it should stay relevant as variables (people, requirements)
change.

### It's self-reinforcing

A good, sustainable strategy to alleviate tech debt shouldn't require a lot of extra effort to get going or keep alive. Said another way,
a self-reinforcing strategy will _cause_ developers and stakeholders to do the 'right thing' with minimal touchpoints, minimal confusion,
and maximal clarity, and provide positive feedback to make it even more likely to choose the right path next time. Once the strategy is
set in motion, a successful outcome should be a foregone conclusion, although you might not know the exact timeline.

A strategy becomes sustainable when its incentives are aligned with the developers' in the short term as well as the long term, or when the
feedback can be automated. A relatable example for engineers is how PR comments change before and after the introduction of linters
and formatters. After the initial discussions and work to get the codebase to conform, further iteration is automatic and low effort.
The tools greatly reduce the amount of style nitpicking in code reviews.

Pretty soon after the initial work, all code lands more quickly and you realize the benefits of the automation. This automation is
valuable for small teams and large ones.

### It's easy to understand

In order to reduce risk, it's critical to get everyone on the same page. It's likely that solving your architectural problem doesn't require
you to create new fundamental concepts that are challenging to communicate, at least not until you've exhausted the set of well-known solutions. Try
really hard to avoid the temptation to push the frontier too early.

## What's an example strategy?

Let me start this section by stating the obvious: I don't know the answer, and I'm writing this to attempt to work through my past
experiences and possibly find a strategy that will work better next time. That said, I think there's value in showing my work. I can
offer a few tips and guidelines to the next person and my future self.

### Let's start from first principles

How did we get in this situation? Fundamentally, tech debt is the result of a living, breathing system including your current codebase, its
dependencies, your company's requirements, and your changing team's skills and knowledge. A systems approach is suitable to analyze it and
make recommendations. Although I am by no means an expert in systems theory, I think I can apply some basic ideas to the problem.

#### Create a model of the system

A picture is worth a thousand words. It would be a useful exercise to model your dev process and artifacts with simple squares and arrows.

We want to identify the inputs and outputs and general interactions between different parts. Once you have the relationships mapped out,
you can have conversations about them. At a fundamental level, tech debt accumulates when 'the wrong thing' grows faster than it is
addressed. Your systems diagram can be used to frame a conversation about key indicators. Once you have the list of what contributes
to the growth of some specific tech debt, there are some techniques we can apply to fix it at different levels.
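
As a rough illustration of "grows faster than it is addressed," here is a toy stock-and-flow sketch in Python. The function name, its parameters, and every number in it are made up for illustration; they are not measurements from a real system, and a real model would have more than one inflow.

```python
def simulate_debt(initial, inflow, paydown_rate, weeks):
    """Toy model: each week debt grows by `inflow` and shrinks by a
    fraction `paydown_rate` of whatever is currently outstanding."""
    debt = float(initial)
    history = [debt]
    for _ in range(weeks):
        debt = debt + inflow - paydown_rate * debt
        history.append(debt)
    return history


# Hypothetical numbers: 10 units of new debt per week, 5% paid down per week.
history = simulate_debt(initial=50, inflow=10, paydown_rate=0.05, weeks=52)
print(f"start={history[0]:.0f}, after one year={history[-1]:.0f}")
```

In this simplified model the debt levels off near `inflow / paydown_rate`, which suggests two levers to discuss: slow the inflow, or raise the rate at which debt is paid down.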

#### Collect key metrics

You might have a hunch and intuition about the nature of your tech debt. What do you need to validate it and add support? What would disprove
it or reduce support? It's easy to find evidence to support your early conclusion due to confirmation bias, but it's worth fighting against it.
It's common in statistical analysis to start from the "null hypothesis," which inverts your hunch, and then look for evidence strong enough to reject it. In practice, that means focusing on all the things that
might break your assumptions. If the null hypothesis still doesn't hold up, you've gathered a mountain of evidence to support the initial idea.
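
One lightweight way to try to break your own hunch is a permutation test. The sketch below is only an illustration: the scenario (CI minutes for PRs that touch a suspect area versus all other PRs), the function name, and the data are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, rounds=2000, seed=0):
    """Estimate how often a random relabeling of the data produces a
    difference in means at least as large as the observed one.
    A small value is evidence against the null hypothesis that the
    two groups come from the same distribution."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)


# Hypothetical CI durations in minutes (made-up numbers).
touches_legacy = [31, 28, 35, 40, 33, 37]
other_prs = [22, 25, 19, 27, 24, 21]
print(permutation_p_value(touches_legacy, other_prs))
```

A small result says the gap is unlikely to be a labeling accident. It does not say why the gap exists, so it is a starting point for a conversation rather than a conclusion.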

Either way, it's likely the information you need to communicate is not totally clear and organized, and you might need to do some early work. It
should be relatively uncontroversial to take some extra time during your work day to add automated metrics or do a one-off analysis. Some
examples of useful information I have collected in the past include:

- What projects are changed in the same PR? (scripts over git)
- What kinds of tickets have the most PRs or largest diffs? (scripts over git)
- What are the worst-case and common case CI times?
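
As a sketch of what "scripts over git" can look like for the first question, the Python below counts how often two projects change together. It assumes that one commit roughly equals one PR (for example, squash merges) and that each top-level directory is a "project"; both are assumptions about your repository, and the function names are made up for this example.

```python
import subprocess
from collections import Counter
from itertools import combinations


def parse_commits(log_text):
    """Split `git log --name-only --format=@@%H` output into one list of
    changed paths per commit."""
    commits = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("@@"):
            commits.append([])
        elif line and commits:
            commits[-1].append(line)
    return commits


def co_changed_projects(commits):
    """Count how often two top-level directories change in the same commit."""
    pairs = Counter()
    for paths in commits:
        projects = sorted({p.split("/", 1)[0] for p in paths if "/" in p})
        pairs.update(combinations(projects, 2))
    return pairs


def read_git_log(repo="."):
    """Run git and return the raw log text (requires git on PATH)."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--format=@@%H"],
        check=True, capture_output=True, text=True,
    ).stdout


# Made-up sample in the same shape as the git output, so this runs anywhere.
sample = "@@aaa\n\napi/routes.py\nweb/app.ts\n\n@@bbb\n\napi/models.py\nweb/form.ts\ndocs/readme.md\n"
print(co_changed_projects(parse_commits(sample)).most_common(3))
```

On a real repository, replace `sample` with `read_git_log()`. Pairs that show up near the top of the list are candidates for a conversation about coupling.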

#### Consider some common kinds of general solutions

I can offer some general patterns.

- Add friction to processes that accumulate tech debt
- Reduce friction where you want an alternative to win
- Create or extend a data model to reduce special-cases. You know you are doing this right if you end up deleting much more code
than you add.
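
One way to "add friction" in an automated form is a ratchet check in CI: the count of some unwanted pattern may go down, but never up. The sketch below is a minimal Python illustration under assumptions; `legacy_billing` is a hypothetical module name, and where the baseline number is stored is left up to you.

```python
import re
from pathlib import Path

# Hypothetical pattern: imports from a module we want to retire.
LEGACY_PATTERN = re.compile(r"^\s*(from|import)\s+legacy_billing\b", re.MULTILINE)


def count_legacy_uses(root, pattern=LEGACY_PATTERN):
    """Count matches of `pattern` across all .py files under `root`."""
    total = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        total += len(pattern.findall(text))
    return total


def check_ratchet(current, baseline):
    """Return (ok, message). The count may go down but never up."""
    if current > baseline:
        return False, f"legacy uses rose from {baseline} to {current}"
    if current < baseline:
        return True, f"legacy uses fell to {current}; lower the baseline"
    return True, "legacy uses unchanged"
```

A CI step could call `check_ratchet(count_legacy_uses("src"), baseline)` and fail the build when the first element is `False`. The friction lands only on changes that add to the debt, which keeps the cost low for everyone else.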

#### Consider alternatives and risks

You'll want to anticipate people's questions before having a larger conversation. The solution might involve work at different levels.

If the solution is at the level of code or automation, you'll have to put a lot of consideration into a migration plan and how to enforce
common conventions across code repositories and teams. Not everyone will welcome extra change or work on their side, and you likely don't
want to do it all yourself. The changes might not have as much value in different sets of circumstances from your own, which might mean
they never get consistently implemented.

If the solution could involve team processes, it's often going to require a lot of discipline and agreement to add new steps, and process
changes might have unintended consequences. For example, if you add more conversation to incoming tickets, planning meetings will grow and
you are likely to lose capacity in the number of tickets your meeting can handle. An example of a process change I've tried in the past
is to add a 'tax' on velocity in order to have developers work on at least one unfamiliar part of the system. Over time, it would spread
knowledge and reduce bus-factor risk, but it took some extra discipline. A key question to ask about a process change is 'when' or 'how often'
it applies, and what kinds of scenarios it would miss.

## How do you get there from here?

There are some activities that should be done incrementally, but be careful not to fall into full incrementalism.

### Keep notes

The knowledge-gathering part at the front end of any kind of tech debt or risk reduction is critical, and it can happen over time at low
cost. I encourage you to take plentiful notes and organize them in a kind of second-brain system. It doesn't have to be super high-tech.
Early on in my career I used 'one big text file' to track everything, but more recently I have started using Notion. What matters is
that it's easy to capture input, easy to search for what you're looking for, and that it has a place to put more far-out ideas so you can
start to notice common pain points and patterns.

### Lower the activation energy for larger efforts

In order to reduce the risk from a larger effort, you can chip away at debt over time as part of your normal work. For me, what this looks like
is constant refactoring at PR-time. Each individual PR might be a bit bigger until sections of code settle into a new pattern. Over time, quality
improves. When working like this, it's key to have some agreement so people aren't working in different directions. You can even have
_an actual conversation_ before falling into the 'local minimum' or 'feature creep' traps. Hopefully you like your coworkers.

To avoid the 'why are you working on this' trap, it helps to have some kind of past artifact or set of agreed-upon guidelines to reference.

Because these changes are applied incrementally as part of your normal work, areas of the code that are changed more often will become better, faster.
The best predictor of the future is the past, and it's likely some specific kinds of work will happen much more often than others. You can take
advantage of this tendency to understand the impact of a change in terms of when and how often it will apply.

Once there is enough clarity to be able to define and finish a larger piece of work (but not too large), write it up in your normal process, then get
it done in one shot.

## Good luck!

Seriously, it's hard out there, and I hope this helps. I want to live in a world where it's safe to assume we all want to build better. If these posts
help, or you know of a better way, let me know!
2 changes: 1 addition & 1 deletion blog/src/content/blog/first-post.md
@@ -1,7 +1,7 @@
---
title: "First post"
description: "Lorem ipsum dolor sit amet"
pubDate: "Feb 21 2023"
pubDate: "Feb 20 2023"
---

## Hello World!