Monday, December 3, 2018

Safeguarding: A step-by-step guide

By: Llewellyn Falco, Josh Widzer, Jay Bazuzi

TL;DR:
Bugs happen. While you could simply fix them, you could instead take an extra step to prevent similar mistakes from occurring again. This 25-minute process will do that.

We will go in to the philosophies and reasons to do this in other articles.
1. When to do it
Do safeguarding right after you’ve fixed a bug. The same day or next is good. This is when it’s fresh in your mind and when improving the system still feels relevant.

The key roles that need to be in the room are the people who:
  1. understand what happened and why
  2. wrote the bug
  3. detected the bug
  4. fixed the bug
  5. project manager (someone who can approve the time required to work on the fixes)
  6. might resist the proposed remediation

2. Root Cause Analysis (RCA)
We are going to gather impartial observations about what happened.

Create a Google Doc that everyone can access, with with the following tree:
  • What caused us to write the bug?
  • Why didn’t it get caught sooner?
  • What made it hard to fix?

Everybody is going to start adding nodes under these three headings. They will also add questions in response to the nodes. All of this is done at the same time by all attendees without any talking.

This section is timeboxed to 10 minutes.

3. Vote
We will do a version of dot voting. Everyone will vote on as many items as they want but no more than once per item. Voting is done by putting your initials at the front of any item.

Example:
  • What caused us to write the bug?
    • [JW, LF]  requirements were unclear
    • [LF] Name of a function lies about what it does.
  • Why didn’t it get caught right away?
    • No automated Tests
    • [JW] Hard to write automated tests for this section of the code
  • What caused debugging time / cost?
    • Logs were too verbose, so we didn’t see what was going wrong.
    • [JHB, JW] Hard to redeploy site

After voting, copy the top 3-4 items into a section labeled: Remedations
This section is timeboxed to 3 minutes.

4. Budget
Before we come up with solutions, start by asking: was the total impact of this bug small, medium, or large? Have everyone hold up 1, 2, or 3 fingers. Pick the most common answer. Then propose an initial time box:
  • Small = 1/2 person-day
  • Medium = 2 person-days
  • Large = person-sprint.
Since we execute solutions immediately, this is the point where we need the approval of the project manager.
Because we intend to do 3 solutions, each solution can only be ¼ of the total budget (to allow some slack). This means that each item will be budgeted to:
  • Small = 1 hour
  • Medium = 1/2 day
  • Large = 2.5 days

This section is timeboxed to 2 minutes.
5. Identify Remediations
Next we are going to brainstorm improvements to our system to reduce the chances of this issue happening again. Going back to the Google Doc, everyone will silently add ideas to the second section that we copied from the top brainstorming section.
Solutions that require extra discipline are bad solutions. We are looking for ways to make success easier.
Remember to keep in mind these are timeboxed solutions meant to improve our system and environment as opposed to solve everything. Often small improvements yield big returns and if the problem still persists, we will get another chance to do a safeguarding in the future.

Example

  • Remediations
    • Requirements were unclear
      • Bring users into our grooming sessions
      • Earlier usability tests
      • Do earlier demos
    • Hard to redeploy site
      • Write down checklist of deployment steps
      • Automate build/test/upload sequence
      • Get a second server to deploy blue/green
After 7 minutes, everyone votes again just as we did in the last section.
This section is timeboxed to 10 minutes [7 brainstorming, 3 voting].
6. Add items to task board and do them!
Because safeguarding already has time approved work on them immediately.
Nothing we’ve done so far matters if we don’t implement any of the solutions. Make sure that useful action comes from this exercise, immediately add them to the taskboard (budget has already been approved in step 4) and start working on them. Remember that these items are timeboxed and are not meant to completely prevent the problems in the future, but rather to lessen the chances.
Final Notes
It is important to remember that safeguarding is a skill. The first time you do it, be patient and give yourself extra time, maybe an hour. Also, remember to practice it regularly, usually once per week, as you will get better at doing the process and finding good remediations.

Special Thanks to Arlo Belshee for creating Safeguarding.

More at:




1 comment:

Nasir Naseer said...

Structured Cabling Solutions
Do you have cable management issues in your data center?
Do you have trouble tracing cables from one piece of equipment to another?
Do you have unused fiber cables running underfloor and/or overhead
that are congesting pathways, blocking airflow and making a huge mess?
https://www.az-itsolution.com/structured-cabling-solutions