When to refactor code
As a die hard refactorer, but also pragmatic programmer, I often have a tough time articulating to other developers when a refactor is important and when it is gratuitous. I can imagine many people look at decisions I've made about when it is and isn't appropriate and think it's simply a whim or "when I feel like it". To clarify this for both myself and any future victims/co workers involved with refactoring decisions I may make, I submit this 10 item checklist.
- Is the cyclomatic complexity of the function below 5?
- Did you completely understand what cyclomatic complexity was without following that link?
- Do you have an automated test or documented test case for every path through the function?
- Do all of the existing test cases pass?
- Can you explain the function and all it's edge cases to me in less than 1 minute?
- If not, how about 5 minutes?
- 10 minutes?
- Are there fewer than 100 lines of code in the function (including comments)?
- Can you find two other developers who agree this code is error free just by visual inspection?
- Is this function used in only one place?
- Does the function meet it's performance goals (Time/Memory/CPU)?
Scoring
Add up the "no" answers above:- 0-1 - Why would you even think about refactoring this? You probably just want to rename variables because you don't like the previous developer's naming convention
- 2-5 - This might need a little tweaking, but I wouldn't hold up a production release for something in this range
- 6-8 - OK, we probably need to fix this... It's likely we're going to keep revisiting this and/or we don't actually know what it's doing. Still on the fence, but it's highly suspect
- 9+ - This is a prime candidate for refactoring. (Note, writing test cases is a form of refactoring)
While I know this is a rough guideline, I think it hits on the key factors that are important about the overall quality of source code and helps avoid overspending effort fixing things that aren't necessarily broken.
Comments
Don't mind the usual negative assholes.
I also found it much easier to grasp the significance of the questions by inverting them, so that you're counting "yes" answers instead of "no" answers - which makes the list read more like:
1) Is the cyclomatic complexity of the function above 5?
2) Are there missing tests for execution paths through the function?
3) Are existing test cases failing or broken?
4) Would explaining this function to me take more than a minute?
5) Five minutes?
6) Ten minutes?
7) Does the function - including comments - include more than 100 lines of code?
8) Are your developers still worried about bugs even after visually inspecting the code?
9) Is this function used in more than one place?
10) Do you have a verified performance problem (time/memory/CPU) with this code?
In that case the question is backwards.