A few days ago a colleague asked me when I consider that new code or logic should go behind a feature flag. I gave a short answer, but the truth is that feature flags (or feature toggles) are so common these days that it is taken for granted you know how to use them, yet it is rarely explained when to use them.
What follows is my humble opinion, based on my own experience; feel free to ignore it.
The single most important fact, and the one most often forgotten or twisted, is that feature flags are meant to be temporary: they should always be removed once fully rolled out (or replaced by a system switch, more on this later).
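One way to make the "temporary" part hard to forget is to record a removal date next to each flag and have a check (in CI, for example) list the flags that are overdue. This is a minimal sketch assuming a hypothetical in-code flag registry; `FeatureFlag`, `remove_by` and `overdue_flags` are illustrative names, not part of any particular feature flag library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeatureFlag:
    """Hypothetical flag definition that declares up front when it should go away."""
    name: str
    owner: str
    remove_by: date


def overdue_flags(flags, today):
    """Return the names of flags whose removal date has already passed."""
    return [flag.name for flag in flags if flag.remove_by < today]


# Illustrative registry; flag names, owners and dates are made up.
FLAGS = [
    FeatureFlag("NEW_SHINY_FEATURE", owner="checkout-team", remove_by=date(2024, 3, 1)),
    FeatureFlag("NEW_SEARCH_RANKING", owner="search-team", remove_by=date(2024, 9, 1)),
]

stale = overdue_flags(FLAGS, today=date(2024, 6, 1))
print(stale)  # ['NEW_SHINY_FEATURE']
```

Failing the build (or at least warning) whenever `overdue_flags` returns something keeps flags from quietly becoming permanent, and the `owner` field tells you whom to nag.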
When to use Feature Flags
- New features. It is a fact that complex online systems will always have bugs, so having a switch you can flip to quickly "roll back" your new feature is a must. Also, most non-trivial feature flag systems support gradual roll-outs, canary releases, and role-based, entity-id based and other targeting conditions.
- Experimental features. To me, this is the same scenario as the previous point.
- Breaking changes. Many times these should actually be implemented as a parallel change (expand, migrate, contract).
- Code that changes stored data (its format or how it is manipulated). Data corruption can be so dangerous that I'd treat it almost like a breaking change.
- Code that modifies input/output, unless the new I/O parameters or fields are optional.
- Code that raises new, potentially unhandled exceptions. Extending your validation logic with new scenarios? Put the new validation behind a feature flag until all upper layers and callers have been reviewed and handle the new errors.
- Complex refactors or rewrites. If you have really high code coverage you might not need one, but we all know what usually happens to code coverage as codebases grow.
- Untested or poorly tested code. Your change might look tiny, but tiny changes in complex systems can wreak havoc in unfortunate situations.
- Code changes that might hurt performance. From adding a new data manipulation library to modifying an apparently innocuous ORM query, monitoring will alert you to performance degradations, but the easier it is to switch back to the old version while you make sure everything is fine, the better.
- Simple A/B testing. As in "one control group and one experiment group".
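Most non-trivial feature flag systems implement gradual and entity-id based roll-outs for you, so the following is only a sketch of the general idea, built around a hypothetical `is_enabled_for(flag_name, entity_id, rollout_percent)` helper: hash the flag name together with the entity id into a stable bucket from 0 to 99, and enable the flag when the bucket falls below the rollout percentage.

```python
import hashlib


def rollout_bucket(flag_name: str, entity_id: str) -> int:
    """Map a (flag, entity) pair to a stable bucket in [0, 100).

    Uses SHA-256 instead of Python's built-in hash(), which is randomized
    per process for strings and would give different answers on each run.
    """
    key = f"{flag_name}:{entity_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled_for(flag_name: str, entity_id: str, rollout_percent: int) -> bool:
    """Hypothetical helper: enable the flag for roughly rollout_percent% of entities."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(flag_name, entity_id) < rollout_percent


# Ramping up from 10% to 50%: every user enabled at 10% stays enabled at 50%.
users = [f"user-{n}" for n in range(1000)]
at_10 = {u for u in users if is_enabled_for("NEW_SHINY_FEATURE", u, 10)}
at_50 = {u for u in users if is_enabled_for("NEW_SHINY_FEATURE", u, 50)}
print(len(at_10), len(at_50), at_10 <= at_50)
```

Including the flag name in the hash means different flags pick different subsets of users, and because a bucket below 10 is also below 50, raising the percentage only ever adds users, so nobody flips back and forth during a roll-out.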
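For the case of new validation raising new errors, a sketch might look like this. The flag name `STRICT_EMAIL_VALIDATION`, the `validate_signup` function, and the plain set standing in for a flag client are all hypothetical; the point is that the new check, and the new error that callers may not handle yet, only runs when the flag is on.

```python
class ValidationError(Exception):
    """Error raised for invalid sign-up data."""


def validate_signup(data: dict, enabled_flags: set) -> None:
    """Raise ValidationError for invalid sign-up data.

    enabled_flags is a stand-in for a real feature flag client.
    """
    # Existing validation: callers already know how to handle this error.
    if not data.get("username"):
        raise ValidationError("username is required")

    # New validation, kept behind a flag until every caller handles it.
    if "STRICT_EMAIL_VALIDATION" in enabled_flags:
        email = data.get("email", "")
        if "@" not in email:
            raise ValidationError("email must contain '@'")


# With the flag off, the old behaviour is unchanged: this passes silently.
validate_signup({"username": "ana", "email": "not-an-email"}, enabled_flags=set())
```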
When NOT to use Feature Flags
- To disable system components. While the mechanics are the same, those uses belong to System Switches, Ops Toggles or whatever you want to call them. Do not mix them with feature flags, because it sends the wrong message that a feature flag can be long-lived, when it shouldn't be.
- For permanent versioned features. For example, if you have versioned API endpoints, the feature flag should disappear as soon as the new version is stable, and versioning should then be handled by routing (or per user, or by any other mechanism except a feature flag).
- Complex A/B testing. There are better tools than feature flag systems for complex A/B testing, so use one of those instead.
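For the versioned API case, here is a minimal sketch of routing-based versioning, assuming a hypothetical in-process router rather than any particular web framework: both versions stay registered permanently and the path decides which handler runs, so no flag check is involved once the new version is stable. The handlers and response shapes are made up for illustration.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[str], dict]


def get_user_v1(user_id: str) -> dict:
    # Original response shape, kept for existing clients.
    return {"id": user_id, "name": "Ada Lovelace"}


def get_user_v2(user_id: str) -> dict:
    # New response shape, served side by side with v1.
    return {"id": user_id, "first_name": "Ada", "last_name": "Lovelace"}


# The version is part of the route, not a feature flag check.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("v1", "users"): get_user_v1,
    ("v2", "users"): get_user_v2,
}


def dispatch(path: str) -> dict:
    """Dispatch paths shaped like '/v2/users/42' to the handler for that version."""
    _, version, resource, resource_id = path.split("/")
    handler = ROUTES.get((version, resource))
    if handler is None:
        raise LookupError(f"no route for {path}")
    return handler(resource_id)


print(dispatch("/v1/users/42"))
print(dispatch("/v2/users/42"))
```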
Miscellaneous closing tips
Feature flags are meant to be used by developers and product people, while System Switches are for systems engineers, DevOps, SREs and the like.
When coding the feature flag check and the forking logic, there are two accepted approaches. The first one lets you simply delete the flag check and the old path in the future, without any further changes:
```python
if not feature_flags.enabled(feature_flags.constants.NEW_SHINY_FEATURE):
    # old logic
    ...
    return  # don't forget to return, or the new logic will run as well!
# new logic
...
```
And the more common one, which only requires re-indenting the code when the flag is removed:
```python
if feature_flags.enabled(feature_flags.constants.NEW_SHINY_FEATURE):
    # new logic
    ...
else:
    # old logic
    ...
```
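To check that the two shapes behave the same, here is a self-contained sketch with a tiny in-memory stand-in for the `feature_flags` object used in the snippets. The `FeatureFlags` class, its `enabled` method and the `constants` namespace only mimic those names; a real flag client will have its own API, and the greeting functions are hypothetical.

```python
from types import SimpleNamespace


class FeatureFlags:
    """In-memory stand-in for a feature flag client (hypothetical)."""

    constants = SimpleNamespace(NEW_SHINY_FEATURE="new_shiny_feature")

    def __init__(self, enabled_flags=()):
        self._enabled = set(enabled_flags)

    def enabled(self, flag_name: str) -> bool:
        return flag_name in self._enabled


def greet_guard_clause(feature_flags, name: str) -> str:
    # Pattern 1: removing the flag later means deleting the `if` block only.
    if not feature_flags.enabled(feature_flags.constants.NEW_SHINY_FEATURE):
        return f"Hello, {name}"  # old logic
    return f"Hi {name}, check out our new shiny feature!"  # new logic


def greet_if_else(feature_flags, name: str) -> str:
    # Pattern 2: removing the flag means keeping one branch and re-indenting it.
    if feature_flags.enabled(feature_flags.constants.NEW_SHINY_FEATURE):
        return f"Hi {name}, check out our new shiny feature!"  # new logic
    else:
        return f"Hello, {name}"  # old logic


off = FeatureFlags()
on = FeatureFlags({FeatureFlags.constants.NEW_SHINY_FEATURE})
print(greet_guard_clause(off, "Ana"), "|", greet_if_else(off, "Ana"))
print(greet_guard_clause(on, "Ana"), "|", greet_if_else(on, "Ana"))
```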
If you want to know more about what feature flags are, this article is quite detailed.