As the number of development teams grew and system complexity increased, we had to find ways to ship features to customers continuously while protecting customers from outages of third-party APIs. To solve this problem, we adopted the "feature flags" technique and introduced implementation and lifecycle guidelines.

A feature flag (also called a feature switch) helps DevOps teams target customer needs by enabling (revealing) or disabling (hiding) features in the solution they are developing, even before release. Feature flags make development faster and safer because they allow you to iterate quickly on a production deployment.

This approach allows testing new features on a subset of loyal customers. No matter how much testing you do in lower environments, something unexpected usually comes up in production that you can’t prepare for because of volume, edge cases, or environment differences, so feature flags helped us a lot.

Types of feature flags

We identified 2 types of feature flags:

Runtime: Allows you to turn things on and off in a running application. These flags are the most useful because you can enable features at will. For example, if you worked 6 weeks on a feature and want to avoid a big-bang rollout and migration, or want to enable the feature for test users only, you would use runtime feature flags.
Build time: Enables or includes features in an artifact as part of the build process. For example, if you have multiple payment providers and you want to disable one and exclude it from the build, you would use a build-time FF.
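
To make the difference concrete, here is a minimal Python sketch. All names in it (`BUILD_FLAGS`, `RuntimeFlags`, `enable_new_checkout`, `enable_legacy_payment_provider`) are hypothetical and not part of our framework. Python has no separate build step, so the build-time flag is simulated with a constant that a build script would generate.

```python
# Build-time flag (hypothetical): fixed when the artifact is produced.
# In a real pipeline a build script would generate this constant.
BUILD_FLAGS = {"enable_legacy_payment_provider": False}


class RuntimeFlags:
    """In-memory stand-in for a flag store that can change while the app runs."""

    def __init__(self, initial=None):
        self._flags = dict(initial or {})

    def is_enabled(self, name):
        # Unknown flags are treated as disabled.
        return self._flags.get(name, False)

    def set(self, name, value):
        self._flags[name] = bool(value)


def available_providers():
    """Providers included in this artifact; changing this requires a rebuild."""
    providers = ["card"]
    if BUILD_FLAGS["enable_legacy_payment_provider"]:
        providers.append("legacy")
    return providers


flags = RuntimeFlags({"enable_new_checkout": False})


def checkout_version():
    """Checked on every call, so flipping the flag takes effect immediately."""
    return "new" if flags.is_enabled("enable_new_checkout") else "old"
```

Calling `flags.set("enable_new_checkout", True)` changes what `checkout_version()` returns without a redeploy, while `available_providers()` only changes when the artifact is rebuilt.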

Categories of feature flags by function

We also separate feature flags by the function they perform:

1. 🏎 "Launch" feature flags

Temporary FFs that are created before a feature is completed, to enable continuous deployment of the app and to enable the feature for test users. They are also used to perform a gradual rollout to customers based on their segment. These flags are usually deleted when the feature is fully completed and deployed to customers.

Lifecycle of launch feature flags:

The first step when starting a new feature is to create a "launch" feature flag
Then you wrap your entry point with the feature flag. From this point on you can ship your changes to production with the feature flag disabled
At the end of the project, when the feature is enabled for all customers, remove the feature flag
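
The lifecycle above can be sketched roughly as follows. This is an illustration only: `LaunchFlag`, `new_search`, `old_search`, and the flag name `enable_new_search` are hypothetical, and a real flag would live in a shared store rather than in process memory.

```python
from dataclasses import dataclass, field


@dataclass
class LaunchFlag:
    """Hypothetical launch flag: off by default, opened up step by step."""
    name: str
    enabled_for_all: bool = False
    test_users: set = field(default_factory=set)
    segments: set = field(default_factory=set)

    def is_enabled_for(self, user_id, segment):
        if self.enabled_for_all:
            return True
        return user_id in self.test_users or segment in self.segments


def old_search(query):
    return f"old results for {query}"


def new_search(query):
    return f"new results for {query}"


# Step 1: create the flag before the feature is finished.
new_search_flag = LaunchFlag("enable_new_search", test_users={"qa-1"})


# Step 2: wrap the single entry point; unfinished code ships dark.
def search(query, user_id, segment):
    if new_search_flag.is_enabled_for(user_id, segment):
        return new_search(query)
    return old_search(query)
```

Adding a segment to `new_search_flag.segments` widens the rollout gradually. Once `enabled_for_all` has been on for every customer, the last step is to delete the flag and the `old_search` branch.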

2. 🚩 Risk-mitigation flags

These flags are long-lived: they are created and kept permanently. They are useful for turning certain features of the application on and off when necessary.

A few use cases

Wrap third-party integrations so they can be switched off if the third party fails. For example:

In an e-commerce platform that supports multiple payment systems, it makes sense to wrap each one with a separate feature flag. When one is unavailable or has degraded performance, you can save yourself a few grey hairs by not processing payments through a provider in a degraded state, then turn it back on when it's healthy.
Another example is switching off the warehouse integration when the warehouse software is down

Wrap internal features with feature flags so they can be disabled in case of performance issues, load events, etc.
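
As a rough illustration of the payment example, the sketch below guards each provider with its own flag and falls back to another enabled provider. The provider `card_gateway`, the flag `enable_card_gateway`, and the helper names are hypothetical; the plain dict stands in for whatever flag store is actually used.

```python
# Hypothetical mapping from payment provider to the flag that guards it.
PROVIDER_FLAGS = {
    "paypal": "enable_paypal",
    "card_gateway": "enable_card_gateway",
}


def enabled_providers(flags):
    """Return the providers whose risk-mitigation flag is currently on."""
    return [p for p, flag in PROVIDER_FLAGS.items() if flags.get(flag, False)]


def choose_provider(preferred, flags):
    """Use the preferred provider if it is enabled, else any enabled one."""
    enabled = enabled_providers(flags)
    if preferred in enabled:
        return preferred
    if enabled:
        return enabled[0]
    raise RuntimeError("no payment provider is currently enabled")


# Both providers are healthy, so both flags are on.
flags = {"enable_paypal": True, "enable_card_gateway": True}
```

Setting `flags["enable_paypal"] = False` during an outage routes new payments to `card_gateway`; setting it back to `True` restores PayPal without a deploy.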

Implementation guidelines

We defined and implemented a framework for working with feature flags with a few requirements in mind: naming conventions, lifecycle of feature flags, etc.
Flags should be named after the features they "enable", not "disable":

enable_paypal? is a good name, because it is clear that if the feature flag is enabled, PayPal will be enabled too 😄
disable_paypal? is a bad name, because it takes developers more time to process negative statements (we tested this!)

Feature flags must be available in all environments, starting from dev
Feature flags must be provisioned automatically. This prevents situations where a feature flag exists in one environment but not in another, which makes the system's behavior unpredictable
We defined and implemented "smart defaults" to make feature flags easy and robust to work with
A feature flag you add should not conflict with other feature flags
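
A few of these guidelines can be sketched in Python. This is not our actual framework: the environment names, the `FLAG_DEFINITIONS` registry, and the reading of "smart defaults" as "every flag declares a fallback value" are assumptions made for illustration. The trailing `?` from names like enable_paypal? is dropped because it is not valid in a Python identifier.

```python
ENVIRONMENTS = ("dev", "staging", "production")  # hypothetical environment names

# Single source of truth: flag name -> declared default (assumed meaning
# of "smart defaults").
FLAG_DEFINITIONS = {
    "enable_paypal": True,       # risk-mitigation flag: on unless switched off
    "enable_new_search": False,  # launch flag: off until rolled out
}


def validate_name(name):
    """Enforce the naming convention: flags enable things, never disable."""
    if not name.startswith("enable_"):
        raise ValueError(f"flag {name!r} must start with 'enable_'")


def provision(definitions, environments=ENVIRONMENTS):
    """Create every flag in every environment from the same definitions."""
    for name in definitions:
        validate_name(name)
    return {env: dict(definitions) for env in environments}


def is_enabled(store, env, name):
    """Look up a flag, falling back to its declared default if the value is missing."""
    if name not in FLAG_DEFINITIONS:
        raise KeyError(f"unknown feature flag: {name}")
    return store.get(env, {}).get(name, FLAG_DEFINITIONS[name])
```

Because `provision` builds every environment from one registry, a flag cannot exist in dev and be missing in production, and a missing value degrades to a known default instead of an error.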

💡 Alternative to feature flags

Prior to the introduction of feature flags, the alternative was to time releases to feature launches. This turned out to be super stressful and fragile, as it didn’t allow us to test new features in production before the customer launch to verify quality.