Dependency injection in Python with gin-config

Thom Hopmans · July 28, 2023

datascience python dependency-injection

In software development, writing clean and testable code is essential for building robust applications. Dependency injection is a design pattern that promotes loose coupling between components by letting objects or functions receive the objects or functions they depend on, rather than creating them internally. A correct implementation of dependency injection leads to code that is easier to maintain and easier to test. While there are several ways to implement dependency injection in Python, one particularly useful library for data applications is gin-config. This post explores how gin-config can be used and how it simplifies data workflows, especially when experimenting in notebooks.
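To make the pattern concrete before bringing in any library, here is a minimal plain-Python sketch. The names total and load_rows_from_memory are made up for illustration and are not part of gin-config: total() never decides where its rows come from, the caller hands it a loader.

```python
from typing import Callable


def load_rows_from_memory() -> list[int]:
    # Hypothetical stand-in for a real data source.
    return [1, 2, 3, 4]


def total(load_rows: Callable[[], list[int]]) -> int:
    # The dependency (how rows are loaded) is injected by the caller.
    return sum(load_rows())


production_total = total(load_rows_from_memory)
test_total = total(lambda: [10, 20])  # a test can inject its own loader
```

Swapping the loader in a test requires no change to total() itself, which is the loose coupling described above.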

What is `gin-config`?

gin-config is a lightweight Python library (it has no other dependencies 🙏) that provides a simple way to perform dependency injection in your applications. Functions or classes can be decorated with @gin.configurable, allowing values to be supplied from a config file or string using a simple Python-like syntax.

Advantages of `gin-config`

Using gin-config offers several advantages:

Minimalistic: gin-config is designed to be lightweight and non-intrusive. It does not impose a specific coding style or architecture. The only thing that is required is decorating functions or classes with @gin.configurable, making it easy to integrate it into any of your existing projects without making significant changes.
Flexibility: With gin-config, you can define configurations and update them during runtime. This flexibility is valuable when deploying applications in different environments or handling various use cases, e.g. only wanting to calculate features for a subset of your data.
Testability: gin-config makes it easier to isolate and test individual components, because custom configurations can be provided for specific test cases.

Getting Started with `gin-config`

Let’s dive into the steps to start using gin-config for dependency injection in Python:

Step 1: Installation

First, install gin-config using pip

pip install gin-config

or add it to your pyproject.toml with Poetry using

poetry add gin-config

Step 2: Define configurables

Inject the configurations into your classes or functions using the @gin.configurable decorator.

import gin
import pandas as pd

# Parse config.gin only after the configurables below are defined (see Step 4).

def run() -> float:
    data = fetch_data()              # batch_size is injected by gin
    X, y = calculate_features(data)  # features is injected by gin
    model = fit(X, y)                # fit and score are ordinary helpers, not shown
    return score(model, X, y)

@gin.configurable
def fetch_data(batch_size: int) -> pd.DataFrame:
    ...  # implementation omitted

@gin.configurable
def calculate_features(
    data: pd.DataFrame,
    features: list[str],
) -> tuple[pd.DataFrame, pd.DataFrame]:
    ...  # implementation omitted

Step 3: Define configuration

Create a config.gin file to define your configuration. For example:

fetch_data.batch_size = 500
calculate_features.features = [
    "feature_1",
    "feature_2",
    "feature_3"
]

Note that each binding has the form function_name.parameter_name = value, so the names in the configuration match the functions (or classes) and parameters we made configurable in the previous step. This is all it takes to understand the basics of the gin-config syntax.

Step 4: Apply configuration

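In your Python code, import gin and parse the configuration file. Do this after the configurable functions have been defined or imported, so that gin recognises the names used in the file: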

import gin

gin.parse_config_file("config.gin")

Step 5: Instantiate

Instantiate your classes or call your functions and the configurations will be automatically injected:

score = run()
score

> 0.76

That’s all!

Exploring Ideas: The Power of Notebooks

Notebooks are like playgrounds for data scientists. They allow us to experiment and play with code in an interactive way, making it easy to try out different ideas and see immediate results, e.g. the performance of a new model feature. But that's not all! Once we find a solution that works well in the notebook, we can take that code and turn it into the real deal: production code. Production code is the strong and reliable version of our playful experimental code. It's like transforming a rough sketch into a polished masterpiece.

The downside of converting experimental code to production code is that it tends to become harder to configure for new experiments, e.g. when a new experiment requires new functionality. As a result, production code gets copied into notebooks again and again and slightly adjusted to make it run for that fancy new experiment or to test that one extra hypothesis. Although this seems like a feasible approach, it can become a burden over time. For example, what changes did we make in the notebook compared to the function in production? And what if the changes we want to make to the function are three layers deep? Well, dependency injection to the rescue!

Using the same example as above, we now pretend that run() and all other application logic is stored in the repository. But instead of running this application on our cloud infrastructure, we want to experiment with it in a notebook environment. Therefore, we update the production code in the repository to accept a gin-config configuration as an argument:

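import pathlib
import gin

def run(gin_config: pathlib.Path | str) -> float:
    # A Path is read as a config file; a str is parsed as gin bindings.
    if isinstance(gin_config, pathlib.Path):
        gin.parse_config_file(str(gin_config))
    else:
        gin.parse_config(gin_config)
    data = fetch_data()
    X, y = calculate_features(data)
    return score(fit(X, y), X, y)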

In the notebook environment, the function is then imported from the repository, and given a custom configuration with for example a lower batch size or other features:

from repository import run

gin_config = """
fetch_data.batch_size = 200
calculate_features.features = ["feature_1", "feature_2"]
"""

run(gin_config)

> 0.89

And tada, using dependency injection we can run various experiments in our notebook with the production code from the repository, while having the flexibility to inject custom code or even methods. 🎉

For this example, a reasonable alternative would have been just passing the config as an argument to the functions. However, this usually implies passing along a configuration file throughout your application, to multiple functions at different places. With gin-config you can keep all of this centralized in one file, and with a single line of code parse it and make it available throughout your entire application.

Note that the above example is only a minor demonstration of the great power that comes with gin-config. It also supports making classes and/or functions from other packages configurable with gin.external_configurable(tf.train.MomentumOptimizer), or injecting functions with dnn.activation_fn = @tf.nn.tanh in config.gin. But that, folks, is for a future post.

Conclusion

gin-config is a powerful tool for dependency injection in Python applications. By using gin-config, we are able to easily configure code in a flexible manner, which in turn enables rapid experimentation from within notebooks. Additionally, because it improves code reusability, it leads to cleaner and more maintainable code. Finally, the simple integration makes it an excellent choice for any project. The key to successful dependency injection lies in proper organization and separation of concerns. By applying DI principles using gin-config, you can take the first steps on your way to building scalable and robust Python applications that are easy to maintain and extend, even while using notebooks for analysis.