Making Codegen Pluggable#

This document describes various code generation and runtime concepts that can be used to make Smithy code generators open to extension.

Why make codegen extensible?#

Smithy code generators need to be extensible so that optional features can be contributed to augment generated code. For example, Smithy code generators can generate generic clients that know how to send requests to an endpoint, but AWS SDK code generators resolve endpoints based on other configuration settings like regions. Smithy code generators should have no built-in concept of "region", and instead they should rely on codegen integrations that can augment generated code based on the presence of traits and configuration found in smithy-build.json files.

Integrations#

Integrations are the primary abstraction used to customize Smithy code generators. Integrations are found on the classpath using Java Service Provider Interfaces (SPI) and are used to customize Smithy code generators.

What can integrations customize?#

Various aspects of a Smithy code generator can be customized. For example:

Generate custom files like licenses, readmes, etc.
Preprocess the model (e.g., validate that the model uses only features supported by the generator, remove unsupported features, add codegen specific traits, etc)
Add parameters used to configure a client (e.g., constructor arguments, builder parameters, etc)
Inject interceptors into the client automatically (based on traits or opt-in flags)
Inject custom client or server HTTP request headers
Contribute available protocol implementations that a generator can choose from when generating clients or servers
Contribute authentication scheme implementations that a generator can choose from when generating clients or servers
Intercept and modify sections of generated code (this feature is part of AbstractCodeWriter)
Add dependencies either unconditionally or based on the presence of shapes and traits in the model
Modify the SymbolProvider used to convert shapes into code (e.g., add custom reserved words, change how names are generated, etc.)
Add custom retry-strategies

Only customize through opt-in#

Simply finding an integration on the classpath should not enable the integration. Integrations should only be enabled through opt-in signals. Traits found in the model and feature configuration in smithy-build.json are used to enable customizations performed by integrations.

Creating a `SmithyIntegrations`#

Smithy codegen provides a pre-built integration interface, SmithyIntegration, that should be used by every Smithy code generator. Using this standardized interface ensures all code generators follow the same basic framework and makes it easier to contribute features that span multiple code generators.

SmithyIntegration requires a few generic type parameters:

SmithyIntegration<S,
                  W extends SymbolWriter<W, ?>,
                  C extends CodegenContext<S, W, ?>>

S: The settings object used to configure the code generator. This object should be a basic POJO or Java record that captures the same properties used by smithy-build.json files to configure the generator. For example, this object might contain the service shape ID being generated, a specific protocol shape ID to generate, the code generation mode (client or server), etc.
W: The specific subclass of SymbolWriter that is used by the code generator. For example, if generating Python code, you should create a PythonWriter and supply that as the type parameter.
C: The CodegenContext object used by the generator. This type depends on codegen settings. It provides integration methods access to the model being generated, the settings object, the SymbolProvider used to convert shapes to Symbols, and a FileManifest that's used to write files to disk. Each implementation is expected to create a specific subtype of CodegenContext.

Example of a custom SmithyIntegration for Python:

interface PythonIntegration extends
        SmithyIntegration<PythonSettings, PythonWriter, PythonContext> {}

Example codegen settings type:

record PythonSettings(ShapeId service, ShapeId protocol);
// A builder pattern could be applied later if the number of arguments grows.

Example of a custom CodegenContext:

record PythonContext(
    Model model,
    PythonSettings settings,
    SymbolProvider symbolProvider,
    FileManifest fileManifest,
    WriterDelegator<PythonWriter> writerDelegator,
    List<PythonIntegration> integrations
) implements CodegenContext<PythonSettings, PythonWriter, PythonIntegration> {}

This integration is then implemented to implement customizations:

public final class AddCodeLicense implements PythonIntegration {
    // implement overrides, detailed below
}

Identifying integrations#

Integrations are identified using the SmithyIntegration#name() method. This method will return the canonical class name of the integration by default, but it can be overridden to provide a different name. Note that naming conflicts between integrations are not allowed.

How integrations are ordered#

Integrations are ordered using a kind of priority ordered dependency graph. Integrations can specify that they should be applied before other integrations by name and/or after other integrations by name. The following example states that the integration needs to run before "Foo" but after "CodeLicenseHeader":

@Override
public List<String> runBefore() {
    return List.of("Foo");
}

@Override List<String> runAfter() {
    return List.of("CodeLicenseHeader");
}

In rare cases, you might need more granular control over the order of an integration. A priority can be provided to influence when the integration is applied relative to other integrations when their dependencies are resolved. The higher the priority, the earlier an integration is applied.

@Override
public byte priority() {
    return 64;
}

Tip

DirectedCodegen automatically handles finding integrations on the classpath and topologically ordering them.

Preprocessing models#

A common requirement of code generators is to preprocess the model. For example, a generator that doesn't support event streams might want to filter out event stream operations and emit warnings. A generator could also choose to apply synthetic traits (traits that are not persisted when the model is serialized) to shapes in the model as part of their code generation strategy.

The model can be preprocessed by implementing the SmithyIntegration#preprocessModel method and returning an updated model.

@Override
Model preprocessModel(Model model, PythonSettings settings) {
    // Perform some transformation and return the updated model.
    return model;
}

Changing how shapes are named or how files are generated#

Another requirement when generating code might be to change the strategy used for naming shapes, the file location of shapes, or just adding metadata to each Symbol created by a SymbolProvider. This can be achieved by implementing SmithyIntegration#decorateSymbolProvider:

@Override
public SymbolProvider decorateSymbolProvider(Model model, PythonSetting settings, SymbolProvider symbolProvider) {
    // Decorate the symbol provider and add a "foo" property to every symbol.
    return shape -> symbolProvider.toSymbol(shape)
            .toBuilder()
            .putProperty("foo", "hello!")
            .build());
}

Intercepting and updating sections of code#

Code generators can designate sections of code that can be modified by integrations. This feature allows integrations to do things like add text to every code file (for example a license header), apply annotations to generated types, change the type signature of a class, change how classes are created, etc. Implementations of CodeInterceptors registered with SmithyIntegrations must be added to each code writer created during code generation.

Let's say you wanted to emit a customizable section in generated code where headers for the code could be modified to add a custom license header or disclaimer that the code is generated. This can be achieved by first creating an implementation of CodeSection. We'll call it CodeHeader:

// This event does not need any properties.
record CodeHeader() implements CodeSection;

When generating code, inject the section at the start of each file:

mywriter.injectSection(new CodeHeader());

injectSection is used because this section of code is empty by default. If the section should have content by default, then use pushSection and popSection:

mywriter.pushSection(new CodeHeader());
mywriter.write("// This is generated code");
mywriter.popSection();

The call to injectSection implicitly calls popSection. When popSection is called, the code that was written during that section is sent to each matching interceptor so that they can prepend to the contents, append to the contents, or completely rewrite the contents.

CodeInterceptors can be registered to append to this section by returning interceptors from SmithyIntegration#interceptors:

@Override
public List<? extends CodeInterceptor<? extends CodeSection, PythonWriter>>
        interceptors(C codegenContext) {
    return List.of(new CodeHeaderInterceptor());
}

Interceptors should be created as a dedicated class. The following interceptor appends to the existing content in the section:

final class CodeHeaderInterceptor extends CodeInterceptor.Appender<CodeHeader, PythonWriter> {
    @Override
    public Class<CodeHeader> sectionType() {
        return CodeHeader.class;
    }

    @Override
    public void append(PythonWriter writer, CodeHeader section) {
        writer.write("""
            /*
             * Copyright 2023 example.com, Inc. or its affiliates. All Rights Reserved.
             */
            """);
    }
}

Generating other custom content#

Integrations might need to write additional files like a README, license files, or generate additional code. Integrations can override the SmithyIntegration#customize method to perform anything they need to do. This method is provided the CodegenContext type that is used with the integration, allowing the customize method access to the model, settings object, symbol provider, WriterDelegator, and FileManifest used to save and read files.

The following example writes a custom README.md file:

@Override
public void customize(PythonContext context) {
    context.writerDelegator().useFileWriter("README.md", writer -> {
        writer.write("""
            # $L service client
            Client SDK library ...""",
            context.settings().service()
        );
    });
}

Registering `SmithyIntegration`s#

Implementations of Integrations are registered with Java SPI by adding a specific META-INF file and found on the classpath. For example, if the integration class is defined as software.amazon.smithy.python.client.PythonIntegration, then when using Gradle, the fully qualified class name of each implementation of the integration needs to placed in a file named src/main/resources/META-INF/services/software.amazon.smithy.python.client.PythonIntegration.

For example:

# in src/main/resources/META-INF/services/software.amazon.smithy.python.client.PythonIntegration
software.foobaz.AddCodeLicense

Using `SmithyIntegration`s in generators#

DirectedCodegen automatically handles finding integrations on the classpath, topologically ordering them, and applying each integration method at the appropriate point of code generation.

Making Codegen Pluggable#

Why make codegen extensible?#

Integrations#

What can integrations customize?#

Only customize through opt-in#

Creating a SmithyIntegrations#

Identifying integrations#

How integrations are ordered#

Preprocessing models#

Changing how shapes are named or how files are generated#

Intercepting and updating sections of code#

Generating other custom content#

Registering SmithyIntegrations#

Using SmithyIntegrations in generators#

Creating a `SmithyIntegrations`#

Registering `SmithyIntegration`s#

Using `SmithyIntegration`s in generators#