Decoupling Codegen with Symbols#

Symbols are used in Smithy code generators to refer to qualified types in the target programming language. Symbols provide a layer of abstraction between the code being generated and the logic used to determine:

  • how shapes are named
  • where types are declared and defined
  • the runtime dependencies needed for a type
  • the imports needed to define and reference a type

Quick Symbol example#

The following example uses the built-in "T" formatter of SymbolWriter to write symbols to the generated code and automatically add imports to the file:

// Create MyWriter, an imaginary subclass of SymbolWriter.
// Set the namespace of the writer to "example.foo", which internally
// calls SymbolWriter#relativizeSymbols, passing in "example.foo".
var writer = new MyWriter("example.foo");

// Create an example Symbol that refers to a type in the same namespace
// as the writer's current namespace. Setting the the namespace also
// requires the namespace separator.
var exampleSymbol = Symbol.builder()
    .name("Example")
    .namespace("example.foo", ".")
    .build();

// "$T" in the following call to 'write' is replaced with just
// the name of the Symbol ("Example") because the Symbol's
// namespace matches the relative namespace of the writer.
// If exampleSymbol was in a different namespace, the writer
// would either write the fully-qualified name or would add
// an import to the file for the symbol.
writer.write("""
    if ($T.isEmpty()) {
        doSomething();
    }
    """,
    exampleSymbol);

Benefits of Symbols#

  • Refactoring. Symbols make it easy to refactor how shapes are generated and the file location of where shapes are generated.
  • Managing imports. Symbols can contain the list of other symbols needed in order to refer to the type. For example, if a list has generic type parameters (e.g., List<BigDecimal>), then the symbol that refers to the list would also contain symbols referring to the generic type parameters. All the symbols needed in a generated code file can then be used to automatically create import statements as symbols are referenced by a CodeWriter.
  • Managing dependencies. Symbols can define the dependencies needed by the generated code in order to reference the symbol. For example, if BigDecimal implementations are provided by a third-party library, then a dependency is needed by the generated code in order to reference BigDecimal in code. Symbols carry this information because they are implementations of a SymbolDependencyContainer. After writing code to a CodeWriter, all the dependencies of each referenced symbol can be used to generate a dependency graph (e.g., Maven poms, Python setup.py files, etc.).
  • Readability. Referencing symbols using $T makes the string templates passed to CodeWriter more succinct.

Referencing Symbols using SymbolReference#

A SymbolReference is used to refer to another Symbol. For example, a SymbolReference is used when a Symbol uses generic type parameters. The following example is a Symbol created for a list of example.foo.Example values in a Java-like language, collections.List<example.foo.Example>:

// Create the "Example" Symbol used in the "List<Example>".
var exampleSymbol = Symbol.builder()
        .name("Example")
        .namespace("example.foo", ".")
        .build();

// Create a "List<Example>" that can be properly relativized
// against the current namespace of the SymbolWriter.
var listSymbol = Symbol.builder()
        .name("List")
        .namespace("collections", ".")
        // Automatically creates a SymbolReference from a Symbol.
        .addReference(exampleSymbol)
        .build();

Aliasing Symbols#

An alias can be given to SymbolReference to deconflict the symbol with other symbols already in the target namespace. For example, let's say you need to reference a type that uses a fairly common name, so you decide to alias the common name to something that is far less likely to have conflicts.

The following example creates a Symbol for List<__Example> where __Example is an alias to com.foo.Example:

var exampleSymbol = Symbol.builder()
    .name("Example")
    .namespace("example.foo", ".")
    .build();

// Alias "Example" to "__Example".
SymbolReference exampleReference = exampleSymbol.toReference("__Example");

// Create a "List<__Example>" that can be properly relativized
// against the current namespace of the SymbolWriter.
var listSymbol = Symbol.builder()
    .name("List")
    .namespace("collections", ".")
    .addReference(exampleReference)
    .build();

When a SymbolReference is added to a Symbol, SymbolWriter will know that the references of the Symbol need to be accounted for when writing the symbol by importing any necessary dependencies with appropriate aliases.

// Hypothetical example of managing imports and using references.
var writer = new MyWriter("example.other.namespace");

writer.write("""
    var list = new $T();
    """,
    listSymbol);

assert(writer.toString().equals("""
    package example.other.namespace;

    import collections.List;
    import example.foo.Example as __Example;

    var list = new List<__Example>();
    """));

Symbol dependencies#

Symbols can be used to automatically generate dependency closures and configuration files based on the symbols written to a SymbolWriter. This allows generated code to depend on only the closure of dependencies they actually need. Codegen plugins can conditionally require runtime dependencies in generated code (something needed by AWS SDKs to add dependencies on AWS signature version 4 implementations, credential providers, and other AWS-specific features).

The symbols used during codegen can be tracked using a WriterDelegator, and from these tracked symbols, the graph of referenced SymbolDependency can be written to whatever dependency manifest format is needed for the target environment.

Dependencies are registered with a Symbol by creating a SymbolDependency and adding them to the Symbol via Symbol#addDependency.

The following example creates a TypeScript Symbol for big decimal that refers to a type defined in a package named big:

// This dependency is needed in JavaScript.
var bigRuntimeDependency = SymbolDependency.builder()
        .dependencyType("dependencies")
        .packageName("big")
        .version("^5.2.2")
        .build();

// This dependency is needed by the TypeScript compiler.
var bigTsDependency = SymbolDependency.builder()
        .dependencyType("devDependencies")
        .packageName("@types/big.js")
        .version("^4.0.5")
        .build();

// Create a symbol used for big decimals in Smithy.
var big = Symbol.builder()
        .name("Big")
        .namespace("big", "/")
        .addDependency(bigRuntimeDependency)
        .addDependency(bigTsDependency)
        .build();

As you can see in the above example, symbol dependencies can have a dependency type that is used to classify when the dependency is needed (see SymbolDependency#dependencyType). It is common in TypeScript libraries to need different dependencies for JavaScript code vs TypeScript type definitions, so two dependencies were added to the created big symbol: one that is a normal "dependencies" and one that is a "devDependencies".

SymbolDependency best practices#

Creating a SymbolDependency in each place the dependency is needed spreads them out all over a project, making it difficult to change dependencies. Rather than create a SymbolDependency each time they are needed in the code generator, a better practice is to create a dedicated Java enum that contains each Symbol used in the project. This enum can be referenced throughout a project, making it possible to update version numbers in a single place.

For example:

/**
 * An enum of all of the built-in dependencies managed by this package.
 */
public enum TypeScriptDependency implements SymbolDependencyContainer {

    // Conditionally added if a big decimal shape is found in a model.
    BIG_JS("dependencies", "big.js", "^5.2.2"),
    TYPES_BIG_JS("devDependencies", "@types/big.js", "^4.0.5");

    public static final String NORMAL_DEPENDENCY = "dependencies";
    public static final String DEV_DEPENDENCY = "devDependencies";
    public static final String PEER_DEPENDENCY = "peerDependencies";
    public static final String BUNDLED_DEPENDENCY = "bundledDependencies";
    public static final String OPTIONAL_DEPENDENCY = "optionalDependencies";

    public final SymbolDependency dependency;

    TypeScriptDependency(String type, String name, String version) {
        this.dependency = SymbolDependency.builder()
                .dependencyType(type)
                .packageName(name)
                .version(version)
                .build();
    }

    @Override
    public List<SymbolDependency> getDependencies() {
        return Collections.singletonList(dependency);
    }
}

Note

  1. The enum implements SymbolDependencyContainer, an abstraction for composing dependencies.
  2. This example is taken from smithy-typescript, which shows other possibilities like how to define unconditional dependencies that are needed by every client.

Tracking externally controlled dependencies#

A DependencyTracker can be used to track available dependencies using a JSON file that can be then used to provide version numbers to an enum. This can be useful if version numbers are maintained outside a code generator or need to be translated from other formats or lock files. For example:

/**
 * An enum of all of the built-in dependencies managed by this package.
 */
public enum TypeScriptDependency implements SymbolDependencyContainer {

    // Conditionally added if a big decimal shape is found in a model.
    BIG_JS("big.js"),
    TYPES_BIG_JS("@types/big.js");

    public final SymbolDependency dependency;

    TypeScriptDependency(String name) {
        this.dependency = VersionFile.VERSIONS.getByName(name);
    }

    @Override
    public List<SymbolDependency> getDependencies() {
        return Collections.singletonList(dependency);
    }

    private static final class VersionFile {
        private static final DependencyTracker VERSIONS = new DependencyTracker();
        static {
            String path = "sdkVersions.json";
            VERSIONS.addDependenciesFromJson(SdkVersion.class.getResource(path));
        }
    }
}

Converting shapes to Symbols with SymbolProviders#

A SymbolProvider is used to convert Smithy shapes to Symbols. A SymbolProvider is the brains of a Smithy code generator; it tells the code generator the types used to represent shapes in the model, the dependencies needed by generated code, the filenames used to declare and define types, and automatically ensures reserved words in the target language are accounted for during codegen.

A selection of existing SymbolProviders can be found at:

  1. TypeScript: https://github.com/awslabs/smithy-typescript/blob/main/smithy-typescript-codegen/src/main/java/software/amazon/smithy/typescript/codegen/SymbolVisitor.java
  2. Python: https://github.com/awslabs/smithy-python/blob/develop/codegen/smithy-python-codegen/src/main/java/software/amazon/smithy/python/codegen/SymbolVisitor.java
  3. Go: https://github.com/aws/smithy-go/blob/main/codegen/smithy-go-codegen/src/main/java/software/amazon/smithy/go/codegen/SymbolVisitor.java

The simplest way to implement a SymbolProvider is to also implement ShapeVisitor. The basic setup will look something like this:

package software.amazon.smithy.python.codegen;

import java.util.logging.Logger;
import software.amazon.smithy.codegen.core.SymbolProvider;
import software.amazon.smithy.model.Model;
import software.amazon.smithy.model.shapes.ServiceShape;
import software.amazon.smithy.model.shapes.Shape;
import software.amazon.smithy.model.shapes.ShapeVisitor;

final class SymbolVisitor implements SymbolProvider, ShapeVisitor<Symbol> {

    private static final Logger LOGGER = Logger.getLogger(SymbolVisitor.class.getName());

    private final Model model;
    private final MySettings settings;
    private final ServiceShape service;

    SymbolVisitor(Model model, MySettings settings) {
        this.model = model;
        this.settings = settings;
        this.service = model.expectShape(settings.getService(), ServiceShape.class);
    }

    @Override
    public Symbol toSymbol(Shape shape) {
        Symbol symbol = shape.accept(this);
        LOGGER.fine(() -> format("Creating symbol from %s: %s", shape, symbol));
        // TODO: Escape reserved words.
        return symbol;
    }

    @Override
    public Symbol structureShape(StructureShape shape) {
        String name = getDefaultShapeName(shape);

        // Generate errors differently than normal structures.
        if (shape.hasTrait(ErrorTrait.class)) {
            return createErrorStructure(shape);
        } else {
            return createNormalStructure(shape);
        }
    }

    private Symbol createErrorStructure(StructureShape shape) {
        throw new UnsupportedOperationException("Error type codegen not yet implemented");
    }

    private Symbol createNormalStructure(StructureShape shape) {
        return Symbol.builder()
            .name(name)
            // Change this to however the settings object configures the
            // target namespace and the namespace separator for the
            // language.
            .namespace(settings.getNamespace(), ".")
            // Change this to however filenames should work each generated
            // type. If this changes, then the files used to generate
            // code should automatically change too.
            .definitionFile("models/" + name + ".xyz")
            .build();
    }

    private String getDefaultShapeName(Shape shape) {
        // Use the service-aliased name and ensure it's capitalized.
        return StringUtils.capitalize(shape.getId().getName(service));
    }

    // TODO implement other ShapeVisitor methods.
}

Automatically handling reserved words#

Smithy code generators are expected to generate valid code for the target environment. Service teams defining Smithy models should not need to know the intricacies of how Smithy models are converted to every programming language. Instead, Smithy code generators should ensure that reserved words in a target environment are not used by the SymbolProvider. The smithy-codegen-core library provides several abstractions for handling reserved words. These abstractions should be used in your SymbolProvider.

The primary abstraction is the ReservedWords interface. The ReservedWordsBuilder class provides a convenient way to build an instance of ReservedWords. These ReservedWords instances should be integrated into your SymbolProvider by passing the created names, namespaces, and member names through the appropriate escaper. For example:

final class SymbolVisitor implements SymbolProvider, ShapeVisitor<Symbol> {

    // ... other properties
    private final ReservedWords escaper;

    SymbolVisitor(Model model, MySettings settings) {
        // ... other setup
        this.escaper = new ReservedWordsBuilder()
            .put("function", service.getId().getName() + "Function")
            .put("throw", service.getId().getName() + "Throw")
            .build();
    }

    // other methods...

    private String getDefaultShapeName(Shape shape) {
        String name = StringUtils.capitalize(shape.getId().getName(service));
        return escaper.escape(name);
    }
}

While you can manually define the mapping for each reserved word, a simpler method is to create an algorithm for automatically handling reserved words. This can be done by creating a newline delimited file that contains each reserved word and a Function<String, String> that takes a reserved word and returns an escaped word. For example, given the following file named reservedwords.txt:

function
throw

ReservedWordsBuilder can be configured to escape words using the file and your escaping function.

Function<String, String> escaper = word -> {
    // Returns something like "MyServiceFunction".
    return service.getId().getName() + StringUtils.capitalize(word);
});

URL wordsFile = getClass().getResource("reservedwords.txt");

ReservedWords escaper = new ReservedWordsBuilder()
    .loadWords(wordsFile, escaper)
    .build();

Reserved words handling should be as granular as possible. If a symbol is only reserved in certain contexts, then that word should only be treated as reserved in that context. This might require the use of multiple instances of ReservedWords.

var memberNameEscaper = new ReservedWordsBuilder()
    .loadWords(memberNameWordsFile, escaper)
    .build();

var classNameEscaper = new ReservedWordsBuilder()
    .loadWords(classNameWordsFile, escaper)
    .build();

Reserved words handling is case-sensitive by default. You can use reserved words file case insensitively using ReservedWordsBuilder#loadCaseInsensitiveWords.

var escaper = new ReservedWordsBuilder()
    .loadCaseInsensitiveWords(wordsFile, escaper)
    .build();

Composing SymbolProviders#

SymbolProvider has a very simple interface, making it easy to compose functionality using decorators. Decorators can be used to do things like add caching or add more contextual data to Symbols.

The following example decorates a SymbolProvider by adding caching of resolved Symbols:

var cachedProvider = SymbolProvider.caching(mySymbolProvider);

The following example creates a decorator that adds a "shape" property to every Symbol:

final class MyCodegenPlugin {
    static SymbolProvider wrapSymbolProvider(SymbolProvider delegate) {
        return shape -> {
            return delegate.toSymbol(shape).toBuilder()
                    .putProperty("shape", shape)
                    .build();
        };
    }
}

var wrapped = MyCodegenPlugin.wrapSymbolProvider(mySymbolProvider);

Integrating Symbols into your SymbolWriter#

SymbolWriter provides some building blocks to help integrate Symbols into a particular programming language, but the actual gluing together of abstractions, generating import statements, generating dependencies, accounting for aliasing, etc. is an exercise left to each language implementation of Smithy.

Create an ImportContainer for your language#

An ImportContainer is used to track the imports associated with a specific file being generated. Each time a Symbol is written to a SymbolWriter, and each call to methods like SymbolWriter#addImport, Symbols are sent to the ImportContainer owned by the SymbolWriter. The ImportContainer should be aware of the current namespace in use by the SymbolWriter.

The following example implements a simple ImportContainer for a made up language. If a provided Symbol is in the same namespace that the container is tracking, the import is discarded. Otherwise, each import is added to a map of namespaces to a map of alias → target name.

final class MyLangImports implements ImportContainer {
    private final Map<String, Map<String, String>> imports = new TreeMap<>();
    private final MyLangSettings settings;
    private final String namespace;

    MyLangImports(MyLangSettings settings, String namespace) {
        this.settings = settings;
        this.namespace = namespace;
    }

    @Override
    public void importSymbol(Symbol symbol, String alias) {
        var symbolNamespace = symbol.getNamespace();

        // Only import symbols in other namespaces.
        if (!symbolNamespace.equals(namespace)) {
            var namespaceImports = imports.computeIfAbsent(symbolNamespace, ns -> new TreeMap<>());
            namespaceImports.put(alias, symbol.getName());
        }
    }

    @Override
    public String toString() {
        if (imports.isEmpty()) {
            return "";
        }

        // Build up each line of import statements.
        var builder = new StringBuilder();

        for (var entry : imports.entrySet()) {
            var ns = entry.getKey();
            var alias = entry.getValue().getKey();
            var target = entry.getValue().getValue();
            builder.append("import ").append(target);

            // Use a made up aliasing syntax if the alias differs from the target.
            if (!alias.equals(target)) {
                builder.append(" as ").append(alias)
            }

            // Import from a target namespace.
            builder.append(" from ").append(ns);
        }

        builder.append("\n");
        return builder.toString();
    }
}

ImportContainer implements toString so that SymbolWriter can write out imports before writing out the rest of the code.

Create a SymbolWriter subclass#

Each language should create a subclass of SymbolWriter that automatically manages imports, symbols, and writes documentation strings.

The following example shows how a subclass of SymbolWriter can be created.

public final class MyWriter extends SymbolWriter<MyWriter, MyImportContainer> {

    public MyWriter(String namespace) {
        super(new MyImportContainer(namespace));

        // Write Symbols relative to the current namespace.
        setRelativizeSymbols(namespace);
    }

    @Override
    public String toString() {
        // You can override how code is converted to a string. For example,
        // this allows you to add a prelude to generated code or to write the
        // necessary imports that were used in the writer.
        return getImportContainer().toString() + "\n\n" + super.toString();
    }

    public MyWriter someCustomMethod() {
        // You can implement custom methods that are specific to whatever
        // language you're implementing a generator for.
        return this;
    }
}

Use WriterDelegator to create writers#

In order to track the dependencies used while generating code, and to add code interceptors to each created SymbolWriter, code generators should use a WriterDelegator to create SymbolWriters.

A WriterDelegator is used to create and track all the SymbolWriters used during code generation. A codegen project will generally use a single WriterDelegator during codegen. You need to ensure your WriterDelegator knows about the code interceptors returned from SmithyIntegrations so that it can apply transformations as code is written.

Let's say you need to generate code for a structure shape. You ask the WriterDelegator to give you the appropriate SymbolWriter:

delegator.useShapeWriter(shape, writer -> {
    writer.write("Structure $L", shape.getId());
});

WriterDelegator will create the appropriate SymbolWriter that writes to the correct file location based on the Symbol created for the given shape. If multiple shapes use the same filename, then WriterDelegator will provide the same SymbolWriter to each call to useShapeWriter, and it will automatically inject \n prior to vending a previously used writer (this can be customized).

Use useFileWriter to write to a file that isn't specific to a shape:

delegator.useFileWriter("README.md", writer -> {
    writer.write("""
        # This is my README!

        Do you like it?
        """);
});

When codegen has completed, the generator needs to call flushWriters on the delegator to write each created SymbolWriter to the FileManifest the generator is using:

delegator.flushWriters();

All the symbol dependencies detected when using each SymbolWriter can be retrieved from the delegator using getDependencies.

List<SymbolDependency> dependencies = delegator.getDependencies();

These dependencies can then be used to generate things like dependency manifests for the created code.

FAQ#

How do I add more information to Symbols, SymbolReferences, and SymbolDependencies?#

Use typed property bags to store additional information. For example:

Symbol foo = Symbol.builder()
    .name("Foo")
    .namespace("example.foo", ".")
    .putProperty("customData", "hello")
    .build();

String customData = foo.getProperty("customData", String.class);

You can add properties to an existing Symbol, SymbolReference, or SymbolDependency by calling toBuilder first:

foo = foo.toBuilder()
    .putProperty("anotherProperty", true)
    .build();

Does SymbolWriter require one namespace per file?#

No, but that's the easiest way to use SymbolWriter. Your language's subclass can be setup in a way that it uses multiple ImportContainer instances per/namespace in a single file. For example, an ImportContainer could be given the current namespace of a SymbolWriter each time it's invoked, allowing the ImportContainer to perform more targeted relativization. Then the ImportContainer would need special methods used to convert each nested namespace's imports to a string. It's an abstract exercise left up to the implementation.