Polyglot Code Refactoring Tools

How to refactor across multiple languages in a codebase?


Modern code refactoring tools need to support refactors that span multiple programming languages. Today's applications are polyglot, built using multiple programming languages. As a result, code refactors such as stale feature flag cleanup often involve several languages in the same codebase. A mobile app's codebase, for example, may include both Objective-C and Swift for iOS, and both Java and Kotlin for Android. Its backend services may include both Java and Go, while the web version of the app may include TypeScript or JavaScript. Cleaning up a stale feature flag may therefore require changes across all these languages.

Lexical tools such as grep and sed are naturally polyglot, because they treat every file as plain text. We need similarly polyglot tools for more sophisticated refactorings, such as syntactic (AST-based) or type-based ones.
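A minimal Python sketch of a grep/sed-style rewrite illustrates both halves of this point. The file names and snippets are hypothetical. One regular expression rewrites Java, Kotlin, and Swift alike, but because it sees only characters, it also rewrites a string literal and a comment that merely mention the method.

```python
import re

# Hypothetical files in three languages; a lexical tool treats them all as plain text.
files = {
    "Flags.java": 'boolean on = isLocEnabled();\nlog("isLocEnabled() was removed");',
    "Main.kt": "val on = isLocEnabled() // TODO: drop isLocEnabled()",
    "App.swift": "let on = isLocEnabled()",
}

# One pattern serves every language, which is what makes lexical tools polyglot.
pattern = re.compile(r"\bisLocEnabled\(\)")
rewritten = {path: pattern.sub("true", text) for path, text in files.items()}

for path, text in rewritten.items():
    print(f"--- {path}\n{text}")
```

The three call sites become true as intended, but the log message and the TODO comment are silently corrupted as well; telling them apart requires at least a syntax tree.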

The need for polyglot refactoring tools poses a number of challenges:

  • Cleaning up code within a single app or service that spans multiple languages, for example, Kotlin and Java in a single Android app, or Swift and Objective-C in an iOS app.

  • Cleaning up a codebase that spans multiple languages across mobile, frontend, backend, data, and other components.

  • Building a single AST-based tool that can handle multiple languages within one framework.

Refactoring across multiple languages presents unique challenges. Deleting a method in one language may require changes in code written in another language that depends on it. This interdependence can lead to cascading code rewrites across the entire codebase, potentially spanning multiple languages, and the process must be handled carefully to avoid introducing bugs.

The following example shows a code rewrite scenario where the developer wants to delete the method declaration isLocEnabled (annotated with @Value("location")) from a Java class named JavaClass, and replace all its usages with true throughout the codebase, spanning both Java and Kotlin files.

To keep the codebase free from dead code, the developer also wants to simplify the call sites of isLocEnabled and transitively remove both unreachable and dead code in Java and Kotlin.

In the Kotlin code above, once isLocEnabled is replaced with true, the refactoring tool should also automatically trigger a chain of deep cleanup code rewrites: (1) simplify true || x > 0 to true, (2) delete the redundant if(true) statement, (3) delete the unreachable return statement, (4) delete the unused parameter y in the apply function, and (5) finally, delete the unnecessary import declaration.
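For readers who want to see the cascade mechanically, steps (1) to (3) can be sketched with Python's standard ast module on a hypothetical Python analogue of the Kotlin snippet, after constant propagation has already replaced the flag call with True. The DeepCleanup class is an illustration only, not PolyglotPiranha's implementation, and it handles just the patterns in this example; steps (4) and (5) are omitted.

```python
import ast

class DeepCleanup(ast.NodeTransformer):
    """Toy deep-cleanup pass over Python ASTs (illustrative, not PolyglotPiranha's code)."""

    def visit_BoolOp(self, node: ast.BoolOp) -> ast.AST:
        self.generic_visit(node)
        first = node.values[0]
        # Step (1): `True or C` -> True and `False and C` -> False (short-circuit semantics).
        if isinstance(first, ast.Constant):
            if isinstance(node.op, ast.Or) and first.value is True:
                return ast.Constant(value=True)
            if isinstance(node.op, ast.And) and first.value is False:
                return ast.Constant(value=False)
        return node

    def visit_If(self, node: ast.If):
        self.generic_visit(node)
        # Step (2): splice in the branch that is always taken, dropping the `if` itself.
        if isinstance(node.test, ast.Constant) and isinstance(node.test.value, bool):
            return node.body if node.test.value else node.orelse
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)
        # Step (3): statements after a `return` in the function body are unreachable.
        for i, stmt in enumerate(node.body):
            if isinstance(stmt, ast.Return):
                node.body = node.body[: i + 1]
                break
        return node

# Hypothetical Python analogue of the Kotlin snippet, after the flag call became True.
SOURCE = """
def apply(x, y):
    if True or x > 0:
        return 1
    return y
"""

tree = ast.fix_missing_locations(DeepCleanup().visit(ast.parse(SOURCE)))
cleaned = ast.unparse(tree)  # ast.unparse requires Python 3.9+
print(cleaned)
```

Each transformation inspects only syntax, yet each one exposes the next: folding the condition makes the if removable, and removing the if makes the trailing return unreachable.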

The Kotlin compiler and downstream optimizers can eliminate much of this dead and unreachable code from the generated output. However, retaining it at the source level contributes to technical debt and increases both build times and maintenance burden. It also makes it harder for developers to understand and reason about the code's behavior, especially when they are isolating and fixing bugs.

We should select a refactoring tool that

  1. works seamlessly across multiple languages (e.g., Java and Kotlin in this case).

  2. automatically performs deep cleanup to eliminate unreachable and dead code.

  3. is lightweight and fast, allowing it to scale to large codebases.

In our previous blog post, we discussed various types of refactoring tools, along with their pros and cons. For the refactoring operation described above, a build-system-based tool would require separate implementations for Java and Kotlin, each embedding the deep cleanup operations described above. Yet these cleanup operations are nearly identical across the two languages, so duplicating them is wasteful. Maintaining multiple versions also adds complexity, particularly in ensuring feature parity across versions and updating each one whenever the build system or compiler version changes.

On the other hand, an AST-based refactoring tool is a better fit for this scenario. It is lightweight, largely language-agnostic, and enables deep cleanups such as if-simplification and dead code elimination without requiring symbol or type information. Importantly, the polyglot nature of these tools allows implementations to be shared across languages, which matches the growing diversity of programming languages used within companies.

To the best of our knowledge, a few AST-based refactoring tools are available for performing cross-language code rewrites: Uber's PolyglotPiranha, Meta's Codemod, and Semgrep. PolyglotPiranha stands out for its ability to perform sophisticated code migrations and cleanups across multiple languages while operating at the syntactic level. We will dive deep into PolyglotPiranha and show how it performs the described code rewrite by composing a series of simple match-replace rules, as shown below.

PolyglotPiranha constructs a graph of match-replace rules. Each rule is a node in the graph comprising two clauses: (1) a match clause to match a code snippet, and (2) a replace clause for the replacement code. The edges in the graph specify the order in which rules are applied. Each match of a node may produce a set of captures that can be used by subsequent nodes in the graph.
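The graph mechanics can be modeled in a few dozen lines of Python. In this sketch, regular expressions stand in for the syntax-tree patterns PolyglotPiranha actually matches, and the Rule class, the {{hole}} placeholder syntax, and apply_graph are hypothetical names chosen for illustration. Named groups in a match become captures, and each edge passes those captures to the next rule, which is how the method name found in the Java file reaches the Kotlin call site.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    match: str         # regex template; {{hole}} placeholders are filled from captures
    replace: str = ""  # literal replacement text ("" deletes the match)
    suffix: str = ""   # only rewrite files ending with this suffix ("" = every file)

def instantiate(template: str, caps: dict[str, str]) -> str:
    # Fill each {{hole}} with the regex-escaped value captured upstream.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: re.escape(caps[m.group(1)]), template)

def apply_graph(files: dict[str, str], rules: dict[str, Rule],
                edges: dict[str, list[str]], start: str, seed: dict[str, str]) -> dict[str, str]:
    worklist = [(start, dict(seed))]
    while worklist:
        name, caps = worklist.pop()
        rule = rules[name]
        pattern = re.compile(instantiate(rule.match, caps))
        for path, text in list(files.items()):
            if not path.endswith(rule.suffix):
                continue
            for m in pattern.finditer(text):
                # Named groups of this match become captures for every successor node.
                succ_caps = {**caps, **m.groupdict()}
                worklist.extend((succ, succ_caps) for succ in edges.get(name, []))
            files[path] = pattern.sub(lambda _m: rule.replace, text)
    return files

rules = {
    # Toy DeleteMethodDeclaration: drop the annotated Java method and capture its name.
    "delete_method": Rule(
        "delete_method",
        r'\s*@Value\("{{flag}}"\)\s*static boolean (?P<method>\w+)\(\) \{[^}]*\}',
        suffix=".java"),
    # Toy PropagateConstant: replace calls to the captured method in any file.
    "propagate_constant": Rule(
        "propagate_constant", r"\b{{cls}}\.{{method}}\(\)", "true"),
    # Toy SimplifyDisjunction: `true || C` -> `true` (C stops at ')' or a newline).
    "simplify_disjunction": Rule(
        "simplify_disjunction", r"\btrue \|\| [^)\n]*", "true"),
}
edges = {
    "delete_method": ["propagate_constant"],
    "propagate_constant": ["simplify_disjunction"],
}

files = {
    "JavaClass.java": 'class JavaClass {\n  @Value("location")\n'
                      '  static boolean isLocEnabled() { return Flags.get("location"); }\n}\n',
    "Main.kt": "fun apply(x: Int): Int {\n"
               "  if (JavaClass.isLocEnabled() || x > 0) { return 1 }\n  return 0\n}\n",
}
apply_graph(files, rules, edges, "delete_method", {"flag": "location", "cls": "JavaClass"})
print(files["JavaClass.java"])
print(files["Main.kt"])
```

Only the first rule is seeded with concrete values; every later rule is instantiated from what its predecessors matched, so the same graph could be reused for a different flag by changing the seed.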

PolyglotPiranha iteratively applies a set of primitive deep cleanup rules until it reaches a fixed point:

  • DeleteMethodDeclaration: Removes a specified method declaration from a class by matching a 'templated' method name, and removing associated comments and annotations. In addition, it captures the fully-qualified class and method names for use in subsequent nodes.

  • DeleteImportDeclaration: Uses the data captured by DeleteMethodDeclaration to eliminate the corresponding import declaration.

  • PropagateConstant: Replaces all the method calls to the deleted Java method with a constant true.

  • SimplifyDisjunction: Simplifies boolean expressions, rewriting true || C to true and false && C to false.

  • SimplifyConditional: Deletes dead code in conditionals with constant conditions: if (true) keeps only its then-branch, and if (false) keeps only its else-branch, if any.

  • DeleteUnreachableCode: Deletes unreachable code, for instance, when a return statement S dominates another statement (or statement block) B, B can be safely deleted.

  • DeleteUnusedParameter: Removes unused parameters from private methods, where doing so is safe. This rule is not triggered on public methods, as they may be called from outside the current module.
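Iterating to a fixed point matters because one rule's rewrite can enable another. In this minimal sketch (hypothetical helper names, with regexes again standing in for syntax-tree rules), the import rule happens to run before constant propagation, so a single pass leaves a now-unused import behind; repeating all rules until a full round changes nothing removes it regardless of rule order.

```python
import re
from typing import Callable

Rule = Callable[[str], str]  # a cleanup rule maps source text to rewritten source text

def propagate_constant(src: str) -> str:
    # Toy PropagateConstant: calls to the deleted method become `true`.
    return re.sub(r"\bJavaClass\.isLocEnabled\(\)", "true", src)

def delete_unused_imports(src: str) -> str:
    # Toy DeleteImportDeclaration: drop `import a.b.C` when C is no longer referenced.
    lines = src.splitlines()
    body = "\n".join(line for line in lines if not line.startswith("import "))
    kept = []
    for line in lines:
        m = re.fullmatch(r"import\s+(?:[\w.]+\.)?(\w+)", line)
        if m and not re.search(rf"\b{m.group(1)}\b", body):
            continue  # unused import: delete it
        kept.append(line)
    return "\n".join(kept)

def run_to_fixed_point(src: str, rules: list[Rule], max_rounds: int = 100) -> str:
    # Re-apply every rule until a full round changes nothing.
    for _ in range(max_rounds):
        new = src
        for rule in rules:
            new = rule(new)
        if new == src:
            return src
        src = new
    raise RuntimeError("no fixed point within max_rounds")

SRC = "import com.example.JavaClass\nval enabled = JavaClass.isLocEnabled()"
rules = [delete_unused_imports, propagate_constant]  # deliberately unfavorable order

one_round = propagate_constant(delete_unused_imports(SRC))
fixed = run_to_fixed_point(SRC, rules)
print(one_round)  # the import survives: it was still used when its rule ran
print(fixed)
```

The max_rounds guard is a defensive choice for the sketch: a rule set whose rewrites undo each other would otherwise loop forever.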

The resulting Kotlin code after all the simplifications is shown in Step 7) above. At this point, it is worth going back to our original code and seeing how far we have come in terms of automating the refactoring operation using an AST-based polyglot refactoring tool. It is indeed a ‘tada’ moment for developers, as the tool has completely automated the grunt work!

PolyglotPiranha currently supports 10 programming languages, including Java, Kotlin, Swift, Go, Python, Scala, JavaScript, TypeScript, and Thrift. Since it is built on top of tree-sitter, which officially supports 133 programming languages, it is fairly straightforward to add a new language to PolyglotPiranha, with the caveat that syntactic differences between languages, such as (true || COND) in Java versus True or COND in Python, must be handled by the tool itself. Based on our observations, a large fraction of the deep cleanup rules are reusable within a broad family of languages (Java, Kotlin, etc.).
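One way to reuse a cleanup rule across such syntactic differences is to write its logic once and keep only the token spellings per language. In this sketch the tuple trees are a hypothetical stand-in for parser output (node kinds also differ between tree-sitter grammars in practice), and the SPELLINGS table covers just three languages; it is not a description of PolyglotPiranha's internals.

```python
# A tiny language-neutral tree (assumed output of per-language parsers):
#   ("bool", text) | ("name", text) | ("int", text) | ("binop", operator_text, left, right)
Node = tuple

# Hypothetical per-language spellings of the only tokens this rule cares about.
SPELLINGS = {
    "java":   {"true": "true", "false": "false", "or": "||", "and": "&&"},
    "kotlin": {"true": "true", "false": "false", "or": "||", "and": "&&"},
    "python": {"true": "True", "false": "False", "or": "or", "and": "and"},
}

def simplify_disjunction(node: Node, lang: str) -> Node:
    """Shared rule: `true OR c` -> true and `false AND c` -> false, in any listed language."""
    if node[0] != "binop":
        return node
    s = SPELLINGS[lang]
    _, op, left, right = node
    left = simplify_disjunction(left, lang)
    if (op, left) in ((s["or"], ("bool", s["true"])), (s["and"], ("bool", s["false"]))):
        return left  # short-circuit: the right operand cannot change the result
    return ("binop", op, left, simplify_disjunction(right, lang))

cond = ("binop", ">", ("name", "x"), ("int", "0"))
java = ("binop", "||", ("bool", "true"), cond)
python = ("binop", "or", ("bool", "True"), cond)
print(simplify_disjunction(java, "java"))      # ('bool', 'true')
print(simplify_disjunction(python, "python"))  # ('bool', 'True')
```

Adding another language in this design means adding one SPELLINGS entry rather than a new copy of the rule, which mirrors the reuse across language families described above.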

In this discussion, we have left many threads of PolyglotPiranha open, including the details of its match-replace graph domain-specific language (DSL), its semantics, its runtime execution engine, and its ability to perform fast interprocedural analysis across large codebases. These are topics for a future blog post.

If your organization faces challenges with migrations, dependency upgrades, and/or addressing security vulnerabilities, we'd love to hear from you and would be keen on automating these tasks for you.

We invite you to join our Slack community, where we continue to explore and discuss these topics further.