Migrating to DSL data storage

We finish the migration by designing and implementing a family of DSLs.

by Pieter Olivier on 10 Jul 2025

In this post, we discuss some of the problems and solutions involved in introducing domain-specific languages (DSLs) to store data in an existing application. But first, a brief recap. In the first post of this series, we explained why you might want to store some of your data as text files written in a DSL. Next, in the second post, we showed how some legacy applications can benefit from an intermediate solution where the data is already stored in text documents using JSON. Although this solution offers some version management advantages, it is primarily useful as a stepping stone in an incremental migration to a full DSL approach. In this third and final post, we discuss the full DSL approach to storing data from an existing application, including typical problems and solutions.

Unfortunately, no standard recipe exists for introducing DSLs for data storage into your application landscape. Therefore, in this post, we will provide you with some guidelines and tips on how to approach this operation.

Consider the whole family

In most complex applications, different parts of the system use different but often related data. Depending on the situation, a single data DSL could cover all of the application’s needs, or a (small) family of DSLs might be better suited. In all cases, careful analysis of the overlap and links between the different concepts in the data is crucial before designing the DSL(s). This matters because once you have accumulated a significant body of code written in the DSL, improving the DSL’s design becomes far more expensive. That being said, we also have experience with rejuvenating legacy DSLs and automating the conversion of old code to the rejuvenated version, including recognizing old idioms and refactoring them to new constructs.

Families of DSLs

When different DSLs are needed, common elements between them should be implemented in the same way to avoid confusion and compatibility issues. A good example of this is the use of identifiers. Most identifiers should follow the same (syntactic) rules, as they are often used to refer to elements in other DSLs. Other examples include using the same rules for constants such as strings and numbers. DSLs should preferably be designed to interact with each other, for example, by sharing the same namespace of identifiers. Modularity in your syntax definitions is important: copy-and-pasting (a.k.a. “First order reuse”) will hurt consistency and maintenance.
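
As a minimal sketch of what shared lexical rules could look like, the Python fragment below defines the identifier, string, and number rules once and reuses them in two small line-based DSLs. The "product" and "order" languages, their syntax, and all names in the code are hypothetical and only serve as an illustration; a real family of DSLs would normally use a parser generator or language workbench with modular grammars rather than regular expressions.

```python
import re

# Shared lexical rules, defined once and reused by every DSL in the family.
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"
STRING = r'"(?:[^"\\]|\\.)*"'
NUMBER = r"-?\d+(?:\.\d+)?"

# Hypothetical "product" DSL:  product <identifier> price <number>
PRODUCT_RULE = re.compile(rf"product\s+({IDENTIFIER})\s+price\s+({NUMBER})")

# Hypothetical "order" DSL:  order <identifier> for <identifier> note <string>
ORDER_RULE = re.compile(
    rf"order\s+({IDENTIFIER})\s+for\s+({IDENTIFIER})\s+note\s+({STRING})"
)


def parse_product(line: str) -> dict:
    """Parse one product declaration into a plain dictionary."""
    match = PRODUCT_RULE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a product declaration: {line!r}")
    return {"id": match.group(1), "price": float(match.group(2))}


def parse_order(line: str) -> dict:
    """Parse one order declaration; the product reference uses the shared identifier rule."""
    match = ORDER_RULE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an order declaration: {line!r}")
    return {
        "id": match.group(1),
        "product": match.group(2),
        # Strip the surrounding quotes; escape sequences are left as written.
        "note": match.group(3)[1:-1],
    }
```

Because both languages draw on the same `IDENTIFIER` rule, a product name that is valid in one DSL is automatically a valid reference in the other.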

Modular migration

In larger applications, converting the whole application at once to use DSLs for data storage might not be feasible. Deciding how an evolutionary migration should be approached requires an in-depth analysis of the code and how it uses the data. If multiple DSLs are introduced, a “one DSL at a time” approach might be appropriate.
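
One possible way to support a “one DSL at a time” migration is to route all storage access through a thin facade that knows which kinds of data have already been moved. The sketch below assumes such a facade; `Store`, `InMemoryStore`, `MigratingStorage`, and the notion of a data "kind" are hypothetical names introduced for illustration, and the in-memory stores merely stand in for a real database and real DSL files.

```python
from typing import Protocol


class Store(Protocol):
    """Minimal storage interface shared by the old and the new backend."""

    def load(self, kind: str, key: str) -> dict: ...

    def save(self, kind: str, key: str, data: dict) -> None: ...


class InMemoryStore:
    """Stand-in for a real backend (database or DSL files) in this sketch."""

    def __init__(self) -> None:
        self.items: dict[tuple[str, str], dict] = {}

    def load(self, kind: str, key: str) -> dict:
        return self.items[(kind, key)]

    def save(self, kind: str, key: str, data: dict) -> None:
        self.items[(kind, key)] = data


class MigratingStorage:
    """Routes each kind of data to the DSL backend once it has been migrated."""

    def __init__(self, legacy: Store, dsl: Store, migrated_kinds: set[str]) -> None:
        self.legacy = legacy
        self.dsl = dsl
        self.migrated_kinds = migrated_kinds

    def _store_for(self, kind: str) -> Store:
        return self.dsl if kind in self.migrated_kinds else self.legacy

    def load(self, kind: str, key: str) -> dict:
        return self._store_for(kind).load(kind, key)

    def save(self, kind: str, key: str, data: dict) -> None:
        self._store_for(kind).save(kind, key, data)
```

With this structure, migrating the next DSL amounts to converting its existing data and adding its kind to `migrated_kinds`, while the rest of the application keeps calling the same `load` and `save` methods.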

DSL requirement analysis

We recommend starting with a requirements analysis, possibly using the lightweight requirement analysis approach. Because we are renovating an existing application, an important extra source of requirements is the user interface of the application and how users interact with it. These requirements can be further augmented with data found in the data structures stored by the application to verify nothing was missed. In our experience, starting with the user interface results in more intuitive DSLs than if you start by analyzing what the application actually stores.

[Figure: Application based on DSLs]

DSL implementation

In some cases, the data structures processed by the application are already close to the data represented in the DSL. Sometimes the content of database records can be mapped directly to DSL documents, and occasionally the data structures used internally in the application happen to map cleanly onto a DSL without compromising its quality.

If querying is not an issue (data is only retrieved and stored based on a primary key), it might be just a matter of replacing the code that reads and/or writes records with code that can handle DSL documents. Where previously a record in the database was read, now the DSL document is read and parsed. Writing the record entails serializing the data to a DSL document on disk.
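
The following sketch illustrates this key-based case under some loud assumptions: the `Customer` record, the tiny block syntax, the `.customer` file extension, and the `CustomerStore` class are all made up for the example, and the hand-written parser accepts only the exact layout that `serialize` produces. A production implementation would use a proper grammar and report errors with source locations.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Customer:
    key: str
    name: str
    credit_limit: int


def serialize(customer: Customer) -> str:
    """Write a customer as a document in a hypothetical DSL."""
    return (
        f"customer {customer.key} {{\n"
        f'  name "{customer.name}"\n'
        f"  credit_limit {customer.credit_limit}\n"
        f"}}\n"
    )


def parse(text: str) -> Customer:
    """Parse a document produced by serialize(); no escape handling in names."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header, body, footer = lines[0], lines[1:-1], lines[-1]
    keyword, key, brace = header.split()
    if keyword != "customer" or brace != "{" or footer != "}":
        raise ValueError("malformed customer document")
    fields = dict(line.split(maxsplit=1) for line in body)
    return Customer(
        key=key,
        name=fields["name"].strip('"'),
        credit_limit=int(fields["credit_limit"]),
    )


class CustomerStore:
    """Replaces 'read/write a record by primary key' with one DSL file per key."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / f"{key}.customer"

    def load(self, key: str) -> Customer:
        return parse(self._path(key).read_text(encoding="utf-8"))

    def save(self, customer: Customer) -> None:
        self._path(customer.key).write_text(serialize(customer), encoding="utf-8")
```

Code that previously called a database layer with a primary key can call `load` and `save` on such a store instead, and the resulting files can be placed under version control like any other source text.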

When the mapping is not straightforward or when complex queries are required, more effort will be needed. The possibilities include multiple data structures mapping to a single DSL, data from multiple DSLs mapping to a single internal data structure, and complex queries that require indexes to be maintained alongside the DSL source files.
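
To give an impression of the last option, here is a small sketch of an index that is kept up to date whenever a DSL document is written or removed. It assumes a deliberately flat, hypothetical syntax of `field value` lines, and the names `extract_fields` and `FieldIndex` are ours. The index lives in memory here; whether it should be persisted or rebuilt from the source files at startup depends on the application.

```python
from collections import defaultdict


def extract_fields(document: str) -> dict[str, str]:
    """Read `field value` lines from a document (hypothetical flat syntax)."""
    fields = {}
    for line in document.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip('"')
    return fields


class FieldIndex:
    """Maps (field, value) pairs to the keys of the documents that contain them."""

    def __init__(self) -> None:
        self._keys_by_entry: dict[tuple[str, str], set[str]] = defaultdict(set)
        self._entries_by_key: dict[str, set[tuple[str, str]]] = {}

    def update(self, key: str, document: str) -> None:
        """Re-index one document after it has been written or changed."""
        self.remove(key)
        entries = set(extract_fields(document).items())
        for entry in entries:
            self._keys_by_entry[entry].add(key)
        self._entries_by_key[key] = entries

    def remove(self, key: str) -> None:
        """Drop all index entries of a document, for example after deletion."""
        for entry in self._entries_by_key.pop(key, set()):
            self._keys_by_entry[entry].discard(key)

    def find(self, field: str, value: str) -> set[str]:
        """Return the keys of all documents where `field` has `value`."""
        return set(self._keys_by_entry.get((field, value), set()))
```

The DSL files remain the single source of truth in this design: the index is derived data that can always be thrown away and rebuilt by re-reading the documents.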

The implementation phase of the DSL can require a restructuring of the data structures and code to ensure the existing application works with the data represented in the DSL. When this operation is done carefully, it tends to improve the structure of the codebase to better match the actual business concepts described in the DSL: the level of abstraction the code works at will effectively be raised.

Key takeaways

  1. Consider the relationships between your data when designing DSLs.
  2. DSLs and DSL families can often be implemented incrementally.
  3. Use the application user interface as an important source of requirements.
  4. Avoid letting the current code structure influence the design of the DSL.

Get in touch

Do you face challenges in the design and implementation of DSL support for your application? Then reach out to us and discover how we can help you with this process.



Header image by ThisIsEngineering on Pexels
