Software Code Composition

The ultimate goal of the TextComposerLib library is to systematically generate structured, static text code for arbitrary languages and applications. Just as no single tool can be used to compose all kinds of art, my experience with code generation is that no single method suits all situations. We almost always need to integrate several code generation methods to get the desired results.

There are several ways to create our code:

  1. We can always write the whole code ourselves. This is the way things were done 15 years ago. The very large software systems we create now are almost impossible to write manually, so we must use code generation at some point in development. Nevertheless, the design and engineering of the code is never a task for computers. The structure, correctness, and efficiency of the produced code depend largely on the designer, not the tool.
  2. Print statements are an alternative for smaller systems with few intricacies. They are useful for things like adding simple comments to the code, unrolling fixed loops, and implementing C-like macro preprocessors. This approach saves some time but can badly damage the structure and correctness of the code. Every code generator depends on print statements in its final stage, but we need to organize these loose prints into higher-level text generation functions, just as we encapsulated the old Goto statements into If-Then, For, and Try-Catch constructs.
  3. Template-based code generation has been a popular approach for many years, especially in database and web software design. When most of our code is symmetric and its main body is fixed, as with HTML pages and the code for data access layers, we create templates with missing parts that we fill in from some data source and configurable parameters. This approach is much better than the direct use of print statements, just as control-flow statements are much better than Goto statements.
  4. Constructing an Abstract Syntax Tree (AST) and then unparsing it into code is a very powerful approach used by many. It helps produce code that is correct, at least syntactically, for some target language. AST-based code generation is usually not practical for whole systems because we must build the AST by creating and connecting many small nodes as objects, so the code that performs the generation becomes less readable as we try to produce more and more code. This approach is also not suited for general code generation because each target language requires its own AST and unparser design. It is best suited for creating snippets or small portions of code for a specific target language, with templates organizing the snippets within the larger body of the remaining code.
  5. Software Factories are used for producing large software systems by assembling code parts using many different tools under the control of the software designer. A software factory can be set up to produce a family of related software along with its unit tests, documentation, alternative implementations, feature selection, etc. This is the goal in mind for the TextComposerLib and all its composers.
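
To make the template-based approach in item 3 concrete, here is a minimal sketch written in Python purely for illustration. It uses Python's standard string.Template class and is not related to TextComposerLib's own API; the template text, the function name, and the field data are all hypothetical.

```python
from string import Template

# Hypothetical template for a simple C#-style property: the fixed body
# stays in the template and only the marked slots ($type, $name) vary.
PROPERTY_TEMPLATE = Template("public $type $name { get; set; }")


def generate_properties(fields):
    """Fill the template once per (type, name) pair from the data source."""
    return "\n".join(
        PROPERTY_TEMPLATE.substitute(type=field_type, name=field_name)
        for field_type, field_name in fields
    )


# Hypothetical data source: column types and names of a database table.
fields = [("int", "Id"), ("string", "Title")]
code = generate_properties(fields)
print(code)
```

The generator code stays short because the shape of the output lives in the template, not in scattered print statements.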

1. Generating Code using Text Composers

The TextComposerLib contains composers that act as “structured print statements”. My intention was to create classes that imitate how I write code manually.

  • The LinearComposer is very useful in this context as it captures what I do while writing consecutive lines of code by selecting where to write the next line or how to indent the code.
  • The ListComposer and other structured composers capture the symmetry of well-formatted text in many ways. When combined with .NET’s LINQ to Objects, these composers can produce arrays or lists of structured text with arbitrary separators and prefix/suffix text.
  • The ParametricComposer is close to a template generator but lacks in-template processing. This composer just substitutes text at the marked places in the template.
  • The MappingComposer is also a form of template generator that can be used in conjunction with the ITextExpression interface to add processing capabilities to text templates. We just need to add an interpreter that takes a marked text segment, parses it into a text expression tree, and then interprets the tree into a string to be substituted in the template. We can also apply any other interpretation or transformation we wish to the text of the marked segments.
  • The RegionComposer is similar to the MappingComposer but works on line regions of text rather than contiguous text segments. It can be used to mark blocks of code lines for generation while the tag of its slot regions can hold additional information to guide the generated code, perhaps using an ITextExpression interpreter or any other method.
  • The FilesComposer can be used to structure generated text into code files and folders as desired with arbitrary order of creation.
  • The ProgressComposer can be used to track and generate logs of all these steps in detail for debugging and tracking purposes.
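
The idea of “structured print statements” behind the LinearComposer and ListComposer can be sketched as follows. This is a Python illustration under my own assumptions, not the library's API: the class LinearTextBuilder, the function compose_list, and all their members are hypothetical names.

```python
class LinearTextBuilder:
    """Hypothetical stand-in for a linear composer: appends lines one
    after another while tracking the current indentation level."""

    def __init__(self, indent_unit="    "):
        self._indent_unit = indent_unit
        self._level = 0
        self._lines = []

    def indent(self):
        self._level += 1
        return self

    def dedent(self):
        self._level = max(0, self._level - 1)
        return self

    def line(self, text=""):
        # Blank lines are stored without any indentation.
        self._lines.append(self._indent_unit * self._level + text if text else "")
        return self

    def text(self):
        return "\n".join(self._lines)


def compose_list(items, separator=", ", prefix="", suffix=""):
    """Hypothetical stand-in for a list composer: joins items with a
    separator and wraps the result in prefix/suffix text."""
    return prefix + separator.join(items) + suffix


# Compose a small C-like function: the argument list comes from the list
# helper, the body lines from the linear builder.
args = compose_list(["int x", "int y"], prefix="(", suffix=")")
builder = LinearTextBuilder()
builder.line("int Add" + args).line("{")
builder.indent().line("return x + y;").dedent()
builder.line("}")
result = builder.text()
print(result)
```

Indentation and separators are handled in one place, so the generating code reads in the same order as the text it produces.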

Together, these composers can be used to generate highly structured, well-formatted text code files in a way very similar to how we think while writing code manually. We can use the composers to construct larger systems like code preprocessors, code conversion systems, template processing engines, aspect weaving systems, and more.
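
As a rough illustration of the MappingComposer idea described above (marked segments handed to an interpreter whose result is substituted back into the template), here is a Python sketch. The marker syntax "<# ... #>", the toy interpreter, and the parameter table are assumptions made for this example only; they are not the library's actual conventions.

```python
import re

# Hypothetical marker syntax: segments wrapped in "<#" and "#>".
SEGMENT = re.compile(r"<#(.*?)#>")

# Hypothetical parameters available to the interpreter.
PARAMETERS = {"class": "Vector3"}


def interpret(segment):
    """Toy interpreter: 'upper NAME' upper-cases the parameter NAME;
    any other segment is looked up directly as a parameter name."""
    parts = segment.split()
    if len(parts) == 2 and parts[0] == "upper":
        return PARAMETERS[parts[1]].upper()
    return PARAMETERS[segment]


def compose_with_mapping(template, interpreter):
    """Replace each marked segment with the interpreter's result."""
    return SEGMENT.sub(lambda m: interpreter(m.group(1).strip()), template)


template = 'class <# class #> { const string Tag = "<# upper class #>"; }'
result = compose_with_mapping(template, interpret)
print(result)
```

Swapping in a different interpreter function changes the processing available inside the template without touching the substitution logic.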

2. Generating Code using ASTs

Like many software developers, I have a history with many general-purpose programming languages, such as QBasic, C, C++, VB.NET, C#, and F#, in addition to many domain-specific ones. Each language has its own syntax and semantics that give it the characteristics we know. Generating code in a target language using the AST-unparsing approach requires a specific AST and a specific unparser for that language. Nevertheless, many syntactic elements are so similar across languages that they can be effectively abstracted into AST nodes common to many target languages. For example, most C-like languages contain very similar forms of comments, if-else conditions, for loops, try-catch, switch-case, return, continue-break, class and structure definitions, method and function declarations, and many others. The main parts of the if-else construct are the if-condition, the true-statements, and the false-statements. Here we only need to create one general AST node class for the if-else construct to serve several languages at once (C, C++, Java, C#, VB.NET, etc.) and then write the correct unparsing procedure for each specific language. This is the approach I took for creating general AST node classes under the TextComposerLib.Code.SyntaxTree namespace. All node classes implement the ISyntaxTreeElement interface, so we can add more nodes, either general to many languages or specific to a single target language.
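
The if-else example can be sketched in a few lines: one language-neutral node and one unparsing function per target language. This is a Python illustration, not the classes in TextComposerLib.Code.SyntaxTree; the names IfElseNode, unparse_c_like, and unparse_vb_net are hypothetical, and conditions and statements are kept as plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IfElseNode:
    """Language-neutral if-else node: a condition plus two statement lists."""
    condition: str
    true_statements: List[str]
    false_statements: List[str] = field(default_factory=list)


def unparse_c_like(node, indent="    "):
    """Unparse for C-like targets (C, C++, Java, C#)."""
    lines = [f"if ({node.condition})", "{"]
    lines += [indent + s + ";" for s in node.true_statements]
    lines.append("}")
    if node.false_statements:
        lines += ["else", "{"]
        lines += [indent + s + ";" for s in node.false_statements]
        lines.append("}")
    return "\n".join(lines)


def unparse_vb_net(node, indent="    "):
    """Unparse the same node for VB.NET."""
    lines = [f"If {node.condition} Then"]
    lines += [indent + s for s in node.true_statements]
    if node.false_statements:
        lines.append("Else")
        lines += [indent + s for s in node.false_statements]
    lines.append("End If")
    return "\n".join(lines)


node = IfElseNode("x > 0", ["y = 1"], ["y = 2"])
c_code = unparse_c_like(node)
vb_code = unparse_vb_net(node)
print(c_code)
print(vb_code)
```

The node is built once; only the unparsing procedure differs per language.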

ISyntaxTreeElement Interface and some AST node classes

A significant portion of any language is related to the expressions it can have. Some languages, specifically functional ones like F#, deal exclusively with expressions rather than statements. The SteExpression class under the TextComposerLib.Code.SyntaxTree.Expressions namespace is the single AST node class used to represent all expressions. This class is mainly responsible for storing and manipulating the tree structure of the expression. Many kinds of expressions exist, for example literal values, variables, arithmetic operators, function calls, array element access, etc. The details of the kind of expression are stored in the HeadSpecs member of the SteExpression class. This member is of type ISteExpressionHeadSpecs, the main interface for holding all information about expression kinds.
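
The design of one expression node class whose kind is described by a separate “head” object can be sketched like this. It is a Python illustration under my own assumptions, not the actual SteExpression or ISteExpressionHeadSpecs members: the names HeadSpecs, Expression, and to_text, and the four kinds shown, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class HeadSpecs:
    """Hypothetical description of an expression's kind."""
    kind: str  # "literal", "variable", "operator", or "function"
    name: str  # literal text, variable name, operator symbol, or function name


@dataclass
class Expression:
    """Single node class for all expressions: the head says what kind of
    expression this is, the arguments hold the tree structure."""
    head: HeadSpecs
    arguments: List["Expression"] = field(default_factory=list)


def to_text(expr):
    """Turn an expression tree into text by inspecting each head."""
    head = expr.head
    if head.kind in ("literal", "variable"):
        return head.name
    args = [to_text(arg) for arg in expr.arguments]
    if head.kind == "operator":
        return "(" + f" {head.name} ".join(args) + ")"
    if head.kind == "function":
        return head.name + "(" + ", ".join(args) + ")"
    raise ValueError("unknown expression kind: " + head.kind)


x = Expression(HeadSpecs("variable", "x"))
one = Expression(HeadSpecs("literal", "1"))
product = Expression(HeadSpecs("operator", "*"), [x, x])
total = Expression(HeadSpecs("operator", "+"), [product, one])
root = Expression(HeadSpecs("function", "sqrt"), [total])
text = to_text(root)
print(text)
```

New expression kinds are added by defining new head descriptions, while the tree-handling code in the node class stays the same.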

SteExpression Class Diagram

For any specific target language we need a class to construct the AST. Naturally, not all AST node classes implementing the ISyntaxTreeElement interface are relevant to all languages. The LanguageSyntaxFactory class under the TextComposerLib.Code.Languages namespace can be used, directly or through a derived class, to construct the AST as desired for a specific target language. The LanguageCodeGenerator class is the base for unparsing the AST. This class implements the double-dispatch dynamic visitor pattern explained here to traverse the AST and generate text based on the nodes. We can derive a class from LanguageCodeGenerator to implement an unparser for a selected target language.

The abstract LanguageServer class contains the members SyntaxFactory and CodeGenerator that hold the AST construction and unparsing objects. We can inherit from this class and add whatever members we need to serve our code generation needs for the selected target language. In addition, the simple LanguageInfo class contains some information about the target language, like its name and version. We also have the ILanguageSyntaxConverter interface and the abstract LanguageExpressionConverter class that implements it. The class is intended to convert a given expression from one target language to another. This was required in GMac to convert Mathematica symbolic expressions into target language computational expressions, as explained in the GMacAPI Guide.

All these interfaces and classes are the infrastructure for creating general target ASTs and generating structured code from them. GMac provides a good use case for this infrastructure, and the full details are easily followed through its code. In time, more classes and functions will be added to this infrastructure to serve more target languages and more syntax capabilities while, probably, keeping the same high-level design.
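
The unparsing side can be sketched as a base generator that selects a visit method from the runtime type of each node, with one derived generator per target language. This is only a Python analogue of a dynamic visitor, not the mechanism used by LanguageCodeGenerator; the node classes, generator classes, and method names below are all hypothetical.

```python
class Comment:
    def __init__(self, text):
        self.text = text


class Return:
    def __init__(self, expression):
        self.expression = expression


class CodeGeneratorBase:
    """Hypothetical base unparser: the method that runs depends on both
    the generator's class and the node's runtime type."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, None)
        if method is None:
            return self.fallback(node)
        return method(node)

    def fallback(self, node):
        raise NotImplementedError("no visit method for " + type(node).__name__)

    def generate(self, nodes):
        return "\n".join(self.visit(node) for node in nodes)


class CSharpGenerator(CodeGeneratorBase):
    def visit_Comment(self, node):
        return "// " + node.text

    def visit_Return(self, node):
        return f"return {node.expression};"


class VbNetGenerator(CodeGeneratorBase):
    def visit_Comment(self, node):
        return "' " + node.text

    def visit_Return(self, node):
        return f"Return {node.expression}"


tree = [Comment("result of the computation"), Return("total")]
cs_code = CSharpGenerator().generate(tree)
vb_code = VbNetGenerator().generate(tree)
print(cs_code)
print(vb_code)
```

Node classes carry no language-specific code, so supporting another target language means adding another generator class rather than editing the nodes.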

AST Code Generation Classes

3. Generating Code Libraries

The GMacCodeLibraryComposer class under the TextComposerLib.Code namespace is the base class for generating complex code for a large software library in a selected target language. The GMacAPI Guide explains how GMac extends this class to generate geometric computing code from GA models. This code library generator class should be inherited to build an assembly line whose components compose our code in a modular, organized way that eases readability and debugging. The full source code of this very important class is shown here:

The members are straightforward to follow but we need to understand the correct calling sequence for the methods:

  • If we focus on the Generate() method we find that the first step is to make sure the code library generator is ready to begin the generation process by calling the VerifyReadyToGenerate() method. There we should put any checks we need on the inputs of the library generator to make sure they are sufficient for the process.
  • Next, the InitializeGenerator() method is called. This method initializes any parametric text templates used during code generation, clears the code files composer, and initializes any other components we need for the process.
  • Next the composeTextFilesAction delegate, passed as a parameter to the Generate() method, is executed. Here we should write code to use our components to generate the code. There is another overload of the Generate() method taking no arguments. This overload by default calls the ComposeTextFiles() method as the code composition action.
  • Finally, the code files composer is finalized to finish its work, and the FinalizeGenerator() method is called to finalize any other components used during the code generation process.
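
The calling sequence described in the steps above can be sketched as a template method. This is a Python illustration, not the source of GMacCodeLibraryComposer: the class and method names are hypothetical translations, a plain dictionary stands in for the code files composer, and both the early return on a failed check and passing the composer itself to the action are my assumptions.

```python
class LibraryComposerSketch:
    """Hypothetical base class mirroring the described Generate() sequence."""

    def __init__(self):
        self.files = {}  # file name -> text, standing in for a files composer
        self.steps = []  # call order, recorded only for illustration

    def verify_ready_to_generate(self):
        self.steps.append("verify")
        return True

    def initialize_generator(self):
        self.steps.append("initialize")
        self.files.clear()

    def compose_text_files(self):
        self.steps.append("compose (default)")

    def finalize_generator(self):
        self.steps.append("finalize generator")

    def generate(self, compose_action=None):
        if not self.verify_ready_to_generate():
            return False  # assumption: stop when inputs are insufficient
        self.initialize_generator()
        if compose_action is None:
            self.compose_text_files()  # no-argument overload: default action
        else:
            self.steps.append("compose (custom)")
            compose_action(self)  # assumption: the action receives the composer
        self.steps.append("finalize files")
        self.finalize_generator()
        return True


class VectorLibraryComposer(LibraryComposerSketch):
    def compose_text_files(self):
        super().compose_text_files()
        self.files["Vector3.cs"] = "public struct Vector3 { }"


composer = VectorLibraryComposer()
ok = composer.generate()
print(composer.steps)

custom = VectorLibraryComposer()
custom.generate(lambda c: c.files.update({"Readme.txt": "generated"}))
print(custom.files)
```

A derived class only overrides the steps it cares about, while the order of the steps stays fixed in the base class.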

In this way, the GMacCodeLibraryComposer class provides a general framework for generating code on a larger scale than the text composers and the AST-unparsing method can handle alone. It’s up to the software designer to select the specific components to add to this class and to decide how they interact with the data source to compose the final code.
