Design and implement a dynamically typed programming language yourself

Estimated read time 7 min read

background

In the past few years, I mainly worked on things related to “compilation optimization” (hot repair framework, bytecode hook framework, bytecode optimization, etc.). However, strictly speaking, these things have nothing to do with “compilation”, they are just familiarity. I am relatively familiar with Java bytecode and the APIs of libraries such as asm and javassist, but calling it “compilation optimization” sounds a bit fancy 😂

At that time, I also wanted to write a real “compiler”. I read some famous books such as “Dragon Book” and “Compiler Design”. I found that it seemed very difficult to even implement a lexical analyzer, so I immediately gave up. .

Later, due to business needs, a javac plug-in was developed: A simple application of the javac compiler plug-in in code migration . The plug-in itself is relatively simple to implement. When I was doing this, I took a look at the javac code and it didn’t seem too complicated. So I imitated him and wrote my own Java compiler and implemented part of the syntax, but I didn’t insist on writing it down. Part of the reason was that my implementation plan at the time could not implement mechanisms such as apt.

Although only a small part has been achieved, the benefits to me are still considerable, such as:

  1. This is the first handwritten lexical analyzer and syntax analyzer I have implemented, which can be regarded as the first step in compiler development.
  2. In addition, I also discovered a Java syntax that I had not noticed for many years: when returning a multi-dimensional array in Java, the dimensions of the array can be written on the left and right sides of the function declaration. The dimension is the sum of both sides, so the following three statements They are equivalent and both return a two-dimensional array. Although I know this is of little value, I might never have known this if I hadn’t tried to write this little compiler.
private static int[][] f() {...}
private static int[] f()[] {...}
private static int f()[][] {...}

I have been doing some stability-related things recently. Sometimes I need to use hooks to locate or fix problems. I have to look at the code of the virtual machine. Regardless of various optimizations, it does not seem to be particularly complicated in terms of interpretation and execution. It is approaching five years ago. During the vacation, I wanted to implement a dynamically typed programming language so that I could experience compiler and virtual machine development, so I came up with this project – charon

charon

The features currently implemented are as follows:

  1. Supported types: bool, long, double, string, function, class
  2. Functions can be assigned to variables, class fields, as parameters or return values ​​(method is different from function, it is not a first-class type, cannot be assigned to variables, class fields, nor can it be used as a parameter or return value of a function or method)
  3. Support common language structures, such as: if-elseif-else, while-break-continue
  4. Supports defining classes and methods. Fields of the current instance can be accessed through ‘this’ in the method.
  5. Support simple ffimechanisms for doing charonthings that cannot be done, such as printing output: __print, __println
  • For a description of the EBNF-like grammar, please refer to: grammar.txt
  • For bytecode format & bytecode instructions, please refer to: bytecode-format.txt

The specific syntax & semantics are expected to be mentioned later in the article “Grammar Introduction & Grammar Analyzer Implementation”. For example, it is similar to rust in charon: variables can be redefined, assignments are statements rather than expressions, if’s then’ is a block rather than a general statement, etc.

Code repository & build

Construct:

  1. You need to install rust first. If you have not installed it before, you can install it according to the instructions on the rust official website.
  2. Clone the code repository
  3. cd to the charon project root directory
  4. implement:cargo build --release

After completing the above steps, target/release3 executable programs will be generated in the directory:

  1. charonc: charon’s compiler will compile the code into bytecode. The file suffix is: .charonbc. The bytecode format is similar to Java bytecode.
  2. charonpcharonc: The bytecode file generated by disassembly .charonbc, the function is similar to javap
  3. charon: Virtual machine executable program, charonsource code can be passed in, or charoncbytecode generated by compilation can be passed in

code example

There is one in the project root directory examples, and there are some sample codes for reference.

The following uses charonbuilding a binary tree and performing in-order traversal as an example:

#!/usr/bin/env charon

// 调用createBinaryTree创建一棵二叉树,然后通过inOrder进行中序遍历
inOrder(createBinaryTree());

class Node {}

//              10
//            /    \
//           6      14
//         /  \    /  \
//        4    8  12  16
//
func createBinaryTree() {
    var left = Node();
    left.value = 4;

    var right = Node();
    right.value = 8;

    var l = Node();
    l.value = 6;
    l.left = left;
    l.right = right;

    var left = Node();
    left.value = 12;

    var right = Node();
    right.value = 16;

    var r = Node();
    r.value = 14;
    r.left = left;
    r.right = right;

    var root = Node();
    root.value = 10;
    root.left = l;
    root.right = r;

    return root;
}

func inOrder(root) {
    if (root.left) {
        inOrder(root.left);
    }
    __println(root.value);
    if (root.right) {
        inOrder(root.right);
    }
}

The above code example can be executed in the following ways:

  1. Directly call the virtual machine to execute:
    • charon binary-tree.charon
  2. First compiled into bytecode, and then executed by the virtual machine
    • charonc binary-tree.charon
    • charon binary-tree.charonbc
  3. charonIt is supported shebang, so you can add executable permissions to the source code file and then execute it directly.
    • chmod +x binary-tree.charon
    • ./binary-tree.charon

In addition, you can see from the above code:

  1. Functions can be used without having to declare them in advance
  2. Variables can be redefined
  3. Like many dynamic languages, class fields do not have to be defined in the class body.
  4. For unassigned class fields, you will get null when reading
  5. Values ​​like null are implicitly converted to false when used as a bool expression

Follow-up plan

  • Later, we plan to compile several related documents in order to deepen our understanding of compiler & virtual machine related technologies. It is expected that the following articles will be added:
    1. Lexical introduction & implementation of lexical analyzer
    2. Syntax introduction & implementation of syntax analyzer
    3. Semantic analysis, bytecode design & bytecode generation (charon is based on stack, and will also be briefly compared with register-based implementation)
    4. The implementation of the virtual machine (such as the layout of the stack frame, the support for method ‘this’ is not the same as Java, and the support of ffi, which will be briefly discussed later)
    5. Error handling in compilers & virtual machines (line number and column number information processing during compilation, discussion on error recovery, stack overflow detection at runtime, etc.)
  • Although it is a toy project, he still learned some things while developing it: For example, I always thought I was familiar with Java bytecode, but is information like max_locals necessary, and how to use the virtual machine? How does the virtual machine lay out the stack frame when calling methods and external methods? Wait, I didn’t really know before this. However, this language is still too small, and I won’t really use it, so even if gc and other mechanisms are added, there may not be a test case to test it. If I have time later, I may consider learning to write a simple jvm😂

You May Also Like

More From Author

+ There are no comments

Add yours