PROSE Framework

Grammar syntax

This section describes the syntax of PROSE v1 DSL grammars.

Note: this syntax is volatile and subject to breaking changes in the future. We will notify you about each breaking change in accordance with SemVer.

In the general syntax descriptions below, angle brackets <> denote a placeholder to be filled with a concrete value, and curly braces {} denote an optional component (unless specified otherwise).

Basic structure

The basic structure of a *.grammar file is as follows:

<Namespace usings>
<Semantics usings>
<Learner usings>

language <Name>;
<Feature declarations>

<Nonterminal symbols and rules>
<Terminal symbols>

The file first specifies some metadata about the DSL, and then describes the DSL as a grammar. A PROSE language is represented as a context-free grammar, i.e., as a set of rules in which each symbol on the left-hand side is bound to a set of possible expansions of this symbol on the right-hand side.
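As a rough illustration, the template above might be filled in as the small grammar file sketched below. The holder names are borrowed from the examples in this section. The language name TestLanguage, the operators Plus and Minus, and the symbols expression, v, and k are placeholders chosen for illustration, and the sketch assumes that the semantics holder defines matching Plus and Minus operators.

```
using semantics TestLanguage.Semantics.Holder;
using learners TestLanguage.Learning.LogicHolder;

language TestLanguage;

// Nonterminal symbol and its rules (operator names are hypothetical)
@start int expression := Plus(v, k) | Minus(v, k);

// Terminal symbols: one input variable and one literal constant
@input int v;
int k;
```

Feature declarations are omitted here for brevity. When present, they follow the language statement.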

Usings

Namespace usings

These statements are identical to the corresponding C# forms. They import a namespace into the current scope.

using System.Text.RegularExpressions;

Semantics usings

These statements specify semantics holders – static classes that contain implementations of the grammar’s operators. There may be more than one semantics holder, as long as their members do not conflict with each other.

using semantics TestLanguage.Semantics.Holder;

Learner usings

These statements specify learning logic holders – non-static classes that inherit DomainLearningLogic and contain domain-specific helper learning logic such as witness functions and value generators. There may be more than one learning logic holder.

using learners TestLanguage.Learning.LogicHolder;

Language name

The language name may be any valid fully qualified C# type identifier, that is, a dot-separated string in which each element is a valid C# identifier.
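For instance, either of the following statements would be an acceptable language declaration under this rule. Both names are hypothetical, and a grammar file contains only one such statement.

```
// A single identifier
language TestLanguage;

// alternatively, a dot-separated name:
language TestLanguage.Arithmetic;
```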

Features

A feature is a computed property on an AST in the language. Each such property has a name, type, and associated feature calculator functions that compute the value of this property on each given AST. A feature may be declared as complete, which requires it to be defined on every possible AST kind in the language. By default, a feature is not complete. Syntax:

{@complete} feature <Type> <Name> = <Implementation class 1>, …, <Implementation class N>;

Here @complete is an optional completeness annotation, and the comma-separated identifiers on the right specify one or more classes that inherit Feature and provide implementations of the feature calculators.

Note that one feature may be implemented in several ways (e.g. the program’s RankingScore may be computed as LikelihoodScore or as PerformanceScore, depending on the requirements), so multiple implementation classes may be specified for the same feature.

A feature class does not have to be listed in the *.grammar file to interact properly with the framework. As long as it inherits Feature and holds the required feature calculator implementations, its instances may be used at runtime to compute the value of the corresponding feature on ASTs. However, listing it in the *.grammar file gives the dslc grammar compiler additional information, which lets it verify your feature calculator definitions and produce more detailed error messages. Example:

using TestLanguage.ScoreFunctions;
@complete feature double RankingScore = LikelihoodScore, PerformanceScore, ReadabilityScore;

// alternatively:
@complete feature double RankingScore = TestLanguage.ScoreFunctions.LikelihoodScore,
                                        TestLanguage.ScoreFunctions.PerformanceScore,
                                        TestLanguage.ScoreFunctions.ReadabilityScore;

feature HashSet<int> UsedConstants = TestLanguage.UsedConstantsCalculator;

Terminal rules

Each terminal symbol of the grammar is associated with its own unique terminal rule. Terminal rules specify the leaf symbols that will be replaced with literal constants or variables in the AST. For example:

A terminal rule int k; specifies a symbol k that represents a literal integer constant.
A terminal rule @input string v; specifies a variable symbol v that contains program input data at runtime.

Syntax:

{@values[<Generator member>]} {@input} <Type> <Symbol name>;

Annotations

`@input`

Denotes the input variable passed to the DSL programs. A DSL program may depend on only one input variable, but that variable may be of any type.

`@values`

A user can specify the list of possible values that a literal symbol can be set to. This is done with a @values[G] annotation, where G is a value generator – a reference to a user-defined static field, property, or method. The compiler will search for G in the provided learning logic holders, and will report an error if it does not find a type-compatible member. Example: given the following declaration of a terminal s:

using learners TestLanguage.Learners;
@values[StringGen] string s;

any of the following members in TestLanguage.Learners can serve as a generator for s:

namespace TestLanguage
{
    public class Learners : DomainLearningLogic
    {
        // The members below are alternatives: declare only ONE member named StringGen.

        // Field
        public static string[] StringGen = {"", "42", "foobar"};

        // Property (expression-bodied, or with an explicit getter)
        public static string[] StringGen => new[] {"", "42", "foobar"};
        public static string[] StringGen {
            get { return new[] {"", "42", "foobar"}; }
        }

        // Method (expression-bodied, or with a block body)
        public static string[] StringGen() => new[] {"", "42", "foobar"};
        public static string[] StringGen() {
            return new[] {"", "42", "foobar"};
        }
    }
}

Nonterminal rules

A nonterminal rule describes a possible production in a context-free grammar of the DSL. In contrast to conventional programming languages, the productions of PROSE grammars describe not the surface syntax of DSL programs, but their direct semantics as ASTs. In other words, where a conventional context-free grammar would specify something like

expression := atom '+' atom | atom '-' atom ;

the corresponding PROSE grammar specifies

expression := Plus(atom, atom) | Minus(atom, atom) ;

This snippet contains two nonterminal rules: expression := Plus(atom, atom) and expression := Minus(atom, atom). The functions Plus and Minus are operators in the grammar, that is, domain-specific functions that may be included as steps of your DSL programs. Thus, PROSE DSLs do not have a surface syntax; they directly describe a grammar of possible domain-specific program actions.

Structure

Every nonterminal rule has a head and a body. Its head is a typed nonterminal symbol on the left-hand side of the production. Its body is a sequence of free symbols on the right-hand side, which may be nonterminal or terminal (i.e. variables or constants). There are several kinds of nonterminal rules, which differ in their semantics as well as in the roles of the symbols in their bodies. Syntax:

{@start} <Type> <Symbol name> := <Rule 1> | ... | <Rule N>;

Annotations

`@start`

An optional annotation that specifies the start symbol of the grammar – that is, the root nonterminal symbol of all programs in this DSL.
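To make the head, body, and @start roles concrete, here is a minimal sketch of a two-level grammar fragment. The symbols expression and atom follow the Plus/Minus example earlier in this section. The operator Scale and the terminals v and k are hypothetical names introduced only for illustration, and all operators are assumed to be implemented in a semantics holder.

```
// Head: the typed symbol 'expression', marked as the root of every program.
// Body of each rule: the free symbols passed to the operator (here, 'atom').
@start int expression := Plus(atom, atom) | Minus(atom, atom);

// An ordinary nonterminal whose rule body consists of terminal symbols.
int atom := Scale(v, k);

// Terminal symbols referenced in the rule bodies above.
@input int v;
int k;
```

Under these assumptions, every program in the DSL is rooted at expression, for example Plus(Scale(v, 2), Scale(v, 3)).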