PROSE – Json Extraction

The main entry point is the Session class’s Learn() method, which returns a Program object. The Program’s key method is Run(), which executes the program on an input JSON document to obtain the extracted output. Each program also has a Schema property that defines the structure of the extracted data.

Other important methods are Program.Serialize(), which serializes a Program object to a string, and Loader.Instance.Load(), which deserializes that string back into a Program object.

To use Extraction.Json, reference the following assemblies:

Microsoft.ProgramSynthesis.Extraction.Json.dll,
Microsoft.ProgramSynthesis.Extraction.Json.Learner.dll, and
Microsoft.ProgramSynthesis.Extraction.Json.Semantics.dll.

The Sample Project illustrates our API usage.

Basic Usage

By default, Extraction.Json learns a join program, in which inner arrays are joined with the other fields. As a result, a single outer object in the input JSON can be flattened into several rows of the output table.
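
To make the flattening concrete, here is a minimal hand-written sketch of what joining an inner array with an outer field means. It does not use Extraction.Json; it relies only on System.Text.Json, and the class name FlattenSketch and the field names name and phones are hypothetical, chosen for illustration. A learned program infers the structure from the input rather than hard-coding it as done here.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class FlattenSketch
{
    // Joins each element of the inner array "phones" with the outer field
    // "name", producing one output row per array element.
    // Illustration only: the field names are hypothetical and hard-coded.
    public static List<(string Name, string Phone)> Flatten(string json)
    {
        var rows = new List<(string Name, string Phone)>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement person in doc.RootElement.EnumerateArray())
        {
            string name = person.GetProperty("name").GetString() ?? "";
            foreach (JsonElement phone in person.GetProperty("phones").EnumerateArray())
            {
                rows.Add((name, phone.GetString() ?? ""));
            }
        }
        return rows;
    }
}
```

For the input [{"name":"Ann","phones":["111","222"]},{"name":"Bob","phones":["333"]}], Flatten returns three rows: (Ann, 111), (Ann, 222), and (Bob, 333). The single outer object for Ann becomes two rows because its inner array has two elements.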

The following snippet illustrates a learning session that generates such a program from the input jsonText:

string jsonText = ... 
var session = new Session(); 
session.Inputs.Add(jsonText); 
Program program = session.Learn(); 

Clients may add a NoJoinInnerArrays constraint to the session to learn a non-join program, as illustrated in the following snippet:

var noJoinSession = new Session();
// Add the input to the new session (not to the earlier session)
noJoinSession.Inputs.Add(jsonText);
noJoinSession.Constraints.Add(new NoJoinInnerArrays());
Program noJoinProgram = noJoinSession.Learn();

Serializing/Deserializing a Program

The Extraction.Json.Program.Serialize() method serializes the learned program to a string. The Extraction.Json.Loader.Instance.Load() method deserializes that string back into a program.

// program was learned previously
string progText = program.Serialize();
Program loadProg = Loader.Instance.Load(progText);

Executing a Program

Given an input JSON document, a program can generate a hierarchical tree or a flattened table. If the program is a join program, the table is flattened using either outer join (the default) or inner join semantics.

Generating a Tree

Use the Run() method to obtain a hierarchical tree of the input document.

// program was learned previously
ITreeOutput tree = program.Run(jsonText);

Generating a Table

Supply the desired join semantics to the RunTable() method as follows:

// program was learned previously
IEnumerable outerJoinTable = program.RunTable(jsonText, TreeToTableSemantics.OuterJoin);
IEnumerable innerJoinTable = program.RunTable(jsonText, TreeToTableSemantics.InnerJoin);
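
The difference between the two semantics shows up when an inner array is empty. The sketch below hand-codes both behaviors with System.Text.Json, under the assumption that they follow the conventional relational meaning: an outer join keeps the outer object and leaves the array column null, while an inner join drops the outer object. How RunTable() actually treats empty or missing arrays is not stated here, so treat this as an illustration of the concept rather than of the library. The enum SketchJoin, the class JoinSemanticsSketch, and the field names name and phones are all hypothetical.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for TreeToTableSemantics; not part of the PROSE API.
public enum SketchJoin { Outer, Inner }

public static class JoinSemanticsSketch
{
    // Flattens [{"name": ..., "phones": [...]}, ...] into (Name, Phone) rows.
    public static List<(string Name, string? Phone)> ToTable(string json, SketchJoin join)
    {
        var rows = new List<(string Name, string? Phone)>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement person in doc.RootElement.EnumerateArray())
        {
            string name = person.GetProperty("name").GetString() ?? "";
            JsonElement phones = person.GetProperty("phones");
            if (phones.GetArrayLength() == 0)
            {
                // Assumed behavior: an outer join keeps the outer object with
                // a null cell; an inner join emits no row for it.
                if (join == SketchJoin.Outer) rows.Add((name, null));
                continue;
            }
            foreach (JsonElement phone in phones.EnumerateArray())
            {
                rows.Add((name, phone.GetString()));
            }
        }
        return rows;
    }
}
```

For the input [{"name":"Ann","phones":["111"]},{"name":"Bob","phones":[]}], the outer variant returns two rows, (Ann, 111) and (Bob, null), while the inner variant returns only (Ann, 111).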