Information Sciences Institute Aligner

This is an algorithmic aligner based on the paper Aligning English Strings with Abstract Meaning Representation Graphs. The code is a python-ized version of the ISI aligner code, which consists of a collection of bash scripts and a few C++ files. A copy of that project can be found here.

Due to the complexity of the alignment process and the underlying mgiza aligner, the code is not set up to be used as part of the library for inference. If you are doing simple inference, it's recommended that you use the FAA aligner. If you want to use this code, expect to modify the scripts and customize them for your use case, as they are not designed for ease of use.

The ISI alignment code is included here because it is the aligner that has been commonly used with AMR and, I believe, the aligner used to create the alignments for LDC2020T02. It also performs slightly better than the FAA aligner (see performance at the bottom).

To use the code, you will need to install and compile mgiza.

Note that the main alignment process is a bash script, so this will not run under Windows, though it could be converted if someone wanted to put in the effort.

Usage

There are no library calls associated with the aligner. All of the code is in the scripts directory under the ISI Aligner. These scripts are run in order to perform the alignment and scoring process, as sketched below. You will need a copy of LDC2014T12 to run the code, though it could easily be modified to run on other versions. For scoring, the original AMR 1.0 corpus is required because the gold alignments are tied to those graphs.
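
As a rough illustration of that workflow, the sketch below simply runs the scripts in sequence with subprocess. Apart from Gather_LDC.py and Run_Aligner.sh, which are named in this README, the script names and their order are placeholders; check the scripts directory for the actual files.

```python
# Hypothetical driver that runs the ISI aligner scripts in order.
# Only Gather_LDC.py and Run_Aligner.sh are named in this README; the
# commented entry is a placeholder for any additional post-processing
# or scoring scripts in the scripts directory.
import subprocess

steps = [
    ['python3', 'Gather_LDC.py'],        # extract sentences / graph strings from LDC2014T12
    ['bash',    'Run_Aligner.sh'],       # run mgiza and produce the raw alignments
    # ['python3', '<post_process>.py'],  # placeholder: post-processing / scoring steps
]

for cmd in steps:
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)      # stop if any step fails
```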

Directories and file locations are generally set up in each script under the __main__ statement. Note that you will need to set the location of the mgiza binaries at the top of the bash script Run_Aligner.sh.
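
For reference, the configuration block at the bottom of each Python script typically looks something like the following. The variable names and paths here are illustrative assumptions, not the actual values in the scripts; edit the real __main__ blocks to match your local setup.

```python
# Illustrative only -- the real scripts define their own paths under __main__.
# All names and locations below are assumptions, shown to illustrate the pattern.
import os

if __name__ == '__main__':
    ldc_dir  = '/path/to/LDC2014T12'     # location of the LDC corpus
    work_dir = 'working'                 # directory for intermediate files
    os.makedirs(work_dir, exist_ok=True)
    sents_fn    = os.path.join(work_dir, 'sents.txt')     # tokenized sentences
    gstrings_fn = os.path.join(work_dir, 'gstrings.txt')  # linearized graph strings
    # ... script-specific processing goes here ...
```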

Unlike neural net models, the mgiza aligner doesn't natively separate training and inference into two distinct steps; training and alignment happen as part of the same process. While it is possible to re-use the pretrained tables for inference, the scores generally drop a few points (possibly because training resumes on the smaller inference dataset), and the code here is not set up to do inference.

If you would like to align your own sentences and graphs, I would recommend modifying the script Gather_LDC.py so that it appends them to the sents.txt and gstrings.txt files it creates, as sketched below. The alignments can then be extracted from the end of the amr_alignment_strings.txt file after running all steps (scripts) of the process.
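
A minimal sketch of that approach follows, assuming your graphs are already linearized to single-line strings in the same format Gather_LDC.py writes out. The example data and variable names are assumptions for illustration only.

```python
# Minimal sketch: append custom sentence / graph-string pairs to the files
# produced by Gather_LDC.py so they are aligned along with the LDC data.
# 'my_pairs' and the single-line graph format are assumptions; match whatever
# Gather_LDC.py actually writes to sents.txt and gstrings.txt.
my_pairs = [
    ('The boy wants to go .', '(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))'),
]

with open('sents.txt', 'a') as sf, open('gstrings.txt', 'a') as gf:
    for sent, gstring in my_pairs:
        sf.write(sent.strip() + '\n')      # one tokenized sentence per line
        gf.write(gstring.strip() + '\n')   # one linearized graph string per line
```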

Performance

Scores of the ISI_Aligner against the gold ISI hand alignments for LDC2014T12:

Dev scores    Precision: 93.78   Recall: 80.30   F1: 86.52
Test scores   Precision: 92.05   Recall: 76.64   F1: 83.64

These scores are consistent with those reported in the original paper, within normal run-to-run variation of about 0.5 points.

These scores are obtained during training. When scoring only the test/dev sets with pre-trained parameters, the scores drop around 2-3 points.