Fast_Align Algorithm Aligner

This is an algorithmic aligner based on the paper Aligning English Strings with Abstract Meaning Representation Graphs. The code is based on the ISI aligner code; a copy of that project can be found here. The project reuses the original pre/post-processing code but replaces the mgiza app with fast_align. The original bash scripts have been converted to Python, and a new "inference" step allows pre-trained parameters to be used at run time.

To use the code you will need to download and compile the C++ code for fast_align. The build produces binaries for fast_align and atools in the same directory. Put these binaries in your PATH, or set the environment variable FABIN_DIR to the directory that contains them. The aligner and the fast_align binaries work under both Windows and Linux.
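
If you prefer not to modify your PATH, you can set FABIN_DIR from Python before constructing the aligner. A minimal sketch; the directory shown is a placeholder, use wherever you built fast_align:

    import os

    # Point FABIN_DIR at the directory containing the fast_align and atools
    # binaries (the path below is an assumed example, not a real location).
    os.environ['FABIN_DIR'] = '/path/to/fast_align/build'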

The aligner comes with pre-trained parameters, included as a tar.gz file in the project. The first time the aligner is run, it will un-tar these files into amrlib/data/model_aligner_faa/.
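
To check where the parameters were extracted, you can resolve the package-relative path mentioned above. A sketch, assuming a standard amrlib install layout:

    import os
    import amrlib

    # amrlib/data/model_aligner_faa/ relative to the installed package
    model_dir = os.path.join(os.path.dirname(amrlib.__file__), 'data', 'model_aligner_faa')
    print(model_dir, '->', 'exists' if os.path.isdir(model_dir) else 'not extracted yet')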

If you'd like to train, or just test, the aligner, see the scripts in the FAA_Aligner scripts directory. You can run these in order to create a new model and test it; each script completes in just a few seconds. Note that the scripts are set up to use LDC2014T12 (AMR-1), since that is the corpus the test hand-alignments were made from.

Usage

To use the aligner you need a list of sentences and a list of AMR graphs, both in string format.

Example aligner usage (the sample sentence and graph below are illustrative placeholders):

    from amrlib.alignments.faa_aligner import FAA_Aligner

    # Input sentences must be space-tokenized; graphs are AMR strings (sample data)
    sents = ['The boy wants to go .']
    graph_strings = ['(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))']

    inference = FAA_Aligner()
    amr_surface_aligns, alignment_strings = inference.align_sents(sents, graph_strings)
    print(alignment_strings)

The code returns the original AMR graphs with surface alignments added, plus a list of alignment strings in ISI (not JAMR) format.
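
Each ISI alignment string is a space-separated list of token-to-node pairs. A minimal parsing sketch, assuming each pair has the form tokenIndex-nodeAddress (e.g. '2-1.1'); check the exact indexing convention against your own output:

    def parse_isi_alignments(align_str):
        """Split an ISI alignment string into (token_index, node_address) pairs.
        Assumes pairs like '2-1.1': a token index and a graph node address."""
        pairs = []
        for chunk in align_str.split():
            tok, node = chunk.split('-', 1)
            pairs.append((int(tok), node))
        return pairs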

!! Note that the input sents need to be space-tokenized strings.
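
If your text is not already tokenized, any tokenizer will do as long as the tokens are re-joined with single spaces. A sketch using NLTK (an assumption; this project does not mandate a particular tokenizer):

    from nltk.tokenize import word_tokenize  # requires nltk's 'punkt' data

    raw_sents = ['The boy wants to go.']
    # Tokenize, then re-join with spaces so punctuation is split off, e.g.
    # 'The boy wants to go.' -> 'The boy wants to go .'
    sents = [' '.join(word_tokenize(s)) for s in raw_sents]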

Performance

Scores of the FAA_Aligner against the gold ISI hand alignments for LDC2014T12 **1

Dev scores    Precision: 89.30   Recall: 78.20   F1: 83.38
Test scores   Precision: 86.03   Recall: 79.00   F1: 82.37

**1 Note that these scores were obtained during training. When scoring only the test/dev sets with pre-trained parameters, the scores vary slightly (by less than 0.5 points) from those above.