This project provides code for a transition-based parsing algorithm that converts English sentences to AMR. It depends on the imitation learning code provided in the hopshackle/mr-dagger repository. The following libraries are also required on the build path (precise versions are not critical, but these are the ones we used):
- stanford-corenlp-3.3.1.jar
- stanford-corenlp-3.3.1-models.jar
- trove4j-3.0.3.jar
- scala-parser-combinators-2_11-1.0.3.jar
- edu.mit.jwi_2.3.0_jdk.jar
- stringmetric-core-0.25.3.jar
Currently all training uses AROW. Once built, the key additional runtime dependency is an installation of WordNet, with the `WNHOME` environment variable pointing to its directory so that lemmas and related information can be looked up.
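For example (a sketch; the exact path depends on where WordNet is installed on your machine):

```sh
# Point WNHOME at your local WordNet installation (this path is just an example)
export WNHOME=/usr/local/WordNet-3.0
```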
The main Scala class used for execution is RunDagger. A Java wrapper class, ScalaRunner, is provided that calls this (useful for some JVMs).
A sample execution is:

```sh
nohup java -Xms1g -Xmx48g -jar ../amr-dagger-J7-semeval.jar --dagger.output.path ./ --dagger.iterations 10 --debug false --dagger.print.interval 1 --train.data ../deft-training-dev-all.txt --validation.data ../deft-p2-amr-r1-amrs-test-all.txt --test.data ../semeval_test_final.txt --lossFunction NaivePenaltyAbs --num.cores 8 --algorithm Dagger --policy.decay 0.3 --oracleLoss true --maxActions 200 --aligner Pourdamghani --arow.smoothing 100 --WXfeatures PCKX --reentrance false --reducedActions true --punctuation false --arow.iterations 3 --preferKnown false --fileCache true --instanceThreshold 1 --threshold 0.2 --minTrainingSize 30 --maxTrainingSize 120 --startingClassifier false --wikification true --average true --previousTrainingIter 2 --rolloutLimit 10 --logTrainingStats false --brownCluster ../Brown100.txt &> C226.txt &
```
(This example is the run used for the SemEval-2016 Task 8 submission.)
The output in the specified output directory includes Smatch scores for each sentence at each iteration on the validation and training sets, plus the AMR output for each. If debug is switched on, then the key log file is `CollectInstances_debug_I.txt` for iteration I. This contains information on the RollIn trajectory taken, plus the losses calculated for each RollOut action considered.
Also output are, for each iteration:
- `SmatchScores_<type>_<n>.txt`
- `AMR_prediction_<type>_<n>.txt`
- `AMR_target_<type>_<n>.txt`

where `<n>` is the iteration number, and `<type>` can be `trng`, `val`, `test` or `instanceCollection`. The first three are for the training, validation and test sets respectively, using the trained classifier. The last comes from the trajectory-generation stage that is used to create training data.
The key F-scores are obtained by grepping the raw console output ("Training" and "Validation" are useful search terms).
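For example, using the console log from the sample run above:

```sh
grep "Validation" C226.txt   # per-iteration validation F-scores
grep "Training" C226.txt     # per-iteration training F-scores
```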
Key (non-obvious) parameters:
- `lossFunction` can be any combination of `NaiveSmatch`, `Smatch`, `Penalty` and `Abs`, suitably concatenated in that order.
- `oracleLoss true` means that `lossFunction` is ignored completely, no RollOuts take place, and simple 0-1 oracle loss is used. This is equivalent to Classic DAgger.
- `reentrance true` will switch on Phase 2 of the algorithm and consider re-entrant arcs.
- `wikification` defaults to `true`. It enables the Wikification action.
- `reducedActions` defaults to `false`. Set to `true` to consider only the actions chosen by the expert and the currently trained classifier at each step. As well as the best option from the trained classifier, any others evaluated as being within `threshold` of the best option are also included as possible options.
- `threshold` is used with `reducedActions true` to determine which actions are rolled out. Any action with a score from the current classifier within `threshold` of the best score will be rolled out (in addition to the expert action). If no classifier has yet been trained (i.e. in the first iteration), a random selection of actions is taken (bounded by `rolloutLimit`). See the example run after this list.
- `rolloutLimit` defaults to 100, and is used with `reducedActions true`. It sets an upper bound on the number of actions that generate RollOuts (the highest-scoring actions are taken until the bound is reached).
- `WXfeatures` is any concatenation of the characters `P` for parent features, `A` for action features, `C` for child features, `X` for deletion features, `W` for word features, and `S` for sibling features.
- `maxActions` is a setting to prevent RollOuts getting into infinite loops and never terminating (or finite loops that are non-productive).
- `algorithm` can be any of `Dagger`, `LOLS`, `LOLSDet`, `LIDO`, `DILO`, `DILDO`.
- `classifier` defaults to `AROW`. Other options are `PA` (Passive-Aggressive) and `PERCEPTRON`.
- `initialExpertProb` and `policy.decay` provide the parameterisation for expert involvement. The one unavoidable override to these is that the expert is always used on the RollOuts with probability 1.0 in the first iteration.
- `debug true` will output details of RollIn trajectories and the losses calculated for each RollOut option considered, in a set of files starting `CollectInstances_debug_`.
- `detail` defaults to `false`. If set to `true` along with `debug true`, additional excruciating detail is switched on, covering the full RollOut trajectories during instance collection.
- `prelimOracleRun` will execute a single pass over the training data, using the expert to train a policy by simple imitation learning with 0-1 oracle loss. This is then used as the starting policy for the main run.
- `startingClassifier` defaults to `false`. If `true` then, rather than starting with an empty classifier in the first iteration, one is initialised from two files placed in the location specified by `dagger.output.path`: a StartingClassifier.txt and a FeatureIndex.txt.
- `fileCache` set to `true` will store training instances on disk rather than in RAM. Recommended.
- `expertHorizon` defaults to `false`. If set to `true`, the `expertAfter` and `expertHorizonInc` options are enabled.
- `expertAfter` will RollOut with the specified algorithm for the specified number of steps, and then use the expert 100% of the time.
- `expertHorizonInc` will increment the `expertAfter` parameter by the specified amount at each iteration after the first.
- `minTrainingSize` defaults to 100. All training examples with this number of AMR nodes or fewer will be included in the first iteration of training.
- `trainingSizeInc` defaults to 10. This number is added to `minTrainingSize` after each iteration to determine the maximum size of training sentences to include, measured in number of AMR nodes.
- `maxTrainingSize` defaults to 100. This sets an upper ceiling on the size of training sentences to use.
- `instanceThreshold` specifies the alpha-bound to use in training. After this number of mis-classifications, an instance will be discarded from further training.
- `coachingLambda` multiplies the score from the learned classifier for an action; the result is added to the loss calculated during training.
- `logTrainingStats` defaults to `true`, and will run the trained classifier over the full training set at each iteration. To save time with large training sets, set this to `false`.
- `dropClassifier` defaults to `false`. If set to `true` then the trained classifier is dropped after each iteration, so the next one is retrained from scratch. The default is to start from the current classifier and train it further with all collected data.
- `previousTrainingIter` defaults to 100. This uses only training instances from the specified number of previous DAgger iterations. Setting this to 0 will therefore use only the current iteration's data. This option only works with `fileCache true`.
- `actionsPerSize` defaults to 5. This overrides `maxActions` on the training data to speed up early iterations. With the default setting, the number of AMR nodes is multiplied by 5, and this is used in place of `maxActions` if it is smaller.
- `samples` defaults to 1, and specifies the number of RollOuts to generate for each action. This is relevant for v-DAgger, due to the stochasticity of the RollOuts. The final score is taken as an average over all samples.
- `num.cores` defaults to 1, and sets the number of cores that instance collection can be split over. Trajectory generation / data collection benefits from this, but the batch classifier training is always single-core.
- `random.seed` defaults to 1. Any other random seed can be set.
- `rare.feat.count` defaults to 0. If set higher, any features with this many or fewer occurrences in the data will not be included in training.
- `shuffle` defaults to `false`. If switched on, all instances will be shuffled before training takes place. Do not use with `fileCache true`, as this will load everything into memory and explodinate your machine.
- `preferKnown` defaults to `false`. If set to `true`, the expert will preferentially use `VERB`, `LEMMA` and `WORD` parameters for naming nodes. This ensures there is training data that can generalise to unseen AMR concepts that happen to match English words. If set to `false`, the expert will always use the full AMR concept name.
- `brownCluster` specifies the file that defines the Brown clusters to be used in features. If this option is not provided, Brown cluster features are switched off.
- `aligner` defaults to `JAMR`, which uses the alignment code from the original JAMR system. Other values are `improved`, which uses another set of heuristics (these work OK on newswire data, but are dreadful on other data; see code for detail), and `Pourdamghani`, which uses the Pourdamghani alignment files included in the LDC corpus distribution. In this last case, the relevant file needs to be present as a `.txt_tok` file whose name matches that of the training file.
- `nameConstraints` defaults to `false`. If set, this applies some AMR domain knowledge that (mostly) constrains the transition system to give `name` nodes only `opN` outgoing AMR relations, and ensures that all children of `name` nodes are leaves in the AMR graph, in string format.
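As a minimal sketch of a run using the reduced-actions RollOut options described above (file paths, iteration counts and memory settings are illustrative placeholders, not a tested configuration; all flags are documented above or in the sample run):

```sh
# Illustrative sketch only: paths and values are placeholders.
java -Xms1g -Xmx8g -jar amr-dagger-J7-semeval.jar \
  --dagger.output.path ./out/ \
  --dagger.iterations 5 \
  --train.data train.txt \
  --validation.data dev.txt \
  --algorithm Dagger \
  --lossFunction NaivePenaltyAbs \
  --reducedActions true --threshold 0.2 --rolloutLimit 10 \
  --fileCache true --num.cores 4
```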
Output files that define the classifier are:
- `FinalClassifier.txt`
- `FeatureIndex.txt`
- `Classifier_<n>.txt`, where `<n>` is the iteration number.
Once saved, a classifier can be run over new data using RunClassifier.scala, or the Java wrapper ClassifierRunner.java. These take the same parameter settings as above, although most are not used. The key ones are:
- `dagger.output.path` as above
- `train.data` needs to specify the original file of training data used to train the classifier
- `test.data` should be a semicolon-delimited list of AMR test files to be processed
- `aligner`, `WXfeatures`, `reentrance`, `wikification`, `brownCluster`, `preferKnown` need to have the same values as used to train the original classifier (unfortunately the classifier file does not currently record these)
- `featureIndex` specifies the location of the FeatureIndex.txt file
- `classifier` specifies the location of the Classifier.txt file
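An illustrative sketch of such a run follows; the flag values and file paths here are placeholders, and the assumption that ClassifierRunner can be invoked via `-cp` against the same jar is mine, so adjust to your own build:

```sh
# Hypothetical invocation; all file paths are placeholders.
java -cp amr-dagger-J7-semeval.jar ClassifierRunner \
  --dagger.output.path ./out/ \
  --train.data deft-training-dev-all.txt \
  --test.data "fileA.txt;fileB.txt" \
  --aligner Pourdamghani --WXfeatures PCKX --reentrance false \
  --wikification true --brownCluster Brown100.txt --preferKnown false \
  --featureIndex ./out/FeatureIndex.txt \
  --classifier ./out/FinalClassifier.txt
```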
This should then generate `AMR_prediction_base_<n>.txt` for each file specified in `test.data`, where `<n>` is simply the order in which the files were listed.