1. Extracted the phrase from a given tweet that best exemplifies the provided sentiment.
2. Added two heads on top of the Transformer models to separately predict the
start and end indices of the selected phrase, instead of directly predicting
a span, then used Grad-CAM to visualize whether the predictions make sense.
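A minimal sketch of how such start/end heads could be decoded at inference time (function and variable names are illustrative, not from the original project): each head emits one logit per token, and the selected phrase is the (start, end) pair with start ≤ end that maximizes the summed logits.

```python
def decode_span(start_logits, end_logits):
    """Return the (start, end) pair maximizing start_logits[s] + end_logits[e], s <= e."""
    best_score, best_span = float("-inf"), (0, 0)
    for s, s_logit in enumerate(start_logits):
        for e in range(s, len(end_logits)):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best_span = score, (s, e)
    return best_span

# Toy example: logits favor the span covering "love this movie".
tokens = ["i", "really", "love", "this", "movie"]
start, end = decode_span([0.1, 0.2, 3.0, 0.1, 0.0],
                         [0.0, 0.1, 0.5, 0.3, 2.5])
print(tokens[start:end + 1])  # -> ['love', 'this', 'movie']
```

Predicting the two indices with independent heads keeps each output a simple per-token classification, while the start ≤ end constraint is enforced only at decoding time.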
3. Implemented LSTM, BERT, RoBERTa, ALBERT, and XLNet to increase the
diversity of the models' architectures.
4. Used sequence bucketing to accelerate training by dynamically padding each batch to the maximum sequence
length occurring in that batch.