Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

We propose a stochastic version of Anderson mixing with theoretical guarantees and promising results in training neural networks.

Segment, Mask, and Predict: Augmenting Chinese Word Segmentation with Self-Supervision

We propose a self-supervised CWS approach with a straightforward and effective architecture, which outperforms previous methods on 9 different CWS datasets.

Self-Supervised Quality Estimation for Machine Translation

We propose a simple self-supervised method for quality estimation, which outperforms several previous unsupervised methods.

TRICE: Gradual Finetuning for Multi-Source Sequence Generation

We propose TRICE, a task-agnostic Transferring fRamework for multI-sourCe sEquence generation. The transfer process is divided into two stages of gradual finetuning, and the framework achieves state-of-the-art results on several challenging tasks.