Towards Robust Machine Translation Evaluation with Neural Metrics