Dear all,
Please find below a summary of an available PhD position in computer science/artificial intelligence/NLP in Nancy.
Best regards,
—
Christophe Cerisara
CR CNRS, Synalp team
PhD position available at the LORIA laboratory in Nancy, France, in the Synalp team, on deep learning for NLP; starting as soon as possible.
Title: Weakly supervised deep learning for natural language processing
This thesis will study and propose novel weakly-supervised deep learning models and training methods and their application to Natural Language Processing (NLP) tasks. The focus of the thesis will be
on weak supervision, i.e., on designing novel training approaches that can capture generic and transferable information from raw data sources. This challenge is one of the main
bottlenecks of current and future deep learning methods, as manually annotated datasets are always too scarce and quickly become outdated, no longer representative of contemporary data after even a
short amount of time. Most successful deep learning models hence rely on one of the standard approaches to compensate for this lack of data: data augmentation and transfer learning in image
processing, training generic representations based on embeddings in NLP, or more generally building generative models like Variational Autoencoders and Generative Adversarial Networks that give access to
relatively generic models of data. Several other classical approaches complete this list, including unsupervised and semi-supervised training, multi-task learning, co-training, few-shot learning, dual
learning, etc.
Amongst this variety of methods, the thesis will focus on designing models that extract generic information from sparsely annotated language data. A potentially promising direction of research will be
to adapt a novel unsupervised approximation of the classifier risk to achieve few-shot training of deep neural networks. Another favored approach will be to design more complex embedding
models that include very-long-term memory to build up deeper contextual reading models. These models will be applied and evaluated on standard NLP tasks as well as on anomaly detection in large
unlabeled language data streams.
Contact: Christophe Cerisara, cerisara@loria.fr