Reconstructing large-scale hierarchical structures

Team and supervisors
Department / Team: 
Team Web Site: 
http://www-druid.irisa.fr/
PhD Director
David Gross-Amblard
Co-director(s), co-supervisor(s)
Zoltan Miklos
Mickaël Foursov
Contact(s)
Name: Zoltan Miklos
Email address: zoltan.miklos@irisa.fr
Phone number: 0299842254
PhD subject
Abstract

Many of the networks that one can identify in connection with human societies show a hierarchical structure [1,2]. For example, scientific terms can be naturally organized in hierarchies according to disciplines and their sub-domains. Reconstructing these hierarchies from the underlying data and understanding their properties can lead to new insights, and the hierarchical structures themselves have a number of applications. For example, image classification can give more robust results when the classification tasks are related to large-scale semantic taxonomies [3]. Understanding the properties of these hierarchical structures is also of high interest [4].

Hierarchy reconstruction is a challenging problem. While tree structures are widely used, in several application domains the hierarchies are better represented as directed acyclic graphs (which correspond to overlapping implicit structures) than as trees. In particular, if we wish to understand the organizational properties, restricting the structure to a tree yields only approximate observations.
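To make the difference between trees and directed acyclic graphs concrete, here is a minimal sketch in Python. The class name `HierarchyDAG`, its methods, and the discipline labels in the usage note are hypothetical illustrations, not part of the thesis subject; the sketch only shows that a node may have several parents while cycles are rejected.

```python
from collections import defaultdict


class HierarchyDAG:
    """Hypothetical hierarchy in which a node may have several parents."""

    def __init__(self):
        # Maps each node to the set of its direct parents.
        self.parents = defaultdict(set)

    def ancestors(self, node):
        """Return all nodes reachable by following parent links from `node`."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_edge(self, child, parent):
        """Declare `parent` as a direct parent of `child`, refusing cycles."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"edge {child!r} -> {parent!r} would create a cycle")
        self.parents[child].add(parent)

    def is_tree_like(self):
        """True if no node has more than one direct parent."""
        return all(len(ps) <= 1 for ps in self.parents.values())


dag = HierarchyDAG()
dag.add_edge("biology", "science")
dag.add_edge("computer science", "science")
# An interdisciplinary term sits under two parents, which a tree cannot express.
dag.add_edge("bioinformatics", "biology")
dag.add_edge("bioinformatics", "computer science")
```

In this toy example `dag.is_tree_like()` is false, because "bioinformatics" belongs to two sub-domains at once; forcing it under a single parent would lose one of the two relations.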

The candidate should develop techniques for reconstructing large-scale hierarchical structures. The thesis will focus on some of the following aspects.

  • Analyze the application-specific requirements for hierarchical structures (i.e. whether one should consider trees or some other hierarchical structure, potentially with some additional constraints) and define specific quality metrics.
  • Design appropriate data structures and methods to store hierarchical structures. We need a specific data structure that enables querying specific parts of the large-scale hierarchy and also realizing a zoomable visualization.
  • Embeddings in hyperbolic space [5,6] have been used successfully for analyzing hierarchical structures. While hyperbolic embeddings perform well in a general setting, they might need task-specific adaptations.
  • Recent breakthroughs in NLP research on word embeddings could offer useful tools for our work. In particular, ELMo [7] and BERT [8] can offer substantially improved, context-dependent embeddings that also mitigate polysemy-related problems. Besides being a direct tool, they also offer a methodological approach for constructing hierarchies.
  • Our goal is to reconstruct the hierarchical structure of scientific domains, exploiting a collection of scientific articles. This is a work complementary to the ANR EPIQUE project where our group collaborates with experts in social sciences (philosophy of science) and in complex systems.
  • We would also like to reconstruct hierarchical structures in other domains. In particular, our results show that hierarchical skill models can be exploited to improve the quality of task assignment in crowdsourcing [9], but constructing such skill hierarchies remains challenging.
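As an illustration of the hyperbolic embeddings mentioned in the list above, the following sketch computes the distance used in the Poincaré ball model of [5]: d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The function name `poincare_distance` and the sample points are assumptions made for illustration; this is only the distance function, not a full embedding method.

```python
import math


def poincare_distance(u, v):
    """Distance between two points of the open unit ball (Poincare ball model).

    u and v are sequences of floats of equal length with Euclidean norm < 1.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two pairs with similar geometry, one near the origin and one near the boundary.
near_origin = poincare_distance((0.1, 0.0), (-0.1, 0.0))
near_boundary = poincare_distance((0.9, 0.0), (-0.9, 0.0))
```

Distances grow rapidly towards the boundary of the ball, so `near_boundary` is much larger than `near_origin`. This is the property that lets general concepts sit near the origin and many specific concepts spread out near the boundary.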

 

Bibliography

[1] Anna Zafeiris, Tamás Vicsek. Why We Live in Hierarchies? Springer, 2018.

[2] Aaron Clauset, Cristopher Moore, Mark E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98–101, 2008.

[3] Yangqing Jia, Trevor Darrell. Latent task adaptation with large-scale hierarchies. ICCV 2013, pp. 2080–2087, 2013.

[4] Gergely Palla, Gergely Tibély, Enys Mones, Péter Pollner, Tamás Vicsek. Hierarchical networks of scientific journals. Palgrave Communications, 1:15016, 2015.

[5] Maximilian Nickel, Douwe Kiela. Poincaré embeddings for learning hierarchical representations. NIPS, 2017.

[6] Frederic Sala, Christopher De Sa, Albert Gu, Christopher Ré. Representation tradeoffs for hyperbolic embeddings. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4460–4469, 2018.

[7] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. Deep contextualized word representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, pp. 2227–2237, 2018.

[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, 2018.

[9] Panagiotis Mavridis, David Gross-Amblard, Zoltan Miklos. Using hierarchical skills for optimized task assignment in knowledge-intensive crowdsourcing. 25th International World Wide Web Conference (WWW 2016), 2016.

Work start date: 
September 2019
Keywords: 
knowledge extraction, text mining, hierarchical models, hyperbolic embeddings, deep learning
Place: 
IRISA - Campus universitaire de Beaulieu, Rennes