Representation of uncertainty and imprecision between clusters with belief functions

Defense type
Starting date
End date
IRISA Rennes
Zuowei Zhang (DRUID)

Clustering  aims  to  divide  massive  data  without  any  prior  information  into  groups with similar data structures or physical structures.  To derive such a proposal, several different clustering strategies have been proposed, such as  partition  based  methods, hierarchical based methods, distribution based methods, spectral based methods, den- sity based methods, grid based methods, machine learning based methods, and neural networks based methods.  But most of the  methods cannot characterize the uncertainty and imprecision between clusters at the same time.  In recent years, evidential clustering (EC),  based on the concept of credal  partition,  has  received a lot of  attention  for its  ability  to  deal  with  this  problem.   It  inherits the  advantages  of  the  theory  of  be- lief functions (TBF) in reasoning about uncertain and imprecise knowledge. However, since EC is still in the development stage, there are still issues such as unclear under-lying  concepts, high computational complexity, and inability to

detect imbalanced or arbitrary clusters, limiting the applications of EC.

In this thesis, we work on proposing some alternative schemes to solve these issues. Specifically, four works are brought forward to handle them one by one. We start from a systematic study of EC. In this work, we present the concepts and definitions of data (inputs), methods (models), and clusters (results) of EC, as well as that of the other types (i.e.  hard/fuzzy/possibilistic ones), based on different theories (i.e.   probability  theory, fuzzy  sets  theory,  possibility  theory,  and  the  theory  of  belief functions). This is because EC is regarded as the evidential version of hard, fuzzy, and possibilistic clustering under the TBF. These concepts and definitions explain why EC can better characterize the uncertainty and imprecision between clusters. Moreover, we also study EC from the seminal to state-of-the-art methods in the context of data-based inputs,  resulting  in  a  coherent  and  comprehensive survey  to  analyze  these  methods. Based  on  the  analysis  of  some  representative  techniques from  different  perspectives (e.g.  center, measure type, complexity), we provide a guiding scheme to help users to choose appropriate evidential methods in their cases.

Afterward,  we  introduce  a  dynamic  evidential  clustering  (DEC)  algorithm  for  the issue of the high computational complexity of traditional EC when characterizing the uncertainty  and imprecision  between  clusters. In DEC, most  query objects are considered to  have precise  cluster  information,  so  an  FCM-like  objective  function  is  first employed  and minimized  to  obtain  the  support  levels  of  the  real  singletons  (specific) clusters  to which  the  query  objects  belong.   Then  the  query  object  is  initially  adap- tively assigned  to  the  outlier,  precise  or  imprecise  one  via  a  new  rule  based  on  the conflicts  between  the  different  support  levels.  Each  imprecise  object  is  finally  reassigned to the singleton clusters or related meta-cluster by partial credal redistribution with  the corresponding  dynamic  edited  framework  to  reduce  the  computational  burden.  The proposed DEC method can reduce the complexity to a level similar to that of fuzzy/possibilistic clustering, extending EC’s application in big data.

Following,  we  extend  EC  to  detect  imbalanced  clusters  by  combining  mean  shift with traditional EC under the TBF, mainly containing two characteristics.  First,  the query object is preliminarily assigned as the noise,  precise,  or imprecise one based on the notion of “belief shift”. Second, partial credal redistribution with dynamic cluster centers,  to avoid the “uniform effect” (imbalanced clusters),  is established to reassign imprecise  objects  to  the  singleton cluster  or  related  meta-cluster. Once  an  object  is assigned to a meta-cluster, it indicates that the imbalanced singleton clusters involved in the meta-cluster cannot be distinguished because this object may be located in the overlapping  or  intermediate  areas  of  these  imbalanced singleton  clusters.  By  doing this,  the  BSC  can reasonably  characterize  the  uncertainty and  imprecision  between imbalanced singleton clusters.

To avoid losing generality, we also investigate the representation of uncertainty and imprecision between  clusters  regardless  of  their  shape,  size,  and  dimensionality  based on  density peaks  and  the  TBF.  First,  we  consider  that  different  neighbors  can  provide complementary evidence supporting the object as a cluster center and redefine a distance-based density  function  to  obtain  more  robust  cluster  centers  in  the  decision graph. Then,  we present  a  new  evidential  convergence  rule  to  assign  the  remaining objects  to  different clusters. Finally,  similar  to  BSC,  the  objects  located  in  the  over- lapping  or intermediate  areas  of  different  arbitrary  singleton  clusters  are  assigned  to corresponding meta-clusters  to  characterize  the  uncertainty  and  imprecision  between these arbitrary clusters.

The effectiveness of the proposed algorithms is estimated on different artificial and natural datasets. Experiments show that our proposed algorithms effectively improve the execution efficiency of traditional EC and detect imbalanced or arbitrary clusters, and characterizes the uncertainty and imprecision between these clusters.

Composition of the jury
- ELOUEDI Zied, Institut Supérieur de Gestion de Tunis,
- DENOEUX Thierry, Université Technologique de Compiègne
- VRAIN Christelle, Université d'Orléans
- HAN Deqiang, Xi’an Jiaotong University
- MARTIN Arnaud, Université de Rennes 1 IUT de Lannion , directeur de thèse
- ZHOU Kuang, Northwestern Polytechnical University
- LIU Zhunga, Northwestern Polytechnical University