K. H. Dhanyalakshmi, Mahantesha B. N. Naika, R. S. Sajeevan, Oommen K. Mathew, K. Mohamed Shafi, Ramanathan Sowdhamini, and Karaba N. Nataraja. An approach to function annotation for proteins of unknown function (PUFs) in the transcriptome of Indian mulberry. PLOS ONE, 2016.
 Thomas D. Niehaus, Antje M.K. Thamm, Valérie de Crécy-Lagard, and Andrew D. Hanson. Proteins of unknown biochemical function: A persistent problem and a roadmap to help overcome it. Plant Physiology, 2015.
 Mingcong Wang, Maxim V. Kapralov, and Maria Anisimova. Coevolution of amino acid residues in the key photosynthetic enzyme Rubisco. BMC Evolutionary Biology, 2011.
 François Coste. Learning the language of biological sequences. In Topics in Grammatical Inference. Springer-Verlag, 2016.
 François Coste and Goulven Kerbellec. A similar fragments merging approach to learn automata on proteins. In 16th European Conference on Machine Learning, 2005.
 Goulven Kerbellec. Learning automata modelling families of protein sequences. PhD thesis, Université Rennes 1, June 2008.
 François Coste, Gaëlle Garet, and Jacques Nicolas. A bottom-up efficient algorithm learning substitutable languages from positive examples. In 12th International Conference on Grammatical Inference, 2014.
 Gaëlle Garet. Classification and characterization of enzymatic families with formal methods. PhD thesis, Université Rennes 1, December 2014.
 Faruck Morcos, Terence Hwa, José N. Onuchic, and Martin Weigt. Direct coupling analysis for protein contact prediction. Springer New York, 2014.
 Clovis Galiez. Structural fragments: comparison, predictability from the sequence and application to the identification of viral structural proteins. PhD thesis, Université Rennes 1, December 2015.