Project highlights
· The project aims to develop an innovative AI-based method for the automatic identification of herbarium specimens.
· The developed system has the potential to speed up taxonomic progress on a mega-diverse plant group of the Southeast Asian rainforests.
· This method could then be applied across other plant groups, helping to document the world’s plant diversity including the identification of new species.
Overview
The United Nations Sustainable Development Goal 15 ‘Life on Land’ (UN, 2022) strives to protect terrestrial ecosystems and halt biodiversity loss. Documenting the world’s plant diversity through taxonomic publications is key to its conservation: without understanding which species are present, it is not possible to protect them or to evaluate their potential significance (Cheek et al., 2020). The worldwide network of over 3000 herbaria preserves millions of specimens as records of plant diversity (Heberling et al., 2019). Specimens already held in herbaria represent more than 50% of yet-to-be-described species (Bebber et al., 2010). However, accurately identifying these specimens is time-consuming because of the large volume of specimens, the challenges posed by taxonomically difficult groups and a decreasing number of experts. Increasing the speed at which taxonomic outputs are produced is key to acting against the threats to the world’s habitats.
By developing artificial intelligence methods to automatically identify specimens, this project aims to accelerate taxonomic efforts in the genus Cyrtandra (African violet family), a mega-diverse, poorly known group that is common in Southeast Asian rainforests (Atkins et al., 2021). More specifically, it will develop a hierarchical framework consisting of a cascade network, which addresses classification at different taxonomic levels, and a meta-learning strategy, which handles the extreme case where only one sample is available. The cascade networks can make use of knowledge of species, genus, family and higher taxonomic levels to improve identification accuracy. Figure 1 shows an example of a herbarium specimen to be classified. Specimen data at the species level is often imbalanced, i.e. one class label might have just one observation while another has a very large number of observations. Directly training deep learning models on such datasets tends to result in overfitting. To overcome this challenge, a meta-learning strategy (Snell et al., 2017) will be explored to improve the accuracy of species-level recognition.
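As an illustration of the meta-learning idea cited above, the following is a minimal sketch of the classification step in a prototypical network (Snell et al., 2017): each class is represented by the mean of its support embeddings, and a query is assigned probabilities from a softmax over negative squared Euclidean distances to those prototypes. The function names, the class labels and the toy 2-D embeddings are assumptions made for illustration; in practice the embeddings would come from a trained network, and this is not the project's actual implementation.

```python
import math


def prototype(embeddings):
    """Return the element-wise mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, support):
    """Map each label to a probability for the query embedding.

    support: dict of label -> list of embedding vectors (one or more per label).
    """
    prototypes = {label: prototype(vecs) for label, vecs in support.items()}
    # A closer prototype gives a higher (less negative) logit.
    logits = {label: -squared_distance(query, p) for label, p in prototypes.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical, hand-made embeddings: one class has a single specimen only.
support = {
    "species_a": [[0.0, 0.0]],
    "species_b": [[4.0, 4.0], [6.0, 4.0], [5.0, 6.0]],
}
probs = classify([1.0, 0.5], support)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because a prototype can be computed from a single support example, a class with only one specimen can still be compared against a new image without retraining a classifier head for it.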
This project has the following specific objectives:
- Develop a cascade multi-label deep architecture which can take the prior knowledge of a given class hierarchy and key information of herbaria into account when performing identification at different taxon levels.
- Explore effective meta-learning and fine-tuning methods to improve identification performance on taxonomically imbalanced datasets.
- Build an easy-to-use software system to classify herbarium images into different labels with confidence values and speed up the discovery of new species.
Figure 1: An example of a herbarium specimen of a Cyrtandra species.
CENTA Flagship
This is a CENTA Flagship Project
Host
Loughborough University
Theme
- Climate and Environmental Sustainability
- Organisms and Ecosystems
Supervisors
Project investigator
Haibin Cai ([email protected])
Co-investigators
Stephanos Theodossiades ([email protected])
Gemma Bramley, Royal Botanic Gardens Kew ([email protected])
Hannah Atkins, Royal Botanic Garden Edinburgh ([email protected])
How to apply
- Each host has a slightly different application process. Find out how to apply for this studentship.
- All applications must include the CENTA application form. Choose your application route.
Methodology
Herbarium specimen recognition is very challenging because different species can be morphologically very similar and only a limited number of data samples is available. Information on genus, family and higher taxonomic levels is useful to human experts during identification. However, most state-of-the-art deep learning models do not make use of this prior knowledge (Hussein et al., 2022). Because a single model has difficulty recognizing the exact species, this project proposes a cascade of networks that uses class information at different levels to improve recognition performance. To deal with the limited-sample problem in species-level recognition, we will use meta-learning strategies to equip the final model with the ability to calculate the similarity between a given sample image and a new image.
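One way to picture how a cascade could use higher-level class information is to combine a genus-level prediction with a species-level prediction made within each genus, so that P(species) = P(genus) × P(species | genus). The sketch below works under that assumption; the scores, the labels and the function name `cascade_predict` are hypothetical and stand in for the outputs of trained networks, and the project's actual architecture may differ.

```python
import math


def softmax(scores):
    """Turn a dict of label -> raw score into a dict of label -> probability."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - top) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def cascade_predict(genus_scores, species_scores, species_to_genus):
    """Combine two levels: P(species) = P(genus) * P(species | genus)."""
    genus_probs = softmax(genus_scores)
    joint = {}
    for genus, p_genus in genus_probs.items():
        # Normalise species scores only among the species of this genus.
        members = {
            species: score
            for species, score in species_scores.items()
            if species_to_genus[species] == genus
        }
        if not members:
            continue
        for species, p_conditional in softmax(members).items():
            joint[species] = p_genus * p_conditional
    return joint


# Hypothetical raw scores from a genus-level and a species-level model.
genus_scores = {"Cyrtandra": 2.0, "other_genus": 0.0}
species_scores = {"Cyrtandra sp. A": 1.0, "Cyrtandra sp. B": 0.0, "other sp. C": 3.0}
species_to_genus = {
    "Cyrtandra sp. A": "Cyrtandra",
    "Cyrtandra sp. B": "Cyrtandra",
    "other sp. C": "other_genus",
}

joint = cascade_predict(genus_scores, species_scores, species_to_genus)
best = max(joint, key=joint.get)
flat_best = max(species_scores, key=species_scores.get)
print(best, flat_best)
```

In this toy case the species-level scores alone favour "other sp. C", while the cascade selects "Cyrtandra sp. A" because the genus-level evidence points to Cyrtandra. This is the kind of correction that hierarchy-aware prediction is meant to provide.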
Training and skills
Students will be awarded CENTA2 Training Credits (CTCs) for participation in CENTA2-provided and ‘free choice’ external training. One CTC equates to a half-day session, and students must accrue 100 CTCs across the three years of their PhD.
The PhD candidate will have the opportunity to work collaboratively with the Loughborough AI research team in Computer Science for three years. They will have full access to specialist resources, including laboratories and High-Performance Computing facilities at Loughborough University, and to the expertise available within the Royal Botanic Gardens Kew team. The research will also provide opportunities to attend academic conferences, summer schools and other training courses to improve technical skills. The candidate will master advanced deep learning techniques and have excellent career prospects on successful completion of the PhD.
Partners and collaboration
Co-supervision will be provided by Dr Gemma Bramley (Royal Botanic Gardens Kew) and Dr Hannah Atkins (Royal Botanic Garden Edinburgh [RBGE]), taxonomic experts on the study genus, Cyrtandra. Kew and RBGE house major herbaria (RBG Kew, 2022; RBGE, 2022) that between them hold the world’s largest number of herbarium collections of Cyrtandra, making them key to this project’s success. Access to the facilities at both sites will be open for the duration of the project, although the student should expect to spend most of their time in Loughborough.
Further details
For further information about this project, please contact Dr Haibin Cai ([email protected]) or Prof Stephanos Theodossiades ([email protected]). For general information about CENTA and the application process, please visit the CENTA website: https://centa.ac.uk/. For enquiries about the application process, please contact the School of Science ([email protected]).
If you wish to apply to the project, applications should include:
- A CENTA application form, downloadable from: CENTA application
- A CV with the names of at least two referees (preferably three and who can comment on your academic abilities)
- Submit your application and complete the host institution application process via: http://www.lboro.ac.uk/study/apply/research/. Please quote CENTA23_LU1 when completing the application form.
Applications to be received by the end of the day on Wednesday 11th January 2023.
Possible timeline
Year 1
During the first year, the candidate will conduct a comprehensive review of deep-learning-based herbarium specimen recognition. They will have regular meetings with supervisors to develop effective classification methods. Apart from these research activities, they will also attend several courses organized by the Loughborough Doctoral College on academic writing, presentation, etc. Their research progress will be assessed through a six-month report, a first-year report and a review meeting with assessors.
Year 2
The candidate is expected to submit a review paper to a peer-reviewed journal. They will also work on developing algorithms for data pre-processing, image augmentation and the cascade networks to achieve good recognition accuracy. The performance of the developed models will be evaluated on different herbarium datasets.
Year 3
The candidate will continue working on the research project. They will focus on further improving recognition accuracy by combining the developed models with strategies such as meta-learning and fine-tuning.
Further reading
Atkins, H.J., Bramley, G.L.C., Nishii, K., Möller, M., Olivar, J.E.C., Kartonegoro, A. & Hughes, M. 2021. Sectional polyphyly and morphological homoplasy in Southeast Asian Cyrtandra (Gesneriaceae): consequences for the taxonomy of a mega-diverse genus. Plant Systematics and Evolution 307: Article 60 https://doi.org/10.1007/s00606-021-01784-x
Bebber, D.P., Carine, M.A., Wood, J.R., Wortley, A.H. et al. (2010). Herbaria are a major frontier for species discovery. PNAS 107: 22169-22171.
Cheek, M., Nic Lughadha, E., Kirk, P., Lindon H. et al. 2020. New scientific discoveries: plants and fungi. Plants, People, Planet DOI: 10.1002/ppp3.10148
Heberling, J.M., Prather, L.A. & Tonsor, S.J. (2019). The Changing Uses of Herbarium Data in an Era of Global Change: An Overview Using Automated Content Analysis. BioScience 69: 812-822.
Hussein, B.R., Malik, O.A., Ong, W.H. and Slik, J.W.F., 2022. Applications of computer vision and machine learning techniques for digitized herbarium specimens: A systematic literature review. Ecological Informatics, p.101641.
Royal Botanic Garden Edinburgh (2022) A leading botanical collection of approximately 3 million specimens, representing half to two thirds of the world’s flora. Available at: https://www.rbge.org.uk/science-and-conservation/herbarium/ (Accessed: 18 October 2022).
Royal Botanic Gardens, Kew (2022) The Herbarium | Kew
Snell, J., Swersky, K. and Zemel, R., 2017. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30.
UN (2022) Goal 15 | Department of Economic and Social Affairs (un.org)
COVID-19
Loughborough University has been able to continue operating during the COVID-19 pandemic and related lockdowns. Essential work is conducted on-site in conjunction with health and safety measures and social distancing. This research topic can also be pursued off-site by remotely accessing the computer workstations in the computer science department. A slight shift of the topic is possible with the agreement of both organizations.