Dr. Sebastian Köhler On The Future of the Human Phenotype Ontology Resource

Sebastian Köhler, a founder of the Human Phenotype Ontology (HPO), discusses the future of this widely adopted standardized vocabulary for phenotypic abnormalities.

I think one of the major goals should be to focus on the molecular level and try to describe in more detail which phenotypic differences are caused by different variants in different genes that are all linked to the same syndrome, in order to be able to better stratify patients and make big steps towards personalisation of healthcare.

Sebastian Köhler, Co-Founder of the Human Phenotype Ontology.

The Human Phenotype Ontology (HPO) is a standardized, structured vocabulary that describes phenotypic abnormalities encountered in human disease. With over 13,000 terms and over 156,000 annotations to hereditary diseases, the HPO can be leveraged as a powerful tool for phenotype-driven differential diagnostics, genomic diagnostics, precision medicine, and translational research. The HPO is increasingly adopted as a standard for phenotypic abnormalities, used by international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools like PhenoTips. Its widespread adoption contributes towards efforts in global data exchange that advance the identification of disease etiologies, with unparalleled benefits for rare diseases.

PhenoTips’ core software enables deep phenotyping with the HPO

Natural Language Processing, a simple and predictive term search, and gene and diagnosis suggestions based on HPO profiles, all at your fingertips.

Following our Speaker Series “The Importance of Deep Phenotyping in Precision Medicine,” in which HPO co-founder Prof. Dr. Peter Robinson discussed the HPO’s applications in genomic analysis and precision medicine, PhenoTips spoke with HPO co-founder Dr. Sebastian Köhler about the future of the HPO.

Over the past decade the HPO has grown in both application and annotations, with thousands of new HPO terms added in the last year alone. The majority of these applications focus on rare disease, and Dr. Köhler foresees that rare disease will continue to be the focus in the coming years, with knowledge from rare disease research applied to more common conditions.

“From what I hear from the rare disease community in Europe, people are very interested in supporting newborn screening,” says Dr. Köhler. “I think that structured data like encoded HPO or SNOMED, or whatever standard is en vogue, will play a key role [in newborn screening]. But to be honest, I’m rather sure it is especially going to be HPO because only HPO has the built-in feature to act as a bridge between the clinical features and the molecular genetics knowledge.”

Secondarily, Dr. Köhler notes that the HPO will play an increasingly important role in rare disease matchmaking and cohort identification. “In order to publish a report on a new gene or a new rare disease you need a minimum amount of affected cases or families, and the rarer the disease is the harder it will be to find those patients. In order to automatically identify patients with a shared phenotypic representation across the globe, you need data standards that work out of the box around the globe, as well as the exchange format and protocols. Hence, I think stratification and cohort identification is going to be an important topic, not only for rare disease but also for the pharma industry, for example, to identify homogeneous cohorts across the globe.”
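To make the idea of matching patients on a shared phenotypic representation concrete, here is a minimal sketch that ranks patients by the overlap of their HPO term sets. It uses plain Jaccard similarity, which is a deliberate simplification: matchmaking tools typically use ontology-aware semantic similarity, and nothing here reflects how any particular platform works. The patient identifiers and profiles are invented for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of HPO terms two profiles have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical patient profiles, each a set of HPO term IDs.
patients = {
    "patient_A": {"HP:0001250", "HP:0000252", "HP:0001263"},
    "patient_B": {"HP:0001250", "HP:0000252"},
    "patient_C": {"HP:0001249"},
}


def rank_matches(query_id: str, profiles: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank all other patients by similarity to the query patient."""
    query = profiles[query_id]
    scores = [
        (other_id, jaccard(query, terms))
        for other_id, terms in profiles.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


matches = rank_matches("patient_A", patients)
print(matches)
```

Because every site encodes findings with the same identifiers, the comparison needs no translation step, which is the practical meaning of a standard that works "out of the box."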

Aside from the obvious research applications, the HPO has incredible functionality in clinical care. While some countries, such as the UK, are making a concerted effort to integrate HPO utilization into clinical workflows, others have been slow to adopt the standardized language. In Germany, for example, Dr. Köhler notes that the use of Orphanet codes in clinical care has only recently become standard practice, and it is a much simpler task to code a diagnosis when compared to the complexity of capturing the complete phenotypic presentation of a patient in HPO terms.

“I think it’s just a lot of work,” he hypothesizes. “Given you already know the diagnosis of the patient and you want to code this using the Orphanet codes, that’s not a lot of work. But reporting the full clinical spectrum of a patient is different, and if you do it you should do it right. It’s probably about 5 to 15 HPO terms that you would have to assign; it’s a lot of work.”

This bottleneck could be reduced with software that performs Natural Language Processing (NLP) on clinical notes, pulling HPO terms from free text. However, Dr. Köhler notes that NLP, although improved, has so far not delivered.

“We cannot entirely blame NLP here, it’s also the way medical texts are written,” says Dr. Köhler. “I haven’t read a lot of clinical letters but the ones I’ve seen are very complex. What I’ve seen in the clinical letters is very often that one sentence in the beginning of the paragraph is later referred to, so you need to have the information from the first sentence in order to really understand the later sentence, for example, or paragraphs are providing context to a later paragraph.”
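The context problem described above can be illustrated with a deliberately naive extractor. The sketch below matches words against a tiny hand-made lexicon, one sentence at a time; the lexicon, function name, and sample letter are all hypothetical and far simpler than any real clinical NLP pipeline.

```python
import re

# Hypothetical mini-lexicon mapping surface words to HPO term IDs.
LEXICON = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "microcephaly": "HP:0000252",
}


def naive_extract(text: str) -> list[str]:
    """Return HPO IDs for lexicon words, looking at each sentence in isolation."""
    found: list[str] = []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        for word in re.findall(r"[a-z]+", sentence.lower()):
            hpo_id = LEXICON.get(word)
            if hpo_id and hpo_id not in found:
                found.append(hpo_id)
    return found


letter = (
    "The following findings were reported in the patient's brother. "
    "Seizures and microcephaly since infancy."
)
codes = naive_extract(letter)
print(codes)
```

The extractor returns both terms as if they described the patient, because the first sentence, which says the findings belong to the brother, is never carried forward to the second. Resolving that kind of cross-sentence reference is the hard part.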

In order for clinics to see widespread adoption of the HPO, he notes that “either you change the healthcare system and doctors have better tools, like PhenoTips, but also more time to record the phenotypes, or you substantially improve the NLP.”

While the HPO is an incredibly powerful tool, especially when acting as a summary of the phenotypic presentation of a case, translating free-text notes to coded HPO phenotypes inherently loses context and some level of nuance. Free text is much more powerful than coded text, especially in rare disease cases, which according to Dr. Köhler is one possible explanation for the HPO’s lack of adoption in the clinic. To improve this situation, he suggests further research focusing on cases that are currently not solved using HPO phenotyping alone.

“It would be interesting to look into the exact reasons and investigate if more information could have helped,” he says. “I assume the problem most often is that the free-text description you’re comparing it to doesn’t contain enough information, but you cannot fix one without the other. If you want to have more details in the HPO-annotated descriptions, you also need more details in the written description that you compare it to, and as long as you lack details in these [standardized] databases it doesn’t make sense to also increase the details in the patient description.”

Despite this self-reinforcing loop, Dr. Köhler believes the level of detail in HPO-captured phenotypes should still be increased. Case notes must be as detailed as possible, since they are typically recorded in a single clinical appointment and further revisions are unlikely.

Specifically, he notes that “it would be a good idea to at least start with the most obvious things that are currently missing, like the timelines and severity which are often harder to code.”
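As a rough illustration of what capturing timelines and severity alongside a term could look like, the sketch below attaches optional onset and severity fields to each coded finding. The class and field names are assumptions made for this example, not an HPO or PhenoTips data model; the onset and severity values are written as plain labels for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhenotypicFeature:
    """One coded finding; onset and severity are optional extra detail."""
    hpo_id: str
    label: str
    onset: Optional[str] = None      # e.g. "Infantile onset"
    severity: Optional[str] = None   # e.g. "Severe"


def missing_detail(features: list[PhenotypicFeature]) -> list[str]:
    """List the HPO IDs of findings that lack onset or severity."""
    return [f.hpo_id for f in features if f.onset is None or f.severity is None]


# Hypothetical patient record with one fully detailed and one bare finding.
record = [
    PhenotypicFeature("HP:0001250", "Seizure", onset="Infantile onset", severity="Severe"),
    PhenotypicFeature("HP:0000252", "Microcephaly"),
]
incomplete = missing_detail(record)
print(incomplete)
```

A check like `missing_detail` could prompt the clinician for the missing fields during the appointment, when the information is still at hand.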

Additionally, Dr. Köhler believes that the HPO should build on the existing annotation set that describes diseases in terms of their typical phenotypic representation.

Dr. Köhler believes that in order to achieve his vision for the expansion and improvement of the HPO, community involvement will be essential.

“We give you this resource for free, but we also need you,” he says. “You need to report back to us, and if you find that a particular part of your speciality is missing something or is wrong or unclear, then get involved.”

The HPO founders are no strangers to community involvement. In the past, senior personnel like Prof. Dr. Peter Robinson have run workshops, such as one with ophthalmologists in which participants contributed to the HPO and Orphanet in parallel. In the future, Dr. Robinson’s vision for the HPO will likely drive extensions such as linking the HPO to the Medical Action Ontology (MAxO), as well as linking the ontology to exposure ontologies that take into account the environmental origins of some rare diseases.

Another important aspect of community involvement is the collaborative effort required to translate the standardized ontology into additional languages, which is essential in expanding the community of HPO users. This translation process can be extremely challenging, as typos, structural issues, and differences in terms are common, making assistance from these language communities essential.

Dr. Köhler lists the biggest translation efforts as the Spanish, Chinese, French, and German language translations, each to varying degrees of completion. Some translations have required professional resources, while others remain incomplete due to lack of community involvement.

Dr. Köhler continues to coordinate HPO translations alongside a colleague from the UK, acting as the main community contact for the projects. In the future he hopes to include a long-requested Arabic translation.

About Dr. Sebastian Köhler: Currently Senior Information Architect at Ada Health, Dr. Köhler is the author of over 60 publications in digital health, rare diseases, genomics, and knowledge representation. He has been developing machine learning tools for clinical, phenotype-driven interpretation of next generation sequencing results, such as PhenIX and ExomeWalker. He currently focuses on interoperability and semantic standards to build semantic knowledge graphs for AI and data analysis. Dr. Köhler’s latest paper, The Human Phenotype Ontology in 2021, summarizes recent advances in the HPO’s development and utilization.


Learn more about the Human Phenotype Ontology.

For more information on the HPO and its applications, watch the presentation by HPO co-founder Dr. Peter Robinson.