InCOB 2018 Workshop - Exploring Programmatic Access to Protein Sequence, Function and Structure Information

Summary

Integrating publicly available biological data from multiple data sources (including your own) can be critical for analysing data and inferring patterns that may not otherwise be obvious. In this workshop you will learn how to discover and integrate biological information, as well as visualisation components, from protein resources at the European Bioinformatics Institute (EMBL-EBI), with a focus on UniProt and PDBe.

Learn how to access data such as gene-protein relationships, protein sequences, protein structures and function. Explore web services such as the UniProt API and the PDBe API, which provide programmatic access to data, as well as freely available visualisation components, such as PDBe’s LiteMol viewer and UniProt’s ProtVista, that you can re-use to visualise this data alongside your own.
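
As a rough illustration of what programmatic access can look like, the Python sketch below builds request URLs for one UniProt entry and one PDBe entry and fetches them as JSON. The base URLs, the helper names and the example identifiers are assumptions made for illustration and are not taken from the workshop material; check the current UniProt and PDBe API documentation before relying on them.

```python
import json
import urllib.request

# Assumed base URLs for the public EMBL-EBI services; verify against the current docs.
PROTEINS_API = "https://www.ebi.ac.uk/proteins/api/proteins"
PDBE_API = "https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary"


def uniprot_entry_url(accession):
    """Build the URL for a single UniProt entry (hypothetical helper)."""
    return f"{PROTEINS_API}/{accession.strip().upper()}"


def pdbe_summary_url(pdb_id):
    """Build the URL for a PDBe entry summary (hypothetical helper)."""
    return f"{PDBE_API}/{pdb_id.strip().lower()}"


def fetch_json(url, timeout=30):
    """Request a URL, asking for JSON, and return the decoded response body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Example identifiers chosen for illustration only; this part needs network access.
    protein = fetch_json(uniprot_entry_url("P05067"))
    structure = fetch_json(pdbe_summary_url("1cbs"))
    print(type(protein).__name__, type(structure).__name__)
```

The URL builders are kept separate from the network call so they can be checked offline; only the block under `__main__` contacts the services.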

Workshop schedule

Session 1 - Introductions

09:00 - 09:10 Welcome

09:10 - 09:50 Introduction to UniProt

09:50 - 10:30 Introduction to PDBe

Coffee break 30min

Session 2 - API

11:00 - 11:45 PDBe API

11:45 - 12:30 UniProt API

Lunch break 60min

Session 3 - Web components

13:30 - 14:15 UniProt open access visualisation components

14:15 - 15:00 PDBe web component library

Coffee break 30min

Session 4 - Bring it all together

15:30 - 17:00 Case study (or own ideas)

Tutors:

Andrew Nightingale, EMBL-EBI, Wellcome Genome Campus, Cambridge, UK

I received a BSc honours degree in Biochemistry from The University of Bath in 1997, an MSc in Computational Biology from the University of York in 1998 and a PhD in Biochemistry/Bioinformatics from The University of Leeds in 2002, where I worked in the groups of Dr Howard Parish and Professor David Westhead. I then joined ProSpect Pharma Inc as a computational biologist and research scientist, specialising in the development of new techniques for structure-based drug design by NMR in collaboration with Professor Steven Homans's group. In 2012, I joined the UniProt team at EMBL-EBI as a data scientist responsible for the identification, analysis and integration of data resources related to genomic annotation, variation and active biomolecules, to expand the UniProt Knowledgebase (UniProtKB) for clinical research and the understanding of the molecular mechanisms of diseases. This has led to the development of new data resources and tools such as the RESTful Proteins API, UniProt annotations available for integration into genome browsers, and a genomic coordinates service for mapping protein-expressing genes to their expressed proteins and annotations. I have prepared and presented workshops on behalf of EMBL-EBI on the access and use of UniProt variation data, including a two-day workshop at The Colombian Center for Bioinformatics and Computational Biology (BIOS) targeted at pre-doctoral and post-doctoral fellows interested in analysing the functional consequences of variants. I have also written and presented an EMBL-EBI webinar on programmatic access to UniProt data via its RESTful Proteins API.

Mihaly Varadi, EMBL-EBI, Wellcome Genome Campus, Cambridge, UK

Background: I received my MSc in Applied Zoology from the Veterinary University of Budapest in 2011, and my PhD in Bioscience Engineering/Computational Biology from the Vrije Universiteit Brussel in 2015. In the same year I joined the Protein Data Bank in Europe at the European Bioinformatics Institute (EMBL-EBI) as a scientific programmer. Since then I have been working on the back-end of the PDB, on providing programmatic access to the data via APIs, and on displaying information on our front-ends using web components. I work with Python/Django, Java/Neo4j and TypeScript with Angular and Polymer.