This notebook is a quick demo of a Python interface to the BioCyc Web API that I've released. While incomplete, it offers access to most basic attributes of metabolites, proteins, reactions, pathways and organisms in the database. The interface comes with a disk-based caching mechanism (under ~/.biocyc) that greatly reduces both the delay on repeated requests and the load on the BioCyc servers.
The interface supports multiple, configurable caches, so it's possible to share a cache across multiple machines.
The biocyc module is hosted on PyPI and can be installed from the command line with:
pip install biocyc
Read on for more info.
Basic initialisation
Import the biocyc object from the biocyc module. This object provides the base access to the database for the initial get. You can set the organism using set_organism and one of the standard BioCyc database identifiers. Note that this only affects the organism database used for direct requests on the biocyc object. Sub-requests on existing objects use the same database as that object (otherwise things would be very confusing indeed).
import os
from biocyc import biocyc
os.environ['http_proxy'] = '' # Set your proxy if necessary
biocyc.set_organism('meta')
Making a request
To get a database object (of any type), simply request it using its unique BioCyc identifier. Here we request L-Lactate. Note that if you do this from within an IP[y] Notebook you get a nice table output of all associated attributes for the object. This includes direct links to the BioCyc database and other database annotations.
o = biocyc.get('L-LACTATE')
o
Exploring further
Now that we have an object, we can perform sub-queries by accessing its fields. If you access the o.reactions field you will trigger a dynamic request for all entities in that list. Connections to the BioCyc server are throttled at 1/second, so this may take a little while on long lists. However, retrieved data is cached under ~/.biocyc so subsequent requests will be much quicker. By default the cache is set to expire objects after ~6 months, and the cache folder can be shared between multiple machines.
Note: If you just want access to the identifiers, you can use the o._reactions field to access these without triggering a request.
r = o.reactions
r
r[1]
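To give a feel for how throttling and an expiring disk cache fit together, here is a minimal sketch. It is not the biocyc module's actual implementation: the class name ThrottledDiskCache, its parameters and the fake_fetch stand-in are all hypothetical, and it assumes identifiers are safe to use as file names.

```python
import json
import os
import tempfile
import time

# Illustrative only: hypothetical names, not the biocyc module's internals.
MAX_AGE_SECONDS = 180 * 24 * 60 * 60  # roughly six months
MIN_INTERVAL_SECONDS = 1.0            # at most one remote request per second


class ThrottledDiskCache:
    def __init__(self, cache_dir, fetch, max_age=MAX_AGE_SECONDS,
                 min_interval=MIN_INTERVAL_SECONDS):
        self.cache_dir = cache_dir
        self.fetch = fetch  # callable: identifier -> JSON-serialisable data
        self.max_age = max_age
        self.min_interval = min_interval
        self._last_request = None
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, identifier):
        # Assumes the identifier is a valid file name.
        return os.path.join(self.cache_dir, identifier + '.json')

    def get(self, identifier):
        path = self._path(identifier)
        # Serve from disk if the entry exists and has not expired.
        if os.path.exists(path):
            age = time.time() - os.path.getmtime(path)
            if age < self.max_age:
                with open(path) as f:
                    return json.load(f)
        # Otherwise wait out the throttle interval, then fetch and store.
        if self._last_request is not None:
            wait = self.min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        data = self.fetch(identifier)
        self._last_request = time.monotonic()
        with open(path, 'w') as f:
            json.dump(data, f)
        return data


calls = []


def fake_fetch(identifier):
    # Stand-in for a remote request; records each call it receives.
    calls.append(identifier)
    return {'id': identifier}


cache = ThrottledDiskCache(tempfile.mkdtemp(), fake_fetch, min_interval=0.05)
first = cache.get('L-LACTATE')   # cache miss: calls fake_fetch
second = cache.get('L-LACTATE')  # cache hit: read back from disk
```

Because entries are plain files keyed by identifier, pointing two machines at the same shared folder would let them reuse each other's results, which is the idea behind sharing the cache.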
You can access sub-entities and manipulate objects using standard Python list processing.
ps = [r.pathways for r in o.reactions]
p = [p for sl in ps for p in sl]
p
p[0]
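The nested comprehension above flattens a list of lists. The same pattern can be sketched with itertools.chain, here on plain strings standing in for pathway objects (the identifiers are made up, and whether real biocyc objects can be placed in a set is an assumption this sketch does not test). It also drops duplicates, since several reactions may belong to the same pathway.

```python
from itertools import chain

# Hypothetical stand-in data: each inner list plays the role of one
# reaction's pathways.
pathways_per_reaction = [
    ['PWY-A', 'PWY-B'],
    ['PWY-B'],
    [],
    ['PWY-C', 'PWY-A'],
]

# Flatten one level of nesting.
flat = list(chain.from_iterable(pathways_per_reaction))

# Drop duplicates while keeping first-seen order.
seen = set()
unique = []
for name in flat:
    if name not in seen:
        seen.add(name)
        unique.append(name)
```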
Finally
That's all for now! Hopefully this shows how Python (and IPython notebook) access to the BioCyc Web API may be useful. Support for additional attributes, API calls etc. is planned for the future. If you have specific requests, get in touch!