Structure prediction with Boltz
Boltz is an open-source model from the MIT Jameel Clinic, fully available for commercial use, designed to accurately model complex biomolecular interactions. It can predict a structure from a single protein sequence, or handle more complex problems involving multimeric proteins and ligands.
Usage with Ginkgo's AI Python client
Before initializing the Ginkgo AI Python client, you need a GINKGOAI_API_KEY. To get one, go to models.ginkgobioworks.ai and create a free account, then copy your API key, which is visible once you have logged in.
Next, install the Ginkgo AI Python client, ginkgo-ai-client (for example with pip install ginkgo-ai-client), and initialize it with your key.
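A minimal sketch of the setup step. It assumes the client reads the key from the GINKGOAI_API_KEY environment variable when no key is passed explicitly, and that the client class is exported as `GinkgoAIClient` from the `ginkgo_ai_client` module; check both against the client's documentation. The helper `require_api_key` is hypothetical and only exists to fail early with a clear message.

```python
import os


def require_api_key(env=None):
    """Return the Ginkgo API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("GINKGOAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GINKGOAI_API_KEY is not set; copy your key from "
            "models.ginkgobioworks.ai and export it first."
        )
    return key


def make_client():
    """Create the client (assumed class name: GinkgoAIClient)."""
    # Imported lazily so require_api_key can be used without the package installed.
    from ginkgo_ai_client import GinkgoAIClient

    require_api_key()
    # Assumption: with no arguments, the client picks up GINKGOAI_API_KEY itself.
    return GinkgoAIClient()
```

Calling `make_client()` once at the top of a script keeps the key out of the source code, since it only ever lives in the environment.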
Structure prediction with a single protein sequence
Let's ask Ginkgo's Boltz server for the structure of the GFP protein!
The response contains a link to a structure file which can be downloaded either as CIF or PDB:
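The round trip for a single sequence might look like the sketch below. `check_sequence` is a hypothetical helper that normalizes the input and enforces the length limit mentioned in this page; restricting input to the 20 standard residue codes is a conservative choice of this sketch, not a documented server rule. The names `BoltzStructurePredictionQuery.from_protein_sequence`, `send_request`, and `download_structure` are assumptions about the client's API and should be checked against its documentation.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 1000  # this page states that sequences must be under 1000 amino acids


def check_sequence(sequence):
    """Strip whitespace, upper-case, and validate a protein sequence."""
    sequence = "".join(sequence.split()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    if len(sequence) >= MAX_LENGTH:
        raise ValueError(
            f"sequence has {len(sequence)} residues; the limit is under {MAX_LENGTH}"
        )
    return sequence


def predict_structure(sequence, output_path="structure.cif"):
    """Request a structure for one sequence and save it locally."""
    # Lazy import: the validation helper above works without the client installed.
    from ginkgo_ai_client import BoltzStructurePredictionQuery, GinkgoAIClient

    client = GinkgoAIClient()
    # Assumed constructor name; verify against the client documentation.
    query = BoltzStructurePredictionQuery.from_protein_sequence(
        sequence=check_sequence(sequence)
    )
    response = client.send_request(query)
    # Assumed method name; the file extension (.cif or .pdb) selects the format.
    response.download_structure(output_path)
    return response
```

Validating locally before sending avoids paying the fixed per-request cost for a query the server would reject.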
The response also contains confidence data. Here, a confidence score of 0.95, close to the maximum of 1, indicates high confidence in the result.
Predictions with multimers and ligands
For more complex problems with multiple protein chains and ligands, the best approach is to start from a Boltz YAML input file (see the Boltz instructions and examples for more details).
In this example (in full here), we start from a typical Boltz YAML input file defining a dimer (a protein with two chains, A and B, sharing an identical sequence) and ligands.
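For illustration, the sketch below writes a small input file of this shape. The layout (`version`, `sequences`, `protein`, `ligand`, `ccd`, `smiles`) follows the Boltz YAML schema as described in the Boltz instructions; the sequence is a toy placeholder, the ligands are arbitrary examples, and the file name is hypothetical. Whether extra fields (such as an MSA entry) are needed depends on the server, so treat this as a starting point rather than the example file referenced above.

```python
from pathlib import Path

# Toy sequence for illustration only; replace it with your real chain sequence.
TOY_SEQUENCE = "MKTAYIAKQRQISFVKSHFSRQ"

BOLTZ_INPUT = f"""\
version: 1
sequences:
  - protein:
      id: [A, B]          # two chains sharing one sequence: a homodimer
      sequence: {TOY_SEQUENCE}
  - ligand:
      id: [C]
      ccd: SAH            # ligand given by its CCD code
  - ligand:
      id: [D]
      smiles: 'N[C@@H](Cc1ccc(O)cc1)C(=O)O'   # ligand given as a SMILES string
"""


def write_boltz_input(path="dimer_with_ligands.yaml"):
    """Write the example input to disk and return its path."""
    path = Path(path)
    path.write_text(BOLTZ_INPUT)
    return path
```

Listing two chain identifiers under a single `protein` entry is what makes the two chains share one sequence.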
We load the file with from_yaml_file
and submit it to our Boltz server:
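A hedged sketch of these two steps: `from_yaml_file` is named in this page, while its placement as a constructor on `BoltzStructurePredictionQuery`, together with `GinkgoAIClient`, `send_request`, and `download_structure`, are assumptions to verify against the client's documentation. The helper name `predict_from_yaml` and the default output file name are hypothetical.

```python
from pathlib import Path


def predict_from_yaml(yaml_path, output_path="complex.cif"):
    """Load a Boltz YAML input, submit it, and save the predicted structure."""
    yaml_path = Path(yaml_path)
    # Fail locally, before any network call, if the input file is missing.
    if not yaml_path.is_file():
        raise FileNotFoundError(f"Boltz input file not found: {yaml_path}")

    # Lazy import so the file check above works without the client installed.
    from ginkgo_ai_client import BoltzStructurePredictionQuery, GinkgoAIClient

    client = GinkgoAIClient()
    # Passing a plain string path is a conservative choice of this sketch.
    query = BoltzStructurePredictionQuery.from_yaml_file(str(yaml_path))
    response = client.send_request(query)
    response.download_structure(output_path)  # assumed method name
    return response
```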
And voilà:
Handling larger batches
For large sets of queries, it is better to use an iterator, managed via send_requests_by_batches.
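To see why an iterator helps, here is a conceptual sketch of batching in plain Python. It is not the client's implementation; `batched` is a hypothetical helper showing that only one batch of queries needs to exist in memory at a time.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items, consuming the iterable lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most batch_size items and leaves the rest untouched.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Because the source is consumed lazily, a generator producing thousands of queries never has to build them all up front.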
Processing a FASTA file with many protein sequences
If you have a FASTA file with many protein sequences, you can define an iterator that builds a query for each sequence in the file.
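One way to write such an iterator, using only the standard library for parsing. `iter_fasta` and `iter_boltz_queries` are hypothetical helper names, and `BoltzStructurePredictionQuery.from_protein_sequence` is an assumption about the client's API.

```python
def iter_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file, one record at a time."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, "".join(chunks)
                header = line[1:].split()
                # Keep only the first token of the header as the record name.
                name, chunks = (header[0] if header else "unnamed"), []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def iter_boltz_queries(path):
    """Lazily build one structure prediction query per FASTA record."""
    # Lazy import so iter_fasta works without the client installed.
    from ginkgo_ai_client import BoltzStructurePredictionQuery

    for _name, sequence in iter_fasta(path):
        # Assumed constructor name; verify against the client documentation.
        yield BoltzStructurePredictionQuery.from_protein_sequence(sequence=sequence)
```

Both functions are generators, so the file is read record by record rather than loaded in full.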
Then the queries are generated and sent in batches via send_requests_by_batches
and the results are processed as they become available:
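A sketch of that loop, written against any object that offers `send_requests_by_batches`. It assumes the method accepts a `batch_size` keyword and yields one list of responses per batch; both points should be checked against the client's documentation. The function name and the `handle_response` callback are hypothetical.

```python
def process_in_batches(client, queries, handle_response, batch_size=10):
    """Send queries in batches and pass each response to a callback on arrival."""
    processed = 0
    # Assumption: one iteration of this loop corresponds to one finished batch.
    for batch_responses in client.send_requests_by_batches(
        queries, batch_size=batch_size
    ):
        for response in batch_responses:
            handle_response(response)
            processed += 1
    return processed
```

A typical callback would save each predicted structure to disk, so that results from completed batches are kept even if a later batch fails.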
Processing a folder with multiple files
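For a folder of Boltz input files, the same iterator pattern applies. The sketch below assumes the folder holds one YAML input per prediction; `iter_yaml_files` and `iter_queries_from_folder` are hypothetical helpers, and the placement of `from_yaml_file` on `BoltzStructurePredictionQuery` is an assumption about the client's API.

```python
from pathlib import Path


def iter_yaml_files(folder, patterns=("*.yaml", "*.yml")):
    """Yield the YAML files found directly inside folder, in sorted order."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    found = set()
    for pattern in patterns:
        found.update(folder.glob(pattern))
    yield from sorted(found)


def iter_queries_from_folder(folder):
    """Lazily build one query per YAML input file in the folder."""
    # Lazy import so iter_yaml_files works without the client installed.
    from ginkgo_ai_client import BoltzStructurePredictionQuery

    for path in iter_yaml_files(folder):
        yield BoltzStructurePredictionQuery.from_yaml_file(str(path))
```

The resulting iterator can be passed to send_requests_by_batches in the same way as the FASTA-based one.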
Throughput and Pricing
The Boltz model can complete predictions on our servers in as little as 20 seconds for short proteins, and in up to about 4 minutes for more complex problems with multiple chains and ligands.
We currently only accept protein sequences under 1000 amino acids.
The current pricing is a combination of:
A fixed cost of $0.01 per request
Plus an additional $0.00025 per amino acid in the sequences (so $0.10 for a 400-AA sequence)
Plus an additional $0.0025 per ligand.
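The pricing rule above can be turned into a small estimator. This is a sketch based only on the three figures listed here; `estimate_cost` is a hypothetical helper, and how chains that share a sequence are counted is an assumption left to the caller (pass one length per chain you expect to be billed for).

```python
from decimal import Decimal

FIXED_COST = Decimal("0.01")           # per request
COST_PER_RESIDUE = Decimal("0.00025")  # per amino acid, summed over sequences
COST_PER_LIGAND = Decimal("0.0025")    # per ligand


def estimate_cost(sequence_lengths, n_ligands=0):
    """Estimate the price in dollars of one request from the listed rates."""
    residues = sum(sequence_lengths)
    return FIXED_COST + COST_PER_RESIDUE * residues + COST_PER_LIGAND * n_ligands
```

For example, `estimate_cost([400])` gives 0.11 dollars: the $0.10 per-residue component for a 400-AA sequence plus the $0.01 fixed cost. Using `Decimal` avoids floating-point rounding noise in the totals.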