# Structure prediction with Boltz

Boltz is an open-source model from the MIT Jameel Clinic, fully available for commercial use and designed to accurately model complex biomolecular interactions. It can predict a structure from a single protein sequence, or tackle more complex problems involving multimeric proteins and ligands.

**Useful links**

* [The Boltz repository on GitHub](https://github.com/jwohlwend/boltz)
* [The introductory blog from the Boltz team](https://jclinic.mit.edu/boltz-1/)

## Usage with Ginkgo's AI Python client

To use the Ginkgo AI Python client, you first need an API key (`GINKGOAI_API_KEY`). To get one, go to [models.ginkgobioworks.ai](https://models.ginkgobioworks.ai/) and create a free account. Your API key should be visible once you have logged in; copy it.

Then install the Ginkgo AI Python client, [ginkgo-ai-client](https://ginkgobioworks.github.io/ginkgo-ai-client/), and initialize it with your key:

```bash
pip install ginkgo-ai-client
```

```python
from ginkgo_ai_client import GinkgoAIClient, BoltzStructurePredictionQuery

# If api_key is omitted, it is read from the GINKGOAI_API_KEY environment variable
client = GinkgoAIClient(api_key="xxxxx-xxxx-xx-xxxxx-xxx")
```

### Structure prediction with a single protein sequence

Let's ask Ginkgo's Boltz server for the structure of the GFP protein!

```python
sequence = (
    "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL"
    "VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK"
)

query = BoltzStructurePredictionQuery.from_protein_sequence(sequence)
response = client.send_request(query)
```

The response contains a link to a structure file which can be downloaded either as CIF or PDB:

```python
response.download_structure(path="GFP.cif")
response.download_structure(path="GFP.pdb")  # converted on the fly
```

<figure><img src="/files/B1Q0fGLVFWHtKsNifQGs" alt="" width="375"><figcaption></figcaption></figure>

The response also contains confidence data. Here, the `confidence_score` of 0.95, close to the maximum of 1, indicates high confidence in the result.

```python
>>> response.confidence_data
{'ptm': 0.9286148548126221,
 'iptm': 0,
 'chains_ptm': {'0': 0.9286148548126221},
 'complex_pde': 0.3618420660495758,
 'ligand_iptm': 0,
 'complex_ipde': 0,
 'protein_iptm': 0,
 'complex_plddt': 0.9573345184326172,
 'complex_iplddt': 0,
 'confidence_score': 0.9515905380249023,
 'pair_chains_iptm': {'0': {'0': 0.9286148548126221}}}
```
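These values can also be read programmatically, for example to flag low-confidence predictions when processing many results. The sketch below works on a plain dictionary with a few of the values shown above. The `is_confident` helper and its 0.8 threshold are illustrative assumptions, not part of the client and not an official cutoff.

```python
# A subset of the confidence values from the GFP response shown above.
confidence_data = {
    "ptm": 0.9286148548126221,
    "iptm": 0,
    "complex_plddt": 0.9573345184326172,
    "confidence_score": 0.9515905380249023,
}


def is_confident(confidence: dict, threshold: float = 0.8) -> bool:
    """Return True if the overall confidence score reaches the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return confidence["confidence_score"] >= threshold


print(f"confidence_score = {confidence_data['confidence_score']:.2f}")
print("keep prediction:", is_confident(confidence_data))
```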

### Predictions with multimers and ligands

For more complex problems with multiple protein chains and ligands, the best approach is to start from a Boltz YAML input file (see the Boltz [instructions](https://github.com/jwohlwend/boltz/blob/main/docs/prediction.md) and [examples](https://github.com/jwohlwend/boltz/tree/main/examples) for more details).

In this example (in full [here](https://github.com/ginkgobioworks/ginkgo-ai-client/blob/main/examples/boltz_structure_inference/with_ligand.py)), we start from a typical Boltz YAML input file defining a dimer (a protein with two chains A and B sharing an identical sequence), and ligands.

```yaml
sequences:
  - protein:
      id: [A, B]
      sequence: MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVME...
  - ligand:
      id: [C, D]
      ccd: SAH
  - ligand:
      id: [E, F]
      smiles: N[C@@H](Cc1ccc(O)cc1)C(=O)O
```
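If you need many such input files, it may be convenient to generate them from Python rather than write them by hand. The following is a minimal sketch using only string formatting. The `boltz_yaml` helper is hypothetical (it is not part of the Ginkgo client or of Boltz), the protein sequence is a short placeholder, and ligand values are single-quoted so that SMILES strings containing brackets remain valid YAML scalars.

```python
def boltz_yaml(protein_ids, protein_sequence, ligands):
    """Build the text of a Boltz YAML input (hypothetical helper).

    `ligands` is a list of (chain_ids, kind, value) tuples, where `kind` is
    "ccd" or "smiles", mirroring the keys used in the example above.
    """
    lines = [
        "sequences:",
        "  - protein:",
        f"      id: [{', '.join(protein_ids)}]",
        f"      sequence: {protein_sequence}",
    ]
    for chain_ids, kind, value in ligands:
        lines += [
            "  - ligand:",
            f"      id: [{', '.join(chain_ids)}]",
            f"      {kind}: '{value}'",  # quoted to keep special characters literal
        ]
    return "\n".join(lines) + "\n"


yaml_text = boltz_yaml(
    protein_ids=["A", "B"],
    protein_sequence="MKTAYIAKQRQISFVK",  # placeholder, not a real target
    ligands=[
        (["C", "D"], "ccd", "SAH"),
        (["E", "F"], "smiles", "N[C@@H](Cc1ccc(O)cc1)C(=O)O"),
    ],
)
print(yaml_text)
```

The resulting text can be written to a `.yaml` file and loaded like any other Boltz input file.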

We load the file with `from_yaml_file` and submit it to our Boltz server:

```python
query = BoltzStructurePredictionQuery.from_yaml_file("with_ligand.yaml")
response = client.send_request(query, timeout=1000)
response.download_structure("with_ligand.pdb")
```

And voilà:

<figure><img src="/files/UTmHbOG3ebEMrrZjd6Yi" alt="" width="563"><figcaption><p>A dimer with two identical chains (left and right) and identical ligands docked in each chain </p></figcaption></figure>

### Handling larger batches

For large sets of queries, it is better to use an iterator, managed via `send_requests_by_batches`.

#### Processing a FASTA file with many protein sequences

If you have a FASTA file with many protein sequences, you might define an iterator that builds a query for each protein sequence in the file:

```python
from Bio import SeqIO

# Generator expression: each query is built only when it is requested
queries_iterator = (
    BoltzStructurePredictionQuery.from_protein_sequence(
        sequence=str(record.seq),
        query_name=record.id,
    )
    for record in SeqIO.parse("my_proteins.fa", format="fasta")
)
```

Then the queries are generated and sent in batches via `send_requests_by_batches` and the results are processed as they become available:

```python
for batch_result in client.send_requests_by_batches(queries_iterator, batch_size=10):
    for result in batch_result:
        result.download_structure(f"{result.query_name}.pdb")
```
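Because the queries come from a generator, they are built lazily, one batch at a time, instead of all being held in memory. The sketch below illustrates this general batching pattern with `itertools.islice`. The `iter_batches` helper is hypothetical and only shows the idea; it is not the client's actual implementation of `send_requests_by_batches`.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of up to `batch_size` items, consuming `items` lazily."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a generator of queries: items are produced only when requested.
fake_queries = (f"query_{i}" for i in range(25))

batch_sizes = [len(batch) for batch in iter_batches(fake_queries, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]
```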

#### Processing a folder with multiple files

If your input is more complex, with multiple chains and ligands, and you have a folder of Boltz YAML input files, here is how you would use `send_requests_by_batches`:

```python
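from pathlib import Path

folder = "my_boltz_inputs"  # hypothetical folder containing the YAML input files

queries_iterator = (
    BoltzStructurePredictionQuery.from_yaml_file(f)
    for f in Path(folder).glob("*.yaml")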
)

for batch_result in client.send_requests_by_batches(queries_iterator, batch_size=10):
    for result in batch_result:
        # the query name is the name of the original yaml file (without extension)
        result.download_structure(f"{result.query_name}.pdb")
```

## Throughput and Pricing

The Boltz model can complete predictions on our servers in as little as 20 seconds for short proteins, and in up to about 4 minutes for more complex problems with multiple chains and ligands.

We currently accept only protein sequences under 1000 amino acids.
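To avoid submitting requests that exceed this limit, sequences can be checked before any query is built. The sketch below is one way to do that. It assumes that "under 1000" means strictly fewer than 1000 residues and that the limit applies to each sequence individually, neither of which is spelled out here. The `split_by_length` helper is hypothetical.

```python
MAX_LENGTH = 1000  # assumed strict upper bound, per the limit stated above


def split_by_length(sequences: dict, max_length: int = MAX_LENGTH):
    """Split {name: sequence} into (accepted, too_long) dictionaries."""
    accepted = {n: s for n, s in sequences.items() if len(s) < max_length}
    too_long = {n: s for n, s in sequences.items() if len(s) >= max_length}
    return accepted, too_long


# Dummy sequences of various lengths, for illustration only.
sequences = {"short": "M" * 250, "borderline": "M" * 999, "long": "M" * 1200}
accepted, too_long = split_by_length(sequences)
print("accepted:", sorted(accepted))
print("too long:", sorted(too_long))
```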

The current pricing is a combination of:

* A fixed cost of $0.01 per request
* Plus an additional $0.00025 per amino acid in the sequences (so $0.10 for the amino acids of a 400-AA sequence)
* Plus an additional $0.0025 per ligand.
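The three components above can be combined into a quick cost estimate. This is a sketch only: the `estimate_cost` helper is hypothetical, and it assumes that the amino acids of all chains are summed and that every ligand copy is counted, which may not match actual billing.

```python
from decimal import Decimal

# Prices taken from the list above; Decimal avoids floating-point rounding noise.
FIXED_COST = Decimal("0.01")
COST_PER_AMINO_ACID = Decimal("0.00025")
COST_PER_LIGAND = Decimal("0.0025")


def estimate_cost(sequence_lengths, n_ligands=0):
    """Estimate the price in dollars of one request (hypothetical helper).

    `sequence_lengths` lists the length of each protein chain in the request.
    """
    total_residues = sum(sequence_lengths)
    return (
        FIXED_COST
        + COST_PER_AMINO_ACID * total_residues
        + COST_PER_LIGAND * n_ligands
    )


# A single 400-AA protein: 0.01 fixed + 0.10 for the amino acids.
print(estimate_cost([400]))
# Two 300-AA chains with two ligands, under the assumptions stated above.
print(estimate_cost([300, 300], n_ligands=2))
```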

