lcdna
Microbial DNA Design with Diffusion
Model Overview
Our first long-context DNA diffusion model focuses on microbial DNA design. The autoregressive diffusion model iteratively refines DNA sequences based on its understanding of DNA regulatory logic in microbial genomes. Starting from a user-provided sequence template, it gradually transforms the unspecified nucleotide characters into a coherent and functional DNA sequence. At each step, the model uses the surrounding DNA context (up to 30 kb) to predict the best nucleotide choices for the masked nucleotide characters specified by the user.
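The refinement loop described above can be illustrated with a toy sketch. This is not the model's actual sampler: `toy_predictor` is a hypothetical stand-in that picks uniformly at random among the bases a position allows, and the left-to-right fill order is an assumption made only for illustration. What the sketch does show is how degenerate characters in a template constrain the choices at each step while fixed bases are left untouched.

```python
import random

# Standard IUPAC nucleotide codes mapped to the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def toy_predictor(sequence, position, allowed, rng):
    """Hypothetical stand-in for the model.

    A real model would score each allowed base using the surrounding
    context; this placeholder ignores context and picks uniformly.
    """
    return rng.choice(allowed)


def fill_template(template, seed=0):
    """Resolve degenerate positions one at a time (assumed left to right)."""
    rng = random.Random(seed)
    seq = list(template.upper())
    for i, code in enumerate(seq):
        allowed = IUPAC[code]
        if len(allowed) > 1:  # degenerate position: needs a concrete base
            seq[i] = toy_predictor(seq, i, allowed, rng)
    return "".join(seq)


if __name__ == "__main__":
    # Fixed bases (ATG, TAA) are preserved; N and K positions are resolved.
    print(fill_template("ATGNNNKTAA"))
```

Swapping `toy_predictor` for a context-aware scorer is where a trained model would plug in; the constraint handling around it would stay the same.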
Input
The model requires a sequence template of IUPAC nucleotide characters, up to 30,000 characters long. Degenerate nucleotides (e.g. N, indicating any of A, T, G, or C, or K, indicating either G or T) mark the positions for the model to fill in. Optionally, users may customize generation by specifying a sampling temperature between 0.0 and 1.0 and a decoding strategy. The decoding strategy must be one of the following:
Example
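As a minimal sketch of preparing an input, the helper below checks a template against the constraints stated in this section (IUPAC characters only, at most 30,000 characters, temperature between 0.0 and 1.0) before building a request dictionary. The function name `build_request` and the field names `sequence` and `temperature` are hypothetical and may not match the service's actual request format. Treating lowercase input as valid is also an assumption. The decoding strategy is left out because its allowed values are not listed here.

```python
# The 15 standard IUPAC nucleotide codes (4 bases plus 11 degenerate codes).
IUPAC_CODES = set("ACGTRYSWKMBDHVN")
MAX_TEMPLATE_LENGTH = 30_000  # stated upper bound on template length


def build_request(template, temperature=None):
    """Validate a template and return a request dict (hypothetical shape)."""
    template = template.upper()  # assumption: input is case-insensitive
    if not template:
        raise ValueError("template must not be empty")
    if len(template) > MAX_TEMPLATE_LENGTH:
        raise ValueError(
            f"template has {len(template)} characters; "
            f"the maximum is {MAX_TEMPLATE_LENGTH}"
        )
    invalid = sorted(set(template) - IUPAC_CODES)
    if invalid:
        raise ValueError(f"non-IUPAC characters in template: {invalid}")

    request = {"sequence": template}  # hypothetical field name
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        request["temperature"] = temperature  # hypothetical field name
    return request


if __name__ == "__main__":
    # A start codon, six free positions, one G-or-T position, a stop codon.
    print(build_request("ATGNNNNNNKTAA", temperature=0.5))
```

Validating locally catches malformed templates before a request is sent, which is useful when templates are assembled programmatically.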
Output
The output is a dictionary with the key "sequence", whose value is the nucleotide sequence generated by diffusion sampling.
Example
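As a hedged sketch of consuming the output, the check below reads the "sequence" key and confirms that the generated sequence is consistent with the template that produced it. It assumes, without confirmation from this page, that the output has the same length as the template and that every generated base falls within the set allowed by the corresponding IUPAC code. The function name `matches_template` is hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_template(template, output):
    """Return True if output["sequence"] is a valid resolution of template.

    Assumes the generated sequence has the same length as the template
    and respects each position's IUPAC constraint.
    """
    generated = output["sequence"].upper()
    template = template.upper()
    if len(generated) != len(template):
        return False
    # Unknown template characters map to "", so they never match.
    return all(
        base in IUPAC.get(code, "")
        for code, base in zip(template, generated)
    )


if __name__ == "__main__":
    output = {"sequence": "ATGCAT"}  # illustrative value, not real model output
    print(matches_template("ATGNNK", output))
```

A check like this is cheap to run on every response and flags cases where the fixed parts of a design were altered.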