BpForms: toolkit for concretely describing non-canonical DNA, RNA, and proteins

BpForms is a toolkit for unambiguously representing the primary sequence of forms of biopolymers. By concretely representing the primary sequence of biopolymers, BpForms aims to facilitate concrete discussion about DNA modification, post-transcriptional processing, and post-translational processing; facilitate the determination of the structures of biopolymer forms; facilitate the integration of data about DNA modification, post-transcriptional processing, and post-translational processing; and enable whole-cell models that represent DNA modification, post-transcriptional processing, and post-translational processing and the functions of modified DNA, RNA, and proteins.

BpForms includes a notation for describing biopolymer forms, as well as this website, a JSON REST API, a command line interface, and a Python API for calculating properties of biopolymer forms. These tools are available open-source under the MIT license.

BpForms verifier/calculator

Enter a biopolymer form

Computed properties of the biopolymer form

BpForms notation

Overview

The BpForms notation represents biopolymers as IUPAC/IUBMB sequences augmented with (a) multiple-letter alphabet-defined monomeric forms delimited by curly brackets, (b) user-defined monomeric forms described in square brackets by one or more attributes separated by "|", and (c) additional attributes that can be used to represent circularity and crosslinks.

The structure, backbone-bond-atom, backbone-displaced-atom, left-bond-atom, left-displaced-atom, right-bond-atom, and right-displaced-atom monomeric form attributes are required to calculate the chemical formula, molecular weight, and charge. All other attributes are optional.

The definition of the BpForms grammar is available at GitHub. The grammar is defined in Lark syntax, which is based on EBNF syntax.
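The three kinds of monomeric forms described above can be illustrated with a small tokenizer. This is a simplified sketch written with Python's standard library, not the Lark grammar that BpForms uses. The name tokenize_bpform is hypothetical, and polymer-level attributes that follow the sequence are not handled.

```python
import re

# One alternative per kind of monomeric form in the notation.
# Quoted strings inside square brackets may contain brackets (e.g. SMILES).
TOKEN_PATTERN = re.compile(
    r'''
    \{(?P<alphabet>[^{}]+)\}                      # multiple-letter alphabet code
    | \[(?P<inline>(?:"[^"]*"|[^\[\]"])*)\]       # user-defined monomeric form
    | (?P<single>[A-Za-z])                        # single-letter code
    | (?P<space>\s+)                              # whitespace is skipped
    ''',
    re.VERBOSE,
)


def tokenize_bpform(sequence):
    """Return a list of (kind, text) pairs, one per monomeric form."""
    tokens = []
    position = 0
    while position < len(sequence):
        match = TOKEN_PATTERN.match(sequence, position)
        if match is None:
            raise ValueError(
                f"Unexpected character at offset {position}: {sequence[position]!r}"
            )
        kind = match.lastgroup
        if kind != "space":
            tokens.append((kind, match.group(kind)))
        position = match.end()
    return tokens


for example in ["{dI}ACGC", "AC{9A}GC", "C{m2A}G{m2C}"]:
    tokens = tokenize_bpform(example)
    print(example, "has", len(tokens), "monomeric forms:", tokens)
```

The number of tokens corresponds to the length of the biopolymer form, because each token is one monomeric form regardless of how many characters it takes to write.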

Examples

DNA

{dI}ACGC

Length: 5
Formula: C48H55N20O30P5
Molecular weight: 1546.9
Charge: -6

Deoxyinosine at the first position

RNA

AC{9A}GC

Length: 5
Formula: C48H55N20O35P5
Molecular weight: 1626.9
Charge: -6

Inosine at the third position

Protein

ACU[id: "U"]C

Length: 4
Formula: C12H22N4O5S2Se1
Molecular weight: 445.4
Charge: 0

L-selenocysteine at the third position
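The molecular weights reported in the examples above can be cross-checked from the reported formulas. The sketch below is an illustration only, not the BpForms implementation: the function names are hypothetical and the atomic weights are rounded standard values, so results agree with the reported weights only to about one decimal place.

```python
import re

# Rounded standard atomic weights for the elements that occur in the examples.
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "P": 30.974, "S": 32.06, "Se": 78.971,
}


def parse_formula(formula):
    """Convert a formula such as 'C12H22N4O5S2Se1' into {'C': 12, 'H': 22, ...}."""
    counts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        counts[element] = counts.get(element, 0) + int(count or 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic weights of all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


# Formulas and weights as reported in the DNA, RNA, and protein examples.
reported = {
    "C48H55N20O30P5": 1546.9,
    "C48H55N20O35P5": 1626.9,
    "C12H22N4O5S2Se1": 445.4,
}
for formula, weight in reported.items():
    print(formula, "reported:", weight, "recomputed:", round(molecular_weight(formula), 1))
```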

Alphabets

BpForms has several pre-built alphabets.

Examples

DNA

C{m2A}G{m2C}

Length: 4
Formula: C39H50N17O24P4
Molecular weight: 1264.8
Charge: -5

2-methyladenine at the second position and 2-methylcytosine at the fourth position

RNA

A{21C}GC

Length: 4
Formula: C44H60N18O29P4
Molecular weight: 1429.0
Charge: -4

2-lysidine at the second position

Protein

{AA0037}E{AA0038}

Length: 3
Formula: C12H18N3O14P2
Molecular weight: 490.2
Charge: -5

O-phospho-L-serine at the first position and O-phospho-L-threonine at the third position

Structures of monomeric forms

The structure monomeric form attribute describes the chemical structure of the inline monomeric form. This attribute is a SMILES-encoded string. Each monomeric form can only have one structure. This attribute is required to calculate the structure of the BpForm.

Example

The text below illustrates how to describe the modified DNA nucleobase hypoxanthine, and the image below illustrates the molecule that the text specifies. The atom labels indicate the numbers of the atoms within the molecule. These numbers can be generated with OpenBabel.

[id: "dI"
    | name: "hypoxanthine"
    | structure: "O=C1NC=NC2=C1N=CN2"
    ]

Bonds between monomeric forms and backbones

The backbone-bond-atom and backbone-displaced-atom monomeric form attributes describe the bonds between monomeric forms and their backbone. Each monomeric form can form multiple bonds with the backbone and multiple atoms can be displaced. These attributes are required to calculate the structure of the BpForm.

Example

The example below illustrates how to describe the modified DNA nucleotide deoxyinosine monophosphate. The red atoms in the image indicate the sugar-phosphate backbone.

[id: "dI"
    | name: "hypoxanthine"
    | structure: "O=C1NC=NC2=C1N=CN2"
    | backbone-bond-atom: N10
    | backbone-displaced-atom: H10
    ]

Bonds between adjacent monomeric forms

The left-bond-atom, left-displaced-atom, right-bond-atom, and right-displaced-atom monomeric form attributes describe the bonds between successive monomeric forms. Each monomeric form can have multiple bonds and multiple displaced atoms. These attributes are required to calculate the structure of the BpForm.

Example

The example below illustrates how to describe the modified protein residue N5-methyl-L-arginine. The green atoms indicate the C terminus. The blue atoms indicate the N terminus.

[id: "AA0305"
    | name: "N5-methyl-L-arginine"
    | structure: "O=C[C@H](CCCN(C(=[NH2])N)C)[NH3+]"
    | backbone-bond-atom: C2
    | backbone-displaced-atom: H2
    | right-bond-atom: C2
    | left-bond-atom: N15-1
    | left-displaced-atom: H15+1
    | left-displaced-atom: H15+1
    ]

The example below illustrates how to describe a polymer with N5-methyl-L-arginine at the third position. The green lines indicate the bonds between the successive monomeric forms. The black labels indicate the positions of monomeric forms within the sequence.

A[id: "AA0305"
    | name: "N5-methyl-L-arginine"
    | structure: "O=C[C@H](CCCN(C(=[NH2])N)C)[NH3+]"
    | backbone-bond-atom: C2
    | backbone-displaced-atom: H2
    | right-bond-atom: C2
    | left-bond-atom: N15-1
    | left-displaced-atom: H15+1
    | left-displaced-atom: H15+1
    ]A
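The attribute lists in the examples above can be read into name and value pairs with a short parser. This is a minimal sketch under stated assumptions, not the BpForms parser: the name parse_inline_form is hypothetical, values are either double-quoted strings or unquoted text, and identifiers that carry a namespace are not handled. Pairs are kept in a list rather than a dictionary because attributes such as left-displaced-atom can occur more than once.

```python
import re

# An attribute is a lowercase, hyphenated name, a colon, and either a
# double-quoted string or unquoted text that ends at the next "|" or "]".
ATTRIBUTE_PATTERN = re.compile(
    r'([a-z][a-z-]*)\s*:\s*("[^"]*"|[^|\]\s][^|\]]*)'
)


def parse_inline_form(text):
    """Return the attributes of one square-bracket form as (name, value) pairs."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("An inline monomeric form must be enclosed in square brackets")
    attributes = []
    for name, value in ATTRIBUTE_PATTERN.findall(inner[1:-1]):
        # Remove surrounding whitespace and quotation marks from the value.
        attributes.append((name, value.strip().strip('"')))
    return attributes


EXAMPLE = '''[id: "AA0305"
    | name: "N5-methyl-L-arginine"
    | structure: "O=C[C@H](CCCN(C(=[NH2])N)C)[NH3+]"
    | backbone-bond-atom: C2
    | backbone-displaced-atom: H2
    | right-bond-atom: C2
    | left-bond-atom: N15-1
    | left-displaced-atom: H15+1
    | left-displaced-atom: H15+1
    ]'''

for name, value in parse_inline_form(EXAMPLE):
    print(f"{name}: {value}")
```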

Bonds that create circular polymers

The circular polymer attribute indicates that there is a bond between the last and first monomeric forms, creating a circular polymer.

Example

The example below illustrates how to describe the circular DNA dimer of the DNA nucleotides deoxyadenosine monophosphate and deoxycytidine monophosphate. The red atoms indicate the sugar-phosphate backbones. The green lines indicate the bonds between successive sugar-phosphate backbones. The black labels indicate the positions of monomeric forms within the sequence.

AC | circular
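Polymer-level attributes such as circular follow the sequence after a "|". Because "|" also separates attributes inside square brackets, a reader has to split only on the separators that lie outside brackets and quoted strings. The sketch below shows one way to do this; the function names are hypothetical and this is not the BpForms parser.

```python
def split_polymer_attributes(form):
    """Split 'sequence | attribute | ...' on '|' outside brackets and quotes."""
    parts, current = [], []
    depth = 0          # nesting level of square brackets
    in_quotes = False  # True while inside a double-quoted string
    for char in form:
        if char == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if char == "[":
                depth += 1
            elif char == "]":
                depth -= 1
            elif char == "|" and depth == 0:
                # Top-level separator: close the current part.
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(char)
    parts.append("".join(current).strip())
    return parts[0], parts[1:]


def is_circular(form):
    """Report whether the form carries the polymer-level 'circular' attribute."""
    _, attributes = split_polymer_attributes(form)
    return "circular" in attributes


print(split_polymer_attributes("AC | circular"))
print(is_circular("AC | circular"), is_circular("AC"))
```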

Crosslinks between monomeric forms

The crosslink polymer attribute indicates that there is a bond between two non-adjacent monomeric forms. Polymers can have zero, one, or more crosslinks. This attribute can be used to describe intrastrand crosslinks in DNA, disulfide bonds between cysteines in proteins, and other bonds.

Example

The example below illustrates how to describe a tripeptide with a disulfide bond. The blue line indicates the disulfide bond (crosslink). The green lines indicate the bonds between the successive residues. The black labels indicate the positions of monomeric forms within the sequence.

CAC | crosslink: [
    left-bond-atom: 1S1 |
    left-displaced-atom: 1H1 |
    right-bond-atom: 3S1 |
    right-displaced-atom: 3H1
]
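The atom references used in the bond and crosslink attributes, such as 1S1, 3H1, N15-1, and H15+1, can be decomposed with a regular expression. The sketch below assumes the following reading of the examples: an optional leading number is the position of the monomeric form within the sequence (crosslinks only), followed by the element, the number of the atom within the monomeric form, and an optional signed change in charge. The names AtomSpec and parse_atom are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class AtomSpec(NamedTuple):
    monomer: Optional[int]  # position of the monomeric form (crosslinks only)
    element: str            # element symbol, e.g. "S"
    position: int           # number of the atom within the monomeric form
    charge: int             # signed charge component, 0 if omitted


ATOM_PATTERN = re.compile(r"^(\d+)?([A-Z][a-z]?)(\d+)([+-]\d+)?$")


def parse_atom(text):
    """Parse an atom reference such as '1S1' or 'N15-1'."""
    match = ATOM_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Not a recognizable atom reference: {text!r}")
    monomer, element, position, charge = match.groups()
    return AtomSpec(
        int(monomer) if monomer else None,
        element,
        int(position),
        int(charge) if charge else 0,
    )


# The disulfide bond from the crosslink example above.
for reference in ["1S1", "1H1", "3S1", "3H1"]:
    print(reference, parse_atom(reference))
```

Under this reading, the crosslink example bonds sulfur atom 1 of the first cysteine to sulfur atom 1 of the third cysteine and displaces one hydrogen from each.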

Uncertainty about the primary sequence

BpForms can represent two types of uncertainty in the primary sequences of biopolymer forms.

  • The delta-mass and delta-charge monomeric form attributes describe uncertainty in the chemical identity of the monomeric form.
  • The position monomeric form attribute describes uncertainty in the position of the monomeric form within the sequence.

Examples

  • [id: "dAMP" | delta-mass: 1 | delta-charge: 1]: indicates the presence of an additional proton whose exact location is not known.
  • [id: "dI" | position: 2-3]: indicates that deoxyinosine may occur anywhere between the second and third position.
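The two kinds of uncertainty can be sketched in a few lines. The code below is an illustration under stated assumptions: the function names are hypothetical, a position range is taken to be inclusive and 1-based, and delta-mass and delta-charge are taken to be added to the mass and charge calculated for the known part of the monomeric form. The numeric inputs in the usage lines are placeholders, not values from BpForms.

```python
def candidate_positions(position_range):
    """Expand a range such as '2-3' into the positions it covers."""
    start, end = (int(part) for part in position_range.split("-"))
    if start > end:
        raise ValueError("The start of a position range must not exceed its end")
    return list(range(start, end + 1))


def apply_deltas(mass, charge, delta_mass=0, delta_charge=0):
    """Add the uncertain extra mass and charge to the calculated values."""
    return mass + delta_mass, charge + delta_charge


print(candidate_positions("2-3"))
# Placeholder mass and charge, adjusted by delta-mass: 1 and delta-charge: 1.
print(apply_deltas(100.0, -1, delta_mass=1, delta_charge=1))
```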

Metadata about monomeric forms

BpForms can represent several types of metadata about monomeric forms.

  • The id and name monomeric form attributes are human-readable labels for monomeric forms. Only one id and one name are allowed per monomeric form.
  • The synonym monomeric form attribute is an additional human-readable label. Monomeric forms can have multiple synonyms.
  • The identifier monomeric form attribute indicates entries in databases and ontologies which are equivalent to the monomeric form. Monomeric forms can have multiple identifiers. The id and namespace of each identifier must be separated by "@", as in the examples below.
  • The comments monomeric form attribute describes additional information about the monomeric form. Monomeric forms can only have one comment.

Examples

  • [id: "dI" | name: "deoxyinosine"]: represents the id and name of deoxyinosine.
  • [id: "dI" | synonym: "deoxyinosine" | synonym: "2'-deoxyinosine"]: represents multiple synonyms of deoxyinosine.
  • [id: "dI" | identifier: "CHEBI:28997" @ "chebi" | identifier: "65058" @ "pubchem.compound"]: represents equivalent entries in ChEBI and PubChem to deoxyinosine.
  • [id: "dI" | comments: "A purine 2'-deoxyribonucleoside that is inosine ..."]: represents comments about deoxyinosine.
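The identifier and synonym attributes can be collected from an inline monomeric form with regular expressions. The sketch below follows the syntax of the examples above, in which each identifier is written as a quoted id, an "@", and a quoted namespace. The function names are hypothetical and this is not the BpForms parser.

```python
import re

# identifier: "<id>" @ "<namespace>"
IDENTIFIER_PATTERN = re.compile(r'identifier\s*:\s*"([^"]*)"\s*@\s*"([^"]*)"')
# synonym: "<label>"
SYNONYM_PATTERN = re.compile(r'synonym\s*:\s*"([^"]*)"')


def extract_identifiers(inline_form):
    """Return (namespace, id) pairs in the order in which they appear."""
    return [
        (namespace, identifier)
        for identifier, namespace in IDENTIFIER_PATTERN.findall(inline_form)
    ]


def extract_synonyms(inline_form):
    """Return all synonyms in the order in which they appear."""
    return SYNONYM_PATTERN.findall(inline_form)


form = '[id: "dI" | identifier: "CHEBI:28997" @ "chebi" | identifier: "65058" @ "pubchem.compound"]'
print(extract_identifiers(form))
print(extract_synonyms('[id: "dI" | synonym: "deoxyinosine" | synonym: "2\'-deoxyinosine"]'))
```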

Resources for reconstructing biopolymer forms

DNA resources

  • DNAMod : Database of non-canonical DNA nucleobases
  • MethDB : Database of non-canonical DNA
  • MethSMRT : Database of non-canonical DNA
  • REPAIRtoire : Database of DNA damages

RNA resources

  • MODOMICS : Database of non-canonical RNA nucleosides
  • RMBase : Database of modified RNA
  • RNA Modification Database : Database of modified RNA

Drawing structures of monomeric forms

  • ChemAxon Marvin : Software for drawing structures of monomeric forms
  • OpenBabel : Software for calculating the numbers of the atoms in monomeric forms

Protein resources

  • dbPTM : Database of non-canonical amino acids
  • Delta Mass : Database of modified amino acids
  • FindMod : Database of post-translational modifications
  • iPTMnet : Database of post-translational modifications
  • ProForma : Notation for protein forms. Note that this notation is ambiguous, which limits its ability to facilitate data integration and the calculation of properties of protein forms.
  • PDB Chemical Components : Database of modified amino acids
  • PDB in Europe Chemical Components : Database of modified amino acids
  • PhosphoSitePlus : Database of protein phosphorylations
  • Protein Ontology : Database of modified proteins
  • PSIMOD : Ontology of non-canonical amino acids
  • RESID : Database of non-canonical protein residues
  • UniMod : Database of non-canonical amino acids
  • UniProt : Database of modified amino acids in proteins
  • UniProt Controlled Vocabulary of Posttranslational Modifications : Database of modified amino acids

Use cases for epigenomics, proteomics, systems biology, synthetic biology, and transcriptomics

Epigenomics

BpForms can help researchers precisely communicate the structures of modified DNA such as methylations that bacteria use to distinguish self from non-self DNA. We anticipate that this will be increasingly important as researchers continue to discover new types of modifications and begin to investigate their impact on the interactions of proteins with DNA.

Example

Several chemotherapeutics, such as cisplatin, have toxic side effects due to damaging the DNA of healthy cells. Cells have several pathways with overlapping functions to repair DNA damage. This includes direct repair, base excision repair, nucleotide excision repair, and homologous recombination. Because chemotherapeutics cause a wide range of damage and because cells have several pathways to repair DNA damage, it remains difficult to assemble an integrated understanding of the repair of DNA damage caused by chemotherapeutics. BpForms could help researchers develop an integrated understanding of DNA repair by helping researchers concretely communicate the damage caused by each chemotherapeutic and the types of damage repaired by each pathway.

Proteomics

Example

Systems biology

BpForms can help modelers describe the semantic meaning of models by precisely describing the species in models. Importantly, this precision makes it easier for other researchers to understand, reuse, extend, and compose models for other studies. BpForms can also help modelers build more comprehensive models by helping identify gaps in models such as missing intermediate modification states of proteins and missing interactions between modification states. In particular, BpForms can help modelers identify the full combinatorial complexity of biochemistry that needs to be modeled. In addition, BpForms can help researchers increase the quality of models by helping identify errors such as element imbalances.

Example

The Kholodenko model of the eukaryotic MAPK signaling cascade (DOI: 10.1046/j.1432-1327.2000.01197.x, BioModels: BIOMD0000000010) represents the biphosphorylation of Mek1/MAPKK by Mos/MAPKKK and the biphosphorylation of Erk2/MAPK by Mek1/MAPKK. Annotating the structures of the species in the model with BpForms enabled us to identify two gaps in the model: two additional intermediate phosphorylation forms of Mek1 and Erk2 and the reactions involving these species. BpForms also enabled us to identify several unbalanced reactions due to the lack of representation of phosphate donors.
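Once every species has a chemical formula, element imbalances of this kind can be found mechanically. The sketch below is an illustration only, not the analysis applied to the Kholodenko model: the function names are hypothetical, and the formulas are those of the neutral free molecules L-serine, O-phospho-L-serine, ATP, and ADP rather than species from the model. It shows that a phosphorylation written without a phosphate donor does not balance, while the same reaction with ATP and ADP does.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Convert a formula such as 'C3H8NO6P' into a Counter of elements."""
    counts = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(count or 1)
    return counts


def element_imbalance(reactants, products):
    """Return {element: products minus reactants} for elements that do not balance."""
    left, right = Counter(), Counter()
    for formula in reactants:
        left.update(parse_formula(formula))
    for formula in products:
        right.update(parse_formula(formula))
    return {
        element: right[element] - left[element]
        for element in sorted(set(left) | set(right))
        if right[element] != left[element]
    }


SERINE = "C3H7NO3"
PHOSPHOSERINE = "C3H8NO6P"
ATP = "C10H16N5O13P3"
ADP = "C10H15N5O10P2"

# Without a phosphate donor, hydrogen, oxygen, and phosphorus appear from nowhere.
print(element_imbalance([SERINE], [PHOSPHOSERINE]))
# With ATP as the donor and ADP as the product, every element balances.
print(element_imbalance([SERINE, ATP], [PHOSPHOSERINE, ADP]))
```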

Synthetic biology

BpForms can help engineers precisely represent and communicate the structures of parts for synthetic organisms. In addition, BpForms could help engineers identify the dependencies and interfaces of parts which, in turn, could help engineers more reliably use parts in alternative hosts, compose parts, and share parts.

Example

E. coli pyruvate dehydrogenase requires lipoate ligation at L43 of the active site of the E1 subunit (UniProt: P96104). By representing lipoate ligation of the E1 subunit, BpForms can help capture the dependence of E. coli pyruvate dehydrogenase on lipoate ligase. In turn, this could help engineers recognize that E. coli pyruvate dehydrogenase can only be used in other hosts that have a lipoate ligase, or that a lipoate ligase, such as LplA (UniProt: P32099), must be co-transformed with E. coli pyruvate dehydrogenase.

Transcriptomics

BpForms can help researchers precisely communicate the sequences of rRNA, tRNA, and other non-coding RNA; analyze RNA modifications; and improve the quality of reported sequences by identifying errors in the descriptions of modified RNA such as undefined monomeric forms and inconsistent bonds (e.g., 3' caps that are not located at the 3' position).

Example: Analysis of the rRNA and tRNA in MODOMICS

MODOMICS contains 732 curated sequences of rRNA and tRNA. We used BpForms together with MODOMICS to assess the metabolic cost of RNA modification in Escherichia coli. We found that E. coli tRNA have 7.8 ± 2.2 modifications that increase their mass by 177.5 ± 108.0 Da and charge by 0.62 ± 0.76. This analysis also led us to add missing information about the origin of several of the monomeric forms derived from MODOMICS and correct three types of errors in the MODOMICS RNA sequences. The code is available at GitHub.
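A statistic such as the number of modifications per tRNA can be sketched by counting the curly-bracket monomeric forms in each sequence. The code below is not the analysis code referred to above. It is a minimal illustration that assumes every curly-bracket code denotes a modified nucleoside and ignores inline forms in square brackets. Its input is the two B. subtilis tRNA sequences that appear in the BioPAX and SBOL examples in this document, not the E. coli sequences from MODOMICS, so its output is not comparable to the figures reported above.

```python
import re
import statistics

# A curly-bracket code such as {8U} is counted as one modification.
MODIFIED_FORM = re.compile(r"\{[^{}]+\}")


def count_modifications(sequence):
    """Count the curly-bracket monomeric forms in a BpForms sequence."""
    return len(MODIFIED_FORM.findall(sequence))


SEQUENCES = {
    "B. subtilis tRNA (BioPAX example)": (
        "GGAGCCUUAGCUCAGC{8U}GGGAGAGCGCCUGCUU{501U}GC{6A}CGCAGGAG{7G}"
        "UCAGCGG{5U}{9U}CGAUCCCGCUAGGCUCCACCA"
    ),
    "B. subtilis tRNA (SBOL example)": (
        "GGGCCUGUAGCUCAGC{8U}GG{8U}{8U}AGAGCGCACGCCUGAU{62A}AGCGUGAG{7G}"
        "UCGAUGG{5U}{9U}CGAGUCCAUUCAGGCCCACCA"
    ),
}

counts = [count_modifications(sequence) for sequence in SEQUENCES.values()]
print("modifications per sequence:", counts)
print("mean:", statistics.mean(counts), "sample standard deviation:", statistics.stdev(counts))
```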

Integrating BpForms into the BioPAX, CellML, SBML, and SBOL standards

Knowledge of pathways (BioPAX)

BpForms can be used with the sequence child of DNAReference, RNAReference, and ProteinReference objects of the BioPAX format to concretely describe all of the DNA, RNA, and proteins involved in pathways.

Examples

DNA: E. coli K-12 MG1655 Dam 6-methyladenine sites (701..914) involved in host recognition

...
  <bp:DNA>
    <bp:entityReference>
      <bp:DNAReference>
        <bp:sequence
          rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
          rdf:about="http://edamontology.org/format_3909#dna">
          ...
          TGATTTGCCGTGGCGAGAAAATGTCG{a}TCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAAC
          GTTACTGTTATCG{a}TCCGGTCGAAAAACTGCTGGCAGTGGGGCATTACCTCGAATCTACCGTCGATAT
          TGCTGAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTG{a}TCACATGGTGCTGATGGCAGGTT
          ...
        </bp:sequence>
      </bp:DNAReference>
    </bp:entityReference>
  </bp:DNA>
...

RNA: Modifications of B. subtilis tRNAUGC involved in stability

...
  <bp:RNA>
    <bp:entityReference>
      <bp:RNAReference>
        <bp:sequence
          rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
          rdf:about="http://edamontology.org/format_3909#rna">
          GGAGCCUUAGCUCAGC{8U}GGGAGAGCGCCUGCUU{501U}GC{6A}CGCAGGAG{7G}UCAGCGG{5U}{9U}CGAUCCCGCUAGGCUCCA
          CCA
        </bp:sequence>
      </bp:RNAReference>
    </bp:entityReference>
  </bp:RNA>
...

Protein: Modifications of H. sapiens MAPK3 involved in signaling

...
  <bp:Protein>
    <bp:entityReference>
      <bp:ProteinReference>
        <bp:sequence
          rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
          rdf:about="http://edamontology.org/format_3909#protein">
          M{AA0041}AAAAQGGGGGEPRRTEGVGPGVPGEVEMVKGQPFDVGPRYTQLQYIGEGAYGMVSSAYDHVRKTRVAIKKISPFEHQTYCQRTL
          REIQILLRFRHENVIGIRDILRASTLEAMRDVYIVQDLMETDLYKLLKSQQLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLINTTCD
          LKICDFGLARIADPEHDH{AA0038}GFL{AA0038}E{AA0039}VA{AA0038}RWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPI
          FPGKHYLDQLNHILGILGSPSQEDLNCIINMKARNYLQSLPSKTKVAWAKLFPKSDSKALDLLDRMLTFNPNKRITVEEALAHPYLEQYYDPT
          DEPVAEEPFTFAMELDDLPKERLKELIFQETARFQPGVLEAP
        </bp:sequence>
      </bp:ProteinReference>
    </bp:entityReference>
  </bp:Protein>
...

Kinetic models (SBML)

BpForms can be used with the annotation element of species objects of the Systems Biology Markup Language (SBML) to concretely describe the meaning of each species in a model.

Examples

Protein: Phosphorylated Cdc2 and Cdc12 in the yeast cell cycle (DOI: 10.1073/pnas.88.16.7328, BioModels: BIOMD0000000005). See complete SBML file.

...
  <species name="cdc2k-p" metaid="cdc2k">
    <annotation>
      <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="#cdc2k-p">
          <bpforms:seq xmlns:bpforms="https://bpforms.org/protein">
            MENYQKVEKIGEG{AA0038}{AA0039}GVVYKARHKLSGRIVAMKKIRLEDESEGVPSTAIREISLLKEVNDENNRSNCVRLLDI
            LHAESKLYLVFEFLDMDLKKYMDRISETGATSLDPRLVQKFTYQLVNGVNFCHSRRIIHRDLKPQNLLIDKEGNLKLADFGLARSFGVPLRN
            Y{AA0038}HEIVTLWYRAPEVLLGSRHYSTGVDIWSVGCIFAEMIRRSPLFPGDSEIDEIFKIFQVLGTPNEEVWPGVTLLQDYKSTFPRW
            KRMDLHKVVPNGEEDAIELLSAMLVYDPAHRISAKRALQQNYLRDFH
          </bpforms:seq>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
  </species>
...

Protein: Phosphorylated Mos/Raf1, Mek1, and Erk2 in the eukaryote MAPK cascade (DOI: 10.1046/j.1432-1327.2000.01197.x, BioModels: BIOMD0000000010). See complete SBML file.

...
  <species name="Erk2-PP" metaid="_584615">
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="#_584615">
          <bpforms:seq xmlns:bpforms="https://bpforms.org/protein">
            MAAAGAASNPGGGPEMVRGQAFDVGPRYINLAYIGEGAYGMVCSAHDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFKHENIIGINDI
            IRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHT
            GFL{AA0038}E{AA0039}VATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLK
            ARNYLLSLPHKNKVPWNRLFPNADPKALDLLDKMLTFNPHKRIEVEAALAHPYLEQYYDPSDEPVAEAPFKFEMELDDLPKETLKELIFEET
            ARFQPGY
          </bpforms:seq>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
  </species>
...
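An annotation of the shape shown in the SBML examples above can be generated with Python's standard library. The sketch below is a hedged illustration, not part of BpForms: the function name build_annotation is hypothetical, the sequence passed in is a short fragment of the Erk2 example rather than the full protein, and a complete SBML file would place the annotation element inside a species element in the SBML namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BPFORMS_PROTEIN_NS = "https://bpforms.org/protein"

# Use the same prefixes as the examples above when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bpforms", BPFORMS_PROTEIN_NS)


def build_annotation(metaid, sequence):
    """Build <annotation> with an RDF description that holds a BpForms sequence."""
    annotation = ET.Element("annotation")
    rdf = ET.SubElement(annotation, f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"#{metaid}"},
    )
    seq = ET.SubElement(description, f"{{{BPFORMS_PROTEIN_NS}}}seq")
    seq.text = sequence
    return annotation


# Fragment of the Erk2 sequence from the example above, for illustration only.
xml_text = ET.tostring(
    build_annotation("_584615", "GFL{AA0038}E{AA0039}VA"),
    encoding="unicode",
)
print(xml_text)
```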

Kinetic models (CellML)

BpForms can be used with the RDF element of component objects in CellML to concretely describe the meaning of each component.

Example

Protein: Phosphorylated Erk and Mek in signal transduction (DOI: 10.1038/msb.2009.4, Physiome Model Repository). See complete CellML file.

...
  <component cmeta:id="ypp" name="ypp">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="#ypp">
          <bpforms:seq xmlns:bpforms="https://bpforms.org/protein">
            MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRLEAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSG
            LVMARKLIHLEIKPAIRNQIIRELQVLHECNSPYIVGFYGAFYSDGEISICMEHMDGGSLDQVLKKAGRIPEQILGKVSIAVIKGLTYLRE
            KHKIMHRDVKPSNILVNSRGEIKLCDFGVSGQLID{AA0037}MAN{AA0037}FVGTRSYMSPERLQGTHYSVQSDIWSMGLSLVEMAVG
            RYPIPPPDAKELELMFGCQVEGDAAETPPRPRTPGRPLSSYGMDSRPPMAIFELLDYIVNEPPPKLPSGVFSLEFQDFVNKCLIKNPAERA
            DLKQLMVHAFIKRSDAEEVDFAGWLCSTIGLNQPSTPTHAAGV
          </bpforms:seq>
        </rdf:Description>
      </rdf:RDF>
  </component>
...

Designs for synthetic parts (SBOL)

BpForms can be used with the elements attribute of Sequence objects to concretely describe the meaning of each DNA, RNA, and protein molecule in genetic designs encoded in the Synthetic Biology Open Language (SBOL).

The following URIs should be used to indicate the encodings for the sequences of DNA, RNA, and protein molecules.

  • DNA: http://edamontology.org/format_3909#dna
  • RNA: http://edamontology.org/format_3909#rna
  • Protein: http://edamontology.org/format_3909#protein

See SBOL SEP 033 and issue 77 for more information.

Examples

RNA: Modified B. subtilis tRNAILE 69 (SynBioHub: BO_28687). See complete SBOL file.

...
  <sbol:Sequence>
    <sbol:elements>
      GGGCCUGUAGCUCAGC{8U}GG{8U}{8U}AGAGCGCACGCCUGAU{62A}AGCGUGAG{7G}UCGAUGG{5U}{9U}CGAGUCCAUUCAGGCCCACCA
    </sbol:elements>
    <sbol:encoding rdf:resource="http://edamontology.org/format_3909#rna"/>
  </sbol:Sequence>
...

Protein: Lipoate-ligated acetyltransferase component PdhC of B. subtilis pyruvate dehydrogenase complex (SynBioHub: BO_32431). See complete SBOL file.

...
  <sbol:Sequence>
    <sbol:elements>
      MAFEFKLPDIGEGIHEGEIVKWFVKPNDEVDEDDVLAEVQND{AA0118}AVVEIPSPVKGKVLELKVEEGTVATVGQTIITFDAPGYEDLQFKGSDE
      SDDAKTEAQVQSTAEAGQDVAKEEQAQEPAKATGAGQQDQAEVDPNKRVIAMPSVRKYAREKGVDIRKVTGSGNNGRVVKEDIDSFVNGGAQEAAPQE
      TAAPQETAAKPAAAPAPEGEFPETREKMSGIRKAIAKAMVNSKHTAPHVTLMDEVDVTNLVAHRKQFKQVAADQGIKLTYLPYVVKALTSALKKFPVL
      NTSIDDKTDEVIQKHYFNIGIAADTEKGLLVPVVKNADRKSVFEISDEINGLATKAREGKLAPAEMKGASCTITNIGSAGGQWFTPVINHPEVAILGI
      GRIAEKAIVRDGEIVAAPVLALSLSFDHRMIDGATAQNALNHIKRLLNDPQLILMEA
    </sbol:elements>
    <sbol:encoding rdf:resource="http://edamontology.org/format_3909#protein"/>
  </sbol:Sequence>
...
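A consumer of such SBOL files can use the encoding URI to decide which alphabet a sequence is written in. The sketch below is an illustration with Python's standard library, not part of BpForms or of an SBOL library: the function name read_sequences is hypothetical, the SBOL namespace http://sbols.org/v2# is an assumption about the SBOL version in use, and the embedded document is a shortened stand-in for the RNA example above.

```python
import xml.etree.ElementTree as ET

SBOL_NS = "http://sbols.org/v2#"  # assumption: SBOL 2 namespace
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Encoding URIs listed above, mapped to the alphabet they indicate.
ENCODING_TO_ALPHABET = {
    "http://edamontology.org/format_3909#dna": "dna",
    "http://edamontology.org/format_3909#rna": "rna",
    "http://edamontology.org/format_3909#protein": "protein",
}


def read_sequences(sbol_xml):
    """Return (alphabet, sequence) pairs for every Sequence object in the document."""
    root = ET.fromstring(sbol_xml)
    results = []
    for sequence in root.iter(f"{{{SBOL_NS}}}Sequence"):
        elements = sequence.findtext(f"{{{SBOL_NS}}}elements", default="")
        encoding = sequence.find(f"{{{SBOL_NS}}}encoding")
        uri = encoding.get(f"{{{RDF_NS}}}resource") if encoding is not None else None
        # Join wrapped lines; the alphabet is None for an unrecognized encoding.
        results.append((ENCODING_TO_ALPHABET.get(uri), "".join(elements.split())))
    return results


EXAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sbol="http://sbols.org/v2#">
  <sbol:Sequence>
    <sbol:elements>
      GGGCCUGUAGCUCAGC{8U}GG
    </sbol:elements>
    <sbol:encoding rdf:resource="http://edamontology.org/format_3909#rna"/>
  </sbol:Sequence>
</rdf:RDF>
"""

print(read_sequences(EXAMPLE))
```

Removing all whitespace is safe for sequences that use only single-letter and curly-bracket codes, as in the examples above. Sequences with inline forms in square brackets may contain quoted text in which spaces are significant, so they would need more careful handling.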

BpForms interfaces

Webform

The webform above can be used to validate BpForms and calculate their properties.

JSON REST API

The BpForms JSON REST API is available at https://bpforms.org/api.

Command line interface

The BpForms command line interface is available from PyPI.

Python package

The BpForms Python package is available from PyPI.

Source code

BpForms is available open-source from GitHub.

Tutorials, documentation, and help

Documentation for the notation

Documentation for the notation is available above. The grammar for the notation is available at GitHub.

Definitions of the monomeric forms in the alphabets

Documentation for the alphabets is available above. This includes images and detailed information about each monomeric form in each alphabet.

Query builder for the REST API

A visual interface for building REST queries is available at bpforms.org/api.

Documentation for the REST API

Documentation for the REST API is available at bpforms.org/api.

Documentation for the command line program

Documentation for the command line program is available inline by running bpforms --help.

Tutorial for the Python API

A Jupyter notebook with an interactive tutorial is available at sandbox.karrlab.org.

Documentation for the Python API

Detailed documentation for the Python API is available at docs.karrlab.org.

Questions

Please contact the Karr Lab with any questions.

Contributing to BpForms

Contributing to the alphabets

The alphabets are defined by YAML files (see GitHub). To contribute to the alphabets or to contribute a new alphabet, please edit or create a YAML file and submit your changes via a Git pull request.

Contributing to the software

To contribute to the software, please submit a Git pull request.

About BpForms

Citing BpForms

Lang PF, Chebaro Y & Karr JR. BpForms: a toolkit for concretely describing modified DNA, RNA and proteins. arXiv:1903.10042.

Team

BpForms was developed by Jonathan Karr, Yassmine Chebaro, and Paul Lang in the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA.

Acknowledgements

BpForms was supported by a National Institutes of Health P41 award, a National Institutes of Health MIRA R35 award, and a National Science Foundation INSPIRE award.

License

BpForms is released under the MIT license.

Questions/comments

Please contact Jonathan Karr with any questions or comments.