Upload a FASTA file with multiple sequences for batch prediction. Maximum 100 sequences per batch.
Cross-taxon generalization: how well the model transfers to each taxonomic group. gen_ratio = taxon micro-Fmax ÷ insect test micro-Fmax (insect baseline ≈ 0.953). Target: ≥ 0.90 (strong), ≥ 0.85 (acceptable). CAFA5-protocol metrics shown where available.
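The gen_ratio definition and its two targets can be sketched in a few lines. This is a minimal illustration, not the app's actual code: the function names and the example micro-Fmax of 0.86 are hypothetical, while the 0.953 baseline and the 0.90 / 0.85 cut-offs come from the text above.

```python
INSECT_BASELINE = 0.953  # insect test micro-Fmax quoted above


def gen_ratio(taxon_fmax: float, insect_fmax: float = INSECT_BASELINE) -> float:
    """Return taxon micro-Fmax divided by the insect test micro-Fmax."""
    if insect_fmax <= 0:
        raise ValueError("insect_fmax must be positive")
    return taxon_fmax / insect_fmax


def rate_generalization(ratio: float) -> str:
    """Map a gen_ratio onto the targets: >= 0.90 strong, >= 0.85 acceptable."""
    if ratio >= 0.90:
        return "strong"
    if ratio >= 0.85:
        return "acceptable"
    return "below target"


# Hypothetical taxon with micro-Fmax 0.86 (illustrative value only).
ratio = gen_ratio(0.86)
print(f"gen_ratio = {ratio:.3f} ({rate_generalization(ratio)})")
```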
Author
ProtFunc was designed, trained, and deployed by Siddhant Bhat as sole author. Model architecture, training data curation, evaluation against CAFA5 benchmarks, and webapp implementation are all original work.
What is ProtFunc?
ProtFunc predicts Gene Ontology Molecular Function (GO:MF) terms for protein sequences. It uses ESM-2, a protein language model trained on hundreds of millions of sequences, to generate embeddings that are passed through a trained classifier with attention pooling.
How to Use
- Paste your sequence in plain amino acid or FASTA format in the input box
- Multiple sequences can be submitted at once
- Maximum sequence length: 2500 amino acids
- Click Predict Functions or press Cmd/Ctrl + Enter
- For 3D structure + saliency: enter a UniProt accession (e.g. P04637), then click View Saliency
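Before submitting a batch, it can help to check the input against the stated limits locally. The sketch below is one possible approach and is not taken from ProtFunc itself: `parse_fasta` and `validate_batch` are hypothetical helper names, and the sketch assumes the 100-sequence batch limit and the 2500-residue length limit apply to every submission.

```python
MAX_SEQUENCES = 100  # batch limit stated for FASTA uploads
MAX_LENGTH = 2500    # maximum sequence length in amino acids


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse plain or FASTA-formatted text into (name, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far, if any.
            if header is not None or chunks:
                records.append((header or "unnamed", "".join(chunks)))
            header, chunks = line[1:].strip() or "unnamed", []
        else:
            chunks.append(line.upper())
    if header is not None or chunks:
        records.append((header or "unnamed", "".join(chunks)))
    return records


def validate_batch(records: list[tuple[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means the batch fits."""
    errors = []
    if len(records) > MAX_SEQUENCES:
        errors.append(f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}")
    for name, seq in records:
        if not seq:
            errors.append(f"{name}: empty sequence")
        elif len(seq) > MAX_LENGTH:
            errors.append(f"{name}: length {len(seq)} exceeds {MAX_LENGTH}")
    return errors


batch = parse_fasta(">seq1 example\nMKTAYIAK\nQRQISFVK\n>seq2\nMSSHEGGK\n")
print(batch, validate_batch(batch))
```

A sequence pasted without a header line is kept under the placeholder name "unnamed".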
Confidence Levels
- High (75%+) — Strong prediction, reliable
- Medium (55–75%) — Moderate confidence, hidden by default
- Low (<55%) — Uncertain, use as supplementary signal only
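The three tiers can be expressed as a simple threshold function. This is a sketch under the assumption that each prediction carries a probability-like score in [0, 1]. The function name and the example scores are hypothetical. Boundary handling follows the labels above: exactly 75% counts as High, and exactly 55% counts as Medium.

```python
def confidence_level(score: float) -> str:
    """Bucket a score in [0, 1] into the High / Medium / Low tiers."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= 0.75:
        return "High"
    if score >= 0.55:
        return "Medium"
    return "Low"


# Hypothetical scores for two GO:MF terms (illustrative values only).
predictions = {"GO:0003677": 0.91, "GO:0005524": 0.62}
for term, score in predictions.items():
    print(term, f"{score:.0%}", confidence_level(score))
```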
Model Information
ProtFunc uses ESM-2 (esm2_t12_35M, 480d embeddings) + 11 physicochemical features feeding an ImprovedResidualMLP (2048-hidden, 8 ResBlocks, ~4,200 GO-MF outputs). The model is fine-tuned jointly on insect and mammal proteins to support cross-taxon generalization.
CAFA Benchmark Compliance
Evaluations follow the CAFA5 protocol (protein-centric Fmax, AUPR, Smin, coverage). The unified model achieves micro-Fmax 0.939 (insect) and 0.804 (mammal, enriched GOA labels).
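For readers unfamiliar with the metric, protein-centric Fmax can be sketched as follows. This is a simplified illustration rather than ProtFunc's evaluation code. It assumes predictions are given as per-protein term scores and that labels are already propagated through the GO hierarchy. The function name and toy data are hypothetical. It averages over proteins, so it does not reproduce the micro-averaged figures quoted above.

```python
def protein_centric_fmax(predictions, truth, thresholds=None):
    """Best F1 over score thresholds, averaging precision/recall per protein.

    predictions: {protein: {term: score}}; truth: {protein: set of terms}.
    Precision is averaged over proteins with at least one prediction at the
    threshold; recall is averaged over all proteins that have true terms.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            if not true_terms:
                continue
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            recalls.append(hits / len(true_terms))
            if predicted:
                precisions.append(hits / len(predicted))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy data: placeholder protein and term names, not real annotations.
truth = {"P1": {"A", "B"}, "P2": {"C"}}
predictions = {"P1": {"A": 0.9, "B": 0.4, "X": 0.3}, "P2": {"C": 0.8}}
print(protein_centric_fmax(predictions, truth))
```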
Taxon Coverage
Research Roadmap
- AlphaFold embeddings — Augment ESM-2 with structural features from AF2 pLDDT + contact maps for better specificity on rare MF terms
- Fish and bird fine-tuning — Extend the multi-taxon corpus with fish (Danio, Oreochromis) and avian (Gallus) reviewed SwissProt entries
- Transformer pooling — Replace mean-pool with a learnable attention pooler over ESM-2 residue tokens
- CAFA5 macro-Fmax — Target macro-Fmax improvement via threshold-free methods (AUROC ranking) for rare GO leaf terms
- All Metazoa — Scale to full TrEMBL Metazoa with semi-supervised label propagation via GO DAG
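To make the "Transformer pooling" item concrete, the sketch below contrasts mean pooling with a single-query attention pooler over per-residue vectors. It is written in plain Python for readability and is only an assumption about what such a pooler could look like. The function names are hypothetical, the query vector stands in for a learned parameter, and the 2-dimensional tokens are toy values rather than real ESM-2 embeddings.

```python
import math


def mean_pool(tokens):
    """Average per-residue vectors into one fixed-size vector."""
    n, dim = len(tokens), len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


def attention_pool(tokens, query):
    """Softmax-weighted average of residue vectors.

    Each residue is scored by its scaled dot product with `query`; in a
    trained model the query would be a learned parameter.
    """
    scale = math.sqrt(len(query))
    logits = [sum(q * x for q, x in zip(query, tok)) / scale for tok in tokens]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


tokens = [[1.0, 0.0], [0.0, 1.0]]           # two toy residue vectors
print(mean_pool(tokens))                     # every residue weighted equally
print(attention_pool(tokens, [10.0, 0.0]))   # query favours the first residue
```

With an all-zero query the attention weights are uniform and the result matches mean pooling, which is why attention pooling can be seen as a generalization of it.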