Ying Liu

What is the Pfam database?

Pfam is a database of protein domain families. Each entry in the database is a domain, that is, a segment of a protein. The domains are classified into families based on the characteristics of these protein segments.


The Pfam database has two parts: Pfam-A and Pfam-B. Pfam-A is curated and contains well-characterised protein domain families with high-quality alignments. Each family has a manually checked seed alignment, and a profile hidden Markov model (HMM) built from the seed is used to align the members of the family. Pfam-B contains sequence families generated automatically by applying the Domainer algorithm to cluster and align the protein sequences that remain after the Pfam-A domains are removed.


What is a seeded alignment?


When we align two sequences, the patterns that the two sequences share are matched first, and the regions between the matched patterns are filled with gaps. These shared patterns are commonly called seeds. Seeds are used not only for large-scale local alignment but also as anchor points in whole-genome and multiple sequence alignment algorithms.
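To make the idea of seeds more concrete, here is a minimal sketch that finds exact shared k-mers between two sequences. It only illustrates the general seed-finding step; it is not how Pfam builds its alignments. The function name `find_seeds`, the seed length `k=3` and the example sequences are all made up for illustration.

```python
from collections import defaultdict


def find_seeds(a, b, k=3):
    """Return exact k-mer matches shared by sequences a and b.

    Each seed is a tuple (position_in_a, position_in_b, kmer).
    This is an illustrative sketch, not a full alignment algorithm.
    """
    # Index every k-mer of the first sequence by its start position.
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)

    # Look up every k-mer of the second sequence in that index.
    seeds = []
    for j in range(len(b) - k + 1):
        kmer = b[j:j + k]
        for i in index.get(kmer, []):
            seeds.append((i, j, kmer))
    return sorted(seeds)


# Hypothetical example sequences.
print(find_seeds("ACDEFGHIK", "XXDEFGYY", k=3))
```

A real aligner would then extend these seeds and fill the regions between them with gaps or mismatches.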


What does the Domainer algorithm do?


The Domainer algorithm clusters domain families based on all-versus-all Blastp matching. It is a fully automatic approach that was used to build the ProDom database. The clustering level of Domainer depends on the score threshold for accepted pairwise Blastp matches. The domain borders are inferred by analysing the extent of the BLAST matches and from the NH2- and COOH-terminal ends of the sequences. The main problems with this method are that it does not scale well and that it is sensitive to incorrect data.
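The sketch below illustrates how the score threshold changes the clustering level. It is not the Domainer implementation: it simply groups sequences by single linkage (union-find) over pairwise matches whose score passes a threshold. The function name `cluster_by_matches`, the sequence identifiers and the scores are hypothetical.

```python
def cluster_by_matches(ids, matches, min_score):
    """Group sequence ids connected by pairwise matches scoring >= min_score.

    matches is a list of (id_a, id_b, score) tuples, e.g. parsed from
    pairwise search output. Single linkage is an assumption made here
    for illustration only.
    """
    parent = {x: x for x in ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in matches:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for x in ids:
        clusters.setdefault(find(x), []).append(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical identifiers and scores.
ids = ["s1", "s2", "s3", "s4"]
matches = [("s1", "s2", 80), ("s2", "s3", 45), ("s3", "s4", 90)]
print(cluster_by_matches(ids, matches, min_score=50))
print(cluster_by_matches(ids, matches, min_score=40))
```

With the stricter threshold the weak match between s2 and s3 is rejected and two clusters remain. With the looser threshold everything merges into one cluster, which also shows how a single spurious match can join unrelated families.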


Pfam-A


The Pfam-A database provides the aligned protein sequences as well as the original sequences. The figure below shows the pattern of seeds in one protein domain family. Each colour represents a numerically encoded amino acid, and the number 0 represents the gaps that fill the space between seeds.
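As a rough sketch of this kind of encoding, the snippet below turns aligned sequences into rows of integers, with 0 for gaps. The exact mapping used for the figure is not given here, so numbering the 20 standard amino acids from 1 to 20 in alphabetical order of their one-letter codes is an assumption, as are the names `encode_alignment` and `CODE` and the example rows.

```python
# Assumed mapping: the 20 standard amino acids numbered 1..20 in
# alphabetical order of their one-letter codes; 0 is reserved for gaps.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_alignment(rows, gap_chars="-."):
    """Encode aligned sequences as lists of integers, with 0 for gaps.

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    encoded = []
    for row in rows:
        encoded.append(
            [0 if ch in gap_chars else CODE[ch.upper()] for ch in row]
        )
    return encoded


# Hypothetical aligned rows, not taken from a real Pfam family.
print(encode_alignment(["AC-D", "A--Y"]))
```

A matrix like this can be plotted with one colour per integer value to show where the gaps and the conserved columns are.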



