Ying Liu

A summary of a research paper on membrane protein classification

Original paper can be found here.


Input features


Two main kinds of protein features were used for the classification: sequence information and evolutionary information.


Sequence information


Proteins are made from 20 different types of amino acids. These amino acids form a molecular chain that varies in order, direction and length, with the total number of amino acids ranging from 50 to over 1500. Proteins with fewer than 1500 amino acids make up 98% of the proteins found. Hence the majority of proteins can be represented by a 1500 × 20 matrix, where 1500 is the maximum number of amino acids and 20 is the length of the one-hot encoding of each amino acid.
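As a rough illustration of this encoding, here is a minimal sketch in plain Python. The column order of the 20 amino acids, the function name `one_hot_encode`, and the choice to truncate sequences longer than 1500 are assumptions made for the example, not details taken from the paper.

```python
# The 20 standard amino acids as one-letter codes.
# The column order is an assumption for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_LEN = 1500  # maximum sequence length used in this summary


def one_hot_encode(sequence, max_len=MAX_LEN):
    """Return a max_len x 20 matrix (list of lists) for a protein sequence."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    # Start from an all-zero matrix, so shorter sequences end in zero rows.
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    # Assumption: sequences longer than max_len are simply truncated.
    for pos, aa in enumerate(sequence[:max_len]):
        if aa in index:  # non-standard symbols (e.g. 'X') stay all-zero
            matrix[pos][index[aa]] = 1
    return matrix


encoded = one_hot_encode("MKT")
```

Each row of `encoded` has at most a single 1, in the column of the amino acid at that position; rows beyond the end of the sequence remain all zeros.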


There are many feature extraction methods which are based on sequence information and used for the classification:

  • The pseudo-amino acid composition (PseAAC)

  • Local amino acid composition (LAAC)

  • Local dipeptide composition (LDC)

  • Global descriptor (GD)

  • Lempel-Ziv complexity (LZC)

  • Autocorrelation descriptor (AD)

  • Sequence-order descriptor (SD)

  • Hilbert-Huang transform (HHT)

  • Peptide composition method

  • Dipeptide composition (DipC)

  • Tripeptide composition (TipC)
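To give a feel for what one of these descriptors computes, the sketch below implements dipeptide composition in its common textbook form: the relative frequency of each of the 400 possible amino acid pairs in a sequence. The function name and the handling of non-standard symbols are assumptions for illustration; the paper may use a different variant.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def dipeptide_composition(sequence):
    """Map each of the 400 dipeptides to its relative frequency in sequence."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    # Slide a window of two residues along the sequence.
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


features = dipeptide_composition("MKTMKT")
```

The result is a fixed-length vector of 400 values regardless of sequence length, which is what makes composition-based descriptors convenient as classifier input.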


Evolutionary information


This information mainly refers to the position-specific scoring matrix (PSSM). It forms a matrix of 1500 × 20, where the 20 columns hold the scores of the amino acid at each position against all 20 amino acids.

When a protein has fewer than 1500 amino acids, zeros are added as padding for both features.
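The padding step can be sketched as follows. The helper name `pad_pssm`, the toy score values, and the decision to truncate longer inputs are all assumptions for the example; a real PSSM would come from an alignment tool rather than being typed in by hand.

```python
def pad_pssm(pssm, max_len=1500, width=20):
    """Pad a list of per-residue score rows with zero rows up to max_len."""
    # Assumption: inputs longer than max_len are truncated.
    rows = [list(row) for row in pssm[:max_len]]
    for row in rows:
        if len(row) != width:
            raise ValueError("each PSSM row must have %d scores" % width)
    # Append all-zero rows until the matrix has max_len rows.
    rows.extend([0.0] * width for _ in range(max_len - len(rows)))
    return rows


# A made-up two-residue PSSM, only to show the resulting shape.
toy_pssm = [[1.0] * 20, [-2.0] * 20]
padded = pad_pssm(toy_pssm)
```

The same idea applies to the one-hot sequence matrix, so both inputs end up with the fixed 1500 × 20 shape.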


Method


The paper proposes two deep learning models that are trained separately on the sequence features and the evolutionary features for protein type classification. The last layer of both models is a softmax function, which outputs a probability for each class. These probabilities are then combined and fed to a meta-classifier, which is trained to produce the final output.
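A minimal sketch of how the two sets of probabilities might be combined into a single input for the meta-classifier is shown below. Concatenation is assumed here as the combination step, and the function names and the three-class logits are invented for the example; the summary does not state how the paper combines the outputs or which meta-classifier it uses.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_features(seq_logits, evo_logits):
    """Concatenate the class probabilities of the two base models.

    Assumption: simple concatenation; the paper may combine them differently.
    """
    return softmax(seq_logits) + softmax(evo_logits)


# Made-up scores for three classes from each base model.
x = meta_features([2.0, 0.5, 0.1], [1.0, 1.0, 3.0])
```

With C protein types, the meta-classifier in this sketch would receive a vector of length 2C per protein and learn how much to trust each base model for each class.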



Sequence feature model


One convolutional layer, one bi-LSTM, followed by a softmax function.



Evolutionary feature model


Two convolutional layers with average pooling, followed by PrimaryCaps.



Result


The paper claims better results, especially when combined with another feature extraction method based on an auto-encoder.


 

What is PrimaryCaps


A PrimaryCaps layer works much like a convolutional neural network (CNN) layer. In a CNN, the location information of each "pixel" is gradually lost through convolution and, sometimes, pooling. PrimaryCaps captures this information by representing each node with a vector instead of a scalar value. For example, an ordinary node holds a single scalar value, whereas a capsule groups 8 nodes to represent an 8-dimensional vector. This layer of capsules is called PrimaryCaps. The weighted sum of the PrimaryCaps outputs is then used to decide the protein type in this case. If the information at one location makes a major contribution to deciding the protein type, the weight of that capsule gradually increases during training. This works much like the attention mechanism in sequence-to-sequence models.
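The grouping of scalar nodes into 8-dimensional capsules can be sketched in a few lines. The `squash` nonlinearity below comes from the general capsule network literature and is included as an assumption; this summary does not confirm that the paper uses it, and the function names are invented for the example.

```python
import math


def to_capsules(values, dim=8):
    """Group a flat list of scalar node outputs into dim-dimensional capsules."""
    if len(values) % dim != 0:
        raise ValueError("number of values must be a multiple of dim")
    return [values[i:i + dim] for i in range(0, len(values), dim)]


def squash(vector, eps=1e-9):
    """Shrink a capsule vector so its length lies in [0, 1), keeping direction."""
    sq_norm = sum(x * x for x in vector)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in vector]


def length(vector):
    """Euclidean length of a capsule vector."""
    return math.sqrt(sum(x * x for x in vector))


# Sixteen made-up scalar outputs become two 8-dimensional capsules.
capsules = to_capsules([float(i) for i in range(16)])
squashed = [squash(c) for c in capsules]
```

In this formulation the direction of a capsule vector carries the detailed information about a location, while its length after squashing can be read as how strongly that capsule is active.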
