
A summary of the research paper on the Unified Representation (UniRep) of proteins

The original paper can be found here. All figures and tables are from the paper.


What is UniRep?


In this paper, the authors applied deep learning to unlabelled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. This data-driven, sequence-based approach is called the unified representation (UniRep).



UniRep is a recurrent neural network (RNN), more specifically a multiplicative long short-term memory (mLSTM) RNN. It summarises arbitrary protein sequences into fixed-length vectors that approximate fundamental protein features. This fixed-length vector is obtained by globally averaging the intermediate mLSTM numerical summaries (the hidden states) over the sequence.
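To make the averaging step concrete, here is a minimal sketch in plain Python. It is not the UniRep implementation: the hidden states are random stand-ins rather than real mLSTM activations, and the names `average_hidden_states` and `fake_hidden_states` are hypothetical. It only illustrates why sequences of different lengths end up as vectors of the same size.

```python
import random

UNITS = 1900  # hidden size mentioned in this post; a 64-unit variant also exists


def average_hidden_states(hidden_states):
    """Average per-residue hidden states over the sequence axis.

    hidden_states: one list of floats per residue, all of equal length.
    Returns a single list whose length equals the number of hidden units.
    """
    if not hidden_states:
        raise ValueError("need at least one hidden state")
    length = len(hidden_states)
    # zip(*rows) walks the hidden units one by one across all residues.
    return [sum(column) / length for column in zip(*hidden_states)]


def fake_hidden_states(seq_len):
    """Stand-in for mLSTM outputs: random numbers, NOT real UniRep activations."""
    return [[random.gauss(0.0, 1.0) for _ in range(UNITS)] for _ in range(seq_len)]


random.seed(0)
rep_short = average_hidden_states(fake_hidden_states(50))   # 50-residue protein
rep_long = average_hidden_states(fake_hidden_states(300))   # 300-residue protein
print(len(rep_short), len(rep_long))  # both are 1900, regardless of sequence length
```

Because the average runs over the sequence axis, the result depends on the number of hidden units only, which is what lets a downstream model accept proteins of any length.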


How to use UniRep


The original UniRep code is published on GitHub. There is also a JAX version, which is easily customisable, with additional utility APIs that support protein engineering workflows.


Here, I am using the original UniRep code in a Docker container. The pipeline in the UniRep code is as follows:


  1. Set up the model with trained weights. The final hidden layer has either 1900 or 64 units.

  2. Import the protein sequences.

  3. Check whether each sequence contains any invalid characters.

  4. Transform the sequence characters into integers.

  5. Pad each sequence in a batch to a fixed length.

  6. Extract the final hidden layer tensor from the trained model.

  7. Use this final hidden layer tensor as input to train the weights of a top model, or train all weights.
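Steps 3 to 5 of the pipeline can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the UniRep code itself: the alphabet (the 20 standard amino acids), the integer mapping, the padding value `0`, and the helper names `is_valid`, `encode` and `pad_batch` are all hypothetical, and the real code may use a different vocabulary and extra start or stop tokens.

```python
# Assumed alphabet: the 20 standard amino acids (the real vocabulary may differ).
VALID_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_ID = 0  # assumed padding value, kept distinct from every residue id
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(VALID_AMINO_ACIDS)}


def is_valid(seq):
    """Step 3: a sequence is valid if it is non-empty and uses only known residues."""
    return len(seq) > 0 and all(ch in AA_TO_INT for ch in seq)


def encode(seq):
    """Step 4: map each residue character to its integer id."""
    if not is_valid(seq):
        raise ValueError(f"invalid sequence: {seq!r}")
    return [AA_TO_INT[ch] for ch in seq]


def pad_batch(encoded):
    """Step 5: right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(s) for s in encoded)
    return [s + [PAD_ID] * (max_len - len(s)) for s in encoded]


sequences = ["MKT", "ACDEF"]
batch = pad_batch([encode(s) for s in sequences])
print(batch)  # [[11, 9, 17, 0, 0], [1, 2, 3, 4, 5]]
```

Padding to the longest sequence in the batch, rather than to a global maximum, keeps the tensors small; the padded positions would normally be masked out before the hidden states are averaged.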
