Statistical methods for protein sequences

Regression with protein sequence covariates

The R package krm fits regression models with protein sequence covariates using a kernel-based random effect model.
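To illustrate how sequences can enter a kernel-based model, here is a minimal Python sketch that turns a few sequences into a similarity (kernel) matrix. It is not the krm API, and the kernel shown (a normalized k-mer spectrum kernel) is a simple stand-in chosen for illustration, not necessarily the kernel krm uses. The function names and the toy sequences are made up for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=2):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Normalized k-mer spectrum kernel: cosine similarity of k-mer counts."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Inner product over the k-mers that occur in both sequences.
    dot = sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def kernel_matrix(seqs, k=2):
    """Pairwise kernel values for a list of sequences."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


# Toy sequences, invented for illustration only.
seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGSPTWQRE"]
K = kernel_matrix(seqs)

for row in K:
    print(" ".join(f"{v:.2f}" for v in row))
```

In a kernel-based random effect model, a matrix like K typically serves as the covariance structure (up to a variance scale) of a per-sequence random effect, so that similar sequences are expected to have similar effects on the outcome.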


Clustering protein sequences into subfamilies

rBHP is a general clustering and mixture modeling algorithm based on randomized Bottom-up Hierarchical clustering Pruned (rBHP) splits.

cHMM is a mixture-model-based clustering method for identifying protein subfamilies; it uses rBHP as part of its inference machinery.

Brief Guide

  • Type cbclust.exe with no arguments to see the help message.
  • To run cHMM, use "cbclust.exe -m ProteinSequence ..."