Regression with protein sequence covariates
The R package krm fits regression models with protein sequence covariates through a kernel-based random effects model.
- Fong, Y.‡, Datta, S.‡, Georgiev, I., Kwong, P., Tomaras, G. (2014) Mutual information kernel logistic models with application in HIV vaccine studies, Biostatistics, in press. (‡ equal contribution)
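To make the idea of a kernel-based random effects model concrete, here is a minimal Python sketch. It is not the krm package's API and it does not implement the mutual information kernel from the paper above. It swaps in a simple Hamming-similarity kernel on aligned sequences and a Gaussian response with no fixed effects, so the predicted random effect reduces to kernel ridge regression. All function names and the ratio `lam` (noise variance over random-effect variance) are illustrative assumptions.

```python
from typing import List


def hamming_kernel(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences agree.

    Stand-in kernel (an assumption of this sketch, not the paper's kernel).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(seqs: List[str], y: List[float], lam: float = 0.1) -> List[float]:
    """Return alpha = (K + lam * I)^{-1} y, where K is the sequence kernel matrix.

    Under y = h + e with h ~ N(0, tau^2 K) and e ~ N(0, sigma^2 I),
    taking lam = sigma^2 / tau^2 makes K @ alpha the predicted random effect.
    """
    n = len(seqs)
    A = [[hamming_kernel(seqs[i], seqs[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, y)


def predict(train_seqs: List[str], alpha: List[float], new_seq: str) -> float:
    """Predict the response for a new sequence from its kernel similarities."""
    return sum(a * hamming_kernel(s, new_seq) for a, s in zip(alpha, train_seqs))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    seqs = ["ACDE", "ACDF", "GHIK", "GHIL"]
    y = [1.0, 1.2, -1.0, -0.8]
    alpha = fit(seqs, y)
    print(predict(seqs, alpha, "ACDG"))  # resembles the first two sequences
    print(predict(seqs, alpha, "GHIM"))  # resembles the last two sequences
```

In this sketch a new sequence borrows strength from the training sequences it resembles, which is the role the kernel plays in the random effect. A binary outcome, as in the logistic models of the paper, would require an iterative fit rather than the single linear solve used here.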
Clustering protein sequences into subfamilies
rBHP is a general clustering/mixture modeling algorithm based on randomized Bottom-up Hierarchical clustering Pruned (rBHP) splits.
cHMM is a mixture-model-based clustering method for identifying protein subfamilies; it uses rBHP as part of its inference machinery.
- Type cbclust.exe to see the help message.
- To run cHMM, use "cbclust.exe -m ProteinSequence ..."
- Fong, Y., Wakefield, J. and Rice, K. (2012) An Efficient Markov Chain Monte Carlo Method for Mixture Models by Neighborhood Pruning, Journal of Computational and Graphical Statistics, 21:1.
- Fong, Y., Wakefield, J. and Rice, K. (2010) Bayesian mixture modeling using a hybrid sampler with application to protein subfamily identification, Biostatistics, 11:1.