
Publications

by Keyword: Principal components analysis

Pairo, E., Marco, S., Perera, A., (2010). A subspace method for the detection of transcription factor binding sites. BIOINFORMATICS 2010: Proceedings of the First International Conference on Bioinformatics (ed. Fred, A., Filipe, J., Gamboa, H.), INSTICC Press (Valencia, Spain), 102-107

Transcription factor binding sites (TFBS) are short, degenerate sequences, located mostly in the promoter of a gene, where certain proteins bind in order to regulate transcription. Locating these sequences is an important problem, and many experimental and computational methods have been developed for it. Algorithms that search for binding sites are usually based on Position Specific Scoring Matrices (PSSM), in which each position is treated independently. By mapping symbolic DNA to numerical sequences, we built a detector based on Principal Component Analysis (PCA) of the numerical sequences, which takes covariances between positions into account. When a treatment of missing values is incorporated, the PCA-based Q-residuals detector performs better than a PSSM algorithm. The performance of the detector depends on how missing values are estimated and on the percentage of missing values considered in the model.
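For contrast with the PCA detector, the following is a minimal sketch of a conventional log-odds PSSM scorer, the kind of baseline the abstract refers to. It assumes aligned, equal-length training sites over A/C/G/T, a uniform background, and a pseudocount of 1; the function names `build_pssm`, `score` and `scan` and the example sites are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PSSM from aligned, equal-length binding sites.

    Each column is estimated on its own, so dependencies between
    positions are ignored by construction.
    """
    if background is None:
        # Assumption: uniform background base frequencies.
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        column = {}
        for b in BASES:
            freq = (counts.get(b, 0) + pseudocount) / total
            column[b] = math.log2(freq / background[b])
        pssm.append(column)
    return pssm


def score(pssm, window):
    # The score is a plain sum of independent per-position contributions.
    return sum(column[base] for column, base in zip(pssm, window))


def scan(pssm, sequence):
    """Score every window of the PSSM's width along a sequence."""
    width = len(pssm)
    return [(i, score(pssm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]


# Usage with made-up sites: the best-scoring window starts at index 2.
pssm = build_pssm(["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"])
hits = scan(pssm, "GGTGACTCAGG")
best_start = max(hits, key=lambda h: h[1])[0]  # → 2
```

Because the score is a sum of per-column terms, it cannot reward bases that tend to co-occur at two positions; capturing such covariances is what the PCA-based detector is meant to add.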

JTD Keywords: Binding sites, BPCA, Missing values, Numerical DNA, Principal components analysis, Transcription factors


Pairo, E., Marco, S., Perera, A., (2009). A preliminary study on the detection of transcription factor binding sites. Biosignals 2009: Proceedings of the 2nd International Conference on Bio-Inspired Systems and Signal Processing (ed. Encarnacao, P., Veloso, A.), Insticc-Inst Syst Technologies Information Control & Communication (Oporto, Portugal), 506-509

Transcription starts when multiple proteins, known as transcription factors, recognize and bind to the transcription start site in DNA sequences. Since mutations in transcription factor binding sites (TFBS) are known to underlie diseases, identifying these binding sites remains a major challenge. Converting symbolic DNA into numerical sequences, together with available genome data, makes it possible to construct a detector based on a numerical analysis of DNA binding sites. We build a subspace model for the TFBS; true binding sites lie at a very small distance from this subspace. Using this distance, binding sites are distinguished both from random sequences and from genome data.
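The subspace idea can be sketched as follows: map each site to a numerical vector, fit a PCA subspace to known binding sites, and score a candidate by its squared distance to that subspace (the Q-residual named in the 2010 abstract). This is a minimal sketch under stated assumptions, not the authors' implementation. The abstracts do not specify the numerical mapping, the number of components, or any decision threshold, so the one-hot encoding, `n_components=5`, the synthetic motif `TGACTCAGCA` and the class name `SubspaceDetector` are all hypothetical. No missing-value treatment is included.

```python
import numpy as np

BASES = "ACGT"


def to_numeric(seq):
    """Assumed mapping: each base becomes 4 indicator values, flattened.

    Only A/C/G/T are supported; other symbols raise ValueError.
    """
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x.ravel()


class SubspaceDetector:
    """Hypothetical PCA subspace model with a Q-residual distance."""

    def __init__(self, n_components):
        self.k = n_components

    def fit(self, sites):
        X = np.array([to_numeric(s) for s in sites])
        self.mean_ = X.mean(axis=0)
        # Principal directions from the SVD of the centred data matrix.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.P_ = vt[: self.k].T  # columns span the binding-site subspace
        return self

    def q_residual(self, seq):
        """Squared norm of the part of the centred vector outside the subspace."""
        r = to_numeric(seq) - self.mean_
        r_perp = r - self.P_ @ (self.P_.T @ r)
        return float(r_perp @ r_perp)


# Synthetic demo: noisy copies of a made-up motif versus random sequences.
rng = np.random.default_rng(0)
consensus = "TGACTCAGCA"


def mutate(seq, rate=0.1):
    # Replace each base by a random base with probability `rate`.
    return "".join(rng.choice(list(BASES)) if rng.random() < rate else b
                   for b in seq)


train = [mutate(consensus) for _ in range(200)]
held_out_sites = [mutate(consensus) for _ in range(50)]
random_seqs = ["".join(rng.choice(list(BASES), size=len(consensus)))
               for _ in range(50)]

det = SubspaceDetector(n_components=5).fit(train)
q_sites = [det.q_residual(s) for s in held_out_sites]
q_random = [det.q_residual(s) for s in random_seqs]
print(f"mean Q, sites: {np.mean(q_sites):.2f}  random: {np.mean(q_random):.2f}")
```

Held-out sites should show a much smaller mean Q-residual than random sequences; in practice a threshold on Q would be chosen from validation data, which this sketch leaves out.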

JTD Keywords: Transcription factors, Binding sites, Principal components analysis