Protein secondary structure prediction: novel methods and software architectures
2011-03-02
Abstract
Owing to the close relationship between protein structure and function, predicting protein tertiary structure has become one of the most important tasks in recent years. Despite recent advances, building the complete tertiary structure of a protein is still not tractable in most cases. In the absence of a clear homology relationship, the problem is often decomposed into smaller subtasks, including the prediction of secondary structure. Notwithstanding the large variety of strategies proposed over the years, secondary structure prediction is still an open problem, and few advances have been made in the field in recent times. In this thesis, the problem of secondary structure prediction is first analyzed, identifying five different information sources, related to the biological essence of the problem, that can be exploited by a learning system. A general software architecture and framework is then described, aimed at the engineering and setup issues that arise when prediction systems are applied to real-world problems. Next, different techniques based on the encoding and decoding of biological information, together with custom software architectures, are presented. The different proposals are assessed experimentally. The best improvements are consistent with recent advances in the field (about 1-2% over the last ten years), confirming the assumption that the identified correlation sources can be further exploited to improve predictions.
File: PhD_Filippo_G_Ledda.pdf
Access: open access
Type: PhD thesis
Size: 6.93 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.