ANALYSIS OF 98 HERV-K(HML-2) CONTAINING PROVIRUSES IDENTIFIED IN THE HUMAN GENOME ASSEMBLY GRCh37/hg19 BY RETROTECTOR AND THEIR GENOMIC CONTEXT

CADEDDU, MARTA;GRANDI, NICOLE;TRAMONTANO, ENZO
2014-01-01

Abstract

Background. The human genome contains human endogenous retroviruses (HERVs), sequences derived from ancient retroviral infections. HERVs are often classified according to the sequence of the primer binding site (PBS), where a cellular tRNA binds to prime reverse transcription (Cohen & Larsson, 1988). In HERV-K, the "K" refers to the lysine tRNA that matches the PBS. HERV-K (HML-2) is the most recently integrated group and includes the best-preserved proviruses in human DNA. The HERV-K family is divided into two types: type I has a 292-bp deletion and encodes the accessory protein Np9, while type II encodes the accessory protein Rec. Both Rec and Np9 are associated with tumor development (Bannert & Kurth, 2006). The position at which a retroviral sequence integrates can have one of three effects: beneficial, harmful, or neutral (Bannert & Kurth, 2006). To characterize more exhaustively the 98 HML-2 sequences identified by ReTe, we analyzed three important aspects of the HML-2 proviruses: PBS type, Rec protein, and localization with respect to genes.

Materials and Methods. The human genome was analyzed with RetroTector (ReTe) (Sperber, Airola, Jern, & Blomberg, 2007), version 1.01. ReTe was run on a machine with four 6-core Xeon processors at 2.66 GHz each, 256 GB of RAM, and 4 TB of disk space, with an estimated execution time of 1-2 days. PBS and Rec sequences were aligned using MEGA (version 5.2) and WebLogo (http://weblogo.threeplusone.com/create.cgi).

Results. Using the model-based software ReTe, we identified 98 HML-2 proviruses, 21 of which were possible new candidate HML-2 sequences, probably not detected before because the BLAST or BLAT cut-off used was too high (data submitted). HML-2 sequences are distributed across all chromosomes except chromosomes 13, 16 and 18. We found that 36 sequences have retained the lysine PBS (K), 1 sequence has PBS W, 4 sequences have PBS S, and 1 sequence each has PBS H, PBS R, PBS L, PBS T, and PBS P. We analyzed the Rec and Np9 sequences and found that 59 of the 98 HML-2s have Rec sequences. We also analyzed the length of the 98 HML-2 sequences: 46% of them are within the 8,000-10,000 bp range and, despite this considerable length, are located more than 5 kb from genes; only five HML-2 elements overlap known genes.

Conclusions. Using a model-based approach, we identified 98 HERV-K (HML-2) sequences. The identification of HML-2s in the human genome, their complete analysis, and their position with respect to genes may support further studies to ascertain the possible physiological or pathological role of the proteins expressed by HML-2.
HML2; HERV
Use this identifier to cite or link to this document: https://hdl.handle.net/11584/68343