Identification of genotypes of hepatitis C virus by sequence comparisons in the core, E1 and NS-5 regions.
Simmonds P., Smith DB., McOmish F., Yap PL., Kolberg J., Urdea MS., Holmes EC.
Isolates of hepatitis C virus (HCV) show considerable nucleotide sequence variability throughout the genome. Comparisons of complete genome sequences have been used as the basis of classification of HCV into a number of genotypes that show 67 to 77% sequence similarity. In order to investigate whether sequence relationships between genotypes are equivalent in different regions of the genome, we have carried out formal sequence analysis of variants in the 5' non-coding region (5'NCR) and in the genes encoding the core protein, an envelope protein (E1) and a non-structural protein (NS-5). In the E1 region, variants grouped into a series of six major genotypes and a series of subtypes that could be matched to the phylogenetic groupings previously observed for the NS-5 region. Furthermore, core and E1 sequences showed three non-overlapping ranges of sequence similarity corresponding to those between different genotypes, subtypes and isolates previously described in NS-5. Each major genotype could also be reliably identified by sequence comparisons in the well conserved 5'NCR, although many subtypes, such as 1a/1b, 2a/2c and some of those of type 4, could not be reliably distinguished from each other in this region. These data indicate that subgenomic regions such as E1 and NS-5 contain sufficient phylogenetic information for the identification of each of the 11 or 12 known types and subtypes of HCV. No evidence was found for variants of HCV that had sequences of one genotype in the 5'NCR but of a different one in the E1 or NS-5 region. This suggests that recombination between different HCV types is rare or non-existent and does not currently pose a problem in the use of subgenomic regions in classification.
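The three non-overlapping ranges of sequence similarity described above suggest a simple way to label a pair of sequences from a subgenomic region. The following is a minimal sketch, not the method used in the study: it assumes the sequences are already aligned to equal length, scores similarity as plain percent identity rather than a formal phylogenetic distance, and uses hypothetical cut-offs (`type_max`, `subtype_max`) that are placeholders, not values reported by the authors. The function names and the toy sequences are likewise illustrative.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def classify_pair(similarity: float,
                  type_max: float = 75.0,
                  subtype_max: float = 88.0) -> str:
    """Map a similarity value onto one of three non-overlapping ranges.

    The default thresholds are HYPOTHETICAL placeholders; real cut-offs
    depend on the region analysed and must come from the data.
    """
    if similarity < type_max:
        return "different genotypes"
    if similarity < subtype_max:
        return "different subtypes"
    return "same subtype"


def pairwise_labels(seqs: dict) -> dict:
    """Label every pair of named, pre-aligned sequences."""
    return {
        (n1, n2): classify_pair(percent_identity(s1, s2))
        for (n1, s1), (n2, s2) in combinations(seqs.items(), 2)
    }


# Toy input, invented for illustration only (not real HCV sequence data).
toy = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "TTGTAGGA",
}
labels = pairwise_labels(toy)
```

With real data, the cut-offs would be chosen per region from the observed distribution of pairwise similarities, and the same comparison could be repeated in the 5'NCR, core, E1 and NS-5 regions to check that each isolate receives a consistent genotype.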