Users can download the KOREF (also known as SJK) genomic sequences and related information from this page. We hope that openly and freely sharing the full Korean genome sequence information will help biological research.
KOREF: the Korean Reference Genome Sequence [Download]
The sequences are Korean genomic sequences in FASTA format, divided by human chromosome. A sequence in FASTA format consists of one header line starting with a ">" sign, followed by lines of sequence data: A, C, G, and T (identified bases) and N (unidentified regions) (Fig. 1). We use fa, mpfa, and fsa as the extensions of sequence files, and users can view the files with text editors such as EditPlus, Notepad, and WordPad. Since our FASTA files are zip-compressed with the WinZip program, users have to decompress them before using the sequences.
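As a rough illustration of the FASTA layout described above, the following Python sketch reads header and sequence lines and counts the unidentified (N) positions. The function names and the example record are hypothetical and are not part of the KOREF distribution; the sketch assumes the file has already been decompressed.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an open text handle in FASTA format."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_unidentified(sequence):
    """Count positions marked N (unidentified regions)."""
    return sequence.count("N")


# Hypothetical two-line record, used only to exercise the parser.
example = io.StringIO(">chr_example hypothetical record\nACGTNNAC\nGTTN\n")
records = list(parse_fasta(example))
```

For a real chromosome file, the same function can be applied to a handle opened with `open(path)` on the decompressed file; the generator keeps only one record in memory at a time.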
Polymorphism data [Download]
These data are the genomic variation information of the KOREF project. The variation data are the results of the MAQ program. The data were built based on dbSNP SNPs (ver. 129) and the NCBI reference genome build 36. They consist of approximately 3 million SNPs, including 360,000 KOREF-specific SNPs. Since the data are compressed, users have to decompress them before use. The format of our data is described below.
>chr10 KSJ KSJgt 62221 62221 . + . ID=ksj:KRS1;allele=T/A;ref=T;sup1=5;sup2=1;
The data are in GFF3 format, in which the columns represent chromosome, source, feature, start position, end position, score, strand (direction), phase, and attributes. The last column is divided by “;”, and its fields give the polymorphism number and the SNP type.
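The nine-column layout can be read with a few lines of Python. This is a minimal sketch, not the project's own tooling: the function name `parse_snp_line` is hypothetical, it splits on any whitespace because the example line on this page is space-separated (standard GFF3 is tab-separated), and it strips the leading ">" shown in the example. The meaning of the `sup1` and `sup2` attributes is not stated on this page, so they are kept as raw strings.

```python
def parse_snp_line(line):
    """Split one polymorphism record into its nine GFF3 columns."""
    # Drop surrounding whitespace and the leading ">" seen in the page example.
    fields = line.strip().lstrip(">").split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    seqid, source, feature, start, end, score, strand, phase, attributes = fields

    # The attribute column is a ";"-separated list of key=value pairs.
    attrs = {}
    for item in attributes.split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key.strip()] = value.strip()

    return {
        "chromosome": seqid,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attrs,
    }


record = parse_snp_line(
    "chr10 KSJ KSJgt 62221 62221 . + . "
    "ID=ksj:KRS1;allele=T/A;ref=T;sup1=5;sup2=1;"
)
```

Here `record["attributes"]["allele"]` holds the SNP type and `record["attributes"]["ID"]` the polymorphism identifier from the example line.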
They consist of three files, as follows.
1. SNP.txt [Download]
This file contains 40,254 SNP markers for 86 people: 20 each from the African, American, Japanese, and Chinese populations, and the six persons below. We used NCBI build 36. All the data were extracted from the following sources.
* HapMap samples: HapMap phase III, 1.6 M markers
* Craig Venter: ftp://ftp.jcvi.org/pub/data/huref/
* James Watson: http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/
* Yang Huanming: http://yh.genomics.org.cn/download.jsp#rd
* Kim, Sungjin: Korean genome variations checked with Affy Genome-Wide Human SNP Array 6.0
* Mother of Kim, Sungjin: Affy Genome-Wide Human SNP Array 6.0
2. genetic_distance.txt [Download]
This file consists of matrix-format data, calculated as the ASD (Allele Sharing Distance) (Mountain and Cavalli-Sforza, 1997, Multilocus Genotypes, a Tree of Individuals, and Human Evolutionary History. Am. J. Hum. Genet. 61, 705-718).
3. phylogenic_tree.nwk [Download]
This file contains the phylogenetic tree in Newick (nwk) format. Users can view the file with programs such as TreeView and MEGA.
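To make the allele sharing distance behind genetic_distance.txt more concrete, here is a minimal Python sketch of one common formulation for biallelic SNPs: at each marker, two individuals are at distance 0, 1, or 2 according to whether they share two, one, or no alleles, and the distance is the mean over markers typed in both. The genotype coding (0, 1, or 2 copies of one allele, `None` for missing), the sample names, and the function names are assumptions made for illustration; the exact normalization and file layout used for the KOREF matrix are not stated on this page.

```python
def allele_sharing_distance(g1, g2):
    """Mean per-marker allele sharing distance between two individuals.

    g1, g2: sequences of allele counts (0, 1, or 2), with None for missing calls.
    """
    total, used = 0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip markers missing in either individual
        # For a biallelic marker, |a - b| is 2 minus the number of shared alleles.
        total += abs(a - b)
        used += 1
    if used == 0:
        raise ValueError("no markers typed in both individuals")
    return total / used


def distance_matrix(genotypes):
    """Build a symmetric distance matrix as a dict of dicts keyed by sample name."""
    names = list(genotypes)
    return {
        a: {b: allele_sharing_distance(genotypes[a], genotypes[b]) for b in names}
        for a in names
    }


# Hypothetical genotypes for three samples at four markers.
samples = {
    "s1": [0, 1, 2, 2],
    "s2": [0, 1, 2, 2],
    "s3": [2, 1, 0, None],
}
matrix = distance_matrix(samples)
```

A matrix of this kind is the usual input for distance-based tree building, which produces a Newick tree such as the one in phylogenic_tree.nwk.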