aigerabae/gwas_real_data: This repository describes the work I did on GWAS data for 300 healthy Kazakhs. It includes archived scripts that are no longer in use.

This is a methods section for a GWAS study of 300 Kazakh genomes.

I) Information about sequencing

II) Raw data processing

Normalized signal intensities and genotype calls were computed using Illumina's GenomeStudio v2 software.

Genotype calls were made across all samples using the standard Infinium BeadChip cluster file.
The standard cluster file (*.egt) supplied by Illumina for each Infinium BeadChip type is generated from a diverse set of more than 200 HapMap DNA samples in an Illumina laboratory. Some SNP probes (including custom probes) were clustered manually to reduce the number of spurious calls and increase the accuracy of the results (see Illumina's custom cluster tech note).

If a sample passes the intact-DNA sample QC criteria but its call rate is significantly lower than that of the other samples, the sample can in some cases be re-run. Call rates were consistently high in this experiment, so no samples were excluded at this stage.
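"Significantly lower than other samples" is not defined numerically above. One common way to make it concrete is a robust outlier rule: flag samples whose call rate falls more than k median absolute deviations (MAD) below the cohort median. The sketch below assumes that rule and a hypothetical k = 3; it is an illustration, not the criterion actually used for this dataset.

```python
import statistics


def flag_low_call_rate(call_rates, k=3.0):
    """Return sorted sample IDs whose call rate is unusually low.

    call_rates: dict mapping sample ID -> call rate (0..1).
    k: hypothetical number of MADs below the median that counts as an outlier.
    """
    values = list(call_rates.values())
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that one bad sample cannot inflate.
    mad = statistics.median(abs(v - median) for v in values)
    # If mad == 0, the cutoff equals the median, so anything below the median is flagged.
    cutoff = median - k * mad
    return sorted(s for s, v in call_rates.items() if v < cutoff)


rates = {"S1": 0.998, "S2": 0.997, "S3": 0.999, "S4": 0.996, "S5": 0.950}
print(flag_low_call_rate(rates))
```

A flagged sample would be a candidate for re-running rather than automatic exclusion, in line with the text above.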

Call rate, p10 GC and GenCall score are available in phenotypes.tsv.
Call Rate: percentage of SNPs (expressed as a decimal) whose GenCall score is greater than the specified threshold.
p10 GC: 10th percentile GenCall score over all SNPs for this sample.
GenCall Score: quality metric that indicates the reliability of each genotype call.
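The definitions of Call Rate and p10 GC above can be written directly as code. This is a minimal sketch that assumes a per-SNP list of GenCall scores for one sample and a no-call threshold of 0.15 (assumed here as GenomeStudio's usual default; check the project settings). The percentile uses linear interpolation between ranks, which may differ slightly from GenomeStudio's internal calculation.

```python
def call_rate(gencall_scores, threshold=0.15):
    """Fraction of SNPs whose GenCall score is greater than `threshold`."""
    if not gencall_scores:
        raise ValueError("no GenCall scores supplied")
    passed = sum(1 for s in gencall_scores if s > threshold)
    return passed / len(gencall_scores)


def p10_gc(gencall_scores):
    """10th percentile of GenCall scores, interpolating linearly between ranks."""
    if not gencall_scores:
        raise ValueError("no GenCall scores supplied")
    ordered = sorted(gencall_scores)
    pos = 0.10 * (len(ordered) - 1)  # fractional rank of the 10th percentile
    lower = int(pos)  # floor, since pos >= 0
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


scores = [0.05, 0.2, 0.5, 0.7, 0.8, 0.85, 0.9, 0.9, 0.92, 0.95, 0.99]
print(call_rate(scores), p10_gc(scores))
```

Because p10 GC summarizes the low tail of the score distribution, it can reveal a weak sample even when its call rate still looks acceptable.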

Genotype matrix export: a text file containing the genotypes of all samples at all probes was exported.
Input files for third-party tools were also generated: *.ped and *.map files for PLINK.
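For reference, a PLINK .map file has four columns (chromosome, variant ID, genetic distance in cM, base-pair position), and a .ped file has six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per variant, with "0" marking a missing allele. The sketch below builds both from a simple genotype matrix. The input layout, the "--" no-call encoding and the choice of FID = IID are assumptions for illustration, not the GenomeStudio export format.

```python
def to_plink(snps, genotypes, sexes=None):
    """Build PLINK .map and .ped lines from a genotype matrix.

    snps: list of (chrom, snp_id, bp_position) tuples, in output order.
    genotypes: dict sample_id -> list of two-letter calls ("AG"), or "--" for
               a no-call (assumed encoding), aligned with `snps`.
    sexes: optional dict sample_id -> PLINK sex code (1=male, 2=female, 0=unknown).
    """
    # Genetic distance is unknown here, so it is written as 0.
    map_lines = [f"{chrom}\t{snp_id}\t0\t{pos}" for chrom, snp_id, pos in snps]
    ped_lines = []
    for sample_id, calls in genotypes.items():
        if len(calls) != len(snps):
            raise ValueError(f"{sample_id}: expected {len(snps)} calls, got {len(calls)}")
        alleles = []
        for call in calls:
            if call == "--":
                alleles += ["0", "0"]  # missing genotype in .ped
            else:
                alleles += [call[0], call[1]]
        sex = (sexes or {}).get(sample_id, 0)
        # FID = IID = sample ID, no parents, phenotype -9 (missing) until added later.
        ped_lines.append("\t".join([sample_id, sample_id, "0", "0", str(sex), "-9"] + alleles))
    return map_lines, ped_lines


snps = [("1", "rs1", 1000), ("2", "rs2", 2000)]
map_lines, ped_lines = to_plink(snps, {"KZ001": ["AG", "--"]})
print("\n".join(map_lines))
print("\n".join(ped_lines))
```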

III) Data processing in PLINK

The prepared .map file was used directly for further analysis.
The prepared .ped file needed further processing to include phenotypes (analysis shown in plink_qc1.md).
The prepared .ped and .map files were then processed for quality control (analysis shown in plink_qc2.md).
PCA, runs of homozygosity (ROH), Fst and ADMIXTURE analyses were performed together with other populations from the Human Genome Diversity Project (HGDP) (analysis shown in plink_HGDP.md).
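Adding phenotypes to a .ped file amounts to filling its sixth column for each individual. The sketch below does this from a tab-separated table; the column names `sample_id` and `phenotype` are hypothetical, since the layout of phenotypes.tsv is not described here, and the actual procedure is the one in plink_qc1.md. PLINK can alternatively read phenotypes from a separate file with `--pheno`.

```python
import csv


def read_phenotypes(path, id_col="sample_id", value_col="phenotype"):
    """Load a TSV into {sample ID: phenotype}; column names are hypothetical."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row[id_col]: row[value_col] for row in reader}


def add_phenotypes(ped_lines, phenotypes, missing="-9"):
    """Fill column 6 (phenotype) of each .ped record, matching on IID (column 2).

    Samples absent from `phenotypes` get `missing` (-9 is PLINK's usual missing code).
    """
    updated = []
    for line in ped_lines:
        fields = line.split()
        fields[5] = phenotypes.get(fields[1], missing)
        updated.append("\t".join(fields))
    return updated


ped = ["KZ001\tKZ001\t0\t0\t2\t-9\tA\tG", "KZ002\tKZ002\t0\t0\t1\t-9\tA\tA"]
print(add_phenotypes(ped, {"KZ001": "1"}))
```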
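A core part of PLINK quality control is missingness filtering: dropping samples with too many missing genotypes (PLINK's `--mind`) and variants missing in too many samples (`--geno`). The pure-Python sketch below mirrors that idea on an in-memory matrix so the logic is visible. The thresholds are hypothetical, the sketch applies the sample filter before the variant filter (PLINK's own ordering may differ), and the filters actually applied are those in plink_qc2.md.

```python
def missingness_qc(genotypes, mind=0.1, geno=0.1):
    """Drop high-missingness samples, then high-missingness SNPs.

    genotypes: dict sample_id -> list of calls aligned across samples,
               with None marking a missing genotype.
    mind: maximum allowed fraction of missing calls per sample (hypothetical).
    geno: maximum allowed fraction of missing calls per SNP (hypothetical).
    Returns (kept sample IDs, kept SNP indices).
    """
    if not genotypes:
        return [], []
    n_snps = len(next(iter(genotypes.values())))
    if n_snps == 0:
        return list(genotypes), []
    # Sample filter: keep samples whose missing fraction is at most `mind`.
    kept_samples = [s for s, calls in genotypes.items()
                    if sum(c is None for c in calls) / n_snps <= mind]
    # Variant filter, computed over the samples that survived.
    kept_snps = []
    for j in range(n_snps):
        missing = sum(genotypes[s][j] is None for s in kept_samples)
        if kept_samples and missing / len(kept_samples) <= geno:
            kept_snps.append(j)
    return kept_samples, kept_snps


g = {"A": ["AG", "AA", None, "GG"],
     "B": ["AA", None, None, None],
     "C": ["GG", "AG", None, "AG"]}
print(missingness_qc(g, mind=0.5, geno=0.1))
```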
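To illustrate what an Fst value between the Kazakh sample and an HGDP population measures, here is a sketch of Hudson's Fst estimator, combined across SNPs as a ratio of averages (sum of per-SNP numerators over sum of per-SNP denominators). This estimator is swapped in for illustration; it is not necessarily the one used in plink_HGDP.md. The sketch assumes allele frequencies are already computed and that the number of sampled chromosomes per population is the same at every SNP.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst across SNPs, as sum(numerators) / sum(denominators).

    p1, p2: per-SNP frequencies of the same allele in populations 1 and 2.
    n1, n2: number of sampled chromosomes (2 x individuals for diploids).
    """
    if len(p1) != len(p2) or not p1:
        raise ValueError("need two equal-length, non-empty frequency lists")
    if n1 < 2 or n2 < 2:
        raise ValueError("each population needs at least two chromosomes")
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-population sampling noise.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one chromosome from each population carries different alleles.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("all SNPs are monomorphic for the same allele")
    return num / den


print(hudson_fst([0.1, 0.5], [0.9, 0.5], 100, 100))
```

Combining as a ratio of averages, rather than averaging per-SNP ratios, keeps low-diversity SNPs with tiny denominators from dominating the estimate.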

Repository files:
ARCHIVE_plink_HGDP.md
README.md
annovared.md
average_plot_admixture.py
create_admixture_table.py
genome_studio.md
gnomad_data.md
helper_scripts.zip
ibd_average.py
installation_ubuntu.md
matrices.md
phenotypes.tsv
plink_qc1.md
plink_qc2.md
plot_adxmixture.py
plot_eigenvec.py
plot_fst_heatmap.py
qpAdm.md
ref_populations.md
remove_relatives.sh
reprodicibility.md
samples.xlsx
to_plink.md
