This script finds tag SNPs for a specified HLA allele using the 1000 Genomes reference panel and PLINK. The panel comes from the HLA-TAPAS project.
You will need Bash and PLINK to run the script.
- Install PLINK (v1.90 or higher recommended) from https://www.cog-genomics.org/plink/.
- Bash: on Linux or macOS, Bash is usually pre-installed. On Windows, consider using WSL (Windows Subsystem for Linux).
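As a quick sanity check before installing, the sketch below reports whether each prerequisite is reachable on your PATH. It assumes the PLINK executable is named `plink`; the helper name `check_tool` is purely illustrative and not part of this repository.

```bash
# Return 0 if the named command is available on PATH, non-zero otherwise.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each prerequisite.
# Assumption: the PLINK binary is installed under the name "plink".
for tool in bash plink; do
    if check_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If `plink` is reported as missing, either install it or add its directory to your PATH before running the scripts.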
The installation is straightforward. You can clone the repository directly into your desired location.
# Clone the repository
cd ./local/share/
git clone --recurse-submodules https://github.com/nmendozam/tagHLAs.git
chmod +x tagHLAs/*.sh
# Add to PATH (optional)
echo "export PATH=\"\$PATH:$(pwd)/tagHLAs\"" >> ~/.bashrc
Important
If you do not use the --recurse-submodules flag when cloning, you will need to initialize and update the submodules manually.
This is important because the reference panel is stored as a submodule.
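If the repository was cloned without the flag, the standard git command for fetching submodules can be run afterwards. The sketch below wraps it in a small helper; the function name `init_submodules` and the `./tagHLAs` path are illustrative assumptions, not part of this repository.

```bash
# Initialize and fetch all submodules of an existing clone.
# $1: path to the cloned repository (defaults to the current directory).
init_submodules() {
    git -C "${1:-.}" submodule update --init --recursive
}

# Usage (run after a plain `git clone`, adjusting the path to your clone):
#   init_submodules ./tagHLAs
```

After this completes, the reference panel submodule should be populated inside the clone.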
Note that allele names in the panel use the prefix HLA_ instead of HLA-. For example, to find tags for HLA-A*01:01:01, provide HLA_A*01:01:01 as input.
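If your allele names come from another source in the standard HLA- form, the prefix substitution can be scripted. This is a minimal sketch; the helper name `to_panel_name` is hypothetical and not provided by this repository.

```bash
# Convert a standard allele name (HLA-A*01:01:01) to the panel's
# naming (HLA_A*01:01:01). Names without the HLA- prefix pass through unchanged.
to_panel_name() {
    case "$1" in
        HLA-*) printf 'HLA_%s\n' "${1#HLA-}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

to_panel_name 'HLA-A*01:01:01'   # prints HLA_A*01:01:01

# Possible use with the script (always quote the name so the shell
# does not expand the asterisk):
#   list_tags.sh -a "$(to_panel_name 'HLA-A*01:01:01')" --out tags_output.txt
```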
list_tags.sh -a "HLA_A*01:01:01" --out tags_output.txt
The default reference panel is the 1000 Genomes VCF included in the repository. If you want to use a different reference panel, provide the path with the -p option.
list_tags.sh -a "HLA_A*01:01:01" -p /path/to/custom_panel.vcf.gz --out tags_output.txt
To see all available options, run:
list_tags.sh -h
You can list all available alleles in the reference panel using the list_alleles.sh script:
list_alleles.sh
Tip
If you have any issues or questions, please open an issue.
To create your own reference panels, refer to the HLA-TAPAS documentation. Panels created with HLA-TAPAS are directly compatible with the list_tags.sh script.