diff --git a/POSEIDON_yml_fields.tsv b/POSEIDON_yml_fields.tsv
index d267b9c..3a9b4e0 100644
--- a/POSEIDON_yml_fields.tsv
+++ b/POSEIDON_yml_fields.tsv
@@ -8,11 +8,17 @@ email	1	contributor	email of one contributor	String	Email	TRUE
 orcid	1	contributor	orcid of one contributor	String	ORCID	FALSE
 packageVersion	0		package version (should be changed/incremented when the package is changed)	String	X.Y.Z	TRUE
 lastModified	0		date of last modification of the package (should be updated when the package is changed)	Date	YYYY-MM-DD	FALSE
+license	0		data license section			FALSE
+name	1	license	short name of data license that applies for this package, usually a Creative Commons license	String		TRUE
+url	1	license	URL to the license	String	URL	TRUE
+file	1	license	relative path to a license file (usually not necessary, the name is sufficient for standard licenses)	String	Path	FALSE
 genotypeData	0		genotype data section			TRUE
-format	1	genotypeData	genotype data file format	String	EIGENSTRAT;PLINK	TRUE
-genoFile	1	genotypeData	relative path to the geno file	String	Path	TRUE
-genoFileChkSum	1	genotypeData	md5 checksum of the geno file	String	md5 hash	FALSE
-snpFile	1	genotypeData	relative path to the snp file	String	Path	TRUE
+referenceGenomeAssembly	1	genotypeData	name of the reference genome assembly used, e.g. GRCh37	String		FALSE
+referenceGenomeAssemblyURL	1	genotypeData	reference assembly accession URL from a public database, such as NCBI or Ensembl	String	URL	FALSE
+format	1	genotypeData	genotype data file format	String	EIGENSTRAT;PLINK;VCF	TRUE
+genoFile	1	genotypeData	relative path to the genotype file. If gzipped, MUST end with *.gz	String	Path	TRUE
+genoFileChkSum	1	genotypeData	md5 checksum of the genotype file	String	md5 hash	FALSE
+snpFile	1	genotypeData	relative path to the snp file. If gzipped, MUST end with *.gz	String	Path	TRUE
 snpFileChkSum	1	genotypeData	md5 checksum of the snp file	String	md5 hash	FALSE
 indFile	1	genotypeData	relative path to the ind file	String	Path	TRUE
 indFileChkSum	1	genotypeData	md5 checksum of the ind file	String	md5 hash	FALSE
diff --git a/README.md b/README.md
index 90536d7..b660974 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
-## The Poseidon Standard v2.7.1
+## The Poseidon Standard v3.0.0
 
-Poseidon is a solution for archaeogenetic genotype data organisation. This standard defines the core components of the Poseidon package.
+Poseidon is a solution for archaeogenetic genotype data organisation. It is geared towards human data, but is to a large extent species-agnostic and can be used to track archaeogenetic data also of non-human species.
+
+This standard defines a data structure: the **Poseidon package**. A Poseidon package stores genotype data with meta- and context information.
 
 A .pdf version of the latest instance of this document can be downloaded [here](https://github.com/poseidon-framework/poseidon-schema/blob/master/poseidon_package_specification.pdf).
 
@@ -10,14 +12,26 @@ A changelog documents the changes across different schema versions [here](https:
 
 The key words *MUST*, *MUST NOT*, *REQUIRED*, *SHALL*, *SHALL NOT*, *SHOULD*, *SHOULD NOT*, *RECOMMENDED*, *MAY*, and *OPTIONAL* in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
 
+### Primary entities of a Poseidon package
+
+The main operational entities in a Poseidon package are discrete sets of genotype data attributed to a single human or non-human individual, scientifically generated for archaeogenetic research questions. Within a Poseidon package each of these sets gets attributed a unique identifier: the `Poseidon_ID`.
+
+Generally, archaeogenetics operates on depositional contexts, e.g. graves, with one or multiple (ancient) human or non-human individuals. Usually, it is possible to attribute the (skeletal) remains within these contexts to individuals based on archaeological evidence and physical-anthropological analysis. Each individual can get sampled one or multiple times, either by directly probing their preserved tissue, or by sampling any reagent that contains their DNA (through whatever pathway or taphonomic process). From one such sample one or multiple extracts can be derived, which can be transformed into one or multiple libraries, which may or may not be subjected to a DNA capture protocol and then sequenced one or multiple times. The raw sequencing data can undergo various different forms of computational processing and eventually genotyping to produce the data relevant for most derived analyses and thus stored in a Poseidon package.
+
+While the wet-lab processes yield a relatively predictable tree of separate physical and digital products for any given sample, computational data processing breaks this tree structure by allowing arbitrary merging of sequencing data obtained through potentially separate means: data from different libraries, for example, may be merged if they are from the same individual, even if they are not from the same sample.
+
+`Poseidon_ID`s therefore represent one consciously selected end-point in the complex data preparation graph laid out above. Typically this end-point corresponds to an optimal result for a given individual, research question and publication.
+
+For the sake of convenience and despite the lack of conceptual clarity, below we sometimes use the term *sample* to denote `Poseidon_ID` entities. Data aggregation on the level of physical samples is often sensible, and the term is conventionally used for analysis endpoints in the community of practice.
+
 ### The Poseidon package structure
 
-A Poseidon package stores genotype data with context information for DNA samples from (ancient) (human) individuals. Packages are defined by the POSEIDON.yml file, which holds relative paths to all other files in a package.
+A Poseidon package is defined by the POSEIDON.yml file, which holds relative paths to all other files in the package.
 
 A package therefore MUST contain:
 
 - A `POSEIDON.yml` file to formally define the package
-- Genotype data in PLINK or EIGENSTRAT format
+- Genotype data in PLINK, EIGENSTRAT or VCF format
 
 It SHOULD additionally contain:
 
@@ -44,7 +58,11 @@ Switzerland_LNBA_Roswita/README.md
 Switzerland_LNBA_Roswita/CHANGELOG.md
 ```
 
-All text files in the package MUST be UTF-8 encoded.
+### Text encoding
+
+All text files in the package MUST be UTF-8 encoded. They SHOULD use Unix-style line endings, i.e. a single Line Feed (LF, `\n`) character, not the Carriage Return and Line Feed (CRLF, `\r\n`) pair used by MS-DOS and Windows.
+
+`Poseidon_ID`s and `Group_Name`s, the primary sample and group identifiers across `.janno`, `.ssf`, and genotype data files, MUST contain only characters from a subset of the 7-bit ASCII character set: the alphanumeric characters `A-Z`, `a-z`, `0-9`, and the symbols `_` (underscore), `-` (hyphen-minus), and `.` (period, dot or full stop).
 
 ### The `POSEIDON.yml` file
 
@@ -67,6 +85,10 @@ contributor:
     email: paul.panther@example.edu
 packageVersion: 1.1.2
 lastModified: 2021-01-28
+license:
+  name: CC BY 4.0
+  url: https://creativecommons.org/licenses/by/4.0/
+  file: license.md
 genotypeData:
   format: PLINK
   genoFile: Switzerland_LNBA_Roswita.bed
@@ -117,17 +139,33 @@ When the `packageVersion` is changed, then the `lastModified` date MUST be updat
 
 Packages SHOULD start at `packageVersion` `0.1.0`.
 
+### Data licensing and the license.md file
+
+Data licenses are a common way to grant the public permission to use a dataset under copyright law.
+
+Poseidon packages MAY specify a license, and if so, SHOULD use [Creative Commons licenses](https://creativecommons.org/share-your-work/cclicenses).
+
+Licenses are documented in the `license` section of the `POSEIDON.yml` file, either with just the `name`, or with a license `file`, or with both the `name` and a `file`. `name` SHOULD be a short string with the name and version of the license, e.g. `CC BY 4.0`. The `file`, typically named `license.md`, MAY include the full text of a license, or a short notice further contextualizing the entry in the `name` field. For example:
+
+```default
+The Poseidon package Switzerland_LNBA_Roswita © 2021 by Roswita Malone is licensed under Creative Commons Attribution 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
+```
+
 ### Genotype data
 
-Genotype data in Poseidon packages is stored either in (binary) PLINK or EIGENSTRAT format.
+Genotype data in Poseidon packages is stored in (binary) PLINK, EIGENSTRAT or Variant Call Format (VCF).
+
+|   | PLINK (binary) | EIGENSTRAT | VCF |
+|---|---|---|---|
+| genotype file | [`.bed` (binary biallelic genotype table) or `.bed.gz`](https://www.cog-genomics.org/plink/1.9/formats#bed) | [`.geno` (genotype file) or `.geno.gz`](https://github.com/DReichLab/EIG/blob/fb4fb59065055d3622e0f97f0149588eae630a3e/CONVERTF/README#L67) | [`.vcf` or `.vcf.gz`](https://samtools.github.io/hts-specs/VCFv4.2.pdf) |
+| SNP file  | [`.bim` (extended MAP file) or `.bim.gz`](https://www.cog-genomics.org/plink/1.9/formats#bim) | [`.snp` (snp file) or `.snp.gz`](https://github.com/DReichLab/EIG/blob/fb4fb59065055d3622e0f97f0149588eae630a3e/CONVERTF/README#L67) |  |
+| individual file  | [`.fam` (sample information)](https://www.cog-genomics.org/plink/1.9/formats#fam) | [`.ind` (indiv file)](https://github.com/DReichLab/EIG/blob/fb4fb59065055d3622e0f97f0149588eae630a3e/CONVERTF/README#L67) |  |
 
-|   | PLINK (binary) | EIGENSTRAT |
-|---|---|---|
-| genotype file | [`.bed` (binary biallelic genotype table)](https://www.cog-genomics.org/plink/1.9/formats#bed) | [`.geno` (genotype file)](https://github.com/DReichLab/EIG/blob/fb4fb59065055d3622e0f97f0149588eae630a3e/CONVERTF/README#L67)
-| SNP file  | [`.bim` (extended MAP file)](https://www.cog-genomics.org/plink/1.9/formats#bim) | [`.snp` (snp file)](https://github.com/DReichLab/EIG/blob/fb4fb59065055d3622e0f97f0149588eae630a3e/CONVERTF/README#L67) |
-| individual file  | [`.fam` (sample information)](https://www.cog-genomics.org/plink/1.9/formats#fam) | [`.ind` (indiv file)](https://github.com/DReichLab/EIG/blob/fb4fb59065055d3622e0f97f0149588eae630a3e/CONVERTF/README#L67) |
+Both PLINK and EIGENSTRAT formats require three files to be specified. In PLINK, the genotype file is binary (with 2 bits per genotype), while in EIGENSTRAT, the genotype file is text-based (with 8 bits per genotype). The SNP and individual files are text-based for both formats (see links behind the file endings in the table above). The EIGENSTRAT format is particularly common within archaeogenetics and compatible with many important tools, e.g. [EIGENSOFT](https://github.com/DReichLab/EIG) and [ADMIXTOOLS](https://github.com/DReichLab/AdmixTools). Finally, VCF is the most formally specified of the three formats, with properly versioned specifications being released regularly. It is well established in the wider genetics community and is the de facto standard for storing variants in the field of medical genetics.
 
-In addition to these files (and optionally their checksums), the POSEIDON.yml file SHOULD also provide a `snpSet` entry which determines the shape of the genotype file.
+VCF files, as well as genotype and SNP files in PLINK and EIGENSTRAT format, can be stored in gzipped form, signified by an additional file ending (`*.gz`).
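+
+As a rough illustration, the `genotypeData` section of a package with gzipped EIGENSTRAT data might look as follows. The file names are hypothetical, and the individual file stays uncompressed in this sketch, since the table above lists a `.gz` variant only for genotype and SNP files:
+
+```yaml
+genotypeData:
+  format: EIGENSTRAT
+  # hypothetical file names; genotype and SNP files end with .gz because they are gzipped
+  genoFile: Switzerland_LNBA_Roswita.geno.gz
+  snpFile: Switzerland_LNBA_Roswita.snp.gz
+  # the individual file is not gzipped
+  indFile: Switzerland_LNBA_Roswita.ind
+```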
+
+To make VCF files fully convertible to PLINK and EIGENSTRAT, they MUST be biallelic and contain only genotypes coded as `0/0`, `0/1`, `1/1` or `./.`. Furthermore, they MAY encode group names and genetic sex for all samples through the special header fields `##group_names=name1,name2,...` and `##genetic_sex=F,U,M,...`, respectively. If these fields are not present, then group names are assumed to be "unknown" and genetic sex "U" (unknown) for all samples.
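+
+A minimal sketch of how a VCF-based package could declare its genotype data in `POSEIDON.yml`. The file name is hypothetical. The table above lists no separate SNP or individual file for VCF, so only `genoFile` is given here; whether `snpFile` and `indFile` may be left out is an assumption, since the field table still marks them as mandatory.
+
+```yaml
+genotypeData:
+  format: VCF
+  # hypothetical file name; the .gz ending signals gzipped data
+  genoFile: Switzerland_LNBA_Roswita.vcf.gz
+  # optional: name of the reference genome assembly the variants refer to
+  referenceGenomeAssembly: GRCh37
+```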
 
 ###  The `.janno` file
 
@@ -135,9 +173,10 @@ The `.janno` file is a tab-separated text file with a header line. It holds cont
 
 - A set of strictly defined core variables (defined by column name) and their possible content are documented here: [janno_columns.tsv](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv)
 - A `.janno` file MAY have all of these core variables, or only a subset of them.
-- Only three columns MUST be present to make the file valid: **Poseidon_ID**, **Group_Name** and **Genetic_Sex**
+- Only three columns MUST be present to make the file valid: **Poseidon_ID**, **Group_Name** and **Genetic_Sex**.
 - Arbitrary columns not defined here MAY be added as long as their column names do not clash with the defined ones.
-- The column order is irrelevant.
+- Arbitrary, additional free-text information directly related to a column **<Column_Name>** from the set of specified core variables in [janno_columns.tsv](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv) SHOULD be added in a column whose name has the form **<Column_Name>_Note**. Example: `Contamination_Note`.
+- The column order is not fixed, but MAY follow the order in [janno_columns.tsv](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv). **<Column_Name>_Note** columns SHOULD be placed directly after the column they refer to.
 - If information is unknown or a variable does not apply for a certain sample, then the respective cell(s) MAY be filled with `n/a` or simply an empty string.
 - The order of the samples (rows) in the `.janno` file MUST be equal to the order in the genetic data files (`.ind`, `.fam`) in the package.
 - The values in the columns **Poseidon_ID**, **Group_Name** and **Genetic_Sex** MUST be equal to the terms used in the genetic data files (`.ind`, `.fam`).
@@ -206,3 +245,16 @@ The `.ssf` file is another tab-separated text file with a header line. It stores
 - If information is unknown or a variable does not apply, then the respective cell(s) MAY be filled with `n/a` or simply an empty string.
 - Multiple predefined columns of the `.ssf` file are list columns that can hold multiple values (either strings or numerics) separated by `;`.
 - The decimal separator for all floating point numbers MUST be `.`.
+
+### Details
+
+#### The `Capture_Type` .janno column
+
+The following protocols are specified:
+
+- `Shotgun`: Sequencing without any enrichment (whole genome sequencing, screening etc.).
+- `1240K`: Target enrichment with hybridization capture optimised for sequences covering the 1240k SNP array, see [@Fu2015](https://doi.org/10.1038/nature14558), [@Haak2015](https://doi.org/10.1038/nature14317), [@Mathieson2015](https://doi.org/10.1038/nature16152).
+- `ArborComplete`, `ArborPrimePlus`, `ArborAncestralPlus`: Target enrichment with hybridization capture as provided by Arbor Biosciences in three different kits branded [myBaits Expert Human Affinities](https://arborbiosci.com/genomics/targeted-sequencing/mybaits/mybaits-expert/mybaits-expert-human-affinities).
+- `TwistAncientDNA`: Target enrichment with hybridization capture as provided by Twist Bioscience [@Rohland2022](https://doi.org/10.1101/gr.276728.122).
+- `WISC2013`: Whole genome capture as described by [@Carpenter2013](https://doi.org/10.1016/j.ajhg.2013.10.002).
+- `OtherCapture`: Target enrichment with hybridization capture for any other set of sequences.
diff --git a/changelog.md b/changelog.md
index 54df5d9..11732b6 100644
--- a/changelog.md
+++ b/changelog.md
@@ -1,5 +1,64 @@
 # Changelog
 
+### 2.7.1 -> 3.0.0 [breaking]
+
+#### General changes
+
+- Introduced a specific, limited character set for `Poseidon_ID`s and `Group_Name`s (in the .janno file, the .ssf file, and the genotype data): The ASCII characters `A-Za-z0-9_-.`.
+- Allowed another genotype data format next to (binary) PLINK and EIGENSTRAT: the Variant Call Format (VCF).
+- Specified a mechanism to store genotype data in a more space-efficient gzipped form.
+
+#### Clarifications
+
+- Clarified the exact meaning of a `Poseidon_ID` and the entity in genotype and context data it represents.
+- Clarified the suitability of the Poseidon standard for non-human data: `[Poseidon] is geared towards human data, but is to a large extent species-agnostic and can be used to track archaeogenetic data also of non-human species.`
+- Clarified that text files in Poseidon packages should use Unix-style line endings.
+
+#### Changes to the `POSEIDON.yml` file
+
+- Added the optional section `license` with the fields `name`, `url` and `file` to specify a data license for a package.
+- Added two optional fields within the `genotypeData` structure:
+  - `referenceGenomeAssembly`, the name of the reference genome assembly used, e.g. GRCh37
+  - `referenceGenomeAssemblyURL`, the reference assembly accession URL from a public database, such as NCBI or Ensembl
+- Modified the definition of the `genoFile` and `snpFile` fields to cover the case of gzipped data, for which the respective file names must end with `*.gz`.
+
+#### Changes to the `.janno` file
+
+##### Replaced columns
+
+- Replaced `Source_Tissue` with `Source_Material` and `Source_Material_Note`.
+
+##### Added columns
+
+- Added a column `Individual_ID` as an identifier on the level of (human/animal) individuals.
+- Added a column for the sampled `Species`, to make the schema more explicitly species-agnostic.
+- Added a column `Alternative_IDs_Context` to document what exactly the "foreign keys" in `Alternative_IDs` are referring to. This is a list column with the same number and order of entries as `Alternative_IDs`.
+- Added a `Custodian_Institution` column that documents the institution that curated the sampled remains at the time of sampling, with name, city and country.
+- Added four list columns to describe the cultural eras and archaeological cultures a sample is associated with: `Cultural_Era` + `Cultural_Era_URL` and `Archaeological_Culture` + `Archaeological_Culture_URL`.
+- Added the columns `Chromosomal_Anomalies` and `Chromosomal_Anomalies_Note` for genetic anomalies on the chromosome level detected for the sample. This includes extra, missing or irregular portions of chromosomal DNA, as in gonosomal and autosomal aneuploidies. `Chromosomal_Anomalies` is not limited to a specific set of options, but a common notation is recommended (e.g. `XXY`, `XYY`, `XXX`, `X0`, `Trisomy21`, `Trisomy18`).
+
+##### Changed columns
+
+- Introduced a specific, limited character set for the `Poseidon_ID` and `Group_Name` columns: The ASCII characters `A-Za-z0-9_-.`.
+- Adjusted the definition of the `Group_Name` column. The role of population labels as general analysis labels was emphasised, and the original recommendation for the geographic-temporal nomenclature proposed by Eisenmann et al. 2018 toned down.
+- Changed the definition of the `Relation_` columns (`Relation_To`, `Relation_Degree`, `Relation_Type`) to operate on the level of individuals, not samples (`Individual_ID`, instead of `Poseidon_ID`).
+- Made the `Collection_ID` column a list column that allows multiple entries separated by `;`.
+- Removed `ReferenceGenome` as an option for the `Capture_Type` column and further clarified its definition.
+- Changed the scaling of the columns `Endogenous` and `Damage` from percent (0-100) to fractions (0-1).
+- Allowed multiple values in the `Damage` column for estimates per library.
+- Slightly adjusted the definitions of `MT_Haplogroup` and `Y_Haplogroup` to better account for non-human data.
+- Added the option `WISC2013` to `Capture_Type`.
+
+##### Removed columns
+
+- Removed all explicitly defined `_Note` columns. The schema allows arbitrary additional columns since v2.2.0; a specification of free-text fields is not necessary.
+
+#### Changes to the `.ssf` file
+
+##### Added columns
+
+- Added a `submitted_md5` column, which records the md5sum of the file in the `submitted_ftp` column.
+
 ### 2.7.0 -> 2.7.1 [not breaking]
 
 Only changes to the definition of the Sequencing Source File (`.ssf`):
diff --git a/janno_columns.tsv b/janno_columns.tsv
index 0ea8442..20a3e26 100644
--- a/janno_columns.tsv
+++ b/janno_columns.tsv
@@ -1,13 +1,20 @@
 janno_column_name	description	data_type	multi	choice	range	choice_options	range_lower	range_upper	mandatory	unique
-Poseidon_ID	sample identifier as defined by the genetics laboratory (e.g. I1234, BOT001), must fit to the values in the Poseidon package .fam/.ind file, must be unique within one package, if multiple datasets exist for the same individual different Poseidon_IDs are required	String	FALSE	FALSE	FALSE				TRUE	TRUE
-Genetic_Sex	genetic sex of the individual derived from this sample, only F, M or U because the EIGENSTRAT and PLINK formats only support these three, edge cases (e.g. XXY, XYY, X0) are undefined and should be grouped as F, M or U, with a Note added	Char	FALSE	TRUE	FALSE	F;M;U			TRUE	FALSE
-Group_Name	meaningful population/group identifiers for the sample, should follow the geographic-temporal nomenclature proposed by Eisenmann et al. 2018 (https://doi.org/10.1038/s41598-018-31123-z), multiple entries separated by ;, the first value must be equal the group name in the .fam/.ind file	String	TRUE	FALSE	FALSE				TRUE	FALSE
-Alternative_IDs	alternative identifiers for the same sampled individual, e.g. IDs in other databases or popular names like Ötzi/Iceman	String	TRUE	FALSE	FALSE				FALSE	FALSE
-Relation_To	other samples (by Poseidon_ID) that are related/identical to this sample, multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
-Relation_Degree	relationship degree for relatives mentioned in Related_To, multiple values separated by ; in the same order as Related_To in case of multiple relations	String	TRUE	TRUE	FALSE	identical;first;second;thirdToFifth;sixthToTenth;unrelated;other			FALSE	FALSE
+Poseidon_ID	sample identifier as defined by the genetics laboratory (e.g. I1234, BOT001), must contain only the ASCII characters “A-Za-z0-9_-.”, must fit to the values in the Poseidon package .fam/.ind file, must be unique within one package, if multiple datasets exist for the same individual different Poseidon_IDs are required	String	FALSE	FALSE	FALSE				TRUE	TRUE
+Genetic_Sex	genetic sex of the individual derived from this sample, only F, M or U because the EIGENSTRAT and PLINK formats only support these three, edge cases (e.g. XXY, XYY, X0) can be documented in Chromosomal_Anomalies	Char	FALSE	TRUE	FALSE	F;M;U			TRUE	FALSE
+Group_Name	meaningful population/group identifiers for the sample, must contain only the ASCII characters “A-Za-z0-9_-.”, can follow the geographic-temporal nomenclature proposed by Eisenmann et al. 2018 (https://doi.org/10.1038/s41598-018-31123-z), or communicate additional categories that are meaningful for groupings in specific analyses, such as cultural labels, outlier status or relatedness to other samples, multiple entries separated by ;, the first value must be equal to the group name in the .fam/.ind file	String	TRUE	FALSE	FALSE				TRUE	FALSE
+Individual_ID	identifier for the sampled individual	String	FALSE	FALSE	FALSE				FALSE	FALSE
+Species	Species name of the sample. Should follow binomial nomenclature as standard in Biology, e.g. Homo sapiens.	String	FALSE	FALSE	FALSE				FALSE	FALSE
+Alternative_IDs	alternative identifiers for the same sampled individual, e.g. IDs in other databases or popular names like Ötzi/Iceman. Alternative_IDs and Alternative_IDs_Context must have the same number and order of entries	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Alternative_IDs_Context	context of Alternative_IDs, so e.g. the name of the database (like AADRv62) where a respective identifier is used, or just “popular” for a common non-scientific name used in media and public discussions	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Relation_To	other individuals (by Individual_ID) that are related to the individual this sample derived from, multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Relation_Degree	relationship degree for relatives mentioned in Relation_To, multiple values separated by ; in the same order as Relation_To in case of multiple relations. Here, identical refers to identical twins. Samples from the same individual should be encoded through a common Individual_ID.	String	TRUE	TRUE	FALSE	identical;first;second;thirdToFifth;sixthToTenth;unrelated;other			FALSE	FALSE
 Relation_Type	relationship type for relatives mentioned in Related_To (e.g. sister_of, child_of, nephew_of), multiple values separated by ; in the same order as Related_To in case of multiple relations	String	TRUE	FALSE	FALSE				FALSE	FALSE
-Relation_Note	arbitrary comments about the genetic relationships of the sampled individual	String	FALSE	FALSE	FALSE				FALSE	FALSE
-Collection_ID	alternative sample identifier shared by the provider/owner of the sample (e.g. grave 40 skeleton 2)	String	FALSE	FALSE	FALSE				FALSE	FALSE
+Collection_ID	alternative sample identifiers shared by the provider/owner of the sample (e.g. grave 40 skeleton 2), multiple values separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Custodian_Institution	institution that curated the sampled remains at the time of sampling, with name, city and country, multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Cultural_Era	the cultural eras approximating the period in which the sampled individual lived (e.g. Danish Bronze Age, Pre-Pottery Neolithic A), if possible taken from an established space-time gazetteer like ChronOntology (https://chronontology.dainst.org) or PeriodO (https://perio.do), multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Cultural_Era_URL	urls pointing to definitions of cultural eras, so permalinks complementing the human-readable identifiers in Cultural_Era (e.g. https://n2t.net/ark:/99152/p0zj6g8ks9s, https://chronontology.dainst.org/period/Gx4uxaeTCbbg), multiple entries separated by ; in the same order as Cultural_Era	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Archaeological_Culture	the archaeological cultures, technocomplexes, (pottery) styles or political entities the sampled individual can be associated to (e.g. Hallstatt culture (Hungary), Neo-Assyrian Empire), if possible taken from an established space-time gazetteer like ChronOntology (https://chronontology.dainst.org) or PeriodO (https://perio.do), multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Archaeological_Culture_URL	urls pointing to definitions of archaeological cultures, so permalinks complementing the human-readable identifiers in Archaeological_Culture (e.g. https://n2t.net/ark:/99152/p0nxc78fxgt, https://chronontology.dainst.org/period/bvLwqFcGyoaL), multiple entries separated by ; in the same order as Archaeological_Culture	String	TRUE	FALSE	FALSE				FALSE	FALSE
 Country	present-day political country of origin for the sample	String	FALSE	FALSE	FALSE				FALSE	FALSE
 Country_ISO	present-day political country expressed in ISO 3166-1 alpha-2 country codes	String	FALSE	FALSE	FALSE				FALSE	FALSE
 Location	unspecified location information for the sample, e.g. administrative or topographic region or mountains/rivers/lakes/cities nearby	String	FALSE	FALSE	FALSE				FALSE	FALSE
@@ -21,25 +28,24 @@ Date_C14_Uncal_BP_Err	standard deviation (1-sigma ±) for the uncalibrated C14 a
 Date_BC_AD_Start	lower (older) bound for the age of the sample in years BC/AD, negative numbers for BC, positive numbers for AD, in case of C14 dates 2-sigma post calibration interval, 2000 for modern samples	Integer	FALSE	FALSE	TRUE		-Inf	2050	FALSE	FALSE
 Date_BC_AD_Median	median age of the sample in years BC/AD, for C14-dated samples median, for contextually dated samples simple mid-point of the archaeological intervals, 2000 for modern samples	Integer	FALSE	FALSE	TRUE		-Inf	2050	FALSE	FALSE
 Date_BC_AD_Stop	upper (more recent) bound for the age of the sample in years BC/AD, counter point to Date_BC_AD_Start	Integer	FALSE	FALSE	TRUE		-Inf	2050	FALSE	FALSE
-Date_Note	arbitrary comments about the dating information for the sample	String	FALSE	FALSE	FALSE				FALSE	FALSE
-MT_Haplogroup	mitochondrial haplogroup derived for the sample as specified on phylotree.org and as reported by the Haplofind or Haplogrep software tools	String	FALSE	FALSE	FALSE				FALSE	FALSE
-Y_Haplogroup	Y-chromosome haplogroup derived for the sample following a syntax with the main branch + the most terminal derived Y-SNP (e.g. R1b-P312)	String	FALSE	FALSE	FALSE				FALSE	FALSE
-Source_Tissue	skeletal element, tissue or other material sampled, the specific bone should be reported after an underscore (e.g. bone_phalanx), multiple values separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
+Chromosomal_Anomalies	genetic anomalies on the chromosome level detected for the sample, so extra, missing or irregular portions of chromosomal DNA, including gonosomal and autosomal aneuploidies, multiple entries separated by ; Suggestions include XXY, XYY, XXX, X0, Trisomy21, Trisomy18	String	TRUE	FALSE	FALSE				FALSE	FALSE
+MT_Haplogroup	mitochondrial haplogroup derived for the sample. For human data, this should follow the specification on phylotree.org and as reported by the Haplofind or Haplogrep software tools	String	FALSE	FALSE	FALSE				FALSE	FALSE
+Y_Haplogroup	Y-chromosome haplogroup derived for the sample. For human data this should follow a syntax with the main branch + the most terminal derived Y-SNP (e.g. R1b-P312)	String	FALSE	FALSE	FALSE				FALSE	FALSE
+Source_Material	sampled material, multiple entries separated by ;	String	TRUE	TRUE	FALSE	petrous;bone;tooth;hair;soft;sediment;other			FALSE	FALSE
 Nr_Libraries	number of libraries produced for the sample	Integer	FALSE	FALSE	FALSE				FALSE	FALSE
 Library_Names	identifiers of the libraries used to generate the genotype data for the sample, multiple values separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
-Capture_Type	specifics of the data generation method (e.g. capture method) for the individual libraries generated for the sample, multiple values separated by ;	String	TRUE	TRUE	FALSE	Shotgun;1240K;ArborComplete;ArborPrimePlus;ArborAncestralPlus;TwistAncientDNA;OtherCapture;ReferenceGenome			FALSE	FALSE
-UDG 	udg treatment for the libraries, mixed in case multiple libraries with different UDG treatment were merged	String	FALSE	TRUE	FALSE	minus;half;plus;mixed			FALSE	FALSE
+Capture_Type	capture method for the individual libraries generated for the sample, multiple values separated by ;	String	TRUE	TRUE	FALSE	Shotgun;1240K;ArborComplete;ArborPrimePlus;ArborAncestralPlus;TwistAncientDNA;WISC2013;OtherCapture			FALSE	FALSE
+UDG	udg treatment for the libraries, mixed in case multiple libraries with different UDG treatment were merged	String	FALSE	TRUE	FALSE	minus;half;plus;mixed			FALSE	FALSE
 Library_Built	strandedness of the libraries, “mixed” in case multiple libraries with different protocols were merged	String	FALSE	TRUE	FALSE	ds;ss;mixed			FALSE	FALSE
 Genotype_Ploidy	ploidy of the genotypes for the sample	String	FALSE	TRUE	FALSE	diploid;haploid			FALSE	FALSE
 Data_Preparation_Pipeline_URL	url pointing to a description of the computational pipeline used to generate the genotype data from the source data	String	FALSE	FALSE	FALSE				FALSE	FALSE
-Endogenous	% endogenous DNA as estimated from SG libraries (before capture) as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries only the highest values should be reported	Float	FALSE	FALSE	TRUE		0	100	FALSE	FALSE
+Endogenous	fraction (ranging between 0 and 1) of endogenous DNA estimated from SG libraries (before capture) as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries only the highest values should be reported	Float	FALSE	FALSE	TRUE		0	1	FALSE	FALSE
 Nr_SNPs	number of non-missing SNPs for the sample, counted on the SNP-set stored in the Poseidon package	Integer	FALSE	FALSE	FALSE				FALSE	FALSE
 Coverage_on_Target_SNPs	average X-fold coverage across targeted SNP sites after quality filtering	Float	FALSE	FALSE	FALSE				FALSE	FALSE
-Damage	% damage on the 5' end for the main shotgun library used for sequencing and/or capture, in case of multiple libraries a value from the merged read alignment should be reported	Float	FALSE	FALSE	TRUE		0	100	FALSE	FALSE
+Damage	fraction (ranging between 0 and 1) of damage on the 5' end of the libraries used for sequencing, in case of multiple libraries either report multiple values separated by ;, or a single value from the merged read alignment	Float	TRUE	FALSE	TRUE		0	1	FALSE	FALSE
 Contamination	(modern) contamination of the sample as measured by the method in Contamination_Meas, multiple values separated by ; (for different methods, in case of multiple libraries report a value from the merged read alignment), the variables Contamination, Contamination_Err and Contamination_Meas must have the same number and order of (non-n/a) entries	String	TRUE	FALSE	FALSE				FALSE	FALSE
 Contamination_Err	(modern) contamination estimate error of the sample	String	TRUE	FALSE	FALSE				FALSE	FALSE
-Contamination_Meas	method to measure contamination, should be a software tool (ANGSD, Schmutzi, …) and the respective software versions, details should go to Contamination_Note	String	TRUE	FALSE	FALSE				FALSE	FALSE
-Contamination_Note	arbitrary comments about the contamination estimation	String	FALSE	FALSE	FALSE				FALSE	FALSE
+Contamination_Meas	method to measure contamination, should be a software tool (ANGSD, Schmutzi, …) and the respective software versions, details can go to a Contamination_Note column	String	TRUE	FALSE	FALSE				FALSE	FALSE
 Genetic_Source_Accession_IDs	ENA or SRA accession identifiers pointing to the source data used to generate the genotyping data for the sample, multiple values separated by ;, if multiple are given they should be arranged by descending specificity (e.g. project id > sample id > sequencing run id)	String	TRUE	FALSE	FALSE				FALSE	FALSE
 Primary_Contact	project lead or first author who generated and published the data for the sample	String	FALSE	FALSE	FALSE				FALSE	FALSE
 Publication	bibtex keys for the publications where a sample was published (e.g. “AuthorJournalYear“) or “unpublished“, multiple values separated by ;, all must be present with complete BibTeX entries in the Poseidon package’s .bib file	String	TRUE	FALSE	FALSE				FALSE	FALSE
diff --git a/poseidon_package_specification.pdf b/poseidon_package_specification.pdf
index 9f2218e..51aa6ee 100644
Binary files a/poseidon_package_specification.pdf and b/poseidon_package_specification.pdf differ
diff --git a/ssf_columns.tsv b/ssf_columns.tsv
index 4993d21..9f49346 100644
--- a/ssf_columns.tsv
+++ b/ssf_columns.tsv
@@ -21,3 +21,4 @@ fastq_bytes	number of bytes in the FASTQ files, multiple entries separated by ;,
 fastq_md5	md5 hashes of the FASTQ files, multiple entries separated by ;, must be in the same order as the ftp and/or aspera links	String	TRUE	FALSE	FALSE				FALSE	FALSE
 read_count	number of reads in the sequencing entity	Integer	FALSE	FALSE	TRUE		0	Inf	FALSE	FALSE
 submitted_ftp	urls to the originally submitted files before they got converted to FASTQ in the INSDC databases, multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
+submitted_md5	md5 hashes of the originally submitted files before they got converted to FASTQ in the INSDC databases, multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE