-
Notifications
You must be signed in to change notification settings - Fork 79
Closed
Milestone
Description
I get strange results when trying to calculate the AFS for the following dataset:
// Same as the example from the PLOS paper but with back and recurring mutations
/*
0.25┊ 8 ┊ ┊ ┊
┊ ┏━┻━┓ ┊ ┊ ┊
0.20┊ ┃ ┃ ┊ ┊ 7 ┊
┊ ┃ ┃ ┊ ┊ ┏━┻━┓ ┊
0.17┊ 6 ┃ ┊ 6 ┊ ┃ ┃ ┊
┊ ┏━┻┓ ┃ ┊ ┏━┻━┓ ┊ ┃ ┃ ┊
0.09┊ ┃ 5 ┃ ┊ ┃ 5 ┊ ┃ 5 ┊
┊ ┃ ┏┻┓ ┃ ┊ ┃ ┏━┻┓ ┊ ┃ ┏━┻┓ ┊
0.07┊ ┃ ┃ ┃ ┃ ┊ ┃ ┃ 4 ┊ ┃ ┃ 4 ┊
┊ ┃ ┃ ┃ ┃ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊
0.00┊ 0 1 3 2 ┊ 0 1 2 3 ┊ 0 1 2 3 ┊
0.00 2.00 7.00 10.00
*/
char const* multi_tree_back_recurrent_nodes = "1 0 -1 0\n"
"1 0 -1 0\n"
"1 0 -1 1\n"
"1 0 -1 1\n"
"0 0.071 -1 -1\n"
"0 0.090 -1 -1\n"
"0 0.170 -1 -1\n"
"0 0.202 -1 -1\n"
"0 0.253 -1 -1\n";
char const* multi_tree_back_recurrent_edges = "2 10 4 2\n"
"2 10 4 3\n"
"0 10 5 1\n"
"0 2 5 3\n"
"2 10 5 4\n"
"0 7 6 0,5\n"
"7 10 7 0,5\n"
"0 2 8 2,6\n";
char const* multi_tree_back_recurrent_sites = "1 0\n"
"4.5 0\n"
"8.5 0\n";
/* site, node, derived_state, [parent, time] */
char const* multi_tree_back_recurrent_mutations = "0 6 1\n" // to derived state
"0 5 0\n" // back to ancestral state
"0 3 1\n" // and once more to the derived state
"1 5 1\n" // to derived state
"1 4 0\n" // back to ancestral state
"2 4 1\n"; // to derived state
/* Two (diploid) individuals */
char const* multi_tree_back_recurrent_individuals = "0 0.2,1.5 -1,-1\n"
"0 0.0,0.0 -1,-1\n";
// [...]
tsk_treeseq_t tskit_tree_sequence;
tsk_id_t samples[] = {0, 1, 2, 3};
tsk_size_t sample_set_sizes[] = {4, 0};
double ref_result[5];
int ret;
tsk_treeseq_from_text(
&tskit_tree_sequence,
10,
multi_tree_back_recurrent_nodes,
multi_tree_back_recurrent_edges,
NULL,
multi_tree_back_recurrent_sites,
multi_tree_back_recurrent_mutations,
multi_tree_back_recurrent_individuals,
NULL,
0
);
// Polarised AFS
ret = tsk_treeseq_allele_frequency_spectrum(
&tskit_tree_sequence,
1,
sample_set_sizes,
samples,
0,
NULL,
TSK_STAT_POLARISED,
ref_result
);
assert(ret == 0);
fmt::print("tskit AFS: {}\n", fmt::join(ref_result, ", "));
// Output: tskit AFS: 0, 0, 1, 1, 0
// Expected output: tskit AFS: 0, 1, 2, 0, 0Am I doing something wrong when calling the tskit functions? Shouldn't the entries in the AFS histogram always sum up to the number of sites (3 in this case)?
I also find something else unintuitive, maybe you can help: If all sites are in the derived state, why is the site counted as if all sites were in ancestral state?
Metadata
Metadata
Assignees
Labels
No labels