Ordering, weighting, and polymorphisms #190
Hi Martin - I've been looking into TreeSearch and associated packages as the downstream tools for analysis and visualisation look really useful indeed. Many congratulations and thanks for an excellent package. I find myself wondering how best to proceed, given my phylogeny has the following complications:
If I use TreeSearch, I note I can create a new phyDat using TreeTools::Decompose() to define my ordered characters, which is great. But I was hoping I could ask for your expertise regarding the implications of doing so:
Thanks in advance for any guidance you can give (and I don't expect miracles, of course - TNT for instance can't handle polymorphisms in an ordered character).
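For concreteness, this is the sort of call I have in mind – an untested sketch, going by my reading of ?Decompose; the file name and character indices are placeholders, and do correct me if the interface differs:

```r
library("TreeTools")

# Read a character matrix into a phyDat object; the file name is a placeholder.
dat <- ReadAsPhyDat("arachnids.nex")

# Placeholder indices for the characters coded as ordered.
ordered <- c(4, 17, 23)

# Decompose each ordered character into a series of binary characters.
decomposed <- Decompose(dat, indices = ordered)
```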
Hi Russell,

Glad that the package looks useful. I can provide my take on your questions, but these thoughts are always going to be subjective – whilst a probabilistic setting can often offer a mathematical definition of the 'correct' treatment and a mechanism for deciding between alternative models, in parsimony things become a bit more philosophical (and, I sometimes feel, subjective).

On (1), the underlying principle of parsimony is that all evolutionary steps are equivalent. Reweighting individual characters is at odds with that principle, so is not really parsimony analysis any more – at least, there are no criteria by which a weighting scheme can be determined. This said, it does feel warranted in certain cases. One example might be the implementation of continuous characters that (to simplify slightly) treats a continuous character as a 65536-state ordered character – such characters can then overwhelm any signal in the rest of a matrix of a few hundred characters. A real-world case is a lobopodian phylogeny that includes ratio-based continuous characters – the resulting tree has a markedly different topology from other studies, ending up separating 'long thin' lobopodians from 'short fat' ones. What can be interesting to do under parsimony is to explore which characters are responsible for certain aspects of tree topology; re-weighting characters is a possible way to do this in an informal way. The simplest way to implement this at present is to manipulate the input data in R, by creating two copies of any character you want to double-weight, mutatis mutandis. You could do this by duplicating rows of the input matrix, or modifying the weights and index attributes of a phyDat object (a sketch follows at the end of this reply).

On (2), again, implied weighting diverges from the principle that 'only a tree with the fewest steps is scientifically valid' that underpins parsimony; once this principle is abandoned, I don't see any logical basis for preferring a particular weighting schema. Either treatment of ordered characters (with or without decomposition) is equally 'invalid', just as there is no empirically 'correct' concavity constant. Implied weighting again imposes a model of the evolutionary process, but as this model is non-probabilistic, there is no way to compare the quality of models beyond intuition and empirical analyses. If the approaches give different results, there's no objective reason to prefer one result to the other.

(3) does have a straightforward answer (or possibly two?):
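The attribute-editing route might look something like this – an untested sketch, where `dat` and the character number are placeholders:

```r
# phyDat objects store a compressed set of character patterns:
# attr(dat, "index") maps each original character to a pattern, and
# attr(dat, "weight") records how many characters share each pattern.
DoubleWeight <- function(dat, char) {
  pattern <- attr(dat, "index")[char]  # pattern used by this character
  attr(dat, "weight")[pattern] <- attr(dat, "weight")[pattern] + 1
  # NB: any other character compressed into the same pattern gets
  # up-weighted too; see the compression caveat below.
  dat
}

dat2 <- DoubleWeight(dat, 5)  # '5' is a placeholder character number
```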
Hi Martin,

Thanks ever so much for this comprehensive set of thoughts. I kind of anticipated that there wouldn't be a single correct solution! As such, your thoughts are much appreciated. I note, in case of interest, that this also touches on the question:

-- On (1), the underlying principle of parsimony is that all evolutionary steps are equivalent. Reweighting individual characters is at odds with that principle, so is not really parsimony analysis any more

This is not an answer I had anticipated, but now you say it, it makes sense. It is also a justification for not reweighting the characters. In light of your comments, the paper above (which highlights that at 10% ordered characters this might have an impact; my percentage of ordered characters is much lower), and the limited number of states in my ordered characters (the largest has 20 states, then 8, then 3), I suspect that in reality it won't make much difference. I will run with reweighting and without and compare, then decide what to do from there.

-- My gut feels uncomfortable with the imbalance here – but if I addressed this by reweighting characters, essentially I would be designing my own model of evolution.

Also a good note of caution - given this is an arachnid phylogeny and I foresee much discussion of topologies, I can always present both with this caveat (I will be presenting a probabilistic tree too).

-- modifying the weights and index attributes of a phyDat object.

This was my thought too - I stalled as I failed to find documentation of phyDat objects that confirmed that the weights in the object did in fact relate to the character weight (it felt like a safe assumption, but I also wanted to be sure!), or that TreeSearch would integrate these into its search if I did modify them - I didn't see this in the bits of code I was looking at, but might have been looking in the wrong place. Can you confirm it does?

-- (Perhaps a cautionary rant in the documentation would mitigate this danger.)

For what it is worth, I think this would be a useful contribution given how rarely this is actually discussed in the literature - I have struggled somewhat to find much on this: even when weighted characters are applied in a given study, that tends to be done without justification (I suspect I have been guilty of this myself in the past).

-- On (2), again, implied weighting diverges from the principle that 'only a tree with the fewest steps is scientifically valid' that underpins parsimony; once this principle is abandoned, I don't see any logical basis for preferring a particular weighting schema. Either treatment of ordered characters (with or without decomposition) is equally 'invalid', just as there is no empirically 'correct' concavity constant. Implied weighting again imposes a model of the evolutionary process, but as this model is non-probabilistic, there is no way to compare the quality of models beyond intuition and empirical analyses. If the approaches give different results, there's no objective reason to prefer one result to the other.

Thanks, this is very clear.

-- Of course, under TNT you would be using a different treatment of inapplicable characters, which I suspect would have a much more prominent influence on tree topology than how an ordered character is treated.

Indeed. And I assume that if I wanted to test the impact of decomposition under IW using both ways of treating inapplicables, I could just implement the traditional approach as a custom optimality criterion in TreeSearch and do it as part of the same pipeline, rather than having to employ two different packages, which is attractive.

-- (3) does have a straightforward answer (or possibly two?):

I think mine is the latter case. As such, I shall decompose and see if any errors occur!

Thanks again for the food for thought. Much appreciated - and for writing this package.
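In the meantime, this is the sort of sanity check I had in mind for the weight semantics – a sketch only, assuming a phyDat called `dat`:

```r
library("TreeTools")

# Inspect the compression attributes of a phyDat object.
str(attributes(dat)[c("nr", "weight", "index")])

# Sanity check: the pattern weights should sum to the number of
# characters in the uncompressed matrix.
sum(attr(dat, "weight")) == ncol(PhyDatToMatrix(dat))
```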
-- the thing to watch out for is that matrices are by default compressed, so two characters coded as 1100 will be represented in a compressed phyDat object as a single entry with weight two.

Thanks for the heads up - this requires care when accessing individual characters to weight!

-- The easiest way to use the Fitch algorithm in TreeSearch is to replace all

'doh, of course. Man, I'm overthinking things this week!
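For anyone following along later, a toy illustration of that compression behaviour, with made-up data:

```r
library("TreeTools")

# Two identical characters, each coded 1100 across four taxa:
mat <- matrix(c("1", "1", "0", "0",
                "1", "1", "0", "0"),
              nrow = 4,
              dimnames = list(paste0("t", 1:4), NULL))
dat <- MatrixToPhyDat(mat)
attr(dat, "weight")  # a single compressed pattern with weight 2
```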
Wow, that was fast! Many thanks, I shall have a lot of fun playing with this lot - one character left to code and I start analysing it all...