Ordering, weighting, and polymorphisms #190
Hi Martin - I've been looking into TreeSearch and associated packages as the downstream tools for analysis and visualisation look really useful indeed. Many congratulations and thanks for an excellent package. I find myself wondering how best to proceed, given my phylogeny has the following complications:
If I use TreeSearch, I note I can create a new phyDat using TreeTools::Decompose() to define my ordered characters, which is great. But I was hoping I could ask for your expertise regarding the implications of doing so:
Thanks in advance for any guidance you can give (and I don't expect miracles, of course - TNT for instance can't handle polymorphisms in an ordered character).
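For concreteness, this is the sort of call I have in mind – an untested sketch, going by my reading of ?Decompose; the file name and character indices are placeholders, and do correct me if the interface differs:

```r
library("TreeTools")

# Read a character matrix into a phyDat object; the file name is a placeholder.
dat <- ReadAsPhyDat("arachnids.nex")

# Placeholder indices for the characters coded as ordered.
ordered <- c(4, 17, 23)

# Decompose each ordered character into a series of binary characters.
decomposed <- Decompose(dat, indices = ordered)
```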
Hi Russell,

Glad that the package looks useful. I can provide my take on your questions, but these thoughts are always going to be subjective – whilst a probabilistic setting can often offer a mathematical definition of the 'correct' treatment and a mechanism for deciding between alternative models, in parsimony things become a bit more philosophical (and, I sometimes feel, subjective).

On (1), the underlying principle of parsimony is that all evolutionary steps are equivalent. Reweighting individual characters is at odds with that principle, so is not really parsimony analysis any more – at least, there are no criteria by which a weighting scheme can be determined. This said, it does feel warranted in certain cases. One example might be the implementation of continuous characters that (to simplify slightly) treats a continuous character as a 65536-state ordered character – such characters can then overwhelm any signal in the rest of a matrix of a few hundred characters. A real-world case is a lobopodian phylogeny that includes ratio-based continuous characters – the resulting tree has a markedly different topology from other studies, ending up separating 'long thin' lobopodians from 'short fat' ones. What can be interesting to do under parsimony is to explore which characters are responsible for certain aspects of tree topology; re-weighting characters is a possible way to do this in an informal way. The simplest way to implement this at present is to manipulate the input data in R, by creating two copies of any character you want to double-weight, mutatis mutandis. You could do this by duplicating rows of the input matrix, or modifying the weights and index attributes of a phyDat object (a sketch follows at the end of this reply).

On (2), again, implied weighting diverges from the principle that 'only a tree with the fewest steps is scientifically valid' that underpins parsimony; once this principle is abandoned, I don't see any logical basis for preferring a particular weighting schema. Either treatment of ordered characters (with or without decomposition) is equally 'invalid', just as there is no empirically 'correct' concavity constant. Implied weighting again imposes a model of the evolutionary process, but as this model is non-probabilistic, there is no way to compare the quality of models beyond intuition and empirical analyses. If the approaches give different results, there's no objective reason to prefer one result to the other.

(3) does have a straightforward answer (or possibly two?):
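The attribute-editing route might look something like this – an untested sketch, where `dat` and the character number are placeholders:

```r
# phyDat objects store a compressed set of character patterns:
# attr(dat, "index") maps each original character to a pattern, and
# attr(dat, "weight") records how many characters share each pattern.
DoubleWeight <- function(dat, char) {
  pattern <- attr(dat, "index")[char]  # pattern used by this character
  attr(dat, "weight")[pattern] <- attr(dat, "weight")[pattern] + 1
  # NB: any other character compressed into the same pattern gets
  # up-weighted too; see the compression caveat below.
  dat
}

dat2 <- DoubleWeight(dat, 5)  # '5' is a placeholder character number
```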
Hi Martin,

Thanks ever so much for this comprehensive set of thoughts. I kind of anticipated that there wouldn't be a single correct solution! As such, your thoughts are much appreciated. I note, in case of interest, that this also touches on the question:

-- On (1), the underlying principle of parsimony is that all evolutionary steps are equivalent. Reweighting individual characters is at odds with that principle, so is not really parsimony analysis any more

This is not an answer I had anticipated, but now you say it, it makes sense. It is also a justification for not reweighting the characters. In light of your comments, the paper above (which highlights that at 10% ordered characters this might have an impact; my percentage of ordered characters is much lower), and the limited number of states in my ordered characters (the largest has 20 states, then 8, then 3), I suspect that in reality it won't make much difference. I will run with reweighting and without and compare, then decide what to do from there.

-- My gut feels uncomfortable with the imbalance here – but if I addressed this by reweighting characters, essentially I would be designing my own model of evolution.

Also a good note of caution - given this is an arachnid phylogeny and I foresee much discussion of topologies, I can always present both with this caveat (I will be presenting a probabilistic tree too).

-- modifying the weights and index attributes of a phyDat object.

This was my thought too - I stalled as I failed to find documentation of phyDat objects that confirmed that the weights in the object did in fact relate to the character weight (it felt like a safe assumption, but I also wanted to be sure!), or that TreeSearch would integrate these into its search if I did modify them - I didn't see this in the bits of code I was looking at, but might have been looking in the wrong place. Can you confirm it does?

-- (Perhaps a cautionary rant in the documentation would mitigate this danger.)

For what it is worth, I think this would be a useful contribution given how rarely this is actually discussed in the literature - I have struggled somewhat to find much on this: even when weighted characters are applied in a given study, that tends to be done without justification (I suspect I have been guilty of this myself in the past).

-- On (2), again, implied weighting diverges from the principle that 'only a tree with the fewest steps is scientifically valid' that underpins parsimony; once this principle is abandoned, I don't see any logical basis for preferring a particular weighting schema. Either treatment of ordered characters (with or without decomposition) is equally 'invalid', just as there is no empirically 'correct' concavity constant. Implied weighting again imposes a model of the evolutionary process, but as this model is non-probabilistic, there is no way to compare the quality of models beyond intuition and empirical analyses. If the approaches give different results, there's no objective reason to prefer one result to the other.

Thanks, this is very clear.

-- Of course, under TNT you would be using a different treatment of inapplicable characters, which I suspect would have a much more prominent influence on tree topology than how an ordered character is treated.

Indeed. And I assume that if I wanted to test the impact of decomposition under IW using both ways of treating inapplicables, I could just implement the traditional approach as a custom optimality criterion in TreeSearch and do it as part of the same pipeline, rather than having to employ two different packages, which is attractive.

-- (3) does have a straightforward answer (or possibly two?):

I think mine is the latter case. As such, I shall decompose and see if any errors occur!

Thanks again for the food for thought. Much appreciated - and for writing this package.
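In the meantime, this is the sort of sanity check I had in mind for the weight semantics – a sketch only, assuming a phyDat called `dat`:

```r
library("TreeTools")

# Inspect the compression attributes of a phyDat object.
str(attributes(dat)[c("nr", "weight", "index")])

# Sanity check: the pattern weights should sum to the number of
# characters in the uncompressed matrix.
sum(attr(dat, "weight")) == ncol(PhyDatToMatrix(dat))
```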
-- the thing to watch out for is that matrices are by default compressed, so two characters coded as 1100 will be represented in a compressed phyDat object as a single entry with weight two.

Thanks for the heads up - this requires care when accessing individual characters to weight!

-- The easiest way to use the Fitch algorithm in TreeSearch is to replace all

'doh, of course. Man, I'm overthinking things this week!
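For anyone following along later, a toy illustration of that compression behaviour, with made-up data:

```r
library("TreeTools")

# Two identical characters, each coded 1100 across four taxa:
mat <- matrix(c("1", "1", "0", "0",
                "1", "1", "0", "0"),
              nrow = 4,
              dimnames = list(paste0("t", 1:4), NULL))
dat <- MatrixToPhyDat(mat)
attr(dat, "weight")  # a single compressed pattern with weight 2
```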
Wow, that was fast! Many thanks, I shall have a lot of fun playing with this lot - one character left to code and I start analysing it all...