Inquiring about the package's capabilities in specific application in Proteomics #9
Replies: 5 comments
Hi @eneskemalergin, thanks for the kind words and for taking the time to evaluate dimtensor! Your proteomics workflow sounds like a really interesting use case. Happy to give you an honest picture of where dimtensor stands for this kind of work:

What dimtensor offers today

Two-tier uncertainty propagation (v4.5.0+):
Error budget analysis: GUM-compliant sensitivity decomposition that identifies which inputs dominate total uncertainty. Useful for pinpointing where measurement improvements would have the most impact.

Limitations relevant to your workflow

A few things to flag honestly:
My honest take

dimtensor would give you correct uncertainty propagation for individual arithmetic and aggregation steps, and the Monte Carlo tier could handle correlated inputs. But for a full hierarchical proteomics pipeline with correlation tracking across aggregation levels, you'd likely need to build meaningful scaffolding on top of dimtensor, or it might make more sense as a purpose-built solution that borrows ideas from dimtensor's approach.

That said, I'd love to understand your workflow in more detail. If you could share your specific questions about correlation handling, independence assumptions, and the scale of tensors you're working with, I can give you a more concrete assessment of what would work out of the box vs. what would need extension.

Feel free to post your questions here or open separate discussions for each topic. Happy to dig into the details!
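To make the error budget idea mentioned above concrete, here is a minimal sketch of a first-order, GUM-style sensitivity decomposition in plain NumPy. It is not dimtensor's implementation or API: the function name `error_budget`, the finite-difference sensitivities, and the example numbers are all assumptions chosen for illustration, and the inputs are assumed to be independent.

```python
import numpy as np

def error_budget(f, x, u, eps=1e-6):
    """First-order error budget for y = f(x), assuming independent inputs.

    Returns y, its combined standard uncertainty, and each input's
    fractional share of the total variance.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    y = f(x)
    # Central finite differences approximate the sensitivities df/dx_i.
    sens = np.empty_like(x)
    for i in range(x.size):
        step = eps * max(abs(x[i]), 1.0)
        hi, lo = x.copy(), x.copy()
        hi[i] += step
        lo[i] -= step
        sens[i] = (f(hi) - f(lo)) / (2 * step)
    contrib = (sens * u) ** 2      # variance contributed by each input
    total_var = contrib.sum()      # assumes at least one nonzero contribution
    return y, np.sqrt(total_var), contrib / total_var

# Hypothetical example: ratio of two intensities times a calibration factor.
def ratio(p):
    return p[2] * p[0] / p[1]

y, u_y, shares = error_budget(ratio, x=[1000.0, 500.0, 1.0], u=[50.0, 10.0, 0.01])
print(y, u_y, shares)
```

The `shares` array is the budget itself: the input with the largest share is where a better measurement would reduce the total uncertainty the most.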
@marcoloco23, thanks for the breakdown.

Current proteomics pipelines aggregate measured intensities into peptide-spectrum matches (PSMs). These PSMs group into peptides, which roll up into proteins. Search tools often report confidence scores to filter reliable signals, but they discard measurement variance during downstream aggregation or assume statistical independence between correlated inputs.

I plan to build a pipeline where variance propagates natively through every stage. Every data point will carry its own variance. The system will execute operations like slicing, broadcasting, reshaping, and matrix multiplication across hundreds of thousands of PSMs. We must prevent value and variance arrays from desynchronizing during these steps.

How does dimtensor store uncertainty internally? Do values and variances sit in separate contiguous buffers, interleave, or exist as metadata attached to the array?

I need to evaluate three other areas to determine if dimtensor fits this pipeline:
Great questions @eneskemalergin. These get right to the heart of whether dimtensor fits your pipeline. Let me answer each one.

How uncertainty is stored internally

Values and uncertainties live in separate contiguous NumPy arrays, the
The uncertainty array is optional (defaults to

Correlation tracking

This is the biggest gap for your use case. The analytical propagation tier does not track covariances; it stores only marginal uncertainties per element. When you add or multiply two

The Monte Carlo tier supports an explicit correlation matrix as input, but it doesn't propagate a covariance structure through a chain of operations: you provide the correlations upfront for a single function evaluation. For a multi-stage aggregation pipeline (fragments → PSMs → peptides → proteins), you would need to either:
Neither is automatic today.

Performance at scale

For element-wise operations (add, multiply, divide), uncertainty propagation is O(N) with small constant overhead: it's just NumPy arithmetic on the second array. Hundreds of thousands of PSMs should be fine. However,

Domain operations
Bottom line

dimtensor would handle value+variance co-storage, slicing, broadcasting, and element-wise propagation reliably at scale. But the three things your pipeline specifically needs (covariance tracking through aggregation stages, uncertainty-aware matmul, and inverse-variance weighted aggregation) are not implemented today. I think you'd be better served by a purpose-built solution for this.

That said, if you're interested in contributing or co-designing a covariance-tracking extension, I'd be very open to that conversation. The internal architecture (separate contiguous buffers,

What do you think: would a collaboration on covariance tracking be interesting, or does your timeline require a standalone solution?
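As an illustration of why covariance tracking matters across aggregation stages, the following is a small NumPy sketch of linear propagation with a full covariance matrix (Sigma_out = J Sigma_in J^T), compared against marginal-only propagation that assumes independence. It is not part of dimtensor: the PSM uncertainties, the correlation values, and the aggregation matrices `J` and `K` are all hypothetical.

```python
import numpy as np

# Three hypothetical PSM intensities with standard uncertainties.
sigma = np.array([3.0, 4.0, 5.0])

# Assumed correlations: PSMs 0 and 1 (rho = 0.8) and PSMs 1 and 2 (rho = 0.5).
corr = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.5],
    [0.0, 0.5, 1.0],
])
cov = corr * np.outer(sigma, sigma)   # full input covariance matrix

# Stage 1 (PSMs -> peptides) as a linear map:
# peptide A = PSM 0 + PSM 1, peptide B = PSM 2.
J = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# Stage 2 (peptides -> protein): protein = peptide A + peptide B.
K = np.array([[1.0, 1.0]])

# With covariance tracking, the full matrix is carried between stages.
cov_peptides = J @ cov @ J.T
var_protein = (K @ cov_peptides @ K.T)[0, 0]

# Marginal-only propagation keeps variances and treats inputs as independent.
var_peptides_indep = (J ** 2) @ sigma ** 2
var_protein_indep = ((K ** 2) @ var_peptides_indep)[0]

print("protein variance, covariance tracked:", var_protein)
print("protein variance, independence assumed:", var_protein_indep)
```

With these made-up numbers the tracked protein variance is about 89.2, while the independence assumption gives 50, so the marginal-only result understates the uncertainty whenever the inputs are positively correlated.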
Quick update @eneskemalergin: based on your questions, I've just added three features that address the practical gaps.

New in latest main:
These won't solve the full covariance tracking problem for your multi-stage pipeline, but they should make the individual aggregation steps work correctly with uncertainty. Available on
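For readers following along, the two individual aggregation steps named earlier in this thread (uncertainty-aware matmul and inverse-variance weighted aggregation) can be sketched in plain NumPy as below. This is not dimtensor's code or API, which is not shown in the thread: the function names are hypothetical, and both functions assume that all elements are statistically independent and that variances are strictly positive.

```python
import numpy as np

def matmul_with_variance(a, var_a, b, var_b):
    """C = A @ B with first-order variance, assuming independent elements."""
    c = a @ b
    # Var(sum_k a_ik * b_kj) ~= sum_k (b_kj^2 * var_a_ik + a_ik^2 * var_b_kj)
    var_c = var_a @ (b ** 2) + (a ** 2) @ var_b
    return c, var_c

def inverse_variance_mean(x, var, axis=0):
    """Inverse-variance weighted mean along `axis`, and the variance of that mean."""
    w = 1.0 / var                      # weights; requires var > 0
    w_sum = w.sum(axis=axis)
    return (w * x).sum(axis=axis) / w_sum, 1.0 / w_sum

# Hypothetical usage with tiny arrays.
a = np.array([[1.0, 2.0]])
var_a = np.array([[0.1, 0.2]])
b = np.array([[3.0], [4.0]])
var_b = np.array([[0.5], [0.5]])
c, var_c = matmul_with_variance(a, var_a, b, var_b)

x = np.array([10.0, 20.0])
var_x = np.array([1.0, 4.0])
mean, var_mean = inverse_variance_mean(x, var_x)
print(c, var_c, mean, var_mean)
```

Because values and variances go through matching array operations, their shapes stay aligned; what this sketch cannot capture is correlation between elements, which is the gap discussed above.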
Wow, thanks for the reply and for the quick updates to dot and matmul so that they keep uncertainty. Apart from carrying things over between stages, it seems a lot of my core needs are already covered by your project. I will build a quick demo application to check whether the carryover can be handled outside dimtensor. If that hurts performance or tracking too much, I will consider extending this project before deciding to create my own package. I will let you know how it goes (I will likely work on it Friday or next weekend). I appreciate the very helpful responses, and I'm happy to help :)
Hi @marcoloco23, very cool project.
I'm a researcher working on uncertainty propagation in proteomics, where we need to track measurement variance through multiple aggregation levels (fragments → PSMs → peptides → proteins). I'm evaluating dimtensor for this use case. I have a couple of questions about how it handles correlation between derived tensors, independence assumptions, and performance for large-scale tensor operations, so that I can understand the scope and capabilities of this great-looking idea. I was ready to build a full-fledged solution, but if a tool already exists for it, why reinvent the wheel?
Would you be open to discussing whether dimtensor could support this workflow, or if I'd be better off building a purpose-built solution?