Conversation
Thanks for these contributions! They are much appreciated!!
I left comments directly in the code. Please have a look.
Other than that, I expect these changes will affect the quickstart and conventional cameras guide. It would be good to update these and expand the latter to reflect these novel capabilities.
Adding a few tests would also be nice, if anything as to make sure there are no future regressions.
Note: I also merged with main, and did some minimal reformatting to pass the existing tests. You'll want to pull.
```python
self.ts = np.linspace(0, 1, len(self.transforms)) if ts is None else np.array(ts)
self.determinants = np.linalg.det(self.transforms[:, :3, :3])
```

```python
k = min(len(self.transforms) - 1, k)  # for small chunk_sizes
```
We should probably emit a warning here instead of silently downgrading the interpolation order.
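A minimal sketch of what that warning could look like; the helper name and message are mine, not the project's:

```python
import warnings


def clamp_interp_order(k: int, n_transforms: int) -> int:
    """Clamp spline order k to what the data supports, warning on downgrade."""
    max_k = n_transforms - 1
    if k > max_k:
        warnings.warn(
            f"Interpolation order k={k} needs at least {k + 1} transforms, "
            f"but only {n_transforms} are available; downgrading to k={max_k}.",
            RuntimeWarning,
        )
        return max_k
    return k
```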
pyproject.toml
Outdated
```toml
dependencies = [
    "opencv-python",
    "imageio",
    "scikit-image",
```
Do we need this extra dependency? I think we're only using it for basic filtering/converting to grayscale, which can all be done with torch(vision) which are already bundled.
I'm not sure if torchvision has unsharp masking implemented. skimage is pretty stable (in my experience), so maybe it's an okay dependency?
I'm not worried about stability; scikit is excellent. I just want to minimize bloat. Isn't this basically a convolution?
I guess so, yeah. Will add something to one of the utils files then.
Replaced the skimage stuff with existing dependencies.
visionsim/utils/color.py
Outdated
```python
    return img


def rgb2raw_bayer(rgb: torch.Tensor | npt.NDArray, cfa_pattern: Literal["rggb"] = "rggb") -> torch.Tensor | npt.NDArray:
```
What happens if the input has an alpha channel? Maybe also expand the docstring to describe the change in image dimensionality this performs, e.g. RGB -> Luma.
Nitpick: can we expand the 2 so it's consistent with the other methods? Maybe just `rgb_to_bayer`, and the same for the inverse operation.
Will add alpha-handling, and the docstring/naming updates.
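One way the alpha-handling could go (a sketch, not necessarily how it landed in the PR): drop the alpha channel on RGBA input before mosaicing, so the conversion only ever sees three channels.

```python
import numpy as np


def strip_alpha(img: np.ndarray) -> np.ndarray:
    """Drop the alpha channel from an HxWx4 image; pass HxWx3 through unchanged."""
    if img.ndim == 3 and img.shape[-1] == 4:
        return img[..., :3]
    return img
```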
visionsim/utils/color.py
Outdated
```python
def raw2rgb_bayer(
    raw: torch.Tensor | npt.NDArray,
    cfa_pattern: Literal["rggb"] = "rggb",
    method: Literal["off", "bilinear", "MHC04"] = "bilinear",
) -> torch.Tensor | npt.NDArray:
```
Maybe the method should be `Literal["bilinear", "MHC04"] | None = "bilinear"` to make it clearer that if it's `None` then this is effectively a no-op?
I am going to add this as a comment in the code too, but "off" is not a no-op; it still has to create a 3-channel image out of the raw image and move the mosaiced data over to the right locations. There's just no interpolation done with the "off" method.
I see, thanks for the clarification!
```python
    Alternative implementations are also available from OpenCV:
        rgb = cv2.cvtColor(<uint16_array>, cv2.COLOR_BAYER_BG2BGR)[:, :, ::-1]
    and the colour-demosaicing library (https://pypi.org/project/colour-demosaicing):
        rgb = demosaicing_CFA_Bayer_bilinear(raw, pattern="RGGB")
        rgb = demosaicing_CFA_Bayer_Malvar2004(raw, pattern="RGGB")
    which appear to give similar results but could run faster (not benchmarked).
```
We have opencv as a dependency already, why not just use that? Alternatively, we should add a test to ensure correctness.
From what I remember OpenCV only has the "bilinear" method implemented, and the Malvar et al. method can give better results, so having the option may still be worthwhile. This implementation is only here to avoid adding the colour-demosaicing library dependency.
Makes sense, but maybe a quick unit test is still a good idea, no?
Yes, testing is still good to do, I was only responding to "why not just use OpenCV's version".
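A quick unit test along those lines might look like this. Note that `mosaic_rggb` is a reference mosaic written here only for illustration; a real test would call the library's own mosaic/demosaic pair instead.

```python
import numpy as np


def mosaic_rggb(rgb: np.ndarray) -> np.ndarray:
    """Reference RGGB mosaic: keep one channel per pixel per the CFA layout."""
    raw = np.zeros(rgb.shape[:2], dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even cols
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd cols
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even cols
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd cols
    return raw


def test_mosaic_picks_correct_channels():
    rgb = np.empty((4, 4, 3))
    rgb[..., 0], rgb[..., 1], rgb[..., 2] = 0.1, 0.2, 0.3
    raw = mosaic_rggb(rgb)
    assert raw[0, 0] == 0.1 and raw[1, 1] == 0.3  # R and B sites
    assert raw[0, 1] == 0.2 and raw[1, 0] == 0.2  # G sites
```

For the demosaicing side, a useful invariant is that a constant-color image should round-trip exactly under both "bilinear" and "MHC04".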
```python
    factor: float = 1.0,
    readout_std: float = 20.0,
    fwc: int | None = None,
    duplicate: float = 1.0,
```
This duplicate arg was a hack, so I'm glad it's gone, but it was meant to address emulation from a short sequence. How do you deal with this?
There's nothing here to specifically help with that; this just sums up the frames based on the chunk_size parameter, so there can be juddering-type artifacts for chunk_sizes like {2, 3, 4, ...}. I think chunk_size = 1 should work fine, as that just uses the original frames as-is; at least the frames I got from the pre-release dataset looked alright.
I would prefer to recommend that the user use the interpolation modules, or render at higher frame rates to begin with, rather than resorting to any hacks here.
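As I understand the summation being described, it is roughly the following (names assumed, not the actual implementation):

```python
import numpy as np


def sum_chunks(frames: np.ndarray, chunk_size: int = 1) -> np.ndarray:
    """Sum consecutive chunk_size-length groups of frames; drops any remainder."""
    n = (len(frames) // chunk_size) * chunk_size
    chunks = frames[:n].reshape(-1, chunk_size, *frames.shape[1:])
    return chunks.sum(axis=1)
```

With `chunk_size=1` this is the identity, which matches "uses the original frames as-is".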
I agree, maybe it's worth adding this to the docs. In fact a new troubleshooting page might be a good idea.
Clearly I haven't gotten to it yet, but

I agree an ISP module would be nice. Could also help the writing when we update the docs, like you wrote in the first comment. I'll see what I can do about that, after addressing the other comments above.
… emulate_rgb_from_sequence input; more inoffensive defaults
I've tried to address most of the comments related to the RGB simulation. Still have to get to the SPC sim; the main wrinkle there is the forced grayscale conversion, if requested: how to handle it with a possible alpha channel in the input, and having a reasonable interface. If the RGB stuff looks okay then the alpha channel handling can probably be adapted from that. Adding tests is another big TODO remaining. I'm not really sure how to go about it... I suppose I could use

Thanks for addressing most of my pedantic comments! I left a few more haha. For tests: dummy data might be better, we don't want to also test the whole simulator (that's already tested over in
visionsim/emulate/rgb.py
Outdated
```python
patch = linearrgb_to_srgb(patch)
patch = np.round(patch * 2**8) / 2**8
patch = linearrgb_to_srgb(patch.astype(np.double))
patch = np.round(patch * 255).astype(np.uint8)
```
You are right, it's not part of the CLI, but it is exposed as a public method, so we can expect users to use it directly.
(That's a good xkcd, didn't know it!)
```python
from scipy.ndimage import gaussian_filter


def unsharp_mask(
```
Thanks for getting rid of the extra dependency!
Quick question: Why's this unsharpening one channel at a time? Isn't `gaussian_filter` vectorized? Can't we just do (roughly):

```python
img_smooth = gaussian_filter(img, sigma)
return np.clip(img + (amount * (img - img_smooth)), 0, 1)
```
This is mostly to allow uint8 inputs, for which that expression will not work. The path for floating-point input basically does just what you wrote above.
We could move the type-check up so that floating-point input doesn't have to be processed channel-by-channel. I can do that if it looks worth it.
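That reordering could look roughly like this, a sketch assuming inputs are either float images in [0, 1] or uint8 (not the PR's actual code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def unsharp_mask(img: np.ndarray, sigma: float = 1.0, amount: float = 1.0) -> np.ndarray:
    # Smooth spatial axes only: sigma 0 on the channel axis keeps channels independent.
    sigmas = (sigma, sigma, 0) if img.ndim == 3 else sigma
    if np.issubdtype(img.dtype, np.floating):
        # Vectorized float path: one filter call for the whole image.
        smooth = gaussian_filter(img, sigmas)
        return np.clip(img + amount * (img - smooth), 0.0, 1.0)
    # uint8 path: do the arithmetic in float, then round back to uint8.
    flt = img.astype(np.float64) / 255.0
    smooth = gaussian_filter(flt, sigmas)
    out = np.clip(flt + amount * (flt - smooth), 0.0, 1.0)
    return np.round(out * 255.0).astype(np.uint8)
```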
```python
    """Unsharp-masking to sharpen an image

    Borrows interface from scikit-image's version:
    <https://scikit-image.org/docs/stable/api/skimage.filters.html#skimage.filters.unsharp_mask>
```
Shouldn't this be `<>`_ syntax? Does it render properly? This PR predates the auto-build-docs-in-pr pipeline.
visionsim/cli/emulate.py
Outdated
```python
from visionsim.emulate.rgb import emulate_rgb_from_sequence

logger = logging.getLogger(__name__)
```
This actually brings into focus another point: We have a logger for the CLI (in cli's __init__.py) but not the library. Maybe we should define one, thoughts? Probably outside the scope of the PR either way.
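For reference, the usual library-side convention (the standard stdlib pattern, not this project's code) is a module-level logger with a NullHandler, so the library emits records but never configures logging on behalf of the application:

```python
import logging

# In the library's top-level __init__.py: stay silent unless the
# application installs its own handlers.
logger = logging.getLogger("visionsim")
logger.addHandler(logging.NullHandler())
```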
visionsim/emulate/spc.py
Outdated
```python
rng = np.random.default_rng() if rng is None else rng
return rng.binomial(cast(npt.NDArray[np.integer], 1), 1.0 - np.exp(-img * factor))
N = int(2**bitdepth) - 1
return (1.0 / N) * rng.binomial(cast(npt.NDArray[np.integer], N), 1.0 - np.exp(-img * flux_gain))
```
@shantanu-gupta Any reason why you're normalizing by 1/N here? Seems we should return the photon counts, no?
We could do that, although the calling code might then have to rescale to output as PNG. I went with this to avoid having to make changes in any other place.
Can return counts here though, if that looks better. Let me know what you prefer.
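The two return conventions under discussion differ only in a final division. A sketch, simplified from the snippet above with assumed parameter names:

```python
import numpy as np


def sample_spc(img, flux_gain=1.0, bitdepth=8, normalize=True, rng=None):
    """Binomial photon-count sample: each of N frames fires w.p. 1 - exp(-flux)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(2**bitdepth) - 1
    p = 1.0 - np.exp(-np.asarray(img, dtype=np.float64) * flux_gain)
    counts = rng.binomial(n, p)
    # Either normalized to [0, 1] (PNG-friendly) or raw photon counts.
    return counts / n if normalize else counts
```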
visionsim/cli/emulate.py
Outdated
```python
# batch_size is not relevant here
imgs = imgs[:, 0, ...]
```
Merged with main, and made a few tweaks such as:

TODOs:

I didn't get where the clipping bug was. There was definitely clipping at the end because of the conversion it used to do. Is linear to sRGB conversion done at export-time now?
Great, thanks!
I think alpha for SPADs is unnecessary complexity for now, unless there are already applications which are doing this.
I agree with your first point that interpolation followed by chunking with the same block size should give similar results, modulo motion blur. I used summation under the assumption that the user will explicitly provide the
This sounds reasonable.
Agree with this as well.

I think renaming it
We could get images that "just work" if we keep the current parameter (renamed as suggested).

We could also average the chunk instead of summing it, like you said earlier. Then we would have to specify the flux multiplier to be larger for longer chunks (more motion blur, less noise), and correspondingly a lower
…animate too, add migration
Went ahead and made the SPAD emulation use bitplanes instead, updated the ffmpeg.animation logic to use it (and the fps if available), and changed the schemas/database models accordingly. Also added a migration.

Will take a look at the RGB emulation shortly.
Another problem is that the Poisson sampling depends on how big the input sequence size is (again, this "duplicate" idea). So I think it makes sense to average the frames to get a "perfect" blurred image in the [0, 1] range, then apply the
Okay, pushed the above tweaks, and it seems to work well when
That's a good catch! Although I think it's probably more correct to say that the ADC outputs something like a DNG (so abstract numbers), rather than a physical photoelectron count. The ISO gain directly converts the photoelectron count to a voltage in the ADC's input range (I believe), so the current interface itself should be enough to handle this. It's perhaps worth having a note in the docs about this though. The signal flow should be something like:

UPDATE: Re-checked Hasinoff et al. (2010), "Noise-Optimal Capture for High Dynamic Range Photography", which has a model like this in Fig. 1.
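A rough sketch of that signal flow, in the spirit of Fig. 1 of Hasinoff et al. (2010); every number and parameter name here is a placeholder for illustration, not the emulator's actual interface:

```python
import numpy as np


def sensor_model(flux, t_exp=0.01, qe=0.8, iso_gain=4.0, read_std=2.0,
                 bitdepth=12, fwc=10_000, rng=None):
    """Photons -> photoelectrons -> ISO gain (voltage) -> ADC digital number."""
    rng = np.random.default_rng() if rng is None else rng
    photons = rng.poisson(np.asarray(flux) * t_exp)   # shot noise
    electrons = np.minimum(qe * photons, fwc)         # quantum efficiency + full-well clip
    voltage = iso_gain * electrons + rng.normal(0.0, read_std, np.shape(flux))  # read noise
    # The ADC quantizes the voltage into abstract digital numbers, not electron counts.
    return np.clip(np.round(voltage), 0, 2**bitdepth - 1).astype(np.int64)
```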
Adds additional parameters for extra control over SPAD and RGB image sensor simulations.
For SPADs:
For RGB:
Some other small-scale changes as well, described in corresponding commit titles.
📚 Documentation preview 📚: https://visionsim--23.org.readthedocs.build/en/23/