
Additions to image sensor simulations #23

Open

shantanu-gupta wants to merge 29 commits into main from rgb_configurable_exposure

Conversation

@shantanu-gupta (Contributor) commented Jan 15, 2026:

Adds parameters for finer control over the SPAD and RGB image sensor simulations.

For SPADs:

  • allow grayscale output
  • allow individual frames to represent a more general binomial sensor (e.g. 5-bit sensor to represent block-averages of 31 binary frames each, correspondingly down-sampling the SPAD frame rate)

For RGB:

  • split "factor" into a scale for the actual photon flux and the gain factor corresponding to the sensitivity/ISO setting
  • explicit shutter angle control (currently constant for all frames)
  • RGGB mosaicing and de-mosaicing implemented instead of directly simulating 3 independent RGB channels
  • optional (Gaussian) de-noising and unsharp masking stages

Some other small-scale changes as well, described in corresponding commit titles.


📚 Documentation preview 📚: https://visionsim--23.org.readthedocs.build/en/23/

@jungerm2 (Member) left a comment:

Thanks for these contributions! They are much appreciated!!
I left comments directly in the code. Please have a look.

Other than that, I expect these changes will affect the quickstart and conventional cameras guide. It would be good to update these and expand the latter to reflect these novel capabilities.

Adding a few tests would also be nice, if only to make sure there are no future regressions.

Note: I also merged with main, and did some minimal reformatting to pass the existing tests. You'll want to pull.

self.ts = np.linspace(0, 1, len(self.transforms)) if ts is None else np.array(ts)
self.determinants = np.linalg.det(self.transforms[:, :3, :3])

k = min(len(self.transforms) - 1, k) # for small chunk_sizes
@jungerm2 (Member):

We should probably emit a warning here instead of silently downgrading the interpolation order.

@shantanu-gupta (Contributor, Author):

Will do.
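The warning-on-downgrade behavior being agreed on here might look roughly like the sketch below. `clamp_spline_order` is an illustrative name, not the library's API; only the clamp-then-warn pattern is taken from the discussion.

```python
import warnings


def clamp_spline_order(n_transforms: int, k: int) -> int:
    """Clamp spline order `k` to what `n_transforms` control points allow,
    emitting a warning instead of silently downgrading (illustrative helper)."""
    max_k = n_transforms - 1
    if k > max_k:
        warnings.warn(
            f"Requested interpolation order k={k} needs at least {k + 1} "
            f"transforms but only {n_transforms} are available; "
            f"falling back to k={max_k}.",
            RuntimeWarning,
        )
        return max_k
    return k
```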

pyproject.toml Outdated
dependencies = [
"opencv-python",
"imageio",
"scikit-image",
@jungerm2 (Member):

Do we need this extra dependency? I think we're only using it for basic filtering/converting to grayscale, which can all be done with torch(vision) which are already bundled.

@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I'm not sure if torchvision has unsharp masking implemented. skimage is pretty stable (in my experience), so maybe it's an okay dependency?

@jungerm2 (Member):

I'm not worried about stability; scikit is excellent. I just want to minimize bloat. Isn't this basically a convolution?

@shantanu-gupta (Contributor, Author):

I guess so, yeah. Will add something to one of the utils files then.

@shantanu-gupta (Contributor, Author):

Replaced the skimage stuff with existing dependencies.

return img


def rgb2raw_bayer(rgb: torch.Tensor | npt.NDArray, cfa_pattern: Literal["rggb"] = "rggb") -> torch.Tensor | npt.NDArray:
@jungerm2 (Member):

What happens if the input has an alpha channel? Maybe also expand the docstring to describe the change in image dimensionality this performs, e.g. RGB -> Luma.

Nitpick: can we expand the 2, so it's consistent with the other methods? Maybe just rgb_to_bayer, same for the inverse operation.

@shantanu-gupta (Contributor, Author):

Will add alpha-handling, and the docstring/naming updates.
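The alpha-handling and renaming being discussed could look roughly like this. A minimal sketch under the proposed `rgb_to_bayer` name; dropping the alpha channel before mosaicing is an assumption here, not the final design:

```python
import numpy as np


def rgb_to_bayer(rgb: np.ndarray) -> np.ndarray:
    """Mosaic an (H, W, 3) or (H, W, 4) image into a single-channel RGGB raw.

    Illustrative sketch: an alpha channel, if present, is simply dropped
    before mosaicing (assumed behavior, not the library's final design).
    """
    if rgb.shape[-1] == 4:  # drop alpha
        rgb = rgb[..., :3]
    h, w, _ = rgb.shape
    raw = np.empty((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even cols
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd cols
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even cols
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd cols
    return raw
```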

Comment on lines +69 to +73
def raw2rgb_bayer(
raw: torch.Tensor | npt.NDArray,
cfa_pattern: Literal["rggb"] = "rggb",
method: Literal["off", "bilinear", "MHC04"] = "bilinear",
) -> torch.Tensor | npt.NDArray:
@jungerm2 (Member):

Maybe the method should be Literal["bilinear", "MHC04"] | None = "bilinear" to make it clearer that if it's None then this is effectively a no-op?

@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I am going to add this as a comment in the code too, but "off" is not a no-op; it still has to create a 3-channel image out of the raw image and move the mosaiced data over to the right locations. There's just no interpolation done with the "off" method.

@jungerm2 (Member):

I see, thanks for the clarification!
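The "off" path described here, scattering the mosaiced samples into a 3-channel image with no interpolation, can be sketched as follows (illustrative names, not the library's implementation):

```python
import numpy as np


def bayer_to_rgb_off(raw: np.ndarray) -> np.ndarray:
    """Sketch of the "off" de-mosaicing method: RGGB samples are moved to
    their native channel/location in a 3-channel image, and all other
    entries stay zero since no interpolation is performed."""
    h, w = raw.shape
    rgb = np.zeros((h, w, 3), dtype=raw.dtype)
    rgb[0::2, 0::2, 0] = raw[0::2, 0::2]  # R sites
    rgb[0::2, 1::2, 1] = raw[0::2, 1::2]  # G sites (even rows)
    rgb[1::2, 0::2, 1] = raw[1::2, 0::2]  # G sites (odd rows)
    rgb[1::2, 1::2, 2] = raw[1::2, 1::2]  # B sites
    return rgb
```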

Comment on lines +80 to +85
Alternative implementations are also available from OpenCV:
rgb = cv2.cvtColor(<uint16_array>, cv2.COLOR_BAYER_BG2BGR)[:,:,::-1],
and the colour-demosaicing library (https://pypi.org/project/colour-demosaicing):
rgb = demosaicing_CFA_Bayer_bilinear(raw, pattern="RGGB")
rgb = demosaicing_CFA_Bayer_Malvar2004(raw, pattern="RGGB"),
which appear to give similar results but could run faster (not benchmarked).
@jungerm2 (Member):

We have opencv as a dependency already, why not just use that? Alternatively, we should add a test to ensure correctness.

@shantanu-gupta (Contributor, Author):

From what I remember OpenCV only has the "bilinear" method implemented, and the Malvar et al. method can give better results, so having the option may still be worthwhile. This implementation is only here to avoid adding the colour-demosaicing library dependency.

@jungerm2 (Member):

Makes sense. Maybe a quick unit test is still a good idea, no?

@shantanu-gupta (Contributor, Author):

Yes, testing is still worth doing; I was only responding to "why not just use OpenCV's version".

factor: float = 1.0,
readout_std: float = 20.0,
fwc: int | None = None,
duplicate: float = 1.0,
@jungerm2 (Member):

This duplicate arg was a hack, so I'm glad it's gone, but it was meant to address emulation from a short sequence. How do you deal with that now?

@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

There's nothing here to specifically help with that; this just sums up the frames based on the chunk_size parameter, so there can be juddering-type artifacts for chunk_sizes like {2, 3, 4, ...}. I think chunk_size = 1 should work fine as that just uses the original frames as-is -- at least the frames I got from the pre-release dataset looked alright.

I would prefer to recommend to the user to use the interpolation modules or render originally at higher frame rates to avoid artifacts, rather than resorting to any hacks here.

@jungerm2 (Member):

I agree, maybe it's worth adding this to the docs. In fact a new troubleshooting page might be a good idea.

@jungerm2 (Member):

Clearly I haven't gotten to it yet, but emulate_rgb_from_flow would probably reuse a lot of the components from emulate_rgb_from_sequence. So maybe it's worth future-proofing a bit and extracting some of the core operations into an ISP module? Might be too soon, just food for thought.

jungerm2 self-assigned this Jan 16, 2026
@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I agree an ISP module would be nice. It could also help with the writing when we update the docs, as you mentioned in your first comment.

I'll see what I can do about that, after addressing the other comments above.

… emulate_rgb_from_sequence input; more inoffensive defaults
@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I've tried to address most of the comments related to the RGB simulation. I still have to get to the SPC sim; the main wrinkle there is the forced grayscale conversion when requested: how to handle a possible alpha channel in the input while keeping a reasonable interface. If the RGB stuff looks okay then the alpha channel handling can probably be adapted from that.

Adding tests is another big TODO remaining. I'm not really sure how to go about it... I suppose I could use blender.render-animation with some custom config to generate datasets of purely grayscale images, as well as ones containing an alpha channel? Or is it better to just create dummy data for testing? Thoughts?

@jungerm2 (Member):

Thanks for addressing most of my pedantic comments! I left a few more haha.

For tests: Dummy data might be better; we don't want to also test the whole simulator (that's already tested over in test_simulate.py, and there's a fixture, namely cube_dataset, that is responsible for rendering a dummy dataset). If we use that instead of dummy data then we'd be testing more than just the ISP stuff. You can either generate dummy data on the fly, or even just store some in the repo.

patch = linearrgb_to_srgb(patch)
patch = np.round(patch * 2**8) / 2**8
patch = linearrgb_to_srgb(patch.astype(np.double))
patch = np.round(patch * 255).astype(np.uint8)
@jungerm2 (Member):

You are right, it's not part of the CLI, but it is exposed as a public method, so we can expect users to use it directly.

(That's a good xkcd, didn't know it!)

from scipy.ndimage import gaussian_filter


def unsharp_mask(
@jungerm2 (Member):

Thanks for getting rid of the extra dependency!

Quick question: Why's this unsharpening one channel at a time? Isn't gaussian_filter vectorized? Can't we just do (roughly):

img_smooth = gaussian_filter(img, sigma)
return np.clip(img + (amount * (img - img_smooth)), 0, 1)

@shantanu-gupta (Contributor, Author):

This is mostly to allow uint8 inputs, for which that expression will not work. The path for floating-point input basically does just what you wrote above.

@shantanu-gupta (Contributor, Author) commented Feb 3, 2026:

We could move the type-check up so that floating-point input doesn't have to be processed channel-by-channel. I can do that if it looks worth it.
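Moving the type check up, as suggested, might look roughly like this. A sketch assuming `scipy.ndimage.gaussian_filter` (already imported in this module per the diff above); the function name and `amount`/`sigma` interface follow scikit-image's convention mentioned in the docstring, and the whole-array filtering replaces the channel-by-channel loop:

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def unsharp_mask_vectorized(img: np.ndarray, sigma: float = 1.0, amount: float = 1.0) -> np.ndarray:
    """Sketch of a type-check-first unsharp mask: uint8 input is promoted to
    float once, the whole array is filtered in a single vectorized call
    (smoothing spatial axes only), and the result is cast back at the end."""
    is_uint8 = img.dtype == np.uint8
    x = img.astype(np.float64) / 255.0 if is_uint8 else img
    # Smooth only the two spatial axes; leave a trailing channel axis untouched.
    sigmas = (sigma, sigma) + (0,) * (x.ndim - 2)
    smooth = gaussian_filter(x, sigmas)
    out = np.clip(x + amount * (x - smooth), 0.0, 1.0)
    return np.round(out * 255).astype(np.uint8) if is_uint8 else out
```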

"""Unsharp-masking to sharpen an image

Borrows interface from scikit-image's version:
<https://scikit-image.org/docs/stable/api/skimage.filters.html#skimage.filters.unsharp_mask>
@jungerm2 (Member):

Shouldn't this be `<>`_ syntax? Does it render properly? This PR predates the auto-build-docs-in-pr pipeline.


from visionsim.emulate.rgb import emulate_rgb_from_sequence

logger = logging.getLogger(__name__)
@jungerm2 (Member):

This actually brings into focus another point: We have a logger for the CLI (in cli's __init__.py) but not the library. Maybe we should define one, thoughts? Probably outside the scope of the PR either way.

jungerm2 mentioned this pull request Feb 5, 2026
rng = np.random.default_rng() if rng is None else rng
return rng.binomial(cast(npt.NDArray[np.integer], 1), 1.0 - np.exp(-img * factor))
N = int(2**bitdepth) - 1
return (1.0 / N) * rng.binomial(cast(npt.NDArray[np.integer], N), 1.0 - np.exp(-img * flux_gain))
@jungerm2 (Member):

@shantanu-gupta Any reason why you're normalizing by 1/N here? Seems we should return the photon counts no?

@shantanu-gupta (Contributor, Author):

We could do that, although the calling code might then have to rescale to output as PNG. I went with this to avoid having to make changes in any other place.

@shantanu-gupta (Contributor, Author):

Can return counts here though, if that looks better. Let me know what you prefer.
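An opt-in flag covering both preferences could be sketched as below. The `normalize` parameter is hypothetical, and `bitplanes` follows the naming proposed later in the thread; only the binomial sampling itself comes from the diff above:

```python
import numpy as np


def sample_binomial_frame(img, bitplanes, flux_gain, rng=None, normalize=True):
    """Sketch of binomial-sensor sampling with a hypothetical `normalize`
    flag: raw photon counts are returned as-is, or scaled to [0, 1] so the
    caller can save frames as PNG without rescaling."""
    rng = np.random.default_rng() if rng is None else rng
    n = bitplanes  # max count per frame, e.g. 31 for a 5-bit sensor
    counts = rng.binomial(n, 1.0 - np.exp(-img * flux_gain))
    return counts / n if normalize else counts
```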

Comment on lines +327 to +328
# batch_size is not relevant here
imgs = imgs[:, 0, ...]
@jungerm2 (Member):

Why this too?

@jungerm2 (Member) commented Feb 24, 2026:

Merged with main, and made a few tweaks such as:

  • Changed emulate_rgb_from_sequence to return a linear-intensity float array; this is consistent with the rest of the API, and the previous version had a bug where it would clip if chunk-size was > 255.
  • Added an rgb_to_grayscale util since it was used in a few places (supports alpha passthrough)
  • Removed alpha logic from emulate spad for the time being. I don't know if it really makes sense to have binary frames with an alpha channel?

TODOs:

  • Emulate RGB does not work as intended and might need a small redesign. For instance, on one hand we probably should normalize by burst size, because I can take the same sequence, interpolate it 16x and 128x and I should get roughly the same rgb emulations (albeit with less motion artifacts) when I emulate from these with chunk-size=16 and 128 respectively. On the flip side, you could argue that more input frames is akin to a longer exposure, and thus we should sum these. This ambiguity will remain as long as there's not a clear sense of exposure time. Adding this might also fix the "duplicate" workaround we used to have. I also think we should hold off on alpha considerations (and duplicating the color channel to ensure RGB) until these issues are resolved.
  • (minor) We might want to enable non-power of two bitdepth, and just replace bitdepth by say bitplanes.
  • (minor) If we do the above, we should consider adding it to the transforms.json schema such that ffmpeg.animate can correctly understand spad data when using binomial frames, optionally add tonemapping+invert response too.

@shantanu-gupta (Contributor, Author) commented Mar 3, 2026:

* Changed `emulate_rgb_from_sequence` to return linear-intensity float array, it's consistent with the rest of the API, and the current version had a bug where if chunk-size was > 255 it would clip.

I didn't see where the clipping bug was. There was definitely clipping at the end because it used to convert to uint8, but other than that I don't see a problem. Anyway, it's moot now if it returns values in [0, 1].

Is linear to sRGB conversion done at export-time now?

* Added a `rgb_to_grayscale` util since it was used in a few places (supports alpha passthrough)

Great, thanks!

* Removed alpha logic from emulate spad for the time being. I don't know if it really makes sense to have binary frames with an alpha channel?

I think alpha for SPADs is unnecessary complexity for now, unless there are already applications which are doing this.

* [ ]  Emulate RGB does not work as intended and might need a small redesign. For instance, on one hand we probably should normalize by burst size, because I can take the same sequence, interpolate it 16x and 128x and I should get roughly the same rgb emulations (albeit with less motion artifacts) when I emulate from these with chunk-size=16 and 128 respectively.  On the flip side, you could argue that more input frames is akin to a longer exposure, and thus we should sum these. This ambiguity will remain as long as there's not a clear sense of exposure time. Adding this might also fix the "duplicate" workaround we used to have. I also think we should hold off on alpha considerations (and duplicating the color channel to ensure RGB) until these issues are resolved.

I agree with your first point that interpolation followed by chunking with the same block size should give similar results, modulo motion blur. I used summation under the assumption that the user will explicitly provide the flux_gain parameter: for chunk_size = 16 it should be 8 times that for chunk_size = 128, representing longer exposures in the individual frames being added. Effectively I treat the rgb_sequence as just an abstract radiance (as photons/some unit time), with the actual range directed by flux_gain. We could name it flux_per_exposure or something similar to make this even more explicit, let me know.

* [ ]  (minor) We might want to enable non-power of two bitdepth, and just replace bitdepth by say `bitplanes`.

This sounds reasonable.

* [ ]  (minor) If we do the above, we should consider adding it to the transforms.json schema such that `ffmpeg.animate` can correctly understand spad data when using binomial frames, optionally add tonemapping+invert response too.

Agree with this as well.

@jungerm2 (Member) commented Mar 3, 2026:

I think renaming it flux_per_exposure kind of fixes the issue, but it places the responsibility of doing the mental gymnastics on the end user, and we can't expect them to really tweak this. The event emulation just takes a --fps of the input, so maybe it makes sense to do this here too? I'd like to get it to a spot where it just works and gives reasonable results without having to think about parameters too much.

@shantanu-gupta (Contributor, Author) commented Mar 3, 2026:

We could get images that "just work" if we keep the current parameter (renamed as flux_per_exposure), and make the default iso_gain parameter cover the rest of the sensor's dynamic range, so something like (2**adc_bitdepth - 1) / (chunk_size * flux_per_exposure). Then the user has the option to modulate the flux and ISO gain sliders from there as needed.

We could also average the chunk instead of summing it, like you said earlier. Then we would have to specify the flux multiplier to be larger for longer chunks (more motion blur, less noise), and correspondingly a lower iso_gain. That could be fine too. To me this looks pretty similar to the current state in terms of complexity, so I don't have a strong preference either way. Let me know which you prefer.
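The default proposed in the comment above can be written down directly. An illustrative helper (the name and signature are not part of the codebase); it also demonstrates the invariance from the discussion, where a longer chunk with proportionally smaller per-frame flux yields the same gain:

```python
def default_iso_gain(adc_bitdepth: int, chunk_size: int, flux_per_exposure: float) -> float:
    """Proposed default: choose iso_gain so that a full-brightness chunk sum
    (chunk_size * flux_per_exposure) maps to the top ADC level."""
    return (2**adc_bitdepth - 1) / (chunk_size * flux_per_exposure)
```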

@jungerm2 (Member) commented Mar 4, 2026:

Went ahead and made the SPAD emulation use bitplanes instead, updated the ffmpeg.animation logic to use it (and the fps if available) and changed the schemas/database models accordingly. Also added a _log_once utility.

Will take a look at the RGB emulation shortly.

@jungerm2 (Member) commented Mar 4, 2026:

Another problem is that the Poisson sampling depends on how big the input sequence is (again, this "duplicate" idea). So I think it makes sense to average the frames to get a "perfect" blurred image in the [0, 1] range, then apply the flux_gain, which maps this to a photon rate (effectively saying that flux_gain corresponds to the average number of photons incident on the brightest point of the image). This would make flux_gain the main fudging factor between sim2real, and it will likely be in the same range as the FWC, say 50_000 for a DSLR.
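The averaging scheme described here can be sketched as follows. Names and the default value are illustrative; the real pipeline also models read noise, full-well capacity, ISO gain, and ADC quantization, which are omitted here:

```python
import numpy as np


def emulate_rgb_chunk(frames: np.ndarray, flux_gain: float = 50_000.0, rng=None) -> np.ndarray:
    """Sketch: average the chunk to a "perfect" blurred image in [0, 1],
    scale by flux_gain (average photon count at the brightest point),
    then Poisson-sample for shot noise."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = frames.mean(axis=0)               # chunk-size invariant
    photons = rng.poisson(blurred * flux_gain)  # shot noise
    return photons / flux_gain                  # back to normalized intensity
```

Because the chunk is averaged rather than summed, emulating from a 16x and a 128x interpolated sequence with matching chunk sizes should give roughly the same output.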

@jungerm2 (Member) commented Mar 5, 2026:

Okay, pushed the above tweaks, and it seems to work well when flux_gain is on the order of 2**adc_bitdepth, which makes sense. This means that the patch intensity is now directly in photo-electrons. Currently, the ADC levels are also in increments of single electrons, which seems unrealistic. The FWC of a DSLR is ~50k electrons, yet an ADC might be (at most) 14 bits (~16k levels). So how large, in electrons, is a "unit" of ADC? How should we model this discrepancy?

@shantanu-gupta (Contributor, Author) commented Mar 11, 2026:

Okay, pushed the above tweaks, and it seems to work well when flux_gain is on the order of 2**adc_bitdepth, which makes sense. This means that the patch intensity is now directly in photo-electrons. Currently, the ADC levels are also in increments of single electrons, which seems unrealistic. The FWC of a DSLR is ~50k electrons, yet an ADC might be (at most) 14 bits (~16k levels). So how large, in electrons, is a "unit" of ADC? How should we model this discrepancy?

That's a good catch! Although I think it's probably more correct to say that the ADC outputs something like a DNG value (so abstract numbers) rather than a physical photoelectron count. The ISO gain directly converts the photoelectron count to a voltage in the ADC's input range (I believe), so the current interface itself should be enough to handle this. It's perhaps worth having a note in the docs about this though.

The signal flow should be something like:
final_reading = ADC_quantize[iso_gain * (read_noise + min(FWC, eta * flux_gain * pixel_value))]
where pixel_value is in [0, 1] and unit-less, flux_gain is in photons, eta is quantum efficiency (as photo-electrons per photon), read_noise and FWC in photo-electrons, iso_gain something like (DNG levels per photo-electron), and finally ADC_quantize returns DNG level.

UPDATE: Re-checked Hasinoff et al. (2010), "Noise-Optimal Capture for High Dynamic Range Photography", which has a model like this in Fig. 1.
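The signal flow written out above can be sketched numerically as follows. All parameter values are illustrative, and Poisson shot noise is added for the photon term (it is implicit in the thread's discussion but not in the one-line formula); `eta`, `read_noise`, `iso_gain`, and FWC follow the notation above:

```python
import numpy as np


def simulate_reading(pixel_value, flux_gain, eta=0.6, read_noise=2.0,
                     fwc=50_000, iso_gain=None, adc_bitdepth=14, rng=None):
    """Sketch of: photons -> photo-electrons (quantum efficiency eta) ->
    full-well clip -> read noise -> ISO gain -> ADC quantization to
    DNG-like levels. Parameter values are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2**adc_bitdepth - 1
    if iso_gain is None:
        iso_gain = levels / fwc  # map a full well to the top ADC level
    electrons = rng.poisson(eta * flux_gain * np.asarray(pixel_value))
    electrons = np.minimum(fwc, electrons) + rng.normal(0.0, read_noise, np.shape(pixel_value))
    return np.clip(np.round(iso_gain * electrons), 0, levels).astype(np.int64)
```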
