
Additions to image sensor simulations #23

Open

shantanu-gupta wants to merge 29 commits into main from rgb_configurable_exposure

Conversation

@shantanu-gupta (Contributor) commented Jan 15, 2026:

Adds parameters for finer control over the SPAD and RGB image sensor simulations.

For SPADs:

  • allow grayscale output
  • allow individual frames to represent a more general binomial sensor (e.g. 5-bit sensor to represent block-averages of 31 binary frames each, correspondingly down-sampling the SPAD frame rate)

For RGB:

  • split "factor" into a scale for the actual photon flux and the gain factor corresponding to the sensitivity/ISO setting
  • explicit shutter angle control (currently constant for all frames)
  • RGGB mosaicing and de-mosaicing implemented instead of directly simulating 3 independent RGB channels
  • optional (Gaussian) de-noising and unsharp masking stages

Some other small-scale changes as well, described in corresponding commit titles.


📚 Documentation preview 📚: https://visionsim--23.org.readthedocs.build/en/23/

@jungerm2 (Member) left a comment:

Thanks for these contributions! They are much appreciated!!
I left comments directly in the code. Please have a look.

Other than that, I expect these changes will affect the quickstart and conventional cameras guide. It would be good to update these and expand the latter to reflect these novel capabilities.

Adding a few tests would also be nice, if only to make sure there are no future regressions.

Note: I also merged with main, and did some minimal reformatting to pass the existing tests. You'll want to pull.

self.ts = np.linspace(0, 1, len(self.transforms)) if ts is None else np.array(ts)
self.determinants = np.linalg.det(self.transforms[:, :3, :3])

k = min(len(self.transforms) - 1, k) # for small chunk_sizes
@jungerm2 (Member):

We should probably emit a warning here instead of silently downgrading the interpolation order.

@shantanu-gupta (Contributor, Author):

Will do.
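The warning-on-downgrade behavior being agreed on here might look roughly like the sketch below. `clamp_spline_order` is an illustrative name, not the library's API; only the clamp-then-warn pattern is taken from the discussion.

```python
import warnings


def clamp_spline_order(n_transforms: int, k: int) -> int:
    """Clamp spline order `k` to what `n_transforms` control points allow,
    emitting a warning instead of silently downgrading (illustrative helper)."""
    max_k = n_transforms - 1
    if k > max_k:
        warnings.warn(
            f"Requested interpolation order k={k} needs at least {k + 1} "
            f"transforms but only {n_transforms} are available; "
            f"falling back to k={max_k}.",
            RuntimeWarning,
        )
        return max_k
    return k
```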

pyproject.toml Outdated
dependencies = [
"opencv-python",
"imageio",
"scikit-image",
@jungerm2 (Member):

Do we need this extra dependency? I think we're only using it for basic filtering/converting to grayscale, which can all be done with torch(vision) which are already bundled.

@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I'm not sure if torchvision has unsharp masking implemented. skimage is pretty stable (in my experience), so maybe it's an okay dependency?

@jungerm2 (Member):

I'm not worried about stability; scikit is excellent. I just want to minimize bloat. Isn't this basically a convolution?

@shantanu-gupta (Contributor, Author):

I guess so, yeah. Will add something to one of the utils files then.

@shantanu-gupta (Contributor, Author):

Replaced the skimage stuff with existing dependencies.

return img


def rgb2raw_bayer(rgb: torch.Tensor | npt.NDArray, cfa_pattern: Literal["rggb"] = "rggb") -> torch.Tensor | npt.NDArray:
@jungerm2 (Member):

What happens if the input has an alpha channel? Maybe also expand the docstring to describe the change in image dimensionality this performs, e.g. RGB -> Luma.

Nitpick: can we expand the 2, so it's consistent with the other methods? Maybe just rgb_to_bayer, same for the inverse operation.

@shantanu-gupta (Contributor, Author):

Will add alpha-handling, and the docstring/naming updates.
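The alpha-handling and renaming being discussed could look roughly like this. A minimal sketch under the proposed `rgb_to_bayer` name; dropping the alpha channel before mosaicing is an assumption here, not the final design:

```python
import numpy as np


def rgb_to_bayer(rgb: np.ndarray) -> np.ndarray:
    """Mosaic an (H, W, 3) or (H, W, 4) image into a single-channel RGGB raw.

    Illustrative sketch: an alpha channel, if present, is simply dropped
    before mosaicing (assumed behavior, not the library's final design).
    """
    if rgb.shape[-1] == 4:  # drop alpha
        rgb = rgb[..., :3]
    h, w, _ = rgb.shape
    raw = np.empty((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even cols
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd cols
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even cols
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd cols
    return raw
```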

Comment on lines +69 to +73
def raw2rgb_bayer(
raw: torch.Tensor | npt.NDArray,
cfa_pattern: Literal["rggb"] = "rggb",
method: Literal["off", "bilinear", "MHC04"] = "bilinear",
) -> torch.Tensor | npt.NDArray:
@jungerm2 (Member):

Maybe the method should be Literal["bilinear", "MHC04"] | None = "bilinear" to make it clearer that if it's None then this is effectively a no-op?

@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I am going to add this as a comment in the code too, but "off" is not a no-op; it still has to create a 3-channel image out of the raw image and move the mosaiced data over to the right locations. There's just no interpolation done with the "off" method.

@jungerm2 (Member):

I see, thanks for the clarification!
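The "off" path described here, scattering the mosaiced samples into a 3-channel image with no interpolation, can be sketched as follows (illustrative names, not the library's implementation):

```python
import numpy as np


def bayer_to_rgb_off(raw: np.ndarray) -> np.ndarray:
    """Sketch of the "off" de-mosaicing method: RGGB samples are moved to
    their native channel/location in a 3-channel image, and all other
    entries stay zero since no interpolation is performed."""
    h, w = raw.shape
    rgb = np.zeros((h, w, 3), dtype=raw.dtype)
    rgb[0::2, 0::2, 0] = raw[0::2, 0::2]  # R sites
    rgb[0::2, 1::2, 1] = raw[0::2, 1::2]  # G sites (even rows)
    rgb[1::2, 0::2, 1] = raw[1::2, 0::2]  # G sites (odd rows)
    rgb[1::2, 1::2, 2] = raw[1::2, 1::2]  # B sites
    return rgb
```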

Comment on lines +80 to +85
Alternative implementations are also available from OpenCV:
rgb = cv2.cvtColor(<uint16_array>, cv2.COLOR_BAYER_BG2BGR)[:,:,::-1],
and the colour-demosaicing library (https://pypi.org/project/colour-demosaicing):
rgb = demosaicing_CFA_Bayer_bilinear(raw, pattern="RGGB")
rgb = demosaicing_CFA_Bayer_Malvar2004(raw, pattern="RGGB"),
which appear to give similar results but could run faster (not benchmarked).
@jungerm2 (Member):

We have opencv as a dependency already, why not just use that? Alternatively, we should add a test to ensure correctness.

@shantanu-gupta (Contributor, Author):

From what I remember OpenCV only has the "bilinear" method implemented, and the Malvar et al. method can give better results, so having the option may still be worthwhile. This implementation is only here to avoid adding the colour-demosaicing library dependency.

@jungerm2 (Member):

Makes sense. Maybe a quick unit test is still a good idea, no?

@shantanu-gupta (Contributor, Author):

Yes, testing is still worth doing; I was only responding to "why not just use OpenCV's version".

factor: float = 1.0,
readout_std: float = 20.0,
fwc: int | None = None,
duplicate: float = 1.0,
@jungerm2 (Member):

This duplicate arg was a hack, so I'm glad it's gone, but it was meant to address emulation from a short sequence. How do you deal with that now?

@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

There's nothing here to specifically help with that; this just sums up the frames based on the chunk_size parameter, so there can be juddering-type artifacts for chunk_sizes like {2, 3, 4, ...}. I think chunk_size = 1 should work fine as that just uses the original frames as-is -- at least the frames I got from the pre-release dataset looked alright.

I would prefer to recommend to the user to use the interpolation modules or render originally at higher frame rates to avoid artifacts, rather than resorting to any hacks here.

@jungerm2 (Member):

I agree, maybe it's worth adding this to the docs. In fact a new troubleshooting page might be a good idea.

@jungerm2 (Member):

Clearly I haven't gotten to it yet, but emulate_rgb_from_flow would probably reuse a lot of the components from emulate_rgb_from_sequence. So maybe it's worth future-proofing a bit and extracting some of the core operations into an ISP module? Might be too soon, just food for thought.

jungerm2 self-assigned this Jan 16, 2026
@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I agree an ISP module would be nice. It could also help with the writing when we update the docs, as you mentioned in your first comment.

I'll see what I can do about that, after addressing the other comments above.

… emulate_rgb_from_sequence input; more inoffensive defaults
@shantanu-gupta (Contributor, Author) commented Jan 27, 2026:

I've tried to address most of the comments related to the RGB simulation. I still have to get to the SPC sim; the main wrinkle there is the forced grayscale conversion when requested: how to handle a possible alpha channel in the input while keeping a reasonable interface. If the RGB stuff looks okay then the alpha channel handling can probably be adapted from that.

Adding tests is another big TODO remaining. I'm not really sure how to go about it... I suppose I could use blender.render-animation with some custom config to generate datasets of purely grayscale images, as well as ones containing an alpha channel? Or is it better to just create dummy data for testing? Thoughts?

@jungerm2 (Member):

Thanks for addressing most of my pedantic comments! I left a few more haha.

For tests: Dummy data might be better; we don't want to also test the whole simulator (that's already tested over in test_simulate.py, and there's a fixture, namely cube_dataset, that is responsible for rendering a dummy dataset). If we use that instead of dummy data then we'd be testing more than just the ISP stuff. You can either generate dummy data on the fly, or even just store some in the repo.

patch = linearrgb_to_srgb(patch)
patch = np.round(patch * 2**8) / 2**8
patch = linearrgb_to_srgb(patch.astype(np.double))
patch = np.round(patch * 255).astype(np.uint8)
@jungerm2 (Member):

You are right, it's not part of the CLI, but it is exposed as a public method, so we can expect users to use it directly.

(That's a good xkcd, didn't know it!)

from scipy.ndimage import gaussian_filter


def unsharp_mask(
@jungerm2 (Member):

Thanks for getting rid of the extra dependency!

Quick question: Why's this unsharpening one channel at a time? Isn't gaussian_filter vectorized? Can't we just do (roughly):

img_smooth = gaussian_filter(img, sigma)
return np.clip(img + (amount * (img - img_smooth)), 0, 1)

@shantanu-gupta (Contributor, Author):

This is mostly to allow uint8 inputs, for which that expression will not work. The path for floating-point input basically does just what you wrote above.

@shantanu-gupta (Contributor, Author) commented Feb 3, 2026:

We could move the type-check up so that floating-point input doesn't have to be processed channel-by-channel. I can do that if it looks worth it.
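Moving the type check up, as suggested, might look roughly like this. A sketch assuming `scipy.ndimage.gaussian_filter` (already imported in this module per the diff above); the function name and `amount`/`sigma` interface follow scikit-image's convention mentioned in the docstring, and the whole-array filtering replaces the channel-by-channel loop:

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def unsharp_mask_vectorized(img: np.ndarray, sigma: float = 1.0, amount: float = 1.0) -> np.ndarray:
    """Sketch of a type-check-first unsharp mask: uint8 input is promoted to
    float once, the whole array is filtered in a single vectorized call
    (smoothing spatial axes only), and the result is cast back at the end."""
    is_uint8 = img.dtype == np.uint8
    x = img.astype(np.float64) / 255.0 if is_uint8 else img
    # Smooth only the two spatial axes; leave a trailing channel axis untouched.
    sigmas = (sigma, sigma) + (0,) * (x.ndim - 2)
    smooth = gaussian_filter(x, sigmas)
    out = np.clip(x + amount * (x - smooth), 0.0, 1.0)
    return np.round(out * 255).astype(np.uint8) if is_uint8 else out
```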

"""Unsharp-masking to sharpen an image

Borrows interface from scikit-image's version:
<https://scikit-image.org/docs/stable/api/skimage.filters.html#skimage.filters.unsharp_mask>
@jungerm2 (Member):

Shouldn't this be `<>`_ syntax? Does it render properly? This PR predates the auto-build-docs-in-pr pipeline.


from visionsim.emulate.rgb import emulate_rgb_from_sequence

logger = logging.getLogger(__name__)
@jungerm2 (Member):

This actually brings into focus another point: We have a logger for the CLI (in cli's __init__.py) but not the library. Maybe we should define one, thoughts? Probably outside the scope of the PR either way.

jungerm2 mentioned this pull request Feb 5, 2026
rng = np.random.default_rng() if rng is None else rng
return rng.binomial(cast(npt.NDArray[np.integer], 1), 1.0 - np.exp(-img * factor))
N = int(2**bitdepth) - 1
return (1.0 / N) * rng.binomial(cast(npt.NDArray[np.integer], N), 1.0 - np.exp(-img * flux_gain))
@jungerm2 (Member):

@shantanu-gupta Any reason why you're normalizing by 1/N here? Seems we should return the photon counts no?

@shantanu-gupta (Contributor, Author):

We could do that, although the calling code might then have to rescale to output as PNG. I went with this to avoid having to make changes in any other place.

@shantanu-gupta (Contributor, Author):

Can return counts here though, if that looks better. Let me know what you prefer.
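An opt-in flag covering both preferences could be sketched as below. The `normalize` parameter is hypothetical, and `bitplanes` follows the naming proposed later in the thread; only the binomial sampling itself comes from the diff above:

```python
import numpy as np


def sample_binomial_frame(img, bitplanes, flux_gain, rng=None, normalize=True):
    """Sketch of binomial-sensor sampling with a hypothetical `normalize`
    flag: raw photon counts are returned as-is, or scaled to [0, 1] so the
    caller can save frames as PNG without rescaling."""
    rng = np.random.default_rng() if rng is None else rng
    n = bitplanes  # max count per frame, e.g. 31 for a 5-bit sensor
    counts = rng.binomial(n, 1.0 - np.exp(-img * flux_gain))
    return counts / n if normalize else counts
```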

Comment on lines +327 to +328
# batch_size is not relevant here
imgs = imgs[:, 0, ...]
@jungerm2 (Member):

Why this too?

@jungerm2 (Member) commented Feb 24, 2026:

Merged with main, and made a few tweaks such as:

  • Changed emulate_rgb_from_sequence to return a linear-intensity float array; this is consistent with the rest of the API, and the previous version had a bug where it would clip if chunk-size was > 255.
  • Added an rgb_to_grayscale util since it was used in a few places (supports alpha passthrough)
  • Removed alpha logic from emulate spad for the time being. I don't know if it really makes sense to have binary frames with an alpha channel?

TODOs:

  • Emulate RGB does not work as intended and might need a small redesign. For instance, on one hand we probably should normalize by burst size, because I can take the same sequence, interpolate it 16x and 128x and I should get roughly the same rgb emulations (albeit with less motion artifacts) when I emulate from these with chunk-size=16 and 128 respectively. On the flip side, you could argue that more input frames is akin to a longer exposure, and thus we should sum these. This ambiguity will remain as long as there's not a clear sense of exposure time. Adding this might also fix the "duplicate" workaround we used to have. I also think we should hold off on alpha considerations (and duplicating the color channel to ensure RGB) until these issues are resolved.
  • (minor) We might want to enable non-power of two bitdepth, and just replace bitdepth by say bitplanes.
  • (minor) If we do the above, we should consider adding it to the transforms.json schema such that ffmpeg.animate can correctly understand spad data when using binomial frames, optionally add tonemapping+invert response too.

@shantanu-gupta (Contributor, Author) commented Mar 3, 2026:

* Changed `emulate_rgb_from_sequence` to return linear-intensity float array, it's consistent with the rest of the API, and the current version had a bug where if chunk-size was > 255 it would clip.

I didn't see where the clipping bug was. There was definitely clipping at the end because it used to convert to uint8, but other than that I don't see a problem. Anyway, it's moot now if it returns values in [0, 1].

Is linear to sRGB conversion done at export-time now?

* Added a `rgb_to_grayscale` util since it was used in a few places (supports alpha passthrough)

Great, thanks!

* Removed alpha logic from emulate spad for the time being. I don't know if it really makes sense to have binary frames with an alpha channel?

I think alpha for SPADs is unnecessary complexity for now, unless there are already applications which are doing this.

* [ ]  Emulate RGB does not work as intended and might need a small redesign. For instance, on one hand we probably should normalize by burst size, because I can take the same sequence, interpolate it 16x and 128x and I should get roughly the same rgb emulations (albeit with less motion artifacts) when I emulate from these with chunk-size=16 and 128 respectively.  On the flip side, you could argue that more input frames is akin to a longer exposure, and thus we should sum these. This ambiguity will remain as long as there's not a clear sense of exposure time. Adding this might also fix the "duplicate" workaround we used to have. I also think we should hold off on alpha considerations (and duplicating the color channel to ensure RGB) until these issues are resolved.

I agree with your first point that interpolation followed by chunking with the same block size should give similar results, modulo motion blur. I used summation under the assumption that the user will explicitly provide the flux_gain parameter: for chunk_size = 16 it should be 8 times that for chunk_size = 128, representing longer exposures in the individual frames being added. Effectively I treat the rgb_sequence as just an abstract radiance (as photons/some unit time), with the actual range directed by flux_gain. We could name it flux_per_exposure or something similar to make this even more explicit, let me know.

* [ ]  (minor) We might want to enable non-power of two bitdepth, and just replace bitdepth by say `bitplanes`.

This sounds reasonable.

* [ ]  (minor) If we do the above, we should consider adding it to the transforms.json schema such that `ffmpeg.animate` can correctly understand spad data when using binomial frames, optionally add tonemapping+invert response too.

Agree with this as well.

@jungerm2 (Member) commented Mar 3, 2026:

I think renaming it flux_per_exposure kind of fixes the issue, but it places the responsibility of doing the mental gymnastics on the end user, and we can't expect them to really tweak this. The event emulation just takes a --fps of the input, so maybe it makes sense to do this here too? I'd like to get it to a spot where it just works and gives reasonable results without having to think about parameters too much.

@shantanu-gupta (Contributor, Author) commented Mar 3, 2026:

We could get images that "just work" if we keep the current parameter (renamed as flux_per_exposure), and make the default iso_gain parameter cover the rest of the sensor's dynamic range, so something like (2**adc_bitdepth - 1) / (chunk_size * flux_per_exposure). Then the user has the option to modulate the flux and ISO gain sliders from there as needed.

We could also average the chunk instead of summing it, like you said earlier. Then we would have to specify the flux multiplier to be larger for longer chunks (more motion blur, less noise), and correspondingly a lower iso_gain. That could be fine too. To me this looks pretty similar to the current state in terms of complexity, so I don't have a strong preference either way. Let me know which you prefer.
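The default proposed in the comment above can be written down directly. An illustrative helper (the name and signature are not part of the codebase); it also demonstrates the invariance from the discussion, where a longer chunk with proportionally smaller per-frame flux yields the same gain:

```python
def default_iso_gain(adc_bitdepth: int, chunk_size: int, flux_per_exposure: float) -> float:
    """Proposed default: choose iso_gain so that a full-brightness chunk sum
    (chunk_size * flux_per_exposure) maps to the top ADC level."""
    return (2**adc_bitdepth - 1) / (chunk_size * flux_per_exposure)
```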

@jungerm2 (Member) commented Mar 4, 2026:

Went ahead and made the SPAD emulation use bitplanes instead, updated the ffmpeg.animation logic to use it (and the fps if available) and changed the schemas/database models accordingly. Also added a _log_once utility.

Will take a look at the RGB emulation shortly.

@jungerm2 (Member) commented Mar 4, 2026:

Another problem is that the Poisson sampling depends on how big the input sequence is (again, this "duplicate" idea). So I think it makes sense to average the frames to get a "perfect" blurred image in the [0, 1] range, then apply the flux_gain, which maps this to a photon rate (effectively saying that flux_gain corresponds to the average number of photons incident on the brightest point of the image). This would make flux_gain the main fudging factor between sim2real, and it will likely be in the same range as the FWC, say 50_000 for a DSLR.
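The averaging scheme described here can be sketched as follows. Names and the default value are illustrative; the real pipeline also models read noise, full-well capacity, ISO gain, and ADC quantization, which are omitted here:

```python
import numpy as np


def emulate_rgb_chunk(frames: np.ndarray, flux_gain: float = 50_000.0, rng=None) -> np.ndarray:
    """Sketch: average the chunk to a "perfect" blurred image in [0, 1],
    scale by flux_gain (average photon count at the brightest point),
    then Poisson-sample for shot noise."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = frames.mean(axis=0)               # chunk-size invariant
    photons = rng.poisson(blurred * flux_gain)  # shot noise
    return photons / flux_gain                  # back to normalized intensity
```

Because the chunk is averaged rather than summed, emulating from a 16x and a 128x interpolated sequence with matching chunk sizes should give roughly the same output.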

@jungerm2 (Member) commented Mar 5, 2026:

Okay, pushed the above tweaks, and it seems to work well when flux_gain is on the order of 2**adc_bitdepth, which makes sense. This means that the patch intensity is now directly in photo-electrons. Currently, the ADC levels are also in increments of single electrons, which seems unrealistic. The FWC of a DSLR is ~50k electrons, yet an ADC might be (at most) 14 bits (~16k levels). So how large, in electrons, is a "unit" of ADC? How should we model this discrepancy?

@shantanu-gupta (Contributor, Author) commented Mar 11, 2026:

Okay, pushed the above tweaks, and it seems to work well when flux_gain is on the order of 2**adc_bitdepth, which makes sense. This means that the patch intensity is now directly in photo-electrons. Currently, the ADC levels are also in increments of single electrons, which seems unrealistic. The FWC of a DSLR is ~50k electrons, yet an ADC might be (at most) 14 bits (~16k levels). So how large, in electrons, is a "unit" of ADC? How should we model this discrepancy?

That's a good catch! Although I think it's probably more correct to say that the ADC outputs something like a DNG value (so abstract numbers) rather than a physical photoelectron count. The ISO gain directly converts the photoelectron count to a voltage in the ADC's input range (I believe), so the current interface itself should be enough to handle this. It's perhaps worth having a note in the docs about this though.

The signal flow should be something like:
final_reading = ADC_quantize[iso_gain * (read_noise + min(FWC, eta * flux_gain * pixel_value))]
where pixel_value is in [0, 1] and unit-less, flux_gain is in photons, eta is quantum efficiency (as photo-electrons per photon), read_noise and FWC in photo-electrons, iso_gain something like (DNG levels per photo-electron), and finally ADC_quantize returns DNG level.

UPDATE: Re-checked Hasinoff et al. (2010), "Noise-Optimal Capture for High Dynamic Range Photography", which has a model like this in Fig. 1.
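The signal flow written out above can be sketched numerically as follows. All parameter values are illustrative, and Poisson shot noise is added for the photon term (it is implicit in the thread's discussion but not in the one-line formula); `eta`, `read_noise`, `iso_gain`, and FWC follow the notation above:

```python
import numpy as np


def simulate_reading(pixel_value, flux_gain, eta=0.6, read_noise=2.0,
                     fwc=50_000, iso_gain=None, adc_bitdepth=14, rng=None):
    """Sketch of: photons -> photo-electrons (quantum efficiency eta) ->
    full-well clip -> read noise -> ISO gain -> ADC quantization to
    DNG-like levels. Parameter values are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2**adc_bitdepth - 1
    if iso_gain is None:
        iso_gain = levels / fwc  # map a full well to the top ADC level
    electrons = rng.poisson(eta * flux_gain * np.asarray(pixel_value))
    electrons = np.minimum(fwc, electrons) + rng.normal(0.0, read_noise, np.shape(pixel_value))
    return np.clip(np.round(iso_gain * electrons), 0, levels).astype(np.int64)
```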
