This repository was archived by the owner on Aug 29, 2023. It is now read-only.

Description
Great work!
Your paper said that “Finally, we apply a single 1 × 1 convolution layer to get the per-pixel embeddings. ” in 4.1 Implementation details Pixel decoder.
However,the codes in pixel_decoder.py ,
self.mask_features = Conv2d(
conv_dim,
mask_dim,
kernel_size=3,
stride=1,
padding=1,
)
the final conv2d also has a 3*3 kernel, do I miss something?
Thanks!