diff --git a/README.md b/README.md index cad1abd..a3ead28 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,218 @@ CUDA Rasterizer =============== -[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md) - **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Name: Meghana Seshadri +* Tested on: Windows 10, i7-4870HQ @ 2.50GHz 16GB, GeForce GT 750M 2048MB (personal computer) + + +## Project Overview + +The goal of this project was to get an introduction to writing a GPU Rasterizer in CUDA. The pipeline being used is similar to that of OpenGL, where the following are implemented: vertex shading, primitive assembly, rasterization, fragment shading, and a framebuffer. + + +![](renders/FinalRenders/normals_cow.gif) + +![](renders/FinalRenders/truck.gif) + + +[Click here for instructions of this project](./INSTRUCTION.md) + +### Features + +The following features were implemented (most of which can be toggled with flags in `rasterize.cu`): + +**Basic Rasterization Pipeline:** + +* Vertex shading. (_vertexTransformAndAssembly in rasterize.cu) +* Primitive assembly with support for triangles read from buffers of index and vertex data (_primitiveAssembly in rasterize.cu) +* Rasterization (_rasterize in rasterize.cu) +* Fragment shading (render in rasterize.cu) +* A depth buffer for storing and depth testing fragments (int * dev_depth in rasterize.cu) +* Fragment-to-depth-buffer writing (with atomics for race avoidance) +* Lambertian lighting scheme in the Fragment shader (render in rasterize.cu) + + +**Extra Features:** + +* UV texture mapping with bilinear texture filtering and perspective correct texture coordinates +* Support for rasterizing the following primitives: + - Points + - Lines + - Triangles + + +### Running the code +The main function requires a glTF model file (can be found in `/gltfs`). Call the program with one as an argument: `cis565_rasterizer gltfs/duck/duck.gltf`. (In Visual Studio, `../gltfs/duck/duck.gltf`.) + +If you are using Visual Studio, you can set this in the `Debugging > Command Arguments` section in the Project properties. + + +### Rasterization Pipeline + +#### Vertex Shading +* VertexIn[n] vs_input -> VertexOut[n] vs_output +* Apply some vertex transformation (e.g. model-view-projection matrix using glm::lookAt and glm::perspective). + +#### Primitive assembly +* VertexOut[n] vs_output -> Triangle[t] primitives + +#### Rasterization +* Triangle[t] primitives -> Fragment[m] rasterized +* Parallelize over triangles, but now avoid looping over all pixels: + - When rasterizing a triangle, only scan over the box around the triangle (getAABBForTriangle). + +#### Fragments to depth buffer +* Fragment[m] rasterized -> Fragment[width][height] depthbuffer +* depthbuffer is for storing and depth testing fragments. + +**Depth Buffer Testing** + +Each pixel can contain multiple fragments, each at different z-depth values. In a rasterizer, one must only render the fragment with the minimum depth (aka the front most fragment). The nearest fragments per pixel are then stored in a depth buffer. Every run of the rasterization will constantly find the nearest fragment and update the depth buffer accordingly. + +This process can be done before fragment shading, which prevents the fragment shader from changing the depth of a fragment. In order to do this safely on the GPU, however, race conditions must be avoided. Race conditions can occur ince multiple primitives in a scene are writing their fragment to the same place in the depth buffer. In order to handle this, there are two approaches: + +* `Approach 1:` Convert your depth value to a fixed-point int, and use atomicMin to store it into an int-typed depth buffer intdepth. After that, the value which is stored at intdepth[i] is (usually) that of the fragment which should be stored into the fragment depth buffer. + - This may result in some rare race conditions (e.g. across blocks). + +* `Approach 2:` (The safer approach) Lock the location in the depth buffer during the time that a thread is comparing old and new fragment depths (and possibly writing a new fragment). This should work in all cases, but be slower. This method involves using CUDA mutexes to test only the fragments within a pixel serially. + +Approach 2 is the safer of the two approaches. By allocating a device int array (set to all 0's initially), we can use this to store all the minimum depth values per pixel. The dimensions of this 1D array would be width * height to correspond with the screen. In the resources section, you can see a pseudocode breakdown of how this mutex is used. + +#### Fragment shading +* Fragment[width][height] depthbuffer -> +* Add a shading method, such as Lambert or Blinn-Phong. Lights can be defined by kernel parameters (like GLSL uniforms). + +#### Fragment to framebuffer writing +* -> vec3[width][height] framebuffer +* Simply copies the colors out of the depth buffer into the framebuffer (to be displayed on the screen). + + +## Renders + +### Texture Mapping +![](renders/FinalRenders/cesiummilktruck_textured.PNG) +###### (Cesium Milk Truck) + +![](renders/FinalRenders/duck.PNG) +###### (Duck) + +### Rendering with points +![](renders/FinalRenders/cesiummilktruck_points.PNG) + +### Rendering with lines +![](renders/FinalRenders/rasterizeLines_box.PNG) + +![](renders/FinalRenders/cow_lines.PNG) -### (TODO: Your README) +### Depth Buffer Test +![](renders/FinalRenders/box_depthtest.PNG) -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +### Normal Test +![](renders/FinalRenders/normals_flower.PNG) -### Credits + +## Performance Analysis + +The following tests were all to compare the different steps of the rasterization pipeline and were done by taking an average of times over 10 iterations of each step. + + +### Performance Across Pipeline +![](renders/FinalRenders/modeltestingpipeline.PNG) + +According to the chart, the vertex and primitive assembly is clearly the stage that takes up the most time, as compared to the rasterization step and fragment shader, no matter what poly count of the model. Why is this the case? The vertex assembly is completing a series of incoherent memory allocations to and from global memory for each primitive by allocating vertices for each triangle, and other attribute data such as positions, normals, texture data, and transformations. + + +### Feature Analysis + +#### UV texture Mapping with bilinear texture filtering and perspective correct texture coordinates + + +![](renders/FinalRenders/texturing_duck_chart.PNG) + +![](renders/FinalRenders/notexturing_cow_chart.PNG) + +Regardless of texturing or not, the vertex shading and primitive assembly stage of the rasterization pipeline takes up the most time as compared to the other stages. + + +#### Support for rasterizing points, lines, and triangles + +**Points** + +![](renders/FinalRenders/renderingpoints_box_chart.PNG) + +![](renders/FinalRenders/renderingpoints_cow_chart.PNG) + +Regardless of rendering points with high poly count models or low poly count models, the vertex shading and primitive assembly stage of the rasterization pipeline takes up the most time as compared to the other stages. + + +**Lines** + +![](renders/FinalRenders/renderinglines_box_chart.PNG) + +![](renders/FinalRenders/renderinglines_cow_chart.PNG) + +Regardless of rendering lines with high poly count models or low poly count models, the vertex shading and primitive assembly stage of the rasterization pipeline takes up the most time as compared to the other stages. + + +### Mutex Test +![](renders/FinalRenders/mutextest_notzoomed.PNG) + +![](renders/FinalRenders/mutextest_zoomed.PNG) + +Regardless of rendering with the mutex utilized for depth buffer testing, the vertex shading and primitive assembly stage of the rasterization pipeline takes up the most time as compared to the other stages. What's interesting to note here is that, the box takes a lot longer to render than the flower, regardless of with the mutex or not, and regardless of whether you zoom into the model or not. + +![](renders/FinalRenders/mutextest_numbers.PNG) + +This probably occurs because the box is such a low poly count model, hence, the overhead in setting up the device arrays and completing the various kernels for the rasterization pipeline is higher. + +#### Note + +Something that I noticed was that as I zoomed in to the object in the window, then the program would run much more slowly and sometimes even freeze. I believe this may be occurring because the rasterizer is trying to access triangles that have some vertices that are not visible in the window. Hence, the loop through a triangle's bounding box would take much longer because it's trying to access triangle vertices that don't exist on the screen. The optimization/fix here would probably be to clamp the min and max coordinates of the bounding box to be within the screen width and height. + +## Resources + +### CUDA Mutexes + +CUDA mutexes were used for depth buffer testing. + +Adapted from +[this StackOverflow question](http://stackoverflow.com/questions/21341495/cuda-mutex-and-atomiccas). + +```cpp +__global__ void kernelFunction(...) { + // Get a pointer to the mutex, which should be 0 right now. + unsigned int *mutex = ...; + + // Loop-wait until this thread is able to execute its critical section. + bool isSet; + do { + isSet = (atomicCAS(mutex, 0, 1) == 0); + if (isSet) { + // Critical section goes here. + // The critical section MUST be inside the wait loop; + // if it is afterward, a deadlock will occur. + } + if (isSet) { + mutex = 0; + } + } while (!isSet); +} +``` + +### Credits and other links * [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo) * [glTF Sample Models](https://github.com/KhronosGroup/glTF/blob/master/sampleModels/README.md) + +**Texture Mapping** +* [Getting color from UV coordinates](https://stackoverflow.com/questions/35005603/get-color-of-the-texture-at-uv-coordinate) +* [Bilinear filtering 1](https://en.wikipedia.org/wiki/Bilinear_filtering) +* [Bilinear filtering 2](https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/interpolation/bilinear-filtering) +* [Perspective Correct Interpolation](https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/perspective-correct-interpolation-vertex-attributes) + +**Bresenham Line Rendering** +* [Lecture slides 1](http://groups.csail.mit.edu/graphics/classes/6.837/F02/lectures/6.837-7_Line.pdf) +* [Lecture slides 2](https://www.cs.helsinki.fi/group/goa/mallinnus/lines/bresenh.html) \ No newline at end of file diff --git a/renders/FinalRenders/box_depthtest.PNG b/renders/FinalRenders/box_depthtest.PNG new file mode 100644 index 0000000..c25608d Binary files /dev/null and b/renders/FinalRenders/box_depthtest.PNG differ diff --git a/renders/FinalRenders/cesiummilktruck_points.PNG b/renders/FinalRenders/cesiummilktruck_points.PNG new file mode 100644 index 0000000..95534dd Binary files /dev/null and b/renders/FinalRenders/cesiummilktruck_points.PNG differ diff --git a/renders/FinalRenders/cesiummilktruck_textured.PNG b/renders/FinalRenders/cesiummilktruck_textured.PNG new file mode 100644 index 0000000..335ab14 Binary files /dev/null and b/renders/FinalRenders/cesiummilktruck_textured.PNG differ diff --git a/renders/FinalRenders/cow_lines.PNG b/renders/FinalRenders/cow_lines.PNG new file mode 100644 index 0000000..2ff5282 Binary files /dev/null and b/renders/FinalRenders/cow_lines.PNG differ diff --git a/renders/FinalRenders/cow_points.PNG b/renders/FinalRenders/cow_points.PNG new file mode 100644 index 0000000..0c2b1a8 Binary files /dev/null and b/renders/FinalRenders/cow_points.PNG differ diff --git a/renders/FinalRenders/demo1.gif b/renders/FinalRenders/demo1.gif new file mode 100644 index 0000000..4535dc0 Binary files /dev/null and b/renders/FinalRenders/demo1.gif differ diff --git a/renders/FinalRenders/duck.PNG b/renders/FinalRenders/duck.PNG new file mode 100644 index 0000000..f87404a Binary files /dev/null and b/renders/FinalRenders/duck.PNG differ diff --git a/renders/FinalRenders/flower_lambert.PNG b/renders/FinalRenders/flower_lambert.PNG new file mode 100644 index 0000000..923bfb0 Binary files /dev/null and b/renders/FinalRenders/flower_lambert.PNG differ diff --git a/renders/FinalRenders/modeltestingpipeline.PNG b/renders/FinalRenders/modeltestingpipeline.PNG new file mode 100644 index 0000000..48e1d27 Binary files /dev/null and b/renders/FinalRenders/modeltestingpipeline.PNG differ diff --git a/renders/FinalRenders/mutextest_notzoomed.PNG b/renders/FinalRenders/mutextest_notzoomed.PNG new file mode 100644 index 0000000..30635f5 Binary files /dev/null and b/renders/FinalRenders/mutextest_notzoomed.PNG differ diff --git a/renders/FinalRenders/mutextest_numbers.PNG b/renders/FinalRenders/mutextest_numbers.PNG new file mode 100644 index 0000000..6110903 Binary files /dev/null and b/renders/FinalRenders/mutextest_numbers.PNG differ diff --git a/renders/FinalRenders/mutextest_zoomed.PNG b/renders/FinalRenders/mutextest_zoomed.PNG new file mode 100644 index 0000000..78d3aeb Binary files /dev/null and b/renders/FinalRenders/mutextest_zoomed.PNG differ diff --git a/renders/FinalRenders/normals_cow.gif b/renders/FinalRenders/normals_cow.gif new file mode 100644 index 0000000..89ec70d Binary files /dev/null and b/renders/FinalRenders/normals_cow.gif differ diff --git a/renders/FinalRenders/normals_flower.PNG b/renders/FinalRenders/normals_flower.PNG new file mode 100644 index 0000000..a2bb165 Binary files /dev/null and b/renders/FinalRenders/normals_flower.PNG differ diff --git a/renders/FinalRenders/notexturing_cow_chart.PNG b/renders/FinalRenders/notexturing_cow_chart.PNG new file mode 100644 index 0000000..1fb97c6 Binary files /dev/null and b/renders/FinalRenders/notexturing_cow_chart.PNG differ diff --git a/renders/FinalRenders/rasterizeLines_box.PNG b/renders/FinalRenders/rasterizeLines_box.PNG new file mode 100644 index 0000000..4d05453 Binary files /dev/null and b/renders/FinalRenders/rasterizeLines_box.PNG differ diff --git a/renders/FinalRenders/renderinglines_box_chart.PNG b/renders/FinalRenders/renderinglines_box_chart.PNG new file mode 100644 index 0000000..e05d7be Binary files /dev/null and b/renders/FinalRenders/renderinglines_box_chart.PNG differ diff --git a/renders/FinalRenders/renderinglines_cow_chart.PNG b/renders/FinalRenders/renderinglines_cow_chart.PNG new file mode 100644 index 0000000..eb6a3a0 Binary files /dev/null and b/renders/FinalRenders/renderinglines_cow_chart.PNG differ diff --git a/renders/FinalRenders/renderingpoints_box_chart.PNG b/renders/FinalRenders/renderingpoints_box_chart.PNG new file mode 100644 index 0000000..2219588 Binary files /dev/null and b/renders/FinalRenders/renderingpoints_box_chart.PNG differ diff --git a/renders/FinalRenders/renderingpoints_cow_chart.PNG b/renders/FinalRenders/renderingpoints_cow_chart.PNG new file mode 100644 index 0000000..e3b0d61 Binary files /dev/null and b/renders/FinalRenders/renderingpoints_cow_chart.PNG differ diff --git a/renders/FinalRenders/texturing_duck_chart.PNG b/renders/FinalRenders/texturing_duck_chart.PNG new file mode 100644 index 0000000..a1fb3ec Binary files /dev/null and b/renders/FinalRenders/texturing_duck_chart.PNG differ diff --git a/renders/FinalRenders/truck.gif b/renders/FinalRenders/truck.gif new file mode 100644 index 0000000..52ae607 Binary files /dev/null and b/renders/FinalRenders/truck.gif differ diff --git a/src/rasterize.cu b/src/rasterize.cu index 1262a09..a3971dc 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -18,6 +18,21 @@ #include #include +#include +#include + + +#define DEPTH_TEST false +#define NORMAL_TEST false +#define LAMBERT_SHADING true +#define LIGHT_POS glm::vec3(0.0f, 0.0f, 0.0f) +#define FRAG_COL glm::vec3(0.0f, 0.5f, 1.0f) + +#define BILINEAR true +#define USING_MUTEX true +#define DRAW_POINTS false +#define DRAW_LINES true + namespace { typedef unsigned short VertexIndex; @@ -43,10 +58,10 @@ namespace { glm::vec3 eyePos; // eye space position used for shading glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation - // glm::vec3 col; + glm::vec3 col; glm::vec2 texcoord0; TextureData* dev_diffuseTex = NULL; - // int texWidth, texHeight; + int texWidth, texHeight; // ... }; @@ -55,6 +70,8 @@ namespace { VertexOut v[3]; }; + //EYESPACE = CAMERA/VIEW SPACE + struct Fragment { glm::vec3 color; @@ -62,10 +79,12 @@ namespace { // The attributes listed below might be useful, // but always feel free to modify on your own - // glm::vec3 eyePos; // eye space position used for shading - // glm::vec3 eyeNor; - // VertexAttributeTexcoord texcoord0; - // TextureData* dev_diffuseTex; + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; + VertexAttributeTexcoord texcoord0; + TextureData* dev_diffuseTex; + int diffuseTexWidth; + int diffuseTexHeight; // ... }; @@ -83,9 +102,11 @@ namespace { VertexAttributeTexcoord* dev_texcoord0; // Materials, add more attributes when needed + TextureData* dev_diffuseTex; int diffuseTexWidth; int diffuseTexHeight; + // TextureData* dev_specularTex; // TextureData* dev_normalTex; // ... @@ -111,6 +132,8 @@ static glm::vec3 *dev_framebuffer = NULL; static int * dev_depth = NULL; // you might need this buffer when doing depth test +static int * dev_mutex = NULL; + /** * Kernel that writes the image to the OpenGL PBO directly. */ @@ -133,22 +156,154 @@ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { } } + + +// ============================================================================= +// TIMER FUNCTIONS +// ============================================================================= + +using time_point_t = std::chrono::high_resolution_clock::time_point; +time_point_t time_start_cpu; +time_point_t time_end_cpu; +bool cpu_timer_started = false; +float prev_elapsed_time_cpu_milliseconds = 0.f; + +void startCpuTimer() +{ + if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); } + cpu_timer_started = true; + time_start_cpu = std::chrono::high_resolution_clock::now(); +} + +void endCpuTimer() +{ + time_end_cpu = std::chrono::high_resolution_clock::now(); + + if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); } + + std::chrono::duration duro = time_end_cpu - time_start_cpu; + prev_elapsed_time_cpu_milliseconds = + static_cast(duro.count()); + + cpu_timer_started = false; +} + +void printCPUTimer() +{ + std::cout << "Time (in ms): " << prev_elapsed_time_cpu_milliseconds << std::endl; +} + + + /** * Writes fragment colors to the framebuffer */ + +// =========================================================================================== +// FRAGMENT SHADING / FRAGMENT TO FRAME BUFFER WRITING +// =========================================================================================== __global__ void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; int index = x + (y * w); - if (x < w && y < h) { - framebuffer[index] = fragmentBuffer[index].color; + if (x < w && y < h) + { + Fragment currFrag = fragmentBuffer[index]; // TODO: add your fragment shader code here - } -} + if (currFrag.dev_diffuseTex == NULL) + { + glm::vec3 outputColor(1.0f); + + if (DRAW_POINTS || DRAW_LINES) + { + outputColor = glm::vec3(currFrag.color); + } + + else if (DEPTH_TEST) + { + // The background is black instead of white because rendering based on only frag color + // If depth buffer was passed, then this would render the background as white + // since the buffer was set to high int's + outputColor = glm::vec3(currFrag.color); + } + + else if (LAMBERT_SHADING) + { + float lambert = glm::abs(glm::dot(currFrag.eyeNor, glm::normalize(LIGHT_POS - currFrag.eyePos))); + outputColor = glm::vec3(glm::clamp(lambert * currFrag.color, 0.0f, 1.0f)); + } + + else if (NORMAL_TEST) + { + outputColor = glm::vec3(glm::clamp(currFrag.eyeNor, 0.0f, 1.0f)); + } + + framebuffer[index] = outputColor; + }//end if null texture + + // Texture Mapping with and w/o Bilinear filtering + else + { + //https://stackoverflow.com/questions/35005603/get-color-of-the-texture-at-uv-coordinate + //https://en.wikipedia.org/wiki/Bilinear_filtering + //https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/interpolation/bilinear-filtering + + glm::vec3 texColor(1.0f); + TextureData* tex = currFrag.dev_diffuseTex; + int texWidth = currFrag.diffuseTexWidth; + int texHeight = currFrag.diffuseTexHeight; + // Scale the UV coords to width and height of texture + float uCoord = currFrag.texcoord0.x * texWidth; + float vCoord = currFrag.texcoord0.y * texHeight; + float r = 1.0f; + float g = 1.0f; + float b = 1.0f; + + if (BILINEAR) + { + int u_floor = glm::floor(uCoord); + int v_floor = glm::floor(vCoord); + float u_fract = uCoord - u_floor; + float v_fract = vCoord - v_floor; + float u_opposite = 1.0f - u_fract; + float v_opposite = 1.0f - v_fract; + + int c00_idx = 3 * (u_floor + (v_floor * texWidth)); + int c10_idx = 3 * ((u_floor + 1) + (v_floor * texWidth)); + int c01_idx = 3 * (u_floor + ((v_floor + 1) * texWidth)); + int c11_idx = 3 * ((u_floor + 1) + ((v_floor + 1) * texWidth)); + + r = ((tex[c00_idx] * u_opposite + tex[c10_idx] * u_fract) * v_opposite) + + ((tex[c01_idx] * u_opposite + tex[c11_idx] * u_fract) * v_fract); + + g = ((tex[c00_idx + 1] * u_opposite + tex[c10_idx + 1] * u_fract) * v_opposite) + + ((tex[c01_idx + 1] * u_opposite + tex[c11_idx + 1] * u_fract) * v_fract); + + b = ((tex[c00_idx + 2] * u_opposite + tex[c10_idx + 2] * u_fract) * v_opposite) + + ((tex[c01_idx + 2] * u_opposite + tex[c11_idx + 2] * u_fract) * v_fract); + }//end bilinear + + else + { + int i_uCoord = int(uCoord); + int i_vCoord = int(vCoord); + int pixelUVIdx = 3 * (i_uCoord + (i_vCoord * texWidth)); + r = tex[pixelUVIdx + 0]; + g = tex[pixelUVIdx + 1]; + b = tex[pixelUVIdx + 2]; + }//end with no bilinear + + texColor = glm::vec3(r, g, b) / 255.0f; + framebuffer[index] = texColor; + + }//end else texture not null + + }//end index +}//end kernel /** * Called once at the beginning of the program to allocate memory. @@ -166,6 +321,10 @@ void rasterizeInit(int w, int h) { cudaFree(dev_depth); cudaMalloc(&dev_depth, width * height * sizeof(int)); + cudaFree(dev_mutex); + cudaMalloc(&dev_mutex, width * height * sizeof(int)); + cudaMemset(dev_mutex, 0, width * height * sizeof(int)); + checkCUDAError("rasterizeInit"); } @@ -621,7 +780,9 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { } - +// =========================================================================================== +// Vertex Assembly +// =========================================================================================== __global__ void _vertexTransformAndAssembly( @@ -635,17 +796,60 @@ void _vertexTransformAndAssembly( if (vid < numVertices) { // TODO: Apply vertex transformation here + // Multiply the MVP matrix for each vertex position, this will transform everything into clipping space + glm::vec3 currVertPos = primitive.dev_position[vid]; + glm::vec3 currNor = primitive.dev_normal[vid]; + + glm::vec4 _currVertPos = glm::vec4(currVertPos, 1.0f); + glm::vec4 unHomScreenSpace = MVP * _currVertPos; + // Then divide the pos by its w element to transform into NDC space + // Perspective divide and Perspective Correct Interpolation + unHomScreenSpace /= unHomScreenSpace.w; + // Finally transform x and y to viewport space + glm::vec4 pixelPos = unHomScreenSpace; + float pixelX = (float)width * ((unHomScreenSpace.x + 1.0f) / 2.0f); + float pixelY = (float)height * ((1.0f - unHomScreenSpace.y) / 2.0f); + pixelPos.x = pixelX; + pixelPos.y = pixelY; + + // Convert z from [-1, 1] to [0, 1] to be between clipping planes + pixelPos.z = -(1.0f + pixelPos.z) / 2.0f; + + // Position and normal from camera + glm::vec3 cameraPos = glm::vec3(MV * _currVertPos); + glm::vec3 cameraNor = glm::normalize(MV_normal * currNor); // TODO: Apply vertex assembly here // Assemble all attribute arraies into the primitive array - + primitive.dev_verticesOut[vid].pos = pixelPos; + primitive.dev_verticesOut[vid].eyePos = cameraPos; + primitive.dev_verticesOut[vid].eyeNor = cameraNor; + + // Give a preliminary color + primitive.dev_verticesOut[vid].col = FRAG_COL; + + // Texture info + if (primitive.dev_diffuseTex == NULL) + { + primitive.dev_verticesOut[vid].dev_diffuseTex = NULL; + } + else + { + primitive.dev_verticesOut[vid].dev_diffuseTex = primitive.dev_diffuseTex; + primitive.dev_verticesOut[vid].texcoord0 = primitive.dev_texcoord0[vid]; + primitive.dev_verticesOut[vid].texWidth = primitive.diffuseTexWidth; + primitive.dev_verticesOut[vid].texHeight = primitive.diffuseTexHeight; + } } } +// =========================================================================================== +// Primitive Assembly +// =========================================================================================== static int curPrimitiveBeginId = 0; @@ -660,19 +864,351 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_ // TODO: uncomment the following code for a start // This is primitive assembly for triangles - //int pid; // id for cur primitives vector - //if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { - // pid = iid / (int)primitive.primitiveType; - // dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] - // = primitive.dev_verticesOut[primitive.dev_indices[iid]]; - //} + int pid; // id for cur primitives vector + if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { + pid = iid / (int)primitive.primitiveType; + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] + = primitive.dev_verticesOut[primitive.dev_indices[iid]]; + } // TODO: other primitive types (point, line) + + //if (primitive.primitiveMode == TINYGLTF_MODE_POINTS) + //{ + + //} + + //if (primitive.primitiveMode == TINYGLTF_MODE_LINE) + //{ + + //} } } +__device__ +void _rasterizeLines(Fragment* fragmentBuffer, int width, int height, glm::vec3 pt1, glm::vec3 pt2, glm::vec3 color) +{ + int dx = pt2.x - pt1.x; + int dy = pt2.y - pt1.y; + int m = dy / dx; + int eps = 0; + int y = pt1.y; + + + int dxe = 0; + + for (int x = pt1.x; x <= pt2.x; x++) + { + + int fragIdx = x + (y * width); + fragmentBuffer[fragIdx].color = color; + + // Method 1 + // https://www.cs.helsinki.fi/group/goa/mallinnus/lines/bresenh.html + //eps += dy; + //if (m > 0) + //{ + // if ((eps << 1) >= dx) + // { + // y++; + // eps -= dx; + // } + //} + + //if (m < 0) + //{ + // if (eps + m > -0.5) + // { + // eps = eps + m; + // } + // else + // { + // y--; + // eps = eps + m + 1; + // } + //} + + // http://groups.csail.mit.edu/graphics/classes/6.837/F02/lectures/6.837-7_Line.pdf + // Method 2 + // This is only for case x1 < x2, m <= 1 + //y = pt1.y + m * (x - pt1.x); + //eps += m; + //if (eps > 0.5) + //{ + // y++; + // eps -= 1; + //} + + + // Method 3 + y = pt1.y + m * (x - pt1.x); + dxe += m * dx + dy; + if (dxe > (dx + 1) / 2) + { + y++; + eps -= 1; + } + } +} + +// =========================================================================================== +// Rasterize Kernel +// =========================================================================================== + +/* + For every triangle + Calculate AABB + Iterate through min and max AABB bounds + Calculate barycentric coord + Check if barycentric coord is in triangle + + NOTES for depth test + - if currdepth is < curr depth in depth buffer, replace and write to depth buffer with new value + - need to create a float buffer for this + - or use the int depth_buffer as an index container + - fragmentbuffer[depth_buffer[idx]].color = glm::vec3(depthval) + - TO USE INT BUFFER, SCALE THE DEPTH VALUE YOU GET SO THAT IT BECOMES AN INT + - AND THEN JUST COMPARE WITH THAT + - race condition : if multiple threads trying to write to same place in depth buffer +*/ + +__global__ +void _rasterize(int numTriIndices, Primitive* dev_primitives, Fragment* fragmentBuffer, int width, int height, int* dev_depth, int* dev_mutex) +{ + int idx = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (idx < numTriIndices) + { + Primitive currPrim = dev_primitives[idx]; + + glm::vec3 triEyePos[3]; + triEyePos[0] = glm::vec3(currPrim.v[0].eyePos); + triEyePos[1] = glm::vec3(currPrim.v[1].eyePos); + triEyePos[2] = glm::vec3(currPrim.v[2].eyePos); + + glm::vec3 triEyeNor[3]; + triEyeNor[0] = glm::vec3(currPrim.v[0].eyeNor); + triEyeNor[1] = glm::vec3(currPrim.v[1].eyeNor); + triEyeNor[2] = glm::vec3(currPrim.v[2].eyeNor); + + glm::vec3 triFragCol[3]; + triFragCol[0] = glm::vec3(currPrim.v[0].col); + triFragCol[1] = glm::vec3(currPrim.v[1].col); + triFragCol[2] = glm::vec3(currPrim.v[2].col); + + glm::vec2 triTexCoord[3]; + triTexCoord[0] = glm::vec2(currPrim.v[0].texcoord0); + triTexCoord[1] = glm::vec2(currPrim.v[1].texcoord0); + triTexCoord[2] = glm::vec2(currPrim.v[2].texcoord0); + + glm::vec3 triPos[3]; + triPos[0] = glm::vec3(currPrim.v[0].pos); + triPos[1] = glm::vec3(currPrim.v[1].pos); + triPos[2] = glm::vec3(currPrim.v[2].pos); + AABB triAABB = getAABBForTriangle(triPos); + + if (DRAW_POINTS) + { + for (int i = 0; i < 3; i++) + { + int x = triPos[i].x; + int y = triPos[i].y; + int fragIdx = x + (y * width); + fragmentBuffer[fragIdx].color = currPrim.v[0].col; //all 3 verts are set to arbitrary color from vert shader + } + }//end render points + + else if (DRAW_LINES) + { + //Draw lines between the 3 vertices of triangle + _rasterizeLines(fragmentBuffer, width, height, triPos[0], triPos[1], currPrim.v[0].col); + _rasterizeLines(fragmentBuffer, width, height, triPos[1], triPos[2], currPrim.v[0].col); + _rasterizeLines(fragmentBuffer, width, height, triPos[2], triPos[0], currPrim.v[0].col); + + }//end render lines + + else + { + //int clampedWidthMin = glm::clamp((int)triAABB.min.x, 0, (int)triAABB.min.x); + //int clampedWidthMax = glm::clamp((int)triAABB.max.x, (int)triAABB.max.x, width); + //int clampedHeightMin = glm::clamp((int)triAABB.min.y, 0, (int)triAABB.min.y); + //int clampedHeightMax = glm::clamp((int)triAABB.max.y, (int)triAABB.max.y, height); + + for (int x = triAABB.min.x; x <= triAABB.max.x; x++) //for (int x = clampedWidthMin; x <= clampedWidthMax; x++) + { + for (int y = triAABB.min.y; y <= triAABB.max.y; y++) //for (int y = clampedHeightMin; y <= clampedHeightMax; y++) + { + glm::vec3 baryCoord = calculateBarycentricCoordinate(triPos, glm::vec2(x, y)); + bool isBaryCoordInTri = isBarycentricCoordInBounds(baryCoord); + if (isBaryCoordInTri) + { + int fragIdx = x + (y * width); + + if (USING_MUTEX) + { + bool isSet = false; + do + { + isSet = (atomicCAS(&dev_mutex[fragIdx], 0, 1) == 0); + + if (isSet) + { + // ========================= PUT CODE IN MUTEX CHECK =============================== + + // Calculating color according to depth buffer + float depthVal = getZAtCoordinate(baryCoord, triPos); + int scale = 10000; // Is this a good enough number? + int scaledDepth = scale * depthVal; + + //atomicMin(&dev_depth[fragIdx], scaledDepth); + + if (scaledDepth < dev_depth[fragIdx]) + { + dev_depth[fragIdx] = scaledDepth; + + if (DEPTH_TEST) + { + glm::vec3 newColor(dev_depth[fragIdx] / (float)scale); + fragmentBuffer[fragIdx].color = newColor; + } + + else if (LAMBERT_SHADING) + { + glm::vec3 interpolatedEyePos(baryCoord.x * triEyePos[0] + baryCoord.y * triEyePos[1] + baryCoord.z * triEyePos[2]); + glm::vec3 interpolatedEyeNor(baryCoord.x * triEyeNor[0] + baryCoord.y * triEyeNor[1] + baryCoord.z * triEyeNor[2]); + glm::vec3 interpolatedFragColor(baryCoord.x * triFragCol[0] + baryCoord.y * triFragCol[1] + baryCoord.z * triFragCol[2]); + + fragmentBuffer[fragIdx].eyePos = interpolatedEyePos; + fragmentBuffer[fragIdx].eyeNor = interpolatedEyeNor; + fragmentBuffer[fragIdx].color = interpolatedFragColor; + } + + else if (NORMAL_TEST) + { + glm::vec3 interpolatedEyePos(baryCoord.x * triEyePos[0] + baryCoord.y * triEyePos[1] + baryCoord.z * triEyePos[2]); + glm::vec3 interpolatedEyeNor(baryCoord.x * triEyeNor[0] + baryCoord.y * triEyeNor[1] + baryCoord.z * triEyeNor[2]); + + fragmentBuffer[fragIdx].eyePos = interpolatedEyePos; + fragmentBuffer[fragIdx].eyeNor = interpolatedEyeNor; + } + + else + { + glm::vec3 interpolatedFragColor(baryCoord.x * triFragCol[0] + baryCoord.y * triFragCol[1] + baryCoord.z * triFragCol[2]); + fragmentBuffer[fragIdx].color = interpolatedFragColor; + } + + // Texture Mapping with perspective correct coordinates + //https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/perspective-correct-interpolation-vertex-attributes + glm::vec3 perspCorrectBaryCoord(baryCoord.x / triEyePos[0].z, + baryCoord.y / triEyePos[1].z, + baryCoord.z / triEyePos[2].z); + + float uFactor = perspCorrectBaryCoord.x * triTexCoord[0].x + + perspCorrectBaryCoord.y * triTexCoord[1].x + + perspCorrectBaryCoord.z * triTexCoord[2].x; + + float vFactor = perspCorrectBaryCoord.x * triTexCoord[0].y + + perspCorrectBaryCoord.y * triTexCoord[1].y + + perspCorrectBaryCoord.z * triTexCoord[2].y; + + float z = 1.0f / (perspCorrectBaryCoord.x + perspCorrectBaryCoord.y + perspCorrectBaryCoord.z); + + fragmentBuffer[fragIdx].texcoord0 = glm::vec2(uFactor * z, vFactor * z); + + // These should be the same regardless of which prim's vertex + fragmentBuffer[fragIdx].dev_diffuseTex = currPrim.v[0].dev_diffuseTex; + fragmentBuffer[fragIdx].diffuseTexWidth = currPrim.v[0].texWidth; + fragmentBuffer[fragIdx].diffuseTexHeight = currPrim.v[0].texHeight; + + }//if depths are equal + // ========================= PUT CODE IN MUTEX CHECK =============================== + }//end if isSet + + if (isSet) dev_mutex[fragIdx] = 0; + + } while (!isSet); + } //end if using mutex + + else + { + // Calculating color according to depth buffer + float depthVal = getZAtCoordinate(baryCoord, triPos); + int scale = 10000; // Is this a good enough number? + int scaledDepth = scale * depthVal; + + atomicMin(&dev_depth[fragIdx], scaledDepth); + if (scaledDepth == dev_depth[fragIdx]) + { + if (DEPTH_TEST) + { + glm::vec3 newColor(dev_depth[fragIdx] / (float)scale); + fragmentBuffer[fragIdx].color = newColor; + } + + else if (LAMBERT_SHADING) + { + glm::vec3 interpolatedEyePos(baryCoord.x * triEyePos[0] + baryCoord.y * triEyePos[1] + baryCoord.z * triEyePos[2]); + glm::vec3 interpolatedEyeNor(baryCoord.x * triEyeNor[0] + baryCoord.y * triEyeNor[1] + baryCoord.z * triEyeNor[2]); + glm::vec3 interpolatedFragColor(baryCoord.x * triFragCol[0] + baryCoord.y * triFragCol[1] + baryCoord.z * triFragCol[2]); + + fragmentBuffer[fragIdx].eyePos = interpolatedEyePos; + fragmentBuffer[fragIdx].eyeNor = interpolatedEyeNor; + fragmentBuffer[fragIdx].color = interpolatedFragColor; + } + + else if (NORMAL_TEST) + { + glm::vec3 interpolatedEyePos(baryCoord.x * triEyePos[0] + baryCoord.y * triEyePos[1] + baryCoord.z * triEyePos[2]); + glm::vec3 interpolatedEyeNor(baryCoord.x * triEyeNor[0] + baryCoord.y * triEyeNor[1] + baryCoord.z * triEyeNor[2]); + + fragmentBuffer[fragIdx].eyePos = interpolatedEyePos; + fragmentBuffer[fragIdx].eyeNor = interpolatedEyeNor; + } + + else + { + glm::vec3 interpolatedFragColor(baryCoord.x * triFragCol[0] + baryCoord.y * triFragCol[1] + baryCoord.z * triFragCol[2]); + fragmentBuffer[fragIdx].color = interpolatedFragColor; + } + + // Texture Mapping with perspective correct coordinates + glm::vec3 perspCorrectBaryCoord(baryCoord.x / triEyePos[0].z, + baryCoord.y / triEyePos[1].z, + baryCoord.z / triEyePos[2].z); + + float uFactor = perspCorrectBaryCoord.x * triTexCoord[0].x + + perspCorrectBaryCoord.y * triTexCoord[1].x + + perspCorrectBaryCoord.z * triTexCoord[2].x; + + float vFactor = perspCorrectBaryCoord.x * triTexCoord[0].y + + perspCorrectBaryCoord.y * triTexCoord[1].y + + perspCorrectBaryCoord.z * triTexCoord[2].y; + + float z = 1.0f / (perspCorrectBaryCoord.x + perspCorrectBaryCoord.y + perspCorrectBaryCoord.z); + + fragmentBuffer[fragIdx].texcoord0 = glm::vec2(uFactor * z, vFactor * z); + + // These should be the same regardless of which prim's vertex + fragmentBuffer[fragIdx].dev_diffuseTex = currPrim.v[0].dev_diffuseTex; + fragmentBuffer[fragIdx].diffuseTexWidth = currPrim.v[0].texWidth; + fragmentBuffer[fragIdx].diffuseTexHeight = currPrim.v[0].texHeight; + }//if depths are equal + }//end else not using mutex + }//end if baryInBounds + }//end for y + }//end for x + }//end render triangles + + }//end if idx +}//end _rasterize + + + // =========================================================================================== + // Rasterize CPU Function + // =========================================================================================== /** @@ -687,7 +1223,9 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g // Execute your rasterization pipeline here // (See README for rasterization pipeline outline.) - // Vertex Process & primitive assembly + // ================================== Vertex Process & primitive assembly ================================== + //startCpuTimer(); + { curPrimitiveBeginId = 0; dim3 numThreadsPerBlock(128); @@ -695,17 +1233,23 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g auto it = mesh2PrimitivesMap.begin(); auto itEnd = mesh2PrimitivesMap.end(); - for (; it != itEnd; ++it) { + + + for (; it != itEnd; ++it) + { auto p = (it->second).begin(); // each primitive auto pEnd = (it->second).end(); - for (; p != pEnd; ++p) { + for (; p != pEnd; ++p) + { dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height); + _vertexTransformAndAssembly <<< numBlocksForVertices, numThreadsPerBlock >>>(p->numVertices, *p, MVP, MV, MV_normal, width, height); checkCUDAError("Vertex Processing"); + cudaDeviceSynchronize(); - _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > + + _primitiveAssembly <<< numBlocksForIndices, numThreadsPerBlock >>> (p->numIndices, curPrimitiveBeginId, dev_primitives, @@ -717,22 +1261,44 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g } checkCUDAError("Vertex Processing and Primitive Assembly"); - } + } // end vertex process and prim assembly + //endCpuTimer(); + //printCPUTimer(); + + // ================================== initialize depth and fragment buffer ================================== cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); initDepth << > >(width, height, dev_depth); - // TODO: rasterize - - - - // Copy depthbuffer colors into framebuffer - render << > >(width, height, dev_fragmentBuffer, dev_framebuffer); + // ================================== rasterize ================================== + //startCpuTimer(); + + dim3 primitive_numThreadsPerBlock(128); + dim3 numBlocksForPrimitives((totalNumPrimitives + primitive_numThreadsPerBlock.x - 1) / primitive_numThreadsPerBlock.x); + _rasterize <<>> (totalNumPrimitives, + dev_primitives, + dev_fragmentBuffer, + width, + height, + dev_depth, + dev_mutex); + checkCUDAError("_rasterize"); + + //endCpuTimer(); + //printCPUTimer(); + + // ================================== Copy depthbuffer colors into framebuffer ================================== + //startCpuTimer(); + render <<>>(width, height, dev_fragmentBuffer, dev_framebuffer); checkCUDAError("fragment shader"); - // Copy framebuffer into OpenGL buffer for OpenGL previewing + //endCpuTimer(); + //printCPUTimer(); + + // ================================== Copy framebuffer into OpenGL buffer for OpenGL previewing ================================== sendImageToPBO<<>>(pbo, width, height, dev_framebuffer); checkCUDAError("copy render result to pbo"); -} + +}//end rasterize function /** * Called once at the end of the program to free CUDA memory. @@ -772,5 +1338,8 @@ void rasterizeFree() { cudaFree(dev_depth); dev_depth = NULL; + cudaFree(dev_mutex); + dev_mutex = NULL; + checkCUDAError("rasterize Free"); } diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h index 46c701e..4fb84a9 100644 --- a/src/rasterizeTools.h +++ b/src/rasterizeTools.h @@ -25,6 +25,19 @@ glm::vec3 multiplyMV(glm::mat4 m, glm::vec4 v) { return glm::vec3(m * v); } + +// WRITING MY OWN +/** +* Multiplies a glm::mat4 matrix and a vec3. +*/ +__host__ __device__ static +glm::vec3 multiplyMV3to4(glm::mat4 m, glm::vec3 v) { + glm::vec4 updatedV = glm::vec4(v, 1.0f); + glm::vec4 output = m * updatedV; + return glm::vec3(output); +} + + // CHECKITOUT /** * Finds the axis aligned bounding box for a given triangle.