diff --git a/README.md b/README.md index cad1abd..879b2bb 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,91 @@ CUDA Rasterizer =============== -[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md) - **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Ricky Rajani +* Tested on: Windows 7, i7-6700 @ 3.40GHz 16GB, NVIDIA Quadro K620 (Moore 100C Lab) + +This project implements a simplified rasterized graphics pipeline, similar to the OpenGL pipeline, using CUDA. + +### Core Features: +- Vertex assembly +- Vertex shading +- Primitive assembly +- Depth test +- Rasterization +- Race avoidance using atomic function +- Fragment shading with lambertian shading +- Framebuffer + +### Extra Features: +- Support for rasterizing additional primitives: lines and points +- UV texture mapping with bilinear texture filtering and perspective correct texture coordinates + +### Flags: +These flags can be found at the top of ```rasterize.cu``` +- ```LIGHTING``` - Enables lambertian shading +- ```TEXTURE``` - Enables UV texture mapping, models are white by default +- ```BILINEAR``` - Enables bilinear texture filtering when ```TEXTURE``` is enabled +- ```PERSPECTIVE``` - Enables perspective correction when ```TEXTURE``` is enabled +- ```POINTS``` - Enables points instead of triangle primitives +- ```POINTCLOUD``` - Sparsity of points +- ```LINE``` - Enables lines instead of triangle primitives + +# Samples + +#### Demos of scenes using basic rasterization pipeline using Lambertian shading + +Cow | Duck +:-------------------------------: | :-------------------------------: +![](renders/cow_normal.PNG) | ![](renders/duck_normal.PNG) + +Engine | Truck +:-------------------------------: | :-------------------------------: +![](renders/engine_normal.PNG) | ![](renders/truck_normal.PNG) + +### Demos of scenes using additional primitives +Duck | Cow +:-------------------------------: | :-------------------------------: +![](renders/duck_points_10.PNG) | ![](renders/cow_points_10.PNG) + +Duck | Cow | Truck +:-------------------------------: | :-------------------------------: | :-------------------------------: +![](renders/duck_lines.PNG) | ![](renders/cow_lines.PNG) | ![](renders/truck_lines.PNG) + + +![](renders/rasterize-graph.PNG) + +The points have a step size of 50, so it is understandable that there is not as great a performance hit when using point primitives as it requires less iterations than using line primitives. + +#### Demos of scenes using UV texture mapping + +Duck | Truck +:-------------------------------: | :-------------------------------: +![](renders/duck_texture.PNG) | ![](renders/truck_texture.PNG) + + +Checkerboard | Checkerboard with Bilinear Filtering | Checkerboard with Perspective Correction +:-------------------------------: | :-------------------------------: | :-------------------------------: +![](renders/checkerboard-normal.PNG) | ![](renders/checkerboard-bilinear.PNG) | ![](renders/checkerboard-perspective.PNG) + +Bilinear filtering is an antialiasing technique which creates smoother edges on the checkerboard compared to UV texture mapping without the filtering. In this method the four nearest texels to the pixel center are sampled, and their colors are combined by weighted average according to distance. This removes the 'blockiness' seen during magnification, as there is now a smooth gradient of color change from one texel to the next. After profiling ```kernTextureMap``` when biliniear texture filtering is turned off and on, it seems that there is a performance hit. Without the filtering the kernel takes 0.88 ms for each iteration and 1.02 ms with filtering. + +The perspective correction fixed the distortion that was occuring. As can be seen from the charts below, it greatly reduced the FPS causing a non-trivial performance hit. Likewise, the same occurs with bilinear texture filtering, but the performance hit is not as great as perspective correction. + +# Performance Analysis + +![](renders/fps_graph.PNG) -### (TODO: Your README) +![](renders/pipeline_timing_graph.PNG) -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +It is important to note that the number of primitives in a scene does not have a direct effect on the time spent in each pipeline stage, specifically the rasterization stage which consumes the most amount of time. Examining the charts above with various scenes, the cow has over 5000 primitives and the rasterization stage takes a third of the total time, whereas the box only has 12 primitives but the rasterization stage takes up most of the total time. An explanation for this could be the size of the primitives. Each kernel iterates through an entire bounding box; therefore, if a primitive has a large size then its bounding box will be bigger and there is more work each kernel must do during each iteration. This may lead to a considerable performance hit. +*CMakeLists modified to include 'common.h' for recording performance time ### Credits * [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo) * [glTF Sample Models](https://github.com/KhronosGroup/glTF/blob/master/sampleModels/README.md) +* [Bilineary texture filtering](https://en.wikipedia.org/wiki/Bilinear_interpolation) +* [Perspective correctness](https://en.wikipedia.org/wiki/Texture_mapping#Perspective_correctness) diff --git a/renders/checkerboard-bilinear.PNG b/renders/checkerboard-bilinear.PNG new file mode 100644 index 0000000..de29585 Binary files /dev/null and b/renders/checkerboard-bilinear.PNG differ diff --git a/renders/checkerboard-normal.PNG b/renders/checkerboard-normal.PNG new file mode 100644 index 0000000..bcb11c6 Binary files /dev/null and b/renders/checkerboard-normal.PNG differ diff --git a/renders/checkerboard-perspective.PNG b/renders/checkerboard-perspective.PNG new file mode 100644 index 0000000..4e91dd1 Binary files /dev/null and b/renders/checkerboard-perspective.PNG differ diff --git a/renders/cow_lines.PNG b/renders/cow_lines.PNG new file mode 100644 index 0000000..88a71c2 Binary files /dev/null and b/renders/cow_lines.PNG differ diff --git a/renders/cow_normal.PNG b/renders/cow_normal.PNG new file mode 100644 index 0000000..d3416cb Binary files /dev/null and b/renders/cow_normal.PNG differ diff --git a/renders/cow_points.PNG b/renders/cow_points.PNG new file mode 100644 index 0000000..9d5065a Binary files /dev/null and b/renders/cow_points.PNG differ diff --git a/renders/cow_points_10.PNG b/renders/cow_points_10.PNG new file mode 100644 index 0000000..8f3ff33 Binary files /dev/null and b/renders/cow_points_10.PNG differ diff --git a/renders/duck_lines.PNG b/renders/duck_lines.PNG new file mode 100644 index 0000000..a46202f Binary files /dev/null and b/renders/duck_lines.PNG differ diff --git a/renders/duck_normal.PNG b/renders/duck_normal.PNG new file mode 100644 index 0000000..d9d0b90 Binary files /dev/null and b/renders/duck_normal.PNG differ diff --git a/renders/duck_points.PNG b/renders/duck_points.PNG new file mode 100644 index 0000000..4a1fd8e Binary files /dev/null and b/renders/duck_points.PNG differ diff --git a/renders/duck_points_10.PNG b/renders/duck_points_10.PNG new file mode 100644 index 0000000..efc0a71 Binary files /dev/null and b/renders/duck_points_10.PNG differ diff --git a/renders/duck_texture.PNG b/renders/duck_texture.PNG new file mode 100644 index 0000000..8f48610 Binary files /dev/null and b/renders/duck_texture.PNG differ diff --git a/renders/engine_normal.PNG b/renders/engine_normal.PNG new file mode 100644 index 0000000..99d62fd Binary files /dev/null and b/renders/engine_normal.PNG differ diff --git a/renders/fps_graph.PNG b/renders/fps_graph.PNG new file mode 100644 index 0000000..bc415a0 Binary files /dev/null and b/renders/fps_graph.PNG differ diff --git a/renders/pipeline_timing_graph.PNG b/renders/pipeline_timing_graph.PNG new file mode 100644 index 0000000..2631a16 Binary files /dev/null and b/renders/pipeline_timing_graph.PNG differ diff --git a/renders/rasterize-graph.PNG b/renders/rasterize-graph.PNG new file mode 100644 index 0000000..0695f33 Binary files /dev/null and b/renders/rasterize-graph.PNG differ diff --git a/renders/truck_lines.PNG b/renders/truck_lines.PNG new file mode 100644 index 0000000..9301e38 Binary files /dev/null and b/renders/truck_lines.PNG differ diff --git a/renders/truck_normal.PNG b/renders/truck_normal.PNG new file mode 100644 index 0000000..21b7ea9 Binary files /dev/null and b/renders/truck_normal.PNG differ diff --git a/renders/truck_points.PNG b/renders/truck_points.PNG new file mode 100644 index 0000000..69084f9 Binary files /dev/null and b/renders/truck_points.PNG differ diff --git a/renders/truck_texture.PNG b/renders/truck_texture.PNG new file mode 100644 index 0000000..7aa6f50 Binary files /dev/null and b/renders/truck_texture.PNG differ diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index a57f69f..1cc14c7 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -1,4 +1,5 @@ set(SOURCE_FILES + "common.h" "rasterize.cu" "rasterize.h" "rasterizeTools.h" diff --git a/src/common.h b/src/common.h new file mode 100644 index 0000000..ef3cd42 --- /dev/null +++ b/src/common.h @@ -0,0 +1,112 @@ +#pragma once + +#include +#include + +#include +#include +#include +#include +#include +#include + +namespace StreamCompaction { + namespace Common { + __global__ void kernMapToBoolean(int n, int *bools, const int *idata); + + __global__ void kernScatter(int n, int *odata, + const int *idata, const int *bools, const int *indices); + + /** + * This class is used for timing the performance + * Uncopyable and unmovable + * + * Adapted from WindyDarian(https://github.com/WindyDarian) + */ + class PerformanceTimer + { + public: + PerformanceTimer() + { + cudaEventCreate(&event_start); + cudaEventCreate(&event_end); + } + + ~PerformanceTimer() + { + cudaEventDestroy(event_start); + cudaEventDestroy(event_end); + } + + void startCpuTimer() + { + if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); } + cpu_timer_started = true; + + time_start_cpu = std::chrono::high_resolution_clock::now(); + } + + void endCpuTimer() + { + time_end_cpu = std::chrono::high_resolution_clock::now(); + + if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); } + + std::chrono::duration duro = time_end_cpu - time_start_cpu; + prev_elapsed_time_cpu_milliseconds = + static_cast(duro.count()); + + cpu_timer_started = false; + } + + void startGpuTimer() + { + if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); } + gpu_timer_started = true; + + cudaEventRecord(event_start); + } + + void endGpuTimer() + { + cudaEventRecord(event_end); + cudaEventSynchronize(event_end); + + if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); } + + cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end); + gpu_timer_started = false; + } + + float getCpuElapsedTimeForPreviousOperation() //noexcept //(damn I need VS 2015 + { + return prev_elapsed_time_cpu_milliseconds; + } + + float getGpuElapsedTimeForPreviousOperation() //noexcept + { + return prev_elapsed_time_gpu_milliseconds; + } + + // remove copy and move functions + PerformanceTimer(const PerformanceTimer&) = delete; + PerformanceTimer(PerformanceTimer&&) = delete; + PerformanceTimer& operator=(const PerformanceTimer&) = delete; + PerformanceTimer& operator=(PerformanceTimer&&) = delete; + + private: + cudaEvent_t event_start = nullptr; + cudaEvent_t event_end = nullptr; + + using time_point_t = std::chrono::high_resolution_clock::time_point; + time_point_t time_start_cpu; + time_point_t time_end_cpu; + + bool cpu_timer_started = false; + bool gpu_timer_started = false; + + float prev_elapsed_time_cpu_milliseconds = 0.f; + float prev_elapsed_time_gpu_milliseconds = 0.f; + }; + } +} diff --git a/src/rasterize.cu b/src/rasterize.cu index 1262a09..27b3b5a 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -11,6 +11,7 @@ #include #include #include +#include "common.h" #include #include #include "rasterizeTools.h" @@ -18,6 +19,33 @@ #include #include +#define blockSize 256 + +// Toggle features + +// Lambert shading +#define LIGHTING 1 + +// Bilinear texture filtering and perspective correction work with TEXTURE enabled +#define TEXTURE 0 +#define BILINEAR 0 +#define PERSPECTIVE 0 + +#define POINTCLOUD 10 +#define POINTS 1 +#define LINE 0 + +using StreamCompaction::Common::PerformanceTimer; +PerformanceTimer& timer() +{ + static PerformanceTimer timer; + return timer; +} + +// timer().startGpuTimer(); +// timer().endGpuTimer(); +// cout << timer().getGpuElapsedTimeForPreviousOperation() << endl; + namespace { typedef unsigned short VertexIndex; @@ -43,30 +71,38 @@ namespace { glm::vec3 eyePos; // eye space position used for shading glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation - // glm::vec3 col; + glm::vec3 col; glm::vec2 texcoord0; TextureData* dev_diffuseTex = NULL; - // int texWidth, texHeight; + int texWidth, texHeight; + int tex; // ... }; struct Primitive { PrimitiveType primitiveType = Triangle; // C++ 11 init VertexOut v[3]; + + TextureData* dev_diffuseTex = NULL; + int texWidth, texHeight; + int tex; }; struct Fragment { glm::vec3 color; - + // TODO: add new attributes to your Fragment // The attributes listed below might be useful, // but always feel free to modify on your own - // glm::vec3 eyePos; // eye space position used for shading - // glm::vec3 eyeNor; - // VertexAttributeTexcoord texcoord0; - // TextureData* dev_diffuseTex; + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; + VertexAttributeTexcoord texcoord0; + TextureData* dev_diffuseTex; // ... + + int texWidth, texHeight; + int tex; }; struct PrimitiveDevBufPointers { @@ -86,14 +122,15 @@ namespace { TextureData* dev_diffuseTex; int diffuseTexWidth; int diffuseTexHeight; - // TextureData* dev_specularTex; - // TextureData* dev_normalTex; + /*TextureData* dev_specularTex; + TextureData* dev_normalTex;*/ // ... // Vertex Out, vertex used for rasterization, this is changing every frame VertexOut* dev_verticesOut; // TODO: add more attributes when needed + int tex; }; } @@ -146,7 +183,11 @@ void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) { framebuffer[index] = fragmentBuffer[index].color; // TODO: add your fragment shader code here - + framebuffer[index] = fragmentBuffer[index].color; +#if LIGHTING + glm::vec3 light = glm::normalize(glm::vec3(3, 5, 2)); + framebuffer[index] *= glm::dot(light, fragmentBuffer[index].eyeNor); +#endif } } @@ -331,7 +372,7 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { // 2. for each mesh: // for each primitive: - // build device buffer of indices, materail, and each attributes + // build device buffer of indices, material, and each attributes // and store these pointers in a map { @@ -523,6 +564,8 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { TextureData* dev_diffuseTex = NULL; int diffuseTexWidth = 0; int diffuseTexHeight = 0; + + int texture = 0; if (!primitive.material.empty()) { const tinygltf::Material &mat = scene.materials.at(primitive.material); printf("material.name = %s\n", mat.name.c_str()); @@ -540,13 +583,14 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { diffuseTexWidth = image.width; diffuseTexHeight = image.height; + texture = image.component; checkCUDAError("Set Texture Image data"); } } } - // TODO: write your code for other materails + // TODO: write your code for other materials // You may have to take a look at tinygltfloader // You can also use the above code loading diffuse material as a start point } @@ -583,7 +627,10 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { diffuseTexWidth, diffuseTexHeight, - dev_vertexOut //VertexOut + dev_vertexOut, //VertexOut + + texture + }); totalNumPrimitives += numPrimitives; @@ -640,8 +687,30 @@ void _vertexTransformAndAssembly( // Finally transform x and y to viewport space // TODO: Apply vertex assembly here - // Assemble all attribute arraies into the primitive array - + // Assemble all attribute arrays into the primitive array + + VertexAttributePosition& currPos = primitive.dev_position[vid]; + VertexOut& outVertex = primitive.dev_verticesOut[vid]; + + glm::vec4 pos = MVP * glm::vec4(currPos, 1.0f); + glm::vec4 ndc = pos / pos.w; + ndc.x = (1.0f - ndc.x) * width / 2.0f; + ndc.y = (1.0f - ndc.y) * width / 2.0f; + + outVertex.pos = ndc; + outVertex.eyeNor = glm::normalize(MV_normal * primitive.dev_normal[vid]); + outVertex.eyePos = glm::vec3(MV * glm::vec4(currPos, 1.0f)); + +#if TEXTURE + if (primitive.dev_diffuseTex != NULL) { + outVertex.texcoord0 = primitive.dev_texcoord0[vid]; + } + outVertex.dev_diffuseTex = primitive.dev_diffuseTex; + outVertex.texWidth = primitive.diffuseTexWidth; + outVertex.texHeight = primitive.diffuseTexHeight; + outVertex.tex = primitive.tex; +#endif + } } @@ -660,20 +729,159 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_ // TODO: uncomment the following code for a start // This is primitive assembly for triangles - //int pid; // id for cur primitives vector - //if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { - // pid = iid / (int)primitive.primitiveType; - // dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] - // = primitive.dev_verticesOut[primitive.dev_indices[iid]]; - //} - + int pid; // id for cur primitives vector + if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { + pid = iid / (int)primitive.primitiveType; + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] + = primitive.dev_verticesOut[primitive.dev_indices[iid]]; +#if TEXTURE + dev_primitives[pid + curPrimitiveBeginId].dev_diffuseTex + = primitive.dev_diffuseTex; + dev_primitives[pid + curPrimitiveBeginId].texWidth + = primitive.diffuseTexWidth; + dev_primitives[pid + curPrimitiveBeginId].texHeight + = primitive.diffuseTexHeight; +#elif !POINTS && !LINE + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType].col + = glm::vec3(1.0f, 1.0f, 1.0f); // default is white +#endif + } // TODO: other primitive types (point, line) } } +__device__ +void kernRasterizeExtraPrims(int width, int height, glm::vec3& color, glm::vec3& pos1, glm::vec3 pos2, Fragment* fragmentbuffer) { + glm::vec3 dist = glm::abs(pos1 - pos2); + if (dist.x > 0 && dist.y > 0) { + float length = glm::max(dist.x, dist.y); +#if POINTS + for (float i = 0; i <= length; i += POINTCLOUD) { +#else + for (float i = 0; i <= length; i++) { +#endif + glm::vec3 point = ((1.0f - (i / length)) * pos1 + (i / length) * pos2); + int index = (int)(point.x) + (int)(point.y) * width; + fragmentbuffer[index].color = color; + } + } +} +__global__ +void kernRasterize(int n, Primitive* primitives, Fragment* fragmentbuffer, int* depthsbuffer, int width, int height) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + + Primitive& prim = primitives[index]; + + glm::vec3 tri[3] = { + glm::vec3(prim.v[0].pos), + glm::vec3(prim.v[1].pos), + glm::vec3(prim.v[2].pos) + }; + + AABB aabb = getAABBForTriangle(tri); + if (aabb.min.x < 0 || aabb.max.x > width - 1 || aabb.min.y < 0 || aabb.max.y > height - 1) { + return; + } + + for (int x = aabb.min.x; x <= aabb.max.x; x++) { + for (int y = aabb.min.y; y <= aabb.max.y; y++) { + int idx = x + y * width; + + glm::vec3 bcc = calculateBarycentricCoordinate(tri, glm::vec2(x, y)); + if (isBarycentricCoordInBounds(bcc)) { + float z = getZAtCoordinate(bcc, tri); + int depth = -INT_MAX * z; + atomicMin(&depthsbuffer[idx], depth); + + if (depthsbuffer[idx] == depth) { + Fragment& frag = fragmentbuffer[idx]; + + frag.color = bcc.x * prim.v[0].col + bcc.y * prim.v[1].col + bcc.z * prim.v[2].col; + frag.eyePos = bcc.x * prim.v[0].eyePos + bcc.y * prim.v[1].eyePos + bcc.z * prim.v[2].eyePos; + frag.eyeNor = bcc.x * prim.v[0].eyeNor + bcc.y * prim.v[1].eyeNor + bcc.z * prim.v[2].eyeNor; +#if TEXTURE + frag.dev_diffuseTex = prim.v[0].dev_diffuseTex; + frag.texWidth = prim.v[0].texWidth; + frag.texHeight = prim.v[0].texHeight; + frag.tex = prim.v[0].tex; +#if PERSPECTIVE + glm::vec3 u_a = glm::vec3(bcc.x / prim.v[0].eyePos.z, bcc.y / prim.v[1].eyePos.z, bcc.z / prim.v[2].eyePos.z); + frag.texcoord0 = ((u_a.x * prim.v[0].texcoord0) + (u_a.y * prim.v[1].texcoord0) + (u_a.z * prim.v[2].texcoord0)) + / (u_a.x + u_a.y + u_a.z); +#else + frag.texcoord0 = bcc.x * prim.v[0].texcoord0 + bcc.y * prim.v[1].texcoord0 + bcc.z * prim.v[2].texcoord0; +#endif +#endif + } + } + } + } + +#if POINTS || LINE + if (prim.primitiveType == TINYGLTF_MODE_POINTS) { + glm::vec3 color = prim.v[0].col; + color += prim.v[0].eyeNor; + color = glm::normalize(color); + + kernRasterizeExtraPrims(width, height, color, glm::vec3(prim.v[0].pos), glm::vec3(prim.v[1].pos), fragmentbuffer); + kernRasterizeExtraPrims(width, height, color, glm::vec3(prim.v[2].pos), glm::vec3(prim.v[0].pos), fragmentbuffer); + } +#endif +} + +__device__ __host__ +glm::vec3 getTexture(int x, int y, int width, int height, TextureData* texture, int tex) { + int index = x + y * width; + int id = tex * index; + + if (x >= 0 && x < width && y >= 0 && y < height) { + return (glm::vec3(texture[id], texture[id + 1], texture[id + 2]) / 255.0f); + } + + return glm::vec3(0.0f); +} + +__global__ +void kernTextureMap(int width, int height, Fragment* fragmentbuffer) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + int index = x + (y * width); + + if (x >= 0 && x < width && y >= 0 && y < height) { + Fragment& frag = fragmentbuffer[index]; + if (frag.dev_diffuseTex != NULL) { + float texX = frag.texcoord0.x * (frag.texWidth - 1.0f) + 0.5f; + float texY = frag.texcoord0.y * (frag.texHeight - 1.0f) + 0.5f; +#if BILINEAR + // Reference: https://en.wikipedia.org/wiki/Bilinear_interpolation + float u = texX * 1.0f - 0.5f; + float v = texY * 1.0f - 0.5f; + float x = glm::floor(u); + float y = glm::floor(v); + float u_ratio = u - x; + float v_ratio = v - y; + float u_opposite = 1.0f - u_ratio; + float v_opposite = 1.0f - v_ratio; + + glm::vec3 tex00 = getTexture(x, y, frag.texWidth, frag.texHeight, frag.dev_diffuseTex, frag.tex); + glm::vec3 tex10 = getTexture(x + 1, y, frag.texWidth, frag.texHeight, frag.dev_diffuseTex, frag.tex); + glm::vec3 tex01 = getTexture(x, y + 1, frag.texWidth, frag.texHeight, frag.dev_diffuseTex, frag.tex); + glm::vec3 tex11 = getTexture(x + 1, y + 1, frag.texWidth, frag.texHeight, frag.dev_diffuseTex, frag.tex); + + glm::vec3 result = (tex00 * u_opposite + tex10 * u_ratio) * v_opposite + (tex01 * u_opposite + tex11 * u_ratio) * v_ratio; + frag.color = result; +#else + frag.color = getTexture(texX, texY, frag.texWidth, frag.texHeight, frag.dev_diffuseTex, frag.tex); +#endif + } + } +} /** * Perform rasterization. @@ -683,6 +891,8 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g dim3 blockSize2d(sideLength2d, sideLength2d); dim3 blockCount2d((width - 1) / blockSize2d.x + 1, (height - 1) / blockSize2d.y + 1); + dim3 numBlocksPrimitives((totalNumPrimitives + blockSize - 1) / blockSize); + //std::cout << totalNumPrimitives << std::endl; // Execute your rasterization pipeline here // (See README for rasterization pipeline outline.) @@ -695,6 +905,7 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g auto it = mesh2PrimitivesMap.begin(); auto itEnd = mesh2PrimitivesMap.end(); + //timer().startGpuTimer(); for (; it != itEnd; ++it) { auto p = (it->second).begin(); // each primitive auto pEnd = (it->second).end(); @@ -718,16 +929,36 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g checkCUDAError("Vertex Processing and Primitive Assembly"); } - + //timer().endGpuTimer(); + //std::cout << timer().getGpuElapsedTimeForPreviousOperation() << std::endl; + cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); + //timer().startGpuTimer(); initDepth << > >(width, height, dev_depth); - - // TODO: rasterize - + //timer().endGpuTimer(); + //std::cout << timer().getGpuElapsedTimeForPreviousOperation() << std::endl; + // TODO: rasterize + //timer().startGpuTimer(); + kernRasterize << > >(totalNumPrimitives, dev_primitives, dev_fragmentBuffer, dev_depth, width, height); + //timer().endGpuTimer(); + //std::cout << timer().getGpuElapsedTimeForPreviousOperation() << std::endl; + checkCUDAError("kernRasterize failed"); + +#if TEXTURE + // UV Texture Map + //timer().startGpuTimer(); + kernTextureMap << > >(width, height, dev_fragmentBuffer); + //timer().endGpuTimer(); + //std::cout << timer().getGpuElapsedTimeForPreviousOperation() << std::endl; + checkCUDAError("kernTextureMap failed"); +#endif // Copy depthbuffer colors into framebuffer + //timer().startGpuTimer(); render << > >(width, height, dev_fragmentBuffer, dev_framebuffer); + //timer().endGpuTimer(); + //std::cout << timer().getGpuElapsedTimeForPreviousOperation() << std::endl; checkCUDAError("fragment shader"); // Copy framebuffer into OpenGL buffer for OpenGL previewing sendImageToPBO<<>>(pbo, width, height, dev_framebuffer);