diff --git a/README.md b/README.md
index cad1abd..8eb5ff7 100644
--- a/README.md
+++ b/README.md
@@ -5,16 +5,139 @@ CUDA Rasterizer
 
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Rudraksha Shah
+* Tested on: Windows 10, i7-7700HQ @ 2.80GHz 16GB, GTX 1050 4096MB (Personal Computer)
 
-### (TODO: Your README)
+![Truck Turn Table](./renders/duck_TT.gif)
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+## Points and Lines Rasterization Renders
 
+| Points | Lines |
+| --- | ---  |
+| ![Points](./renders/duck_points.png) | ![Lines](./renders/truck_lines.png) | 
+
+## Texturing Renders
+
+Texturing | Bi-Linear Filtering
+--- | ---
+![T](./renders/checkerboard_T.png) | ![TBF](./renders/checkerboard_TBF.png)
+
+| Texturinng | Texture - Perpective Correct Interpolation |
+| --- | --- |
+| ![T](./renders/truck_t.png) | ![TBF](./renders/truck_PCI.png) |
+
+
+
+Overview
+===========
+
+In this project I have built a GPU Rasterizer using CUDA and C++.
+
+The main features of the Rasterizer are as follows:
+
+* Rasterization Methods:
+
+    * Naive Barycentric Rasterization
+    * Scan-Line Rasterization
+
+* Rasterization Modes:
+
+    * Points
+    * Lines
+    * Triangles    
+
+* Texturing:
+
+    * Perspective Correct Interpolation
+    * Bi-Linear Filtering
+
+* Shading:
+
+    * Lambert
+
+* Anti-Aliasing:
+
+    * FXAA
+    * SSAA
+
+* Optimizations:
+
+    * Back Face Culling
+
+Rasterization is a way of rendering 3D graphics in which we project the geometry of the scene onto the screen.
+
+Implementing the Rasterization on the CPU a year ago in CIS 460 was tricky but implementing the entire pipeline on the GPU was a challange in itself!
+
+The basic process of Rasterization is to take 3d geometry in object space then take it from object space -> world space -> camera space -> un-homogenized projected space -> homogenized NDC space -> pixel space. Once the object is in 2D pixel space it is rendered onto the screen. For rendering I have implemented the following methods:
+
+1. Barycentric Rasterization: In this method we iterate through each pizel in the bounding box surrounding the given triangle we are trying to render and check for each pixel if it lies inside the triangle or not using barycetric weights.
+
+2. Scan-Line Rasterization: In this method we again start from the bounding box surrounding the triangle we are renderig but for each pixel row we find valid intersections with the triangle edges. Now we fill in those pixels from one point of intersection to the other. This way we do not have to spend time checking unnecessary pixels around the triangle.
+
+Performance Analysis
+=======================
+
+* All performance analysis is done using the Cesium Milk Truck gltf model and the Barycentric Rasterization process of rendering with solid color per vertex and lambert shading.
+
+## No Anti Aliasing vs FXAA vs SSAA
+
+
+ No Anti Aliasing | FXAA | SSAA 
+ --- | --- | ---
+ ![NO AA](./renders/checkerboard_T1.png) | ![FXAA](./renders/checkerboard_FXAA.png) | ![SSAA](./renders/checkerboard_SSAA.png)
+
+![Performance chart AA](./renders/chart_AA.png)
+
+* FXAA: It stands for Fast Approximate Anti Aliasing. It is a post process anti aliasing method that works by finding the edges in the final rendered image and then smoothening them.
+
+    It is a hack way of doing anti aliasing but it is one the fastest and with little to no overhead cost of implementation. The only downside is that it does not understand goemtry so it might end up anti aliasing some parts of the image that do not need to be smoothed. But the performance vs final result tradeoff is very good and comparable to a 4x SSAA. 
+
+* SSAA: It stands for Super Sampling anti Aliasing. As its name suggest we do just that. First we go through the entire rasterization pipeline with the resolution set to a Factor of SSAA which are 2x, 4x, 8x and so on. Then finaly when we are plotting the pixels to the screen we take the scaled up texture and sample it to fill the final image with the original reslution.
+    
+    This was we create a sharp image from the begining and down sampling in the end gains us the extra level of detail needed to do anti aliasing.
+
+* As expected and we can confirm this from the chart that as compared to the standard SSAA anti aliasing the FXAA anti aliasing method is considerably faster and only costs us about 5-10% performance drop. While the SSAA anti aliasing method costs us a whopping 80-90% performance drop. This to me is wired as I expected SSAA to have a performace hit on the frame rate by 50% but not by this much.
+    
+    The original size of the image was 800 x 800 and the SSAA size was 3200 x 3200 4x times. One explanation I have for the drastic drop in the frame rate might be the overhead cost of having many more threads doing nothing. As the image of the truck as is was very small and thus only a small portion of the image contained the object. But spawnnig so many threads and their overhead of minimal computation along with the sparceness of the image content might have added up to reduce the frame rate.
+
+## No Back Face Culling vs Back Face Culling
+
+No Back Face Culling | With Back Face Culling Turned On
+--- | ---
+![No BFC](./renders/duck_NBFC.png) | ![W BFC](./renders/duck_BFC.png)
+
+![Performance chart BFC](./renders/chart_BFC.png)
+
+* In this case I expected there to be a minimum of 5% of performance improvement but the frame rate did not change which leads me to believe that just returning the threads that have been deemed to be checking a culled triangle is not enough to gain a performance gain.
+
+    On the conterary the mesh that we were working with needed be sufficiently large and complex to warrent a compaction. As from past experience `thrust` compaction has a significant overhead that is not at all feasible for such small workloads.    
+
+## Rasterization Methods
+
+![Performance chart B vs SL Rast](./renders/chart_RM.png)
+
+* In this case I am a bit surprised and also happy that my expectataion had been met. While implementing the rasterizer on the CPU I had learned that calculating and checking the barycentric coordinates for all the points inside the bounding box is not a feasible or efficient way of doing rasterization of complex geometries on the CPU.
+
+    Translating that understanding I thought that it would be a performance optimization to have the pipeline follow the scan line implementation even on the GPU. And lo and behold it is!
+
+    My rational behind this optimization is that it is far more efficient to do two calculations for intersetions per pixel row inside the bounding box of the triangle and filling in the pixels between those intersection points. Than to check each and every pixel against the traingle essentially doing NxN computations for each pixel in the bounding box of dimension NxN.
+
+Bloopers:
+==========
+
+* In this image it looks like I am applying shadows to edges giving them a depth... which would be a cool thing to do! but unfortunatley its just me incorrectly assuming the texture coordinates while detecting the edge in FXAA.
+
+![Bad FXAA](./renders/FXAA_BAD.PNG)
+
+![Bad FXAA](./renders/FXAA_BAD1.PNG)
 
 ### Credits
 
+* [Bresenham's Algo](https://www.cs.helsinki.fi/group/goa/mallinnus/lines/bresenh.html)
+* [FXAA information](https://blog.codinghorror.com/fast-approximate-anti-aliasing-fxaa/)
+* [FXAA original paper](http://developer.download.nvidia.com/assets/gamedev/files/sdk/11/FXAA_WhitePaper.pdf)
+* [FXAA tutorial](http://blog.simonrodriguez.fr/articles/30-07-2016_implementing_fxaa.html)
 * [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo)
 * [glTF Sample Models](https://github.com/KhronosGroup/glTF/blob/master/sampleModels/README.md)
+
+Thank you for reading through, I would love to hear your feedback on any improvements or suggestions!
\ No newline at end of file
diff --git a/renders/FXAA_BAD.PNG b/renders/FXAA_BAD.PNG
new file mode 100644
index 0000000..f3083ee
Binary files /dev/null and b/renders/FXAA_BAD.PNG differ
diff --git a/renders/FXAA_BAD1.PNG b/renders/FXAA_BAD1.PNG
new file mode 100644
index 0000000..abf8add
Binary files /dev/null and b/renders/FXAA_BAD1.PNG differ
diff --git a/renders/chart_AA.png b/renders/chart_AA.png
new file mode 100644
index 0000000..45a7dcb
Binary files /dev/null and b/renders/chart_AA.png differ
diff --git a/renders/chart_BFC.png b/renders/chart_BFC.png
new file mode 100644
index 0000000..5ea757f
Binary files /dev/null and b/renders/chart_BFC.png differ
diff --git a/renders/chart_RM.png b/renders/chart_RM.png
new file mode 100644
index 0000000..701648f
Binary files /dev/null and b/renders/chart_RM.png differ
diff --git a/renders/checkerboard_FXAA.png b/renders/checkerboard_FXAA.png
new file mode 100644
index 0000000..370bec0
Binary files /dev/null and b/renders/checkerboard_FXAA.png differ
diff --git a/renders/checkerboard_SSAA.png b/renders/checkerboard_SSAA.png
new file mode 100644
index 0000000..5b47bec
Binary files /dev/null and b/renders/checkerboard_SSAA.png differ
diff --git a/renders/checkerboard_T.png b/renders/checkerboard_T.png
new file mode 100644
index 0000000..7d7bdec
Binary files /dev/null and b/renders/checkerboard_T.png differ
diff --git a/renders/checkerboard_T1.png b/renders/checkerboard_T1.png
new file mode 100644
index 0000000..0efa10b
Binary files /dev/null and b/renders/checkerboard_T1.png differ
diff --git a/renders/checkerboard_TBF.png b/renders/checkerboard_TBF.png
new file mode 100644
index 0000000..799113d
Binary files /dev/null and b/renders/checkerboard_TBF.png differ
diff --git a/renders/duck_BFC.png b/renders/duck_BFC.png
new file mode 100644
index 0000000..0095dfd
Binary files /dev/null and b/renders/duck_BFC.png differ
diff --git a/renders/duck_NBFC.png b/renders/duck_NBFC.png
new file mode 100644
index 0000000..e298f20
Binary files /dev/null and b/renders/duck_NBFC.png differ
diff --git a/renders/duck_TT.gif b/renders/duck_TT.gif
new file mode 100644
index 0000000..e2a0df9
Binary files /dev/null and b/renders/duck_TT.gif differ
diff --git a/renders/duck_points.png b/renders/duck_points.png
new file mode 100644
index 0000000..801788f
Binary files /dev/null and b/renders/duck_points.png differ
diff --git a/renders/truck_PCI.png b/renders/truck_PCI.png
new file mode 100644
index 0000000..cf0efb4
Binary files /dev/null and b/renders/truck_PCI.png differ
diff --git a/renders/truck_TT.gif b/renders/truck_TT.gif
new file mode 100644
index 0000000..03910f6
Binary files /dev/null and b/renders/truck_TT.gif differ
diff --git a/renders/truck_lines.png b/renders/truck_lines.png
new file mode 100644
index 0000000..d5eeaf6
Binary files /dev/null and b/renders/truck_lines.png differ
diff --git a/renders/truck_points.png b/renders/truck_points.png
new file mode 100644
index 0000000..e25553d
Binary files /dev/null and b/renders/truck_points.png differ
diff --git a/renders/truck_t.png b/renders/truck_t.png
new file mode 100644
index 0000000..6f4afb8
Binary files /dev/null and b/renders/truck_t.png differ
diff --git a/src/rasterize.cu b/src/rasterize.cu
index 1262a09..d7fbb0c 100644
--- a/src/rasterize.cu
+++ b/src/rasterize.cu
@@ -17,6 +17,42 @@
 #include "rasterize.h"
 #include <glm/gtc/quaternion.hpp>
 #include <glm/gtc/matrix_transform.hpp>
+#include <chrono>
+
+// Render Modes (One of them needs to be toggled on to render something on the screen)
+#define POINTS 0
+#define LINES 0
+#define TRIANGLES 1
+
+// Rasterization Methods (Renders Solid Triangles)
+#define NAIVE_EDGEINTERSECTION_SCANLINE_TOGGLE 0	// 0 - Naive scanline & 1 - Edge intersection scanline
+
+// Coloring (Either of the two should be on to have an output on the screen)
+#define SOLIDCOLOR 0
+#define TEXTURING 1
+#define PERSPECTIVECORRECTTEXTURING 1
+#define BILNEARFILTERING 1
+// This is the color used for solid coloring
+#define COLOR glm::vec3(0.98f, 0.98f, 0.98f)
+
+//  Shading
+#define LAMBERT 1
+
+// Back Face Culling
+#define BACKFACECULLING 1
+
+// Anti-Aliasing
+#define FXAA 0	// Fast Approximation AA (Post Processing)
+#define SSAA 0	// SSAA toggle
+#define SSAAMULTIPLYER 4 // 1x is no SSAA. Increase this value to increase the resolution and SSAA effect
+
+// Performance Analysis
+#define PERFORMANCE_ANALYSIS_PER_RASTERIZATION_CALL 0
+#define PA_VERTEX_PROCESS_PRIMITIVE_ASSEMBLY 0
+#define PA_RASTERIZATION 0
+#define PA_FRAGMENT_SHADING 0
+#define PA_FXAA 0
+#define PA_COPY_FRAME_BUFFER 0
 
 namespace {
 
@@ -43,16 +79,17 @@ namespace {
 
 		 glm::vec3 eyePos;	// eye space position used for shading
 		 glm::vec3 eyeNor;	// eye space normal used for shading, cuz normal will go wrong after perspective transformation
-		// glm::vec3 col;
+		 glm::vec3 color;
 		 glm::vec2 texcoord0;
 		 TextureData* dev_diffuseTex = NULL;
-		// int texWidth, texHeight;
+		 int texWidth, texHeight;
 		// ...
 	};
 
 	struct Primitive {
 		PrimitiveType primitiveType = Triangle;	// C++ 11 init
 		VertexOut v[3];
+		bool culled;	// Used for triangle culling
 	};
 
 	struct Fragment {
@@ -62,10 +99,12 @@ namespace {
 		// The attributes listed below might be useful, 
 		// but always feel free to modify on your own
 
-		// glm::vec3 eyePos;	// eye space position used for shading
-		// glm::vec3 eyeNor;
-		// VertexAttributeTexcoord texcoord0;
-		// TextureData* dev_diffuseTex;
+		glm::vec3 eyePos;	// eye space position used for shading
+		glm::vec3 eyeNor;
+		VertexAttributeTexcoord texcoord0;
+		TextureData* dev_diffuseTex;
+		int textureWidth;
+		int textureHeight;
 		// ...
 	};
 
@@ -95,7 +134,6 @@ namespace {
 
 		// TODO: add more attributes when needed
 	};
-
 }
 
 static std::map<std::string, std::vector<PrimitiveDevBufPointers>> mesh2PrimitivesMap;
@@ -111,6 +149,11 @@ static glm::vec3 *dev_framebuffer = NULL;
 
 static int * dev_depth = NULL;	// you might need this buffer when doing depth test
 
+// Newly created
+static int * dev_mutex = NULL;	// used for depth test without conflicts
+static glm::vec3 *dev_temp_framebuffer = NULL;
+float* dev_quality = NULL;
+
 /**
  * Kernel that writes the image to the OpenGL PBO directly.
  */
@@ -118,19 +161,89 @@ __global__
 void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) {
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
-    int index = x + (y * w);
 
-    if (x < w && y < h) {
+#if SSAA
+	
+	int downWidth = w / SSAAMULTIPLYER;
+	int downHeight = h / SSAAMULTIPLYER;
+    int index = x + (y * downWidth);
+
+    if (x < downWidth && y < downHeight) {
         glm::vec3 color;
-        color.x = glm::clamp(image[index].x, 0.0f, 1.0f) * 255.0;
-        color.y = glm::clamp(image[index].y, 0.0f, 1.0f) * 255.0;
-        color.z = glm::clamp(image[index].z, 0.0f, 1.0f) * 255.0;
-        // Each thread writes one pixel location in the texture (textel)
+
+		// Down sampling the image to the original size
+		for (int i = 0; i < SSAAMULTIPLYER; i++) {
+			for (int j = 0; j < SSAAMULTIPLYER; j++) {
+				int idx = (i + (x * SSAAMULTIPLYER)) + ((j + (y * SSAAMULTIPLYER)) * w);
+				color.x += glm::clamp(image[idx].x, 0.0f, 1.0f) * 255.0;
+				color.y += glm::clamp(image[idx].y, 0.0f, 1.0f) * 255.0;
+				color.z += glm::clamp(image[idx].z, 0.0f, 1.0f) * 255.0;
+			}
+		}
+		color /= (SSAAMULTIPLYER*SSAAMULTIPLYER);
+
+		// Each thread writes one pixel location in the texture (textel)
         pbo[index].w = 0;
         pbo[index].x = color.x;
         pbo[index].y = color.y;
         pbo[index].z = color.z;
     }
+
+#else
+	
+	int index = x + (y * w);
+
+	if (x < w && y < h) {
+		glm::vec3 color;
+		
+		color.x = glm::clamp(image[index].x, 0.0f, 1.0f) * 255.0;
+		color.y = glm::clamp(image[index].y, 0.0f, 1.0f) * 255.0;
+		color.z = glm::clamp(image[index].z, 0.0f, 1.0f) * 255.0;
+		
+		// Each thread writes one pixel location in the texture (textel)
+		pbo[index].w = 0;
+		pbo[index].x = color.x;
+		pbo[index].y = color.y;
+		pbo[index].z = color.z;
+	}
+
+#endif
+}
+
+/**
+* Bilnear filtering
+*/
+__device__
+glm::vec3 bilinearFiltering(Fragment& tempfragment, float uvXf, float uvYf) {
+	int uvX0 = uvXf;
+	int uvY0 = uvYf;
+	int uvX1 = glm::clamp(uvX0 + 1, 0, tempfragment.textureWidth - 1);
+	int uvY1 = glm::clamp(uvY0 + 1, 0, tempfragment.textureHeight - 1);
+	glm::vec3 finalColor;
+	int uvIndex;
+	
+	// Get the color of the four surrounding pixels
+	uvIndex = (uvX0 + (uvY0 * tempfragment.textureWidth)) * 3;
+	glm::vec3 colorX0Y0 = (glm::vec3(tempfragment.dev_diffuseTex[uvIndex], tempfragment.dev_diffuseTex[uvIndex + 1], tempfragment.dev_diffuseTex[uvIndex + 2])) / 255.0f;
+	
+	uvIndex = (uvX1 + (uvY0 * tempfragment.textureWidth)) * 3;
+	glm::vec3 colorX1Y0 = (glm::vec3(tempfragment.dev_diffuseTex[uvIndex], tempfragment.dev_diffuseTex[uvIndex + 1], tempfragment.dev_diffuseTex[uvIndex + 2])) / 255.0f;
+	
+	uvIndex = (uvX0 + (uvY1 * tempfragment.textureWidth)) * 3;
+	glm::vec3 colorX0Y1 = (glm::vec3(tempfragment.dev_diffuseTex[uvIndex], tempfragment.dev_diffuseTex[uvIndex + 1], tempfragment.dev_diffuseTex[uvIndex + 2])) / 255.0f;
+	
+	uvIndex = (uvX1 + (uvY1 * tempfragment.textureWidth)) * 3;
+	glm::vec3 colorX1Y1 = (glm::vec3(tempfragment.dev_diffuseTex[uvIndex], tempfragment.dev_diffuseTex[uvIndex + 1], tempfragment.dev_diffuseTex[uvIndex + 2])) / 255.0f;
+
+	// Bilinearly in terpolate between the colors based on the fractional part in the uvs
+	float weightY = uvYf - uvY0;
+	float weightX = uvXf - uvX0;
+	glm::vec3 interpColorY0 = glm::mix(colorX0Y0, colorX1Y0, weightX);
+	glm::vec3 interpColorY1 = glm::mix(colorX0Y1, colorX1Y1, weightX);
+	finalColor = glm::mix(interpColorY0, interpColorY1, weightY);
+
+	// Return the final color
+	return finalColor;
 }
 
 /** 
@@ -143,10 +256,67 @@ void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) {
     int index = x + (y * w);
 
     if (x < w && y < h) {
-        framebuffer[index] = fragmentBuffer[index].color;
+		glm::vec3 finalColor;
+		Fragment& thisFragment = fragmentBuffer[index];
 
-		// TODO: add your fragment shader code here
+#if (POINTS || LINES)
+
+		finalColor = thisFragment.color;
+		finalColor = glm::clamp(finalColor, 0.0f, 1.0f);
+		framebuffer[index] = finalColor;
 
+#elif TEXTURING
+		
+		if (thisFragment.dev_diffuseTex != NULL) {
+
+#if BILNEARFILTERING
+
+			float uvXf = thisFragment.texcoord0.x * thisFragment.textureWidth;
+			float uvYf = thisFragment.texcoord0.y * thisFragment.textureHeight;
+
+			finalColor = bilinearFiltering(thisFragment, uvXf, uvYf);
+
+#else
+
+			// Get color form the texture and store it.
+			int uvX = thisFragment.texcoord0.x * thisFragment.textureWidth;
+			int uvY = thisFragment.texcoord0.y * thisFragment.textureHeight;
+
+			int uvIndex = (uvX + (uvY * thisFragment.textureWidth)) * 3;
+			finalColor = glm::vec3(thisFragment.dev_diffuseTex[uvIndex], thisFragment.dev_diffuseTex[uvIndex + 1], thisFragment.dev_diffuseTex[uvIndex + 2]);
+			finalColor /= 255.0f;
+
+#endif
+		
+		}
+
+#if LAMBERT
+
+		// Diffuse/Lambert shading
+		glm::vec3 LightDirection = glm::normalize(thisFragment.eyePos - glm::vec3(100.0f));
+		finalColor *= (glm::dot(-LightDirection, thisFragment.eyeNor));
+
+#endif
+
+		finalColor = glm::clamp(finalColor, 0.0f, 1.0f);
+		framebuffer[index] = finalColor;
+
+#elif SOLIDCOLOR
+
+#if LAMBERT
+
+		// Lambert Shading
+		finalColor = thisFragment.color;
+		glm::vec3 LightDirection = glm::normalize(thisFragment.eyePos - glm::vec3(100.0f));
+		finalColor *= (glm::dot(-LightDirection, thisFragment.eyeNor));
+
+#endif
+
+		finalColor = glm::clamp(finalColor , 0.1f, 1.0f);
+		framebuffer[index] = finalColor;
+
+#endif
+		// TODO: add your fragment shader code here
     }
 }
 
@@ -154,8 +324,18 @@ void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) {
  * Called once at the beginning of the program to allocate memory.
  */
 void rasterizeInit(int w, int h) {
-    width = w;
-    height = h;
+#if SSAA
+
+	width = w * SSAAMULTIPLYER;
+    height = h * SSAAMULTIPLYER;
+
+#else
+
+	width = w;
+	height = h;
+
+#endif
+
 	cudaFree(dev_fragmentBuffer);
 	cudaMalloc(&dev_fragmentBuffer, width * height * sizeof(Fragment));
 	cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment));
@@ -166,6 +346,16 @@ void rasterizeInit(int w, int h) {
 	cudaFree(dev_depth);
 	cudaMalloc(&dev_depth, width * height * sizeof(int));
 
+	// Newly Created
+	cudaFree(dev_mutex);
+	cudaMalloc(&dev_mutex, width * height * sizeof(int));
+
+	cudaFree(dev_temp_framebuffer);
+	cudaMalloc(&dev_temp_framebuffer, width * height * sizeof(glm::vec3));
+	cudaMemset(dev_temp_framebuffer, 0, width * height * sizeof(glm::vec3));
+
+	cudaFree(dev_quality);
+	cudaMalloc(&dev_quality, 12 * sizeof(float));
 	checkCUDAError("rasterizeInit");
 }
 
@@ -182,6 +372,17 @@ void initDepth(int w, int h, int * depth)
 	}
 }
 
+__global__
+void initMutex(int w, int h, int * mutex) {
+	int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+	int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+	if (x < w && y < h)
+	{
+		int index = x + (y * w);
+		mutex[index] = 0;
+	}
+}
 
 /**
 * kern function with support for stride to sometimes replace cudaMemcpy
@@ -639,6 +840,34 @@ void _vertexTransformAndAssembly(
 		// Then divide the pos by its w element to transform into NDC space
 		// Finally transform x and y to viewport space
 
+		VertexOut* tempPtrToPrimitiveOutVertex = &primitive.dev_verticesOut[vid];
+
+		glm::vec4 objSpacePos = glm::vec4(primitive.dev_position[vid], 1.0f);
+		glm::vec4 tempPos = objSpacePos;
+
+		// Object space to un-homogenized coordinates
+		tempPos = MVP * tempPos;
+		
+		// re-homogenizing the coordinates
+		tempPos /= tempPos[3];
+
+		// NDC -> Pixel space
+		tempPos[0] = (1.0f - tempPos[0]) * width / 2.0f;
+		tempPos[1] = (1.0f - tempPos[1]) * height / 2.0f;
+		tempPos[2] = -tempPos[2];
+
+		// Fill in the out variables
+		tempPtrToPrimitiveOutVertex->pos = tempPos;
+		tempPtrToPrimitiveOutVertex->eyePos = glm::vec3(MV * objSpacePos);
+		tempPtrToPrimitiveOutVertex->eyeNor = glm::normalize(MV_normal * primitive.dev_normal[vid]);
+
+#if TEXTURING
+		tempPtrToPrimitiveOutVertex->texcoord0 = primitive.dev_texcoord0[vid];	// These are UV's
+		tempPtrToPrimitiveOutVertex->dev_diffuseTex = primitive.dev_diffuseTex;
+		tempPtrToPrimitiveOutVertex->texWidth = primitive.diffuseTexWidth;
+		tempPtrToPrimitiveOutVertex->texHeight = primitive.diffuseTexHeight;
+#endif
+
 		// TODO: Apply vertex assembly here
 		// Assemble all attribute arraies into the primitive array
 		
@@ -646,7 +875,6 @@ void _vertexTransformAndAssembly(
 }
 
 
-
 static int curPrimitiveBeginId = 0;
 
 __global__ 
@@ -660,25 +888,580 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_
 		// TODO: uncomment the following code for a start
 		// This is primitive assembly for triangles
 
-		//int pid;	// id for cur primitives vector
-		//if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) {
-		//	pid = iid / (int)primitive.primitiveType;
-		//	dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType]
-		//		= primitive.dev_verticesOut[primitive.dev_indices[iid]];
-		//}
+		int pid;	// id for cur primitives vector
+		if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) {
+			pid = iid / (int)primitive.primitiveType;
+			dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] = primitive.dev_verticesOut[primitive.dev_indices[iid]];
+		
+#if BACKFACECULLING
+			glm::vec3 cameraDirection(0.0f, 0.0f, 1.0f);
+			if (glm::dot(cameraDirection, primitive.dev_verticesOut[primitive.dev_indices[iid]].eyeNor) <= 0.0f) {
+				dev_primitives[pid + curPrimitiveBeginId].culled = true;
+			}
+			else {
+				dev_primitives[pid + curPrimitiveBeginId].culled = false;
+			}
+#endif
 
+		}
 
 		// TODO: other primitive types (point, line)
 	}
 	
 }
 
+/**
+*	Draws a line given a line segment
+*/
+__device__
+void drawLine(LineSegment LS, Fragment* fragmentBuffer, int width, int height) {
+		// Bresenham's Algo.
+		// Wiki Reference: https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm
+
+		int x0 = LS.vertex1.x;
+		int y0 = LS.vertex1.y;
+		int x1 = LS.vertex2.x;
+		int y1 = LS.vertex2.y;
+
+		bool steep = (glm::abs(y1 - y0) > glm::abs(x1 - x0));
+		int temp;
+		if (steep) {
+			temp = x0;
+			x0 = y0;
+			y0 = temp;
+
+			temp = x1;
+			x1 = y1;
+			y1 = temp;
+		}
+
+		if (x0 > x1) {
+			temp = x0;
+			x0 = x1;
+			x1 = temp;
+
+			temp = y0;
+			y0 = y1;
+			y1 = temp;
+		}
+	
+		float dx = x1 - x0;
+		float dy = glm::abs(y1 - y0);
+
+		float error = dx / 2.0f;
+		int ystep = (y0 < y1) ? 1 : -1;
+		int y = (int)y0;
+
+		int maxX = (int)x1;
+
+		for (int x = (int)x0; x<maxX; x++)
+		{
+			if (steep)
+			{
+				if ((x <= width && x >= 0) && (y <= height && y >= 0)) {
+					int pixelIndexP = y + (x * width);
+					fragmentBuffer[pixelIndexP].color = COLOR;
+				}
+			}
+			else
+			{
+				if ((x <= width && x >= 0) && (y <= height && y >= 0)) {
+					int pixelIndexP = x + (y * width);
+					fragmentBuffer[pixelIndexP].color = COLOR;
+				}
+			}
+
+			error -= dy;
+			if (error < 0)
+			{
+				y += ystep;
+				error += dx;
+			}
+		}
+}
+
+__global__
+void _rasterizeGeometry(int totalNumPrimitives, Primitive* dev_primitives, Fragment* dev_fragmentBuffer, int* dev_depth, int* dev_mutex , int width, int height) {
+	int index = threadIdx.x + (blockIdx.x * blockDim.x);
+	if (index >= totalNumPrimitives) {
+		return;
+	}
+
+	Primitive& tempPrimitive = dev_primitives[index];
+
+#if BACKFACECULLING
+	if (tempPrimitive.culled == true) {
+		return;
+	}
+#endif
+
+	// Create a trinagle
+	glm::vec3 thisTriangle[3] = { glm::vec3(tempPrimitive.v[0].pos),
+								  glm::vec3(tempPrimitive.v[1].pos),
+								  glm::vec3(tempPrimitive.v[2].pos) };
+
+	// Create a axis aligned bounding box
+	AABB aabb = getAABBForTriangle(thisTriangle);
+
+#if POINTS
+
+	for (int i = 0; i < 3; i++) {
+		int x = (int)thisTriangle[i].x;
+		int y = (int)thisTriangle[i].y;
+		if ((x <= width && x >= 0) && (y <= height && y >= 0)) {
+			int pixelIndexP = x + (y * width);
+			dev_fragmentBuffer[pixelIndexP].color = COLOR;
+		}
+	}
+
+#elif LINES
+		
+	// Create LineSegments from vertices
+	LineSegment LS1 = createLineSegment(thisTriangle[0], thisTriangle[1]);
+	LineSegment LS2 = createLineSegment(thisTriangle[0], thisTriangle[2]);
+	LineSegment LS3 = createLineSegment(thisTriangle[1], thisTriangle[2]);
+
+	// Draw these lines
+	drawLine(LS1, dev_fragmentBuffer, width, height);
+	drawLine(LS2, dev_fragmentBuffer, width, height);
+	drawLine(LS3, dev_fragmentBuffer, width, height);
+
+#elif TRIANGLES
+
+#if NAIVE_EDGEINTERSECTION_SCANLINE_TOGGLE	
+	
+	// Do ScanLine Edge Intersection Rasterization
+	
+	// Fill bounds and Clip them to screen size
+	float maxY = glm::min(aabb.max[1], (float)height);	
+	float minY = glm::max(aabb.min[1], 0.0f);
+
+	// Create LineSegments from vertices
+	LineSegment LS1 = createLineSegment(thisTriangle[0], thisTriangle[1]);
+	LineSegment LS2 = createLineSegment(thisTriangle[0], thisTriangle[2]);
+	LineSegment LS3 = createLineSegment(thisTriangle[1], thisTriangle[2]);
+
+	for (int i = minY; i <= maxY; i++) {
+		// Check for intersections and find the minX and maxX value for each pixel row
+		float minX = FLT_MAX;
+		float maxX = FLT_MIN;
+		int intersectionCount = 0;
+		
+		if (intersectWithLineSegemnt(LS1, i, minX, maxX, aabb)) {
+			intersectionCount++;
+		}
+
+		if (intersectWithLineSegemnt(LS2, i, minX, maxX, aabb)) {
+			intersectionCount++;
+		}
+		
+		if (intersectWithLineSegemnt(LS3, i, minX, maxX, aabb)) {
+			intersectionCount++;
+		}
+
+		if (intersectionCount < 2) {
+			continue;
+		}
+		
+		// Clip them to the screen size
+		minX = glm::max(minX, 0.0f);
+		maxX = glm::min(maxX, (float)width);
+			
+		for (int j = minX; j <= maxX; j++) {
+				
+			// Get the baricentric coordinate for position x, y (j, i)
+			glm::vec3 baryCentricCoordinate = calculateBarycentricCoordinate(thisTriangle, glm::vec2(j, i));
+	
+			if (!isBarycentricCoordInBounds(baryCentricCoordinate)) {
+				continue;
+			}
+
+			int perspectiveCorrectZ = getZAtCoordinate(baryCentricCoordinate, thisTriangle) * 10000;
+
+			int pixelIndex = j + (i * width);
+
+			bool depthUpdated = fillDepthBufferWithMinValue(&dev_mutex[pixelIndex], &dev_depth[pixelIndex], perspectiveCorrectZ);
+
+			if (depthUpdated) {
+				// Interpolating the eye normals and the positions used for shading later
+				dev_fragmentBuffer[pixelIndex].eyePos = tempPrimitive.v[0].eyePos * baryCentricCoordinate.x
+					+ tempPrimitive.v[1].eyePos * baryCentricCoordinate.y
+					+ tempPrimitive.v[2].eyePos * baryCentricCoordinate.z;
+				dev_fragmentBuffer[pixelIndex].eyeNor = tempPrimitive.v[0].eyeNor * baryCentricCoordinate.x
+					+ tempPrimitive.v[1].eyeNor * baryCentricCoordinate.y
+					+ tempPrimitive.v[2].eyeNor * baryCentricCoordinate.z;
+
+#if SOLIDCOLOR
+				dev_fragmentBuffer[pixelIndex].color = COLOR;
+#elif TEXTURING
+
+#if PERSPECTIVECORRECTTEXTURING
+				float z0 = tempPrimitive.v[0].eyePos.z;
+				float z1 = tempPrimitive.v[1].eyePos.z;
+				float z2 = tempPrimitive.v[2].eyePos.z;
+
+				// Correctly interpolated z value
+				float z = baryCentricCoordinate.x / z0 + baryCentricCoordinate.y / z1 + baryCentricCoordinate.z / z2;
+
+				// Perspective corrected texture coordinates
+				dev_fragmentBuffer[pixelIndex].texcoord0 = (tempPrimitive.v[0].texcoord0 / z0 * baryCentricCoordinate.x
+															+ tempPrimitive.v[1].texcoord0 / z1 * baryCentricCoordinate.y
+															+ tempPrimitive.v[2].texcoord0 / z2 * baryCentricCoordinate.z) / z;
+#else
+				dev_fragmentBuffer[pixelIndex].texcoord0 = tempPrimitive.v[0].texcoord0 * baryCentricCoordinate.x
+															+ tempPrimitive.v[1].texcoord0 * baryCentricCoordinate.y
+															+ tempPrimitive.v[2].texcoord0 * baryCentricCoordinate.z;
+#endif
+				dev_fragmentBuffer[pixelIndex].textureWidth = tempPrimitive.v[0].texWidth;
+				dev_fragmentBuffer[pixelIndex].textureHeight = tempPrimitive.v[0].texHeight;
+				dev_fragmentBuffer[pixelIndex].dev_diffuseTex = tempPrimitive.v[0].dev_diffuseTex;
+
+#endif
+			}
+
+		}
+	
+	}
+#else	
+	
+	// Do Scanline Naive Rasterization
+
+	float minX = glm::max(aabb.min[0], 0.0f);
+	float maxX = glm::min(aabb.max[0], (float)(width - 1));
+	float minY = glm::max(aabb.min[1], 0.0f);
+	float maxY = glm::min(aabb.max[1], (float)(height - 1));
+
+	for (int y = minY; y <= maxY; y++) {
+		for (int x = minX; x <= maxX; x++) {
+
+			// Get the baricentric coordinate for position x, y on screen
+			glm::vec3 baryCentricCoordinate = calculateBarycentricCoordinate(thisTriangle, glm::vec2(x, y));
+
+			if (!isBarycentricCoordInBounds(baryCentricCoordinate)) {
+				continue;
+			}
+
+			int perspectiveCorrectZ = getZAtCoordinate(baryCentricCoordinate, thisTriangle) * 10000;
+	
+			int pixelIndex = x + (y * width);
+
+			bool depthUpdated = fillDepthBufferWithMinValue(&dev_mutex[pixelIndex], &dev_depth[pixelIndex], perspectiveCorrectZ);
+
+			if (depthUpdated) {
+				// Interpolating the eye normals and the positions used for shading later
+				dev_fragmentBuffer[pixelIndex].eyePos = tempPrimitive.v[0].eyePos * baryCentricCoordinate.x
+					+ tempPrimitive.v[1].eyePos * baryCentricCoordinate.y
+					+ tempPrimitive.v[2].eyePos * baryCentricCoordinate.z;
+				dev_fragmentBuffer[pixelIndex].eyeNor = tempPrimitive.v[0].eyeNor * baryCentricCoordinate.x
+					+ tempPrimitive.v[1].eyeNor * baryCentricCoordinate.y
+					+ tempPrimitive.v[2].eyeNor * baryCentricCoordinate.z;
+
+#if SOLIDCOLOR
+				dev_fragmentBuffer[pixelIndex].color = COLOR;
+#elif TEXTURING
+
+#if PERSPECTIVECORRECTTEXTURING
+				float z0 = tempPrimitive.v[0].eyePos.z;
+				float z1 = tempPrimitive.v[1].eyePos.z;
+				float z2 = tempPrimitive.v[2].eyePos.z;
+
+				// Correctly interpolated z value
+				float z = baryCentricCoordinate.x / z0 + baryCentricCoordinate.y / z1 + baryCentricCoordinate.z / z2;
+
+				// Perspective corrected texture coordinates
+				dev_fragmentBuffer[pixelIndex].texcoord0 = (tempPrimitive.v[0].texcoord0 / z0 * baryCentricCoordinate.x 
+															+ tempPrimitive.v[1].texcoord0 / z1 * baryCentricCoordinate.y 
+															+ tempPrimitive.v[2].texcoord0 / z2 * baryCentricCoordinate.z) / z;
+#else
+				dev_fragmentBuffer[pixelIndex].texcoord0 = tempPrimitive.v[0].texcoord0 * baryCentricCoordinate.x 
+															+ tempPrimitive.v[1].texcoord0 * baryCentricCoordinate.y 
+															+ tempPrimitive.v[2].texcoord0 * baryCentricCoordinate.z;
+#endif
+				dev_fragmentBuffer[pixelIndex].textureWidth = tempPrimitive.v[0].texWidth;
+				dev_fragmentBuffer[pixelIndex].textureHeight = tempPrimitive.v[0].texHeight;
+				dev_fragmentBuffer[pixelIndex].dev_diffuseTex = tempPrimitive.v[0].dev_diffuseTex;
+
+#endif
+			}
+	
+		}
+	}
+#endif
+
+#endif
+}
+
+
+/**
+*	Perform FXAA
+*	Reference: https://blog.codinghorror.com/fast-approximate-anti-aliasing-fxaa/
+*/
+__global__
+void FXAAkern(int width, int height, glm::vec3* dev_framebuffer, glm::vec3* dev_temp_framebuffer, float FXAA_SPAN_MAX, float FXAA_EDGE_THRESHOLD_MAX, float FXAA_EDGE_THRESHOLD_MIN, float* dev_quality) {
+	int x0 = (blockIdx.x * blockDim.x) + threadIdx.x;
+	int y0 = (blockIdx.y * blockDim.y) + threadIdx.y;
+	int index = x0 + (y0 * width);
+
+	// STEPS for FXAA:
+	// 1. Find the luma of the pixels up, down, right and left of the given pixel
+	// 2. Find the min and the max luma deviance and if it is below the threshold return. No AA will be performed as it is not an edge.
+	// 3. Find the luma of all the corner points and find the difference in luma horizontally and vertically.
+	// 4. Check if the luma deviation is more vertically or horizontally to determine the edge direction.
+	// 5. Once the edge is determined we ofset the uv coordinates (in out case the fragment index) to be as close as to the pixel edge.
+	// 6. Now iterate on both siedes of the current pixel along the edge till we treach the end i.e. a significant gradient drop. This means we have reached the end of the edge.
+	// 7. Now we average the pixel uv coordinates based on how close it is to the either edge
+	// 8. Color the pixel
+
+	if (x0 < width && y0 < height) {
+		// 1. Find the luma of the pixels up, down, right and left of the given pixel
+
+		int x1 = glm::clamp(x0 + 1, 0, width - 1);
+		int y1 = glm::clamp(y0 + 1, 0, height - 1);
+		int xm1 = glm::clamp(x0 - 1, 0, width - 1);
+		int ym1 = glm::clamp(y0 - 1, 0, height - 1);
+
+		// uv Index of the four pixels on the sides of a given pixel
+		int uvIndexUp = x0 + (y1 * width);
+		int uvIndexDown = x0 + (ym1 * width);
+		int uvIndexLeft = xm1 + (y0 * width);
+		int uvIndexRight = x1 + (y0 * width);
+
+		// uv index of the four pixels in the corner around the given pixel
+		int uvIndexUpLeft = xm1 + (y1 * width);
+		int uvIndexDownLeft = xm1 + (ym1 * width);
+		int uvIndexUpRight = x1 + (y1 * width);
+		int uvIndexDownRight = x1 + (ym1 * width);
+
+		// Standard luminosity values of RGB based on the percieption of individual colors by humans
+		glm::vec3 luma(0.299, 0.587, 0.114);
+
+		// Luminosity at the given pixel index
+		float lumaCenter = glm::dot(dev_framebuffer[index] , luma);
+
+		// Find the luminosity of the texture in the surrounding four pixels
+		float lumaUp = glm::dot(dev_framebuffer[uvIndexUp], luma);
+		float lumaDown = glm::dot(dev_framebuffer[uvIndexDown], luma);
+		float lumaRight = glm::dot(dev_framebuffer[uvIndexRight], luma);
+		float lumaLeft = glm::dot(dev_framebuffer[uvIndexLeft], luma);
+	
+		// Find the luminosity of the four corners around a given pixel
+		// These four values combined with the above values will be used to determine if an edge is horizontal or vertical
+		float lumaUpLeft = glm::dot(dev_framebuffer[uvIndexUpLeft], luma);
+		float lumaUpRight = glm::dot(dev_framebuffer[uvIndexUpRight], luma);
+		float lumaDownLeft = glm::dot(dev_framebuffer[uvIndexDownLeft], luma);
+		float lumaDownRight = glm::dot(dev_framebuffer[uvIndexDownRight], luma);
+
+		// Check if we are in a region which needs to be AA'ed
+		
+		// find the min and max luminosity around a given fragmnet
+		float lumaMin = glm::min(lumaCenter, glm::min(glm::min(lumaUp, lumaDown), glm::min(lumaRight, lumaLeft)));
+		float lumaMax = glm::max(lumaCenter, glm::max(glm::max(lumaUp, lumaDown), glm::max(lumaRight, lumaLeft)));
+
+		// Find the deviation (DELTA) of the luminosity for deciding if there is a significant edge to perform AA around the given pixel index
+		float delta = lumaMax - lumaMin;
+
+		// 2. Find the min and the max luma deviance and if it is below the threshold return. No AA will be performed as it is not an edge.
+
+		// If the deviation is not significant enough don't bother doing AA
+		if (delta < glm::max(FXAA_EDGE_THRESHOLD_MIN, lumaMax * FXAA_EDGE_THRESHOLD_MAX)) {
+			dev_temp_framebuffer[index] = dev_framebuffer[index];
+			return;
+		}
+
+		// 3. Find the luma of all the corner points and find the difference in luma horizontally and vertically.
+
+		// Combine the lumas
+		// Edge
+		float lumaDownUp = lumaDown + lumaUp;
+		float lumaLeftRight = lumaLeft + lumaRight;
+		// Corners
+		float lumaLeftCorners = lumaDownLeft + lumaUpLeft;
+		float lumaDownCorners = lumaDownLeft + lumaDownRight;
+		float lumaRightCorners = lumaDownRight + lumaUpRight;
+		float lumaUpCorners = lumaUpRight + lumaUpLeft;
 
+		// Compute an estimation of the gradient along the horizontal and vertical axis.
+		float edgeHorizontal = glm::abs(-2.0 * lumaLeft + lumaLeftCorners) + glm::abs(-2.0 * lumaCenter + lumaDownUp) * 2.0 + glm::abs(-2.0 * lumaRight + lumaRightCorners);
+		float edgeVertical = glm::abs(-2.0 * lumaUp + lumaUpCorners) + glm::abs(-2.0 * lumaCenter + lumaLeftRight) * 2.0 + glm::abs(-2.0 * lumaDown + lumaDownCorners);
+
+		// Is edge horizontal or vertical
+		bool isHorizontal = (edgeHorizontal >= edgeVertical);
+
+		// 4. Check if the luma deviation is more vertically or horizontally to determine the edge direction.
+
+		// Select the two neighboring texels lumas in the opposite direction to the local edge.
+		float luma1 = isHorizontal ? lumaDown : lumaLeft;
+		float luma2 = isHorizontal ? lumaUp : lumaRight;
+		// Compute gradients in this direction.
+		float gradient1 = luma1 - lumaCenter;
+		float gradient2 = luma2 - lumaCenter;
+
+		// Which direction is the steepest ?
+		bool is1Steepest = glm::abs(gradient1) >= glm::abs(gradient2);
+
+		// Gradient in the corresponding direction, normalized.
+		float gradientScaled = 0.25*glm::max(abs(gradient1), abs(gradient2));
+
+		// Choose the step size (one pixel) according to the edge direction.
+		float stepLength = isHorizontal ? (1.0f/height) : (1.0f/width);
+
+		// Average luma in the correct direction.
+		float lumaLocalAverage = 0.0;
+
+		if (is1Steepest) {
+			// Switch the direction
+			stepLength = -stepLength;
+			lumaLocalAverage = 0.5*(luma1 + lumaCenter);
+		}
+		else {
+			lumaLocalAverage = 0.5*(luma2 + lumaCenter);
+		}
+
+		// 5. Once the edge is determined we ofset the uv coordinates (in out case the fragment index) to be as close as to the pixel edge.
+
+		// Shift UV in the correct direction by half a pixel.
+		glm::vec2 currentUV = glm::vec2(x0, y0);
+		if (isHorizontal) {
+			currentUV.y += stepLength * 0.5;
+		}
+		else {
+			currentUV.x += stepLength * 0.5;
+		}
+
+		// 6. Now iterate on both siedes of the current pixel along the edge till we treach the end i.e. a significant gradient drop. This means we have reached the end of the edge.
+
+		// Exploer the edge on both sides and find the endpoint
+		// Do the first iteration and you are done if you find the luminosity gradient is significant
+		// Compute offset (for each iteration step) in the correct direction.
+		glm::vec2 offset = isHorizontal ? glm::vec2((1.0/width), 0.0) : glm::vec2(0.0, (1.0f/height));
+		// Compute UVs to explore on each side of the edge, orthogonally. 
+		// The QUALITY allows us to step faster.
+		glm::vec2 uv1 = currentUV - offset;
+		glm::vec2 uv2 = currentUV + offset;
+
+		// Read the lumas at both current extremities of the exploration segment, and compute the delta wrt to the local average luma.
+		float lumaEnd1 = glm::dot(dev_framebuffer[(int)uv1.x + ((int)uv1.y * width)], luma);//rgb2luma(texture(screenTexture, uv1).rgb);
+		float lumaEnd2 = glm::dot(dev_framebuffer[(int)uv2.x + ((int)uv2.y * width)], luma);//rgb2luma(texture(screenTexture, uv2).rgb);
+		lumaEnd1 -= lumaLocalAverage;
+		lumaEnd2 -= lumaLocalAverage;
+
+		// If the luma deltas at the current extremities are larger than the local gradient, we have reached the side of the edge.
+		bool reached2 = glm::abs(lumaEnd2) >= gradientScaled;
+		bool reached1 = glm::abs(lumaEnd1) >= gradientScaled;
+		bool reachedBoth = reached1 && reached2;
+
+		// If the side is not reached, we continue to explore in this direction.
+		if (!reached1) {
+			uv1 -= offset;
+		}
+		if (!reached2) {
+			uv2 += offset;
+		}
+
+		// Itereating
+		if(!reachedBoth) {
+			for (int i = 1; i < FXAA_SPAN_MAX; i++) {
+			
+				// If needed, read luma in 1st direction, compute delta.
+				if (!reached1) {
+					lumaEnd1 = glm::dot(dev_framebuffer[(int)uv1.x + ((int)uv1.y * width)], luma);// rgb2luma(texture(screenTexture, uv1).rgb);
+					lumaEnd1 = lumaEnd1 - lumaLocalAverage;
+				}
+				// If needed, read luma in opposite direction, compute delta.
+				if (!reached2) {
+					lumaEnd2 = glm::dot(dev_framebuffer[(int)uv2.x + ((int)uv2.y * width) ], luma);// rgb2luma(texture(screenTexture, uv2).rgb);
+					lumaEnd2 = lumaEnd2 - lumaLocalAverage;
+				}
+				// If the luma deltas at the current extremities is larger than the local gradient, we have reached the side of the edge.
+				reached1 = abs(lumaEnd1) >= gradientScaled;
+				reached2 = abs(lumaEnd2) >= gradientScaled;
+				reachedBoth = reached1 && reached2;
+
+				// If the side is not reached, we continue to explore in this direction, with a variable quality.
+				if (!reached1) {
+					uv1 -= offset * dev_quality[i];
+				}
+				if (!reached2) {
+					uv2 += offset * dev_quality[i];
+				}
+
+				if (reachedBoth) {
+					break;
+				}
+			}
+		}
+		// Done iterating
+
+		// 7. Now we average the pixel uv coordinates based on how close it is to the either edge
+
+		// Now we estimate the offset if we are at the center of the edge or near the far sides.
+		// The closer we are to the far sides the more blurring will need to be done to make the edge look smooth
+
+		// Compute the distances to each extremity of the edge.
+		float distance1 = isHorizontal ? (x0 - uv1.x) : (y0 - uv1.y);
+		float distance2 = isHorizontal ? (uv2.x - x0) : (uv2.y - y0);
+
+		// In which direction is the extremity of the edge closer ?
+		bool isDirection1 = distance1 < distance2;
+		float distanceFinal = glm::min(distance1, distance2);
+
+		// Length of the edge.
+		float edgeThickness = (distance1 + distance2);
+
+		// UV offset: read in the direction of the closest side of the edge.
+		float pixelOffset = -distanceFinal / edgeThickness + 0.5;
+
+		// Now check if the luminosity of the center pixe; corrosponds to that on the edges detected
+		// If not than we may have stepped too far
+
+		// Is the luma at center smaller than the local average ?
+		bool isLumaCenterSmaller = lumaCenter < lumaLocalAverage;
+
+		// If the luma at center is smaller than at its neighbour, the delta luma at each end should be positive (same variation).
+		// (in the direction of the closer side of the edge.)
+		bool correctVariation = ((isDirection1 ? lumaEnd1 : lumaEnd2) < 0.0) != isLumaCenterSmaller;
+
+		// If the luma variation is incorrect, do not offset.
+		float finalOffset = correctVariation ? pixelOffset : 0.0;
+
+		// 8. Color the pixel
+
+		// Compute the final UV coordinates.
+		glm::vec2 finalUv = glm::vec2(x0, y0);
+		if (isHorizontal) {
+			finalUv.y += finalOffset * stepLength;
+		}
+		else {
+			finalUv.x += finalOffset * stepLength;
+		}
+
+		// Read the color at the new UV coordinates, and use it.
+		dev_temp_framebuffer[index] = dev_framebuffer[(int)finalUv.x + ((int)finalUv.y * width)];
+	}
+}
+
+/**
+*	Copy data from dev_temp_framebuffer to dev_framebuffer
+*/
+__global__
+void copyKern(int width, int height, glm::vec3* dev_framebuffer, glm::vec3* dev_temp_framebuffer) {
+	int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+	int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+	int index = x + (y * width);
+
+	if (x < width && y < height) {
+		dev_framebuffer[index] = dev_temp_framebuffer[index];
+	}
+}
 
 /**
  * Perform rasterization.
  */
 void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal) {
+#if PERFORMANCE_ANALYSIS_PER_RASTERIZATION_CALL
+	auto start = std::chrono::high_resolution_clock::now();
+#endif
+
     int sideLength2d = 8;
     dim3 blockSize2d(sideLength2d, sideLength2d);
     dim3 blockCount2d((width  - 1) / blockSize2d.x + 1,
@@ -687,6 +1470,9 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g
 	// Execute your rasterization pipeline here
 	// (See README for rasterization pipeline outline.)
 
+#if PA_VERTEX_PROCESS_PRIMITIVE_ASSEMBLY
+	auto PA_VPPA_Start = std::chrono::high_resolution_clock::now();
+#endif
 	// Vertex Process & primitive assembly
 	{
 		curPrimitiveBeginId = 0;
@@ -718,20 +1504,96 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g
 
 		checkCUDAError("Vertex Processing and Primitive Assembly");
 	}
-	
+
+#if PA_VERTEX_PROCESS_PRIMITIVE_ASSEMBLY
+	auto PA_VPPA_End = std::chrono::high_resolution_clock::now();
+	std::chrono::duration<double, std::milli> PA_VPPA_Diff = PA_VPPA_End - PA_VPPA_Start;
+	std::cout << "Time taken for Vertex Processing and Primitive Assembly: " << PA_VPPA_Diff.count() << " ms." << std::endl;
+#endif
+
 	cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment));
 	initDepth << <blockCount2d, blockSize2d >> >(width, height, dev_depth);
-	
-	// TODO: rasterize
-
+	initMutex << <blockCount2d, blockSize2d >> >(width, height, dev_mutex);
 
+#if PA_RASTERIZATION
+	auto PA_R_Start = std::chrono::high_resolution_clock::now();
+#endif
 
+	// TODO: rasterize
+	dim3 blockSize(128);
+	dim3 numBlocksPerTriangle((totalNumPrimitives + blockSize.x - 1) / blockSize.x);
+	_rasterizeGeometry<<<numBlocksPerTriangle, blockSize>>>(totalNumPrimitives, dev_primitives, dev_fragmentBuffer, dev_depth, dev_mutex,  width, height);
+	checkCUDAError("rasterize geometry");
+
+#if PA_RASTERIZATION
+	auto PA_R_End = std::chrono::high_resolution_clock::now();
+	std::chrono::duration<double, std::milli> PA_R_Diff = PA_R_End - PA_R_Start;
+	std::cout << "Time taken for Rasterization: " << PA_R_Diff.count() << " ms." << std::endl;
+#endif
+
+#if PA_FRAGMENT_SHADING
+	auto PA_FS_Start = std::chrono::high_resolution_clock::now();
+#endif 
     // Copy depthbuffer colors into framebuffer
 	render << <blockCount2d, blockSize2d >> >(width, height, dev_fragmentBuffer, dev_framebuffer);
 	checkCUDAError("fragment shader");
+
+#if PA_FRAGMENT_SHADING
+	auto PA_FS_End = std::chrono::high_resolution_clock::now();
+	std::chrono::duration<double, std::milli> PA_FS_Diff = PA_FS_End - PA_FS_Start;
+	std::cout << "Time taken for Fragment Shading: " << PA_FS_Diff.count() << " ms." << std::endl;
+#endif 
+
+
+#if PA_COPY_FRAME_BUFFER
+	auto PA_CFB_Start = std::chrono::high_resolution_clock::now();
+#endif
+
+#if PA_FXAA
+	auto PA_FXAA_Start = std::chrono::high_resolution_clock::now();
+#endif
+
+// Anti-Aliasing Effect using post processing Fast Approximation AA
+#if FXAA
+
+	// Do FXAA
+	// Fillup the quality array used for iterating over the pixel edge
+	float quality[12] = {1.5, 2.0, 2.0, 2.0, 4.0, 8.0, 4.0, 8.0, 8.0, 8.0, 16.0, 32.0};
+	cudaMemcpy(dev_quality, &quality, 12 * sizeof(float), cudaMemcpyHostToDevice);
+
+	float FXAA_SPAN_MAX = 12.0;	// This is the number of steps about a given pixel we will take at a given time 
+	float FXAA_EDGE_THRESHOLD_MAX = 1.0 / 8.0;
+	float FXAA_EDGE_THRESHOLD_MIN = 0.0312;
+	FXAAkern << <blockCount2d, blockSize2d>> > (width, height, dev_framebuffer, dev_temp_framebuffer, FXAA_SPAN_MAX, FXAA_EDGE_THRESHOLD_MAX, FXAA_EDGE_THRESHOLD_MIN, dev_quality);
+	checkCUDAError("Apply FXAA");
+
+	// Copy dev_temp_Framebuffer to dev_framebuffer
+	copyKern << <blockCount2d, blockSize2d>> > (width, height, dev_framebuffer, dev_temp_framebuffer);
+	checkCUDAError("copy dev_temp_framebuffer to dev_fraebuffer");
+
+#endif
+
+#if PA_FXAA
+	auto PA_FXAA_End = std::chrono::high_resolution_clock::now();
+	std::chrono::duration<double, std::milli> PA_FXAA_Diff = PA_FXAA_End - PA_FXAA_Start;
+	std::cout << "Time taken for FXAA: " << PA_FXAA_Diff.count() << " ms." << std::endl;
+#endif
+
     // Copy framebuffer into OpenGL buffer for OpenGL previewing
     sendImageToPBO<<<blockCount2d, blockSize2d>>>(pbo, width, height, dev_framebuffer);
     checkCUDAError("copy render result to pbo");
+
+#if PA_COPY_FRAME_BUFFER
+	auto PA_CFB_End = std::chrono::high_resolution_clock::now();
+	std::chrono::duration<double, std::milli> PA_CFB_Diff = PA_CFB_End - PA_CFB_Start;
+	std::cout << "Time taken for Copy Frame buffer: " << PA_CFB_Diff.count() << " ms." << std::endl;
+#endif
+
+#if PERFORMANCE_ANALYSIS_PER_RASTERIZATION_CALL
+	auto end = std::chrono::high_resolution_clock::now();
+	std::chrono::duration<double, std::milli> diff = end - start;
+	std::cout << "Time taken for Rasterization: " << diff.count() << " ms." << std::endl;
+#endif
 }
 
 /**
@@ -772,5 +1634,15 @@ void rasterizeFree() {
 	cudaFree(dev_depth);
 	dev_depth = NULL;
 
+	// Newly Added
+	cudaFree(dev_mutex);
+	dev_mutex = NULL;
+
+	cudaFree(dev_temp_framebuffer);
+	dev_temp_framebuffer = NULL;
+
+	cudaFree(dev_quality);
+	dev_quality = NULL;
+
     checkCUDAError("rasterize Free");
 }
diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h
index 46c701e..64fc2f3 100644
--- a/src/rasterizeTools.h
+++ b/src/rasterizeTools.h
@@ -17,6 +17,100 @@ struct AABB {
     glm::vec3 max;
 };
 
+struct LineSegment {
+	glm::vec3 vertex1;
+	glm::vec3 vertex2;
+
+	float slope;
+
+	float minY;
+	float maxY;
+};
+
+/*
+* Checks if two floats are within threshold of each other
+* i.e. Nearly equal
+*/
+__host__ __device__ static
+bool nearlyEqual(float f1, float f2) {
+	if (f1 > (f2 - SMALL_EPSILON) && f1 < (f2 + SMALL_EPSILON)) {
+		return true;
+	}
+	else {
+		return false;
+	}
+}
+
+/*
+* Gets the slope of the line segment
+*/
+__host__ __device__ static
+float getLineSegmentSlope(glm::vec3 point1, glm::vec3 point2) {
+	// HORIZONTAL LINE
+	if (nearlyEqual(point1[1], point2[1])) {
+		return 0.0f;
+	}
+	// VERTICLE LINE
+	else if (nearlyEqual(point1[0], point2[0])) {
+		return FLT_MAX;
+	}
+	else {
+		return ((point2[1] - point1[1]) / (point2[0] - point1[0]));
+	}
+}
+
+/**
+* Initializes a line segment
+*/
+__host__ __device__ static
+LineSegment createLineSegment(glm::vec3 point1, glm::vec3 point2) {
+	LineSegment LS;
+
+	LS.slope = getLineSegmentSlope(point1, point2);
+	
+	LS.minY = glm::min(point1[1], point2[1]);
+	LS.maxY = glm::max(point1[1], point2[1]);
+
+	LS.vertex1 = point1;
+	LS.vertex2 = point2;
+
+	return LS;
+}
+
+/*
+* Finds intersection with line segment
+*/
+__host__ __device__ static
+bool intersectWithLineSegemnt(LineSegment LS, int Y, float& minX, float& maxX, AABB aabb) {
+	if ((float)Y < LS.minY || (float)Y > LS.maxY) {
+		return false;
+	}
+	
+	if (LS.slope == 0) {
+		// HORIZONTAL LINE
+		minX = glm::min(minX, glm::min(LS.vertex1[0], LS.vertex2[0]));
+		maxX = glm::max(maxX, glm::max(LS.vertex1[0], LS.vertex2[0]));
+		return true;
+	}
+	else if (LS.slope == FLT_MAX) {
+		// VERTICAL LINE
+		minX = glm::min(minX, glm::min(LS.vertex1[0], LS.vertex2[0]));
+		maxX = glm::max(maxX, glm::max(LS.vertex1[0], LS.vertex2[0]));
+		return true;
+	}
+	else {
+		// P(X,Y) -> Point of intersection
+		float X = (Y - LS.vertex1[1]) / LS.slope + LS.vertex1[0];
+		if (X < aabb.min[0] || X > aabb.max[0]) {
+			return false;
+		}
+
+		minX = glm::min(minX, X);
+		maxX = glm::max(maxX, X);
+		return true;
+	}
+}
+
 /**
  * Multiplies a glm::mat4 matrix and a vec4.
  */
@@ -99,3 +193,41 @@ float getZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3])
            + barycentricCoord.y * tri[1].z
            + barycentricCoord.z * tri[2].z);
 }
+
+/**
+*	Fills the depth buffer without race conditions or memory erite conflicts
+*/
+__device__
+bool fillDepthBufferWithMinValue(int* mutex, int* dev_depth, int perspectiveCorrectZ) {
+	// Loop-wait until this thread is able to execute its critical section.
+	if (perspectiveCorrectZ > *dev_depth) {
+		return false;
+	}
+
+	bool depthUpdated = false;
+
+	bool isSet;
+	do {
+		isSet = (atomicCAS(mutex, 0, 1) == 0);
+		if (isSet) {
+			// Critical section goes here.
+			// The critical section MUST be inside the wait loop;
+			// if it is afterward, a deadlock will occur.
+			if (perspectiveCorrectZ < *dev_depth) {
+				*dev_depth = perspectiveCorrectZ;
+				depthUpdated = true;
+			}
+			*mutex = 0;
+		}
+	} while (!isSet);
+
+	return depthUpdated;
+}
+
+/**
+*	Clamps a given intiger between min and max value
+*/
+__host__ __device__
+int clampVal(int valueToBeClamped, int min, int max) {
+	return glm::clamp(valueToBeClamped, min, max);
+}
\ No newline at end of file
diff --git a/util/utilityCore.hpp b/util/utilityCore.hpp
index a67db68..6ed3f81 100644
--- a/util/utilityCore.hpp
+++ b/util/utilityCore.hpp
@@ -24,6 +24,7 @@
 #define G                           6.67384e-11
 #define EPSILON                     .000000001
 #define ZERO_ABSORPTION_EPSILON     0.00001
+#define SMALL_EPSILON				0.000001
 
 namespace utilityCore {
 extern float clamp(float f, float min, float max);