Forward gpu by zkSNARK · Pull Request #47 · fengggli/gpu-computing-materials

zkSNARK · 2019-04-29T03:12:14Z

PR for both forward-device and backward-device setup.

This PR incorporates a number of significant refactors
that enable CUDA functionality in both the primary
forward and backward functions.

Note that these functions are not optimized, nor are they
well written. However, they work.

Additionally, in several locations there is leftover code
from testing how the large, deeply nested loops in the
col2im_inner and im2col_inner functions map to the
1D indices needed for the device code.

Those sections should be moved out of the code and
into documentation somewhere.
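As an illustration of the kind of mapping described above, here is a minimal sketch that assumes a row-major 4D layout (N, C, H, W). The types `shape4_t` and `index4_t` and the functions `flatten4` and `unflatten4` are hypothetical names chosen for this example; they are not taken from the PR's code.

```c
#include <assert.h>

/* Hypothetical 4D shape and index types (N, C, H, W); not the PR's tensor type. */
typedef struct { int n, c, h, w; } shape4_t;
typedef struct { int n, c, h, w; } index4_t;

/* Flatten (n, c, h, w) into a row-major linear offset. */
static int flatten4(shape4_t s, index4_t i) {
  return ((i.n * s.c + i.c) * s.h + i.h) * s.w + i.w;
}

/* Recover (n, c, h, w) from a linear offset, the way a device thread
 * could recover its loop indices from a single global thread id. */
static index4_t unflatten4(shape4_t s, int idx) {
  index4_t i;
  assert(idx >= 0 && idx < s.n * s.c * s.h * s.w);
  i.w = idx % s.w;  idx /= s.w;
  i.h = idx % s.h;  idx /= s.h;
  i.c = idx % s.c;  idx /= s.c;
  i.n = idx;
  return i;
}
```

A round trip through `unflatten4` and `flatten4` should return the original offset for every element, which is a cheap host-side check before porting the index math to a kernel.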

.. fix some of the memory leaks in the forward
.. make note about optimization in the forward
.. add tensor_make_transpose_1230 (not working) and test (not working)
.. moves tests for tensor_make_transpose<....> into test_tensor_op.cpp

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

This second function, which does the same thing as the first, is here only as an example of a more direct mapping between the original memory and a single index. That mapping allows a direct conversion to CUDA code without much issue.

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>
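The commit message does not name the function in question. As an assumed illustration of a nested-loop function paired with a single-index twin, the sketch below uses a 3012 transpose, interpreted here as the output axes being input axes (3, 0, 1, 2). The function names and signatures are hypothetical.

```c
#include <assert.h>

/* First form: nested loops. src is (N, C, H, W) row-major,
 * dst is (W, N, C, H) row-major. */
static void transpose3012_nested(const float *src, float *dst,
                                 int N, int C, int H, int W) {
  for (int n = 0; n < N; ++n)
    for (int c = 0; c < C; ++c)
      for (int h = 0; h < H; ++h)
        for (int w = 0; w < W; ++w)
          dst[((w * N + n) * C + c) * H + h] =
              src[((n * C + c) * H + h) * W + w];
}

/* Second form: one flat loop. Each iteration depends only on idx, so the
 * loop body could become a kernel body with idx set to the thread id. */
static void transpose3012_flat(const float *src, float *dst,
                               int N, int C, int H, int W) {
  assert(N > 0 && C > 0 && H > 0 && W > 0);
  int total = N * C * H * W;
  for (int idx = 0; idx < total; ++idx) {
    int w = idx % W;
    int h = (idx / W) % H;
    int c = (idx / (W * H)) % C;
    int n = idx / (W * H * C);
    dst[((w * N + n) * C + c) * H + h] = src[idx];
  }
}
```

Comparing the two outputs element by element on the host gives a reference check for the flat version before it is moved to the device.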

Moves some things around and creates harness functions as an example of what all our call structures will need to look like in order to control the block and thread sizes. Note that this should be further extended to accept dim3, but for now it does not.

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>
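A harness of this kind mostly has to turn an element count and a chosen thread count into a grid size. The following is a minimal host-side sketch under that assumption; `launch_cfg_t` and `make_launch_cfg` are hypothetical names, and the PR's actual harness signatures may differ.

```c
#include <assert.h>

/* Hypothetical 1D launch configuration; not the PR's harness type. */
typedef struct {
  int blocks;
  int threads_per_block;
} launch_cfg_t;

/* Choose enough blocks to cover num_elems elements (ceiling division). */
static launch_cfg_t make_launch_cfg(int num_elems, int threads_per_block) {
  launch_cfg_t cfg;
  assert(num_elems >= 0 && threads_per_block > 0);
  cfg.threads_per_block = threads_per_block;
  cfg.blocks = (num_elems + threads_per_block - 1) / threads_per_block;
  return cfg;
}
```

In CUDA code the two fields would be passed as the launch parameters of a kernel. Because the last block may be only partly used, the kernel body would also need to check its index against the element count.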

This function was designed to do a parameter search for the forward and backward device convolution operations. However, for the submission, I have reduced it to performing a single benchmark.

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

Changes the device allocation and deallocation to take in real tensors, which lets me do a better job of forcing everything to notify its allocation and deallocation counter. This should help figure out what's up with the memory leaks in convolution forward and backward. Also makes further changes toward unifying on int in the whole project.

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>
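The counter idea can be sketched on the host as follows. This is only an illustration: the names are hypothetical, and plain `malloc` and `free` stand in for the device allocation calls, which take tensors in the PR.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global counters; host malloc/free stand in for device calls. */
static int g_alloc_count = 0;
static int g_free_count = 0;

/* Allocate a buffer and record the allocation. */
static float *counted_alloc(size_t num_elems) {
  float *p = (float *)malloc(num_elems * sizeof(float));
  if (p != NULL) ++g_alloc_count;
  return p;
}

/* Free a buffer and record the deallocation. */
static void counted_free(float *p) {
  if (p != NULL) {
    free(p);
    ++g_free_count;
  }
}

/* Number of buffers allocated but not yet freed. */
static int outstanding_allocs(void) {
  assert(g_alloc_count >= g_free_count);
  return g_alloc_count - g_free_count;
}
```

If every allocation and deallocation goes through wrappers like these, a nonzero `outstanding_allocs()` at the end of a forward or backward call indicates that some buffer was never freed.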

The problem was a simple missing deallocation of d_dx_cols; it was missed because d_dx_cols and d_x_cols look almost the same. It was a very easy mistake to make, but silly nonetheless.

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

zkSNARK and others added 30 commits April 2, 2019 17:52

python back transpose 1230 example and working C

54059f1

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

remove extra unused test DISABLED_test_tpose1230

d0af360

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

disable failing backward test

31b41dc

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

Merge branch 'master' into transpose-1230

ede15af

Merge remote-tracking branch 'origin/master' into tpose-1230-2

3c04db5

adds working col2im_inner and test / start col2im

09d2a3b

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

corrects bug in backward example (python)

78a247f

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

adds col2im, tensor_make_remove_padding_square and tests

c6aff2e

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

Merge branch 'master' into col2im

578a615

adds newline at end of layer_conv.c

8fadaae

fixes bug in col2im (remove hardcoded array)

8cfa95a

completed backward with 1 numerical test (unoptimized)

9dd9f0e

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

Merge branch 'master' into backward

ddfabb1

bugfixes related to signed vs unsigned

9cea809

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

fixes mem leaks in the backward process

dc0aa3e

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

add if(cache) in forward function for fill cache

4e8992d

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

add if(cache) in forward function for fill cache

457de1b

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

Merge branch 'backward' of github.com:fengggli/gpu-computing-materials into backward

7506cf3

initial commit with test harness for device conv

3d6f1e7

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

setup for testing components of forward device

b229820

adds remaining harnesses for device forward / backward

4ec614e

starting work on transpose 3012 device

c1d5b2c

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

split tpose3012 into 3 separate tests

f7d2208

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

adds working device transpose 3012

db9a03b

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

Merge pull request #30 from fengggli/tpose-3012-dev

7653630

Tpose 3012 dev

added the framework for the func and extra test

3713d19

adds commented example of how to get mapped 1D pad address

e04733c

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

adds extra test on the padding

dfd7e68

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

Christopher Goebel and others added 27 commits May 3, 2019 14:42

Merge branch 'test-speed-1-32' of github.com:fengggli/gpu-computing-materials into test-speed-1-32

576c7d3

[enhancement]: adds more complex parameter control for blk / thread sizes

6635fce

Also creates a separate file for testing the device utils and for their harnesses.

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

Merge branch 'test-speed-1-32' of github.com:fengggli/gpu-computing-materials into test-speed-1-32

bbff24b

[bugfix]: rename test and fix issue with calling tensor_make_alike

ddcf82a

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu> Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

merge cudnn to device branch and add bench folder

b9a45bd

missing files and gitignore

9281452

update documentation

c97cc0c

Update README.md

ad50461

[fix] without cudnn compile

4e9046e

[grid_adjustment]: add functions for controlling the grid in the device utils

7ae3be4

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

add final presentation

f70eb79

Merge branch 'test-speed-1-32' of https://github.com/fengggli/gpu-computing-materials into test-speed-1-32

541cd4d

[add] final slides

5c48626

[enhancement]: adds space

6e85bdb

Also creates a separate file for testing the device utils and for their harnesses.

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

[enhancement]: adds space

cb82308

Also creates a separate file for testing the device utils and for their harnesses.

Signed-off-by: Christopher Goebel <cmgoebel@iupui.edu>

[bugfix]: fix error in cmake

ab84449

cmake fix

0c54eb6

update readme for parallel build

57ba8a4

final readme

42a9ea5

Merge branch 'gpu-computing-submission' of github.com:fengggli/gpu-computing-materials into gpu-computing-submission

ebc57a7

changes all uint to int, refactor dev mem functions into dev lib

318b18a

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

adds leakcheck file

1c11067

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

[bugfix]: make some changes which kill some of the warnings

aab78f1

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

fengggli removed request for fgsong and qoofyk September 9, 2019 16:35

[bugfix]: fix missed uint to int conversion and missing device includes

a325472

Signed-off-by: Christopher Goebel <cmgoebel@iu.edu>

zkSNARK wants to merge 168 commits into master from forward-gpu

4 participants
