
Research on vector multiplication optimizations in Go


gohxs/vec-benchmark


Researching vector multiplication optimizations

Running

  go get github.com/gohxs/vec-benchmark
  go test github.com/gohxs/vec-benchmark -bench .
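The command above runs the repository's own benchmark suite. The following is only a minimal sketch of what a single measurement involves. It is not the repository's benchmark code, and the helper name `benchMul` is hypothetical. It uses the standard library's `testing.Benchmark`, which grows `b.N` until the timing is stable and can be called from an ordinary program. Since Go 1.13, `testing.Init` should be called first when benchmarking outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// Mul is the reference loop described in this README.
func Mul(vec1, vec2, out []float32) {
	for i := 0; i < len(vec1); i++ {
		out[i] = vec1[i] * vec2[i]
	}
}

// benchMul (hypothetical helper) measures Mul on vectors of length n.
func benchMul(n int) testing.BenchmarkResult {
	testing.Init() // registers testing flags; later calls are no-ops
	vec1 := make([]float32, n)
	vec2 := make([]float32, n)
	out := make([]float32, n)
	for i := range vec1 {
		vec1[i] = float32(i)
		vec2[i] = 2
	}
	// testing.Benchmark reruns the closure with increasing b.N.
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			Mul(vec1, vec2, out)
		}
	})
}

func main() {
	// 320 = 10 * 4 * 8: the "small" size if NWorkers were 4 (an assumption).
	r := benchMul(320)
	fmt.Printf("%d iterations, %d ns/op\n", r.N, r.NsPerOp())
}
```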

The optimizations are based on the following function, which multiplies two vectors element by element and writes the products into a third vector.

vec1 and vec2 must have the same length; out must be at least as long as vec1.

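// Mul multiplies vec1 and vec2 element by element and stores the
// products in out. It iterates over len(vec1), so vec2 and out must
// be at least that long or the loop panics with an index out of range.
func Mul(vec1, vec2, out []float32) {
  for i := 0; i < len(vec1); i++ {
    out[i] = vec1[i] * vec2[i]
  }
}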
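Go's standard gc compiler does not, as far as we know, auto-vectorize loops like this, so each iteration handles one float32. For comparison with the 4-lane SSE and 8-lane AVX versions, here is a hypothetical pure-Go variant, `MulUnrolled4`. It is not part of this repository. It unrolls the loop by four to mirror the SSE lane count and finishes leftover elements with a scalar tail loop. Whether unrolling helps in practice depends on the compiler and CPU, and would need to be measured.

```go
package main

import "fmt"

// MulUnrolled4 is a hypothetical, hand-unrolled variant of Mul (not from
// this repository). It processes four elements per iteration, mirroring the
// four float32 lanes of an SSE register, then handles any remainder.
func MulUnrolled4(vec1, vec2, out []float32) {
	n := len(vec1)
	i := 0
	// Main loop: four independent multiplications per iteration.
	for ; i+4 <= n; i += 4 {
		out[i] = vec1[i] * vec2[i]
		out[i+1] = vec1[i+1] * vec2[i+1]
		out[i+2] = vec1[i+2] * vec2[i+2]
		out[i+3] = vec1[i+3] * vec2[i+3]
	}
	// Scalar tail for lengths that are not a multiple of 4.
	for ; i < n; i++ {
		out[i] = vec1[i] * vec2[i]
	}
}

func main() {
	a := []float32{1, 2, 3, 4, 5}
	b := []float32{2, 2, 2, 2, 2}
	out := make([]float32, len(a))
	MulUnrolled4(a, b, out)
	fmt.Println(out) // [2 4 6 8 10]
}
```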

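Sample results, using these vector lengths (number of float32 elements). The -4 suffix on each benchmark name is the GOMAXPROCS value of the run.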

big   vec = 100000 * NWorkers * 8
small vec = 10 * NWorkers * 8
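Both sizes are products of NWorkers and 8. Presumably this gives each worker an equal chunk (80 or 800000 elements) whose length is a multiple of 8, the number of float32 lanes in an AVX register, so the SIMD kernels need no scalar tail. The README does not say what the "Routine" benchmarks do. One plausible reading is that each call spawns one goroutine per chunk. Below is a sketch of that idea under this assumption. `MulParallel` is a hypothetical name, and the NWorkers value of 4 is illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

// Mul is the reference loop described in this README.
func Mul(vec1, vec2, out []float32) {
	for i := 0; i < len(vec1); i++ {
		out[i] = vec1[i] * vec2[i]
	}
}

// MulParallel (hypothetical, not from this repository) splits the vectors
// into nWorkers contiguous chunks and runs Mul on each chunk in a new
// goroutine. nWorkers must be >= 1; the last chunk absorbs any remainder.
func MulParallel(vec1, vec2, out []float32, nWorkers int) {
	n := len(vec1)
	chunk := n / nWorkers
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		start := w * chunk
		end := start + chunk
		if w == nWorkers-1 {
			end = n
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			Mul(vec1[start:end], vec2[start:end], out[start:end])
		}(start, end)
	}
	wg.Wait() // block until every chunk has been written
}

func main() {
	const nWorkers = 4     // hypothetical; the README does not state NWorkers
	n := 10 * nWorkers * 8 // "small" size: 80 elements per worker
	vec1 := make([]float32, n)
	vec2 := make([]float32, n)
	out := make([]float32, n)
	for i := range vec1 {
		vec1[i] = float32(i)
		vec2[i] = 0.5
	}
	MulParallel(vec1, vec2, out, nWorkers)
	fmt.Println(n, out[n-1]) // 320 159.5
}
```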

27-11-2017

goos: linux
goarch: amd64
pkg: github.com/gohxs/vec-benchmark
BenchmarkVecSmall/Single/_________VecMulgo-4      10000000        166 ns/op
BenchmarkVecSmall/Single/_____VecMulFuncgo-4      10000000        191 ns/op
BenchmarkVecSmall/Single/asm.VecMulf32x4sse-4     50000000         29.1 ns/op
BenchmarkVecSmall/Single/asm.VecMulf32x8avx-4     100000000        15.3 ns/op
BenchmarkVecSmall/Single/cgo.VecMulf32x4sse-4     20000000         86.6 ns/op
BenchmarkVecSmall/Single/cgo.VecMulf32x8avx-4     20000000         85.5 ns/op
BenchmarkVecSmall/Routine/_________VecMulgo-4      2000000        914 ns/op
BenchmarkVecSmall/Routine/_____VecMulFuncgo-4      2000000        929 ns/op
BenchmarkVecSmall/Routine/asm.VecMulf32x4sse-4     2000000        734 ns/op
BenchmarkVecSmall/Routine/asm.VecMulf32x8avx-4     2000000        808 ns/op
BenchmarkVecSmall/Routine/cgo.VecMulf32x4sse-4     2000000        939 ns/op
BenchmarkVecSmall/Routine/cgo.VecMulf32x8avx-4     2000000        957 ns/op
BenchmarkVecSmall/Worker/_________VecMulgo-4       1000000       1190 ns/op
BenchmarkVecSmall/Worker/_____VecMulFuncgo-4       1000000       1225 ns/op
BenchmarkVecSmall/Worker/asm.VecMulf32x4sse-4      1000000       1005 ns/op
BenchmarkVecSmall/Worker/asm.VecMulf32x8avx-4      1000000       1099 ns/op
BenchmarkVecSmall/Worker/cgo.VecMulf32x4sse-4      1000000       1300 ns/op
BenchmarkVecSmall/Worker/cgo.VecMulf32x8avx-4      1000000       1273 ns/op
BenchmarkVecBig/Single/_________VecMulgo-4           10000     155749 ns/op
BenchmarkVecBig/Single/_____VecMulFuncgo-4           10000     181747 ns/op
BenchmarkVecBig/Single/asm.VecMulf32x4sse-4          20000      63763 ns/op
BenchmarkVecBig/Single/asm.VecMulf32x8avx-4          20000      58711 ns/op
BenchmarkVecBig/Single/cgo.VecMulf32x4sse-4          20000      62572 ns/op
BenchmarkVecBig/Single/cgo.VecMulf32x8avx-4          20000      62099 ns/op
BenchmarkVecBig/Routine/_________VecMulgo-4          20000      93637 ns/op
BenchmarkVecBig/Routine/_____VecMulFuncgo-4          10000     102044 ns/op
BenchmarkVecBig/Routine/asm.VecMulf32x4sse-4         30000      60345 ns/op
BenchmarkVecBig/Routine/asm.VecMulf32x8avx-4         20000      53928 ns/op
BenchmarkVecBig/Routine/cgo.VecMulf32x4sse-4         30000      60901 ns/op
BenchmarkVecBig/Routine/cgo.VecMulf32x8avx-4         30000      58339 ns/op
BenchmarkVecBig/Worker/_________VecMulgo-4           20000     103466 ns/op
BenchmarkVecBig/Worker/_____VecMulFuncgo-4           10000     110428 ns/op
BenchmarkVecBig/Worker/asm.VecMulf32x4sse-4          20000      57864 ns/op
BenchmarkVecBig/Worker/asm.VecMulf32x8avx-4          30000      56748 ns/op
BenchmarkVecBig/Worker/cgo.VecMulf32x4sse-4          30000      61035 ns/op
BenchmarkVecBig/Worker/cgo.VecMulf32x8avx-4          30000      61112 ns/op
PASS
ok      github.com/gohxs/vec-benchmark  70.360s
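Dividing the plain Go ns/op by each variant's ns/op gives a speedup factor. In the 27-11-2017 run, the single-threaded big-vector variants are roughly 2.4x to 2.7x faster than plain Go, and asm AVX is about 10.8x faster on small vectors (166 / 15.3). For small vectors, spreading the work across goroutines costs more than it saves: for example, 914 ns/op (Routine) versus 166 ns/op (Single) for plain Go. This small sketch does the arithmetic using values copied from the VecBig/Single rows. The `result` type and `speedups` helper are illustrative only.

```go
package main

import "fmt"

// result is one benchmark row: variant name and its ns/op.
type result struct {
	name string
	nsOp float64
}

// speedups returns baseline/nsOp for each row, i.e. how many times
// faster than the baseline each variant ran.
func speedups(baseline float64, rows []result) map[string]float64 {
	s := make(map[string]float64, len(rows))
	for _, r := range rows {
		s[r.name] = baseline / r.nsOp
	}
	return s
}

func main() {
	// ns/op from the 27-11-2017 VecBig/Single rows; plain Go (155749) is the baseline.
	big := []result{
		{"asm.VecMulf32x4sse", 63763},
		{"asm.VecMulf32x8avx", 58711},
		{"cgo.VecMulf32x4sse", 62572},
		{"cgo.VecMulf32x8avx", 62099},
	}
	s := speedups(155749, big)
	for _, r := range big {
		fmt.Printf("%-20s %.2fx\n", r.name, s[r.name])
	}
}
```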

26-11-2017:

goos: linux
goarch: amd64
pkg: github.com/gohxs/vec-benchmark
BenchmarkVecSmall/Single/_________VecMulgo-4     10000000       167 ns/op
BenchmarkVecSmall/Single/_____VecMulFuncgo-4     10000000       171 ns/op
BenchmarkVecSmall/Single/asm.VecMulf32x4sse-4    50000000        32.4 ns/op
BenchmarkVecSmall/Single/asm.VecMulf32x8avx-4    100000000       15.9 ns/op
BenchmarkVecSmall/Single/cgo.VecMulf32x4sse-4    20000000        87.0 ns/op
BenchmarkVecSmall/Single/cgo.VecMulf32x8avx-4    20000000        85.4 ns/op
BenchmarkVecSmall/Routine/_________VecMulgo-4     2000000       933 ns/op
BenchmarkVecSmall/Routine/_____VecMulFuncgo-4     2000000       924 ns/op
BenchmarkVecSmall/Routine/asm.VecMulf32x4sse-4    2000000       726 ns/op
BenchmarkVecSmall/Routine/asm.VecMulf32x8avx-4    2000000       797 ns/op
BenchmarkVecSmall/Routine/cgo.VecMulf32x4sse-4    2000000       930 ns/op
BenchmarkVecSmall/Routine/cgo.VecMulf32x8avx-4    2000000      1022 ns/op
BenchmarkVecSmall/Worker/_________VecMulgo-4      1000000      1182 ns/op
BenchmarkVecSmall/Worker/_____VecMulFuncgo-4      1000000      1196 ns/op
BenchmarkVecSmall/Worker/asm.VecMulf32x4sse-4     1000000      1022 ns/op
BenchmarkVecSmall/Worker/asm.VecMulf32x8avx-4     1000000      1090 ns/op
BenchmarkVecSmall/Worker/cgo.VecMulf32x4sse-4     1000000      1268 ns/op
BenchmarkVecSmall/Worker/cgo.VecMulf32x8avx-4     1000000      1295 ns/op
BenchmarkVecBig/Single/_________VecMulgo-4          10000    156223 ns/op
BenchmarkVecBig/Single/_____VecMulFuncgo-4          10000    161283 ns/op
BenchmarkVecBig/Single/asm.VecMulf32x4sse-4         20000     64221 ns/op
BenchmarkVecBig/Single/asm.VecMulf32x8avx-4         20000     58880 ns/op
BenchmarkVecBig/Single/cgo.VecMulf32x4sse-4         20000     63158 ns/op
BenchmarkVecBig/Single/cgo.VecMulf32x8avx-4         20000     62082 ns/op
BenchmarkVecBig/Routine/_________VecMulgo-4         20000     95356 ns/op
BenchmarkVecBig/Routine/_____VecMulFuncgo-4         20000     98905 ns/op
BenchmarkVecBig/Routine/asm.VecMulf32x4sse-4        30000     55805 ns/op
BenchmarkVecBig/Routine/asm.VecMulf32x8avx-4        30000     54938 ns/op
BenchmarkVecBig/Routine/cgo.VecMulf32x4sse-4        30000     56425 ns/op
BenchmarkVecBig/Routine/cgo.VecMulf32x8avx-4        30000     56495 ns/op
BenchmarkVecBig/Worker/_________VecMulgo-4          20000     98456 ns/op
BenchmarkVecBig/Worker/_____VecMulFuncgo-4          20000    104448 ns/op
BenchmarkVecBig/Worker/asm.VecMulf32x4sse-4         30000     56922 ns/op
BenchmarkVecBig/Worker/asm.VecMulf32x8avx-4         30000     54801 ns/op
BenchmarkVecBig/Worker/cgo.VecMulf32x4sse-4         30000     56743 ns/op
BenchmarkVecBig/Worker/cgo.VecMulf32x8avx-4         30000     57845 ns/op
PASS
ok      github.com/gohxs/vec-benchmark  74.490s
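The README also does not describe the "Worker" benchmarks. A common alternative to spawning goroutines on every call is a persistent pool that starts its goroutines once and feeds them chunks over a channel. Below is a sketch of that pattern under the assumption that this is roughly what "Worker" means. The `Pool` type and its methods are hypothetical. Each call still pays for channel sends and a `sync.WaitGroup`, which is consistent with, though it does not prove, the overhead seen for small vectors.

```go
package main

import (
	"fmt"
	"sync"
)

// job is one chunk of a Mul call plus the WaitGroup to signal on completion.
type job struct {
	vec1, vec2, out []float32
	done            *sync.WaitGroup
}

// Pool is a hypothetical persistent worker pool (not from this repository).
type Pool struct {
	jobs     chan job
	nWorkers int
}

// NewPool starts nWorkers (>= 1) long-lived goroutines that multiply chunks.
func NewPool(nWorkers int) *Pool {
	p := &Pool{jobs: make(chan job), nWorkers: nWorkers}
	for w := 0; w < nWorkers; w++ {
		go func() {
			for j := range p.jobs { // exits when Close closes the channel
				for i := range j.vec1 {
					j.out[i] = j.vec1[i] * j.vec2[i]
				}
				j.done.Done()
			}
		}()
	}
	return p
}

// Mul sends one chunk per worker and waits until all chunks are done.
func (p *Pool) Mul(vec1, vec2, out []float32) {
	n := len(vec1)
	chunk := n / p.nWorkers
	var wg sync.WaitGroup
	wg.Add(p.nWorkers)
	for w := 0; w < p.nWorkers; w++ {
		start, end := w*chunk, (w+1)*chunk
		if w == p.nWorkers-1 {
			end = n // last chunk takes the remainder
		}
		p.jobs <- job{vec1[start:end], vec2[start:end], out[start:end], &wg}
	}
	wg.Wait()
}

// Close stops the workers; the pool must not be used afterwards.
func (p *Pool) Close() { close(p.jobs) }

func main() {
	p := NewPool(4) // 4 workers is an arbitrary choice for this sketch
	defer p.Close()
	a := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9}
	b := []float32{9, 8, 7, 6, 5, 4, 3, 2, 1}
	out := make([]float32, len(a))
	p.Mul(a, b, out)
	fmt.Println(out) // [9 16 21 24 25 24 21 16 9]
}
```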

(The OpenCL version is still in development.)
