wind0204 (Contributor) commented Jan 2, 2024

If you are interested, please check whether this version performs better than the non-parallel version.

  • Dispatch 4 vector operations per loop iteration in pixelsearch1x.c to allow higher throughput. (I guess a CPU with a decode width of 5 or more would reach the same throughput with just 2 vector operations per iteration.)
  • MOVMSKPS has twice the throughput of PMOVMSKB on AMD Zen2. (I guess this might help with the bottleneck on AMD Zen2.)
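
For readers who want a concrete picture of the two points above, here is a minimal sketch in C with SSE2 intrinsics. It is not the code from pixelsearch1x.c; the function name `find_pixel`, its signature, and the way the four masks are combined are assumptions made for illustration. It issues four independent 32-bit compares per iteration (16 pixels) and reads each result with `_mm_movemask_ps` (MOVMSKPS), which yields one bit per 32-bit lane, rather than `_mm_movemask_epi8` (PMOVMSKB).

```c
#include <emmintrin.h> /* SSE2 intrinsics; also declares _mm_movemask_ps via the SSE header */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the pull request): returns the index of the
   first pixel equal to `color`, or `count` if no pixel matches. */
static size_t find_pixel(const uint32_t *pixels, size_t count, uint32_t color)
{
    const __m128i needle = _mm_set1_epi32((int)color);
    size_t i = 0;

    /* Main loop: four independent compares (16 pixels) per iteration. */
    for (; i + 16 <= count; i += 16) {
        __m128i c0 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)(pixels + i)), needle);
        __m128i c1 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)(pixels + i + 4)), needle);
        __m128i c2 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)(pixels + i + 8)), needle);
        __m128i c3 = _mm_cmpeq_epi32(_mm_loadu_si128((const __m128i *)(pixels + i + 12)), needle);

        /* MOVMSKPS gives 4 bits per vector (one per 32-bit lane);
           pack the four results into a single 16-bit mask. */
        unsigned mask = ((unsigned)_mm_movemask_ps(_mm_castsi128_ps(c0)))
                      | ((unsigned)_mm_movemask_ps(_mm_castsi128_ps(c1)) << 4)
                      | ((unsigned)_mm_movemask_ps(_mm_castsi128_ps(c2)) << 8)
                      | ((unsigned)_mm_movemask_ps(_mm_castsi128_ps(c3)) << 12);

        if (mask != 0) {
            /* The lowest set bit is the first matching pixel in this block. */
            size_t lane = 0;
            while ((mask & 1u) == 0) {
                mask >>= 1;
                ++lane;
            }
            return i + lane;
        }
    }

    /* Scalar tail for the remaining (fewer than 16) pixels. */
    for (; i < count; ++i) {
        if (pixels[i] == color) {
            return i;
        }
    }
    return count;
}
```

Whether four compares per iteration actually beat one or two depends on the CPU, as the bullets above suggest, so a sketch like this is only a starting point for benchmarking against the non-parallel version.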

Best regards.

iseahound and others added 27 commits October 4, 2023 20:42
Finalize ImageSearch1 code to be efficient and bug-free.
Supports transparency properly.
Fast!
…search_loop_could_go_out_of_range

Fix bug where focus search loop could go out of range
Rename pack to iter
Fix comment for pixelsearchall3
…canline_was_skipped

Fix bug where the last scanline was skipped

3 participants