Performant: The code is optimized for performance with CUDA, and it can also run on a multicore CPU if you don't have a GPU at hand.
Informative: Through this documentation you can learn more about the technique, how to use it, and its potential hardware implementations.