mirror of
https://github.com/librempeg/librempeg
synced 2024-11-23 03:28:27 +00:00
2537fdc510
Provide arm64 neon optimized implementations for hscale16To19 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_19__fs_4_dstW_512_c: 6216.0 hscale_16_to_19__fs_4_dstW_512_neon: 2257.0 hscale_16_to_19__fs_8_dstW_512_c: 10417.7 hscale_16_to_19__fs_8_dstW_512_neon: 3112.5 hscale_16_to_19__fs_12_dstW_512_c: 14890.5 hscale_16_to_19__fs_12_dstW_512_neon: 3899.0 hscale_16_to_19__fs_16_dstW_512_c: 19006.5 hscale_16_to_19__fs_16_dstW_512_neon: 5341.2 hscale_16_to_19__fs_32_dstW_512_c: 36629.5 hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 (Note, the checkasm tests for these functions haven't been merged since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st> |
||
---|---|---|
.. | ||
hscale.S | ||
Makefile | ||
output.S | ||
rgb2rgb_neon.S | ||
rgb2rgb.c | ||
swscale_unscaled.c | ||
swscale.c | ||
yuv2rgb_neon.S |