Commit Graph

568 Commits

Author SHA1 Message Date
Ramiro Polla
f3837d7e21 checkasm/sw_range_convert: indent after previous couple of commits
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-31 08:30:24 +01:00
Ramiro Polla
d1cf450895 checkasm/sw_range_convert: test all supported bit depths
This commit also reduces the number of times ff_sws_init_scale() gets
called (only once per bit depth), and the number of times randomize_buffers()
gets called (only if the function must be checked).

Benchmarks are only performed on bit depths 8 and 16 (since they are
different functions, and not only different constants).

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-31 08:30:24 +01:00
Ramiro Polla
e916b70b15 checkasm/sw_range_convert: only run benchmarks on largest input width
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-31 08:30:24 +01:00
Ramiro Polla
e912eeba81 checkasm/sw_range_convert: reduce number of input sizes tested
Reduce input sizes to 8 (to test that the function works with widths
smaller than the vector length) and 1920 (raising the largest input
size to improve benchmark results).

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-31 08:30:24 +01:00
Ramiro Polla
37f0cd8d05 checkasm/sw_range_convert: use YUV pixel formats instead of YUVJ
We are already setting the range, so we can use regular YUV pixel
formats instead of YUVJ.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-31 08:30:23 +01:00
Ramiro Polla
1113b2c658 checkasm: use FF_ARRAY_ELEMS instead of hardcoding size of arrays
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-31 08:30:23 +01:00
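Note on 1113b2c658: the change can be illustrated with a minimal sketch. FF_ARRAY_ELEMS is FFmpeg's macro from libavutil/macros.h; it is redefined locally below only so the sketch compiles on its own. The `widths` table and the `sum_widths` helper are hypothetical and do not come from the commit.

```c
#include <stddef.h>

/* Local stand-in for FFmpeg's FF_ARRAY_ELEMS (libavutil/macros.h),
 * repeated here only so the sketch is self-contained. */
#ifndef FF_ARRAY_ELEMS
#define FF_ARRAY_ELEMS(a) (sizeof(a) / sizeof((a)[0]))
#endif

/* Hypothetical table of input widths used by a test. */
static const int widths[] = { 8, 128, 1080, 1920 };

/* Sum all widths. The loop bound follows the array automatically,
 * so adding or removing an entry cannot leave a stale count behind. */
static int sum_widths(void)
{
    int sum = 0;
    for (size_t i = 0; i < FF_ARRAY_ELEMS(widths); i++)
        sum += widths[i];
    return sum;
}
```

The macro only works on true arrays; applied to a pointer it silently yields the wrong count.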
Niklas Haas
7d8a5e3aee swscale: rename SwsContext to SwsInternal
And preserve the public SwsContext as separate name. The motivation here
is that I want to turn SwsContext into a public struct, while keeping the
internal implementation hidden. Additionally, I also want to be able to
use multiple internal implementations, e.g. for GPU devices.

This commit does not include any functional changes. For the most part, it is
a simple rename. The only complications arise from the public facing API
functions, which preserve their current type (and hence require an additional
unwrapping step internally), and the checkasm test framework, which directly
accesses SwsInternal.

For consistency, the affected functions that need to maintain a distinction
have generally been changed to refer to the SwsContext as *sws, and the
SwsInternal as *c.

In an upcoming commit, I will provide a backing definition for the public
SwsContext, and update `sws_internal()` to dereference the internal struct
instead of merely casting it.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-26 09:25:17 +02:00
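Note on 7d8a5e3aee: the split between a public handle and a hidden internal struct might look roughly like the sketch below. Only the names SwsContext, SwsInternal and sws_internal() come from the commit message; the struct fields and `example_get_src_width` are hypothetical, and the real definitions in libswscale will differ.

```c
#include <stddef.h>

/* Public handle: callers only ever see an incomplete type. */
typedef struct SwsContext SwsContext;

/* Internal state (hypothetical fields, for illustration only). */
typedef struct SwsInternal {
    int src_w;
    int dst_w;
} SwsInternal;

/* Unwrapping step: at this stage the public pointer is merely cast
 * to the internal struct, as the commit message describes. */
static inline SwsInternal *sws_internal(const SwsContext *sws)
{
    return (SwsInternal *) sws;
}

/* Hypothetical public-facing function: it keeps the public type in
 * its signature and unwraps internally, using *sws and *c as named
 * in the commit message. */
static int example_get_src_width(const SwsContext *sws)
{
    const SwsInternal *c = sws_internal(sws);
    return c->src_w;
}
```

Once the public struct gets its own backing definition, only the body of sws_internal() would need to change; callers that go through it stay untouched.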
James Almer
6127e1611e tests/checkasm/sw_rgb: don't write random data past the end of the buffer
Should fix fate-checkasm-sw_rgb under gcc-ubsan.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-18 10:04:11 +02:00
Martin Storsjö
5e84f3d5dd checkasm: lls: Use relative tolerances rather than absolute ones
Depending on the magnitude of the output values, the potential
errors can be larger.

This fixes errors in the lls tests on x86_32 for some seeds,
observed with GCC 11 (on Ubuntu 22.04, with the distro compiler,
with -m32).

Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-10-10 13:35:27 +02:00
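Note on 5e84f3d5dd: a relative tolerance scales the allowed error with the magnitude of the values being compared. The sketch below is a generic illustration, not the comparison helper used by checkasm; `close_relative` and its `rel_tol` parameter are hypothetical.

```c
/* Absolute value without pulling in libm. */
static double abs_d(double x)
{
    return x < 0 ? -x : x;
}

/* Returns 1 if a and b agree within a tolerance that scales with
 * their magnitude, 0 otherwise. rel_tol is a hypothetical parameter. */
static int close_relative(double a, double b, double rel_tol)
{
    double mag_a = abs_d(a);
    double mag_b = abs_d(b);
    double scale = mag_a > mag_b ? mag_a : mag_b;

    return abs_d(a - b) <= rel_tol * scale;
}
```

With a fixed absolute epsilon, a difference of 1.0 between two values near 1e9 would be rejected even though it is a tiny relative error; here it passes for rel_tol = 1e-6, while 1.0 versus 1.001 does not.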
Martin Storsjö
e5a14ae4e6 checkasm: Print the SVE vector length at startup
Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-28 18:06:25 +02:00
Martin Storsjö
eadd7fbb05 aarch64: Add CPU feature flags for SVE and SVE2
Add code for detecting the feature on Linux and Windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-28 18:06:25 +02:00
Martin Storsjö
6c0404e6b4 checkasm/sw_rgb: Revert test additions from e18b46d95fadcbaaf450bda9f1871849f2b0c586
The unaligned width test cases fail on i386; we have an assembly
function of rgb24toyv12 which is enabled only within
"#if ARCH_X86_32 && HAVE_7REGS", which seems to fail these new
test cases for unaligned widths.

As that assembly function has existed for a long time in that form,
the issue probably isn't very recent, thus skip testing these cases
for now.

Once the assembly function has been fixed, these test cases can
be readded.

Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-26 21:45:11 +02:00
Zhao Zhili
95e3e4cdb2 swscale/aarch64: Fix rgb24toyv12 only works with aligned width
Since c0666d8b, rgb24toyv12 is broken for widths not aligned to 16.
Add a simple wrapper to handle the non-aligned part.

Co-authored-by: johzzy <hellojinqiang@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-25 21:37:11 +02:00
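Note on 95e3e4cdb2: the general shape of such a wrapper is to run the vector kernel on the aligned part and a scalar routine on the remainder. The sketch below is plain C and is not the aarch64 code from the commit; all three functions are hypothetical, and a trivial byte inversion stands in for the real colorspace conversion.

```c
#include <stdint.h>

/* Stand-in for a vector kernel that only handles widths that are a
 * multiple of 16 (hypothetical; here it just inverts each byte). */
static void invert_aligned16(uint8_t *dst, const uint8_t *src, int width)
{
    for (int i = 0; i < width; i += 16)
        for (int j = 0; j < 16; j++)
            dst[i + j] = 255 - src[i + j];
}

/* Scalar reference that accepts any width. */
static void invert_c(uint8_t *dst, const uint8_t *src, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = 255 - src[i];
}

/* Wrapper: vector kernel for the aligned part, scalar code for the
 * remaining 0..15 elements. */
static void invert_any_width(uint8_t *dst, const uint8_t *src, int width)
{
    int aligned = width & ~15;

    if (aligned)
        invert_aligned16(dst, src, aligned);
    if (width > aligned)
        invert_c(dst + aligned, src + aligned, width - aligned);
}
```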
Ramiro Polla
b564d62366 checkasm/sw_rgb: add rgb24toyv12 tests
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-08 20:51:02 +02:00
Ramiro Polla
50df07f149 checkasm/sw_rgb: add deinterleaveBytes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-08 20:51:02 +02:00
James Almer
4ab512f6e0 fate/checkasm/sw_gbrp: don't randomly set internal values
They are set by sws_init_context().
May help with signed integer overflows reported by gcc-ubsan.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-08 20:50:58 +02:00
Rémi Denis-Courmont
2d03f3af32 checkasm/riscv: print official extension names
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-08 20:50:56 +02:00
Anton Khirnov
4d6e1e09dd lavc/opus*: move to opus/ subdir
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-03 10:22:59 +02:00
Ramiro Polla
81a3528647 avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-03 10:22:58 +02:00
Nuo Mi
f39ee4f57c checkasm: add vvc_bdof test
apply_bdof_8_8x16_c: 5776.5
apply_bdof_8_8x16_avx2: 396.2
apply_bdof_8_16x8_c: 5722.0
apply_bdof_8_16x8_avx2: 216.0
apply_bdof_8_16x16_c: 11213.2
apply_bdof_8_16x16_avx2: 434.5
apply_bdof_10_8x16_c: 5657.7
apply_bdof_10_8x16_avx2: 1096.0
apply_bdof_10_16x8_c: 5531.7
apply_bdof_10_16x8_avx2: 212.5
apply_bdof_10_16x16_c: 11043.7
apply_bdof_10_16x16_avx2: 1252.7
apply_bdof_12_8x16_c: 5680.0
apply_bdof_12_8x16_avx2: 1096.5
apply_bdof_12_16x8_c: 5646.2
apply_bdof_12_16x8_avx2: 624.5
apply_bdof_12_16x16_c: 11076.0
apply_bdof_12_16x16_avx2: 1241.5

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-09-03 10:22:55 +02:00
J. Dekker
ea29a27b56 checkasm: add wildcompares for test & functions
Added:

  --test=<pattern>    Filter tests by glob style pattern.
  --bench[=<pattern>] Run benchmark and optionally filter functions
                      by glob style pattern.

Example:

$ ./tests/checkasm/checkasm --bench=yuva*
[...]
yuva420p_bgr24_8_c:                                     34.5 ( 1.00x)
yuva420p_bgr24_8_ssse3:                                 31.1 ( 1.11x)
yuva420p_bgr24_128_c:                                  310.6 ( 1.00x)
yuva420p_bgr24_128_ssse3:                              178.1 ( 1.74x)
yuva420p_bgr24_1080_c:                                2509.6 ( 1.00x)
yuva420p_bgr24_1080_ssse3:                            1471.5 ( 1.71x)
yuva420p_bgr24_1920_c:                                4462.6 ( 1.00x)
yuva420p_bgr24_1920_ssse3:                            2331.1 ( 1.91x)
[...]

Ported from dav1d.

Signed-off-by: J. Dekker <jdek@itanimul.li>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-30 18:02:54 +02:00
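Note on ea29a27b56: glob-style filtering can be sketched with a small matcher. The function below is a hypothetical illustration, not the routine ported from dav1d; it assumes `*` matches any run of characters and `?` matches exactly one, which may not be the exact pattern syntax checkasm accepts.

```c
/* Returns 1 if str matches pattern, where '*' matches any run of
 * characters (including none) and '?' matches exactly one. */
static int glob_match(const char *pattern, const char *str)
{
    const char *star  = 0;  /* position of the last '*' seen */
    const char *retry = 0;  /* where in str to resume after backtracking */

    while (*str) {
        if (*pattern == '*') {
            /* Remember the star and first try matching it to nothing. */
            star  = pattern++;
            retry = str;
        } else if (*pattern == '?' || *pattern == *str) {
            pattern++;
            str++;
        } else if (star) {
            /* Mismatch: let the last '*' swallow one more character. */
            pattern = star + 1;
            str     = ++retry;
        } else {
            return 0;
        }
    }
    /* Trailing stars can match the empty remainder. */
    while (*pattern == '*')
        pattern++;
    return *pattern == '\0';
}
```

Under these assumptions, the pattern `yuva*` from the example above matches `yuva420p_bgr24_8_c` but not a name starting with `yuv420p`.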
J. Dekker
550c48eac8 checkasm: improve print format
Port dav1d's checkasm output format to FFmpeg's checkasm. The new format
includes relative speedups and aligns results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-30 18:02:54 +02:00
J. Dekker
045f9e52ca checkasm: print only results to stdout
Signed-off-by: J. Dekker <jdek@itanimul.li>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-30 18:02:53 +02:00
J. Dekker
859462e8b7 checkasm: add csv/tsv bench output
When collecting performance information from checkasm, it is common
to parse the output for use in graphs that compare different
architectures.

Signed-off-by: J. Dekker <jdek@itanimul.li>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-30 18:02:53 +02:00
Ramiro Polla
59fb24fa79 checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-30 18:02:52 +02:00
Ramiro Polla
b263720204 checkasm/yuv2yuv: add tests for semiplanar unscaled converters
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-30 18:02:50 +02:00
Ramiro Polla
4b87fd8a49 swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-30 18:02:41 +02:00
Nuo Mi
b6b22d5f56 checkasm: add tests for vvc dmvr
dmvr_8_12x20_c: 186.2
dmvr_8_12x20_avx2: 25.7
dmvr_8_20x12_c: 181.7
dmvr_8_20x12_avx2: 25.2
dmvr_8_20x20_c: 283.2
dmvr_8_20x20_avx2: 32.0
dmvr_10_12x20_c: 90.0
dmvr_10_12x20_avx2: 15.7
dmvr_10_20x12_c: 41.0
dmvr_10_20x12_avx2: 14.7
dmvr_10_20x20_c: 81.5
dmvr_10_20x20_avx2: 26.7
dmvr_12_12x20_c: 190.7
dmvr_12_12x20_avx2: 20.2
dmvr_12_20x12_c: 187.2
dmvr_12_20x12_avx2: 20.2
dmvr_12_20x20_c: 292.7
dmvr_12_20x20_avx2: 27.2
dmvr_h_8_12x20_c: 317.0
dmvr_h_8_12x20_avx2: 37.0
dmvr_h_8_20x12_c: 340.0
dmvr_h_8_20x12_avx2: 41.0
dmvr_h_8_20x20_c: 540.7
dmvr_h_8_20x20_avx2: 64.0
dmvr_h_10_12x20_c: 322.7
dmvr_h_10_12x20_avx2: 30.7
dmvr_h_10_20x12_c: 344.2
dmvr_h_10_20x12_avx2: 34.0
dmvr_h_10_20x20_c: 529.0
dmvr_h_10_20x20_avx2: 51.5
dmvr_h_12_12x20_c: 326.7
dmvr_h_12_12x20_avx2: 33.5
dmvr_h_12_20x12_c: 331.7
dmvr_h_12_20x12_avx2: 51.2
dmvr_h_12_20x20_c: 534.0
dmvr_h_12_20x20_avx2: 62.7
dmvr_hv_8_12x20_c: 650.0
dmvr_hv_8_12x20_avx2: 57.2
dmvr_hv_8_20x12_c: 676.2
dmvr_hv_8_20x12_avx2: 70.0
dmvr_hv_8_20x20_c: 1068.5
dmvr_hv_8_20x20_avx2: 103.2
dmvr_hv_10_12x20_c: 649.0
dmvr_hv_10_12x20_avx2: 48.2
dmvr_hv_10_20x12_c: 677.7
dmvr_hv_10_20x12_avx2: 59.7
dmvr_hv_10_20x20_c: 1093.5
dmvr_hv_10_20x20_avx2: 91.7
dmvr_hv_12_12x20_c: 660.0
dmvr_hv_12_12x20_avx2: 58.7
dmvr_hv_12_20x12_c: 682.7
dmvr_hv_12_20x12_avx2: 72.0
dmvr_hv_12_20x20_c: 1094.0
dmvr_hv_12_20x20_avx2: 113.2
dmvr_v_8_12x20_c: 325.7
dmvr_v_8_12x20_avx2: 31.2
dmvr_v_8_20x12_c: 326.2
dmvr_v_8_20x12_avx2: 38.5
dmvr_v_8_20x20_c: 538.5
dmvr_v_8_20x20_avx2: 54.2
dmvr_v_10_12x20_c: 318.5
dmvr_v_10_12x20_avx2: 23.7
dmvr_v_10_20x12_c: 330.7
dmvr_v_10_20x12_avx2: 40.5
dmvr_v_10_20x20_c: 567.5
dmvr_v_10_20x20_avx2: 48.0
dmvr_v_12_12x20_c: 335.2
dmvr_v_12_12x20_avx2: 30.0
dmvr_v_12_20x12_c: 330.2
dmvr_v_12_20x12_avx2: 39.5
dmvr_v_12_20x20_c: 535.2
dmvr_v_12_20x20_avx2: 60.0

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-15 18:17:55 +02:00
Rémi Denis-Courmont
32d04f137a lavu/riscv: drop probing for zba CPU capability
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-15 10:48:30 +02:00
Rémi Denis-Courmont
ab10f641ec lavc/riscv: drop probing for F & D extensions
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.

Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-04 17:29:13 +02:00
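Note on ab10f641ec: selecting F- or D-specific code at compilation time could rely on the predefined macros named in the commit message. The sketch below only reports whether they are defined; the two helper functions are hypothetical, and on non-RISC-V targets both simply return 0.

```c
/* Returns 1 when the compiler targets a RISC-V ISA that includes the
 * F extension, 0 otherwise (including on non-RISC-V targets). */
static int built_with_riscv_f(void)
{
#if defined(__riscv) && defined(__riscv_f)
    return 1;
#else
    return 0;
#endif
}

/* Same check for the D extension. */
static int built_with_riscv_d(void)
{
#if defined(__riscv) && defined(__riscv_d)
    return 1;
#else
    return 0;
#endif
}
```

Because the decision is made by the preprocessor, no run-time CPU flag is needed for these two extensions.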
Rémi Denis-Courmont
04a777d62b checkasm/riscv: preserve T1 whilst calling...
This preserves T1 whilst calling the instrumented function. In a Sci-Fi
setting where type-based Control Flow Integrity (CFI) is supported, the
calling code (i.e., the `checkasm` test case) will set T1 to the expected
value of the landing pad label (LPL) of the instrumented function.

The call wrapper will always use LPL zero which is a wild card. We should
preserve the value of T1 at least until the indirect call to the
instrumented function. Of course this is Sci-Fi, because:
1) there is no hardware (or even QEMU) support yet,
2) all our assembler functions currently use LPL zero anyway.

This uses T3 rather than T2 because indirect branches with T2 are reserved
for notionally direct calls made with an indirect call instruction (e.g.
due to GOT indirection), and are exempted from forward-edge CFI checks.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-08-04 17:29:12 +02:00
Rémi Denis-Courmont
5774d62dfc checkasm/riscv: align the landing pads
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-07-31 17:43:39 +02:00
Rémi Denis-Courmont
8bd0d30c02 checkasm/riscv: add forward-edge CFI landing pads
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-07-31 17:43:39 +02:00
Rémi Denis-Courmont
a113208ab0 lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-07-31 17:43:37 +02:00
Martin Storsjö
df2787881f checkasm: Increase the tolerance for ac3_sum_square_butterfly_float
Increase the tolerance from 10 ulp to 11 ulp. This fixes occasional
errors for some inputs; the errors could be reproduced on
aarch64/neon builds, with "checkasm --test=ac3dsp 3446175925".

Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-07-24 11:27:51 +02:00
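Note on df2787881f: a tolerance expressed in ulp (units in the last place) counts how many representable floats lie between two values. The sketch below is a generic illustration and not checkasm's own helper; both function names are hypothetical, it assumes 32-bit IEEE 754 floats, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that increases
 * monotonically with the float's value (+0.0 and -0.0 both map to 0). */
static int32_t float_to_ordered(float f)
{
    int32_t i;

    memcpy(&i, &f, sizeof(i));
    return i < 0 ? (int32_t)(INT32_MIN - i) : i;
}

/* Returns 1 if a and b are at most max_ulp representable floats apart. */
static int floats_within_ulp(float a, float b, unsigned max_ulp)
{
    int64_t d = (int64_t) float_to_ordered(a) - float_to_ordered(b);

    if (d < 0)
        d = -d;
    return d <= (int64_t) max_ulp;
}
```

Two results that are exactly 11 representable values apart fail a 10 ulp check and pass an 11 ulp check, which is the kind of borderline case the commit addresses.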
Rémi Denis-Courmont
d01914cfa0 checkasm/h264dsp: test TX bypass
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-07-24 00:53:52 +02:00
James Almer
4e3dc972c3 checkasm/lls: increase epsilon value for the update_lls test
Should fix failures for some seeds on x86_32.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-07-24 00:53:47 +02:00
Ramiro Polla
112cbeea83 checkasm: add tests for yuv2rgb
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-28 22:20:31 +02:00
Nuo Mi
5294f78afb checkasm/vvc_alf: ensure right and bottom boundaries are not overwritten by asm
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-27 09:03:07 +02:00
Nuo Mi
925011ad3d checkasm/vvc_alf: random select alf virtual boundaries position
A picture's virtual boundaries will split a CTU into 4 ALF blocks.
The ALF virtual boundary may or may not cross an ALF block.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-27 09:03:07 +02:00
Nuo Mi
6406fffad7 checkasm/vvc_alf: only check the valid filter and classify sizes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-27 09:03:07 +02:00
Andreas Rheinhardt
31c95c91ab avcodec/me_cmp: Zero MECmpContext in ff_me_cmp_init()
Not every function will be set, so zero the context
to initialize everything.

This also allows removing an initialization in dvenc.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-20 20:27:43 +02:00
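Note on 31c95c91ab: zeroing a context before filling it in guarantees that every function pointer the init routine does not set is left as NULL rather than as stale data. The sketch below uses a hypothetical `ExampleCmpContext`; it is not the layout of MECmpContext, and it assumes all-bits-zero is a null pointer, as it is on mainstream platforms.

```c
#include <string.h>

typedef int (*cmp_func)(const unsigned char *a, const unsigned char *b, int n);

/* Hypothetical context with one mandatory and one optional entry. */
typedef struct ExampleCmpContext {
    cmp_func sad;       /* always set by init */
    cmp_func optional;  /* not set on every configuration */
} ExampleCmpContext;

/* Sum of absolute differences over n bytes. */
static int sad_c(const unsigned char *a, const unsigned char *b, int n)
{
    int sum = 0;

    for (int i = 0; i < n; i++)
        sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
}

static void example_cmp_init(ExampleCmpContext *c)
{
    memset(c, 0, sizeof(*c));  /* every pointer starts out NULL */
    c->sad = sad_c;            /* c->optional is deliberately left unset */
}
```

Callers can then test `c->optional` against NULL without having to initialize the struct themselves.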
Andreas Rheinhardt
3933dcd491 avcodec/me_cmp,dvenc,mpegvideo: Move ildct_cmp to its users
MECmpContext.ildct_cmp is an array of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().

Remove these pointers from MECmpContext and add pointers
for the actually used functions to its users. (The DV encoder
already did so.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-20 20:27:43 +02:00
Andreas Rheinhardt
d2c568bd77 avcodec/me_cmp, mpegvideo: Move frame_skip_cmp to MpegEncContext
MECmpContext has several arrays of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().

One of these other users is mpegvideo_enc; it is the only user
of MECmpContext.frame_skip_cmp and it only uses one of these
function pointers at all.

This commit therefore moves this function pointer to MpegEncContext;
and removes the array from MECmpContext.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-20 20:27:43 +02:00
Andreas Rheinhardt
840eb90aba avcodec/me_cmp, motion_est: Move me_(pre_)?_cmp etc. to MotionEstContext
MECmpContext has several arrays of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().

One of these other users is the motion estimation API.
It uses MECmpContext.(me_pre|me|me_sub|mb)_cmp. It is
basically the only user of these arrays.

This commit therefore moves these arrays to MotionEstContext;
this has the additional advantage of making motion_est.c
more independent from MpegEncContext.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-20 20:27:43 +02:00
Zhao Zhili
4e1c8f1a93 tests/checkasm: Remove check on linux perf fd in uninit
The check should be >= 0, not > 0. The check itself is redundant,
since uninit is only called after init has succeeded.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-19 17:49:28 +02:00
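Note on 4e1c8f1a93: the commit removes the check outright, but the comparison it describes is a common pitfall worth spelling out, because 0 is a valid file descriptor. The sketch below is a hypothetical POSIX helper and is not part of checkasm; it assumes -1 is used to mean "not open".

```c
#include <unistd.h>

/* Close *fd if it refers to an open descriptor and mark it closed.
 * Returns 1 if something was closed, 0 otherwise. */
static int close_if_open(int *fd)
{
    if (*fd >= 0) {  /* 0 is a valid descriptor, so "> 0" would leak it */
        close(*fd);
        *fd = -1;
        return 1;
    }
    return 0;
}
```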
Ramiro Polla
2250a5963e checkasm: add tests for {lum,chr}ConvertRange
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-17 13:37:29 +02:00
James Almer
5200d0b509 checkasm/lls: add missing random values to the test buffers
Fixes valgrind warnings after 18adaf9fe558587cb1b707c647af83015b69da48.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-13 21:50:47 +02:00
Rémi Denis-Courmont
3ea49a7f29 checkasm/lls: adjust buffer sizes and alignments
var must be padded.
param has `order + 1`, not `order` elements and is *not* over-aligned.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-11 21:52:14 +02:00
Zhao Zhili
d25c581ec4 tests/checkasm: Fix build error when enable linux perf on Android
B0 is defined by a system header, see f0f596dbc6 for ref.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2024-06-11 16:59:43 +02:00