
Correct double return type for SDOT and Co with Apple Accelerate #65

Open
mmuetzel opened this issue Feb 4, 2025 · 15 comments

Comments

@mmuetzel
Contributor

mmuetzel commented Feb 4, 2025

Some BLAS functions have the wrong return type in their implementation in Apple Accelerate. E.g., SDOT should return a single-precision floating-point value, but it returns a double-precision floating-point value.
Essentially, all functions that return a single-precision floating-point value in the reference implementation are affected.

Other implementations of BLAS (e.g., OpenBLAS) follow the reference implementation on macOS as well. That makes it difficult to use FlexiBLAS to switch, e.g., between OpenBLAS and Apple Accelerate on macOS.

Would it be possible for FlexiBLAS to add shims that convert the return type of the affected functions from the (non-standard) double to float?

See also the discussion here:
https://octave.discourse.group/t/macos-accelerate-framework-single-precision-issue/6140
Or:
https://fortran-lang.discourse.group/t/how-many-blas-libraries-have-this-error/4454
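
For illustration, a small self-contained C program can reproduce the symptom. This is only a sketch: it assumes a gfortran-style symbol name (sdot_) and a 32-bit integer ABI, and it must be linked against the BLAS in question (e.g., Accelerate).

#include <stdio.h>

typedef int blasint;

/* Reference-conforming prototype: SDOT returns REAL, i.e., float. */
extern float sdot_(blasint* n, float* sx, blasint* incx, float* sy, blasint* incy);

int main(void)
{
  blasint n = 3, inc = 1;
  float x[3] = {1.0f, 2.0f, 3.0f};
  float r = sdot_(&n, x, &inc, x, &inc);
  /* Expected: 14.0. If the backend actually returns a double (as Apple
     Accelerate does), the value read back through this float prototype
     is garbage. */
  printf("sdot = %f (expected 14.0)\n", (double) r);
  return 0;
}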

@grisuthedragon
Member

First, I would say: blame Apple for not following the rules, or even for using f2c (which introduced this issue) in the first place. For example, for sdot, the way FlexiBLAS provides the function is

float FC_GLOBAL(sdot,SDOT)(blasint* n, float* sx, blasint* incx, float* sy, blasint* incy)

and you need

double FC_GLOBAL(sdot,SDOT)(blasint* n, float* sx, blasint* incx, float* sy, blasint* incy)

Did I get this right?

Since this is an ABI change, the only way I can think of (by now) is to provide a separate flexiblas library and to add some detection mechanism that checks, when loading the backend, whether the single-precision routines do the right thing.

Can you check if this is the complete list of functions:

  • sdot
  • sdsdot
  • snrm2

Are single-precision complex functions affected as well?
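
A detection probe along those lines could be quite small. A rough sketch, assuming the backend's sdot symbol has already been looked up (e.g., via dlsym); the names here are illustrative, not actual FlexiBLAS API:

#include <math.h>

typedef int blasint;
typedef double (*sdot_double_t)(blasint*, float*, blasint*, float*, blasint*);

/* Returns 1 if the backend's sdot returns double (f2c-style ABI),
   0 if it conforms to the reference BLAS and returns float. */
static int sdot_returns_double(void *sym)
{
  blasint n = 2, inc = 1;
  float x[2] = {1.0f, 2.0f};  /* dot(x, x) = 5.0 */
  double d = ((sdot_double_t) sym)(&n, x, &inc, x, &inc);
  /* Only a genuine double return reproduces the exact value; a float
     placed in the same return register reads back as garbage here. */
  return fabs(d - 5.0) < 1e-6;
}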

@mmuetzel
Contributor Author

mmuetzel commented Feb 5, 2025

Thank you for considering this.

I agree that this mess is Apple's fault.
I kind of understand how the Apple devs could have accidentally called f2c with the wrong flags at some point in the 70's or 80's. But I don't understand why they haven't fixed that since...

Ideally, FlexiBLAS would expose function prototypes that conform to the reference implementation even if the underlying implementation has this defect. I.e., FlexiBLAS would continue to expose the function with

float FC_GLOBAL(sdot,SDOT)(blasint* n, float* sx, blasint* incx, float* sy, blasint* incy)

even if the underlying (defective) implementation uses the following (non-conformant) prototype:

double FC_GLOBAL(sdot,SDOT)(blasint* n, float* sx, blasint* incx, float* sy, blasint* incy)

But it would use a small shim function that casts the wrong return type (double) to the correct type (float), so that programs using these functions see no issue with Apple Accelerate.

So, if I understood you correctly, that would be the other way round from what you suggested.

I don't have physical access to a macOS system. But I could ask on the Octave forum thread if someone could run some tests.

If I understand the situation correctly, all functions (not subroutines) that return float are affected. From what I can tell, functions returning single-precision complex values are not affected. But I will try to have that confirmed.

Grepping for real\s*function in the sources of the reference implementation of BLAS (and dropping the hits for complex functions), I end up with the following list:

  • sasum
  • scabs1
  • scasum
  • scnrm2
  • sdot
  • sdsdot
  • snrm2

I wouldn't be surprised if they (the Apple devs) made the same mistake in their implementations of LAPACK functions. If that is true (I'll try to have that confirmed), that would also affect the following LAPACK functions in Apple Accelerate:

  • scsum1
  • sladiv
  • slangb
  • slange
  • slangt
  • slanhs
  • slansb
  • slansf
  • slansp
  • slanst
  • slansy
  • slantb
  • slantp
  • slantr
  • slapy2
  • slapy3
  • sla_gbrcond
  • sla_gbrpvgrw
  • sla_gecond
  • sla_gerpvgrw
  • sla_porcond
  • sla_porpvgrw
  • sla_syrcond
  • sla_syrpvgrw

@mmuetzel
Contributor Author

mmuetzel commented Feb 5, 2025

Thinking about it, there is probably no reason to exclude the C* functions from that list as long as they return REAL. So, functions like CLANGB are probably also affected...
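
For illustration, a sketch of what that would mean for clangb (hidden string-length arguments omitted): the arguments are complex, but the standard return type is still float, so an f2c-style implementation would return double here as well.

float FC_GLOBAL(clangb,CLANGB)(char* norm, blasint* n, blasint* kl, blasint* ku, float _Complex* ab, blasint* ldab, float* work);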

@grisuthedragon
Member

As a first thought, I do not have a proper idea how to implement a shim layer that selects based on the return type without recompiling the application. If you have any idea, let me know.

The scabs1 function can be neglected; it is only for internal BLAS usage and, for example, is not exposed by ATLAS or by very old Accelerate frameworks (to my knowledge).

@mmuetzel
Contributor Author

mmuetzel commented Feb 5, 2025

I don't know how exactly FlexiBLAS forwards to the actual implementation of a function. But I'd imagine that it would grab a pointer to the function somehow.
If that is the case, a shim layer for sdot could look something like this:

float FC_GLOBAL(sdot,SDOT)(blasint* n, float* sx, blasint* incx, float* sy, blasint* incy)
{
  /* Grab a pointer to the backend's (non-conforming) sdot, which returns double. */
  double (*sdot_ptr) (blasint*, float*, blasint*, float*, blasint*) = get_sdot_ptr ();  // hypothetical lookup; I don't know how that is actually done in FlexiBLAS
  return (float) sdot_ptr (n, sx, incx, sy, incy);
}
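
The (float) cast rounds the backend's double result to single precision. That may differ in the last bits from a computation carried out natively in single precision, but the caller at least gets a value of the conforming type.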

@dasergatskov

dasergatskov commented Feb 5, 2025

I kind of understand how the Apple devs could have accidentally called f2c with the wrong flags at some point in the 70's or 80's. But I don't understand why they haven't fixed that since...

I suspect they are still using f2c as they do not have an in-house Fortran compiler. Hopefully, flang-new is getting better and maybe in a few years...

mmuetzel added a commit to mmuetzel/flexiblas that referenced this issue Feb 5, 2025
This is a very crude attempt to check whether BLAS functions that should
return single-precision floating point values actually do that.
Some implementations (like Apple Accelerate) don't do that. See mpimd-csc#65.

This is mainly meant for early feedback.
@mmuetzel
Contributor Author

mmuetzel commented Feb 5, 2025

I've opened #68 for a potential implementation idea.

Would something like that work?

mmuetzel added a commit to mmuetzel/flexiblas that referenced this issue Feb 6, 2025
This is a very crude attempt to check whether BLAS functions that should
return single-precision floating point values actually do that.
Some implementations (like Apple Accelerate) don't do that. See mpimd-csc#65.

This is mainly meant for early feedback.
@grisuthedragon
Member

Does cdotu return float _Complex or double _Complex? Regarding f2c, this should be float _Complex via the Intel ABI.

@mmuetzel
Contributor Author

mmuetzel commented Feb 10, 2025

CDOTU returns a float _Complex type value with Apple Accelerate.
See: https://octave.discourse.group/t/macos-accelerate-framework-single-precision-issue/6140/27

That test was with gfortran using FlexiBLAS with FLEXIBLAS=APPLE though, using complex, external :: cdotu as the expected return type in the Fortran sources.
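
A minimal C variant of that check might look like the following; it assumes a gfortran-style symbol name (cdotu_) and the by-value complex return convention that the test above observed:

#include <complex.h>
#include <stdio.h>

typedef int blasint;

extern float _Complex cdotu_(blasint* n, float _Complex* cx, blasint* incx, float _Complex* cy, blasint* incy);

int main(void)
{
  blasint n = 2, inc = 1;
  float _Complex x[2] = {1.0f + 1.0f * I, 2.0f};
  /* Unconjugated dot product: (1+i)^2 + 2^2 = 4 + 2i */
  float _Complex r = cdotu_(&n, x, &inc, x, &inc);
  printf("cdotu = %g%+gi (expected 4+2i)\n", crealf(r), cimagf(r));
  return 0;
}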

@mmuetzel
Contributor Author

(OT: Regarding the reference to f2c: I'm no longer certain that the difference in return type came to the Apple implementation via f2c. Reading f2c's manual, the -!R flag (the default) was meant to model f77.
Apparently, the DEC f77 Fortran compiler returned double-precision values for REAL FUNCTION. Maybe they modelled their implementation after that compiler (and f2c wasn't involved at all).
I don't have access to DEC f77 though, so I can't confirm whether that actually is the case.)

@mmuetzel
Contributor Author

@dasergatskov tested using CDOTU with Apple Accelerate with an example written in C++: The return value is indeed of type std::complex<float> (i.e., float complex in C99).

@grisuthedragon
Member

I am on the way to building this into a new code generator. Did anybody test whether the LAPACK routines returning float behave like the BLAS ones? For complex float, we already figured it out.

@mmuetzel
Contributor Author

Thanks for looking into this.

If I recall correctly, we didn't check that yet.
I've asked for a test here: https://octave.discourse.group/t/macos-accelerate-framework-single-precision-issue/6140/37

If you have access to a macOS device with Apple Accelerate, you might be able to test that with the example programs from that thread.
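
Such a test could be along the following lines. A sketch only, assuming a gfortran-style symbol name (scsum1_); SCSUM1 returns the sum of the (true) absolute values of a complex vector:

#include <complex.h>
#include <stdio.h>

typedef int blasint;

/* Deliberately declared with the suspected (f2c-style) double return type. */
extern double scsum1_(blasint* n, float _Complex* cx, blasint* incx);

int main(void)
{
  blasint n = 2, inc = 1;
  float _Complex x[2] = {3.0f + 4.0f * I, 0.0f};  /* |3+4i| + |0| = 5 */
  double r = scsum1_(&n, x, &inc);
  /* Reading back exactly 5.0 means the routine really returns double. */
  printf("scsum1 = %f (5.0 would confirm the f2c-style ABI)\n", r);
  return 0;
}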

@grisuthedragon
Member

In the LAPACKE sources, there is a hint that this is true for LAPACK as well.

See
https://github.com/Reference-LAPACK/lapack/blob/ca82e5ea5cd2eaa769ac44fe199e1dce1c15c19e/LAPACKE/include/lapack.h#L110
and following.

@mmuetzel
Contributor Author

mmuetzel commented Mar 25, 2025

Anything else would have been surprising. But one never knows...

@dasergatskov also confirmed that scsum1 from Apple Accelerate has the same issue. Thank you for that!

So, the test agrees with that comment in the LAPACKE header. (But I wouldn't be so general as to blame f2c without some qualification. After all, it does produce C code with the "correct" return type if it is used with the correct flags...)
