
[WIP] README updates for 0.7 #467

Merged 2 commits on Sep 27, 2019
65 changes: 24 additions & 41 deletions README.md
@@ -5,12 +5,12 @@
[![Build Status](https://travis-ci.org/JuliaArrays/StaticArrays.jl.svg?branch=master)](https://travis-ci.org/JuliaArrays/StaticArrays.jl)
[![Build status](https://ci.appveyor.com/api/projects/status/xabgh1yhsjxlp30d?svg=true)](https://ci.appveyor.com/project/JuliaArrays/staticarrays-jl)
[![Coverage Status](https://coveralls.io/repos/github/JuliaArrays/StaticArrays.jl/badge.svg?branch=master)](https://coveralls.io/github/JuliaArrays/StaticArrays.jl?branch=master)
[![codecov.io](http://codecov.io/github/JuliaArrays/StaticArrays.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaArrays/StaticArrays.jl?branch=master)
[![codecov.io](https://codecov.io/github/JuliaArrays/StaticArrays.jl/branch/master/graph/badge.svg)](http://codecov.io/github/JuliaArrays/StaticArrays.jl/branch/master)
[![](https://img.shields.io/badge/docs-latest-blue.svg)](https://JuliaArrays.github.io/StaticArrays.jl/latest)
[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaArrays.github.io/StaticArrays.jl/stable)

**StaticArrays** provides a framework for implementing statically sized arrays
in Julia (≥ 0.5), using the abstract type `StaticArray{Size,T,N} <: AbstractArray{T,N}`.
in Julia, using the abstract type `StaticArray{Size,T,N} <: AbstractArray{T,N}`.
Subtypes of `StaticArray` will provide fast implementations of common array and
linear algebra operations. Note that here "statically sized" means that the
size can be determined from the *type*, and "static" does **not** necessarily
@@ -27,40 +27,42 @@ Full documentation can be found [here](https://JuliaArrays.github.io/StaticArray
## Speed

The speed of *small* `SVector`s, `SMatrix`s and `SArray`s is often > 10 × faster
than `Base.Array`. See this simplified benchmark (or see the full results [here](https://github.com/andyferris/StaticArrays.jl/blob/master/perf/bench10.txt)):
than `Base.Array`. For example, here's a
[microbenchmark](perf/README_benchmarks.jl) showing some common operations.

```
============================================
Benchmarks for 3×3 Float64 matrices
============================================

Matrix multiplication -> 8.2x speedup
Matrix multiplication (mutating) -> 3.1x speedup
Matrix addition -> 45x speedup
Matrix addition (mutating) -> 5.1x speedup
Matrix determinant -> 170x speedup
Matrix inverse -> 125x speedup
Matrix symmetric eigendecomposition -> 82x speedup
Matrix Cholesky decomposition -> 23.6x speedup
Matrix multiplication -> 5.9x speedup
Matrix multiplication (mutating) -> 1.8x speedup
Matrix addition -> 33.1x speedup
Matrix addition (mutating) -> 2.5x speedup
Matrix determinant -> 112.9x speedup
Matrix inverse -> 67.8x speedup
Matrix symmetric eigendecomposition -> 25.0x speedup
Matrix Cholesky decomposition -> 8.8x speedup
Matrix LU decomposition -> 6.1x speedup
Matrix QR decomposition -> 65.0x speedup
```

These results improve significantly when using `julia -O3` with immutable static
arrays, as the extra optimization results in surprisingly good SIMD code.
These numbers were generated on an Intel i7-7700HQ using Julia-1.2. As with all
synthetic benchmarks, the speedups you see here should only be taken as very
roughly indicative of the speedup you may see in real code. When in doubt,
benchmark your real application!

Note that in the current implementation, working with large `StaticArray`s puts a
lot of stress on the compiler, and becomes slower than `Base.Array` as the size
increases. A very rough rule of thumb is that you should consider using a
normal `Array` for arrays larger than 100 elements. For example, the performance
crossover point for a matrix multiply microbenchmark seems to be about 11x11 in
julia 0.5 with default optimizations.
normal `Array` for arrays larger than 100 elements.
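
If in doubt about a particular size, it is easy to measure the crossover on your own machine. Here is a minimal sketch (not part of the package, and assuming the BenchmarkTools package is installed) that times an `N`×`N` multiply with both `Matrix` and `SMatrix`; ratios well above 1 mean the static version is still ahead at that size:

```julia
# Rough check of the Matrix vs SMatrix crossover for N×N multiplication.
# Assumes BenchmarkTools is installed; exact ratios vary by CPU and Julia version.
using BenchmarkTools
using StaticArrays

function multiply_speedup(N)
    A  = rand(N, N)
    SA = SMatrix{N,N}(A)
    t_array  = @belapsed $A * $A     # heap-allocated Base.Matrix
    t_static = @belapsed $SA * $SA   # statically sized SMatrix
    return t_array / t_static
end

println("3×3 multiply speedup:   ", round(multiply_speedup(3), digits=1), "x")
println("12×12 multiply speedup: ", round(multiply_speedup(12), digits=1), "x")
```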


## Quick start

Add *StaticArrays* from the [Pkg REPL](https://docs.julialang.org/en/latest/stdlib/Pkg/#Getting-Started-1), i.e., `pkg> add StaticArrays`. Then:
```julia
Pkg.add("StaticArrays") # or Pkg.clone("https://github.com/JuliaArrays/StaticArrays.jl")
using StaticArrays
using LinearAlgebra
using StaticArrays

# Create an SVector using various forms, using constructors, functions or macros
v1 = SVector(1, 2, 3)
@@ -104,7 +106,8 @@ rand(MMatrix{20,20}) * rand(MMatrix{20,20}) # large matrices can use BLAS
eigen(m3) # eigen(), etc uses specialized algorithms up to 3×3, or else LAPACK

# Static arrays stay statically sized, even when used by Base functions, etc:
typeof(eigen(m3)) == Eigen{Float64,Float64,SArray{Tuple{3,3},Float64,2,9},SArray{Tuple{3},Float64,1,3}}
typeof(eigen(m3).vectors) == SMatrix{3,3,Float64,9}
typeof(eigen(m3).values) == SVector{3,Float64}

# similar() returns a mutable container, while similar_type() returns a constructor:
typeof(similar(m3)) == MArray{Tuple{3,3},Int64,2,9} # (final parameter is length = 9)
@@ -138,27 +141,7 @@ performance optimizations may be made when the size of the array is known to the
compiler. One example of this is by loop unrolling, which has a substantial
effect on small arrays and tends to automatically trigger LLVM's SIMD
optimizations. Another way performance is boosted is by providing specialized
methods for `det`, `inv`, `eig` and `chol` where the algorithm depends on the
methods for `det`, `inv`, `eigen` and `cholesky` where the algorithm depends on the
precise dimensions of the input. In combination with intelligent fallbacks to
the methods in Base, we seek to provide a comprehensive support for statically
sized arrays, large or small, that hopefully "just works".
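
As a small illustration of that dispatch (a sketch, not an excerpt from the documentation), `det` and `inv` on a 3×3 `SMatrix` hit the fixed-size methods and keep their results statically sized:

```julia
# Sketch: specialized small-matrix methods keep everything static.
using LinearAlgebra
using StaticArrays

m = @SMatrix rand(3, 3)

d  = det(m)    # closed-form 3×3 determinant, no generic fallback needed
mi = inv(m)    # specialized small-matrix inverse, returns another SMatrix

@assert mi isa SMatrix{3,3,Float64}
@assert d * det(mi) ≈ 1
```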

## Relationship to *FixedSizeArrays* and *ImmutableArrays*

Several existing packages for statically sized arrays have been developed for
Julia, notably *FixedSizeArrays* and *ImmutableArrays*, which provided significant
inspiration for this package. Upon consultation, it has been decided to move
forward with *StaticArrays* which has found a new home in the *JuliaArrays*
github organization. It is recommended that new users use this package, and
that existing dependent packages consider switching to *StaticArrays* sometime
during the life-cycle of Julia v0.5.

You can try `using StaticArrays.FixedSizeArrays` to add some compatibility
wrappers for the most commonly used features of the *FixedSizeArrays* package,
such as `Vec`, `Mat`, `Point` and `@fsa`. These wrappers do not provide a
perfect interface, but may help in trying out *StaticArrays* with pre-existing
code.

Furthermore, `using StaticArrays.ImmutableArrays` will let you use the typenames
from the *ImmutableArrays* package, which does not include the array size as a
type parameter (e.g. `Vector3{T}` and `Matrix3x3{T}`).
62 changes: 62 additions & 0 deletions perf/README_benchmarks.jl
@@ -0,0 +1,62 @@
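# Microbenchmark comparing common operations on Base.Array against their
# StaticArrays counterparts. simple_bench(3) at the bottom prints the speedup
# table quoted in the README "Speed" section.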
using BenchmarkTools
using LinearAlgebra
using StaticArrays

add!(C, A, B) = (C .= A .+ B)

function simple_bench(N, T=Float64)
    A = rand(T,N,N)
    A = A'*A
    B = copy(A)
    SA = SMatrix{N,N}(A)
    MA = MMatrix{N,N}(A)
    MB = copy(MA)

    print("""
          ============================================
          Benchmarks for $N×$N $T matrices
          ============================================
          """)
    ops = [
        ("Matrix multiplication ", *, (A, A), (SA, SA)),
        ("Matrix multiplication (mutating) ", mul!, (B, A, A), (MB, MA, MA)),
        ("Matrix addition ", +, (A, A), (SA, SA)),
        ("Matrix addition (mutating) ", add!, (B, A, A), (MB, MA, MA)),
        ("Matrix determinant ", det, (A,), (SA,)),
        ("Matrix inverse ", inv, (A,), (SA,)),
        ("Matrix symmetric eigendecomposition", eigen, (A,), (SA,)),
        ("Matrix Cholesky decomposition ", cholesky, (A,), (SA,)),
        ("Matrix LU decomposition ", lu, (A,), (SA,)),
        ("Matrix QR decomposition ", qr, (A,), (SA,)),
    ]
    for (name, op, Aargs, SAargs) in ops
        # We load from Ref's here to avoid the compiler completely removing the
        # benchmark in some cases.
        #
        # Like any microbenchmark, the speedups you see here should only be
        # taken as roughly indicative of the speedup you may see in real code.
        if length(Aargs) == 1
            A1 = Ref(Aargs[1])
            SA1 = Ref(SAargs[1])
            speedup = @belapsed($op($A1[])) / @belapsed($op($SA1[]))
        elseif length(Aargs) == 2
            A1 = Ref(Aargs[1])
            A2 = Ref(Aargs[2])
            SA1 = Ref(SAargs[1])
            SA2 = Ref(SAargs[2])
            speedup = @belapsed($op($A1[], $A2[])) / @belapsed($op($SA1[], $SA2[]))
        elseif length(Aargs) == 3
            A1 = Ref(Aargs[1])
            A2 = Ref(Aargs[2])
            A3 = Ref(Aargs[3])
            SA1 = Ref(SAargs[1])
            SA2 = Ref(SAargs[2])
            SA3 = Ref(SAargs[3])
            speedup = @belapsed($op($A1[], $A2[], $A3[])) / @belapsed($op($SA1[], $SA2[], $SA3[]))
        else
            error("benchmark entries must take 1, 2 or 3 arguments")
        end
        println(name*" -> $(round(speedup, digits=1))x speedup")
    end
end

simple_bench(3)
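
For reference, assuming BenchmarkTools, StaticArrays and LinearAlgebra are installed in the active environment, the table can be regenerated from a Julia REPL started at the repository root with something like:

```julia
# Re-run the README benchmark script; simple_bench(3) executes on include.
# This takes a while, since @belapsed samples each operation many times.
include(joinpath("perf", "README_benchmarks.jl"))
```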