Skip to content

debug/elf: add SHT_GNU_VERDEF section parsing #63952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
benbaker76 opened this issue Nov 5, 2023 · 84 comments
Closed

debug/elf: add SHT_GNU_VERDEF section parsing #63952

benbaker76 opened this issue Nov 5, 2023 · 84 comments

Comments

@benbaker76
Copy link
Contributor

I'm porting @brendangregg's eBPF profiling application profile.py to Golang. I'm attempting to use the debug/elf package for retrieving debug symbol info but have stumbled upon a limitation.

As a demonstration I've picked a library called libtiffxx.so as it's relatively small and clearly demontrates the issues I have with the package.

Let's take a look at the .gnu.version* sections contained in the library:

$ readelf -SW /usr/lib/x86_64-linux-gnu/libtiffxx.so | grep .gnu.version
  [ 6] .gnu.version      VERSYM          0000000000000882 000882 000036 02   A  4   0  2
  [ 7] .gnu.version_d    VERDEF          00000000000008b8 0008b8 000038 00   A  5   2  8
  [ 8] .gnu.version_r    VERNEED         00000000000008f0 0008f0 000090 00   A  5   3  8

There's important debug symbol information contained in the .gnu.version_d (SHT_GNU_VERDEF) section which is currently not parsed by the package. Let's look at the last four functions in the .dynsym section.

$ readelf -sW /usr/lib/x86_64-linux-gnu/libtiffxx.so | tail -4
    23: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3)
    24: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBTIFFXX_4.0
    25: 0000000000001890   169 FUNC    GLOBAL DEFAULT   15 _Z14TIFFStreamOpenPKcPSo@@LIBTIFFXX_4.0
    26: 0000000000001940    19 FUNC    GLOBAL DEFAULT   15 _Z14TIFFStreamOpenPKcPSi@@LIBTIFFXX_4.0

The debug/elf package can retreive the function name, but what about the @ or @@, the library name after it or even the number in the brackets? In my proposal and my current implementation I provide a way to access this important debug symbol information.

If we take a look at the dynamic symbol output of these four functions in the current implementation of debug/elf we have the following output:

		Symbol{
			Name:     "_ZNSt8ios_base4InitD1Ev",
			Info:     0x12,
			Other:    0x0,
			Section:  0x0,
			Value:    0x0,
			Size:     0x0,
			Version:  "GLIBCXX_3.4",
			Library:  "libstdc++.so.6",
		},
		Symbol{
			Name:     "LIBTIFFXX_4.0",
			Info:     0x11,
			Other:    0x0,
			Section:  0xFFF1,
			Value:    0x0,
			Size:     0x0,
		},
		Symbol{
			Name:     "_Z14TIFFStreamOpenPKcPSo",
			Info:     0x12,
			Other:    0x0,
			Section:  0xF,
			Value:    0x1890,
			Size:     0xA9,
		},
		Symbol{
			Name:     "_Z14TIFFStreamOpenPKcPSi",
			Info:     0x12,
			Other:    0x0,
			Section:  0xF,
			Value:    0x1940,
			Size:     0x13,
		},

In my new implemenation we get this output:

		Symbol{
			Name:     "_ZNSt8ios_base4InitD1Ev",
			Info:     0x12,
			Other:    0x0,
			Section:  0x0,
			Value:    0x0,
			Size:     0x0,
			Version:  "GLIBCXX_3.4",
			Library:  "libstdc++.so.6",
			VerInfo:  0x1,
			VerOther: 0x3,
		},
		Symbol{
			Name:     "LIBTIFFXX_4.0",
			Info:     0x11,
			Other:    0x0,
			Section:  0xFFF1,
			Value:    0x0,
			Size:     0x0,
			Version:  "LIBTIFFXX_4.0",
			Library:  "",
			VerInfo:  0x2,
			VerOther: 0x1,
		},
		Symbol{
			Name:     "_Z14TIFFStreamOpenPKcPSo",
			Info:     0x12,
			Other:    0x0,
			Section:  0xF,
			Value:    0x1890,
			Size:     0xA9,
			Version:  "LIBTIFFXX_4.0",
			Library:  "",
			VerInfo:  0x2,
			VerOther: 0x1,
		},
		Symbol{
			Name:     "_Z14TIFFStreamOpenPKcPSi",
			Info:     0x12,
			Other:    0x0,
			Section:  0xF,
			Value:    0x1940,
			Size:     0x13,
			Version:  "LIBTIFFXX_4.0",
			Library:  "",
			VerInfo:  0x2,
			VerOther: 0x1,
		},

We now have the Version entry as is often retrieved from SHT_GNU_VERNEED but in the last three entries can only be obtained from the SHT_GNU_VERDEF section. We also have two new Symbol members called VerInfo and VerOther.

So what are these new values and where do they come from? Using binutil's readelf.c as a reference we can find out where this data is located and how to extract it.

VerInfo in this case refers to the number in the brackets. In the case of _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3) the number 3 comes from VerOther which is the version number. If it's 1 we don't display it in the brackets. The single @ comes from VerInfo which can be one of three values:

// Versioned symbol info
const (
	SymUndefined byte = 0
	SymHidden    byte = 1
	SymPublic    byte = 2
)

If the value is SymHidden it will be @ and if it's SymPublic it will be @@. This is taken from readelf.c

So I have implemented the code to do the necessary parsing of the SHT_GNU_VERDEF section and am writing this proposal so I can submit my changes to Gerrit for review by the Golang team.

Currently I have made changes to the following files in src/debug/elf:

  • Change file_test.go
  • Change file.go
  • Change symbols_test.go
  • Added testdata/libtiffxx.so (should be renamed?)

As I mentioned I use libtiffxx.so as my testdata ELF binary because it's relatively small (14.4 kB) and it demonstrates the new data from which my new branch can parse.

My changes currently pass all tests except for the API check

##### API check
+pkg debug/elf, const SymHidden = 1
+pkg debug/elf, const SymHidden uint8
+pkg debug/elf, const SymPublic = 2
+pkg debug/elf, const SymPublic uint8
+pkg debug/elf, const SymUndefined = 0
+pkg debug/elf, const SymUndefined uint8
+pkg debug/elf, type ImportedSymbol struct, VerInfo uint8
+pkg debug/elf, type ImportedSymbol struct, VerOther uint8
+pkg debug/elf, type Symbol struct, VerInfo uint8
+pkg debug/elf, type Symbol struct, VerOther uint8
--- FAIL: TestCheck (26.62s)
    main_test.go:184: API differences found

Thanks for taking the time to review my proposal.

@gopherbot gopherbot added this to the Proposal milestone Nov 5, 2023
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Nov 15, 2023
@rsc rsc changed the title proposal: debug/elf: Add SHT_GNU_VERDEF section parsing support proposal: debug/elf: add SHT_GNU_VERDEF section parsing Dec 4, 2023
@rsc rsc moved this from Incoming to Active in Proposals Dec 4, 2023
@rsc
Copy link
Contributor

rsc commented Dec 4, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Dec 6, 2023

This API diff is nice and small and seems reasonable for the benefit:

##### API check
+pkg debug/elf, const SymHidden = 1
+pkg debug/elf, const SymHidden uint8
+pkg debug/elf, const SymPublic = 2
+pkg debug/elf, const SymPublic uint8
+pkg debug/elf, const SymUndefined = 0
+pkg debug/elf, const SymUndefined uint8
+pkg debug/elf, type ImportedSymbol struct, VerInfo uint8
+pkg debug/elf, type ImportedSymbol struct, VerOther uint8
+pkg debug/elf, type Symbol struct, VerInfo uint8
+pkg debug/elf, type Symbol struct, VerOther uint8

Are there any remaining concerns about this proposal?

@cherrymui
Copy link
Member

Could you explain what the meanings of VerInfo and VerOther fields are? Thanks.

@benbaker76
Copy link
Contributor Author

They come from readelf.c

If we take a look at some output from it:

$ readelf -sW /usr/lib/x86_64-linux-gnu/libtiffxx.so | tail -4
    23: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3)
    24: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBTIFFXX_4.0
    25: 0000000000001890   169 FUNC    GLOBAL DEFAULT   15 _Z14TIFFStreamOpenPKcPSo@@LIBTIFFXX_4.0
    26: 0000000000001940    19 FUNC    GLOBAL DEFAULT   15 _Z14TIFFStreamOpenPKcPSi@@LIBTIFFXX_4.0

VerInfo is synonymous with sym_info and VerOther with vna_other.

// Versioned symbol info
const (
	SymUndefined byte = 0
	SymHidden    byte = 1
	SymPublic    byte = 2
)

Comes from:

https://github.com/bminor/binutils-gdb/blob/master/binutils/readelf.c#L363

/* Versioned symbol info.  */
enum versioned_symbol_info
{
  symbol_undefined,
  symbol_hidden,
  symbol_public
};

https://github.com/bminor/binutils-gdb/blob/master/binutils/readelf.c#L13764

  if (version_string)
    {
      if (sym_info == symbol_undefined)
	printf ("@%s (%d)", version_string, vna_other);
      else
	printf (sym_info == symbol_hidden ? "@%s" : "@@%s",
		version_string);
    }

In the case of _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3) vna_other is 3 and sym_info == symbol_hidden.

So in the actual implementation in go/src/debug/elf/file.go we have:

// Versioned symbol info
const (
	SymUndefined byte = 0
	SymHidden    byte = 1
	SymPublic    byte = 2
)

type verdef struct {
	Other uint16
	Name  string
}

type verneed struct {
	File  string
	Other uint16
	Name  string
}

// gnuVersionInit parses the GNU version tables
// for use by calls to gnuVersion.
func (f *File) gnuVersionInit(str []byte) bool {
	// Accumulate verdef information.
	var def []verdef
	vd := f.SectionByType(SHT_GNU_VERDEF)
	if vd != nil {
		d, _ := vd.Data()

		i := 0
		for {
			if i+20 > len(d) {
				break
			}
			other := f.ByteOrder.Uint16(d[i : i+2])
			//flags := f.ByteOrder.Uint16(d[i+2 : i+4])
			ndx := int(f.ByteOrder.Uint16(d[i+4 : i+6]))
			cnt := f.ByteOrder.Uint16(d[i+6 : i+8])
			// hash := f.ByteOrder.Uint32(d[i+8 : i+12])
			aux := f.ByteOrder.Uint32(d[i+12 : i+16])
			next := f.ByteOrder.Uint32(d[i+16 : i+20])

			var name string
			j := i + int(aux)
			for c := 0; c < int(cnt); c++ {
				if j+8 > len(d) {
					break
				}
				vname := f.ByteOrder.Uint32(d[j : j+4])
				vnext := f.ByteOrder.Uint32(d[j+4 : j+8])
				name, _ = getString(str, int(vname))

				if c == 0 {
					if ndx >= len(def) {
						a := make([]verdef, ndx)
						copy(a, def)
						def = a
					}

					def[ndx-1] = verdef{other, name}
				}

				j += int(vnext)
			}

			if next == 0 {
				break
			}
			i += int(next)
		}
	}

	// Accumulate verneed information.
	var need []verneed
	vn := f.SectionByType(SHT_GNU_VERNEED)
	if vn != nil {
		d, _ := vn.Data()

		i := 0
		for {
			if i+16 > len(d) {
				break
			}
			vers := f.ByteOrder.Uint16(d[i : i+2])
			if vers != 1 {
				break
			}
			cnt := f.ByteOrder.Uint16(d[i+2 : i+4])
			fileoff := f.ByteOrder.Uint32(d[i+4 : i+8])
			aux := f.ByteOrder.Uint32(d[i+8 : i+12])
			next := f.ByteOrder.Uint32(d[i+12 : i+16])
			file, _ := getString(str, int(fileoff))

			var name string
			j := i + int(aux)
			for c := 0; c < int(cnt); c++ {
				if j+16 > len(d) {
					break
				}
				// hash := f.ByteOrder.Uint32(d[j : j+4])
				// flags := f.ByteOrder.Uint16(d[j+4 : j+6])
				other := f.ByteOrder.Uint16(d[j+6 : j+8])
				nameoff := f.ByteOrder.Uint32(d[j+8 : j+12])
				next := f.ByteOrder.Uint32(d[j+12 : j+16])
				name, _ = getString(str, int(nameoff))
				ndx := int(other)

				if ndx >= len(need) {
					a := make([]verneed, 2*(ndx+1))
					copy(a, need)
					need = a
				}

				need[ndx] = verneed{file, other, name}
				if next == 0 {
					break
				}
				j += int(next)
			}

			if next == 0 {
				break
			}
			i += int(next)
		}
	}

	// Versym parallels symbol table, indexing into verneed.
	vs := f.SectionByType(SHT_GNU_VERSYM)
	if vs == nil {
		return false
	}
	d, _ := vs.Data()

	f.gnuVerdef = def
	f.gnuNeed = need
	f.gnuVersym = d
	return true
}

// gnuVersion adds Library and Version information to sym,
// which came from offset i of the symbol table.
func (f *File) gnuVersion(i int) (library string, version string, info byte, other byte) {
	// Each entry is two bytes; skip undef entry at beginning.
	i = (i + 1) * 2
	if i >= len(f.gnuVersym) {
		return
	}
	s := f.gnuVersym[i:]
	if len(s) < 2 {
		return
	}
	j := int(f.ByteOrder.Uint16(s))
	if j >= 2 && j < len(f.gnuNeed) {
		n := &f.gnuNeed[j]
		if n.File != "" {
			return n.File, n.Name, SymHidden, byte(n.Other)
		}
	}

	var verInfo byte = SymUndefined

	if j&0x8000 != 0 {
		verInfo = SymHidden
	} else {
		verInfo = SymPublic
	}

	j = (j & 0x7fff) - 1
	if j >= 1 && j < len(f.gnuVerdef) {
		v := &f.gnuVerdef[j]
		return "", v.Name, verInfo, byte(v.Other)
	}

	return
}

The 0x8000 and 0x7fff from above are from:

https://github.com/bminor/binutils-gdb/blob/master/include/elf/common.h#L1306

/* This flag appears in a Versym structure.  It means that the symbol
   is hidden, and is only visible with an explicit version number.
   This is a GNU extension.  */

#define VERSYM_HIDDEN		0x8000

/* This is the mask for the rest of the Versym information.  */

#define VERSYM_VERSION		0x7fff

@rsc
Copy link
Contributor

rsc commented Dec 13, 2023

It does seem strange for Sym* constants to go into a VerInfo field. Perhaps the names should be made closer?

@benbaker76
Copy link
Contributor Author

benbaker76 commented Dec 14, 2023

It does seem strange for Sym* constants to go into a VerInfo field. Perhaps the names should be made closer?

readelf.c calls the enum versioned_symbol_info. I did consider prefixing with VerSym* but it seemed overly verbose. For example VerSymUndefined. The readelf.c enums are prefixed with symbol_* so it was in reference to that.

@benbaker76
Copy link
Contributor Author

// Versioned symbol info
const (
	VerSymUndefined byte = 0
	VerSymHidden    byte = 1
	VerSymPublic    byte = 2
)

@rsc would you prefer something like this?

@ianlancetaylor
Copy link
Member

I know you know all this, but just to reprise:

There are three sections that contain version information: SHT_GNU_VERSYM (.gnu.version), SHT_GNU_VERDEF (.gnu.version_d), and SHT_GNU_VERNEED (.gnu.version_r).

The SHT_GNU_VERSYM section contains a 16-bit value for each symbol in the symbol table. The high bit (VERSYM_HIDDEN) is set if the symbol has hidden visibility, meaning that some other executable or dynamic library can only refer to it using an explicit version number. The other 15 bits may be 0 (VER_NDX_LOCAL), meaning that the symbol is not visible outside the object, or 1 (VER_NDX_GLOBAL), meaning that the symbol has no version, or some larger value which is an index into the list of versions.

The SHT_GNU_VERDEF section defines versions. The sh_info field of the section header is the number of versions. Each version has flags, an index, a name, and a list of versions that it depends on.

The SHT_GNU_VERNEED section defines versions that are required from other dynamic objects. The sh_info field of the section header is the number of entries. Each entry refers to some other dynamic object, and has a list of requirements. Each requirement is a flag and a version name.

The flags in SHT_GNU_VERDEF and SHT_GNU_VERNEED sections are VER_FLG_BASE for the base revision, and VER_FLG_WEAK and VER_FLG_INFO which are pretty esoteric.

Anyhow, the point is, the part of the version information that is symbol-specific is the SHT_GNU_VERSYM section. We've already locked ourselves into representing that as the Version and Library fields of Symbol. We currently ignore the symbol-specific VERSYM_HIDDEN flag, and we should fix that. But I don't see a reason to express the other version information as fields of Symbol. In particular a VerOther field doesn't make sense to me; that's a property of a version, not a symbol, and I don't see a reason to duplicate it across all the symbols.

So how about:

Add a VersionIndex int32 field to Symbol set from the entry in the SHT_GNU_VERSYM section. This will be set by DynamicSymbols. If there is no SHT_GNU_VERSYM section, the field will be set to -1 for all symbols. On a 64-bit system we can squeeze this field in without increasing the size of Symbol, by putting after Other.

Add a (*File).DynamicVersions method that returns version information. It will return []DynamicVersion.

// DynamicVersion is a version defined by a dynamic object.
type DynamicVersion struct {
    Version uint16 // Version of data structure.
    Flags DynamicVersionFlags
    Index uint16
    Name string
    Deps []string
}

We define flags for the DynamicVersion Flags field.

type DynamicVersionFlags uint16
const (
    VER_FLG_BASE DynamicVersionFlags = 1
    VER_FLG_WEAK DynamicVersionFlags = 2
    VER_FLG_INFO DynamicVersionFlags = 4
)

Add a (*File).DynamicVersionNeeds method that returns dependencies. It will return []DynamicVersionNeed.

type DynamicVersionNeed struct {
    Version uint16 // version of data structure
    Name string // shared library  name
    Needs []DynamicVersionDep
}

type DynamicVersionDep struct {
    Flags DynamicVersionFlags
    Other uint16
    Dep string // Name of required version
}

I think this approach will let us extract all the information from the dynamic object, and let people write tools to do whatever they want.

@benbaker76
Copy link
Contributor Author

benbaker76 commented Feb 9, 2024

@ianlancetaylor thanks for your feedback and additional info. While your approach does appear to work I don't see the need to separate the version data from Symbol and ImportedSymbol structs. My proposal adds two additional bytes and re-uses Version and Library members without affecting any existing data.

Screenshot from 2024-02-09 15-22-56

Screenshot from 2024-02-09 15-24-21

To be sure I wrote a test which will compare the output of readelf -sW with all the libraries located in /usr/lib/x86_64-linux-gnu of Ubuntu 23.10.

//go:build linux

package main

import (
	"bufio"
	"debug/elf"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"testing"
)

type SymbolInfo struct {
	Num   int
	Value string
	Size  string
	Type  string
	Bind  string
	Vis   string
	Ndx   string
	Name  string
}

func TestSymbols(t *testing.T) {
	folderPath := "/usr/lib/x86_64-linux-gnu"
	searchPattern := ".so"

	files, err := searchFiles(folderPath, searchPattern)
	if err != nil {
		t.Errorf("Error %v", err)
		return
	}

	t.Logf("Found %d files", len(files))

	for _, file := range files {
		symbols, err := parseSymbols(file)
		if err != nil {
			continue
		}

		f, err := elf.Open(file)

		if err != nil {
			continue
		}

		var checkSymbols = make(map[string][]elf.Symbol)

		checkSymbols[".dynsym"], _ = f.DynamicSymbols()
		checkSymbols[".symtab"], _ = f.Symbols()

		for tableName, tableSymbols := range symbols {
			if len(tableSymbols) != len(checkSymbols[tableName]) {
				t.Errorf("expected %d got %d\n", len(tableSymbols), len(checkSymbols[tableName]))
			}

			for index, symbol := range tableSymbols {
				symbolCheck := checkSymbols[tableName][index]

				value := fmt.Sprintf("%016x", symbolCheck.Value)
				size := fmt.Sprintf("%d", symbolCheck.Size)
				if strings.HasPrefix(symbol.Size, "0x") {
					size = fmt.Sprintf("0x%x", symbolCheck.Size)
				}
				type_ := parseType(int(symbolCheck.Info & 0xf))
				bind := parseBind(int(symbolCheck.Info >> 4))
				vis := parseVisibility(int(symbolCheck.Other))
				ndx := parseNdx(int(symbolCheck.Section))
				name := getSymbolName(symbolCheck, AddLibrary, AddVersion)

				if value != symbol.Value {
					t.Errorf("expected %v got %v\n", symbol.Value, value)
				}
				if size != symbol.Size {
					t.Errorf("expected %x got %x\n", symbol.Size, size)
				}
				if type_ != symbol.Type {
					t.Errorf("expected %v got %v\n", symbol.Type, type_)
				}
				if bind != symbol.Bind {
					t.Errorf("expected %x got %v\n", symbol.Bind, bind)
				}
				if vis != symbol.Vis {
					t.Errorf("expected %v got %v\n", symbol.Vis, vis)
				}
				if ndx != symbol.Ndx {
					t.Errorf("expected %v got %v\n", symbol.Ndx, ndx)
				}
				if name != symbol.Name {
					t.Errorf("expected %v got %v\n", symbol.Name, name)
				}
			}
		}
	}
}

func searchFiles(folderPath, searchPattern string) ([]string, error) {
	var files []string

	file, err := os.Open(folderPath)
	if err != nil {
		return nil, err
	}
	defer file.Close()

	fileInfos, err := file.Readdir(-1)
	if err != nil {
		return nil, err
	}

	for _, fileInfo := range fileInfos {
		if !fileInfo.IsDir() && strings.Contains(fileInfo.Name(), searchPattern) {
			files = append(files, folderPath+"/"+fileInfo.Name())
		}
	}

	return files, nil
}

func parseSymbols(filePath string) (map[string][]SymbolInfo, error) {
	cmd := exec.Command("readelf", "-sW", filePath)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return nil, err
	}

	scanner := bufio.NewScanner(strings.NewReader(string(output)))
	symbols := make(map[string][]SymbolInfo)
	var tableName string

	for scanner.Scan() {
		line := scanner.Text()

		if strings.HasPrefix(line, "Symbol table") {
			startPos := strings.Index(line, "'")
			endPos := strings.LastIndex(line, "'")
			tableName = line[startPos+1 : endPos]
			continue
		}

		if strings.HasPrefix(line, "   Num:") {
			scanner.Scan()
			continue
		}

		fields := strings.Fields(line)
		if len(fields) < 7 {
			continue
		}

		name := ""

		if len(fields) == 8 {
			name = fields[7]
		}

		symbol := SymbolInfo{
			Num:   len(symbols[tableName]),
			Value: fields[1],
			Size:  fields[2],
			Type:  fields[3],
			Bind:  fields[4],
			Vis:   fields[5],
			Ndx:   fields[6],
			Name:  name,
		}

		if len(fields) == 9 {
			symbol.Name = fields[7] + " " + fields[8]
		}

		symbols[tableName] = append(symbols[tableName], symbol)
	}

	return symbols, nil
}

func parseType(typeInt int) string {
	typeTable := map[int]string{
		0:  "NOTYPE",
		1:  "OBJECT",
		2:  "FUNC",
		3:  "SECTION",
		4:  "FILE",
		5:  "COMMON",
		6:  "TLS",
		10: "IFUNC",
		13: "LOPROC",
		15: "HIPROC",
	}

	return typeTable[typeInt]
}

func parseBind(bindInt int) string {
	bindTable := map[int]string{
		0:  "LOCAL",
		1:  "GLOBAL",
		2:  "WEAK",
		10: "UNIQUE",
		11: "UNIQUE",
		12: "UNIQUE",
	}

	return bindTable[bindInt]
}

func parseVisibility(visInt int) string {
	visTable := map[int]string{
		0: "DEFAULT",
		1: "INTERNAL",
		2: "HIDDEN",
		3: "PROTECTED",
	}

	return visTable[visInt]
}

func parseNdx(ndxInt int) string {
	ndxTable := map[int]string{
		0:      "UND",
		0xff00: "BEFORE",
		0xff01: "AFTER",
		0xff1f: "HIPROC",
		0xfff1: "ABS",
		0xfff2: "COMMON",
		0xffff: "HIRESERVE",
	}

	if val, ok := ndxTable[ndxInt]; ok {
		return val
	}

	return strconv.FormatInt(int64(ndxInt), 10)
}
$ go test ./cmd/profile -v
=== RUN   TestSymbols
    main_test.go:37: Found 3979 files
--- PASS: TestSymbols (40.48s)
PASS
ok      github.com/benbaker76/libbpf-tools/cmd/profile  40.488s

@ianlancetaylor
Copy link
Member

@benbaker76 I don't doubt that your approach works for your purposes. But your approach doesn't provide any mechanism for reading the other version data, like which versions requires which other versions. And I think it essentially duplicates the same information across a bunch of different Symbol structures, which seems more confusing than necessary.

@benbaker76
Copy link
Contributor Author

@ianlancetaylor I think you've hit the nail on the head when you said "We've already locked ourselves into representing that as the Version and Library fields of Symbol.". I was trying my best to work the changes into the current implementation.

I'm not entirely sure what particular version info is missing in my changes apart from the flags in SHT_GNU_VERDEF (which is also currently ignored when parsing SHT_GNU_VERNEED). As you mention; these give us VER_FLG_BASE and VER_FLG_WEAK and from what I've read of VER_FLG_WEAK it is a "version definition that has no global interface symbols associated with it" (reference).

But as you suggest we should include these flags and I would do that by adding another byte called VerFlags.

// Versioned symbol info
const (
	VerInfoUndefined byte = 0
	VerInfoHidden    byte = 1
	VerInfoPublic    byte = 2
)

// Version dependency specific info
const (
	VerFlagBase byte = 1
	VerFlagWeak byte = 2
	VerFlagInfo    byte = 4
)

VerOther (vna_other) is "the version index assigned to this dependency version." so must remain as it's own integer (although it looks like it should be 16-bits in size not 8).

So those are the changes I'd make but happy to go with your method as they do make better sense going forward especially if we were designing the parser from scratch and didn't have to worry about backward compatibility. I'd expect this isn't a heavily traveled part of the standard library anyway.

@benbaker76
Copy link
Contributor Author

@ianlancetaylor I'll hopefully get some time over the weekend to implement your suggestions.

@benbaker76
Copy link
Contributor Author

@ianlancetaylor here's my implementation based on your suggestions.

// A File represents an open ELF file.
type File struct {
	FileHeader
	Sections    []*Section
	Progs       []*Prog
	closer      io.Closer
	dynVers     []DynamicVersion
	dynVerNeeds []DynamicVersionNeed
	gnuVersym   []byte
}

...

// A Symbol represents an entry in an ELF symbol table section
type Symbol struct {
	Name         string
	Info, Other  byte
	VersionIndex int32
	Section      SectionIndex
	Value, Size  uint64

	// Version and Library are present only for the dynamic symbol
	// table.
	Version string
	Library string
}

...

const (
	VER_NDX_LOCAL  int32 = 0      // Symbol has local scope
	VER_NDX_GLOBAL int32 = 1      // Symbol has global scope
	VERSYM_VERSION int32 = 0x7fff // Version index mask
	VERSYM_HIDDEN  int32 = 0x8000 // Symbol is hidden
)

type DynamicVersionFlags uint16

const (
	VER_FLG_BASE DynamicVersionFlags = 1 // Version definition of the file
	VER_FLG_WEAK DynamicVersionFlags = 2 // Weak version identifier
	VER_FLG_INFO DynamicVersionFlags = 4 // Reference exists for informational purposes
)

// DynamicVersion is a version defined by a dynamic object
type DynamicVersion struct {
	Version uint16 // Version of data structure
	Flags   DynamicVersionFlags
	Index   uint16   // Version index
	Deps    []string // Dependencies
}

type DynamicVersionNeed struct {
	Version uint16              // Version of data structure
	Name    string              // Shared library name
	Needs   []DynamicVersionDep // Dependencies
}

type DynamicVersionDep struct {
	Flags DynamicVersionFlags
	Other uint16 // Version index
	Dep   string // Name of required version
}

// dynamicVersions returns version information for a dynamic object
func (f *File) dynamicVersions(str []byte) bool {
	if f.dynVers != nil {
		// Already initialized
		return true
	}

	// Accumulate verdef information.
	vd := f.SectionByType(SHT_GNU_VERDEF)
	if vd == nil {
		return false
	}
	d, _ := vd.Data()

	var dynVers []DynamicVersion
	i := 0
	for {
		if i+20 > len(d) {
			break
		}
		version := f.ByteOrder.Uint16(d[i : i+2])
		flags := DynamicVersionFlags(f.ByteOrder.Uint16(d[i+2 : i+4]))
		ndx := f.ByteOrder.Uint16(d[i+4 : i+6])
		cnt := f.ByteOrder.Uint16(d[i+6 : i+8])
		aux := f.ByteOrder.Uint32(d[i+12 : i+16])
		next := f.ByteOrder.Uint32(d[i+16 : i+20])

		var depName string
		var deps []string
		j := i + int(aux)
		for c := 0; c < int(cnt); c++ {
			if j+8 > len(d) {
				break
			}
			vname := f.ByteOrder.Uint32(d[j : j+4])
			vnext := f.ByteOrder.Uint32(d[j+4 : j+8])
			depName, _ = getString(str, int(vname))

			deps = append(deps, depName)

			j += int(vnext)
		}

		dynVers = append(dynVers, DynamicVersion{
			Version: version,
			Flags:   flags,
			Index:   ndx,
			Deps:    deps,
		})

		if next == 0 {
			break
		}
		i += int(next)
	}

	f.dynVers = dynVers

	return true
}

// DynamicVersions returns version information for a dynamic object
func (f *File) DynamicVersions() (*[]DynamicVersion, error) {
	if f.dynVers == nil {
		return nil, errors.New("DynamicVersions: not initialized")
	}

	return &f.dynVers, nil
}

// dynamicVersionNeeds returns version dependencies for a dynamic object
func (f *File) dynamicVersionNeeds(str []byte) bool {
	if f.dynVerNeeds != nil {
		// Already initialized
		return true
	}

	// Accumulate verneed information.
	vn := f.SectionByType(SHT_GNU_VERNEED)
	if vn == nil {
		return false
	}
	d, _ := vn.Data()

	var dynVerNeeds []DynamicVersionNeed
	i := 0
	for {
		if i+16 > len(d) {
			break
		}
		vers := f.ByteOrder.Uint16(d[i : i+2])
		if vers != 1 {
			break
		}
		cnt := f.ByteOrder.Uint16(d[i+2 : i+4])
		fileoff := f.ByteOrder.Uint32(d[i+4 : i+8])
		aux := f.ByteOrder.Uint32(d[i+8 : i+12])
		next := f.ByteOrder.Uint32(d[i+12 : i+16])
		file, _ := getString(str, int(fileoff))

		var deps []DynamicVersionDep
		j := i + int(aux)
		for c := 0; c < int(cnt); c++ {
			if j+16 > len(d) {
				break
			}
			flags := DynamicVersionFlags(f.ByteOrder.Uint16(d[j+4 : j+6]))
			other := f.ByteOrder.Uint16(d[j+6 : j+8])
			nameoff := f.ByteOrder.Uint32(d[j+8 : j+12])
			next := f.ByteOrder.Uint32(d[j+12 : j+16])
			depName, _ := getString(str, int(nameoff))

			deps = append(deps, DynamicVersionDep{
				Flags: flags,
				Other: other,
				Dep:   depName,
			})

			if next == 0 {
				break
			}
			j += int(next)
		}

		dynVerNeeds = append(dynVerNeeds, DynamicVersionNeed{
			Version: vers,
			Name:    file,
			Needs:   deps,
		})

		if next == 0 {
			break
		}
		i += int(next)
	}

	f.dynVerNeeds = dynVerNeeds

	return true
}

// DynamicVersionNeeds returns version dependencies for a dynamic object
func (f *File) DynamicVersionNeeds() (*[]DynamicVersionNeed, error) {
	if f.dynVerNeeds == nil {
		return nil, errors.New("DynamicVersionNeeds: not initialized")
	}

	return &f.dynVerNeeds, nil
}

// gnuVersionInit parses the GNU version tables
// for use by calls to gnuVersion.
func (f *File) gnuVersionInit(str []byte) bool {
	// Versym parallels symbol table, indexing into verneed.
	vs := f.SectionByType(SHT_GNU_VERSYM)
	if vs == nil {
		return false
	}
	d, _ := vs.Data()

	f.dynamicVersions(str)
	f.dynamicVersionNeeds(str)
	f.gnuVersym = d
	return true
}

// gnuVersion adds Library and Version information to sym,
// which came from offset i of the symbol table.
func (f *File) gnuVersion(i int) (library string, version string, versionIndex int32) {
	// Each entry is two bytes; skip undef entry at beginning.
	i = (i + 1) * 2
	if i >= len(f.gnuVersym) {
		return "", "", -1
	}
	s := f.gnuVersym[i:]
	if len(s) < 2 {
		return "", "", -1
	}
	j := int32(f.ByteOrder.Uint16(s))
	var ndx = (j & VERSYM_VERSION)

	if ndx == VER_NDX_LOCAL || ndx == VER_NDX_GLOBAL {
		return "", "", j
	}

	for _, v := range f.dynVerNeeds {
		for _, n := range v.Needs {
			if n.Other == uint16(ndx) {
				return v.Name, n.Dep, j | VERSYM_HIDDEN
			}
		}
	}

	for _, v := range f.dynVers {
		if v.Index == uint16(ndx) {
			if len(v.Deps) > 0 {
				return "", v.Deps[0], j
			}
		}
	}

	return "", "", -1
}

@rsc
Copy link
Contributor

rsc commented Feb 28, 2024

@benbaker76 can you post the full API diff for the revised version? (Run all.bash and the API check should fail, like in #63952 (comment); that's the output I'm asking for.)

@benbaker76
Copy link
Contributor Author

Thanks @rsc here's the API diff

##### API check
+pkg debug/elf, const VERSYM_HIDDEN = 32768
+pkg debug/elf, const VERSYM_HIDDEN int32
+pkg debug/elf, const VERSYM_VERSION = 32767
+pkg debug/elf, const VERSYM_VERSION int32
+pkg debug/elf, const VER_FLG_BASE = 1
+pkg debug/elf, const VER_FLG_BASE DynamicVersionFlags
+pkg debug/elf, const VER_FLG_INFO = 4
+pkg debug/elf, const VER_FLG_INFO DynamicVersionFlags
+pkg debug/elf, const VER_FLG_WEAK = 2
+pkg debug/elf, const VER_FLG_WEAK DynamicVersionFlags
+pkg debug/elf, const VER_NDX_GLOBAL = 1
+pkg debug/elf, const VER_NDX_GLOBAL int32
+pkg debug/elf, const VER_NDX_LOCAL = 0
+pkg debug/elf, const VER_NDX_LOCAL int32
+pkg debug/elf, method (*File) DynamicVersionNeeds() (*[]DynamicVersionNeed, error)
+pkg debug/elf, method (*File) DynamicVersions() (*[]DynamicVersion, error)
+pkg debug/elf, type DynamicVersion struct
+pkg debug/elf, type DynamicVersion struct, Deps []string
+pkg debug/elf, type DynamicVersion struct, Flags DynamicVersionFlags
+pkg debug/elf, type DynamicVersion struct, Index uint16
+pkg debug/elf, type DynamicVersion struct, Version uint16
+pkg debug/elf, type DynamicVersionDep struct
+pkg debug/elf, type DynamicVersionDep struct, Dep string
+pkg debug/elf, type DynamicVersionDep struct, Flags DynamicVersionFlags
+pkg debug/elf, type DynamicVersionDep struct, Other uint16
+pkg debug/elf, type DynamicVersionFlags uint16
+pkg debug/elf, type DynamicVersionNeed struct
+pkg debug/elf, type DynamicVersionNeed struct, Name string
+pkg debug/elf, type DynamicVersionNeed struct, Needs []DynamicVersionDep
+pkg debug/elf, type DynamicVersionNeed struct, Version uint16
+pkg debug/elf, type Symbol struct, VersionIndex int32

NOTE: I had to update the tests to pass with the addition of the VersionIndex field to the Symbol struct.

@ianlancetaylor
Copy link
Member

I'm not sure we need to export VERSYM_HIDDEN and VERSYM_VERSION. Is there a reason for a programmer to use those flags? Similarly I'm not sure we need VER_NDX_GLOBAL and VER_NDX_LOCAL.

@benbaker76
Copy link
Contributor Author

@ianlancetaylor I do use VERSYM_HIDDEN and VERSYM_VERSION in my application but I can easily define them myself. The VER_NDX_GLOBAL and VER_NDX_LOCAL were just for clarity in the library but happy to remove these too.

So here's the latest API diff:

##### API check
+pkg debug/elf, const VER_FLG_BASE = 1
+pkg debug/elf, const VER_FLG_BASE DynamicVersionFlags
+pkg debug/elf, const VER_FLG_INFO = 4
+pkg debug/elf, const VER_FLG_INFO DynamicVersionFlags
+pkg debug/elf, const VER_FLG_WEAK = 2
+pkg debug/elf, const VER_FLG_WEAK DynamicVersionFlags
+pkg debug/elf, method (*File) DynamicVersionNeeds() (*[]DynamicVersionNeed, error)
+pkg debug/elf, method (*File) DynamicVersions() (*[]DynamicVersion, error)
+pkg debug/elf, type DynamicVersion struct
+pkg debug/elf, type DynamicVersion struct, Deps []string
+pkg debug/elf, type DynamicVersion struct, Flags DynamicVersionFlags
+pkg debug/elf, type DynamicVersion struct, Index uint16
+pkg debug/elf, type DynamicVersion struct, Version uint16
+pkg debug/elf, type DynamicVersionDep struct
+pkg debug/elf, type DynamicVersionDep struct, Dep string
+pkg debug/elf, type DynamicVersionDep struct, Flags DynamicVersionFlags
+pkg debug/elf, type DynamicVersionDep struct, Other uint16
+pkg debug/elf, type DynamicVersionFlags uint16
+pkg debug/elf, type DynamicVersionNeed struct
+pkg debug/elf, type DynamicVersionNeed struct, Name string
+pkg debug/elf, type DynamicVersionNeed struct, Needs []DynamicVersionDep
+pkg debug/elf, type DynamicVersionNeed struct, Version uint16
+pkg debug/elf, type Symbol struct, VersionIndex int32

@ianlancetaylor
Copy link
Member

What do you use VERSYM_HIDDEN and VERSYM_VERSION for in your application? Is there anything exposed by the API we are discussing for which they would be used? Thanks.

@benbaker76
Copy link
Contributor Author

Here are a couple of functions that I'm using these constants. IMO it's really not necessary to have them exposed by API.

const (
	VER_NDX_LOCAL  int32 = 0      // Symbol has local scope
	VER_NDX_GLOBAL int32 = 1      // Symbol has global scope
	VERSYM_VERSION int32 = 0x7fff // Version index mask
	VERSYM_HIDDEN  int32 = 0x8000 // Symbol is hidden
)

...

func getDynamicVersionNeed(sym elf.Symbol, dynVerNeeds *[]elf.DynamicVersionNeed) (*elf.DynamicVersionNeed, *elf.DynamicVersionDep) {
	if dynVerNeeds == nil {
		return nil, nil
	}

	for _, v := range *dynVerNeeds {
		for _, n := range v.Needs {
			var index = sym.VersionIndex & VERSYM_VERSION
			if n.Other == uint16(index) {
				return &v, &n
			}
		}
	}
	return nil, nil
}

func getSymbolName(sym elf.Symbol, dynVers *[]elf.DynamicVersion, dynVerNeeds *[]elf.DynamicVersionNeed, options ...SymbolOption) string {
	demangleSymbol := false
	addVersion := false
	addLibrary := false
	for _, o := range options {
		switch {
		case o == Demangle:
			demangleSymbol = true
		case o == AddVersion:
			addVersion = true
		case o == AddLibrary:
			addLibrary = true
		}
	}

	suffix := ""
	if (sym.Info&0xf == 0x2 || sym.Info&0xf == 0x1 && sym.Section == 0 || sym.Section != 0xfff1) && sym.Version != "" {
		if sym.VersionIndex&VERSYM_HIDDEN != 0 {
			suffix = "@" + sym.Version
		} else {
			suffix = "@@" + sym.Version
		}
	}

	symName := sym.Name

	if addLibrary {
		symName += suffix
	}

	if demangleSymbol {
		demangleName, err := demangle.ToString(sym.Name, demangle.NoRust)

		if err == nil {
			symName = demangleName
		}
	}

	dynVerNeed, dynVerNeedDep := getDynamicVersionNeed(sym, dynVerNeeds)

	if addVersion && dynVerNeed != nil && dynVerNeedDep != nil {
		if dynVerNeedDep.Other > 1 {
			symName += fmt.Sprintf(" (%d)", dynVerNeedDep.Other)
		}
	}

	return symName
}

@ianlancetaylor
Copy link
Member

I see, thanks. I wonder if we should separate the VERSYM_HIDDEN bit into a separate bool, and keep Symbol.VersionIndex as an index or -1. Otherwise we have to check for -1, and then compare with VERSYM_HIDDEN. It's not a convenient API.

Any opinion?

@benbaker76
Copy link
Contributor Author

@ianlancetaylor without adding another field we could use getters and setters along with a private versionIndex and separate them into a method called VersionIndex() which returns versionIndex masked with 0x7fff and a getter called IsHidden() that returns versionIndex checking the VERSYM_HIDDEN bit.

// A Symbol represents an entry in an ELF symbol table section.
type Symbol struct {
	Name         string
	Info, Other  byte
	versionIndex int32
	Section      SectionIndex
	Value, Size  uint64

	// Version and Library are present only for the dynamic symbol
	// table.
	Version string
	Library string
}

// SetVersionIndex sets the version index of the symbol.
func (s *Symbol) SetVersionIndex(versionIndex int32) {
	s.versionIndex = versionIndex
}

// VersionIndex returns the version index of the symbol.
func (s *Symbol) VersionIndex() int32 {
	return s.versionIndex & 0x7fff
}

// IsHidden returns whether the symbol is hidden.
func (s *Symbol) IsHidden() bool {
	return s.versionIndex&0x8000 != 0
}
##### API check
+pkg debug/elf, const VER_FLG_BASE = 1
+pkg debug/elf, const VER_FLG_BASE DynamicVersionFlags
+pkg debug/elf, const VER_FLG_INFO = 4
+pkg debug/elf, const VER_FLG_INFO DynamicVersionFlags
+pkg debug/elf, const VER_FLG_WEAK = 2
+pkg debug/elf, const VER_FLG_WEAK DynamicVersionFlags
+pkg debug/elf, method (*File) DynamicVersionNeeds() (*[]DynamicVersionNeed, error)
+pkg debug/elf, method (*File) DynamicVersions() (*[]DynamicVersion, error)
+pkg debug/elf, method (*Symbol) IsHidden() bool
+pkg debug/elf, method (*Symbol) SetVersionIndex(int32)
+pkg debug/elf, method (*Symbol) VersionIndex() int32
+pkg debug/elf, type DynamicVersion struct
+pkg debug/elf, type DynamicVersion struct, Deps []string
+pkg debug/elf, type DynamicVersion struct, Flags DynamicVersionFlags
+pkg debug/elf, type DynamicVersion struct, Index uint16
+pkg debug/elf, type DynamicVersion struct, Version uint16
+pkg debug/elf, type DynamicVersionDep struct
+pkg debug/elf, type DynamicVersionDep struct, Dep string
+pkg debug/elf, type DynamicVersionDep struct, Flags DynamicVersionFlags
+pkg debug/elf, type DynamicVersionDep struct, Other uint16
+pkg debug/elf, type DynamicVersionFlags uint16
+pkg debug/elf, type DynamicVersionNeed struct
+pkg debug/elf, type DynamicVersionNeed struct, Name string
+pkg debug/elf, type DynamicVersionNeed struct, Needs []DynamicVersionDep
+pkg debug/elf, type DynamicVersionNeed struct, Version uint16

@ianlancetaylor
Copy link
Member

In this approach I think SetVersionIndex should take two arguments.

But a bigger issue is that I don't think anything else in debug/elf uses methods like that. It would be better to avoid two different ways of getting Symbol information.

@benbaker76
Copy link
Contributor Author

But a bigger issue is that I don't think anything else in debug/elf uses methods like that. It would be better to avoid two different ways of getting Symbol information.

@ianlancetaylor I'm not a huge fan of the approach either.

Another option is to move IsHidden to Symbol:

// A Symbol represents an entry in an ELF symbol table section.
type Symbol struct {
	Name         string
	Info, Other  byte
	VersionIndex int32
	IsHidden     bool
	Section      SectionIndex
	Value, Size  uint64

	// Version and Library are present only for the dynamic symbol
	// table.
	Version string
	Library string
}

And another option is to create a new GnuVersion struct:

// GnuVersion represents GNU version information for a symbol
type GnuVersion struct {
	VersionIndex int32
	Version      string
	Library      string
	IsHidden     bool
}

// A Symbol represents an entry in an ELF symbol table section.
type Symbol struct {
	Name        string
	Info, Other byte
	Section     SectionIndex
	Value, Size uint64
	Version GnuVersion
}

@benbaker76
Copy link
Contributor Author

Another option is to move version out of Symbol completely and add functions like this:

func (f *File) DynamicSymbolVersions() ([]GnuVersion, error) {
	sym, str, err := f.getSymbols(SHT_DYNSYM)
	if err != nil {
		return nil, err
	}
	if !f.gnuVersionInit(str) {
		return nil, nil
	}
	gnuVerisons := make([]GnuVersion, len(sym))
	for i := range sym {
		gnuVerisons[i] = f.gnuVersion(i)
	}
	return gnuVerisons, nil
}

func (f *File) ImportedSymbolVersions() ([]GnuVersion, error) {
	sym, str, err := f.getSymbols(SHT_DYNSYM)
	if err != nil {
		return nil, err
	}
	if !f.gnuVersionInit(str) {
		return nil, nil
	}
	var gnuVersions []GnuVersion
	for i, s := range sym {
		if ST_BIND(s.Info) == STB_GLOBAL && s.Section == SHN_UNDEF {
			gnuVersions = append(gnuVersions, f.gnuVersion(i))
		}
	}

	return gnuVersions, nil
}

@aclements
Copy link
Member

I would be fine with wrapping the first Needs element of DynamicVersionNeed into the DynamicVersionNeed as the name. As far as I know the flag field of the first element is always 0. We would just have to document that the structure in the package does not quite correspond to the structure of the underlying data.

I think DynamicVersionNeed.Needs is actually fine.

I was focusing on DynamicVersion.Deps []string. These are just strings, so there are no flags to worry about. The GNU ELF version spec doesn't have much to say about the meaning of this list, but the OpenSolaris docs say "The first element of the array must exist. This element points to the version definition string this structure defines. Additional elements can be present. The number of elements is indicated by the vd_cnt value. These elements represent the dependencies of this version definition. Each of these dependencies will have its own version definition structure." This does distinguish the first element as "the version definition string this structure defines", from the rest of the elements as "dependencies of this version definition". (Now I see where the field name Deps came from.) This is why I think we should pull the first element of this out as its own field and Deps should only be the rest of the elements.

@aclements
Copy link
Member

Here's where I am with the API:

type DynamicVersion struct {
//	Version uint16 // Version of data structure. // REMOVE
	Name    string  // Name of the version defined by this index // ADD
	Index   uint16   // Version index
	Flags   DynamicVersionFlag
	Deps    []string // Names of versions depended on by this version // NEW DOC
}

type DynamicVersionNeed struct {
//	Version uint16              // Version of data structure. // REMOVE
	Name    string              // Shared library name.
	Needs   []DynamicVersionDep // Dependencies.
}

type DynamicVersionDep struct {
	Flags DynamicVersionFlag
	Index uint16 // Version index // RENAME Other -> Index
	Dep   string // Name of required version.
}

type Symbol {
    ...
    // VersionScope describes the version in which the symbol is defined.
    // This is only set for the dynamic symbol table.
    // When no symbol versioning information is available, this is VersionScopeNone.
    VersionScope SymbolVersionScope
    // If VersionScope is VersionScopeSpecific or VersionScopeHidden, this is the version index.
    // This is only set for the dynamic symbol table.
    // This index will match either [DynamicVersion.Index] in the slice returned by
    // [File.DynamicVersions], or [DynamicVersionDep.Index] in the Needs field
    // of the elements of the slice returned by [File.DynamicVersionNeeds].
    // In general, a defined symbol will have an index referring to DynamicVersions,
    // and an undefined symbol will have an index referring to some version in DynamicVersionNeeds.
    VersionIndex int16
    ...
}

// SymbolVersionScope describes the version in which a [Symbol] is defined.
// This is only used for the dynamic symbol table.
type SymbolVersionScope byte

const (
    VersionScopeNone SymbolVersionScope = iota // no symbol version available
    VersionScopeLocal // symbol has local scope
    VersionScopeGlobal // symbol has global scope
    VersionScopeSpecific // symbol is in the version given by VersionIndex
    VersionScopeHidden // symbol is in the version given by VersionIndex, and is hidden
)

@aclements
Copy link
Member

I'm unsure about VersionScopeGlobal. Quoting the OpenSolaris docs: "Versions defined by an object are assigned version indexes starting at 1 and incremented by 1 for each version. Index 1 is reserved for the first global version. If the object does not have a SHT_SUNW_verdef version definition section, then all the global symbols defined by the object receive index 1. If the object does have a version definition section, then VER_NDX_GLOBAL simply refers to the first such version."

So VER_NDX_GLOBAL is really just version index 1, and version index 1 has to be the base version. If we have VersionScopeGlobal and don't set VersionIndex, then the consumer of this API has to know VersionScopeGlobal means to look for version index 1 (or the version marked VER_FLG_BASE, which I think has to be the same thing?)

That suggests to me we shouldn't have VersionScopeGlobal, and should just set VersionIndex to 1 in this case. It's unfortunate that there's the caveat that index 1 can be used even if there's no version definition section, which complicates how to resolve VersionIndex, but in a different way.

@ianlancetaylor
Copy link
Member

To me the fact that it's possible to have symbols with VER_NDX_GLOBAL with no version definition implies that we shouldn't just use version index 1. There is no way to explicitly put symbols into version index 1, and version index 1 doesn't have a meaningful name (typically just "BASE"), so in practice you need to special case VER_NDX_GLOBAL, and you don't look up the version with that index. That is, a symbol with VER_NDX_GLOBAL doesn't have a version.

@ianlancetaylor
Copy link
Member

Sent https://go.dev/cl/635079 with the currently suggested API changes.

@ianlancetaylor
Copy link
Member

@benbaker76 Any opinion on these changes? Thanks.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/635079 mentions this issue: debug/elf: adjust version API per issue discussion

@aclements
Copy link
Member

I ran #63952 (comment) by proposal committee and there were no objections.

gopherbot pushed a commit that referenced this issue Dec 11, 2024
This updates the new version API for the discussion on #63952.

This change reveals that in fact none of the tests set the
VERSYM_HIDDEN bit. The code before this CL set the hidden flag
for symbols that appear in DynamicVersionNeed, but that is not
an accurate representation of the ELF. The readelf program
does print undefined symbols that way (with a single '@'),
but that doesn't mean that the hidden flag is set.
Leaving tests with the hidden bit set for later.

For #63952

Change-Id: Ida60831e0c9922dfc10f10c7a64bc76a2b197537
Reviewed-on: https://go-review.googlesource.com/c/go/+/635079
Reviewed-by: Austin Clements <[email protected]>
Commit-Queue: Ian Lance Taylor <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@benbaker76
Copy link
Contributor Author

@benbaker76 Any opinion on these changes? Thanks.

I don't mind the name changes but my test application is now failing.

func getSymbolName(sym elf.Symbol, dynVers []elf.DynamicVersion, dynVerNeeds []elf.DynamicVersionNeed, options ...SymbolOption) string {
	demangleSymbol := false
	addVersion := false
	addLibrary := false
	for _, o := range options {
		switch {
		case o == Demangle:
			demangleSymbol = true
		case o == AddVersion:
			addVersion = true
		case o == AddLibrary:
			addLibrary = true
		}
	}

	suffix := ""
	if (sym.Info&0xf == 0x2 || sym.Info&0xf == 0x1 && sym.Section == 0 || sym.Section != 0xfff1) && sym.Version != "" {
		if sym.VersionScope == elf.VersionScopeHidden {
			suffix = "@" + sym.Version
		} else {
			suffix = "@@" + sym.Version
		}
	}

	symName := sym.Name

	if addLibrary {
		symName += suffix
	}

	if demangleSymbol {
		demangleName, err := demangle.ToString(sym.Name, demangle.NoRust)

		if err == nil {
			symName = demangleName
		}
	}

	_, dynVerNeedDep := getDynamicVersionNeed(sym, dynVerNeeds)

	if addVersion && dynVerNeedDep != nil {
		if dynVerNeedDep.Index > 1 {
			symName += fmt.Sprintf(" (%d)", dynVerNeedDep.Index)
		}
	}

	return symName
}

Which is equivalent to readelf.c#L14314

I'm getting lots of these:

    main_test.go:117: expected memcpy@GLIBC_2.17 (2) got memcpy@@GLIBC_2.17 (2)
    main_test.go:117: expected gai_strerror@GLIBC_2.17 (2) got gai_strerror@@GLIBC_2.17 (2)
    main_test.go:117: expected freeaddrinfo@GLIBC_2.17 (2) got freeaddrinfo@@GLIBC_2.17 (2)
    main_test.go:117: expected dlerror@GLIBC_2.34 (3) got dlerror@@GLIBC_2.34 (3)
    main_test.go:117: expected strlen@GLIBC_2.17 (2) got strlen@@GLIBC_2.17 (2)

@benbaker76
Copy link
Contributor Author

These changes appear to fix it for me

Screenshot 2024-12-12 at 1 20 12 PM

@ianlancetaylor
Copy link
Member

Thanks. This is the code from readelf:

  if (version_string)
    {
      if (sym_info == symbol_undefined)
	printf ("@%s (%d)", version_string, vna_other);
      else
	printf (sym_info == symbol_hidden ? "@%s" : "@@%s",
		version_string);
    }

I think that if you do something similar then there is no need to return VersionScopeHidden if the version is found in dynVerneeds. I think that corresponds more closely to what is in the ELF file.

I'm less clear on changing j == 0 to ndx == 0. Are you seeing cases where the the version is recorded as 0x8000 or 0x8001? In that case we should return VersionScopeHidden and set VersionIndex to 0 or 1.

@benbaker76
Copy link
Contributor Author

Thanks. This is the code from readelf:

I'm not having much luck getting it to work. I'll attach my test app that compares output with readelf. Perhaps you can get it matching without the changes?

readelf_cmp.zip

@ianlancetaylor
Copy link
Member

I get no reported differences with current Go tip and this patch to your readelf_cmp.go source code.

--- readelf_cmp.go	2024-12-11 23:20:30.000000000 -0800
+++ readelf_new.go	2024-12-12 10:17:50.685140567 -0800
@@ -167,9 +167,13 @@
 		}
 	}
 
+	_, dynVerNeedDep := getDynamicVersionNeed(sym, dynVerNeeds)
+
 	suffix := ""
 	if (sym.Info&0xf == 0x2 || sym.Info&0xf == 0x1 && sym.Section == 0 || sym.Section != 0xfff1) && sym.Version != "" {
-		if sym.VersionScope == elf.VersionScopeHidden {
+		if dynVerNeedDep != nil {
+			suffix = "@" + sym.Version
+		} else if sym.VersionScope == elf.VersionScopeHidden {
 			suffix = "@" + sym.Version
 		} else {
 			suffix = "@@" + sym.Version
@@ -190,8 +194,6 @@
 		}
 	}
 
-	_, dynVerNeedDep := getDynamicVersionNeed(sym, dynVerNeeds)
-
 	if addVersion && dynVerNeedDep != nil {
 		if dynVerNeedDep.Index > 1 {
 			symName += fmt.Sprintf(" (%d)", dynVerNeedDep.Index)

@benbaker76
Copy link
Contributor Author

I get no reported differences with current Go tip and this patch to your readelf_cmp.go source code.

Thanks, that fixes one issue. The only thing remaining for me is if I run it as-is I get:

$ go run readelf_cmp.go 
Found 3086 files
expected __fuse_exited got [email protected]
expected fuse_new got [email protected]
expected __fuse_loop_mt got [email protected]
expected __fuse_teardown got [email protected]
expected __fuse_process_cmd got [email protected]
expected __fuse_read_cmd got [email protected]
expected __fuse_set_getcontext_func got [email protected]
expected fuse_main got [email protected]
expected __fuse_setup got [email protected]
mismatch in /usr/lib/x86_64-linux-gnu/libfuse.so.2.9.9
expected __fuse_exited got [email protected]
expected fuse_new got [email protected]
expected __fuse_loop_mt got [email protected]
expected __fuse_teardown got [email protected]
expected __fuse_process_cmd got [email protected]
expected __fuse_read_cmd got [email protected]
expected __fuse_set_getcontext_func got [email protected]
expected fuse_main got [email protected]
expected __fuse_setup got [email protected]
mismatch in /usr/lib/x86_64-linux-gnu/libfuse.so.2
expected rados_tmap_put got [email protected]
expected rados_tmap_get got [email protected]
expected rados_tmap_update got [email protected]
expected rados_ioctx_pool_stat got [email protected]
mismatch in /usr/lib/x86_64-linux-gnu/librados.so.2.0.0
expected rados_tmap_put got [email protected]
expected rados_tmap_get got [email protected]
expected rados_tmap_update got [email protected]
expected rados_ioctx_pool_stat got [email protected]
mismatch in /usr/lib/x86_64-linux-gnu/librados.so.2

If I make these changes (change j to ndx)

	j := int32(f.ByteOrder.Uint16(s))
	ndx := int16(j & 0x7fff)

	if ndx == 0 {
		return ndx, "", "", VersionScopeLocal
	} else if ndx == 1 {
		return ndx, "", "", VersionScopeGlobal
	}

Now I get:

$ go run readelf_cmp.go 
Found 3086 files

@ianlancetaylor
Copy link
Member

That's interesting, it looks like some of the symbols in libfuse.so.2 have a versym entry of 0x8001. This would seem to indicate a hidden symbol in the base version, which I didn't think was possible.

Your suggested change doesn't seem quite right to me, because it loses the fact that the symbol is marked hidden. I think with our current approach we need to extend SymbolVersionScope. Something like https://go.dev/cl/636095.

@aclements
Copy link
Member

Repeating what I just posted to https://go.dev/cl/636095, which I probably should have posted here:

This strongly suggests that we need to treat "hidden" as orthogonal to local/global/specific, and not bundle it into one enum. We could add another bool to Symbol, or if we don't want to make Symbol larger, we could pack all of this (including the version) into a uint16 with accessors for "Scope", "Hidden", and "Version". In fact, other than VersionScopeNone, that could just be the ELF encoding.

Roughly:

type Symbol struct {
    ...

    Version SymbolVersion
}

type SymbolVersion uint16

// Scope describes the version in which the symbol is defined.
// When no symbol versioning information is available, this is VersionScopeNone.
func (SymbolVersion) Scope() SymbolVersionScope

func (SymbolVersion) Hidden() bool

// If Scope is VersionScopeSpecific or VersionScopeHidden, this is the version index.
// This is only set for the dynamic symbol table.
// This index will match either [DynamicVersion.Index] in the slice returned by
// [File.DynamicVersions], or [DynamicVersionDep.Index] in the Needs field
// of the elements of the slice returned by [File.DynamicVersionNeeds].
// In general, a defined symbol will have an index referring to DynamicVersions,
// and an undefined symbol will have an index referring to some version in DynamicVersionNeeds.
func (SymbolVersion) Index() int16

// SymbolVersionScope describes the version in which a [Symbol] is defined.
// This is only used for the dynamic symbol table.
type SymbolVersionScope byte

const (
    VersionScopeNone SymbolVersionScope = iota // no symbol version available
    VersionScopeLocal // symbol has local scope
    VersionScopeGlobal // symbol has global scope
    VersionScopeSpecific // symbol is in the version given by VersionIndex
)

(I'm not quite sure how to fit VersionScopeNone into this. Would it be okay to drop that and return VersionScopeLocal if there's no version info?)

@ianlancetaylor
Copy link
Member

Another approach we can take is to drop VersionScopeGlobal and VersionScopeLocal and record them as 1 and 0 in the VersionIndex field. That still retains all the information at the cost of forcing people to be a bit more careful in how they interpret VersionIndex (because there won't be a matching version entry).

@aclements
Copy link
Member

In that case would VersionScope only report None/Specific/Hidden? (I think we'd need a different term for Specific in that case, and maybe it's not even a "Scope" any more.)

@ianlancetaylor
Copy link
Member

Stepping back, what we're trying to express is that a symbol version is a 16-bit integer. The high bit is set if the symbol is hidden (meaning that the symbol is only usable with an explicit version specification). The rest of the integer is a version index. The values 0 is special. Other values correspond to the DynamicVersion.Index field or the DynamicVersionDep.Index field. The value 0 means that the symbol does not have a version, and is not visible outside the dynamic object. The value 1 corresponds to a DynamicVersion.Index field, but it's a special one that isn't a real version.

And, of course, besides those values, we need to express the possibility that the symbol has no version.

From a linking perspective, I don't see any reason for a symbol version of 0x8001. That means that the symbol is in the base version, meaning that it has no version. And it means that the symbol is hidden, meaning that the version must be explicitly specified. But there is no version, so there is nothing to explicitly specify.

In the only case I know where it happens. 0x8001 appears for a symbol like __fuse_exited. This symbol is an alias for a different symbol fuse_exited with version index 2 (version FUSE_2.2, not hidden). The readelf program prints this as __fuse_exited with no version and fuse_exited@@FUSE_2.2.

In the libfuse sources this is due to a line

__asm__(".symver fuse_exited,__fuse_exited@");

In other words: a hidden symbol with no version, just as the object file is recording.

I'm not yet sure what is best here.

@aclements
Copy link
Member

My sense is that we're both under- and over-interpreting this data for the consumer.

One thing I liked in my proposed type SymbolVersion uint16 is that can literally be the value from the ELF file and we can provide helpers to interpret it. (Maybe we need a bool field to indicate if there's any version info?)

If we wanted to heavily interpret this, we would do the index lookup for people, which isn't the direction of this API.

Dropping VersionScopeGlobal and VersionScopeLocal leaves the "hidden" bit as the only thing we're interpreting, which feels like a weird middle ground to me. Why interpret that one bit and not the special meanings of 0 and 1?

Another option is we just give people the type SymbolVersion uint16 without methods and the consts VER_NDX_LOCAL and VER_NDX_GLOBAL. As far as I can tell there's no actual name for the hidden bit. I think this would be fine, honestly, but we might as well make it easier to interpret that encoding with a few methods.

@benbaker76
Copy link
Contributor Author

One thing I liked in my proposed type SymbolVersion uint16 is that can literally be the value from the ELF file and we can provide helpers to interpret it. (Maybe we need a bool field to indicate if there's any version info?)

Your proposal is much like an earlier one I made here

If we wanted to heavily interpret this, we would do the index lookup for people, which isn't the direction of this API.

Dropping VersionScopeGlobal and VersionScopeLocal leaves the "hidden" bit as the only thing we're interpreting, which feels like a weird middle ground to me. Why interpret that one bit and not the special meanings of 0 and 1?

I believe that's why we originally went with an enum and flags to set the hidden bit.

Another option is we just give people the type SymbolVersion uint16 without methods and the consts VER_NDX_LOCAL and VER_NDX_GLOBAL. As far as I can tell there's no actual name for the hidden bit. I think this would be fine, honestly, but we might as well make it easier to interpret that encoding with a few methods.

In common.h hidden bit and mask are called

/* This flag appears in a Versym structure.  It means that the symbol
   is hidden, and is only visible with an explicit version number.
   This is a GNU extension.  */

#define VERSYM_HIDDEN		0x8000

/* This is the mask for the rest of the Versym information.  */

#define VERSYM_VERSION		0x7fff

Yes and I did make a similar proposal to this here

@ianlancetaylor
Copy link
Member

Yes, I did suggest moving IsHidden out earlier. I didn't really consider a type with methods, or accessor functions like the current STB_BIND.

I do think we need to have a way to record "no version information."

So that leads us to changing everything around yet again, and coming up with

type Symbol {
    ...

    // HasVersion reports whether the symbol has any version information.
    HasVersion bool
    // VersionIndex is the symbol's version index.
    // Use the methods of the VersionIndex type to access it.
    VersionIndex VersionIndex
    ...
}

// VersionIndex is the type of a symbol version index.
type VersionIndex uint16

// IsHidden reports whether the symbol is hidden within the version.
// This means that the symbol can only be seen by specifying the exact version.
func (vi VersionIndex) IsHidden(bool) { return vi&0x8000 != 0 }

// Index returns the version index.
// If this is the value 0, it means that the symbol is local and not visible externally.
// If this is the value 1, it means that the symbol is in the base version, and has no specific version.
// Other values will match either [DynamicVersion.Index]
// in the slice returned by [File.DynamicVersions],
// or [DynamicVersiondep.Index] in the Needs field
// of the elements of the slice returned by [File.DynamicVersionNeeds].
func (vi VersionIndex) Index() uint16 { return uint16(vi & 0x7fff) }

If we adopt this approach then SymbolVersionScope goes away again.

@aclements
Copy link
Member

I'm good with @ianlancetaylor 's latest API

@benbaker76
Copy link
Contributor Author

I'm good with @ianlancetaylor 's latest API

Me too

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/637235 mentions this issue: debug/elf: adjust version API per issue discussion

gopherbot pushed a commit that referenced this issue Dec 17, 2024
This updates the new version API for the discussion on #63952.

Note that the current tests do not have symbols with hidden versions.
Leaving that for later.

For #63952

Change-Id: I1ad4b1e485429a216ba8e5b68f7f4299d120628f
Reviewed-on: https://go-review.googlesource.com/c/go/+/637235
Reviewed-by: Ian Lance Taylor <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Commit-Queue: Ian Lance Taylor <[email protected]>
@benbaker76
Copy link
Contributor Author

Here's an updated readelf_cmp.go which is running successfully.

readelf_cmp.zip

I didn't find a need for HasVersion though.

@ianlancetaylor
Copy link
Member

It seems to me that we need HasVersion to detect the case of a shared library with no versioning information. We don't want to pretend that everything is in the local version for that case.

@aclements
Copy link
Member

The proposal committee had no objections to the re-updated API @ianlancetaylor proposed (other than the typo that IsHidden should return a bool, not take it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

Successfully merging a pull request may close this issue.

7 participants