Calculating the Publication.positionList

For each format, we need to specify:
- How to create the `positionList`?
- How to reliably find the currently displayed position in the `positionList`?

Related issue: [Total progression in a publication for locators](https://github.com/readium/architecture/issues/90)

### CBZ and PDF

These formats are straightforward: we can directly read the number of pages (PDF) or files (CBZ) to build the `positionList`. To retrieve the current position, we only need the index of the page.

PDF can be a bit less efficient because we need to open the file (and potentially load it entirely in memory, e.g. with Swift) to read its number of pages.
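
To make this concrete, here is a minimal sketch in Python. The function names and the dictionary shape of a position are hypothetical, chosen only for illustration, and 1-based position numbers are an assumption rather than something decided in this issue.

```python
def cbz_position_list(image_hrefs):
    """One position per image file in the archive (assumed 1-based)."""
    return [
        {"href": href, "position": index + 1}
        for index, href in enumerate(image_hrefs)
    ]


def pdf_position_list(href, page_count):
    """One position per page of a single PDF (assumed 1-based)."""
    return [
        {"href": href, "page_index": page_index, "position": page_index + 1}
        for page_index in range(page_count)
    ]


def current_position(position_list, page_index):
    # The index of the visible page maps directly onto the list index.
    return position_list[page_index]
```

With this shape, `current_position(pdf_position_list("book.pdf", 3), 2)` returns the entry for the third page.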


### LCPDF

An LCPDF contains encrypted PDFs, so we can't really get the `positionList` until the license is unlocked. It might also contain several PDFs, which is inefficient if we have to open all of them to calculate the `positionList`.

An alternative would be to store the number of pages as a link property on each resource in the RWPM. Building the `positionList` then becomes really efficient and doesn't require the publication's passphrase.

The `positionList` is built by adding up the number of pages of each PDF in the `readingOrder`. Here's an example implementation in Swift: https://github.com/readium/r2-navigator-swift/blob/839e0c4900a84b9e337e7a3d836f0b78c7d9c28b/r2-navigator-swift/PDF/PDFNavigatorViewController.swift#L50

We can find out the current position easily by keeping a separate array of positions for each resource `href`, and using the page index of the currently visible resource (e.g. https://github.com/readium/r2-navigator-swift/blob/839e0c4900a84b9e337e7a3d836f0b78c7d9c28b/r2-navigator-swift/PDF/PDFNavigatorViewController.swift#L221).
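
As a rough illustration of the two steps above (this is not a port of the linked Swift code), here is a Python sketch. It assumes the page count of each PDF is already known, for instance from a link property in the RWPM. All names and the position shape are hypothetical.

```python
def build_positions(reading_order):
    """Build the global positionList and a per-href index.

    `reading_order` is a list of (href, page_count) pairs, in reading order.
    """
    position_list = []
    positions_by_href = {}
    for href, page_count in reading_order:
        resource_positions = []
        for page_index in range(page_count):
            position = {
                "href": href,
                "page_index": page_index,
                # Global, 1-based position across all PDFs (assumption).
                "position": len(position_list) + 1,
            }
            position_list.append(position)
            resource_positions.append(position)
        positions_by_href[href] = resource_positions
    return position_list, positions_by_href


def current_position(positions_by_href, href, page_index):
    # Look up the visible resource, then its visible page.
    return positions_by_href[href][page_index]
```

For two PDFs of 2 and 3 pages, the second page of the second PDF resolves to global position 4.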

### EPUB

The tricky part that needs to be discussed...

#### How to create the `positionList`?

Among the solutions discussed to split a resource into pages:
- **characters:** This might be more accurate, but we need to parse each resource to calculate it, and we need the passphrase if the book is encrypted with LCP.
- **bytes:** This is quick and easy (read from the ZIP entries or encryption.xml for LCP) and reliable enough in my opinion. However, there's no way we can match bytes with the DOM in a web view (not necessarily a problem if we don't match accurately, see section below).
- **scroll size:** This is the most accurate on a given device (it takes images and layout into account), but highly inefficient since we have to load all the resources in a web view in the background. Moreover, it doesn't work well across devices because the calculated `positionList` might be different. Not a good solution IMHO.

Both the characters and bytes methods are pretty reliable to express the relative size between reading order resources and publications, as long as the chapters are not image-based.
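
Here is a minimal Python sketch of the bytes method. The 1024 bytes per position is an arbitrary value picked for the example, not something agreed upon, and the names and position shape are hypothetical. The resource lengths are assumed to have been read beforehand from the ZIP entries or encryption.xml.

```python
import math


def build_position_list(resources, bytes_per_position=1024):
    """Split each resource into positions based on its byte length.

    `resources` is a list of (href, length_in_bytes) pairs, in reading order.
    """
    position_list = []
    for href, length in resources:
        # Every resource gets at least one position, even if it is empty.
        count = max(1, math.ceil(length / bytes_per_position))
        for index in range(count):
            position_list.append({
                "href": href,
                # Progression of the start of this position in the resource.
                "progression": index / count,
                "position": len(position_list) + 1,
            })
    return position_list
```

A 2500-byte chapter followed by a 100-byte chapter yields 3 + 1 = 4 positions. The same function would work for the characters method by passing character counts instead of byte lengths.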

#### How to reliably find the currently displayed position in the `positionList`?

I think we agreed on a call that there's no way to accurately find the current position in an EPUB. The DOM displayed in a web view is dynamic and might not be equivalent to the one parsed from the static XHTML files. We can however approximate it:
- **Using progression in the resource to calculate the index of the position:** So far the `progression` has been a pretty reliable way to position a page in a web view, and it could work well across platforms here too. It's not such a problem if we don't match the exact position that was split arbitrarily (bytes or characters), as long as we are reliably imprecise across devices. We need to end up at the same page when sharing a `position` index between devices. On the plus side, this is easy to implement, so we can run some tests quickly.
- **Calculating the character offset:** This might be a more reliable way to match exactly the position if it was parsed using characters. It might not be 100% reliable though since the DOM in the web view is not the same as in the static XHTML. Moreover, it is much more complicated and the added value is not clear to me compared to using progression.
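
The progression approach could be sketched as follows in Python. The function name is hypothetical, and `progression` is assumed to be a number between 0 and 1 reported by the web view for the current resource.

```python
def position_index_from_progression(progression, position_count):
    """Map a progression in [0, 1] to an index among a resource's positions."""
    if position_count <= 0:
        raise ValueError("a resource must have at least one position")
    # Clamp in case the web view reports a value slightly out of range.
    clamped = min(max(progression, 0.0), 1.0)
    # A progression of exactly 1.0 belongs to the last position.
    return min(int(clamped * position_count), position_count - 1)
```

Since this only depends on the progression and the number of positions in the resource, two devices reporting the same progression end up on the same position, even if neither matches the original byte or character split exactly.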

#### Fixed layout vs reflowable

There's the added difficulty that an EPUB can contain both fixed layout and reflowable resources. Fixed layout is straightforward: one resource = one page. But we need to take it into account when calculating the `positionList`, instead of only splitting by characters/bytes.
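
A possible way to handle mixed publications, sketched in Python under the same assumptions as the splitting methods discussed earlier (hypothetical names, arbitrary 1024 units per position, lengths known in advance):

```python
import math


def position_count(length, is_fixed_layout, units_per_position=1024):
    """Number of positions for one resource (units are bytes or characters)."""
    if is_fixed_layout:
        # Fixed layout: one resource = one page, whatever its size.
        return 1
    return max(1, math.ceil(length / units_per_position))


def build_position_list(resources):
    """`resources` is a list of (href, length, is_fixed_layout) tuples."""
    position_list = []
    for href, length, is_fixed_layout in resources:
        for _ in range(position_count(length, is_fixed_layout)):
            position_list.append({
                "href": href,
                "position": len(position_list) + 1,
            })
    return position_list
```

Here a large fixed layout cover still counts as a single position, while a 3000-byte reflowable chapter is split into three.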

### Side discussion

Calculating the `positionList` might be slow and memory/CPU-intensive (e.g. for LCPDF we have to load all the PDFs in memory). I don't think that it's necessary to expose an asynchronous API for `Publication.positionList`. The caller can wrap it in a background process if it doesn't need the `positionList` synchronously.

However, we could benefit from having a cache in the streamer to store the calculated `positionList` (eg. as JSON).

- If we create a cache, it must be extensible for other types of data that we might need in the future (e.g. information parsed from each resource needed by the navigator).
- The cache data should be associated with the [publication's release identifier](https://www.w3.org/publishing/epub3/epub-packages.html#sec-metadata-elem-identifiers-pid) and not the file path, to avoid duplicates and outdated `positionList`. Note that we don't expose the release identifier in Publication, but it can be retrieved privately directly in the streamer for EPUB.
- I don't think this should be persisted by the testapp itself, because this data is actually required by the navigator. It would complicate usage and increase the risk of wrong data, putting accurate positioning at risk.
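
To illustrate the first two points, here is a rough Python sketch of such a cache. The class, the `kind` parameter and the file layout are all hypothetical; the only parts taken from the discussion are the JSON storage and the release identifier used as the key.

```python
import hashlib
import json
from pathlib import Path


class PublicationCache:
    """Stores JSON data per publication, keyed by release identifier."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, release_identifier, kind):
        # Hash the identifier so it is always a safe file name.
        digest = hashlib.sha256(release_identifier.encode("utf-8")).hexdigest()
        # `kind` keeps the cache open to other data than the positionList.
        return self.directory / f"{digest}.{kind}.json"

    def load(self, release_identifier, kind):
        path = self._path(release_identifier, kind)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def store(self, release_identifier, kind, data):
        path = self._path(release_identifier, kind)
        path.write_text(json.dumps(data), encoding="utf-8")
```

Because the key is the release identifier and not the file path, two copies of the same release share one entry, and a new release of the same book gets a fresh one.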
