-
-
Notifications
You must be signed in to change notification settings - Fork 62
OPDS catalog languages endpoint #553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #553 +/- ##
==========================================
+ Coverage 64.92% 65.17% +0.25%
==========================================
Files 53 53
Lines 3738 3765 +27
Branches 1874 1886 +12
==========================================
+ Hits 2427 2454 +27
Misses 1309 1309
Partials 2 2
Continue to review full report at Codecov.
|
@veloman-yunkan can we have the ISO codes as well in the response they are required for making requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@veloman-yunkan Thank you for your quick move on this! Regarding the XML, I have the feeling you try to follow OPDS formalism and I'm not familliar with these requirements. I prefer to let you judge with @mgautierfr on this.
What I see is that this is pretty verbose and what I would have basically foreseen is the title
node and a missing node with the ISO 639-3 language code (from the ZIM meta info Language
) to allow filtering (like requested by @MananJethwani). Maybe this should be used in place of the current id
node value? If OPDS expects from us the other fields, I would obviously keep them. Otherwise I would recommend to consider removing them if no real usage/reason for them.
Could you please as well please document the new end-points (at least the URL for the moment) at https://wiki.kiwix.org/wiki/OPDS, for both language and category end-points? This should be documented properly if we want people to use these new end-points.
One additional side remark: Kiwix Android, in its current filter displays the number of books per language. Considering that we only display the language with books, I wonder if adding a node with the number of books would be apropriate in this feed?
@MananJethwani The language code was already present in the |
@veloman-yunkan I get this error when I try to fetch data from the server response in logs ->
I don't know if this is intentional and you are working on it but thought it would be good to mention.😅 |
@MananJethwani That's expected. The language table currently contains only three languages used in the unit tests. |
@veloman-yunkan OK, so lets wait to @mgautierfr feedback to get an agreement on the XML format so you can complete the PR with all available ZIM languages and fix the error. |
We don't yet want people to start using these new end-points. We will document them once the new API is stable.
I am not going to judge about the appropriateness of this enhancement, but I see no difficulty implementing it. |
Seems good enough for me.
Do we need human-friendly names ? (To answer your question, it is probably better to use icu to handle languages)
This could be expected that we don't support language we don't know. But we must not have a internal server error.
The OPDS Spec speaks about |
The proposed "title" node is the language fullname or friendlyname. It is important to put this language fullname in the OPDS stream to make it directly usable. We can not assume here the OPDS indgester will have access to libkiwix ABI. The mapping beetween code and fullname should be done with libicu indeed. if no mapping, i would avoid to create a "title" node or put an empty string value. |
We COULD add the fullname as a helper yes. But it is not our primary data. And one important point here is that we don't know the language of the client, so the fullname will be in the language itself or in English.
Even more, we must design a API usable without libkiwix. That is why we need to design the stream closest to the standard. And the standard is ISO 639 (either version 639-1, 639-2 or 639-3. And we must decide now as it is part of the spec.) |
Language name should be in its own language.
For now, for me, this is not a requirement, at any level. Not sure if it will ever be required to have language names translated.
Like stated in an earlier comment, we need ISO 639-3 because this is what is used in the ZIM metadata. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions. |
@veloman-yunkan Do I'm right, you need to finish this Pr? |
This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions. |
@kelson42 Yes, I will finish it. |
23c3a3a
to
0198cda
Compare
The version of libicu built by kiwix-build explicitly excludes the language data (among other types of data). The conversion from language code to human-friendly names is achieved in new commit 0198cda paired with kiwix/kiwix-build#494. Note that since language data in ICU contains much more information than we need for this enhancement, libicu (as modified by kiwix/kiwix-build#494) gets bigger by a couple of megabytes, whereas the solution utilizing a custom map would need only a few kilobytes in libkiwix. Please advise which solution should be adopted for this PR. |
Lets "pay" the few MBs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small issues open to discussion.
Else we have to finish kiwix/kiwix-build#494 and wait for nightly ci to run to have the last icu data.
src/opds_dumper.cpp
Outdated
std::string getLanguageSelfName(const std::string& lang) { | ||
return langMap.at(lang).selfName; | ||
const icu::Locale locale(lang.c_str()); | ||
icu::UnicodeString ustring; | ||
locale.getDisplayLanguage(locale, ustring); | ||
std::string result; | ||
ustring.toUTF8String(result); | ||
return result; | ||
}; | ||
|
||
std::string getLanguageEnglishName(const std::string& lang) { | ||
return langMap.at(lang).englishName; | ||
const icu::Locale locale(lang.c_str()); | ||
icu::UnicodeString ustring; | ||
locale.getDisplayLanguage(icu::Locale("en"), ustring); | ||
std::string result; | ||
ustring.toUTF8String(result); | ||
return result; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about the performance of this (although not really important).
Creating the icu::Locale may be time conssuming (to check).
Maybe we could have only one method returning a pair of string and so get the locale only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about simply dropping getLanguageEnglishName()
. @kelson42 considers the current format of the language entry pretty verbose, so removing the <content>
node will solve both concerns at the same time.
Switching to ICU-based language handling eliminated the crash. However, for a language code unknown to ICU the code text is returned unmodified as the human-friendly name. Is that acceptable? An alternative is to convert it to a string of the form "Unknown language langcode" or similar. |
Acceptable, at least for now and in the libkiwix. |
This is also done |
0dae28e
to
2be68a9
Compare
@veloman-yunkan I have rebased. Is this PR ready for a new review pass? If yes, please re-request a review. BTW, part of the CI seems to fail. |
@kelson42 kiwix/kiwix-build#494 must be merged first |
This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions. |
2be68a9
to
2b517d1
Compare
I have rebased. |
@veloman-yunkan why is |
@MananJethwani That is already the case. The title contains the full name of the language in itself. The ISO code (but ISO-3 rather than ISO-2) is in the |
2b517d1
to
6cd950f
Compare
Here is how it looks like to me:
Looks to be the kind of thing I was expected. LGTM |
sorry I might have checked and older version LGTM as well |
Added a new entry in /catalog/v2/root.xml that points to a not-yet-existing list of languages navigation feed.
Introduced a helper function `Library::getBookPropValueSet()` and deduplicated Library::getBooks{Languages,Creators,Publishers}() methods.
Language code to human friendly name translation is now done with the help of the ICU library. It works if the line ``` -include $(LANGSRCDIR)/resfiles.mk ``` in the file `source/data/Makefile.in` of the icu4c dependency is not commented out. Currently, the said line is commented out (along with some other include's) by the `icu4c_custom_data.patch` patch of the `kiwix-build` tool.
6cd950f
to
ab30957
Compare
Look good to me also (code and opds output). Rebased-fixup on master. |
Fixes #584
This is a preliminary version of the /catalog/v2/languages endpoint intended for collecting early feedback.
Is the format of the language list navigation feed OK?
Is there a better way of translating language ISO codes to human-friendly names than using a hardcoded table?