Skip to content

Add chars_stream/1 and chars_to_stream/{2,3} #2968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

jjtolton
Copy link

@jjtolton jjtolton commented May 24, 2025

Intended usage:

:- use_module(library(charsio)).
:- use_module(library(lists)).

?-   (
      Phrase="can convert string to char stream",
      length(Phrase, N),
      chars_to_stream(Phrase, Stream),
      get_n_chars(Stream, N, Chars),
      Phrase=Chars
     ).

Currently, I have implemented this in the dumbest possible way. Suggestions appreciated.

@triska
Copy link
Contributor

triska commented May 24, 2025

Thank you a lot for working on this!

What is the source/target of the resulting stream? Would it be more useful to make it so that the source/target is a string?

@jjtolton
Copy link
Author

jjtolton commented May 24, 2025

Thank you a lot for working on this!

What is the source/target of the resulting stream? Would it be more useful to make it so that the source/target is a string?

Oh interesting. What did you have in mind? memory_stream/2 where the first argument is a string?

@triska
Copy link
Contributor

triska commented May 24, 2025

memory_stream/2 where the first argument is a string?

Yes exactly, chars_stream(+Chars, -Stream, +Options) where we can open Chars as a stream and then read from it. It would be useful as a building block already to bootstrap existing predicates from library(charsio).

Alternatively, maybe open/3 with chars(Chars) as admissible source and even sink?

@jjtolton
Copy link
Author

Interesting. Would the 2 arrity be string and stream or stream and opts?

@jjtolton
Copy link
Author

jjtolton commented May 24, 2025

Tests are extremely flakey, I assume it has something to do with the hack I used to implement the memory stream.

$ scryer-prolog -f --no-add-history src/tests/charsio.pl -f -g "use_module(library(charsio_tests)), charsio_tests:main(charsio_tests)"
Running test "can create string char stream"
Running test "can spell simple word with char stream"

thread 'main' panicked at src/arena.rs:210:5:
value contains invalid bit pattern for field ArenaHeader.tag: InvalidBitPattern { invalid_bytes: 0 }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
$ scryer-prolog -f --no-add-history src/tests/charsio.pl -f -g "use_module(library(charsio_tests)), charsio_tests:main(charsio_tests)"
Running test "can create string char stream"
Running test "can spell simple word with char stream"
Running test "can read from and write to char stream"
Running test "can convert string to char stream"
Running test "can convert string to char stream with options"

Edit: tests fine now after better implementation

@jjtolton jjtolton changed the title Add memory_stream/1 Add chars_to_stream/{1,2,3} May 24, 2025
@jjtolton jjtolton changed the title Add chars_to_stream/{1,2,3} Add char_stream/1 and chars_to_stream/{2,3} May 24, 2025
@@ -393,3 +397,52 @@
; '$chars_base64'(Cs, Bs, Padding, Charset)
)
).

%% put_chars(+Stream, +Chars). %
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this we have format/3 from library format.

@triska
Copy link
Contributor

triska commented May 25, 2025

For instance, even chars_utf8bytes/2 (chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!

@jjtolton
Copy link
Author

For instance, even chars_utf8bytes/2 (chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!

Interesting, do you suppose $memory_stream/1 could opened as binary?

This line of code is very mysterious to me but I will investigate.

 let stream = Stream::from_owned_string("".to_string(), &mut self.machine_st.arena);

@jjtolton
Copy link
Author

Does anyone know that the self.machine_st.arena is and why it's getting passed to Stream::from_owned_string?

@@ -445,4 +426,4 @@
chars_to_stream(Chars, Stream, _) :-
must_be(chars, Chars),
'$memory_stream'(Stream),
put_chars(Stream, Chars).
maplist(put_char(Stream), Chars).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format/3 should make this much faster, since it writes more at once.

@jjtolton
Copy link
Author

For instance, even chars_utf8bytes/2 (chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!

Interesting, do you suppose $memory_stream/1 could opened as binary?

This line of code is very mysterious to me but I will investigate.

 let stream = Stream::from_owned_string("".to_string(), &mut self.machine_st.arena);

Oh, perhaps it already is in binary?

    #[inline]
    pub fn from_owned_string(string: String, arena: &mut Arena) -> Stream {
        Stream::Byte(arena_alloc!(
            StreamLayout::new(CharReader::new(ByteStream(Cursor::new(
                string.into_bytes()
            )))),
            arena
        ))
    }

@triska
Copy link
Contributor

triska commented May 25, 2025

The options must be useable to specify the type (text or binary) of the resulting stream. In this way, we can access the encoded bytes directly, and also get Unicode characters from the stream when needed. Reading a term from characters could be bootstrapped from this.

@triska
Copy link
Contributor

triska commented May 25, 2025

In the original design, what was supposed to be the source of the memory stream, i.e., where did the characters come from? (The original predicate only had a single argument, the stream.)

@jjtolton
Copy link
Author

The options must be useable to specify the type (text or binary) of the resulting stream. In this way, we can access the encoded bytes directly, and also get Unicode characters from the stream when needed. Reading a term from characters could be bootstrapped from this.

Sounds good, I will investigate in the morning

@jjtolton
Copy link
Author

In the original design, what was supposed to be the source of the memory stream, i.e., where did the characters come from? (The original predicate only had a single argument, the stream.)

Semantically I just assumed it would be an empty stream for writing into.

@jjtolton jjtolton changed the title Add char_stream/1 and chars_to_stream/{2,3} Add chars_stream/1 and chars_to_stream/{2,3} May 25, 2025
@jjtolton
Copy link
Author

Does anyone have any examples of byte semantics so I can write a spec to work against? i.e., what would it look like to write a byte to (or read from) a stream vs text?

@bakaq
Copy link
Contributor

bakaq commented May 25, 2025

Bytes would allow having 0xFF bytes for example, which is invalid UTF-8. I guess just generate some random uniform bytes and see if they round-trip.

@triska
Copy link
Contributor

triska commented May 25, 2025

You can use open/3 to open a file with type(binary) to test it!

@jjtolton
Copy link
Author

jjtolton commented May 26, 2025

I settled on this as an acceptable spec:

test("can read bytes",
     (
      chars_to_stream("abc", Stream, [type(binary)]),
      get_byte(Stream, A),
      get_byte(Stream, B),
      get_byte(Stream, C),
      A=97,
      B=98,
      C=99
      )).

In studying how open/4 processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? 🤔

@Skgland
Copy link
Contributor

Skgland commented May 26, 2025

In studying how open/4 processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? 🤔

The options appear to be parsed in Prolog code at https://github.com/mthom/scryer-prolog/blob/rebis-dev/src%2Flib%2Fbuiltins.pl#L1964

@jjtolton
Copy link
Author

In studying how open/4 processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? 🤔

The options appear to be parsed in Prolog code at https://github.com/mthom/scryer-prolog/blob/rebis-dev/src%2Flib%2Fbuiltins.pl#L1964

amazing... I thought builtins.pl was just autogenerated stub code until now!

@jjtolton
Copy link
Author

Ok I ripped off the parser from builtins.pl as a proof-of-concept, and managed to get the memory stream reading/writing bytes.

Is it ok that this means that the "chars" could also be bytes, depending on the options?

test("can read/write bytes",
     (
      A=97,
      B=98,
      C=99,
      chars_to_stream([A,B,C], Stream, [type(binary)]),
      get_byte(Stream, A),
      get_byte(Stream, B),
      get_byte(Stream, C),
      put_byte(Stream, A),
      put_byte(Stream, B),
      put_byte(Stream, C),
      get_byte(Stream, A),
      get_byte(Stream, B),
      get_byte(Stream, C)
      )).

or should we come up with different semantics? Because I'm not sure it's always "chars" now.

Also, if anyone has any better suggestions for the implementation of parsing the options, I'd certainly be open to that feedback.

@@ -423,7 +423,85 @@
% Stream = stream('$memory_stream'(2048)).
% ```

chars_to_stream(Chars, Stream, _) :-
must_be(chars, Chars),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete this? The interface should be checked prior to any side-effects.

@jjtolton
Copy link
Author

jjtolton commented May 30, 2025

Ok I cleaned up the parsing code and the validation logic. The last two commits are a little outside my comfort/confidence zone in terms of approach and style, so if anyone has any suggestions I'd certainly be interested. I'll take at least one more refactoring pass to clean up white space issues and layout.

Especially the error messages, I just totally made those up.

%% chars_to_stream(+Chars, -Stream, +Options) :-
% Creates a character stream from a list of characters.
%
% Chars is the list of characters to write to the stream.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chars is the list of characters that the stream will yield, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants