[Scheme-reports] get-output-string on closed ports

Discussion:

Alex Shinn

2015-01-26 03:17:07 UTC

None of SRFI 6, R6RS or R7RS specify what happens when you
call get-output-string on a string port which has been closed.

John took a survey (http://trac.sacrideo.us/wg/wiki/GetFromClosedStringPort)
and it looks like the de facto standard is that this "is an error."
I'm inclined to add an errata to that effect (similarly for
get-output-bytevector).
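
The case in question can be sketched as a small R7RS probe. This is only an illustration, not part of the survey: the helper name probe and its parameters are hypothetical, and the guard only covers implementations that raise a catchable exception.

```scheme
(import (scheme base))

;; Hypothetical probe: write to a fresh port, close it, then try to
;; extract the contents. Returns the extracted value, or the symbol
;; 'raised if the implementation raises an exception.
(define (probe open write-to extract)
  (let ((port (open)))
    (write-to port)
    (close-output-port port)
    (guard (e (#t 'raised))
      (extract port))))

;; Outcome for string ports: implementation-dependent.
(define string-result
  (probe open-output-string
         (lambda (p) (write-string "hello" p))
         get-output-string))

;; Outcome for bytevector ports: implementation-dependent.
(define bytevector-result
  (probe open-output-bytevector
         (lambda (p) (write-u8 104 p))
         get-output-bytevector))
```

Which of the two outcomes appears depends on the implementation, which is exactly what none of the reports pin down.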

The pros to allowing access to an already closed port are:
1. the reference and many other implementations allow it
2. it can be useful in some idioms

The cons are:
1. it's already an error in many implementations
2. close-output-port is expected to free resources
3. arguably relying on it is poor coding style

Thoughts?

--
Alex

Takashi Kato

2015-01-30 07:31:52 UTC

I think accessing a closed port should be an error. The reasons are already listed :)

Those procedures might be somewhat exceptional, but if they are allowed to access a
closed port then, for me, why not others such as get-u8? So as long as it's an error
in general, I would prefer to keep it.

Cheers,

_/_/
Takashi Kato
Email: ***@ymail.com

William D Clinger

2015-01-30 15:25:29 UTC

Please do. Reading from or writing to a closed port should always
be an error.

Will Clinger

Taylan Ulrich Bayırlı/Kammer

2015-01-30 21:14:27 UTC

Post by William D Clinger
Please [make it an error]. Reading from or writing to a closed port
should always be an error.

Is using `get-output-string' really "reading" from the port?

As Ray Dillinger pointed out, there is a more general mismatch between
the APIs of file ports and string ports, and as Michael Montague pointed
out, this seems to make usages of `get-output-string' on closed string
ports desirable. Therefore I would see `get-output-string' as a special
case and not expect it to conform to typical expectations of a port API.

Taylan

c***@ccil.org

2015-01-30 21:52:16 UTC

Post by Taylan Ulrich Bayırlı/Kammer
Is using `get-output-string' really "reading" from the port?

In SRFI 6 and R7RS, it is not; it simply retrieves all the
characters that have ever been written to the port and does
not side-effect the port. The R6RS analogue, however, retrieves
what is currently available on the port and then removes those
characters from the port, which is much more like reading.
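
A minimal R7RS sketch of the non-destructive behaviour described above; the variable names are illustrative only.

```scheme
(import (scheme base))

(define port (open-output-string))

(write-string "ab" port)
;; First extraction: everything written so far.
(define snapshot-1 (get-output-string port))

(write-string "cd" port)
;; Second extraction: the earlier characters are still there,
;; because get-output-string does not reset the port.
(define snapshot-2 (get-output-string port))
```

With the R6RS analogue, the second extraction would instead see only the characters written since the first one.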

--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Deshil Holles eamus. Deshil Holles eamus. Deshil Holles eamus.
Send us, bright one, light one, Horhorn, quickening, and wombfruit. (3x)
Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa!
--Joyce, Ulysses, "Oxen of the Sun"

Michael Montague

2015-01-30 17:52:19 UTC

Let's say I have an existing routine 'write-stuff-and-close' which works
just fine with file ports and seems like a reasonable thing to do. If
get-output-string on a closed port is an error, then I can use a string
output port with the routine, but I can't get at the output. This seems
like an arbitrary restriction.

_______________________________________________
Scheme-reports mailing list
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

Ray Dillinger

2015-01-30 18:24:06 UTC

Post by Michael Montague
Let's say I have an existing routine 'write-stuff-and-close' which works
just fine with file ports and seems like a reasonable thing to do. If
get-output-string on a closed port is an error, then I can use a string
output port with the routine, but I can't get at the output. This seems
like an arbitrary restriction.

And what, exactly, does it mean to "close" a string port?
Is there any reason why you cannot or should not immediately
reopen the string port you just passed 'write-stuff-and-close'
when it returns if you want to read from it? After all,
you'd have to do the same with any file port if you wanted
to read from it after passing it to that routine.

Bear

Michael Montague

2015-01-30 18:32:25 UTC

Post by Ray Dillinger

If there was a way to reopen an output string port and read from it that
would work and there would be no issue. But the only way to get the
contents of an output string port is via get-output-string.

Ray Dillinger

2015-01-30 18:42:36 UTC

Post by Michael Montague

Post by Ray Dillinger

Ah! I see, there is no place in the standard where it says you have to
be able to use ports with strings in all the same ways you use ports
with files. I don't know how I missed that, I should have objected to
its absence before now!

If we have string ports in the first place, the inability to use
them like files - ie, open ports on them for reading at need - is
more likely to be the standardization problem that needs to be
addressed than any idea that reading from a closed port is other
than an error.

This also brings up the interesting possibility that get-output-string
ought to be defined on files as well, converting the contents of the
whole file into a string.

Bear

Jim Rees

2015-01-30 18:24:42 UTC

Post by Alex Shinn
John took a survey (http://trac.sacrideo.us/wg/wiki/GetFromClosedStringPort)
and it looks like the de facto standard is that this "is an error."
I'm inclined to add an errata to that effect (similarly for
get-output-bytevector).

11 implementations return the expected string.
11 implementations throw an exception.

That doesn't look like a "de facto standard" of "is an error" to me.

The string is not logically part of the "port" per se -- it represents the
backing store the port writes to, as the file on disk is to an output file
port. A string-port is not merely a derived class of output-port, it's
a composite of a port and an interface to retrieve that backing store.
So, closing the port should only "free resources" of the port part, but
leave the backing store available.

(btw, we're really talking about both get-output-string and
get-output-bytevector, right?)

I can imagine using ports for byte-stream transformations (like
compression/decompression or other types of encodings) where finalization
of the stream is required when you know the stream is to be terminated.
flush-output-port might not be an adequate means of termination if you
prefer a single delimited dataset in the backing store rather than a
sequence of them. So, an explicit close might be required before
retrieving the finalized data.

+1 to Michael Montague's argument as well.

c***@ccil.org

2015-01-30 19:31:32 UTC

(Consolidated response)

I agree, the more so because 'write-stuff-and-close' is essentially a
special case of the R7RS standard routine 'call-with-port'.

Post by Jim Rees
11 implementations return the expected string.
11 implementations throw an exception.

Witnesses should be weighed, not counted. In addition, this evidence is
particularly weak, because we don't know which of the first set of
implementations returned the value intentionally, and which as a
non-guaranteed side effect of how they implemented output string ports.

Post by Jim Rees
That doesn't look like a "de facto standard" of "is an error" to me.

"Is an error" does not mean that an error is signaled (exception is
raised). It means the implementation can do whatever it wants, and the
user must take the consequences whatever they are. It corresponds
to "undefined behavior" in other language standards.

Post by Jim Rees
The string is not logically part of the "port" per se -- it represents
the backing store the port writes to, as the file on disk is to an
output file port. A string-port is not merely a derived class of
output-port, it's a composite of a port and an interface to retrieve
that backing store. So, closing the port should only "free resources"
of the port part, but leave the backing store available.

I agree with this reasoning. Closing a port should mean that I/O is no
longer possible on the port, not that there is nothing at all that can be
done with the port.

Post by Jim Rees
(btw, we're really talking about both get-output-string and
get-output-bytevector, right?)

Yes, absolutely.

Post by Ray Dillinger
If we have string ports in the first place, the inability to use
them like files - ie, open ports on them for reading at need - is
more likely to be the standardization problem that needs to be
addressed than any idea that reading from a closed port is other
than an error.

You can open an input port on a string whenever you like. But a
string output port is not a port opened on an existing string
(which would be of little use, since Scheme strings are not
extensible), but a port that generates a newly allocated string.
Until you call get-output-string, there is no string on which
to open a future input port.
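
The sequence described above can be sketched in R7RS as follows; the variable names are illustrative. The string only exists once get-output-string has been called, and only then can an input port be opened on it.

```scheme
(import (scheme base))

(define out (open-output-string))
(write-string "first line\nsecond line" out)

;; get-output-string allocates the string that was generated.
(define text (get-output-string out))

;; Only now is there a string on which to open an input port.
(define in (open-input-string text))
(define line (read-line in))
```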

--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
"You need a change: try Canada" "You need a change: try China"
--fortune cookies opened by a couple that I know

Alex Shinn

2015-01-31 03:32:10 UTC

Post by Jim Rees

11 implementations return the expected string.
11 implementations throw an exception.
That doesn't look like a "de facto standard" of "is an error" to me.

That's exactly what it looks like to me. Or rather, there is
no de facto standard, so it becomes "an error." If we were
to say otherwise we would be forcing authors to reimplement
string ports, and misleading users who would expect something
to be portable when it isn't.

Post by Jim Rees
The string is not logically part of the "port" per se -- it represents the
backing store the port writes to, as the file on disk is to an output file
port. A string-port is not merely a derived class of output-port, it's
a composite of a port and an interface to retrieve that backing store.
So, closing the port should only "free resources" of the port part, but
leave the backing store available.

From the user perspective there is no separation between
the port and any backing store, and they otherwise have no
way to free the resources. You should be able to reliably
accumulate a very large port and free it, or alternately
maintain many medium ports and not worry about them
hogging memory after closing.

Post by Jim Rees
I can imagine using ports for byte-stream transformations (like
compression/decompression or other types of encodings) where finalization
of the stream is required when you know the stream is to be terminated.
flush-output-port might not be an adequate means of termination if you
prefer a single delimited dataset in the backing store rather than a
sequence of them. So, an explicit close might be required before
retrieving the finalized data.

This is interesting, but I'm not sure that implicit final output
on closing a port is a good idea. Regardless, this is purely
hypothetical.

Post by Jim Rees
+1 to Michael Montague's argument as well.

That's what I acknowledged as an idiom taking advantage
of this. I've personally never seen a write-and-close API
though - usually it's call-proc-and-close, where you have
a callback before the final close.
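
A minimal sketch of that call-proc-and-close shape, using the R7RS call-with-port; the helper name collect-output is hypothetical. The contents are extracted inside the callback, so the result does not depend on what get-output-string does after the port is closed.

```scheme
(import (scheme base))

;; Hypothetical helper: run proc on a fresh string port and return
;; everything it wrote. call-with-port closes the port after the
;; callback returns, so the string is extracted before the close.
(define (collect-output proc)
  (call-with-port (open-output-string)
    (lambda (port)
      (proc port)
      (get-output-string port))))

(define collected
  (collect-output (lambda (port) (write-string "abc" port))))
```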

Bear writes:

This also brings up the interesting possibility that get-output-string
ought to be defined on files as well, converting the contents of the
whole file into a string.

You mean get-output-string on file-backed ports? (Please
don't conflate ports, strings and files, they are all different
things for good reason.) One could imagine this extension
but I'm not sure it's useful, and introduces a whole host of
issues.

Overall, I haven't seen any arguments nearly strong enough
to force authors to change their implementations.

--
Alex

Shiro Kawai

2015-01-31 06:17:19 UTC

Post by Alex Shinn
From the user perspective there is no separation between
the port and any backing store, and they otherwise have no
way to free the resources. You should be able to reliably
accumulate a very large port and free it, or alternately
maintain many medium ports and not worry about them
hogging memory after closing.

If you lose the reference to the port, won't the memory be GC'ed?

It is generally a bad practice to rely on freeing external resources, such
as file descriptors, by GC. But for memory, usually we rely on GC.

Besides, there can be a variation where the user provides backing storage
to a port (Gauche's open-output-uvector allows that), so the association
between a port and its backing storage doesn't need to be that tight, though
you could argue that the standard string port is a special case.

Alex Shinn

2015-02-02 01:59:48 UTC

Post by Shiro Kawai

Indeed, but for ports people expect resources to be freed
immediately on closing (at least I do). Users may assume
that closed ports only take a constant amount of memory,
and not bother to free references to them.

Are you arguing in favor of allowing this behavior? The
only other 3 implementors to comment were all opposed.
Regardless, making a change in favor of this would be
adding new implementation requirements post-facto, and
would be beyond the scope of an errata. As-is, the result
is unspecified, and if all of the WG members agree we
could explicitly say it "is an error," but anything more than
that should be left for R8RS.

--
Alex

Shiro Kawai

2015-02-02 21:59:12 UTC

(Oops, I only sent this reply to Alex. For the record, resending to
scheme-reports.)

I'm in favor of this feature, but I won't push for it to be included in the
R7RS errata. I agree it's too big a change, and it's best to leave it
undefined for now (or an error in the R7RS sense).


Arthur A. Gleckler

2015-02-02 22:06:07 UTC

Post by Shiro Kawai
(Oops, I only sent this reply to Alex. For the record, resending to
scheme-reports.)
I'm in favor of this feature, but I won't push for it to be included in the
R7RS errata.
I agree it's too big a change, and it's best to leave it undefined for now
(or an error in the R7RS sense).

I agree on both counts. This is not an oversight, not simply a mistake,
and hence shouldn't be considered an erratum.

Alex Shinn

2015-02-05 07:22:53 UTC

Post by Arthur A. Gleckler

I agree on both counts. This is not an oversight, not simply a mistake,
and hence shouldn't be considered an erratum.

Well, it was just an oversight on my part, but since you're disagreeing as
a WG member I won't push and we'll have to leave this unspecified.

--
Alex

leppie

2015-02-06 06:44:02 UTC

Just to pitch in.

get-output-string is only applicable to string output ports. Calling it on
any other port is an error.
It is up to the implementer to decide whether calling close-port on a
string port should do something or nothing.
The way R6RS handles this prevents one from having to expose a potentially
leaky abstraction as in SRFI 6.

IMO, the behavior should just be left unspecified in the spirit of R7RS.

leppie

--
http://codeplex.com/IronScheme
http://xacc.wordpress.com

Alex Shinn

2015-02-07 01:12:28 UTC

Post by leppie
Just to pitch in.
get-output-string is only applicable to string output ports. Calling it on
any other port is an error.
It is up to the implementer to decide whether calling close-port on a
string port should do something or nothing.

Closing the port should _always_ make further I/O an error.

Post by leppie
The way R6RS handles this prevents one from having to expose a potentially
leaky abstraction as in SRFI 6.

I'm not sure what you mean by leaky here.

I personally prefer the R6RS API, partly because the question
of get-output-string on non-string ports becomes a non-issue,
and partly because once you introduce custom ports, then
string ports can just be a library function.

But that's completely orthogonal to the discussion, and R6RS
has the same issue: calling the get-output-string procedure on
a closed port is unspecified.

--
Alex

John Cowan

2015-02-07 01:37:07 UTC

Post by Alex Shinn
Closing the port should _always_ make further I/O an error.

Sure. The issue is whether calling get-output-string counts as I/O.

Post by Alex Shinn

Post by leppie
The way R6RS handles this prevents one from having to expose a potentially
leaky abstraction as in SRFI 6.

I'm not sure what you mean by leaky here.

The point is that in R6RS the procedure which exposes the chars in
the port can be called without having to have the port itself,
because it closes over the port.

Post by Alex Shinn
I personally prefer the R6RS API, partly because the question
of get-output-string on non-string ports becomes a non-issue,
and partly because once you introduce custom ports, then
string ports can just be a library function.

I don't understand why those don't apply to the R6RS version as well.

Post by Alex Shinn
But that's completely orthogonal to the discussion, and R6RS
has the same issue: calling the get-output-string procedure on
a closed port is unspecified.

Agreed.

--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
"Make a case, man; you're full of naked assertions, just like Nietzsche."
"Oh, i suffer from that, too. But you know, naked assertions or GTFO."
--heard on #scheme, sorta

Alex Shinn

2015-02-07 09:02:27 UTC

Post by John Cowan

Post by Alex Shinn

Post by leppie
The way R6RS handles this prevents one from having to expose a potentially
leaky abstraction as in SRFI 6.

I'm not sure what you mean by leaky here.

The point is that in R6RS the procedure which exposes the chars in
the port can be called without having to have the port itself,
because it closes over the port.

I still don't understand what is meant by leaky, or why
this closure is any way preferable to a single object.

Post by John Cowan

Post by Alex Shinn
I personally prefer the R6RS API, partly because the question
of get-output-string on non-string ports becomes a non-issue,
and partly because once you introduce custom ports, then
string ports can just be a library function.

I don't understand why those don't apply to the R6RS version as well.

I was talking about the R6RS version. Did you
mean the SRFI 6 version?

The problem with a single global get-output-string
procedure is that it needs a way to get at the underlying
buffer associated with the string port, which you can't
do portably. The best you can do is, assuming you
have weak references, maintain a global weak hash
eqv? table mapping string port to buffer.

--
Alex

John Cowan

2015-02-07 13:51:51 UTC

Post by Alex Shinn
I still don't understand what is meant by leaky, or why
this closure is any way preferable to a single object.

Because it lessens the amount of ambient authority. In SRFI 6,
I can't give you the authority to extract objects from the output port
without also giving you the authority to write to it.
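
The separation of authority can be sketched in R7RS by wrapping a string port in the R6RS style of returning a port together with an extraction procedure. The name open-string-port/extractor is hypothetical, and unlike the R6RS procedure this extractor does not reset the port.

```scheme
(import (scheme base))

;; Hypothetical wrapper: returns the port and, separately, a closure
;; that can read the accumulated contents but cannot write.
(define (open-string-port/extractor)
  (let ((port (open-output-string)))
    (values port
            (lambda () (get-output-string port)))))

(define-values (port extract) (open-string-port/extractor))

;; The holder of the port can write ...
(write-string "secret" port)

;; ... while extract can be handed to code that should only read.
(define seen (extract))
```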

Post by Alex Shinn

Post by John Cowan

Post by Alex Shinn
I personally prefer the R6RS API, partly because the question
of get-output-string on non-string ports becomes a non-issue,
and partly because once you introduce custom ports, then
string ports can just be a library function.

I don't understand why those don't apply to the R6RS version as well.

I was talking about the R6RS version. Did you
mean the SRFI 6 version?

Yes.

Post by Alex Shinn
The problem with a single global get-output-string
procedure is that it needs a way to get at the underlying
buffer associated with the string port, which you can't
do portably. The best you can do is, assuming you
have weak references, maintain a global weak hash
eqv? table mapping string port to buffer.

Custom ports always involve closing over some sort of data structure,
which can be a character sequence as well as anything else.

--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
"But I am the real Strider, fortunately," he said, looking down at them
with his face softened by a sudden smile. "I am Aragorn son of Arathorn,
and if by life or death I can save you, I will."

Alex Shinn

2015-02-07 14:19:05 UTC

Post by John Cowan

Post by Alex Shinn
I still don't understand what is meant by leaky, or why
this closure is any way preferable to a single object.

Because it lessens the amount of ambient authority. In SRFI 6,
I can't give you the authority to extract objects from the output port
without also giving you the authority to write to it.

OK.

Post by John Cowan

Custom ports always involve closing over some sort of data structure,
which can be a character sequence as well as anything else.

Somehow I've still failed to convey my point, so I'll just write code.
In R6RS the following is a rough but fully portable implementation
of output string ports:

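(define (open-string-output-port)
  (let ((buf '()))  ; characters written so far, newest first
    (values
     ;; The port: write! pushes each chunk onto buf in reverse order
     ;; and reports how many characters it consumed.
     (make-custom-textual-output-port
      ""
      (lambda (str start count)
        (let ((ls (string->list (substring str start (+ start count)))))
          (set! buf (append (reverse ls) buf))
          (length ls)))
      #f #f #f)
     ;; The extraction procedure: return the contents and reset buf.
     (lambda ()
       (let ((res (list->string (reverse buf))))
         (set! buf '())
         res)))))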

You can't implement SRFI 6 output string ports without
non-portable extensions.

[This was already discussed when we were voting on it.]

--
Alex

John Cowan

2015-02-07 19:08:23 UTC

Post by Alex Shinn
You can't implement SRFI 6 output string ports without
non-portable extensions.

Okay, got it.

But R6RS custom ports were designed around the port operations available
in R6RS. If the editors had decided to use SRFI 6 ports, there probably
would have been an 'extract' operation in custom output ports.

This, by the way, is why I'm not planning to propose a custom port
abstraction for R7RS-large. There are lots of different kinds of ports in
the wider Scheme world, and every one has some extensions or restrictions
that would have to be taken account of in a custom-port scheme in order
to make it really universal.

--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Monday we watch-a Firefly's house, but he no come out. He wasn't home.
Tuesday we go to the ball game, but he fool us. He no show up. Wednesday he
go to the ball game, and we fool him. We no show up. Thursday was a
double-header. Nobody show up. Friday it rained all day. There was no ball
game, so we stayed home and we listened to it on-a the radio. --Chicolini