Subject                                       Author
* Faster alternative to READ-LINE             dxforth
`* Re: Faster alternative to READ-LINE        none
 +- Re: Faster alternative to READ-LINE       Lorem Ipsum
 `* Re: Faster alternative to READ-LINE       dxforth
  +* Re: Faster alternative to READ-LINE      Anton Ertl
  |+* Re: Faster alternative to READ-LINE     dxforth
  ||`* Re: Faster alternative to READ-LINE    dxforth
  || `* Re: Faster alternative to READ-LINE   none
  ||  `- Re: Faster alternative to READ-LINE  dxforth
  |`- Re: Faster alternative to READ-LINE     none
  `* Re: Faster alternative to READ-LINE      none
   +* Re: Faster alternative to READ-LINE     dxforth
   |`* Re: Faster alternative to READ-LINE    S Jack
   | `- Re: Faster alternative to READ-LINE   dxforth
   `- Re: Faster alternative to READ-LINE     Lorem Ipsum

Faster alternative to READ-LINE

<ua35nq$2kjcn$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=24127&group=comp.lang.forth#24127
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dxforth@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Faster alternative to READ-LINE
Date: Sat, 29 Jul 2023 23:55:06 +1000
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <ua35nq$2kjcn$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 29 Jul 2023 13:55:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2a13246e180d9b442ff319c4b5e4b631";
logging-data="2772375"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/YQPTvO/AXhc4+RBDyuD+R"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:Id39GXVl/PpB6WZxcpIcazVzaG0=
Content-Language: en-GB
 by: dxforth - Sat, 29 Jul 2023 13:55 UTC

This was prompted after trying to run an app on CP/M after successfully
doing so on MS-DOS. The former was much slower than I had expected.
The only bottle-neck I could see was READ-LINE. The solution was to
read the input file in one go (or at least in large chunks) and parse
out the lines, joining partial ones when they occurred. The code makes
use of library functions. I've attempted to explain these for anyone
thinking of porting it.

https://pastebin.com/XpBmTFXW

Re: Faster alternative to READ-LINE

<nnd$21b1e920$080504ee@e4384190d554c1ee>

https://news.novabbs.org/devel/article-flat.php?id=24128&group=comp.lang.forth#24128
Newsgroups: comp.lang.forth
References: <ua35nq$2kjcn$1@dont-email.me>
Subject: Re: Faster alternative to READ-LINE
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: albert@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$21b1e920$080504ee@e4384190d554c1ee>
Organization: KPN B.V.
Date: Sun, 30 Jul 2023 01:11:19 +0200
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!feed.abavia.com!abe004.abavia.com!abp003.abavia.com!news.kpn.nl!not-for-mail
Lines: 32
Injection-Date: Sun, 30 Jul 2023 01:11:19 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 2203
 by: none - Sat, 29 Jul 2023 23:11 UTC

In article <ua35nq$2kjcn$1@dont-email.me>, dxforth <dxforth@gmail.com> wrote:
>This was prompted after trying to run an app on CP/M after successfully
>doing so on MS-DOS. The former was much slower than I had expected.
>The only bottle-neck I could see was READ-LINE. The solution was to
>read the input file in one go (or at least in large chunks) and parse
>out the lines, joining partial ones when they occurred. The code makes
>use of library functions. I've attempted to explain these for anyone
>thinking of porting it.
>
>https://pastebin.com/XpBmTFXW

As long as you are buffering you can deal out the lines without
copying as a string constant (addr n).
I called this (ACCEPT). This does away with the need to provide
a buffer. You must not of course change the string constant
returned.

(ACCEPT)
STACKEFFECT: -- sc
Accept characters from the terminal until a RET is received, and
return the result as a constant string sc. The string doesn't contain any
line ending, but the buffer still does, and after 1+ the string ends in a
LF. The editing functions are the same as with ACCEPT. This is
lighter on the system and sometimes easier to use than ACCEPT. This
input remains valid until the next time that the console buffer is
refilled.
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

Re: Faster alternative to READ-LINE

<ec512525-d968-4838-8033-2f3fbaea7e01n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=24129&group=comp.lang.forth#24129
X-Received: by 2002:a05:6214:4c1b:b0:63c:e899:69e5 with SMTP id qh27-20020a0562144c1b00b0063ce89969e5mr21020qvb.13.1690679389051;
Sat, 29 Jul 2023 18:09:49 -0700 (PDT)
X-Received: by 2002:a05:6830:4805:b0:6ba:3da9:bf53 with SMTP id
dg5-20020a056830480500b006ba3da9bf53mr7157892otb.3.1690679388759; Sat, 29 Jul
2023 18:09:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Sat, 29 Jul 2023 18:09:48 -0700 (PDT)
In-Reply-To: <nnd$21b1e920$080504ee@e4384190d554c1ee>
Injection-Info: google-groups.googlegroups.com; posting-host=63.114.57.174; posting-account=I-_H_woAAAA9zzro6crtEpUAyIvzd19b
NNTP-Posting-Host: 63.114.57.174
References: <ua35nq$2kjcn$1@dont-email.me> <nnd$21b1e920$080504ee@e4384190d554c1ee>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ec512525-d968-4838-8033-2f3fbaea7e01n@googlegroups.com>
Subject: Re: Faster alternative to READ-LINE
From: gnuarm.deletethisbit@gmail.com (Lorem Ipsum)
Injection-Date: Sun, 30 Jul 2023 01:09:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Lorem Ipsum - Sun, 30 Jul 2023 01:09 UTC

On Saturday, July 29, 2023 at 7:11:27 PM UTC-4, none albert wrote:
> In article <ua35nq$2kjcn$1...@dont-email.me>, dxforth <dxf...@gmail.com> wrote:
> >This was prompted after trying to run an app on CP/M after successfully
> >doing so on MS-DOS. The former was much slower than I had expected.
> >The only bottle-neck I could see was READ-LINE. The solution was to
> >read the input file in one go (or at least in large chunks) and parse
> >out the lines, joining partial ones when they occurred. The code makes
> >use of library functions. I've attempted to explain these for anyone
> >thinking of porting it.
> >
> >https://pastebin.com/XpBmTFXW
> As long as you are buffering you can deal out the lines without
> copying as a string constant (addr n).
> I called this (ACCEPT). This does away with the need to provide
> a buffer. You must not of course change the string constant
> returned.
>
> (ACCEPT)
> STACKEFFECT: -- sc
> Accept characters from the terminal until a RET is received, and
> return the result as a constant string sc. The string doesn't contain any
> line ending, but the buffer still does, and after 1+ the string ends in a
> LF. The editing functions are the same as with ACCEPT. This is
> lighter on the system and sometimes easier to use than ACCEPT. This
> input remains valid until the next time that the console buffer is
> refilled.

How is "RET" defined? "Terminal" is not by necessity the keyboard of a computer, right? So what ASCII sequence is equated to "RET"?

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209

Re: Faster alternative to READ-LINE

<ua4id9$2s1s9$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=24130&group=comp.lang.forth#24130
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dxforth@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
Date: Sun, 30 Jul 2023 12:37:29 +1000
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <ua4id9$2s1s9$1@dont-email.me>
References: <ua35nq$2kjcn$1@dont-email.me>
<nnd$21b1e920$080504ee@e4384190d554c1ee>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 30 Jul 2023 02:37:29 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9a853c00c4c31f7367616615731e45ed";
logging-data="3016585"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19jbwpmIUvREnznbNxGQM+b"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:mrl0HWaM2kPV91l7K4LMp5IgosM=
Content-Language: en-GB
In-Reply-To: <nnd$21b1e920$080504ee@e4384190d554c1ee>
 by: dxforth - Sun, 30 Jul 2023 02:37 UTC

On 30/07/2023 9:11 am, albert wrote:
> ...
> As long as you are buffering you can deal out the lines without
> copying as a string constant (addr n).

AFAICS that's only feasible if the whole file fits in memory. I don't
have that option.

The main issue in my case seems to be the file-repositioning that is
the basis of most READ-LINE implementations. Under CP/M it hits hard
for some reason. I should first verify the algorithm I used is as
efficient as can be. (It came from a C compiler, so how could it not? :)

Re: Faster alternative to READ-LINE

<2023Jul30.073239@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=24131&group=comp.lang.forth#24131
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
Date: Sun, 30 Jul 2023 05:32:39 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 44
Message-ID: <2023Jul30.073239@mips.complang.tuwien.ac.at>
References: <ua35nq$2kjcn$1@dont-email.me> <nnd$21b1e920$080504ee@e4384190d554c1ee> <ua4id9$2s1s9$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="7152c87e78f5a05a3b29348e9e8e281d";
logging-data="3046115"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18bxVwL3qu7mKt42GqYceQ6"
Cancel-Lock: sha1:+HyhfTYriOzGK5iXc9K5HCFCzz0=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 30 Jul 2023 05:32 UTC

dxforth <dxforth@gmail.com> writes:
>The main issue in my case seems to be the file-repositioning that is
>the basis of most READ-LINE implementations. Under CP/M it hits hard
>for some reason.

Not just on CP/M. E.g., on Linux I reported
<2021Oct19.095538@mips.complang.tuwien.ac.at>:

|perf stat -e cycles:u -e instructions:u ~/gforth/gforth -e '"count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'
|
|count-unique.in is the input for Ben Hoyt's count-unique example and
|consists of 10 bibles, 998170 lines, or 43MB. On a 4GHz Skylake
|gforth-fast takes 802M cycles or 0.201s user time for this (and 31M
|cycles or 0.011s without the call to FOO).
|
|[...]
|I have also measured SwiftForth 3.11.0 with the
|same benchmark:
|
|[...]
| 5,945,307,655 cycles
| 768,834,475 cycles:u
| 5,131,957,800 cycles:k
| 733,948,005 instructions:u # 0.95 insn per cycle
|
| 1.488727585 seconds time elapsed
|
| 0.916473000 seconds user
| 0.572295000 seconds sys

I.e., this just measures reading a 43MB file with 998170 lines with
repeated READ-LINEs. Gforth uses buffered I/O (with the buffering
implemented by glibc), while SwiftForth uses unbuffered I/O and
REPOSITION-FILE. As a result, SwiftForth spends more than 7 times
more cycles on this benchmark than Gforth. I would be surprised if
the strategy used by SwiftForth fared better than buffering on other
OSs.
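
To make the cost concrete, here is a minimal sketch of what a READ-LINE built on unbuffered reads plus REPOSITION-FILE might look like. It is an assumption about the general strategy, not SwiftForth's source; the names lf-offset and slow-read-line are hypothetical, only LF line ends are handled, and errors are THROWn rather than returned as an ior. The point is that every line costs a position query, a read of up to u1 characters, and a seek back.

```forth
\ Hypothetical sketch, not SwiftForth source.  LF line ends only.

\ Offset of the first LF in addr u; flag is true when one was found.
: lf-offset ( addr u -- n flag )
   0 ?do  dup i + c@ 10 = if  drop i true unloop exit  then  loop
   drop 0 false ;

\ Read one line into c-addr (at most u1 chars).  u2 is the line length
\ without the LF; flag is false at end of file.
: slow-read-line ( c-addr u1 fid -- u2 flag )
   >r
   r@ file-position throw  2swap       \ ( ud c-addr u1 )  OS call: where are we?
   over swap r@ read-file throw        \ ( ud c-addr u3 )  OS call: read a chunk
   dup 0= if  2drop 2drop  r> drop  0 false exit  then   \ end of file
   2dup lf-offset if                   \ ( ud c-addr u3 n )
      nip nip  dup >r                  \ ( ud n )  n is the line length
      1+ m+                            \ ( ud' )   position just past the LF
      r> r> swap >r                    \ ( ud' fid )
      reposition-file throw            \ OS call: seek back over the excess
      r> true exit
   then
   drop nip nip nip                    \ ( u3 )  no LF in this chunk
   r> drop true ;
```

It drops into the benchmark loop above in place of READ-LINE THROW, e.g. `: foo 0 begin buf 256 f slow-read-line nip while 1+ repeat ;`. A buffered reader makes one OS call per buffer-full instead of several per line.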

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

Re: Faster alternative to READ-LINE

<ua5287$2stll$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=24132&group=comp.lang.forth#24132
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dxforth@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
Date: Sun, 30 Jul 2023 17:07:51 +1000
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <ua5287$2stll$1@dont-email.me>
References: <ua35nq$2kjcn$1@dont-email.me>
<nnd$21b1e920$080504ee@e4384190d554c1ee> <ua4id9$2s1s9$1@dont-email.me>
<2023Jul30.073239@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 30 Jul 2023 07:07:51 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9a853c00c4c31f7367616615731e45ed";
logging-data="3045045"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/dOjdJ0P2iK98CMe5QD/wH"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:JbcOQ1cav8RAToxVnnIGIYNIuKQ=
Content-Language: en-GB
In-Reply-To: <2023Jul30.073239@mips.complang.tuwien.ac.at>
 by: dxforth - Sun, 30 Jul 2023 07:07 UTC

On 30/07/2023 3:32 pm, Anton Ertl wrote:
> ...
> I.e., this just measures reading a 43MB file with 998170 lines with
> repeated READ-LINEs. Gforth uses buffered I/O (with the buffering
> implemented by glibc), while SwiftForth uses unbuffered I/O and
> REPOSITION-FILE. As a result, SwiftForth spends more than 7 times
> more cycles on this benchmark than Gforth. I would be surprised if
> the strategy used by SwiftForth fared better than buffering on other
> OSs.

For INCLUDED et al, SwiftForth reads the entire file. For some reason
they use a different EOL scanner than is used for READ-LINE.

Further experiments with CP/M showed I didn't need to read all or most
of the file to gain a substantial speed increase. A 512 byte buffer
was sufficient to gain a 10x improvement over READ-LINE. Raising it
to 50KB produced no noticeable benefit. /EOL was written in assembler.

Re: Faster alternative to READ-LINE

<nnd$2adc1d02$147ebd36@7d83a8a0156d8440>

https://news.novabbs.org/devel/article-flat.php?id=24133&group=comp.lang.forth#24133
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
References: <ua35nq$2kjcn$1@dont-email.me> <nnd$21b1e920$080504ee@e4384190d554c1ee> <ua4id9$2s1s9$1@dont-email.me>
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: albert@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$2adc1d02$147ebd36@7d83a8a0156d8440>
Organization: KPN B.V.
Date: Sun, 30 Jul 2023 11:29:20 +0200
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe005.abavia.com!abp002.abavia.com!news.kpn.nl!not-for-mail
Lines: 39
Injection-Date: Sun, 30 Jul 2023 11:29:20 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 2249
 by: none - Sun, 30 Jul 2023 09:29 UTC

In article <ua4id9$2s1s9$1@dont-email.me>, dxforth <dxforth@gmail.com> wrote:
>On 30/07/2023 9:11 am, albert wrote:
>> ...
>> As long as you are buffering you can deal out the lines without
>> copying as a string constant (addr n).
>
>AFAICS that's only feasible if the whole file fits in memory. I don't
>have that option.

I invented (ACCEPT) that exactly for the situation reading from a stream.
(I never ever use READ-LINE for a file that fits in memory.)
It goes as follows:
1. If there is a RET after the pointer in the input buffer,
   return the part between the pointer and the RET and
   advance the pointer. You're done.
2. Copy the remainder to the start of the buffer,
   read from the stream to fill the buffer, and
   continue at step 1.
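
A minimal sketch of the two steps above in standard Forth, assuming LF-terminated lines read from a file. Every word name here (buf, ptr, lim, top-up, next-line and so on) is hypothetical and not taken from the actual (ACCEPT). The string returned points into the buffer and stays valid only until the next call.

```forth
\ Hypothetical sketch of the two steps; not the real (ACCEPT).
\ Lines end in LF; a CR in front of it is left in the string.
512 constant /buf
create buf /buf allot
variable ptr                 \ first unconsumed character
variable lim                 \ one past the last valid character
0 value fid                  \ fileid of the open input file

: open-input ( c-addr u -- )
   r/o open-file throw to fid  buf ptr !  buf lim ! ;

: remaining ( -- addr u )  ptr @  lim @ over - ;

\ Offset of the first LF in addr u; flag is true when one was found.
: find-lf ( addr u -- n flag )
   0 ?do  dup i + c@ 10 = if  drop i true unloop exit  then  loop
   drop 0 false ;

\ Step 2: slide the unconsumed tail down and fill the rest of the buffer.
: top-up ( -- u )            \ u = number of characters newly read
   remaining >r  buf r@ move
   buf ptr !  buf r> + lim !
   lim @  buf /buf + lim @ -  fid read-file throw
   dup lim +! ;

\ Step 1, falling back to step 2 while no LF is buffered.
: next-line ( -- addr u true | false )
   begin
      remaining find-lf if             \ ( n )  LF at offset n
         ptr @ swap  2dup + 1+ ptr !   \ hand out the line, skip its LF
         true exit
      then drop
      top-up 0=                        \ nothing more could be read
   until
   remaining dup if                    \ last line without LF, or overlong
      2dup + ptr !  true
   else  2drop false  then ;

: count-lines ( -- n )  0 begin next-line while 2drop 1+ repeat ;
```

Typical use would be `s" input.txt" open-input count-lines .` with a file name of your own. In this sketch a line longer than the buffer is handed out in buffer-sized pieces rather than overrunning anything; whether that is the right policy depends on the application.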

And for Rick: RET is trimmed off, so why do you care?

>The main issue in my case seems to be the file-repositioning that is
>the basis of most READ-LINE implementations. Under CP/M it hits hard
>for some reason. I should first verify the algorithm I used is as
>efficient as can be. (It came from a C compiler, so how could it not? :)

Under CP/M file segments are visible. I would read only whole segments
which made the above algorithm even simpler. The copying step
is superfluous.

Groetjes Albert

Re: Faster alternative to READ-LINE

<nnd$31530b15$0209f5d0@7d83a8a0156d8440>

https://news.novabbs.org/devel/article-flat.php?id=24134&group=comp.lang.forth#24134
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
References: <ua35nq$2kjcn$1@dont-email.me> <nnd$21b1e920$080504ee@e4384190d554c1ee> <ua4id9$2s1s9$1@dont-email.me> <2023Jul30.073239@mips.complang.tuwien.ac.at>
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: albert@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$31530b15$0209f5d0@7d83a8a0156d8440>
Organization: KPN B.V.
Date: Sun, 30 Jul 2023 11:34:38 +0200
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe004.abavia.com!abp003.abavia.com!news.kpn.nl!not-for-mail
Lines: 56
Injection-Date: Sun, 30 Jul 2023 11:34:38 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 2960
 by: none - Sun, 30 Jul 2023 09:34 UTC

In article <2023Jul30.073239@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>dxforth <dxforth@gmail.com> writes:
>>The main issue in my case seems to be the file-repositioning that is
>>the basis of most READ-LINE implementations. Under CP/M it hits hard
>>for some reason.
>
>Not just on CP/M. E.g., on Linux I reported
><2021Oct19.095538@mips.complang.tuwien.ac.at>:
>
>|perf stat -e cycles:u -e instructions:u ~/gforth/gforth -e
>'"count-unique.in" r/o open-file throw constant f create buf 256 allot :
>foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr
>bye'
>|
>|count-unique.in is the input for Ben Hoyt's count-unique example and
>|consists of 10 bibles, 998170 lines, or 43MB. On a 4GHz Skylake
>|gforth-fast takes 802M cycles or 0.201s user time for this (and 31M
>|cycles or 0.011s without the call to FOO).
>|
>|[...]
>|I have also measured SwiftForth 3.11.0 with the
>|same benchmark:
>|
>|[...]
>| 5,945,307,655 cycles
>| 768,834,475 cycles:u
>| 5,131,957,800 cycles:k
>| 733,948,005 instructions:u # 0.95 insn per cycle
>|
>| 1.488727585 seconds time elapsed
>|
>| 0.916473000 seconds user
>| 0.572295000 seconds sys
>
>I.e., this just measures reading a 43MB file with 998170 lines with
>repeated READ-LINEs. Gforth uses buffered I/O (with the buffering
>implemented by glibc), while SwiftForth uses unbuffered I/O and
>REPOSITION-FILE. As a result, SwiftForth spends more than 7 times
>more cycles on this benchmark than Gforth. I would be surprised if
>the strategy used by SwiftForth fared better than buffering on other
>OSs.

I now possess a workstation with 256 Gbyte. Not slurping a mere
43 Mbyte file seems ludicrous.
Was there ever a CP/M system with 43 Mbyte disk storage?

(The results are interesting, nevertheless, but hardly surprising.)

>- anton

Re: Faster alternative to READ-LINE

<ua5hdr$2ub47$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=24135&group=comp.lang.forth#24135
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dxforth@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
Date: Sun, 30 Jul 2023 21:26:51 +1000
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <ua5hdr$2ub47$1@dont-email.me>
References: <ua35nq$2kjcn$1@dont-email.me>
<nnd$21b1e920$080504ee@e4384190d554c1ee> <ua4id9$2s1s9$1@dont-email.me>
<nnd$2adc1d02$147ebd36@7d83a8a0156d8440>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 30 Jul 2023 11:26:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9a853c00c4c31f7367616615731e45ed";
logging-data="3091591"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Kqhdy7AI/KPd0+Pdr4301"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:/EM65lE9i7Kt9jP+UVQLHcrAPyA=
Content-Language: en-GB
In-Reply-To: <nnd$2adc1d02$147ebd36@7d83a8a0156d8440>
 by: dxforth - Sun, 30 Jul 2023 11:26 UTC

On 30/07/2023 7:29 pm, albert wrote:
> In article <ua4id9$2s1s9$1@dont-email.me>, dxforth <dxforth@gmail.com> wrote:
>> On 30/07/2023 9:11 am, albert wrote:
>>> ...
>>> As long as you are buffering you can deal out the lines without
>>> copying as a string constant (addr n).
>>
>> AFAICS that's only feasible if the whole file fits in memory. I don't
>> have that option.
>
> I invented (ACCEPT) that exactly for the situation reading from a stream.
> (I never ever use READ-LINE for a file that fits in memory.)
> It goes as follows:
> 1.
> If there is a ret after the pointer in the input buffer,
> return the part between the pointer and the ret
> advance pointer. you're done
> 2.
> copy the remainder to the start of the buffer
> read from the stream to fill the buffer
> continue at step 1

That crossed my mind.

>> The main issue in my case seems to be the file-repositioning that is
>> the basis of most READ-LINE implementations. Under CP/M it hits hard
>> for some reason. I should first verify the algorithm I used is as
>> efficient as can be. (It came from a C compiler, so how could it not? :)
>
> Under CP/M file segments are visible. I would read only whole segments
> which made the above algorithm even simpler. The copying step
> is superfluous.

That I didn't consider ... just because a buffer is a given size, doesn't
mean one has to fill it. In any case, I'm back to using a small buffer
and the copying appears to have negligible impact.

Re: Faster alternative to READ-LINE

<72f6e3c5-e608-4b46-a76e-74fbca700307n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=24136&group=comp.lang.forth#24136
X-Received: by 2002:a37:b407:0:b0:767:3d3e:4de9 with SMTP id d7-20020a37b407000000b007673d3e4de9mr21306qkf.4.1690728085385;
Sun, 30 Jul 2023 07:41:25 -0700 (PDT)
X-Received: by 2002:a05:6830:12c1:b0:6b9:a422:9f with SMTP id
a1-20020a05683012c100b006b9a422009fmr8713566otq.1.1690728085060; Sun, 30 Jul
2023 07:41:25 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Sun, 30 Jul 2023 07:41:24 -0700 (PDT)
In-Reply-To: <ua5hdr$2ub47$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:3f7a:20d0:824:ea95:a79f:181b;
posting-account=V5nGoQoAAAC_P2U0qnxm2kC0s1jNJXJa
NNTP-Posting-Host: 2600:1700:3f7a:20d0:824:ea95:a79f:181b
References: <ua35nq$2kjcn$1@dont-email.me> <nnd$21b1e920$080504ee@e4384190d554c1ee>
<ua4id9$2s1s9$1@dont-email.me> <nnd$2adc1d02$147ebd36@7d83a8a0156d8440> <ua5hdr$2ub47$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <72f6e3c5-e608-4b46-a76e-74fbca700307n@googlegroups.com>
Subject: Re: Faster alternative to READ-LINE
From: sdwjack69@gmail.com (S Jack)
Injection-Date: Sun, 30 Jul 2023 14:41:25 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1940
 by: S Jack - Sun, 30 Jul 2023 14:41 UTC

On Sunday, July 30, 2023 at 6:26:56 AM UTC-5, dxforth wrote:

I do the same as Albert, read in a buffer then point to a
line and work down the buffer till no more line feed then
move the residue up and do again.
Back in DOS days had heard that 4096 byte buffer was
optimum for the read so that was what I used but I also
used much smaller buffer without problem.
This is still on a line basis. So I also had a stream
input for things like production HTML which has no line
ending.
Note also that Jones Forth inputs on a word basis not line,
why no OK prompt, and should have no problem with streams.
--
me

Re: Faster alternative to READ-LINE

<5d8be8d6-a6fb-407a-a6a2-2155c8953e37n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=24137&group=comp.lang.forth#24137
X-Received: by 2002:a37:8707:0:b0:76c:a96d:810c with SMTP id j7-20020a378707000000b0076ca96d810cmr5226qkd.7.1690737046973;
Sun, 30 Jul 2023 10:10:46 -0700 (PDT)
X-Received: by 2002:a05:6870:d899:b0:1bb:52fa:7cf6 with SMTP id
dv25-20020a056870d89900b001bb52fa7cf6mr9117970oab.2.1690737046612; Sun, 30
Jul 2023 10:10:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Sun, 30 Jul 2023 10:10:46 -0700 (PDT)
In-Reply-To: <nnd$2adc1d02$147ebd36@7d83a8a0156d8440>
Injection-Info: google-groups.googlegroups.com; posting-host=65.207.89.54; posting-account=I-_H_woAAAA9zzro6crtEpUAyIvzd19b
NNTP-Posting-Host: 65.207.89.54
References: <ua35nq$2kjcn$1@dont-email.me> <nnd$21b1e920$080504ee@e4384190d554c1ee>
<ua4id9$2s1s9$1@dont-email.me> <nnd$2adc1d02$147ebd36@7d83a8a0156d8440>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5d8be8d6-a6fb-407a-a6a2-2155c8953e37n@googlegroups.com>
Subject: Re: Faster alternative to READ-LINE
From: gnuarm.deletethisbit@gmail.com (Lorem Ipsum)
Injection-Date: Sun, 30 Jul 2023 17:10:46 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2641
 by: Lorem Ipsum - Sun, 30 Jul 2023 17:10 UTC

On Sunday, July 30, 2023 at 5:29:24 AM UTC-4, none albert wrote:
> In article <ua4id9$2s1s9$1...@dont-email.me>, dxforth <dxf...@gmail.com> wrote:
> >On 30/07/2023 9:11 am, albert wrote:
> >> ...
> >> As long as you are buffering you can deal out the lines without
> >> copying as a string constant (addr n).
> >
> >AFAICS that's only feasible if the whole file fits in memory. I don't
> >have that option.
> I invented (ACCEPT) that exactly for the situation reading from a stream.
> (I never ever use READ-LINE for a file that fits in memory.)
> It goes as follows:
> 1.
> If there is a ret after the pointer in the input buffer,
> return the part between the pointer and the ret
> advance pointer. you're done
> 2.
> copy the remainder to the start of the buffer
> read from the stream to fill the buffer
> continue at step 1
>
> And for Rick: RET is trimmed off, so why do you care?

Because it has to be sent to be recognized. The specification for RET determines what is sent from the terminal.

This is not purely about the program. Programs have to live in a world of things. There are a lot more devices using <$10 MCUs than there are >$100 CPUs.

--

Rick C.

+ Get 1,000 miles of free Supercharging
+ Tesla referral code - https://ts.la/richard11209

Re: Faster alternative to READ-LINE

<ua743r$324br$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=24138&group=comp.lang.forth#24138
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dxforth@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
Date: Mon, 31 Jul 2023 11:51:56 +1000
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <ua743r$324br$1@dont-email.me>
References: <ua35nq$2kjcn$1@dont-email.me>
<nnd$21b1e920$080504ee@e4384190d554c1ee> <ua4id9$2s1s9$1@dont-email.me>
<nnd$2adc1d02$147ebd36@7d83a8a0156d8440> <ua5hdr$2ub47$1@dont-email.me>
<72f6e3c5-e608-4b46-a76e-74fbca700307n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 31 Jul 2023 01:51:56 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="17b79da9c09855deade7792043e19dd8";
logging-data="3215739"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+EWIjB24giMIjchrkB0Hvn"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:UZ2JtL2tcoudUipvExsXdA4DjqY=
Content-Language: en-GB
In-Reply-To: <72f6e3c5-e608-4b46-a76e-74fbca700307n@googlegroups.com>
 by: dxforth - Mon, 31 Jul 2023 01:51 UTC

On 31/07/2023 12:41 am, S Jack wrote:
> On Sunday, July 30, 2023 at 6:26:56 AM UTC-5, dxforth wrote:
>
> I do the same as Albert, read in a buffer then point to a
> line and work down the buffer till no more line feed then
> move the residue up and do again.
> Back in DOS days had heard that 4096 byte buffer was
> optimum for the read so that was what I used but I also
> used much smaller buffer without problem.

What happens should a string exceed the buffer size? Mine keeps
working but has the opposite problem of potentially overwriting
memory if there's no length check.
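
For the copy-and-join approach, the length check can be as small as a MIN against the room left in the destination. The following is a sketch under assumptions, not the code from the pastebin: lbuf, /lbuf, #lbuf and +lbuf are hypothetical names, and silent truncation is only one possible policy.

```forth
\ Hypothetical sketch: join partial lines into a fixed line buffer
\ without ever writing past its end.
80 constant /lbuf
create lbuf /lbuf allot
variable #lbuf   0 #lbuf !          \ characters collected so far

\ Append addr u to the line buffer, dropping whatever does not fit.
: +lbuf ( addr u -- )
   /lbuf #lbuf @ - min              \ clamp to the room that is left
   dup >r  lbuf #lbuf @ + swap move
   r> #lbuf +! ;

\ Start a new line.
: 0lbuf ( -- )  0 #lbuf ! ;
```

Each piece parsed out of a chunk goes through +lbuf; once the end of line is seen the caller uses `lbuf #lbuf @` and then calls 0lbuf. A stricter version could THROW or set a flag when input is dropped.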

Re: Faster alternative to READ-LINE

<uaa03r$3ijne$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=24141&group=comp.lang.forth#24141
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dxforth@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
Date: Tue, 1 Aug 2023 14:02:03 +1000
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <uaa03r$3ijne$1@dont-email.me>
References: <ua35nq$2kjcn$1@dont-email.me>
<nnd$21b1e920$080504ee@e4384190d554c1ee> <ua4id9$2s1s9$1@dont-email.me>
<2023Jul30.073239@mips.complang.tuwien.ac.at> <ua5287$2stll$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 1 Aug 2023 04:02:04 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e4e7a73992bdb46a120c43ef48d69f0a";
logging-data="3755758"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Fevd8l5fG+Cl+WkPmSgQe"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:NJF8n0+pxTJZo79jeXBlHSMLvQ8=
In-Reply-To: <ua5287$2stll$1@dont-email.me>
Content-Language: en-GB
 by: dxforth - Tue, 1 Aug 2023 04:02 UTC

On 30/07/2023 5:07 pm, dxforth wrote:
> ...
> Further experiments with CP/M showed I didn't need to read all or most
> of the file to gain a substantial speed increase. A 512 byte buffer
> was sufficient to gain a 10x improvement over READ-LINE. Raising it
> to 50KB produced no noticeable benefit. /EOL was written in assembler.

Unfortunately it has a bug. If the last byte in the buffer is the
first char of a CRLF it will result in an additional empty line.
Initial tests with a large buffer didn't reveal the problem. The EOL
scanner itself is fine. It simply wasn't intended for use in a stream
which doesn't backtrack.
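
The boundary case is easy to reproduce. The sketch below is Python, not the assembler /EOL, and it assumes a scanner that accepts CR, LF or CRLF as a line terminator; `naive_lines` and `stateful_lines` are hypothetical names. The naive version scans each buffer in isolation, so a CR at the end of one buffer and the LF at the start of the next count as two line endings. The second version remembers a trailing CR across buffers.

```python
def naive_lines(buffers):
    """Scan each buffer on its own; CR, LF and CRLF all end a line."""
    lines, current = [], b""
    for buf in buffers:
        i = 0
        while i < len(buf):
            c = buf[i:i + 1]
            if c == b"\r":
                # Look ahead for the LF, but only inside this buffer.
                if buf[i + 1:i + 2] == b"\n":
                    i += 1
                lines.append(current)
                current = b""
            elif c == b"\n":
                lines.append(current)
                current = b""
            else:
                current += c
            i += 1
    if current:
        lines.append(current)
    return lines


def stateful_lines(buffers):
    """Same terminators, but a CR is remembered until the next byte."""
    lines, current = [], b""
    pending_cr = False
    for buf in buffers:
        for i in range(len(buf)):
            c = buf[i:i + 1]
            if pending_cr and c == b"\n":
                pending_cr = False        # second half of a CRLF: skip it
                continue
            pending_cr = False
            if c == b"\r":
                lines.append(current)
                current = b""
                pending_cr = True
            elif c == b"\n":
                lines.append(current)
                current = b""
            else:
                current += c
    if current:
        lines.append(current)
    return lines


split = [b"ab\r", b"\ncd\r\n"]            # CRLF straddles the two buffers
print(naive_lines(split))                 # [b'ab', b'', b'cd']
print(stateful_lines(split))              # [b'ab', b'cd']
```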

Re: Faster alternative to READ-LINE

<nnd$5af4f366$61549126@dd4c164c3dc5c7d7>

https://news.novabbs.org/devel/article-flat.php?id=24142&group=comp.lang.forth#24142

Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
References: <ua35nq$2kjcn$1@dont-email.me> <2023Jul30.073239@mips.complang.tuwien.ac.at> <ua5287$2stll$1@dont-email.me> <uaa03r$3ijne$1@dont-email.me>
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: albert@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$5af4f366$61549126@dd4c164c3dc5c7d7>
Organization: KPN B.V.
Date: Tue, 01 Aug 2023 10:43:10 +0200
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe004.abavia.com!abp003.abavia.com!news.kpn.nl!not-for-mail
Lines: 27
Injection-Date: Tue, 01 Aug 2023 10:43:10 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 2016
 by: none - Tue, 1 Aug 2023 08:43 UTC

In article <uaa03r$3ijne$1@dont-email.me>, dxforth <dxforth@gmail.com> wrote:
>On 30/07/2023 5:07 pm, dxforth wrote:
>> ...
>> Further experiments with CP/M showed I didn't need to read all or most
>> of the file to gain a substantial speed increase. A 512 byte buffer
>> was sufficient to gain a 10x improvement over READ-LINE. Raising it
>> to 50KB produced no noticeable benefit. /EOL was written in assembler.
>
>Unfortunately it has a bug. If the last byte in the buffer is the
>first char of a CRLF it will result in an additional empty line.
>Initial tests with a large buffer didn't reveal the problem. The EOL
>scanner itself is fine. It simply wasn't intended for use in a stream
>which doesn't backtrack.
>
Use a circular buffer array that corresponds to disk sectors.
If you discover that you can't do READ-LINE because the line endings
are not present in the buffers, read the buffers that have become
empty. Then try again.
How long the line endings are doesn't matter.
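
One way to read the suggestion above, sketched in Python rather than Forth. This is an interpretation, not code from the thread: a `deque` of sector-sized blocks stands in for the circular buffer array, and `ring_lines`, `sector` and `nsectors` are hypothetical names. When no line feed is found in the buffered sectors, another sector is read and the search is retried. A CR that ends one sector is joined to the LF that starts the next before the terminator is stripped.

```python
import io
from collections import deque

def ring_lines(stream, sector=128, nsectors=4):
    """Yield lines from a binary stream read one sector at a time."""
    ring = deque()            # filled sectors, oldest first
    offset = 0                # read position inside ring[0]
    eof = False
    while True:
        # Search the buffered sectors for a line feed.
        parts, found = [], False
        for n, sec in enumerate(ring):
            start = offset if n == 0 else 0
            nl = sec.find(b"\n", start)
            if nl >= 0:
                parts.append(sec[start:nl])
                found = True
                break
            parts.append(sec[start:])
        if found:
            for _ in range(n):            # sectors fully consumed
                ring.popleft()
            offset = nl + 1
            if offset == len(ring[0]):    # this sector is now empty too
                ring.popleft()
                offset = 0
            line = b"".join(parts)
            if line.endswith(b"\r"):
                line = line[:-1]          # CRLF counts as one terminator
            yield line
            continue
        if eof:
            tail = b"".join(parts)
            if tail:                      # last line had no terminator
                yield tail
            return
        if len(ring) == nsectors:
            raise ValueError("line longer than the ring")
        sec = stream.read(sector)         # fill a free slot, then retry
        if sec:
            ring.append(sec)
        else:
            eof = True

demo = io.BytesIO(b"ab\r\ncdefgh\nij")
print(list(ring_lines(demo, sector=4)))   # [b'ab', b'cdefgh', b'ij']
```

The ring size bounds the longest line that can be handled, here `sector * nsectors` bytes at most.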

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

Re: Faster alternative to READ-LINE

<uab0im$3ljm0$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=24143&group=comp.lang.forth#24143

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dxforth@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Faster alternative to READ-LINE
Date: Tue, 1 Aug 2023 23:16:06 +1000
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <uab0im$3ljm0$1@dont-email.me>
References: <ua35nq$2kjcn$1@dont-email.me>
<2023Jul30.073239@mips.complang.tuwien.ac.at> <ua5287$2stll$1@dont-email.me>
<uaa03r$3ijne$1@dont-email.me> <nnd$5af4f366$61549126@dd4c164c3dc5c7d7>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 1 Aug 2023 13:16:07 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e4e7a73992bdb46a120c43ef48d69f0a";
logging-data="3854016"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/y4nMAcwFWvkx+hXaqF1yt"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:CpWrdhxfWq2n8CNPc1MfwjFv5xo=
In-Reply-To: <nnd$5af4f366$61549126@dd4c164c3dc5c7d7>
Content-Language: en-GB
 by: dxforth - Tue, 1 Aug 2023 13:16 UTC

On 1/08/2023 6:43 pm, albert wrote:
> In article <uaa03r$3ijne$1@dont-email.me>, dxforth <dxforth@gmail.com> wrote:
>> On 30/07/2023 5:07 pm, dxforth wrote:
>>> ...
>>> Further experiments with CP/M showed I didn't need to read all or most
>>> of the file to gain a substantial speed increase. A 512 byte buffer
>>> was sufficient to gain a 10x improvement over READ-LINE. Raising it
>>> to 50KB produced no noticeable benefit. /EOL was written in assembler.
>>
>> Unfortunately it has a bug. If the last byte in the buffer is the
>> first char of a CRLF it will result in an additional empty line.
>> Initial tests with a large buffer didn't reveal the problem. The EOL
>> scanner itself is fine. It simply wasn't intended for use in a stream
>> which doesn't backtrack.
>>
> Use a circular buffer array that corresponds to disk sectors.
> If you discover that you can't do READ-LINE because the line endings
> are not present in the buffers, read the buffers that have become
> empty. Then try again.
> How long the line endings are doesn't matter.

In the end I used a buffered structure I previously implemented for
reading files a byte at a time. All that was needed was turning the
bytes into lines. Usage is similar to READ-LINE.
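
The general shape of that approach can be sketched as follows, in Python rather than Forth. This is not the code referred to above, which was not posted; `ByteReader`, `next_byte` and `read_line` are hypothetical names, and dropping every CR is a simplifying assumption. Because the line reader pulls one byte at a time through the buffer layer, a CRLF that straddles a buffer refill needs no special handling.

```python
import io

class ByteReader:
    """Deliver a binary stream one byte at a time from a small buffer."""

    def __init__(self, stream, bufsize=512):
        self.stream = stream
        self.bufsize = bufsize
        self.buf = b""
        self.pos = 0

    def next_byte(self):
        """Return the next byte as an int, or None at end of file."""
        if self.pos >= len(self.buf):
            self.buf = self.stream.read(self.bufsize)   # refill
            self.pos = 0
            if not self.buf:
                return None
        b = self.buf[self.pos]
        self.pos += 1
        return b

    def read_line(self, maxlen):
        """Return (line, flag); flag is False only at end of file with
        nothing read. At most maxlen bytes are stored per call."""
        line = bytearray()
        while len(line) < maxlen:
            b = self.next_byte()
            if b is None:                 # end of file
                return bytes(line), len(line) > 0
            if b == 0x0A:                 # LF ends the line
                return bytes(line), True
            if b != 0x0D:                 # drop CR so CRLF is one ending
                line.append(b)
        return bytes(line), True          # full: the rest comes next call

r = ByteReader(io.BytesIO(b"ab\r\ncd\nef"), bufsize=3)
print(r.read_line(80))                    # (b'ab', True)
print(r.read_line(80))                    # (b'cd', True)
print(r.read_line(80))                    # (b'ef', True)
print(r.read_line(80))                    # (b'', False)
```

With `bufsize=3` the CR and LF of the first line land in different refills, which is the case that tripped up the per-buffer scanner. The `maxlen` limit also prevents the overrun raised earlier in the thread.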
