
Safe handling of lists

<20231126162914.126fd99f@lud1.home>

https://news.novabbs.org/devel/article-flat.php?id=12839&group=comp.lang.tcl#12839

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: luc@sep.invalid (Luc)
Newsgroups: comp.lang.tcl
Subject: Safe handling of lists
Date: Sun, 26 Nov 2023 16:29:14 -0300
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <20231126162914.126fd99f@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="e35824f83d1547aacfa9561f882f02c0";
logging-data="3559845"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/l3+Kd5sAX7Y5k2zP4XS7ScfIxvzeXv5E="
Cancel-Lock: sha1:1/ORIZ/UIFJ5rlOAFISSbgb83GY=
 by: Luc - Sun, 26 Nov 2023 19:29 UTC

Me again. I have a problem dealing with lists.

I wanted to count the words in a text widget that contains the text
of a file. I decided to treat the whole thing like a list and iterate
over it to count the list elements, possibly filtering some things
out.

It worked fine with a small file, but a large (very large) file
triggers this:

list element in quotes followed by "," instead of space
while executing
"foreach w $::FILECONTENT {
incr wordcount
}"
(procedure "p.wc" line 5)
invoked from within
"p.wc"

Also relevant,

set ::FILECONTENT [$::text get 1.0 end]

It's probably something obvious that I am missing again. Can someone
please enlighten me?

--
Luc

Re: Safe handling of lists

<6kN8N.86907$_Oab.36082@fx15.iad>

https://news.novabbs.org/devel/article-flat.php?id=12840&group=comp.lang.tcl#12840

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx15.iad.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Safe handling of lists
Content-Language: en-US
Newsgroups: comp.lang.tcl
References: <20231126162914.126fd99f@lud1.home>
From: Gerald.Lester@gmail.com (Gerald Lester)
In-Reply-To: <20231126162914.126fd99f@lud1.home>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 37
Message-ID: <6kN8N.86907$_Oab.36082@fx15.iad>
X-Complaints-To: abuse@fastusenet.org
NNTP-Posting-Date: Sun, 26 Nov 2023 19:50:26 UTC
Organization: fastusenet - www.fastusenet.org
Date: Sun, 26 Nov 2023 13:50:26 -0600
X-Received-Bytes: 1747
 by: Gerald Lester - Sun, 26 Nov 2023 19:50 UTC

On 11/26/23 13:29, Luc wrote:
> Me again. I have a problem dealing with lists.
>
> I wanted to count the words in a text widget that contains the text
> of a file. I decided to treat the whole thing like a list and iterate
> over it to count the list elements, possibly filtering some things
> out.
>
> It worked fine with a small file, but a large (very large) file
> triggers this:
>
>
> list element in quotes followed by "," instead of space
> while executing
> "foreach w $::FILECONTENT {
> incr wordcount
> }"
> (procedure "p.wc" line 5)
> invoked from within
> "p.wc"
>
> Also relevant,
>
> set ::FILECONTENT [$::text get 1.0 end]
>
> It's probably something obvious that I am missing again. Can someone
> please enlighten me?
>

Every list is a string, but not every string is a list.

I would suggest that you take a look at the following builtins:
tcl_endOfWord str start
tcl_startOfNextWord str start
tcl_startOfPreviousWord str start
tcl_wordBreakAfter str start
tcl_wordBreakBefore str start
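
To illustrate the "not every string is a list" point, here is a minimal sketch (the sample text and variable names are made up for illustration) of why handing arbitrary text to a list command fails, and why [split] does not:

```tcl
# Hypothetical sample text: a quoted word immediately followed by a comma.
set text {He said "hello", then left.}

# Using the raw string as a list makes Tcl parse it with list syntax,
# which rejects a closing quote that is not followed by whitespace.
set failed [catch {llength $text} msg]
puts "failed=$failed: $msg"

# [split] never applies list syntax; it only cuts the string at whitespace,
# so it always returns a well-formed list.
set words [split $text]
puts [llength $words]
```

The same failure would be expected from foreach, lindex, or any other command that must interpret the text as a list.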

Re: Safe handling of lists

<uk0dnk$3dr16$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12841&group=comp.lang.tcl#12841

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: et99@rocketship1.me (et99)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Sun, 26 Nov 2023 13:35:48 -0800
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <uk0dnk$3dr16$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 26 Nov 2023 21:35:48 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5eee136b7aa366e086c4d34edcca5877";
logging-data="3599398"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/+oX9X4+mMlA2xsxWCN8cy"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.1
Cancel-Lock: sha1:viy02db/+tFjpJNwcoPxkbcejj4=
Content-Language: en-US
In-Reply-To: <20231126162914.126fd99f@lud1.home>
 by: et99 - Sun, 26 Nov 2023 21:35 UTC

On 11/26/2023 11:29 AM, Luc wrote:
> Me again. I have a problem dealing with lists.
>
> I wanted to count the words in a text widget that contains the text
> of a file. I decided to treat the whole thing like a list and iterate
> over it to count the list elements, possibly filtering some things
> out.
>
> It worked fine with a small file, but a large (very large) file
> triggers this:
>
>
> list element in quotes followed by "," instead of space
> while executing
> "foreach w $::FILECONTENT {
> incr wordcount
> }"
> (procedure "p.wc" line 5)
> invoked from within
> "p.wc"
>
> Also relevant,
>
> set ::FILECONTENT [$::text get 1.0 end]
>
> It's probably something obvious that I am missing again. Can someone
> please enlighten me?
>
I think you want [split]

Re: Safe handling of lists

<20231126190047.0403d10d@lud1.home>

https://news.novabbs.org/devel/article-flat.php?id=12842&group=comp.lang.tcl#12842

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: luc@sep.invalid (Luc)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Sun, 26 Nov 2023 19:00:47 -0300
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <20231126190047.0403d10d@lud1.home>
References: <20231126162914.126fd99f@lud1.home>
<uk0dnk$3dr16$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="e35824f83d1547aacfa9561f882f02c0";
logging-data="3603748"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+K/GzZ08114EEKR4BpOzFH0XpkZBlWuQI="
Cancel-Lock: sha1:1KXLHu1B2ehi4sl4Q3dPvPTxrbI=
 by: Luc - Sun, 26 Nov 2023 22:00 UTC

On Sun, 26 Nov 2023 13:35:48 -0800, et99 wrote:

>I think you want [split]
>
**************************

I am using split now. It's faster and "list safe" so it solves
the problem I presented first.

The new problem now is that cleaning up the huge string for proper
counting is not fast enough.

proc p.wc {} {
    set wordcount 0
    set content [$::text get 1.0 end]
    set cleancontent [string map "\n { } \t { }" $content]
    set wordcount [llength [split $cleancontent { }]]
    return $wordcount
}

Since it's called whenever some change is made to the text widget,
typing becomes unacceptably slow.

And I still haven't added a line to clean all the multiple
consecutive spaces, which changes the tally. I can't use regexp
because it's too slow for what I want.

Another (debatable?) problem is that the old code gave me a count
that was a lot closer to the output of 'wc -w' in a terminal.
This new one is way off, and it gives me a lower count! One would
think it would be higher because of the consecutive spaces.
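
One possible explanation for the lower count, offered as a sketch rather than a confirmed diagnosis: inside double quotes the \n and \t in the mapping become real whitespace, and whitespace is only an element separator when [string map] reads the mapping as a list. The sample text and the names `mapping`, `fixedMap`, `broken` and `fixed` are made up for illustration.

```tcl
# The mapping exactly as written in the proc above.
set mapping "\n { } \t { }"

# After substitution the newline and tab are just separators, so the list
# holds a single pair: space -> space. Newlines and tabs are left untouched.
puts [llength $mapping]

set sample "this\nthat\tother"

# With the original mapping nothing changes, and splitting on space alone
# leaves the whole sample as one element.
set broken [split [string map $mapping $sample] { }]
puts [llength $broken]

# Building the mapping with [list] keeps newline and tab as real elements.
set fixedMap [list "\n" " " "\t" " "]
set fixed [split [string map $fixedMap $sample] { }]
puts [llength $fixed]
```

If this is what happens, every line break glues two words together, which would pull the tally below what 'wc -w' reports.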

--
Luc

Re: Safe handling of lists

<uk0gsb$3e868$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12843&group=comp.lang.tcl#12843

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: obermeier@poSoft.de (Paul Obermeier)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Sun, 26 Nov 2023 23:29:19 +0100
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <uk0gsb$3e868$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 26 Nov 2023 22:29:31 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b9144680d38f0829c65215c9bcd36a3e";
logging-data="3612872"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18M3+0RHNnpx9jlziZTEn4bVoEG9kz56Y8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:3mTJtGeDnu1UmAafeV6da9u9lmA=
In-Reply-To: <20231126162914.126fd99f@lud1.home>
 by: Paul Obermeier - Sun, 26 Nov 2023 22:29 UTC

Am 26.11.2023 um 20:29 schrieb Luc:
> Me again. I have a problem dealing with lists.
>
> I wanted to count the words in a text widget that contains the text
> of a file. I decided to treat the whole thing like a list and iterate
> over it to count the list elements, possibly filtering some things
> out.
>
> It worked fine with a small file, but a large (very large) file
> triggers this:
>
>
> list element in quotes followed by "," instead of space
> while executing
> "foreach w $::FILECONTENT {
> incr wordcount
> }"
> (procedure "p.wc" line 5)
> invoked from within
> "p.wc"
>
> Also relevant,
>
> set ::FILECONTENT [$::text get 1.0 end]
>
> It's probably something obvious that I am missing again. Can someone
> please enlighten me?
>

Take a look at my CAWT extension. It contains a CountWords procedure,
see https://www.tcl3d.org/cawt/download/CawtReference-Cawt.html#::Cawt::CountWords

Paul

Re: Safe handling of lists

<20231126195458.37750cb8@lud1.home>

https://news.novabbs.org/devel/article-flat.php?id=12844&group=comp.lang.tcl#12844

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: luc@sep.invalid (Luc)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Sun, 26 Nov 2023 19:54:58 -0300
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <20231126195458.37750cb8@lud1.home>
References: <20231126162914.126fd99f@lud1.home>
<uk0gsb$3e868$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="e35824f83d1547aacfa9561f882f02c0";
logging-data="3620765"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/lv3QWvt+GCxjr0WwVZVUlbqwEsEvCO8s="
Cancel-Lock: sha1:KE+RKvGlZF6i/Kcmm2y9nyETOjc=
 by: Luc - Sun, 26 Nov 2023 22:54 UTC

On Sun, 26 Nov 2023 23:29:19 +0100, Paul Obermeier wrote:

>Take a look at my CAWT extension. It contains a CountWords procedure,
>see
>https://www.tcl3d.org/cawt/download/CawtReference-Cawt.html#::Cawt::CountWords
>
>Paul
**************************

"CAWT (COM Automation With Tcl) is a utility package based on Twapi
to script Microsoft Windows® applications with Tcl."

Thanks, but I am on Linux.

--
Luc

Re: Safe handling of lists

<uk0ira$3eho5$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12845&group=comp.lang.tcl#12845

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: rich@example.invalid (Rich)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Sun, 26 Nov 2023 23:03:06 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <uk0ira$3eho5$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0gsb$3e868$1@dont-email.me> <20231126195458.37750cb8@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 26 Nov 2023 23:03:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="da40531bcf7dfd78c39ad908ce20b3b9";
logging-data="3622661"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qNIFui6bmKfCjop+kcVpP"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.117 (x86_64))
Cancel-Lock: sha1:hKBsF2Jg9EvrNmcMd3G7KHo8C9Y=
 by: Rich - Sun, 26 Nov 2023 23:03 UTC

Luc <luc@sep.invalid> wrote:
> On Sun, 26 Nov 2023 23:29:19 +0100, Paul Obermeier wrote:
>
>>Take a look at my CAWT extension. It contains a CountWords procedure,
>>see
>>https://www.tcl3d.org/cawt/download/CawtReference-Cawt.html#::Cawt::CountWords
>>
>>Paul
> **************************
>
>
> "CAWT (COM Automation With Tcl) is a utility package based on Twapi
> to script Microsoft Windows® applications with Tcl."

But, if the source is available, you could look at the source to see
how it performs "word counting" and use that for inspiration.

Re: Safe handling of lists

<uk0n44$3f5vk$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12846&group=comp.lang.tcl#12846

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: obermeier@poSoft.de (Paul Obermeier)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 01:15:52 +0100
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <uk0n44$3f5vk$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0gsb$3e868$1@dont-email.me>
<20231126195458.37750cb8@lud1.home> <uk0ira$3eho5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 Nov 2023 00:16:04 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="75556486e2b435ae3a7993dabc7da4ff";
logging-data="3643380"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18m7WDB2FdTEk9Oh96xKS1U629rZQIfaDk="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:94eEoLl4iqsOEuevVwstvVIL4Mg=
In-Reply-To: <uk0ira$3eho5$1@dont-email.me>
 by: Paul Obermeier - Mon, 27 Nov 2023 00:15 UTC

Am 27.11.2023 um 00:03 schrieb Rich:
> Luc <luc@sep.invalid> wrote:
>> On Sun, 26 Nov 2023 23:29:19 +0100, Paul Obermeier wrote:
>>
>>> Take a look at my CAWT extension. It contains a CountWords procedure,
>>> see
>>> https://www.tcl3d.org/cawt/download/CawtReference-Cawt.html#::Cawt::CountWords
>>>
>>> Paul
>> **************************
>>
>>
>> "CAWT (COM Automation With Tcl) is a utility package based on Twapi
>> to script Microsoft Windows® applications with Tcl."
>
> But, if the source is available, you could look at the source to see
> how it performs "word counting" and use that for inspiration.

That was the idea.

Re: Safe handling of lists

<uk0och$3f972$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12847&group=comp.lang.tcl#12847

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: et99@rocketship1.me (et99)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Sun, 26 Nov 2023 16:37:37 -0800
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <uk0och$3f972$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Nov 2023 00:37:37 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b9b08d2bd5a015e61ce612e7a4b778fb";
logging-data="3646690"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19yfG083b8SXM+dYH83uG5C"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.1
Cancel-Lock: sha1:Vn8imSKgAh6COphV3C8r/pdsQgo=
Content-Language: en-US
In-Reply-To: <20231126190047.0403d10d@lud1.home>
 by: et99 - Mon, 27 Nov 2023 00:37 UTC

On 11/26/2023 2:00 PM, Luc wrote:
> On Sun, 26 Nov 2023 13:35:48 -0800, et99 wrote:
>
>> I think you want [split]
>>
> **************************
>
> I am using split now. It's faster and "list safe" so it solves
> the problem I presented first.
>
> The new problem now is that cleaning up the huge string for proper
> counting is not fast enough.
>
>
> proc p.wc {} {
> set wordcount 0
> set content [$::text get 1.0 end]
> set cleancontent [string map "\n { } \t { }" $content]
> set wordcount [llength [split $cleancontent { }]]
> return $wordcount
> }
>
>
> Since it's called whenever some change is made to the text widget,
> typing becomes unacceptably slow.
>
> And I still haven't added a line to clean all the multiple
> consecutive spaces, which changes the tally. I can't use regexp
> because it's too slow for what I want.
>
> Another (debatable?) problem is that the old code gave me a count
> that was a lot closer to the output of 'wc -w' in a terminal.
> This new one is way off, and it gives me a lower count! One would
> think it would be higher because of the consecutive spaces.
>

Just wondering, why the string map to change newlines and tabs to spaces? Split can take those plus spaces in the splitchars string. In fact, I think that's the default anyway.

Also, are you saying this is calculated on every char the user types? Is that to keep a wordcount in say a status area? How big is the text we're talking about here?
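
A minimal sketch of the default-splitChars idea, assuming a hypothetical helper named `countWords`: [split] with no second argument cuts at whitespace, and the empty elements produced by consecutive separators are filtered out with [lsearch] rather than a regexp.

```tcl
# Hypothetical helper: count whitespace-separated words in a string.
proc countWords {text} {
    # With no splitChars argument, [split] splits on whitespace,
    # so no [string map] pre-pass is needed.
    set parts [split $text]
    # Runs of separators yield empty elements; keep only the non-empty ones.
    return [llength [lsearch -all -inline -not -exact $parts {}]]
}

puts [countWords "this\nthat   and\tthe  other\n"]
```

Whether this is fast enough on an 18 MB text is not something the sketch settles; it still builds one list element per word.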

Re: Safe handling of lists

<20231127000547.27b68113@lud1.home>

https://news.novabbs.org/devel/article-flat.php?id=12851&group=comp.lang.tcl#12851

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: luc@sep.invalid (Luc)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 00:05:47 -0300
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <20231127000547.27b68113@lud1.home>
References: <20231126162914.126fd99f@lud1.home>
<uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
<uk0och$3f972$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="545231d0b6700bdf54da8ac8f8b6a353";
logging-data="3728366"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/1etkwTyWJPbhmRHaQ1SUVKT2c0gP2120="
Cancel-Lock: sha1:PlVVLbsu+YokwWYJR+pdDJ0Q/Jg=
 by: Luc - Mon, 27 Nov 2023 03:05 UTC

On Sun, 26 Nov 2023 16:37:37 -0800, et99 wrote:

>Just wondering, why the string map to change newlines and tabs to spaces.
>Split can take those plus spaces in the splitchars string. In fact, I
>think that's the default anyway.
>
>Also, are you saying this is calculated on every char the user types? Is
>that to keep a wordcount in say a status area? How big is the text we're
>talking about here?
>
**************************

I decided to change all the newlines to spaces because I was afraid that

this
that

might become 'thisthat' rather than 'this that' after the split.

It probably wouldn't, but I wanted to make sure.

Yes, it's a sort of text editor and the word count must be calculated
after every change.

(I had big plans for it but personal issues have forced me to put it
on the back burner for God knows how long. I'm trying to fix this code
right now because I got a copywriting job where counting words in real
time is very useful. A ton of other things will be fixed... someday.)

Anyway, there is a status bar with some information and the word count
is supposed to be updated at every touch of the keyboard. I currently
filter out arrow key movements. Need to rewrite the code and make it
smarter.

The "big text" I am using to assess performance is 18MB.

$ wc /home/tcl/bigtext.txt
218993 2758398 18421662 /home/tcl/bigtext.txt

--
Luc

Re: Safe handling of lists

<uk1gcs$3m2bp$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12853&group=comp.lang.tcl#12853

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: peter@invalid.org (Peter Dean)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 17:27:24 +1000
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <uk1gcs$3m2bp$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Nov 2023 07:27:25 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1eb62e49ac3734016b364412a249a18e";
logging-data="3869049"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18XJvAozh37YxhTWd/9xGxl"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:qaJC3TONXNPGqOxJDY3JpgvRkg8=
In-Reply-To: <20231126162914.126fd99f@lud1.home>
Content-Language: en-US
 by: Peter Dean - Mon, 27 Nov 2023 07:27 UTC

On 27/11/23 05:29, Luc wrote:
> Me again. I have a problem dealing with lists.
>
> I wanted to count the words in a text widget that contains the text
> of a file. I decided to treat the whole thing like a list and iterate
> over it to count the list elements, possibly filtering some things
> out.
>
> It worked fine with a small file, but a large (very large) file
> triggers this:
>
>
> list element in quotes followed by "," instead of space
> while executing
> "foreach w $::FILECONTENT {
> incr wordcount
> }"
> (procedure "p.wc" line 5)
> invoked from within
> "p.wc"
>
> Also relevant,
>
> set ::FILECONTENT [$::text get 1.0 end]
>
> It's probably something obvious that I am missing again. Can someone
> please enlighten me?
>
tcllib has splitx, which splits on a regexp, e.g.

::textutil::split::splitx $l {\s+}

which splits on runs of whitespace.
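
splitx needs tcllib to be installed. When only the count is wanted, a core-only sketch of the same idea (treat a run of whitespace as one separator) is to let [regexp -all] count the runs of non-whitespace instead. The proc name `countWordsRe` is made up for illustration, and its speed on very large texts is untested here.

```tcl
# Hypothetical helper: count runs of non-whitespace characters.
proc countWordsRe {text} {
    # With -all and no match variables, [regexp] returns the number of matches.
    return [regexp -all {\S+} $text]
}

puts [countWordsRe "  this\nthat   and\tthe  other\n"]
```

No intermediate list is built, so leading, trailing and repeated whitespace need no special handling.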

Re: Safe handling of lists

<uk23nn$3p1ni$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12855&group=comp.lang.tcl#12855

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nospam.nurdglaw@gmail.com (Alan Grunwald)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 12:57:27 +0000
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <uk23nn$3p1ni$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home> <uk0och$3f972$1@dont-email.me>
<20231127000547.27b68113@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Nov 2023 12:57:27 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="117f6facb67785ae2aef0959f1323e6a";
logging-data="3966706"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+QU90QYvjknyHkgc2hXMzL0yUIGq61Wpc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:GzRa/1f/JtPEss92+JGPYg3kq9c=
Content-Language: en-US
In-Reply-To: <20231127000547.27b68113@lud1.home>
 by: Alan Grunwald - Mon, 27 Nov 2023 12:57 UTC

On 27/11/2023 03:05, Luc wrote:
> On Sun, 26 Nov 2023 16:37:37 -0800, et99 wrote:
>
>> Just wondering, why the string map to change newlines and tabs to spaces.
>> Split can take those plus spaces in the splitchars string. In fact, I
>> think that's the default anyway.
>>
>> Also, are you saying this is calculated on every char the user types? Is
>> that to keep a wordcount in say a status area? How big is the text we're
>> talking about here?
>>
> **************************
>
> I decided to change all the newlines to spaces because I was afraid that
>
> this
> that
>
> might become 'thisthat' rather than 'this that' after the split.
>
> It probably wouldn't, but I wanted to make sure.
>
> Yes, it's a sort of text editor and the word count must be calculated
> after every change.
>
> (I had big plans for it but personal issues have forced me to put it
> on the back burner for God knows how long. I'm trying to fix this code
> right now because I got a copywriting job where counting words in real
> time is very useful. A ton of other things will be fixed... someday.)
>
> Anyway, there is a status bar with some information and the word count
> is supposed to be updated at every touch of the keyboard. I currently
> filter out arrow key movements. Need to rewrite the code and make it
> smarter.
>
> The "big text" I am using to assess performance is 18MB.
>
> $ wc /home/tcl/bigtext.txt
> 218993 2758398 18421662 /home/tcl/bigtext.txt
>
Surely you only need to update the word count if the character inserted
or deleted is a word separator? I assume you can tell whether this is
the case.

Alan

Re: Safe handling of lists

<ygacyvv45en.fsf@panther.akutech-local.de>

https://news.novabbs.org/devel/article-flat.php?id=12856&group=comp.lang.tcl#12856

Path: i2pn2.org!i2pn.org!news.niel.me!news.gegeweb.eu!gegeweb.org!news.mb-net.net!open-news-network.org!news.mind.de!bolzen.all.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: ralfixx@gmx.de (Ralf Fassel)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 15:50:08 +0100
Lines: 36
Message-ID: <ygacyvv45en.fsf@panther.akutech-local.de>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net vMe+6Vn+LM5TaUCqFqOaygDSapjrQgSCX+XCzwCKlUthZD794=
Cancel-Lock: sha1:yBff3cuwx+MpsS3Swer5hya0Fb8= sha1:Ks4+5Tua9DArjftOh+Gh+CBxaA4= sha256:HC29DjR2UGu6mx2Ag5ZNQzMPAaA8WHxR6DmR9dfTqBg=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
 by: Ralf Fassel - Mon, 27 Nov 2023 14:50 UTC

* Luc <luc@sep.invalid>
| I am using split now. It's faster and "list safe" so it solves
| the problem I presented first.
>
| The new problem now is that cleaning up the huge string for proper
| counting is not fast enough.
>
>
| proc p.wc {} {
| set wordcount 0
| set content [$::text get 1.0 end]
| set cleancontent [string map "\n { } \t { }" $content]
| set wordcount [llength [split $cleancontent { }]]
| return $wordcount
| }
>
>
| Since it's called whenever some change is made to the text widget,
| typing becomes unacceptably slow.

You could set up a timer to do the real work after a short period
(500ms) of keyboard idle, and return the old count until then. Quick
typists would see the updated count only after they stop typing.
Otherwise you would need to keep track of what is inserted/deleted and
incr/decr the count based on that (fragile).
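
The timer idea can be sketched with [after] and [after cancel]. The names `scheduleWordCount`, `recount` and `::wcTimer` are made up for illustration, and the demo uses a 50 ms delay and no Tk so that it runs in plain tclsh.

```tcl
# Id of the pending timer, if any (hypothetical global).
set ::wcTimer {}

# Hypothetical helper: run $script once the calls stop for $delayMs.
proc scheduleWordCount {script {delayMs 500}} {
    # Drop the count that is still pending, then schedule a fresh one.
    after cancel $::wcTimer
    set ::wcTimer [after $delayMs $script]
}

# Demo: three "keystrokes" in quick succession trigger a single recount.
set ::runs 0
proc recount {} { incr ::runs; set ::finished 1 }
foreach key {a b c} { scheduleWordCount recount 50 }
vwait ::finished
puts $::runs
```

In the editor, the call to scheduleWordCount would presumably sit in whatever binding currently calls p.wc directly, with the status bar keeping the old number until the timer fires.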

| And I still haven't added a line to clean all the multiple
| consecutive spaces, which changes the tally. I can't use regexp
| because it's too slow for what I want.

As someone else suggested, ::textutil::split::splitx might be a solution
(though it might be even slower due to the use of regexps, need to
test), or else check the [string length] of elements and count them only
as words if > 0.

R'

Re: Safe handling of lists

<uk3f1u$3vuo5$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12862&group=comp.lang.tcl#12862

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: et99@rocketship1.me (et99)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 17:16:46 -0800
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <uk3f1u$3vuo5$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
<ygacyvv45en.fsf@panther.akutech-local.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Nov 2023 01:16:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a181ead967eb5cb46f046af422e50252";
logging-data="4193029"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19eMPJzv4lK1bHlWnMYZwhh"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.1
Cancel-Lock: sha1:QhtYzbioEyJYbVbi6QIszgvKvas=
Content-Language: en-US
In-Reply-To: <ygacyvv45en.fsf@panther.akutech-local.de>
 by: et99 - Tue, 28 Nov 2023 01:16 UTC

On 11/27/2023 6:50 AM, Ralf Fassel wrote:
> * Luc <luc@sep.invalid>
> | I am using split now. It's faster and "list safe" so it solves
> | the problem I presented first.
>>
> | The new problem now is that cleaning up the huge string for proper
> | counting is not fast enough.
>>
>>
> | proc p.wc {} {
> | set wordcount 0
> | set content [$::text get 1.0 end]
> | set cleancontent [string map "\n { } \t { }" $content]
> | set wordcount [llength [split $cleancontent { }]]
> | return $wordcount
> | }
>>
>>
> | Since it's called whenever some change is made to the text widget,
> | typing becomes unacceptably slow.
>
> You could set up a timer to do the real work after a short period
> (500ms) of keyboard idle, and return the old count until then. Quick
> typists would see the updated count only after they stop typing.
> Otherwise you would need to keep track of what is inserted/deleted and
> incr/decr the count based on that (fragile).
>
> | And I still haven't added a line to clean all the multiple
> | consecutive spaces, which changes the tally. I can't use regexp
> | because it's too slow for what I want.
>
> As someone else suggested, ::textutil::split::splitx might be a solution
> (though it might be even slower due to the use of regexps, need to
> test), or else check the [string length] of elements and count them only
> as words if > 0.
>
> R'

I was wondering just how accurate it has to be if one is dealing with a 20 MB file with perhaps 2 million words.

What about just counting all the spaces, tabs, and newlines in the text by using

set totchars [string length $txt]
set nowhites [string map {\n {} \t {} { } {}} $txt]
set wordcount [expr { $totchars - [string length $nowhites] }]

Won't this be as accurate as using split? And since it is all string operations, there is no cost of creating lists.

In timing tests, the [string length] calls seemed to cost almost nothing; I guess the string objects have the length saved in them.
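
A small worked example of how far the whitespace count can drift from a true word count. The proc name `estimateWords` and the sample text are made up for illustration, and the exact figure comes from [regexp -all] purely for comparison.

```tcl
# Hypothetical helper wrapping the three lines above:
# the estimate is simply the number of whitespace characters.
proc estimateWords {txt} {
    set nowhites [string map {\n {} \t {} { } {}} $txt]
    return [expr {[string length $txt] - [string length $nowhites]}]
}

# Sample with a double space, a blank line, and no trailing newline.
set sample "one  two\nthree\n\nfour"

puts [estimateWords $sample]        ;# whitespace characters: 5
puts [regexp -all {\S+} $sample]    ;# actual words: 4
```

Each extra space or blank line adds one to the estimate, and a final word with no trailing whitespace is not counted at all, so the two errors partly cancel.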

But I was also thinking there could be a threshold: for smaller files, i.e. where the text extracted with

$::text get 1.0 end

is shorter than some value, use a more accurate method, since the impact on typing would be smaller.

I suspect there has to be a file read to get the 20 MB in, at which point an extra second of setup won't matter. Comparing an accurate count with the quick one at that point would yield a percentage difference, and that percentage could be factored into any later quick counts.

But as suggested, doing it only when the user has stopped typing makes sense to me.
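The idle-timer idea could look roughly like this. It is only a sketch: scheduleCount, recount, idleTimer and runs are hypothetical names, and recount stands in for whatever expensive full count the editor really does. Each call cancels the pending timer and starts a new one, so the real work runs once, after the last keystroke:

```tcl
# Hypothetical names throughout; recount is a stand-in for the real count.
set idleTimer ""
set runs 0

proc recount {} {
    incr ::runs          ;# the expensive full word count would go here
}

# Call this on every change; only the last call within the delay "wins".
proc scheduleCount {{delay 500}} {
    after cancel $::idleTimer
    set ::idleTimer [after $delay recount]
}

# In the editor this might be hooked up as (assumption):
#   bind .t <<Modified>> scheduleCount
```

The delay of 500 ms is the figure suggested earlier in the thread; a shorter value makes the display feel more live at the cost of more recounts.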

And there's always the possibility of doing some of the heavy lifting in a separate thread.

Interesting problem. The text editor I use doesn't count words dynamically, but rather has a statistics command.

Re: Safe handling of lists

<uk3h00$5p7$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12863&group=comp.lang.tcl#12863
Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: rich@example.invalid (Rich)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 01:49:52 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <uk3h00$5p7$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me> <20231126190047.0403d10d@lud1.home> <ygacyvv45en.fsf@panther.akutech-local.de> <uk3f1u$3vuo5$1@dont-email.me>
Injection-Date: Tue, 28 Nov 2023 01:49:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8647d52affd0e67407bec722d5c4ff07";
logging-data="5927"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ypRO7lPgHHoe/+DlW4Se3"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.117 (x86_64))
Cancel-Lock: sha1:iMhnt/AvoM9uhZcOtMsdohkFVNo=
 by: Rich - Tue, 28 Nov 2023 01:49 UTC

et99 <et99@rocketship1.me> wrote:
> On 11/27/2023 6:50 AM, Ralf Fassel wrote:
>> As someone else suggested, ::textutil::split::splitx might be a
>> solution (though it might be even slower due to the use of regexps,
>> need to test), or else check the [string length] of elements and
>> count them only as words if > 0.
>
> I was wondering just how accurate it has to be if one is dealing with
> a 20mb file with perhaps 2 million words.

It would seem that for such a large file that being a "little bit
incorrect" on a real-time display could be acceptable. Esp. if there
were a way to request an "accurate" (and slower) word count for the
few times the exact value is needed.

> What about just counting all the spaces, tabs, and newlines in the
> text by using
>
> set totchars [string length $txt]
> set nowhites [string map {\n {} \t {} { } {}} $txt]
> set wordcount [expr { $totchars - [string length $nowhites] }]
>
> Won't this be as accurate as using split? And with all string
> operations, no costs of creating lists.

But there is the creation of a copy of a 20MB string for the output of
string map. That is probably still faster than all the small
allocations of word-sized strings needed to populate a list.

> In timing tests, the [string length] calls seemed to be near zero, I
> guess the string objects have the length saved in them.

For Tcl_Obj's holding strings, there is an explicit length count field
in the struct, so "length of string" devolves to "retrieve the length
field of the Tcl_Obj".

> And there's always the possibility of doing some of the heavy lifting
> in a separate thread.

If an accurate, and nearly realtime, count is needed, and Luc does not
want the GUI event loop to block while the count occurs, then using a
second thread might be reasonable. That is if the time to 'get' the
text and send it off to the second thread does not itself become the
slow point.
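A rough sketch of the second-thread idea, assuming the Thread extension is installed. The proc names countWords and countAsync and the variable ::wcResult are hypothetical. The worker does the counting and the main thread is notified through a variable when the result arrives:

```tcl
package require Thread

# Worker thread: defines the counting proc, then waits for jobs.
set worker [thread::create {
    proc countWords {txt} {
        return [regexp -all {\S+} $txt]
    }
    thread::wait
}]

# Hypothetical helper: send the text to the worker without blocking.
# The result is stored in the global variable named by resultVar.
proc countAsync {worker txt resultVar} {
    thread::send -async $worker [list countWords $txt] $resultVar
}

countAsync $worker "some sample text here" ::wcResult
vwait ::wcResult     ;# a Tk app already runs the event loop; no vwait needed
puts $::wcResult
thread::release $worker
```

Note that the text still has to be copied to the worker, which is the cost mentioned above; whether that copy is cheap enough for a 20MB buffer would need measuring.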

Re: Safe handling of lists

<20231127230734.14bb8204@lud1.home>

https://news.novabbs.org/devel/article-flat.php?id=12864&group=comp.lang.tcl#12864
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: luc@sep.invalid (Luc)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 23:07:34 -0300
Organization: A noiseless patient Spider
Lines: 61
Message-ID: <20231127230734.14bb8204@lud1.home>
References: <20231126162914.126fd99f@lud1.home>
<uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
<ygacyvv45en.fsf@panther.akutech-local.de>
<uk3f1u$3vuo5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="108ff039ca95aae49b013c3486380a4d";
logging-data="5021"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+fiAesHYbpTMf90JrBWnKLYTg7D0MLHNE="
Cancel-Lock: sha1:T1TY0+6hLOPeFDJlTAkTz4nfxYs=
 by: Luc - Tue, 28 Nov 2023 02:07 UTC

On Mon, 27 Nov 2023 17:16:46 -0800, et99 wrote:

> I was wondering just how accurate it has to be if one is dealing with a
> 20mb file with perhaps 2 million words.

For me, it doesn't, except that I'm using the same application for small
and large files and I want the application to be able to handle all cases.

Right now, I am only enabling the wc proc when I am doing my copywriting
work. At all other times, it is disabled because typing becomes
unacceptably slow on the large files I use regularly.

> And there's always the possibility of doing some of the heavy lifting
> in a separate thread.

Hmm. That never crossed my mind. I don't remember ever coding with
threads. I will have to look into that possibility.

> Interesting problem. The text editor I use doesn't count words
> dynamically, but rather has a statistics command.

Now that did cross my mind. I wrote on MS Word and other word processors
for most of my life and that's how they all do it. But my own editor has
this or that nice little feature I made for myself that makes it all a
better experience for me so I prefer to use it now.

That's why we learn to code, right?

The problem is, a statistics command doesn't really cut it.

I am assigned a topic and a word count, usually 500 or 700. Rarely,
1200. There are two approaches I can take:

1. Splurge carelessly and rewrite to prune excesses later.

2. Manage my verbosity as I go along. Sort of a regularity rally.

I like #2 better. It's actually more enjoyable and it saves me quite
some time. Sometimes the deadline is long, sometimes it isn't.

I can only manage my verbosity as I go along if I know how many words
I have put into it in real time.

This is a very old idea. Even when I used MS Word on Windows, and that
was literally 20 to 26 years ago, I craved something like that.
I can finally have it.

Or can I? Let's see.

This is just a quick reply. I will take a more careful look at other
aspects of your comments later. I will see what I can do with the code
ideas you contributed. If I do find a good solution, I will wikify it.

Many thanks for your interest.

--
Luc

Re: Safe handling of lists

<uk3l9u$4j95$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12865&group=comp.lang.tcl#12865
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: rich@example.invalid (Rich)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 03:03:26 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 86
Message-ID: <uk3l9u$4j95$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me> <20231126190047.0403d10d@lud1.home> <ygacyvv45en.fsf@panther.akutech-local.de> <uk3f1u$3vuo5$1@dont-email.me> <20231127230734.14bb8204@lud1.home>
Injection-Date: Tue, 28 Nov 2023 03:03:26 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8647d52affd0e67407bec722d5c4ff07";
logging-data="150821"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+rMDdL4/AjcJ6KwjFxOagU"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.117 (x86_64))
Cancel-Lock: sha1:FWY5U7+W1Y2ccwIAUbVkaNK7IMU=
 by: Rich - Tue, 28 Nov 2023 03:03 UTC

Luc <luc@sep.invalid> wrote:
> On Mon, 27 Nov 2023 17:16:46 -0800, et99 wrote:
>
>> I was wondering just how accurate it has to be if one is dealing
>> with a 20mb file with perhaps 2 million words.
>
> For me, it doesn't, except that I'm using the same application for
> small and large files and I want the application to be able to handle
> all cases.
>
> Right now, I am only enabling the wc proc when I am doing my
> copywriting work. At all other times, it is disabled because typing
> becomes unacceptably slow on the large files I use regularly.

Thing is, given what you've described so far of your code, you are
attempting to brute-force a real-time count, and doing so you've
created something on the order of an O(n^2) or worse algorithm.

You are "counting" lots of things over and over that you previously
counted after the last keystroke, even though that keystroke only made
(usually) a one character change to the entire file.

You instead want to think about how to not count any more than you have
to for each "count cycle".

I pulled a copy of Gutenberg's copy of War and Peace (because it is a
long book) from here: https://gutenberg.org/cache/epub/2600/pg2600.txt

Then I concatenated six duplicates into a single file (to make an
approximately 20MB file). I named that file "20mb".

Then I set out to see what could be done by trying to not count
everything after any change. This is ugly demonstration code below, all
of this would ideally be wrapped inside an oo::object and made prettier,
but you get the raw demo below:

#!/usr/bin/wish

set wc 0

label .wc -textvariable wc
text .t
pack .wc
pack .t

set fd [open 20mb RDONLY]
.t insert end [read $fd]
close $fd

set num_lines [.t count -lines 0.0 end]

# prefill per-line word-count cache
set lcc [list 0]
for {set i 1} {$i < $num_lines} {incr i} {
    lappend lcc [llength [regexp -all -inline {\S+} [.t get $i.0 "$i.0 lineend"]]]
}

# initial load word count
set wc [tcl::mathop::+ {*}$lcc]

# set modified flag of text widget to false
.t edit modified 0

proc modified {} {
    global lcc
    global wc
    lassign [split [.t index insert] .] line_num
    lset lcc $line_num [llength [regexp -all -inline {\S+} [.t get $line_num.0 "$line_num.0 lineend"]]]
    set wc [tcl::mathop::+ {*}$lcc]
    .t edit modified 0
}

bind .t <<Modified>> [list modified]

This above counts, in real time, as I type, with the 20mb 6x War and
Peace file loaded. If I get going typing I can just begin to sense a
latency on typing, but I do really have to get on a good roll for that.
The one place I *do* see a latency is for keyboard autorepeat, there is
a clear delay then.

But, while hammering out this demo I noticed that <<Modified>> is
called twice for every keystroke (meaning this above is still doing
twice the work it needs to do). I simply did not want to be bothered
with working out why <<Modified>> is called twice with every keystroke,
nor with working out how to not call it twice for every keystroke.

Re: Safe handling of lists

<uk3sti$5gvq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12866&group=comp.lang.tcl#12866
Path: i2pn2.org!i2pn.org!newsfeed.endofthelinebbs.com!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: et99@rocketship1.me (et99)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Mon, 27 Nov 2023 21:13:22 -0800
Organization: A noiseless patient Spider
Lines: 97
Message-ID: <uk3sti$5gvq$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
<ygacyvv45en.fsf@panther.akutech-local.de> <uk3f1u$3vuo5$1@dont-email.me>
<20231127230734.14bb8204@lud1.home> <uk3l9u$4j95$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Nov 2023 05:13:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a181ead967eb5cb46f046af422e50252";
logging-data="181242"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zeTGIpSpuzBz0/zmp96UG"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.1
Cancel-Lock: sha1:K1KZAHU5hOT5VEwvzF4BvGuuZxQ=
Content-Language: en-US
In-Reply-To: <uk3l9u$4j95$1@dont-email.me>
 by: et99 - Tue, 28 Nov 2023 05:13 UTC

On 11/27/2023 7:03 PM, Rich wrote:
> Luc <luc@sep.invalid> wrote:
>> On Mon, 27 Nov 2023 17:16:46 -0800, et99 wrote:
>>
>>> I was wondering just how accurate it has to be if one is dealing
>>> with a 20mb file with perhaps 2 million words.
>>
>> For me, it doesn't, except that I'm using the same application for
>> small and large files and I want the application to be able to handle
>> all cases.
>>
>> Right now, I am only enabling the wc proc when I am doing my
>> copywriting work. At all other times, it is disabled because typing
>> becomes unacceptably slow on the large files I use regularly.
>
> Thing is, given what you've described so far of your code, you are
> attempting to brute-force a real-time count, and doing so you've
> created something on the order of an O(n^2) or worse algorithm.
>
> You are "counting" lots of things over and over that you previously
> counted after the last keystroke, even though that keystroke only made
> (usually) a one character change to the entire file.
>
> You instead want to think about how to not count any more than you have
> to for each "count cycle".
>
> I pulled a copy of Gutenberg's copy of War and Peace (because it is a
> long book) from here: https://gutenberg.org/cache/epub/2600/pg2600.txt
>
> Then I concatenated six duplicates into a single file (to make an
> approximately 20MB file). I named that file "20mb".
>
> Then I set out to see what could be done by trying to not count
> everything after any change. This is ugly demonstration code below, all
> of this would ideally be wrapped inside an oo::object and made prettier,
> but you get the raw demo below:
>
> #!/usr/bin/wish
>
> set wc 0
>
> label .wc -textvariable wc
> text .t
> pack .wc
> pack .t
>
> set fd [open 20mb RDONLY]
> .t insert end [read $fd]
> close $fd
>
> set num_lines [.t count -lines 0.0 end]
>
> # prefill per-line word-count cache
> set lcc [list 0]
> for {set i 1} {$i < $num_lines} {incr i} {
> lappend lcc [llength [regexp -all -inline {\S+} [.t get $i.0 "$i.0 lineend"]]]
> }
>
> # initial load word count
> set wc [tcl::mathop::+ {*}$lcc]
>
> # set modified flag of text widget to false
> .t edit modified 0
>
> proc modified {} {
> global lcc
> global wc
> lassign [split [.t index insert] .] line_num
> lset lcc $line_num [llength [regexp -all -inline {\S+} [.t get $line_num.0 "$line_num.0 lineend"]]]
> set wc [tcl::mathop::+ {*}$lcc]
> .t edit modified 0
> }
>
> bind .t <<Modified>> [list modified]
>
> This above counts, in real time, as I type, with the 20mb 6x War and
> Peace file loaded. If I get going typing I can just begin to sense a
> latency on typing, but I do really have to get on a good roll for that.
> The one place I *do* see a latency is for keyboard autorepeat, there is
> a clear delay then.
>
> But, while hammering out this demo I noticed that <<Modified>> is
> called twice for every keystroke (meaning this above is still doing
> twice the work it needs to do). I simply did not want to be bothered
> with working out why <<Modified>> is called twice with every keystroke,
> nor with working out how to not call it twice for every keystroke.
>
Hmmm, a line cache. Cool.

And since lcc is a list one can insert and delete quickly when lines are added or removed keeping it in sync with the text widget.

How to tell just what changed (e.g. a paste in the middle) I can't say.

I think for changes on a line, instead of re-running [tcl::mathop::+ {*}$lcc], one can adjust the wc by the line's (new value - old value), saving possibly a 20k item arglist for the mathop.

But I think this should work pretty well.
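The delta adjustment might be sketched like this. updateLine is a hypothetical name; lcc and wc are the globals from the demo above. Only the edited line's cache entry is touched, and the total moves by the difference rather than being re-summed:

```tcl
# Hypothetical helper: update one line's cached count and adjust the total.
proc updateLine {line newCount} {
    global lcc wc
    set oldCount [lindex $lcc $line]
    lset lcc $line $newCount
    incr wc [expr {$newCount - $oldCount}]
}

# Small stand-in cache: index 0 is the unused slot, as in the demo.
set lcc {0 3 5 2}
set wc  10
updateLine 2 7
puts "$wc / $lcc"
```

This keeps the per-keystroke cost independent of the number of lines in the buffer.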

Re: Safe handling of lists

<uk4905$70ek$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12867&group=comp.lang.tcl#12867
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: auriocus@gmx.de (Christian Gollwitzer)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 09:39:33 +0100
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <uk4905$70ek$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk1gcs$3m2bp$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Nov 2023 08:39:33 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2af6d790429acdd6ff9c8f3c7ca81e6f";
logging-data="229844"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/+QFfKOGAOZVoCdimRzk2EDJ32x1IrSq4="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:c9kCcMzyvl9Slvf9V9pQ8O4kTDw=
In-Reply-To: <uk1gcs$3m2bp$1@dont-email.me>
 by: Christian Gollwitzer - Tue, 28 Nov 2023 08:39 UTC

Am 27.11.23 um 08:27 schrieb Peter Dean:
> tcllib has splitx which splits on regexp eg
>
> ::textutil::split::splitx $l {\s+}
>
> splits on runs of spaces
For only the count, it is not required to split the list. regexp can do
the counting:

set string {This is a sentence with whitespaces in it.}
regexp -all {\s+} $string

returns the number of whitespace runs. With the uppercase \S it returns the
number of non-whitespace runs, i.e. words (there could be whitespace
before and after).

Christian
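As a quick illustration of the difference between the two patterns (the variable names gaps, words and padded are only for this example), leading and trailing whitespace changes the \s+ count but not the \S+ count:

```tcl
set string {This is a sentence with whitespaces in it.}
set gaps  [regexp -all {\s+} $string]   ;# runs of whitespace between words
set words [regexp -all {\S+} $string]   ;# runs of non-whitespace, i.e. words
puts "gaps=$gaps words=$words"

# With padding, the whitespace-run count no longer tracks the word count.
set padded "  leading and trailing  "
puts "gaps=[regexp -all {\s+} $padded] words=[regexp -all {\S+} $padded]"
```

So for a word count, the \S+ form is the safer of the two.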

Re: Safe handling of lists

<uk4bbl$7cpc$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12868&group=comp.lang.tcl#12868
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: p.dean@gmx.com (Peter Dean)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 09:19:50 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 22
Sender: <peter@flo.localdomain>
Message-ID: <uk4bbl$7cpc$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk1gcs$3m2bp$1@dont-email.me> <uk4905$70ek$1@dont-email.me>
Injection-Date: Tue, 28 Nov 2023 09:19:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="08ebe7ee917dac053a1c740f7dcf7c42";
logging-data="242476"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18DUI/qjon2gntKnTRNu3Y/"
User-Agent: tin/2.6.2-20221225 ("Pittyvaich") (Linux/6.1.63-1-lts (x86_64))
Cancel-Lock: sha1:fteW/cLhxZbW/5xlVA1BLuhyz/Y=
 by: Peter Dean - Tue, 28 Nov 2023 09:19 UTC

Christian Gollwitzer <auriocus@gmx.de> wrote:
> Am 27.11.23 um 08:27 schrieb Peter Dean:
>> tcllib has splitx which splits on regexp eg
>>
>> ::textutil::split::splitx $l {\s+}
>>
>> splits on runs of spaces
> For only the count, it is not required to split the list. regex can do
> the counting:
>
> set string {This is a sentence with whitespaces in it.}
> regexp -all {\s+} $string
>
> returns the number of blanks. With the uppercase \S it returns the
> number of non-white (there could be whitespace before and after)
>
> Christian
>

impressive
better to predict what the real question was than that asked

Re: Safe handling of lists

<ygazfyyypy1.fsf@panther.akutech-local.de>

https://news.novabbs.org/devel/article-flat.php?id=12873&group=comp.lang.tcl#12873
Path: i2pn2.org!i2pn.org!news.samoylyk.net!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: ralfixx@gmx.de (Ralf Fassel)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 14:20:38 +0100
Lines: 20
Message-ID: <ygazfyyypy1.fsf@panther.akutech-local.de>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
<ygacyvv45en.fsf@panther.akutech-local.de>
<uk3f1u$3vuo5$1@dont-email.me> <20231127230734.14bb8204@lud1.home>
<uk3l9u$4j95$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net fHXNIN7uWunocPJM8orGHgdykZ63n6GlvaAI48HMmb28CnC5U=
Cancel-Lock: sha1:m7qc6EvlCxAzL8hqnsKqDtXR1aU= sha1:40EfyYUpoVtpVShqhSMkk3Bharo= sha256:dQQYMOzRyhgIWC+OK8Me5W5BlnLy1SMIjmZaJ5BjfWg=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
 by: Ralf Fassel - Tue, 28 Nov 2023 13:20 UTC

* Rich <rich@example.invalid>
| But, while hammering out this demo I noticed that <<Modified>> is
| called twice for every keystroke (meaning this above is still doing
| twice the work it needs to do). I simply did not want to be bothered
| with working out why <<Modified>> is called twice with every keystroke,

Might be due to the fact that you change the 'modified' flag in the
callback, and looking at the C code for the text widget which handles
the ".t edit modified 0":

/*
 * Only issue the <<Modified>> event if the flag actually changed.
 * However, degree of modified-ness doesn't matter. [Bug 1799782]
 */
if ((!oldModified) != (!setModified)) {
    GenerateModifiedEvent(textPtr);
}

HTH
R'
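If that is the cause, one possible way to skip the second call is to return early when the flag is already clear. This is only a sketch: the handler takes the widget path as an argument (unlike the demo), and the counter ::handled is a hypothetical stand-in for the real per-line update:

```tcl
set handled 0

proc modified {w} {
    # Resetting the flag below flips it from true to false, which fires
    # <<Modified>> a second time; ignore that event.
    if {![$w edit modified]} {
        return
    }
    incr ::handled            ;# the per-line recount would go here
    $w edit modified 0
}

# Assumed binding for the demo's text widget:
#   bind .t <<Modified>> [list modified .t]
```

The count update then runs once per actual change instead of twice.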

Re: Safe handling of lists

<uk57ou$bui8$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12874&group=comp.lang.tcl#12874
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: rich@example.invalid (Rich)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 17:24:46 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <uk57ou$bui8$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me> <20231126190047.0403d10d@lud1.home> <ygacyvv45en.fsf@panther.akutech-local.de> <uk3f1u$3vuo5$1@dont-email.me> <20231127230734.14bb8204@lud1.home> <uk3l9u$4j95$1@dont-email.me> <uk3sti$5gvq$1@dont-email.me>
Injection-Date: Tue, 28 Nov 2023 17:24:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8647d52affd0e67407bec722d5c4ff07";
logging-data="391752"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19jOax8swa6+WCcT6GVhpZ8"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.117 (x86_64))
Cancel-Lock: sha1:lFL6CsOKdoVGV/3awSO6nuLLvEo=
 by: Rich - Tue, 28 Nov 2023 17:24 UTC

et99 <et99@rocketship1.me> wrote:
> On 11/27/2023 7:03 PM, Rich wrote:
>> Luc <luc@sep.invalid> wrote:
>>> Right now, I am only enabling the wc proc when I am doing my
>>> copywriting work. At all other times, it is disabled because
>>> typing becomes unacceptably slow on the large files I use
>>> regularly.
>>
>> Thing is, given what you've described so far of your code, you are
>> attempting to brute-force a real-time count, and doing so you've
>> created something on the order of an O(n^2) or worse algorithm.
>>
>> You are "counting" lots of things over and over that you previously
>> counted after the last keystroke, even though that keystroke only made
>> (usually) a one character change to the entire file.
>>
>> You instead want to think about how to not count any more than you have
>> to for each "count cycle".
>>
> Hmmm, a line cache. Cool.

It reduces the need to count to only counting the current line being
edited. Which is where this example derives almost all of its speedup.

> And since lcc is a list one can insert and delete quickly when lines
> are added or removed keeping it in sync with the text widget.

Yes, the example does not try to adjust the list for insert/delete
operations. A real version would need to track inserts/deletes and
adjust the line count cache accordingly. And yes, being a list,
inserting/deleting one or more line items is reasonably quick.
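A sketch of what adjusting the cache could look like, using two hypothetical helpers (cacheInsertLines and cacheDeleteLines). How the editor learns which lines were inserted or deleted is left open here, as it is in the discussion:

```tcl
# Hypothetical helpers; both return a new cache list.

# Insert per-line counts so that the first one lands at index 'at'.
proc cacheInsertLines {cache at counts} {
    return [linsert $cache $at {*}$counts]
}

# Remove the cache entries for lines first..last (inclusive).
proc cacheDeleteLines {cache first last} {
    return [lreplace $cache $first $last]
}

set lcc {0 3 5 2}
puts [cacheInsertLines $lcc 2 {4 1}]
puts [cacheDeleteLines $lcc 1 2]
```

The running total would be adjusted by the sum of the inserted or removed counts at the same time.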

> How to tell just what changed (e.g. a paste in the middle) I can't say.

Probably have to shim the text widget and watch for all the operations
that occur, and adjust the cache accordingly.

> I think for changes on a line, the [tcl::mathop::+ {*}$lcc] can
> adjust the wc using the line's (new value - its old value) saving
> possibly a 20k item arglist for the mathop.

Yes, I suspect there are opportunities here to avoid having to iterate
over the entire list. I did not try to add those opportunities.

Re: Safe handling of lists

<uk57s3$bui8$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12875&group=comp.lang.tcl#12875
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: rich@example.invalid (Rich)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 17:26:27 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <uk57s3$bui8$2@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me> <20231126190047.0403d10d@lud1.home> <ygacyvv45en.fsf@panther.akutech-local.de> <uk3f1u$3vuo5$1@dont-email.me> <20231127230734.14bb8204@lud1.home> <uk3l9u$4j95$1@dont-email.me> <ygazfyyypy1.fsf@panther.akutech-local.de>
Injection-Date: Tue, 28 Nov 2023 17:26:27 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8647d52affd0e67407bec722d5c4ff07";
logging-data="391752"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+S0jqrE/jafI3VgKMvNE1e"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.117 (x86_64))
Cancel-Lock: sha1:tuKsJu+f5iQQrij3AyQukpZzaYQ=
 by: Rich - Tue, 28 Nov 2023 17:26 UTC

Ralf Fassel <ralfixx@gmx.de> wrote:
> * Rich <rich@example.invalid>
> | But, while hammering out this demo I noticed that <<Modified>> is
> | called twice for every keystroke (meaning this above is still doing
> | twice the work it needs to do). I simply did not want to be bothered
> | with working out why <<Modified>> is called twice with every keystroke,
>
> Might be due to the fact that you change the 'modified' flag in the
> callback, and looking at the C code for the text widget which handles
> the ".t edit modified 0":
>
> /*
> * Only issue the <<Modified>> event if the flag actually changed.
> * However, degree of modified-ness doesn't matter. [Bug 1799782]
> */
> if ((!oldModified) != (!setModified)) {
> GenerateModifiedEvent(textPtr);
> }

Ah, that would be why then.

As I said, I did not bother digging to find out why, nor to think about
how to avoid the double calls.

Re: Safe handling of lists

<20231129015454.4c84bdc9@lud1.home>

https://news.novabbs.org/devel/article-flat.php?id=12876&group=comp.lang.tcl#12876
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: luc@sep.invalid (Luc)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Wed, 29 Nov 2023 01:54:54 -0300
Organization: A noiseless patient Spider
Lines: 82
Message-ID: <20231129015454.4c84bdc9@lud1.home>
References: <20231126162914.126fd99f@lud1.home>
<uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
<ygacyvv45en.fsf@panther.akutech-local.de>
<uk3f1u$3vuo5$1@dont-email.me>
<20231127230734.14bb8204@lud1.home>
<uk3l9u$4j95$1@dont-email.me>
<ygazfyyypy1.fsf@panther.akutech-local.de>
<uk57s3$bui8$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="403e987a6c455cabf6753abd5bcdf228";
logging-data="706644"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/74EIL6FfDW+sayC+XITP3DQIEFzdQGwc="
Cancel-Lock: sha1:Vbl/FUqnLCooL4j0PP3LIvT0c68=
 by: Luc - Wed, 29 Nov 2023 04:54 UTC

OK, I found a good solution. It's similar to what some of you suggested,
but different enough. I don't want to post all the code because the whole
mechanism spans across multiple procs and the procs can be long and
complicated, but here is a description of the mechanism and the core procs:

- Open file, immediately count all words in the buffer (slow), count all
words in the current line (fast).
That big count is done only once with a thorough enough cleanup, including
removal of multiple spaces. So it's accurate.

- Create three global variables: ::WholeBufferWC, ::CurrentLineWC and
::MostBufferWC.

In case you're wondering, $::MostBufferWC is everything except the current
line.

$::MostBufferWC = $::WholeBufferWC - $::CurrentLineWC
or
$::CurrentLineWC + $::MostBufferWC = $::WholeBufferWC

You get the picture.

- Also store the number of the current line in ::CURRLINE. My code
already did that anyway for the status bar.

- Additionally, create the ::PREVLINE variable which will be used in the
new, improved proc.
That is where the magic is. It lets me monitor never more than one or two
lines at a time. So it's fast. Tested and approved on the large file.

I am using two procs now:

wc is called occasionally and used for counting words cleanly, taking care
of all the spaces.

wcglobal is called at every touch and does the whole line monitor
management, getting counts from wc whenever necessary. Here they are:

proc p.wc {content} {
    set wordcount 0
    regsub -all {[\s]+} [string trim $content] { } content
    set wordcount [llength [split $content { }]]
    return $wordcount
}

proc p.wcglobal {} {
    set ::currindex [$::text index insert]
    lassign [split $::currindex "."] ::CURRLINE ::CURRCOL
    set ::currentlinecontent [$::text get $::CURRLINE.0 "$::CURRLINE.0 lineend"]
    set ::CurrentLineWC [p.wc $::currentlinecontent]

    if {$::CURRLINE == $::PREVLINE} {
        set ::WholeBufferWC [expr {$::MostBufferWC + $::CurrentLineWC}]
    }
    # else do not add the current line

    set ::MostBufferWC [expr {$::WholeBufferWC - $::CurrentLineWC}]

    set ::PREVLINE $::CURRLINE

    if {$::WholeBufferWC == 0} {return ""}
    if {$::WholeBufferWC == 1} {return "1 word"}
    if {$::WholeBufferWC > 1} {return "$::WholeBufferWC words"}
}
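To make the bookkeeping easier to follow, here is a stand-alone model of the same logic with the text widget taken out. It is a sketch, not the editor's code: track and countWords are hypothetical names, the line number and line text are passed in directly, and the state lives in a dict instead of globals:

```tcl
proc countWords {s} {
    regexp -all {\S+} $s
}

# st holds: whole (buffer total), most (total minus current line), prev (line).
proc track {stateVar lineNo lineText} {
    upvar 1 $stateVar st
    set cur [countWords $lineText]
    if {$lineNo == [dict get $st prev]} {
        # Still on the same line: rebuild the total from the rest + this line.
        dict set st whole [expr {[dict get $st most] + $cur}]
    }
    # Either way, re-derive "everything except the current line".
    dict set st most [expr {[dict get $st whole] - $cur}]
    dict set st prev $lineNo
    return [dict get $st whole]
}

set st [dict create whole 10 most 10 prev 0]
puts [track st 3 "two words"]        ;# cursor arrives on line 3
puts [track st 3 "two words more"]   ;# a word is typed on line 3
```

Arriving on a line leaves the total alone and only recomputes the remainder; typing on that line then changes the total by exactly the change in that line's count.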

Tested with arrow keys navigation, random mouse clicks and pressing Return
at the end or in the middle of lines.

It works!

I'm eating my own dog food, so if there are bugs, I will see them soon.

Thank you for all the help once again.

--
Luc

Re: Safe handling of lists

<uk6l3f$mlqu$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=12877&group=comp.lang.tcl#12877
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: et99@rocketship1.me (et99)
Newsgroups: comp.lang.tcl
Subject: Re: Safe handling of lists
Date: Tue, 28 Nov 2023 22:18:23 -0800
Organization: A noiseless patient Spider
Lines: 125
Message-ID: <uk6l3f$mlqu$1@dont-email.me>
References: <20231126162914.126fd99f@lud1.home> <uk0dnk$3dr16$1@dont-email.me>
<20231126190047.0403d10d@lud1.home>
<ygacyvv45en.fsf@panther.akutech-local.de> <uk3f1u$3vuo5$1@dont-email.me>
<20231127230734.14bb8204@lud1.home> <uk3l9u$4j95$1@dont-email.me>
<ygazfyyypy1.fsf@panther.akutech-local.de> <uk57s3$bui8$2@dont-email.me>
<20231129015454.4c84bdc9@lud1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 29 Nov 2023 06:18:23 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="7b6b603e2ea13295a0842856719e340f";
logging-data="743262"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19fU4uiOKZVont70Lv8hmku"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.1
Cancel-Lock: sha1:4N09NvLLGXgvYEwkDEhcNkNJ6hM=
In-Reply-To: <20231129015454.4c84bdc9@lud1.home>
Content-Language: en-US
 by: et99 - Wed, 29 Nov 2023 06:18 UTC

On 11/28/2023 8:54 PM, Luc wrote:
> OK, I found a good solution. It's similar to what some of you suggested,
> but different enough. I don't want to post all the code because the whole
> mechanism spans across multiple procs and the procs can be long and
> complicated, but here is a description of the mechanism and the core procs:
>
> - Open file, immediately count all words in the buffer (slow), count all
> words in the current line (fast).
> That big count is done only once with a thorough enough cleanup, including
> removal of multiple spaces. So it's accurate.
>
> - Create three global variables: ::WholeBufferWC, ::CurrentLineWC and
> ::MostBufferWC.
>
> In case you're wondering, $::MostBufferWC is everything except the current
> line.
>
> $::MostBufferWC = $::WholeBufferWC - $::CurrentLineWC
> or
> $::CurrentLineWC + $::MostBufferWC = $::WholeBufferWC
>
> You get the picture.
>
> - Also store the number of the current line in ::CURRLINE. My code
> already did that anyway for the status bar.
>
> - Additionally, create the ::PREVLINE variable which will be used in the
> new, improved proc.
> That is where the magic is. It lets me monitor never more than one or two
> lines at a time. So it's fast. Tested and approved on the large file.
>
> I am using two procs now:
>
> wc is called occasionally and used for counting words cleanly, taking care
> of all the spaces.
>
> wcglobal is called at every touch and does the whole line monitor
> management, getting counts from wc whenever necessary. Here they are:
>
>
> proc p.wc {content} {
>     set wordcount 0
>     regsub -all {[\s]+} [string trim $content] { } content
>     set wordcount [llength [split $content { }]]
>     return $wordcount
> }
>
>
> proc p.wcglobal {} {
>     set ::currindex [$::text index insert]
>     lassign [split $::currindex "."] ::CURRLINE ::CURRCOL
>     set ::currentlinecontent [$::text get $::CURRLINE.0 "$::CURRLINE.0 lineend"]
>     set ::CurrentLineWC [p.wc $::currentlinecontent]
>
>     if {$::CURRLINE == $::PREVLINE} {
>         set ::WholeBufferWC [expr {$::MostBufferWC + $::CurrentLineWC}]
>     }
>     # else do not add the current line
>
>     set ::MostBufferWC [expr {$::WholeBufferWC - $::CurrentLineWC}]
>
>     set ::PREVLINE $::CURRLINE
>
>     if {$::WholeBufferWC == 0} {return ""}
>     if {$::WholeBufferWC == 1} {return "1 word"}
>     if {$::WholeBufferWC > 1} {return "$::WholeBufferWC words"}
> }
>
>
> Tested with arrow keys navigation, random mouse clicks and pressing Return
> at the end or in the middle of lines.
>
> It works!
>
> I'm eating my own dog food, so if there are bugs, I'll see them soon.
>
> Thank you for all the help once again.
>
>

That looks great. I think if you want to have some fun, you could run your full recount in a second thread.

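package require Thread

# Worker thread: it gets its own copy of p.wc and waits for recount requests.
set ::tid [thread::create {
    proc p.wc {content} {
        set wordcount 0
        regsub -all {[\s]+} [string trim $content] { } content
        set wordcount [llength [split $content { }]]
        return $wordcount
    }
    # Count the words in the shared copy of the text, then set the
    # result variable back in the thread that asked for it.
    proc recount {main_tid var} {
        set words [p.wc [tsv::get text x]]
        thread::send -async $main_tid [list set ::$var $words]
    }
    thread::wait
}]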

# When you want a clean full update:

tsv::set text x [.t get 1.0 end] ;# copy to a thread shared var (20 MB takes ~20 ms)
unset -nocomplain ::count_from_thread
thread::send -async $::tid [list recount [thread::id] count_from_thread]

Then, when you are doing a line count update in wcglobal, test
[info exists ::count_from_thread]. If it exists, use it as your
current ::WholeBufferWC. If not, just keep using the current value
until that variable gets set; it will probably be ready after the
user has entered another few characters. Either way there should be
no impact on the user's typing (in theory) :)

But don't queue another request until count_from_thread exists.
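
Here is a rough sketch of how that check and the "one request at a time" rule might fit together. It assumes the worker thread above has already been created (so ::tid exists and Thread is loaded) and that ::text, ::WholeBufferWC, ::MostBufferWC and ::CurrentLineWC are the globals from Luc's procs. The names p.requestRecount, p.adoptRecount and ::recountPending are made up for illustration; they are not from anyone's actual code.

```tcl
# Hypothetical helpers; the names and the ::recountPending flag are
# illustrative only. Assumes ::tid and the worker's recount proc exist.
set ::recountPending 0

# Ask the worker for a full recount unless one is already in flight.
# Returns 1 if a request was queued, 0 if it was skipped.
proc p.requestRecount {} {
    if {$::recountPending} {return 0}
    set ::recountPending 1
    unset -nocomplain ::count_from_thread
    tsv::set text x [$::text get 1.0 end]
    thread::send -async $::tid [list recount [thread::id] count_from_thread]
    return 1
}

# Meant to be called from p.wcglobal after ::CurrentLineWC is updated.
# Adopts the worker's result if it has arrived; otherwise changes nothing.
# Returns 1 if the counters were refreshed, 0 if not.
proc p.adoptRecount {} {
    if {![info exists ::count_from_thread]} {return 0}
    set ::WholeBufferWC $::count_from_thread
    set ::MostBufferWC [expr {$::WholeBufferWC - $::CurrentLineWC}]
    unset ::count_from_thread
    set ::recountPending 0
    return 1
}
```

One caveat with this sketch: the count that comes back describes the buffer as it was when the request was made, so any words typed in the meantime are not in it until the next full recount. The async result is also only delivered while the main thread is in its event loop, which is normally the case in a Tk application.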
