Subject  Author
* using Unicode codepoints in a bash script  paris2venice
+- using Unicode codepoints in a bash script  paris2venice
+* using Unicode codepoints in a bash script  Russell Marks
|+* using Unicode codepoints in a bash script  paris2venice
||`* using Unicode codepoints in a bash script  Russell Marks
|| `* using Unicode codepoints in a bash script  Christian Weisgerber
||  `- using Unicode codepoints in a bash script  Russell Marks
|`- using Unicode codepoints in a bash script  paris2venice
`* using Unicode codepoints in a bash script  Christian Weisgerber
 `- using Unicode codepoints in a bash script  paris2venice

using Unicode codepoints in a bash script

<aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=7551&group=comp.unix.shell#7551
X-Received: by 2002:a0c:e70e:0:b0:67a:6e8a:aabf with SMTP id d14-20020a0ce70e000000b0067a6e8aaabfmr8380qvn.12.1701244867240;
Wed, 29 Nov 2023 00:01:07 -0800 (PST)
X-Received: by 2002:a17:90a:db0b:b0:285:6fed:4c60 with SMTP id
g11-20020a17090adb0b00b002856fed4c60mr3711769pjv.6.1701244866981; Wed, 29 Nov
2023 00:01:06 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.unix.shell
Date: Wed, 29 Nov 2023 00:01:06 -0800 (PST)
Injection-Info: google-groups.googlegroups.com; posting-host=45.86.208.9; posting-account=-6wfVAoAAAC4Q16XRVdS4mm061q0uGye
NNTP-Posting-Host: 45.86.208.9
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
Subject: using Unicode codepoints in a bash script
From: paris2venice@gmail.com (paris2venice)
Injection-Date: Wed, 29 Nov 2023 08:01:07 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1930
 by: paris2venice - Wed, 29 Nov 2023 08:01 UTC

I am trying to use Unicode codepoints along with Unicode UTF8s in a bash script in order to compare codepoints and their matching UTF8 in a database of the 1071 Egyptian hieroglyphs.

So what I am trying to do is ensure that each record is accurately validated in the 1071 lines of my file.

It works in the bash shell e.g.
codepoint=13000
printf "\U000${codepoint}\n"
𓀀

However, if I put the same code into a script, e.g.
cat > cpt
#!/bin/bash
codepoint=13000
printf "\U000${codepoint}\n"

chmod +x cpt
bash -x ./cpt
+ codepoint=13000
+ printf '\U00013000\n'
\U00013000

So instead of creating the hieroglyph, the script just ignores the same exact code. Is there any way around this? Thanks for any help.

bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
Copyright (C) 2007 Free Software Foundation, Inc.

Re: using Unicode codepoints in a bash script

<a51a49ed-bd74-449f-b0ad-e59811ff3ef2n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=7552&group=comp.unix.shell#7552
X-Received: by 2002:a05:620a:8d0a:b0:76d:e9c0:9109 with SMTP id rb10-20020a05620a8d0a00b0076de9c09109mr391451qkn.7.1701245252381;
Wed, 29 Nov 2023 00:07:32 -0800 (PST)
X-Received: by 2002:a63:1143:0:b0:5be:3925:b5b7 with SMTP id
3-20020a631143000000b005be3925b5b7mr3076262pgr.5.1701245252070; Wed, 29 Nov
2023 00:07:32 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.unix.shell
Date: Wed, 29 Nov 2023 00:07:31 -0800 (PST)
In-Reply-To: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=45.86.208.9; posting-account=-6wfVAoAAAC4Q16XRVdS4mm061q0uGye
NNTP-Posting-Host: 45.86.208.9
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a51a49ed-bd74-449f-b0ad-e59811ff3ef2n@googlegroups.com>
Subject: Re: using Unicode codepoints in a bash script
From: paris2venice@gmail.com (paris2venice)
Injection-Date: Wed, 29 Nov 2023 08:07:32 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1199
 by: paris2venice - Wed, 29 Nov 2023 08:07 UTC

I did chsh to the 5.0.17 version of bash but had the same issue.

Re: using Unicode codepoints in a bash script

<RmI9N.1308297$D%n3.74768@usenetxs.com>

https://news.novabbs.org/devel/article-flat.php?id=7559&group=comp.unix.shell#7559
Path: i2pn2.org!i2pn.org!news.hispagatos.org!news.nntp4.net!paganini.bofh.team!2.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx01.ams4.POSTED!not-for-mail
From: zgedneil@spam^H^H^H^Hgmail.com (Russell Marks)
Newsgroups: comp.unix.shell
Subject: Re: using Unicode codepoints in a bash script
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
Organization: this space intentionally left blank
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Lines: 44
Message-ID: <RmI9N.1308297$D%n3.74768@usenetxs.com>
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Wed, 29 Nov 2023 15:01:05 UTC
Date: Wed, 29 Nov 2023 15:01:05 GMT
X-Received-Bytes: 2159
 by: Russell Marks - Wed, 29 Nov 2023 15:01 UTC

paris2venice <paris2venice@gmail.com> wrote:

> I am trying to use Unicode codepoints along with Unicode UTF8s in a
> bash script in order to compare codepoints and their matching UTF8 in
> a database of the 1071 Egyptian hieroglyphs.
>
> So what I am trying to do is ensure that each record is accurately
> validated in the 1071 lines of my file.
>
> It works in the bash shell e.g.
> codepoint=13000
> printf "\U000${codepoint}\n"
> 𓀀

It sounds like you're on macOS, so I suspect the interactive shell
you're using may be zsh, not bash - and probably a newer version.

> However, if I put the same code into a script, e.g.
> cat > cpt
> #!/bin/bash
> codepoint=13000
> printf "\U000${codepoint}\n"
>
> chmod +x cpt
> bash -x ./cpt
> + codepoint=13000
> + printf '\U00013000\n'
> \U00013000
>
> So instead of creating the hieroglyph, the script just ignores the
> same exact code. Is there any way around this? Thanks for any
> help.
>
> bash --version
> GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
> Copyright (C) 2007 Free Software Foundation, Inc.

That's a very old version of bash. To quote the CHANGES file from a
newer version, "Fixed several bugs with the handling of valid and
invalid unicode character values when used with the \u and \U escape
sequences to printf and $'...'." So the old version not having those
fixes might be the problem.

-Rus.

Re: using Unicode codepoints in a bash script

<slrnumek55.odp.naddy@lorvorc.mips.inka.de>

https://news.novabbs.org/devel/article-flat.php?id=7561&group=comp.unix.shell#7561
Path: i2pn2.org!i2pn.org!news.swapon.de!weretis.net!feeder8.news.weretis.net!news.szaf.org!inka.de!mips.inka.de!.POSTED.localhost!not-for-mail
From: naddy@mips.inka.de (Christian Weisgerber)
Newsgroups: comp.unix.shell
Subject: Re: using Unicode codepoints in a bash script
Date: Wed, 29 Nov 2023 14:54:29 -0000 (UTC)
Message-ID: <slrnumek55.odp.naddy@lorvorc.mips.inka.de>
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 29 Nov 2023 14:54:29 -0000 (UTC)
Injection-Info: lorvorc.mips.inka.de; posting-host="localhost:::1";
logging-data="25018"; mail-complaints-to="usenet@mips.inka.de"
User-Agent: slrn/1.0.3 (FreeBSD)
 by: Christian Weisgerber - Wed, 29 Nov 2023 14:54 UTC

On 2023-11-29, paris2venice <paris2venice@gmail.com> wrote:
>
> It works in the bash shell e.g.
> codepoint=13000
> printf "\U000${codepoint}\n"
> 𓀀
>
> However, if I put the same code into a script, e.g.
> cat > cpt
> #!/bin/bash
> codepoint=13000
> printf "\U000${codepoint}\n"
> \U00013000
>
> So instead of creating the hieroglyph, the script just ignores the same exact code.

That doesn't happen. Something isn't like you say it is.

> bash --version
> GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
> Copyright (C) 2007 Free Software Foundation, Inc.

Presumably that is the version you use to execute the script.
It is ancient and may not yet support the \U syntax.

What bash are you using interactively?

$ echo $BASH_VERSION

--
Christian "naddy" Weisgerber naddy@mips.inka.de
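
A quick way to answer that question is to print the same diagnostics once interactively and once from inside the script, then compare the two. The sketch below is only an illustration; the function name `report_bash` is hypothetical. Note that running a script as `bash ./cpt` starts whichever `bash` comes first on `PATH` and ignores the `#!` line, so the two contexts can easily end up on different binaries.

```bash
# report_bash is a hypothetical helper name, not something from the thread.
report_bash() {
    # Version of the shell that is executing this line.
    echo "BASH_VERSION=${BASH_VERSION:-not bash}"
    # File name that was used to invoke that shell.
    echo "BASH=${BASH:-unknown}"
    # The binary that a plain 'bash ./script' would start.
    echo "first bash on PATH: $(command -v bash)"
}

report_bash
```

If the interactive run and the script run report different versions or paths, the two are not executing the same bash.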

Re: using Unicode codepoints in a bash script

<76a0c6da-8c10-40b6-972e-883b76242991n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=7564&group=comp.unix.shell#7564
X-Received: by 2002:a0c:e70e:0:b0:67a:6e8a:aabf with SMTP id d14-20020a0ce70e000000b0067a6e8aaabfmr40899qvn.12.1701281597440;
Wed, 29 Nov 2023 10:13:17 -0800 (PST)
X-Received: by 2002:a05:6820:1516:b0:58d:3c4b:ee40 with SMTP id
ay22-20020a056820151600b0058d3c4bee40mr1544934oob.0.1701281597093; Wed, 29
Nov 2023 10:13:17 -0800 (PST)
Path: i2pn2.org!rocksolid2!news.neodome.net!weretis.net!feeder8.news.weretis.net!3.eu.feeder.erje.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.unix.shell
Date: Wed, 29 Nov 2023 10:13:16 -0800 (PST)
In-Reply-To: <RmI9N.1308297$D%n3.74768@usenetxs.com>
Injection-Info: google-groups.googlegroups.com; posting-host=45.86.208.9; posting-account=-6wfVAoAAAC4Q16XRVdS4mm061q0uGye
NNTP-Posting-Host: 45.86.208.9
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com> <RmI9N.1308297$D%n3.74768@usenetxs.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <76a0c6da-8c10-40b6-972e-883b76242991n@googlegroups.com>
Subject: Re: using Unicode codepoints in a bash script
From: paris2venice@gmail.com (paris2venice)
Injection-Date: Wed, 29 Nov 2023 18:13:17 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3398
 by: paris2venice - Wed, 29 Nov 2023 18:13 UTC

On Wednesday, November 29, 2023 at 7:01:12 AM UTC-8, Russell Marks wrote:
> paris2venice wrote:
>
> > I am trying to use Unicode codepoints along with Unicode UTF8s in a
> > bash script in order to compare codepoints and their matching UTF8 in
> > a database of the 1071 Egyptian hieroglyphs.
> >
> > So what I am trying to do is ensure that each record is accurately
> > validated in the 1071 lines of my file.
> >
> > It works in the bash shell e.g.
> > codepoint=13000
> > printf "\U000${codepoint}\n"
> > 𓀀
> It sounds like you're on macOS, so I suspect the interactive shell
> you're using may be zsh, not bash - and probably a newer version.

Thanks for your reply, Russell. I do not know zsh so I don't use it. And I disliked Apple trying to decide for me which shell I should use so I ignored them. That was years ago.

> > However, if I put the same code into a script, e.g.
> > cat > cpt
> > #!/bin/bash
> > codepoint=13000
> > printf "\U000${codepoint}\n"
> >
> > chmod +x cpt
> > bash -x ./cpt
> > + codepoint=13000
> > + printf '\U00013000\n'
> > \U00013000
> >
> > So instead of creating the hieroglyph, the script just ignores the
> > same exact code. Is there any way around this? Thanks for any
> > help.
> >
> > bash --version
> > GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
> > Copyright (C) 2007 Free Software Foundation, Inc.

> That's a very old version of bash. To quote the CHANGES file from a
> newer version, "Fixed several bugs with the handling of valid and
> invalid unicode character values when used with the \u and \U escape
> sequences to printf and $'...'." So the old version not having those
> fixes might be the problem.
>
> -Rus.

That's interesting. Did you see my following comment about trying it with version 5.0.17? I had the same exact results.

In any case, the UTF-8 does not fail even with the 3.2.57(1) release:

utf8a=80 utf8b=80
utf8_hg=$( printf "\xF0\x93\x${utf8a}\x${utf8b}" )
echo $utf8_hg
𓀀
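
Since byte escapes work even in bash 3.2, the UTF-8 bytes could also be computed from the code point instead of being written out by hand. The following is a minimal sketch under that assumption: `cp_to_utf8` is a hypothetical helper, it relies only on integer arithmetic and octal escapes, and it does not validate its input (surrogates and values above 10FFFF are not rejected).

```bash
# cp_to_utf8 HEX: write the UTF-8 encoding of code point U+HEX to stdout.
# Hypothetical helper for illustration; no input validation is done.
cp_to_utf8() {
    local cp=$(( 16#$1 )) byte bytes
    if   (( cp < 0x80 )); then          # 1 byte:  0xxxxxxx
        bytes="$cp"
    elif (( cp < 0x800 )); then         # 2 bytes: 110xxxxx 10xxxxxx
        bytes="$(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))"
    elif (( cp < 0x10000 )); then       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        bytes="$(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))"
    else                                # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        bytes="$(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))"
    fi
    # Emit each byte through an octal escape, which old bash versions accept.
    for byte in $bytes; do
        printf "\\$(printf '%03o' "$byte")"
    done
}

cp_to_utf8 13000   # emits the bytes F0 93 80 80
echo
```

With a helper like this, each record's stored UTF-8 bytes could be compared against the bytes derived from its code point without depending on `\U` at all.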

Re: using Unicode codepoints in a bash script

<ee68cffe-a8ac-4964-b136-67d65278873dn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=7565&group=comp.unix.shell#7565
X-Received: by 2002:a05:622a:514:b0:41c:b481:5e8c with SMTP id l20-20020a05622a051400b0041cb4815e8cmr669913qtx.4.1701284329447;
Wed, 29 Nov 2023 10:58:49 -0800 (PST)
X-Received: by 2002:a05:6830:3a88:b0:6d7:cfc6:c35d with SMTP id
dj8-20020a0568303a8800b006d7cfc6c35dmr798650otb.3.1701284329244; Wed, 29 Nov
2023 10:58:49 -0800 (PST)
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.chmurka.net!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.unix.shell
Date: Wed, 29 Nov 2023 10:58:49 -0800 (PST)
In-Reply-To: <slrnumek55.odp.naddy@lorvorc.mips.inka.de>
Injection-Info: google-groups.googlegroups.com; posting-host=45.86.208.9; posting-account=-6wfVAoAAAC4Q16XRVdS4mm061q0uGye
NNTP-Posting-Host: 45.86.208.9
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com> <slrnumek55.odp.naddy@lorvorc.mips.inka.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ee68cffe-a8ac-4964-b136-67d65278873dn@googlegroups.com>
Subject: Re: using Unicode codepoints in a bash script
From: paris2venice@gmail.com (paris2venice)
Injection-Date: Wed, 29 Nov 2023 18:58:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: paris2venice - Wed, 29 Nov 2023 18:58 UTC

On Wednesday, November 29, 2023 at 7:30:10 AM UTC-8, Christian Weisgerber wrote:
> On 2023-11-29, paris2venice wrote:
> >
> > It works in the bash shell e.g.
> > codepoint=13000
> > printf "\U000${codepoint}\n"
> > 𓀀
> >
> > However, if I put the same code into a script, e.g.
> > cat > cpt
> > #!/bin/bash
> > codepoint=13000
> > printf "\U000${codepoint}\n"
> > \U00013000
> >
> > So instead of creating the hieroglyph, the script just ignores the same exact code.
> That doesn't happen. Something isn't like you say it is.

Well, did you try replicating my results? The very simple bash code is right there.

> > bash --version
> > GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
> > Copyright (C) 2007 Free Software Foundation, Inc.
> Presumably that is the version you use to execute the script.
> It is ancient and may not yet support the \U syntax.

But it does support the \U syntax as with the UTF-8, just not the codepoint.
The basic function of my shell script is to compare Unicode's codepoint with the UTF-8 for all 1071 hieroglyphs.

The first hieroglyph defined in Unicode is a seated man referred to as A1 in the Gardiner classification.
Its codepoint is "13000" and its UTF-8 is the matched pair of "80 80" and its hieroglyph is 𓀀.

utf8a=80 utf8b=80
utf8_hg=$( printf "\xF0\x93\x${utf8a}\x${utf8b}" )
echo $utf8_hg
𓀀

So the UTF-8 works fine.

codepoint=13000
cp_hg=$( echo -e "\U000$codepoint" )
echo $cp_hg
𓀀
bash -version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
Copyright (C) 2007 Free Software Foundation, Inc.

So the codepoint works fine in the shell and, as you can see, using the old version.
Unfortunately, not in the script.

> What bash are you using interactively?

As I wrote at the end, I typically use the old version ... 3.2.57(1). I downloaded 5.0.17 years ago but I stopped using it because I wanted my other much more intensive script (which uses both bash and AppleScript) to work for any user who might download it. I can't really use a script for public consumption if Apple doesn't stay up to date which they don't.

> $ echo $BASH_VERSION
echo $BASH_VERSION
5.0.17(1)-release

That's just at the moment though.

> --
> Christian "naddy" Weisgerber

Re: using Unicode codepoints in a bash script

<6ce50f01-5ea8-472a-b987-3fefee5be796n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=7566&group=comp.unix.shell#7566
X-Received: by 2002:a05:620a:664b:b0:775:8ccf:f084 with SMTP id qg11-20020a05620a664b00b007758ccff084mr485358qkn.2.1701285212202;
Wed, 29 Nov 2023 11:13:32 -0800 (PST)
X-Received: by 2002:a05:6808:6084:b0:3b8:3a91:10f7 with SMTP id
de4-20020a056808608400b003b83a9110f7mr641619oib.3.1701285211622; Wed, 29 Nov
2023 11:13:31 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.unix.shell
Date: Wed, 29 Nov 2023 11:13:31 -0800 (PST)
In-Reply-To: <RmI9N.1308297$D%n3.74768@usenetxs.com>
Injection-Info: google-groups.googlegroups.com; posting-host=45.86.208.9; posting-account=-6wfVAoAAAC4Q16XRVdS4mm061q0uGye
NNTP-Posting-Host: 45.86.208.9
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com> <RmI9N.1308297$D%n3.74768@usenetxs.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6ce50f01-5ea8-472a-b987-3fefee5be796n@googlegroups.com>
Subject: Re: using Unicode codepoints in a bash script
From: paris2venice@gmail.com (paris2venice)
Injection-Date: Wed, 29 Nov 2023 19:13:32 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3129
 by: paris2venice - Wed, 29 Nov 2023 19:13 UTC

On Wednesday, November 29, 2023 at 7:01:12 AM UTC-8, Russell Marks wrote:
> paris2venice wrote:
>
> > I am trying to use Unicode codepoints along with Unicode UTF8s in a
> > bash script in order to compare codepoints and their matching UTF8 in
> > a database of the 1071 Egyptian hieroglyphs.
> >
> > So what I am trying to do is ensure that each record is accurately
> > validated in the 1071 lines of my file.
> >
> > It works in the bash shell e.g.
> > codepoint=13000
> > printf "\U000${codepoint}\n"
> > 𓀀
> It sounds like you're on macOS, so I suspect the interactive shell
> you're using may be zsh, not bash - and probably a newer version.
> > However, if I put the same code into a script, e.g.
> > cat > cpt
> > #!/bin/bash
> > codepoint=13000
> > printf "\U000${codepoint}\n"
> >
> > chmod +x cpt
> > bash -x ./cpt
> > + codepoint=13000
> > + printf '\U00013000\n'
> > \U00013000
> >
> > So instead of creating the hieroglyph, the script just ignores the
> > same exact code. Is there any way around this? Thanks for any
> > help.
> >
> > bash --version
> > GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
> > Copyright (C) 2007 Free Software Foundation, Inc.
> That's a very old version of bash. To quote the CHANGES file from a
> newer version, "Fixed several bugs with the handling of valid and
> invalid unicode character values when used with the \u and \U escape
> sequences to printf and $'...'." So the old version not having those
> fixes might be the problem.
>
> -Rus.

Ciao again.

I just realized that the codepoint uses the \U and the UTF-8 only uses the \x so by replacing my shebang from /bin/bash i.e. 3.2.57(1) to /usr/local/bin/bash i.e. 5.0.17, the script does succeed. So many thanks.
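
For a script that may be started by either bash, a run-time check can decide whether `\U` is expected to work. As far as I know the `\u` and `\U` escapes were added in bash 4.2, so the sketch below tests for that version; this cutoff is an assumption taken from the bash release notes rather than from this thread, and `has_unicode_escapes` is a hypothetical name.

```bash
# has_unicode_escapes [MAJOR MINOR]: succeed if that bash version should
# understand \u and \U (assumed here to mean 4.2 or later).
# Without arguments, the version of the running shell is checked.
has_unicode_escapes() {
    local major=${1:-${BASH_VERSINFO[0]}} minor=${2:-${BASH_VERSINFO[1]}}
    (( major > 4 || (major == 4 && minor >= 2) ))
}

if has_unicode_escapes; then
    printf '\U00013000\n'
else
    printf '%s\n' "bash $BASH_VERSION lacks \\U; fall back to \\x byte escapes" >&2
fi
```

A newer bash can still print the escape sequence literally if the locale cannot represent the character, so the version check alone is not a guarantee.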

Re: using Unicode codepoints in a bash script

<hF_9N.1380479$LJv3.364456@usenetxs.com>

https://news.novabbs.org/devel/article-flat.php?id=7570&group=comp.unix.shell#7570
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx02.ams4.POSTED!not-for-mail
From: zgedneil@spam^H^H^H^Hgmail.com (Russell Marks)
Newsgroups: comp.unix.shell
Subject: Re: using Unicode codepoints in a bash script
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
<RmI9N.1308297$D%n3.74768@usenetxs.com>
<76a0c6da-8c10-40b6-972e-883b76242991n@googlegroups.com>
Organization: this space intentionally left blank
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Lines: 43
Message-ID: <hF_9N.1380479$LJv3.364456@usenetxs.com>
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Thu, 30 Nov 2023 11:49:33 UTC
Date: Thu, 30 Nov 2023 11:49:33 GMT
X-Received-Bytes: 2363
 by: Russell Marks - Thu, 30 Nov 2023 11:49 UTC

paris2venice <paris2venice@gmail.com> wrote:

> Russell Marks wrote:
>> paris2venice wrote:
[...]
>> > So instead of creating the hieroglyph, the script just ignores the
>> > same exact code. Is there any way around this? Thanks for any
>> > help.
>> >
>> > bash --version
>> > GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
>> > Copyright (C) 2007 Free Software Foundation, Inc.
>
>> That's a very old version of bash. To quote the CHANGES file from a
>> newer version, "Fixed several bugs with the handling of valid and
>> invalid unicode character values when used with the \u and \U escape
>> sequences to printf and $'...'." So the old version not having those
>> fixes might be the problem.
[...]
> That's interesting. Did you see my following comment about trying
> it with version 5.0.17? I had the same exact results.

That version is also a bit old still, but I'm surprised at it giving
you the same trouble (assuming that printf is the builtin version).

Playing around with this on Linux, one way to nearly replicate your
result with a newer bash is "LC_ALL=C printf '\U00013000\n'" which for
me will output "\u00013000". So I suppose there could be a locale
issue involved.

> In any case, the UTF-8 does not fail even with the 3.2.57(1)
> release:
>
> utf8a=80 utf8b=80
> utf8_hg=$( printf "\xF0\x93\x${utf8a}\x${utf8b}" )
> echo $utf8_hg
> 𓀀

That's something at least, and given what you said in a later post
about wanting to cope with Apple's old bash version for the sake of
users, you might not have much alternative.

-Rus.

Re: using Unicode codepoints in a bash script

<slrnumh5tj.1prs.naddy@lorvorc.mips.inka.de>

https://news.novabbs.org/devel/article-flat.php?id=7571&group=comp.unix.shell#7571
Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!news2.arglkargh.de!news.karotte.org!news.szaf.org!inka.de!mips.inka.de!.POSTED.localhost!not-for-mail
From: naddy@mips.inka.de (Christian Weisgerber)
Newsgroups: comp.unix.shell
Subject: Re: using Unicode codepoints in a bash script
Date: Thu, 30 Nov 2023 14:09:55 -0000 (UTC)
Message-ID: <slrnumh5tj.1prs.naddy@lorvorc.mips.inka.de>
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
<RmI9N.1308297$D%n3.74768@usenetxs.com>
<76a0c6da-8c10-40b6-972e-883b76242991n@googlegroups.com>
<hF_9N.1380479$LJv3.364456@usenetxs.com>
Injection-Date: Thu, 30 Nov 2023 14:09:55 -0000 (UTC)
Injection-Info: lorvorc.mips.inka.de; posting-host="localhost:::1";
logging-data="59261"; mail-complaints-to="usenet@mips.inka.de"
User-Agent: slrn/1.0.3 (FreeBSD)
 by: Christian Weisgerber - Thu, 30 Nov 2023 14:09 UTC

On 2023-11-30, Russell Marks <zgedneil@spam^H^H^H^Hgmail.com> wrote:

> Playing around with this on Linux, one way to nearly replicate your
> result with a newer bash is "LC_ALL=C printf '\U00013000\n'" which for
> me will output "\u00013000". So I suppose there could be a locale
> issue involved.

But if you execute the commands in question first on the command
line, then in a minimal script as shown, the same locale settings
will be used for both.

--
Christian "naddy" Weisgerber naddy@mips.inka.de
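
To confirm that the locale really is the same in both places rather than assume it, the settings can be printed interactively and from within the script. A small sketch, with `show_locale` as a hypothetical helper name:

```bash
# show_locale is a hypothetical helper; it only reports, it changes nothing.
show_locale() {
    # The three variables most likely to affect character encoding.
    printf 'LANG=%s LC_ALL=%s LC_CTYPE=%s\n' \
        "${LANG-unset}" "${LC_ALL-unset}" "${LC_CTYPE-unset}"
    # 'locale charmap' names the encoding of the current locale,
    # provided the locale utility is installed.
    locale charmap 2>/dev/null || echo "locale utility not available"
}

show_locale
```

If both runs report a UTF-8 charmap, a locale difference is unlikely to explain the differing output.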

Re: using Unicode codepoints in a bash script

<cS4aN.1566376$iuU8.1534120@usenetxs.com>

https://news.novabbs.org/devel/article-flat.php?id=7572&group=comp.unix.shell#7572
Path: i2pn2.org!i2pn.org!nntp.comgw.net!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx06.ams4.POSTED!not-for-mail
From: zgedneil@spam^H^H^H^Hgmail.com (Russell Marks)
Newsgroups: comp.unix.shell
Subject: Re: using Unicode codepoints in a bash script
References: <aa6382e4-af81-47fd-9ed0-ae72d8277374n@googlegroups.com>
<RmI9N.1308297$D%n3.74768@usenetxs.com>
<76a0c6da-8c10-40b6-972e-883b76242991n@googlegroups.com>
<hF_9N.1380479$LJv3.364456@usenetxs.com>
<slrnumh5tj.1prs.naddy@lorvorc.mips.inka.de>
Organization: this space intentionally left blank
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
Lines: 22
Message-ID: <cS4aN.1566376$iuU8.1534120@usenetxs.com>
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Thu, 30 Nov 2023 18:52:56 UTC
Date: Thu, 30 Nov 2023 18:52:56 GMT
X-Received-Bytes: 1708
 by: Russell Marks - Thu, 30 Nov 2023 18:52 UTC

Christian Weisgerber <naddy@mips.inka.de> wrote:

> On 2023-11-30, Russell Marks <zgedneil@spam^H^H^H^Hgmail.com> wrote:
>
>> Playing around with this on Linux, one way to nearly replicate your
>> result with a newer bash is "LC_ALL=C printf '\U00013000\n'" which for
>> me will output "\u00013000". So I suppose there could be a locale
>> issue involved.
>
> But if you execute the commands in question first on the command
> line, then in a minimal script as shown, the same locale settings
> will be used for both.

True. The old 3.x bash presumably had Unicode bugs though, and the
differing output of "\U" vs. "\u" could hint at differing causes.
Also, if the 3.x bash binary is from Apple while the 5.x one isn't (as
seems likely), I imagine that the binaries could potentially be using
different libraries and/or locale configs.

Still, I have to admit this is all pretty speculative.

-Rus.
