
pdf4tcl and Chinese characters

 by: Harald Oehlmann - Wed, 24 Jan 2024 18:15 UTC

Thanks for the great pdf4tcl!

I have a string with Chinese characters.
I output them with pdf4tcl:

pdf setFont {9 p} Helvetica
pdf setFillColor black
pdf text "实地"

I only get question marks.
The ::pdf4tcl::createFont command looks like the way to go, but it selects
only 256 glyphs, and Chinese has many times that number of characters.

Has anybody solved this issue?

Thanks for any hint,
Harald

pdf4tcl 0.9.4 on Tcl 8.6.13...

Re: pdf4tcl and Chinese characters

 by: Rich - Wed, 24 Jan 2024 18:41 UTC

You are bumping into a PDF limitation.

Each single-byte ("simple") font within a PDF can address at most 256
characters. This is a limit from very early in PDF's lifetime, and it
makes using non-ASCII characters a real pain.

Basically you have to create a "custom" font in the pdf using
::pdf4tcl::createFontSpecEnc with a custom encoding of codepoints (the
byte values) to actual character glyphs. Then you have to "change
font" to your custom font in order to draw these characters, and use
your custom assigned code point value for the glyph you want output.

For example, ASCII assigns 65 decimal to capital A. Using
::pdf4tcl::createFontSpecEnc you can assign 65 decimal to output the
glyph 实, and then when you want to output that glyph, you 'change font'
to your custom font and output 65 decimal as the "character".
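
To make the idea of a custom encoding concrete, here is a minimal sketch in plain Tcl, independent of pdf4tcl. The procs buildEncoding and encodeText are hypothetical names used only for illustration: the first gives each distinct character of a text the next free slot from 0 to 255, and the second translates the text into those slot numbers, which is roughly what a single-byte font in a PDF ends up storing. The characters are written as \u escapes so that the script does not depend on the encoding of the source file.

```tcl
# Hypothetical helper, not part of pdf4tcl: map each distinct character
# of $text to the next free slot (0..255) of a custom single-byte encoding.
proc buildEncoding {text} {
    set enc [dict create]
    foreach ch [split $text ""] {
        if {![dict exists $enc $ch]} {
            if {[dict size $enc] >= 256} {
                error "more than 256 distinct characters; a second font is needed"
            }
            dict set enc $ch [dict size $enc]
        }
    }
    return $enc
}

# Hypothetical helper: translate $text into the list of slot numbers
# that would be written out when the custom font is active.
proc encodeText {enc text} {
    set out {}
    foreach ch [split $text ""] {
        lappend out [dict get $enc $ch]
    }
    return $out
}

# \u5b9e and \u5730 are the two characters from the original question.
set enc [buildEncoding "\u5b9e\u5730\u5b9e"]
puts [encodeText $enc "\u5b9e\u5730\u5b9e"]   ;# prints: 0 1 0
```

As far as the pdf4tcl documentation describes it, ::pdf4tcl::createFontSpecEnc takes the subset as a list of Unicode values, so the library may well do this bookkeeping itself once the font exists; treat that as an assumption to check against your version.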

Re: pdf4tcl and Chinese characters

 by: Harald Oehlmann - Wed, 24 Jan 2024 19:18 UTC

Thank you, Rich. That is what I feared.
Is there nobody out there who has automated this?
I suppose this is not easy...
You would also want each text field to use a single font; otherwise the
text is interrupted, I suppose.

So I will try to write a function that collects the glyphs of one text,
creates a font for them, and then outputs the text.
In a second step, this could be optimized by building one font of 256
characters that covers as many text snippets as possible.

I have to sleep on this...

Thanks,
Harald
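
The function described above could start out as follows. This is a sketch under assumptions and has not been tested against pdf4tcl 0.9.4: uniqueCodePoints, textWithOwnFont and the ::fontCounter variable are hypothetical names, the base font is assumed to have been registered beforehand with ::pdf4tcl::loadBaseTrueTypeFont, and ::pdf4tcl::createFontSpecEnc is assumed to accept a list of Unicode values as its subset argument. Only the first proc is exercised here; the second needs pdf4tcl and a TrueType font file.

```tcl
# Collect the distinct Unicode code points of a string, in first-seen order.
proc uniqueCodePoints {text} {
    set seen [dict create]
    foreach ch [split $text ""] {
        dict set seen [scan $ch %c] 1
    }
    return [dict keys $seen]
}

# Sketch only (assumes pdf4tcl is loaded and $base is a registered base font):
# create a font holding just the glyphs of $text, select it, and draw the text.
# Extra arguments such as -x and -y are passed through to the text method.
proc textWithOwnFont {pdf base size text args} {
    set subset [uniqueCodePoints $text]
    if {[llength $subset] > 256} {
        error "text needs [llength $subset] glyphs; split it across several fonts"
    }
    # ::fontCounter is a hypothetical global used to get a fresh font name.
    set name ${base}_sub[incr ::fontCounter]
    pdf4tcl::createFontSpecEnc $base $name $subset
    $pdf setFont $size $name
    $pdf text $text {*}$args
}

# \u5b9e and \u5730 are the two characters from the original question.
puts [uniqueCodePoints "\u5b9e\u5730\u5b9e"]   ;# prints: 23454 22320
```

Creating one font per text keeps the logic simple, at the cost of embedding some glyphs more than once when texts share characters.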

Re: pdf4tcl and Chinese characters

 by: Rich - Wed, 24 Jan 2024 21:17 UTC

Harald Oehlmann <wortkarg3@yahoo.com> wrote:
> On 24.01.2024 at 19:41, Rich wrote:
>> Harald Oehlmann <wortkarg3@yahoo.com> wrote:
>>> Thanks for great pdf4tcl !
>>>
>>> I have a string with Chinese characters.
>>> I output them with pdf4tcl:
>>> ...
>>>
>>> I only get question marks.
>>> The interesting ::pdf4tcl::createFont command should be used to select
>>> 256 glyphs. Well, Chinese language has a magnitude of this.
>>
>> You are bumping into a PDF limitation.
>>
>> Each "font" within a PDF can address at most 256 characters. This is a
>> limit from very early in PDF's lifetime, and creates a real PIA for
>> using non-ASCII characters.
>>
>> Basically you have to create a "custom" font in the pdf using
>> ::pdf4tcl::createFontSpecEnc with a custom encoding of codepoints (the
>> ...
>>
> Thank you, Rich. That is what I feared.
> Is there nobody out there who has automated this?

Not that I'm aware of for pdf4tcl. Possibly for some other library for
some other language.

> I suppose, this is not easy...

Not trivial, not rocket science either.

> You also want to have one text field with one font, otherwise, the text
> is interrupted, I suppose.

Depending upon what you mean by text field, you can switch fonts before
drawing each glyph if you like and it will have no impact on the final
viewing of the pdf. If by field you mean a data entry field, then I
have no idea there.

When you delve down into the PDF internals, you find that PDF is
nothing more than instructions to place glyphs at x,y positions on a
sheet of virtual paper. I.e., internally it is very much like the
Tcl canvas widget. Which is why 'font switches' don't cause problems
with the render (unless you, the creator, create vastly different
actual fonts for 'effect'). But if the plural "fonts" are all of the
same size and all from the same base, font switches are invisible in
the final render.

> So, I will try to create a function, which assembles the glyphs of one
> text, then creates a font and then outputs it.

Yes, you either have to decide what glyphs you want ahead of time, and
'pre-create' fonts to draw those glyphs, or you have to analyze the
characters you want to "print" for the pdf (or for the current page)
and create a custom font for those characters.

The one advantage of the second method is that most Unicode TTF font
files are huge. If you create a custom internal font for only the
characters you use, pdf4tcl embeds only those glyphs, so if you use 1%
of the glyphs, only about 1% of the font file is stored in the PDF,
making it smaller.

> In a 2nd step, an optimization may be done to find one font with 256
> characters, which assembles as many text snippets as possible.

Yes, it will be possible to do so, sometimes. For Chinese, given the
huge number of total characters, this may be difficult to do in a
general sense for all possibilities, but you might come close.
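
The analysis step for a text with more distinct characters than one font can hold might look like the following plain-Tcl sketch. The procs assignFonts and splitRuns are hypothetical helper names: the first hands out font indices so that no font gets more than a given number of characters, the second cuts the text into runs that each use a single font. The limit is a parameter so that the behaviour can be shown with a tiny value instead of 256.

```tcl
# Assign every distinct character of $text to a font index so that each
# font holds at most $limit characters. Returns a dict: character -> index.
proc assignFonts {text {limit 256}} {
    set map [dict create]
    foreach ch [split $text ""] {
        if {![dict exists $map $ch]} {
            dict set map $ch [expr {[dict size $map] / $limit}]
        }
    }
    return $map
}

# Split $text into a flat list {fontIndex run fontIndex run ...}
# in which every run can be drawn with a single font.
proc splitRuns {map text} {
    set runs {}
    set cur ""
    set run ""
    foreach ch [split $text ""] {
        set f [dict get $map $ch]
        if {$f ne $cur && $run ne ""} {
            lappend runs $cur $run
            set run ""
        }
        set cur $f
        append run $ch
    }
    if {$run ne ""} {
        lappend runs $cur $run
    }
    return $runs
}

# With a limit of 2, "a" and "b" share font 0 and "c" goes to font 1.
set text "abcab"
puts [splitRuns [assignFonts $text 2] $text]   ;# prints: 0 ab 1 c 0 ab
```

Each run would then be drawn after switching to the font with that index. Assigning characters in first-seen order is the simplest choice; grouping characters that tend to occur together would reduce the number of font switches.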

Re: pdf4tcl and Chinese characters

 by: lamuzz...@gmail.com - Wed, 24 Jan 2024 23:27 UTC

Harald,
take a look at tclfpdf (https://github.com/lamuzzachiodi/tclfpdf).
There is an example (utf8.tcl, pasted below) with Chinese characters using the font simhei.ttf.
Maybe this helps you.
Regards,

Alejandro

#--- utf8.tcl -----------
package require tclfpdf
namespace import ::tclfpdf::*

Init;
AddPage;
# Add a Unicode font (uses UTF-8)
AddFont "DejaVu" "" "DejaVuSansCondensed.ttf" 1;
SetFont "DejaVu" "" 14;
Write 8 " -----
English: Hello World
Greek: Γειά σου κόσμος
Polish: Witaj świecie
Portuguese: Olá mundo
Spanish: Hola mundo
Russian: Здравствулте мир
Vietnamese: Xin chào thế giới
------";
Ln 10;
AddFont "simhei" "" "simhei.ttf" 1;
SetFont "simhei" "" 20;
Write 10 "Chinese: 你好世界";
#Select a standard font (uses windows-1252)
SetFont "Arial" "" 14;
Ln 10;
Write 5 "The file size of this PDF is only 16 KB.";
Output "utf8.pdf";

Re: pdf4tcl and Chinese characters

 by: Harald Oehlmann - Thu, 25 Jan 2024 07:44 UTC

Many thanks, Alejandro,
this looks promising,
Harald


Re: pdf4tcl and Chinese characters

 by: Harald Oehlmann - Wed, 28 Feb 2024 12:38 UTC

Dear team,

please allow me to give a status update on this.
I looked into tclfpdf: great stuff, although it is quite weird that the
Error routine calls "exit".
In the end, this issue was solved using pdf4tcl, as described at the very end of:
https://wiki.tcl-lang.org/page/pdf4tcl

The solution is quite manual, and an automated process like the one in
tclfpdf would be great. The pdf4tcl ticket tracker got 6 new tickets.

But anyway, thank you all and take care,
Harald

