Encoding of Traditional Chinese using ISO IR-100

<d900f104-2aca-4d9b-8106-25eb8a108bbcn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=671&group=comp.protocols.dicom#671

Newsgroups: comp.protocols.dicom
Date: Fri, 28 Apr 2023 08:25:01 -0700 (PDT)
From: meyer@mevis.de (Sebastian Meyer)
 by: Sebastian Meyer - Fri, 28 Apr 2023 15:25 UTC

Hello all,

We received DICOM data from a country using Traditional Chinese. Much to our surprise, the data is encoded with (0008,0005) SpecificCharacterSet "ISO_IR 100". Our application decodes the strings according to the standard and produces unreadable text, while other applications display the Chinese characters correctly.
Next surprise: when I dcmdump the file, Notepad++ also displays the Chinese characters correctly, so I can include a few lines of the dump. This is obviously a two-byte encoding, but I could not find any reference for it. What is the magic here?
Thank you, Sebastian

# Dicom-File-Format

# Dicom-Meta-Information-Header
# Used TransferSyntax: Little Endian Explicit
(0002,0000) UL 198 # 4, 1 FileMetaInformationGroupLength
(0002,0001) OB 00\01 # 2, 1 FileMetaInformationVersion
(0002,0002) UI =CTImageStorage # 26, 1 MediaStorageSOPClassUID
(0002,0003) UI [1.2.840.113619.2......................] # 56, 1 MediaStorageSOPInstanceUID
(0002,0010) UI =LittleEndianImplicit # 18, 1 TransferSyntaxUID
(0002,0012) UI [1.2.410.200010.99.3.5] # 22, 1 ImplementationClassUID
(0002,0013) SH [INF_4.5] # 8, 1 ImplementationVersionName

# Dicom-Data-Set
# Used TransferSyntax: Little Endian Implicit
(0008,0005) CS [ISO_IR 100] # 10, 1 SpecificCharacterSet
(0008,0016) UI =CTImageStorage # 26, 1 SOPClassUID
(0008,0018) UI [1.2.840.113619.2......................] # 56, 1 SOPInstanceUID
(0008,0060) CS [CT] # 2, 1 Modality
(0008,0070) LO [GE MEDICAL SYSTEMS] # 18, 1 Manufacturer
(0008,0080) LO [....綜合醫院] # 14, 1 InstitutionName
(0008,1030) LO [Low Dose Lung CT 低劑量肺部檢查(......補助)] # 44, 1 StudyDescription
(0008,1090) LO [Optima CT660] # 12, 1 ManufacturerModelName

Re: Encoding of Traditional Chinese using ISO IR-100

<2d896a5f-c483-4438-9151-66718b1b5c0dn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=672&group=comp.protocols.dicom#672

Date: Fri, 28 Apr 2023 09:50:50 -0700 (PDT)
In-Reply-To: <d900f104-2aca-4d9b-8106-25eb8a108bbcn@googlegroups.com>
From: david.gobbi@gmail.com (David Gobbi)
 by: David Gobbi - Fri, 28 Apr 2023 16:50 UTC

Are you sure that these Chinese characters aren't just encoded in utf-8 in the DICOM file?

If they are, then notepad++ will probably be able to autodetect that they are utf-8 and display them properly, while applications that strictly enforce "ISO_IR 100" will display garbage.

If you can post a hex dump of one of the data elements that contains the characters, I could inspect it to see what encoding is used.
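
The UTF-8 question can also be checked mechanically. The following is a minimal sketch, not a definitive detector: `looks_like_utf8` is a hypothetical helper name, and the sample text is taken from the StudyDescription in the original post. It relies on the fact that a strict UTF-8 decode rejects most byte sequences produced by other multi-byte encodings, while a Latin-1 decode accepts any byte sequence and silently yields garbage.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 and contains at least one non-ASCII byte."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    # Pure ASCII decodes fine under almost any charset, so it proves nothing.
    return any(b >= 0x80 for b in raw)


# Sample text from the StudyDescription in the original post.
utf8_sample = "低劑量".encode("utf-8")
big5_sample = "低劑量".encode("big5")

print(looks_like_utf8(utf8_sample))
print(looks_like_utf8(big5_sample))

# Latin-1 never raises, which is why a strict ISO_IR 100 reader shows garbage
# instead of reporting an error.
print(big5_sample.decode("latin-1"))
```

A failed UTF-8 decode only rules UTF-8 out; it does not say which encoding was actually used.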

Re: Encoding of Traditional Chinese using ISO IR-100

<e75cccc4-4855-41d9-84a8-ff542d3ae520n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=673&group=comp.protocols.dicom#673

Date: Fri, 28 Apr 2023 10:18:55 -0700 (PDT)
In-Reply-To: <2d896a5f-c483-4438-9151-66718b1b5c0dn@googlegroups.com>
From: meyer@mevis.de (Sebastian Meyer)
 by: Sebastian Meyer - Fri, 28 Apr 2023 17:18 UTC

Hi David,
here is a dump of two of the affected tags, created with dcmdump:
>dcmdump +Qn +L +P InstitutionName +P StudyDescription "chinese_org.dcm"
(0008,0080) LO [&#169;&#201;&#164;&#175;&#186;&#238;&#166;X&#194;&#229;&#176;|] # 14, 1 InstitutionName
(0008,1030) LO [Low Dose Lung CT &#167;C&#190;&#175;&#182;q&#170;&#205;&#179;&#161;&#192;&#203;&#172;d(&#174;&#231;&#182;&#233;&#165;&#171;&#184;&#201;&#167;U)] # 44, 1 StudyDescription

Thanks for your help!
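
The dump above shows each non-ASCII byte as a decimal `&#nnn;` reference, while bytes in the ASCII range appear as literal characters such as `X` and `|`. The following sketch turns such a quoted value back into raw bytes for inspection. `unquote_dcmdump` is a hypothetical helper, and it assumes the quoted value contains nothing but `&#nnn;` references and plain ASCII characters.

```python
import re


def unquote_dcmdump(text: str) -> bytes:
    """Rebuild raw bytes from a value quoted as '&#nnn;' references.

    Assumption: every non-ASCII byte is written as a decimal reference and
    every other character is a literal single-byte ASCII character.
    """
    out = bytearray()
    pos = 0
    for m in re.finditer(r"&#(\d{1,3});", text):
        out += text[pos:m.start()].encode("ascii")  # literal ASCII run
        out.append(int(m.group(1)))                  # one quoted byte (0..255)
        pos = m.end()
    out += text[pos:].encode("ascii")                # trailing literal run
    return bytes(out)


# InstitutionName value copied from the dump above.
quoted = "&#169;&#201;&#164;&#175;&#186;&#238;&#166;X&#194;&#229;&#176;|"
raw = unquote_dcmdump(quoted)
print(raw.hex(" "))  # a9 c9 a4 af ba ee a6 58 c2 e5 b0 7c
```

Note that the literal `X` and `|` are part of the value (bytes 0x58 and 0x7C) and must not be dropped when copying the bytes by hand.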

David Gobbi wrote on Friday, 28 April 2023 at 18:50:53 UTC+2:
> Are you sure that these Chinese characters aren't just encoded in utf-8 in the DICOM file?
>
> If they are, then notepad++ will probably be able to autodetect that they are utf-8 and display them properly, while applications that strictly enforce "ISO_IR 100" will display garbage.
>
> If you can post a hex dump of one of the data elements that contains the characters, I could inspect it to see what encoding is used.

Re: Encoding of Traditional Chinese using ISO IR-100

<393f6adf-483b-4c17-9d59-e7161f119b73n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=674&group=comp.protocols.dicom#674

Date: Fri, 28 Apr 2023 10:46:46 -0700 (PDT)
In-Reply-To: <e75cccc4-4855-41d9-84a8-ff542d3ae520n@googlegroups.com>
From: david.gobbi@gmail.com (David Gobbi)
 by: David Gobbi - Fri, 28 Apr 2023 17:46 UTC

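The encoding is Big5. I'm surprised that Notepad++ could autodetect it. Here is how I checked the encoding in Python:

# The literal X (0x58) and | (0x7C) in the dump are the trail bytes of two-byte Big5 characters
b = bytearray([169,201,164,175,186,238,166,0x58,194,229,176,0x7C])
print(b.decode('big5'))
怡仁綜合醫院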

I tried UTF-8 first, then Big5, and voila, that's what it was.
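
That trial-and-error check can be sketched as a small loop. This is only an illustration: `first_decodable` and its candidate list are hypothetical, and the byte values are the twelve bytes of the InstitutionName dump, including the literal `X` (0x58) and `|` (0x7C) that act as Big5 trail bytes.

```python
def first_decodable(raw: bytes, candidates=("utf-8", "big5")):
    """Return (encoding, text) for the first candidate that decodes strictly."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    return None, None


# Bytes of InstitutionName from the dcmdump output.
raw = bytes([169, 201, 164, 175, 186, 238, 166, 0x58, 194, 229, 176, 0x7C])
encoding, text = first_decodable(raw)
print(encoding, text)  # big5 怡仁綜合醫院
```

A clean decode is a strong hint rather than proof, so the result still needs a visual check against what the text is expected to say.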

Although Big5 isn't part of the DICOM standard, I have a tool that can dump these files:
https://github.com/dgobbi/vtk-dicom/wiki/dicomdump
https://github.com/dgobbi/vtk-dicom/releases/tag/v0.8.14

dicomdump --charset big5 chinese_org.dcm

Re: Encoding of Traditional Chinese using ISO IR-100

<9842ef35-25d5-4e77-8a7e-6fafda312e2fn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=675&group=comp.protocols.dicom#675

Date: Fri, 28 Apr 2023 10:53:56 -0700 (PDT)
In-Reply-To: <393f6adf-483b-4c17-9d59-e7161f119b73n@googlegroups.com>
From: david.gobbi@gmail.com (David Gobbi)
 by: David Gobbi - Fri, 28 Apr 2023 17:53 UTC

On Friday, 28 April 2023 at 11:46:49 UTC-6, David Gobbi wrote:

> dicomdump --charset big5 chinese_org.dcm

Ah, I forgot that for this to work, the original SpecificCharacterSet (0008,0005) has to be removed first, e.g.

dcmodify -e 0008,0005 chinese_org.dcm
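
Where modifying the file is not an option, a string that an application has already decoded as ISO_IR 100 can sometimes be repaired in memory. The sketch below assumes the application's Latin-1 decoder maps every byte to exactly one code point without loss, as Python's `latin-1` codec does. `redecode` is a hypothetical helper, not part of any DICOM toolkit.

```python
def redecode(text: str, declared: str = "latin-1", actual: str = "big5") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them properly."""
    return text.encode(declared).decode(actual)


# What a strict ISO_IR 100 reader produces for the last eight bytes of
# InstitutionName (ba ee a6 58 c2 e5 b0 7c).
garbled = bytes([0xBA, 0xEE, 0xA6, 0x58, 0xC2, 0xE5, 0xB0, 0x7C]).decode("latin-1")
print(garbled)
print(redecode(garbled))  # 綜合醫院
```

The round trip fails if the reader replaced or dropped any bytes while decoding, in which case the raw element value has to be read from the file again.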

Re: Encoding of Traditional Chinese using ISO IR-100

<30dd1659-02f5-4e72-a0a6-7b0cfcb84d1fn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=676&group=comp.protocols.dicom#676

Date: Fri, 28 Apr 2023 11:07:23 -0700 (PDT)
In-Reply-To: <9842ef35-25d5-4e77-8a7e-6fafda312e2fn@googlegroups.com>
From: meyer@mevis.de (Sebastian Meyer)
 by: Sebastian Meyer - Fri, 28 Apr 2023 18:07 UTC

Thank you so much!

I was surprised by this capability of Notepad++ as well. Then I learned that its primary author has Taiwanese roots: https://donho.github.io/

David Gobbi wrote on Friday, 28 April 2023 at 19:53:58 UTC+2:
> On Friday, 28 April 2023 at 11:46:49 UTC-6, David Gobbi wrote:
>
> > dicomdump --charset big5 chinese_org.dcm
>
> Ah, I forgot that for this to work, the original CharacterSet has to be removed first, e.g.
>
> dcmodify -e 0008,0005 chinese_org.dcm
