
Subject  Author
* Musings about inspecting and processing binary data with shell  Janis Papanagnou
+- Musings about inspecting and processing binary data with shell  Kaz Kylheku
`* Musings about inspecting and processing binary data with shell  Computer Nerd Kev
 `* Musings about inspecting and processing binary data with shell  Janis Papanagnou
  +- Musings about inspecting and processing binary data with shell  Computer Nerd Kev
  `* Musings about inspecting and processing binary data with shell  Kaz Kylheku
   `- Musings about inspecting and processing binary data with shell  Janis Papanagnou

Musings about inspecting and processing binary data with shell

<ucqg6g$3c7b4$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=6593&group=comp.unix.shell#6593
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Musings about inspecting and processing binary data with shell
Date: Thu, 31 Aug 2023 18:47:12 +0200
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <ucqg6g$3c7b4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 31 Aug 2023 16:47:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4ccd7202d6d6c281dd5a035f301de0f2";
logging-data="3546468"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+xKM+3saSmkqQIuY+e2CSW"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:0VooL4+xPxqFomgH/9wE0/Z2DVk=
X-Enigmail-Draft-Status: N1110
X-Mozilla-News-Host: news://news.eternal-september.org:119
 by: Janis Papanagnou - Thu, 31 Aug 2023 16:47 UTC

When I saw a recent post where data got extracted from a binary file
it made me think about what would be the "right way" to do such jobs.

Shells (like other Unix tools) have problems with binary data, at least with '\0'.
Kornshell supports binary data with 'typeset -b'; it stores the data
in a MIME format. I couldn't see, though, how to _process_ the binary
raw data within Kornshell easily. (If anyone has experiences here I'd
certainly like to hear!)
The 'od' tool allows displaying binary data in various formats, but
it works on a whole data stream (not on individual fields).
Are there any tools that support a more flexible inspection of binary
data?

I was thinking of some data specification and a tool to work with that
specification and binary data files. My current experimental hack has
a data specification of a form as shown in this example

4 X magic (41424300)
4 S version (31323334)
2 - skip (55ee)
2 D reserved (0004)
0 X variable data (11223344)
8 D start of header (0000000000000100)
8 D end of header (00000000000002ff)
4 D auth size (00000020)
0 B auth data (...)
4 X marker (ffffffff)
4 X label (deadbeef)
4 - skip (00000000)
3 S EOT (454f54)
1 Z illegal format

basically defining the number of octets ("bytes") of a field, a type
that indicates the desired interpretation and output format, and an
informal text that describes the field (the numeric data in brackets
here is only for my tests).
Skipping of fields is possible (with type = '-'), and variable-length
data can be processed (with length = 0), taking its length from a previous
length data element. Endian'ness could be supported for numeric data fields.
(An extension might support null-terminated data fields and distantly
located length fields.) It would create something like

0x41424300 magic (41424300)
'1234' version (31323334)
4 reserved (0004)
0x11223344 variable data (11223344)
256 start of header (0000000000000100)
767 end of header (00000000000002ff)
32 auth size (00000020)
<0102040811121418212224284142444881828488000000000101010180808080> auth data (...)
0xffffffff marker (ffffffff)
0xdeadbeef label (deadbeef)
'EOT' EOT (454f54)
*** Error: unsupported format 'Z'! (Use X, D, B, S, or -)
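
A minimal Python sketch of such a spec-driven decoder, for illustration only (the original hack is a shell script that is not shown in this thread). The function name `decode`, the big-endian default, and the rule that a length of 0 takes its value from the most recent 'D' field are assumptions made for this example.

```python
def decode(spec, data, byteorder="big"):
    """Decode `data` (bytes) using spec lines '<length> <type> <description>'.

    Returns the list of output lines. Assumption: a length of 0 means
    "use the value of the most recently decoded D field".
    """
    out = []
    pos = 0
    last_number = 0  # value of the most recent D field
    for line in spec.strip().splitlines():
        length_s, ftype, desc = line.split(None, 2)
        length = int(length_s) or last_number
        if ftype not in {"X", "D", "B", "S", "-"}:
            out.append(f"*** Error: unsupported format '{ftype}'! "
                       "(Use X, D, B, S, or -)")
            break
        field = data[pos:pos + length]
        pos += length
        if ftype == "-":        # skipped field: consume it, print nothing
            continue
        if ftype == "X":        # hexadecimal number
            out.append(f"0x{field.hex()} {desc}")
        elif ftype == "D":      # decimal number
            last_number = int.from_bytes(field, byteorder)
            out.append(f"{last_number} {desc}")
        elif ftype == "S":      # fixed-length string, no \0 terminator
            out.append(f"'{field.decode('ascii', 'replace')}' {desc}")
        else:                   # "B": raw octets shown as hex
            out.append(f"<{field.hex()}> {desc}")
    return out


spec = """
4 X magic
4 S version
2 - skip
2 D reserved
0 X variable data
1 Z illegal format
"""
sample = bytes.fromhex("41424300" "31323334" "55ee" "0004" "11223344" "00")
lines = decode(spec, sample)
```

Printing `lines` one per line gives output in the style shown above. Endian'ness is a single parameter here, not a per-field property.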

Before I continue working on my hacked sample script I'd be interested
to know whether such a tool with similar functionality already exists
[in the free Linux world]; I would think this is a common task so that
some usable tool certainly should exist, but my own cursory search did
not lead anywhere. So any hints are welcome.

Janis

Re: Musings about inspecting and processing binary data with shell

<20230831114951.16@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=6594&group=comp.unix.shell#6594
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.shell
Subject: Re: Musings about inspecting and processing binary data with shell
Date: Thu, 31 Aug 2023 19:04:52 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <20230831114951.16@kylheku.com>
References: <ucqg6g$3c7b4$1@dont-email.me>
Injection-Date: Thu, 31 Aug 2023 19:04:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="70d02f2c57901333f8c8fdfdb470c4f7";
logging-data="3590096"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+UJlNXiPAnrP8P3fPzv9mYHCcrX/lVD9Q="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:2Kb8X4zP3rf0cr8b5YQATdTqexg=
 by: Kaz Kylheku - Thu, 31 Aug 2023 19:04 UTC

On 2023-08-31, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> Before I continue working on my hacked sample script I'd be interested
> to know whether such a tool with similar functionality already exists
> [in the free Linux world]; I would think this is a common task so that
> some usable tool certainly should exist, but my own cursory search did
> not lead anywhere. So any hints are welcome.

The file utility, with /etc/magic and all, has a language for inspecting
fields in binaries and reporting on them. Usually this is compiled in some
way nowadays, so you don't find the source in /etc/magic; I've
not looked into that in depth.

Scripting languages have pack/unpack languages based on brief, usually
single-character codes, inspired by Perl.
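
As an illustration of the pack/unpack style, here is a small sketch using Python's standard `struct` module on the first twelve octets of the sample data from the original post. The field names and the choice of big-endian are assumptions for this example.

```python
import struct

# First four fields of the sample: magic, version, skip, reserved.
header = bytes.fromhex("41424300" "31323334" "55ee" "0004")

# ">"  = big-endian, standard sizes, no padding
# "4s" = 4 raw bytes, "2x" = skip 2 bytes, "H" = unsigned 16-bit integer
magic, version, reserved = struct.unpack(">4s4s2xH", header)

print(magic.hex(), version.decode("ascii"), reserved)
```

The whole layout lives in the one format string, which is compact but gives the fields no names.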

FFI capabilities in languages can be used for dealing with binary
data: instead of a pack notation in a string you declare structs
with typed and named fields.
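
A sketch of the struct-declaration style using Python's `ctypes`, applied to the two 8-octet header offsets from the sample data. The class name `HeaderBounds` and the field names are made up for this example.

```python
import ctypes


class HeaderBounds(ctypes.BigEndianStructure):
    # Two big-endian unsigned 64-bit fields, declared by name and type.
    _fields_ = [
        ("start_of_header", ctypes.c_uint64),
        ("end_of_header", ctypes.c_uint64),
    ]


raw = bytes.fromhex("0000000000000100" "00000000000002ff")
bounds = HeaderBounds.from_buffer_copy(raw)

print(bounds.start_of_header, bounds.end_of_header)
```

Compared with a pack string, the byte order is a property of the structure and each field is accessed by name.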

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Musings about inspecting and processing binary data with shell

<64f26b69@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=6595&group=comp.unix.shell#6595
Message-ID: <64f26b69@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Musings about inspecting and processing binary data with shell
Newsgroups: comp.unix.shell
References: <ucqg6g$3c7b4$1@dont-email.me>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 2 Sep 2023 08:53:29 +1000
Organization: Ausics - https://www.ausics.net
Lines: 22
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Fri, 1 Sep 2023 22:53 UTC

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> The 'od' tool allows displaying binary data in various formats, but
> it works on a whole data stream (not on individual fields).
> Are there any tools that support a more flexible inspection of binary
> data?
>
> I was thinking of some data specification and a tool to work with that
> specification and binary data files. My current experimental hack has
> a data specification of a form as shown in this example

If I'm following you, then this sounds like a description of
something like GNU Poke:

http://www.jemarch.net/poke

Not something I've had a use for myself since finding out about it
recently, but it seems like a comprehensive solution to the
problem.

--
__ __
#_ < |\| |< _#

Re: Musings about inspecting and processing binary data with shell

<ucvk6d$f6kh$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=6596&group=comp.unix.shell#6596
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Musings about inspecting and processing binary data with shell
Date: Sat, 2 Sep 2023 17:26:04 +0200
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <ucvk6d$f6kh$1@dont-email.me>
References: <ucqg6g$3c7b4$1@dont-email.me> <64f26b69@news.ausics.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 2 Sep 2023 15:26:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a7ea6a9c3404f98248f88a34a6709641";
logging-data="498321"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19P78PWjYprpuV9Jir8ZRKA"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:H46uAeG9bEYliK57mcki+iTKpYk=
X-Enigmail-Draft-Status: N1110
In-Reply-To: <64f26b69@news.ausics.net>
 by: Janis Papanagnou - Sat, 2 Sep 2023 15:26 UTC

On 02.09.2023 00:53, Computer Nerd Kev wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> The 'od' tool allows displaying binary data in various formats, but
>> it works on a whole data stream (not on individual fields).
>> Are there any tools that support a more flexible inspection of binary
>> data?
>>
>> I was thinking of some data specification and a tool to work with that
>> specification and binary data files. My current experimental hack has
>> a data specification of a form as shown in this example
>
> If I'm following you, then this sounds like a description of
> something like GNU Poke:
>
> http://www.jemarch.net/poke
>
> Not something I've had a use for myself since finding out about it
> recently, but it seems like a comprehensive solution to the
> problem.

This is really overwhelming! - Indeed it seems to cover what I was
looking for, but yet much much more; a complete programming language
with control constructs and exception handling, just to name one big
part of the package. So I'm not quite decided that it's what I'd use.
I certainly don't want to write a program[*] to extract some data,
for my purpose the advertised declarative approach[**] would be it.
I'll have to work through the docs to see whether some basic features
are actually supported (e.g. I'm not sure whether simple fixed length
strings (without \0 termination) are supported; I suppose they are,
but some statement I read in the docs made me cautious, so I'll have
to see). - All in all an interesting tool, so thanks for the link!

BTW, in the poke docs I saw examples WRT endian'ness, like the spec
little int a;
big int b;
int c;
In the past I've assumed that endian'ness is a machine characteristic
and would not change within a protocol element. The example taken from
the poke docs suggests that there may be different elements. Of course
we can think about different payload data in a single protocol element,
but is that usual? - I'm coming from the ITU-T ASN.1/BER perspective,
where the ASN.1 data spec is agnostic and endian'ness should happen
in the encoding and decoding process for a specific source and target
architecture. - The answer would lead either to a data spec (like in
poke) to specify that property separately with every data element, or
as a single parameter for the processing.

Janis

[*] An example can be found in the poke docs:
http://www.jemarch.net/poke-3.3-manual/poke.html#elfextractor

[**] http://www.jemarch.net/poke-3.3-manual/poke.html#Motivation

Re: Musings about inspecting and processing binary data with shell

<64f3c3b3@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=6597&group=comp.unix.shell#6597
Message-ID: <64f3c3b3@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Musings about inspecting and processing binary data with shell
Newsgroups: comp.unix.shell
References: <ucqg6g$3c7b4$1@dont-email.me> <64f26b69@news.ausics.net> <ucvk6d$f6kh$1@dont-email.me>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 3 Sep 2023 09:22:28 +1000
Organization: Ausics - https://www.ausics.net
Lines: 21
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Sat, 2 Sep 2023 23:22 UTC

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> BTW, in the poke docs I saw examples WRT endian'ness, like the spec
> little int a;
> big int b;
> int c;
> In the past I've assumed that endian'ness is a machine characteristic
> and would not change within a protocol element. The example taken from
> the poke docs suggests that there may be different elements. Of course
> we can think about different payload data in a single protocol element,
> but is that usual?

I can't speak to what's "usual" in a general sense, but one example
that comes to mind is working on a firmware file that's intended to
be programmed to a device by another system. It could have
information for the programming system stored in that system's byte
order, while the actual data to be written will use the endianness
of the device (or whatever reads it later).
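
A small sketch of that situation, with an entirely hypothetical image layout: a big-endian length field for the programming system, followed by little-endian 16-bit words for the device.

```python
import struct

# Hypothetical image: 4-byte big-endian payload length, then the payload
# as little-endian 16-bit words.
image = bytes.fromhex("00000004" "3412" "7856")

# Header field in the programming system's byte order (big-endian here).
(payload_len,) = struct.unpack_from(">I", image, 0)

# Payload in the device's byte order (little-endian here).
words = struct.unpack_from(f"<{payload_len // 2}H", image, 4)

print(payload_len, [hex(w) for w in words])
```

Here the byte order has to be stated per field (or per region of the file); a single global setting could not describe both parts.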

--
__ __
#_ < |\| |< _#

Re: Musings about inspecting and processing binary data with shell

<20230902214850.617@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=6598&group=comp.unix.shell#6598
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.shell
Subject: Re: Musings about inspecting and processing binary data with shell
Date: Sun, 3 Sep 2023 06:25:02 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <20230902214850.617@kylheku.com>
References: <ucqg6g$3c7b4$1@dont-email.me> <64f26b69@news.ausics.net>
<ucvk6d$f6kh$1@dont-email.me>
Injection-Date: Sun, 3 Sep 2023 06:25:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3db77d17316144525385b79f78a13f40";
logging-data="851473"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/F6/GAL5qL44w1hyGZ4Lm0+FVLPb8ACDM="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:w9gNq60KHX9p6lZgmca9PK9oxWc=
 by: Kaz Kylheku - Sun, 3 Sep 2023 06:25 UTC

On 2023-09-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> BTW, in the poke docs I saw examples WRT endian'ness, like the spec
> little int a;
> big int b;

"big endian int" is nonsensical, which detracts from the example.

Endian specifications only make sense on exact sized types like int16,
uint32 or int64.

"int" is a local concept: matching this system's principal compiler's
"int", which is referenced in the system ABI.

If we are dealing with external data---which we must be, if we are
concerned with byte order---that data doesn't care what our local "int"
is. We don't want the extraction code to break with a different "int".

Thus, if we're committing to a byte order, we should commit to the number
of bytes which constitute that order.
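
To illustrate the point in Python terms (a sketch, not anything taken from poke): a byte order only has meaning once the width is fixed, and `struct` uses fixed "standard" sizes whenever an explicit byte order is given.

```python
import struct

raw = bytes.fromhex("00000100")  # exactly four octets of external data

# Same octets, same width, two byte orders.
as_big = int.from_bytes(raw, "big")
as_little = int.from_bytes(raw, "little")

# With an explicit byte order, struct uses standard sizes:
# "i" is always 4 bytes and "q" always 8, whatever the local C "int" is.
size_be_int = struct.calcsize(">i")
size_le_long = struct.calcsize("<q")

# Native mode ("@") follows the local C compiler's "int" instead.
size_native_int = struct.calcsize("@i")

print(as_big, as_little, size_be_int, size_le_long, size_native_int)
```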

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Musings about inspecting and processing binary data with shell

<ud1pa4$sqsk$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=7360&group=comp.unix.shell#7360
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Musings about inspecting and processing binary data with shell
Date: Sun, 3 Sep 2023 13:05:39 +0200
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <ud1pa4$sqsk$1@dont-email.me>
References: <ucqg6g$3c7b4$1@dont-email.me> <64f26b69@news.ausics.net>
<ucvk6d$f6kh$1@dont-email.me> <20230902214850.617@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 3 Sep 2023 11:05:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="46edaf97665d369ba442b2caea69fcbf";
logging-data="945044"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19QP8YJt9EsN/SxKC+9SbVx"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:kdwKEGLpy+pdQS01/bgtJiJOYD0=
In-Reply-To: <20230902214850.617@kylheku.com>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Sun, 3 Sep 2023 11:05 UTC

On 03.09.2023 08:25, Kaz Kylheku wrote:
> On 2023-09-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> BTW, in the poke docs I saw examples WRT endian'ness, like the spec
>> little int a;
>> big int b;
>
> "big endian int" is nonsensical, which detracts from the example.
>
> Endian specifications only make sense on exact sized types like int16,
> uint32 or int64.

I cannot speak for the 'poke' package; maybe 'int' is just a shortcut
for the 'int' type of the concrete machine where it is running.
Similar to the "int c" declaration (from my quote upthread) that is
assuming (as far as I recall) some default endian'ness on the machine.

Janis

> [...]
