Rocksolid Light


Re: Meta: a usenet server just for sci.math

https://news.novabbs.org/tech/article-flat.php?id=155972&group=sci.math#155972

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 08 Feb 2024 21:03:58 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<f6211fb3-bc6b-49c7-b098-d1a06fa14ddf@googlegroups.com>
<9d5d8a26-c9a1-435b-9623-cce1ee34ebd1@googlegroups.com>
<db72380a-4bc5-44b3-8538-258dd82574a1@googlegroups.com>
<c9f99c83-2dc8-4bdd-9ccc-640978c065ef@googlegroups.com>
<f77445f1-db51-425c-9167-1e0c8c95f091@googlegroups.com>
<d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com>
<6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com>
<e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Thu, 8 Feb 2024 13:04:11 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Message-ID: <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
Lines: 640
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-66NF+qRC8DMw2kw/wjzcmB+s1tN4SxaP4AawmM5hMSJLGCFh1BEZCCp69xFClrGNfm4PX6RxT2hkJwG!emtmagKE8X6yVQbTPTQyC1k8+AfvGJqCO6VTAW1v1aZTSA0wt4vEwsbppd4NFfzd5bcubETGOd67
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Thu, 8 Feb 2024 21:04 UTC

On 03/08/2023 08:51 PM, Ross Finlayson wrote:
> On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
>> On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
>>> On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
>>>> On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
>>>>> NNTP is not HTTP. I was using bare metal access to
>>>>> usenet, not using Google group, via:
>>>>>
>>>>> news.albasani.net, unfortunately dead since Corona
>>>>>
>>>>> So was looking for an alternative. And found this
>>>>> alternative, which seems fine:
>>>>>
>>>>> news.solani.org
>>>>>
>>>>> Have Fun!
>>>>>
>>>>> P.S.: Technical spec of news.solani.org:
>>>>>
>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
>>>>> Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
>>>>> Location: 2x Falkenstein, 1x New York
>>>>>
>>>>> advantage of bare metal usenet,
>>>>> you see all headers of message.
>>>>> On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
>>>>>> Search you mentioned and for example HTTP is adding the SEARCH verb,
>>>> In traffic there are two kinds of usenet users:
>>>> viewers and traffic through Google Groups,
>>>> and direct USENET traffic.
>>>>
>>>> Here now Google turned on login to view their
>>>> Google Groups - effectively closing the Google Groups
>>>> without a Google login.
>>>>
>>>> I suppose if they're used at work or whatever though
>>>> they'd be open.
>>>>
>>>>
>>>>
>>>> Where I got with the C10K non-blocking I/O for a usenet server:
>>>> it scales up, though I think in the runtime there is a situation where,
>>>> running only epoll or kqueue as the test scales up, then at the end
>>>> or in the sockets there is a drop, or it fell off the driver. I've implemented
>>>> the code this far, with all of NNTP in a file and then the "re-routine,
>>>> industry-pattern back-end" in memory, then for that running usually.
>>>>
>>>> (Cooperative multithreading on top of non-blocking I/O.)
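A minimal sketch of that C10K shape in Java NIO (one selector thread, epoll/kqueue underneath, multiplexing accepts and reads over non-blocking channels) — the class and method names here are illustrative, not the author's:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Sketch only: a single selector thread servicing accepts and reads.
public class SelectorLoop {
    public static ServerSocketChannel bind(int port) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        return server;
    }

    public static void serve(ServerSocketChannel server) throws IOException {
        Selector selector = Selector.open();
        server.register(selector, SelectionKey.OP_ACCEPT);
        ByteBuffer buf = ByteBuffer.allocate(4096); // 4K page-sized read buffer
        while (selector.select() >= 0) {
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    if (client.read(buf) < 0) client.close(); // peer hung up
                    // otherwise: hand the buffered bytes to the protocol parser
                }
            }
        }
    }
}
```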
>>>>
>>>> Implementing the serial queue or "monohydra", or slique,
>>>> makes for that when the parser is constantly parsing,
>>>> it is a usual queue-like data structure, with parsing
>>>> returning its bounds and consuming the queue.
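The serial parse queue described here could look like the following sketch: bytes accumulate in one buffer, and the parser repeatedly returns the bounds of the next CRLF-terminated line, consuming what it parsed ("LineQueue" is an illustrative name, not the author's; overflow handling is left out):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: a single-buffer queue the parser constantly consumes.
public class LineQueue {
    private final ByteBuffer buf = ByteBuffer.allocate(4096);

    public void offer(byte[] bytes) { buf.put(bytes); } // caller bounds the size

    // Returns the next complete line without its CRLF, or null if none yet.
    public String poll() {
        for (int i = 1; i < buf.position(); i++) {
            if (buf.get(i - 1) == '\r' && buf.get(i) == '\n') {
                byte[] line = new byte[i - 1];
                int end = buf.position();
                buf.position(0);
                buf.get(line);                    // copy out the parsed bounds
                buf.position(i + 1).limit(end);
                buf.compact();                    // consume the queue
                return new String(line, StandardCharsets.US_ASCII);
            }
        }
        return null;
    }
}
```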
>>>>
>>>> Having the file buffers all down small on 4K pages,
>>>> has that a next usual page size is the megabyte.
>>>>
>>>> Here though it seems to make sense to have a natural
>>>> 4K alignment in the file system representation, then that
>>>> it is moving files.
>>>>
>>>> So, then with the new modern Java, that runs in its own
>>>> Java server runtime environment, it seems I would also
>>>> need to see whether the cloud virt supported the I/O model
>>>> or not, or that the cooperative multi-threading for example
>>>> would be single-threaded. (Blocking abstractly.)
>>>>
>>>> Then besides I suppose that could be neatly with basically
>>>> the program model, and its file model, being well-defined,
>>>> then for NNTP with IMAP organization search and extensions,
>>>> those being standardized, seems to make sense for an efficient
>>>> news file organization.
>>>>
>>>> Here then it seems for serving the NNTP, and for example
>>>> their file bodies under the storage, with the fixed headers,
>>>> variable header or XREF, and the message body, then under
>>>> content it's same as storage.
>>>>
>>>> NNTP has "OVERVIEW" then from it is built search.
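For reference, the OVER/XOVER database that search is built from is per-article tab-separated fields in a fixed order (article number, Subject, From, Date, Message-ID, References, byte count, line count, per RFC 2980/3977); a trivial sketch of emitting one such line, with illustrative field values:

```java
// Sketch: building one NNTP overview (OVER/XOVER) line.
public class Overview {
    public static String line(long number, String subject, String from,
                              String date, String messageId, String references,
                              long bytes, long lines) {
        return number + "\t" + subject + "\t" + from + "\t" + date + "\t"
             + messageId + "\t" + references + "\t" + bytes + "\t" + lines;
    }
}
```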
>>>>
>>>> Let's see here then, if I get the load test running, or,
>>>> just put a limit under the load while there are no load test
>>>> errors, it seems the algorithm then scales under load to be
>>>> making usually the algorithm serial in CPU, with: encryption,
>>>> and compression (traffic). (Block ciphers instead of serial transfer.)
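Block compression of a message body, paid once at ingestion rather than per transfer, might be sketched with the JDK's gzip streams (class name illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: compress a body block-wise once, serve the predigested blob.
public class BodyCodec {
    public static byte[] compress(byte[] body) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) { gz.write(body); }
        return out.toByteArray();
    }

    public static byte[] decompress(byte[] blob) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(blob))) {
            return gz.readAllBytes();
        }
    }
}
```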
>>>>
>>>> Then, the industry pattern with re-routines, has that the
>>>> re-routines are naturally co-operative in the blocking,
>>>> and in the language, including flow-of-control and exception scope.
>>>>
>>>>
>>>> So, I have a high-performance implementation here.
>>> It seems like for NFS, then, and having the separate read and write of the client,
>>> a default filesystem, is an idea for the system facility: mirroring the mounted file
>>> locally, and, providing the read view from that via a different route.
>>>
>>>
>>> A next idea then seems for the organization, the client views themselves
>>> organize over the durable and available file system representation, this
>>> provides anyone a view over the protocol with a group file convention.
>>>
>>> I.e., while usual continuous traffic was surfing, individual reads over group
>>> files could have independent views, for example collating contents.
>>>
>>> Then, extracting requests from traffic and threads seems usual.
>>>
>>> (For example a specialized object transfer view.)
>>>
>>> Making protocols for implementing internet protocols in groups and
>>> so on, here makes for giving usenet example views to content generally.
>>>
>>> So, I have designed a protocol node and implemented it mostly,
>>> then about designed an object transfer protocol, here the idea
>>> is how to make it so people can extract data, for example their own
>>> data, from a large durable store of all the usenet messages,
>>> making views of usenet running on usenet, eg "Feb. 2016: AP's
>>> Greatest Hits".
>>>
>>> Here the point is to figure that usenet, these days, can be operated
>>> in cooperation with usenet, and really for its own sake, for leaving
>>> messages in usenet and here for usenet protocol stores as there's
>>> no reason it's plain text the content, while the protocol supports it.
>>>
>>> Building personal view for example is a simple matter of very many
>>> service providers any of which sells usenet all day for a good deal.
>>>
>>> Let's see here, $25/MM, storage on the cloud last year for about
>>> a million messages for a month is about $25. Outbound traffic is
>>> usually the metered cloud traffic, here for example that CDN traffic
>>> support the universal share convention, under metering. What with
>>> the algorithm being effectively tunable in CPU and RAM, makes for under
>>> I/O that it's "unobtrusive" or cooperative in routine, for CPU, I/O, and
>>> RAM, then that there is for seeking that Network Store or Database Time
>>> instead effectively becomes File I/O time, as what may be faster,
>>> and more durable. There's a faster database time for scaling the ingestion
>>> here with that the file view is eventually consistent. (And reliable.)
>>>
>>> Checking the files would be over time for example with "last checked"
>>> and "last dropped" something along the lines of, finding wrong offsets,
>>> basically having to make it so that it survives neatly corruption of the
>>> store (by being more-or-less stored in-place).
>>>
>>> Content catalog and such, catalog.
>> Then I wonder and figure the re-routine can scale.
>>
>> Here for the re-routine, the industry factory pattern,
>> and the commands in the protocols in the templates,
>> and the memory module, with the algorithm interface,
>> in the high-performance computer resource, it is here
>> that this simple kind of "writing Internet software"
>> makes pretty rapidly for adding resources.
>>
>> Here the design is basically of a file I/O abstraction,
>> that the computer reads data files with mmap to get
>> their handlers, what results that for I/O map the channels
>> result transferring the channels in I/O for what results,
>> in mostly the allocated resource requirements generally,
>> and for the protocol and algorithm, it results then that
>> the industry factory pattern and making for interfaces,
>> then also here the I/O routine as what results that this
>> is an implementation, of a network server, mostly is making
>> for that the re-routine, results very neatly a model of
>> parallel cooperation.
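The mmap-backed file handle this describes could be sketched with Java's FileChannel, mapping the article store read-only so it serves straight from the page cache (class and path names illustrative):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: read a data file via mmap to get a handle on its contents.
public class MappedStore {
    public static String readAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[map.remaining()];
            map.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```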
>>
>> I think computers still have file systems and file I/O but
>> in abstraction just because PAGE_SIZE is still relevant for
>> the network besides or I/O, if eventually, here is that the
>> value types are in the commands and so on, it is besides
>> that in terms of the resources so defined it still is in a filesystem
>> convention that a remote and unreliable view of it suffices.
>>
>> Here then the source code also being "this is only 20-50k",
>> lines of code, with basically an entire otherwise library stack
>> of the runtime itself, only the network and file abstraction,
>> this makes for also that modularity results. (Factory Industry
>> Pattern Modules.)
>>
>> For a network server, here, that, mostly it is high performance
>> in the sense that this is about the most direct handle on the channels
>> and here mostly for the text layer in the I/O order, or protocol layer,
>> here is that basically encryption and compression usually in the layer,
>> there is besides a usual concern where encryption and compression
>> are left out, there is that text in the layer itself is commands.
>>
>> Then, those being constants under the resources for the protocol,
>> it's what results usual protocols like NNTP and HTTP and other protocols
>> with usually one server and many clients, here is for that these protocols
>> are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
>>
>> These are here defined "all Java" or "Pure Java", i.e. let's be clear that
>> in terms of the reference abstraction layer, I think computers still use
>> the non-blocking I/O and filesystems and network to RAM, so that as
>> the I/O is implemented in those it actually has those besides instead for
>> example defaulting to byte-per-channel or character I/O. I.e. the usual
>> semantics for servicing the I/O in the accepter routine and what makes
>> for that the platform also provides a reference encryption implementation,
>> if not so relevant for the block encoder chain, besides that for example
>> compression has a default implementation, here the I/O model is as simply
>> in store for handles, channels, ..., that it results that data especially delivered
>> from a constant store can anyways be mostly compressed and encrypted
>> already or predigested to serve, here that it's the convention, here is for
>> resulting that these client-server protocols, with usually reads > postings
>> then here besides "retention", basically here is for what it is.
>>
>> With the re-routine and the protocol layer besides, having written the
>> routines in the re-routine, what there is to write here is this industry
>> factory, or a module framework, implementing the re-routines, as they're
>> built from the linear description a routine, makes for as the routine progresses
>> that it's "in the language" and that more than less in the terms, it makes for
>> implementing the case of logic for values, in the logic's flow-of-control's terms.
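One reading of the re-routine, sketched below as I understand it (the names "ReRoutine" and "Pending" are mine, not the author's): the routine body is re-run from the top each time, each step's result is memoized in call order, and a step whose result is not yet ready throws to yield — so the code stays "in the language", linear, with its usual flow-of-control and exception scope:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch of a re-routine: re-execution with memoized step results.
public class ReRoutine {
    static class Pending extends RuntimeException {}

    private final List<Object> memo = new ArrayList<>();
    private int step;

    // Each call site invokes memoize(...) in the same order on every re-run;
    // a null result means "not ready yet" and yields the routine.
    @SuppressWarnings("unchecked")
    public <T> T memoize(Supplier<T> op) {
        if (step < memo.size()) return (T) memo.get(step++); // replay memo
        T result = op.get();
        if (result == null) throw new Pending();             // cooperative yield
        memo.add(result);
        step++;
        return result;
    }

    // Re-run the body until it completes without yielding.
    public <T> T run(Supplier<T> body) {
        while (true) {
            step = 0;
            try { return body.get(); } catch (Pending p) { /* re-run later */ }
        }
    }
}
```

In a real server the re-run would be scheduled when the pending I/O completes, rather than spun in a loop as here.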
>>
>> Then, there is that actually running the software is different than just
>> writing it, here in the sense that as a server runtime, it is to be made a
>> thing, by giving it a name, and giving it an authority, to exist on the Internet.
>>
>> There is basically that for BGP and NAT and so on, and, mobile fabric networks,
>> IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
>> respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
>> entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
>> respect to that TCP/IP is so provided or in terms of process what results
>> ports mostly and connection models where it is exactly the TCP after the IP,
>> the Transport Control Protocol and Internet Protocol, have here both this
>> socket and datagram connection orientation, or stateful and stateless or
>> here that in terms of routing it's defined in addresses, under that names
>> and routing define sources, routes, destinations, ..., that routine numeric
>> IP addresses result in the usual sense of the network being behind an IP
>> and including IPv4 network fabric with respect to local routers.
>>
>> I.e., here to include a service framework is "here besides the routine, let's
>> make it clear that in terms of being a durable resource, there needs to be
>> some lockbox filled with its sustenance that in some locked or constant
>> terms results that for the duration of its outlay, say five years, it is held
>> up, then, it will be so again, or, let down to result the carry-over that it
>> invested to archive itself, I won't have to care or do anything until then".
>>
>>
>> About the service activation and the idea that, for a port, the routine itself
>> needs only run under load, i.e. there is effectively little traffic on the old archives,
>> and usually only the some other archive needs any traffic. Here the point is
>> that for the Java routine there is the system port that was accepted for the
>> request, that inetd or the systemd or means the network service was accessed,
>> made for that much as for HTTP the protocol is client-server also for IP the
>> protocol is client-server, while the TCP is packets. This is a general idea for
>> system integration while here mostly the routine is that being a detail:
>> the filesystem or network resource that results that the re-routines basically
>> make very large CPU scaling.
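The inetd/systemd activation mentioned here might be sketched as a socket unit plus service unit, so the routine only runs under load; the unit names and jar path are illustrative assumptions:

```ini
# /etc/systemd/system/nntpd.socket  (illustrative name)
[Socket]
ListenStream=119
Accept=no

[Install]
WantedBy=sockets.target

# /etc/systemd/system/nntpd.service
[Service]
ExecStart=/usr/bin/java -jar /opt/nntpd.jar
```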
>>
>> Then, it is basically containerized this sense of "at some domain name, there
>> is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
>>
>> I.e. being built on connection oriented protocols like the socket layer,
>> HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
>> it's more than less sensible that most users have no idea of installing some
>> NNTP browser or pointing their email to IMAP so that the email browser
>> browses the newsgroups and for postings, here this is mostly only talk
>> about implementing NNTP then IMAP and HTTP that happens to look like that,
>> besides for example SMTP or NNTP posting.
>>
>> I.e., having "this IMAP server, happens to be this NNTP module", or
>> "this HTTP server, happens to be a real simple mailbox these groups",
>> makes for having partitions and retentions of those and that basically
>> NNTP messages in the protocol can be more or less the same content
>> in media, what otherwise is of a usual message type.
>>
>> Then, the NNTP server-server routine is the propagation of messages
>> besides "I shall hire ten great usenet retention accounts and gently
>> and politely draw them down and back-fill Usenet, these ten groups".
>>
>> By then I would have to have made for retention in storage, such contents,
>> as have a reference value, then for besides making that independent in
>> reference value, just so that it suffices that it basically results "a usable
>> durable filesystem that happens you can browse it like usenet". I.e. as
>> the pieces to make the backfill are dug up, they get assigned reference numbers
>> of their time to make for what here is that in a grand schema of things,
>> they have a reference number in numerical order (and what's also the
>> server's "message-number" besides its "message-id") as noted above this
>> gets into the storage for retention of a file, while, most services for this
>> are instead for storage and serving, not necessarily or at all retention.
>>
>> I.e., the point is that as the groups are retained from retention, there is an
>> approach what makes for an orderly archeology, as for what convention
>> some data arrives, here that this server-server routine is besides the usual
>> routine which is "here are new posts, propagate them", it's "please deliver
>> as of a retention scan, and I'll try not to repeat it, what results as orderly
>> as possible a proof or exercise of what we'll call afterward entire retention",
>> then will be for as of writing a file that "as of the date, from start to finish,
>> this site certified these messages as best-effort retention".
>>
>> It seems then besides there is basically "here is some mbox file, serve it
>> like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
>> what is ingestion, is to result for the protocol that "for this protocol,
>> there is actually a normative filesystem representation that happens to
>> be pretty much also altogether defined by the protocol", the point is
>> that ingestion would result in command to remain in the protocol,
>> that a usual file type that "presents a usual abstraction, of a filesystem,
>> as from the contents of a file", here with the notion of "for all these
>> threaded discussions, here this system only cares some approach to
>> these ten particular newsgroups that already have mostly their corpus
>> though it's not in perhaps their native mbox instead consulted from services".
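The "here is some mbox file, serve it like it was an NNTP group" ingestion could be sketched as splitting on the mbox "From " separator lines (a simplified sketch; mboxrd-style ">From " unquoting is left out, and the class name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split an mbox file's text into its individual messages.
public class MboxSplitter {
    public static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String line : mbox.split("\n", -1)) {
            if (line.startsWith("From ")) {   // separator line: new message
                if (current != null) messages.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) messages.add(current.toString());
        return messages;
    }
}
```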
>>
>> Then, there's for storing and serving the files, and there is the usual
>> notion that moving the data, is to result, that really these file organizations
>> are not so large in terms of resources, being "less than gigabytes" or so,
>> still there's a notion that as a durable resource they're to be made
>> fungible here the networked file approach in the native filesystem,
>> then that with respect to it's a backing store, it's to make for that
>> the entire enterprise is more or less to be made in terms of account,
>> that then as a facility on the network then a service in the network,
>> it's basically separated the facility and service, while still of course
>> that the service is basically defined by its corpus.
>>
>>
>> Then, to make that fungible in a world of account, while with an exit
>> strategy so that the operation isn't abstract, is mostly about the
>> domain name, then that what results the networking, after trusted
>> network naming and connections for what result routing, and then
>> the port, in terms of that there are usual firewalls in ports though that
>> besides usually enough client ports are ephemeral, here the point is
>> that the protocols and their well-known ports, here it's usually enough
>> that the Internet doesn't concern itself so much protocols but with
>> respect to proxies, here that for example NNTP and IMAP don't have
>> so much anything so related that way after startTLS. For the world of
>> account, is basically to have for a domain name, an administrator, and,
>> an owner or representative. These are to establish authority for changes
>> and also accountability for usage.
>>
>> Basically they're to be persons and there is a process to get to be an
>> administrator of DNS, most always there are services that a usual person
>> implementing the system might use, besides for example the numerical.
>>
>> More relevant though to DNS is getting servers on the network, with respect
>> to listening ports and that they connect to clients what so discover them as
>> via DNS or configuration, here as above the usual notion that these are
>> standard services and run on well-known ports for inetd or systemd.
>> I.e. there is basically that running a server and dedicated networking,
>> and power and so on, and some notion of the limits of reliability, is then
>> as very much in other aspects of the organization of the system, i.e. its name,
>> while at the same time, the point that a module makes for that basically
>> the provision of a domain name or well-known or ephemeral host, is the
>> usual notion that static IP addresses are a limited resource and as about
>> the various networks in IPv4 and how they route traffic, is for that these
>> services have well-known sections in DNS for at least that the most usual
>> configuration is none.
>>
>> For a usual global reliability and availability, is some notion basically that
>> each region and zone has a service available on the IP address, for that
>> "hostname" resolves to the IP addresses. As well, in reverse, for the IP
>> address and about the hostname, it should resolve reverse to hostname.
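The forward/reverse resolution pairing asked of a well-run host can be sketched with the JDK's resolver (class name illustrative; results depend on the local resolver configuration):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: hostname -> address, and address -> hostname, should agree.
public class DnsCheck {
    public static String forward(String host) throws UnknownHostException {
        return InetAddress.getByName(host).getHostAddress();
    }

    public static String reverse(String address) throws UnknownHostException {
        return InetAddress.getByName(address).getCanonicalHostName();
    }
}
```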
>>
>> About certificates mostly for identification after mapping to port, or
>> multi-home Internet routing, here is the point that whether the domain
>> name administration is "epochal" or "regular", is that epochs are defined
>> by the ports behind the numbers and the domain name system as well,
>> where in terms of the registrar, the domain names are epochal to the
>> registrar, with respect to owners of domain names.
>>
>> Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
>> and also BGP and NAT and routing and what are local and remote
>> addresses, here is for not-so-much "implement DNS the protocol
>> also while you're at it", rather for what results that there is a durable
>> and long-standing and proper doorman, for some usenet.science.
>>
>> Here then the notion seems to be whether the doorman basically
>> knows well-known services, is a multi-homing router, or otherwise
>> what is the point that it starts the lean runtime, with respect to that
>> it's a container and having enough sense of administration its operation
>> as contained. I.e. here given a port and a hostname and always running
>> makes for that as long as there is the low (preferable no) idle for services
>> running that have no clients, is here also for the cheapest doorman that
>> knows how to stand up the client sentinel. (And put it back away.)
>>
>> Probably the most awful thing in the cloud services is the cost for
>> data ingress and egress. What that means is that for example using
>> a facility that is bound by that as a cost instead of under some constant
>> cost, is basically why there is the approach that the containers needs a
>> handle to the files, and they're either local files or network files, here
>> with the some convention above in archival a shared consistent view
>> of all the files, or abstractly consistent, is for making that the doorman
>> can handle lots of starting and finishing connections, while it is out of
>> the way when usually it's client traffic and opening and closing connections,
>> and the usual abstraction is that the client sentinel is never off and doorman
>> does nothing, here is for attaching the one to some lower constant cost,
>> where for example any long-running cost is more than some low constant cost.
>>
>> Then, this kind of service is often represented by nodes, in the usual sense
>> "here is an abstract container with you hope some native performance under
>> the hypervisor where it lives on the farm on its rack, it basically is moved the
>> image to wherever it's requested from and lives there, have fun, the meter is on".
>> I.e. that's just "this Jar has some config conventions and you can make the
>> container associate it and watchdog it with systemd for example and use the
>> cgroups while you're at it and make for tempfs quota and also the best network
>> file share, which you might be welcome to cache if you care just in the off-chance
>> that this file-mapping is free or constant cost as long as it doesn't egress the
>> network", is for here about the facilities that work, to get a copy of the system
>> what with respect to its usual operation is a piece of the Internet.
>>
>> For the different reference modules (industry factories) in their patterns then
>> and under combined configuration "file + process + network + fare", is that
>> the fare of the service basically reflects a daily coin, in the sense that it
>> represents an annual or epochal fee, what results for the time there is
>> what is otherwise all defined the "file + process + network + name",
>> what results it perpetuates in operation more than less simply and automatically.
>>
>> Then, the point though is to get it to where "I can go to this service, and
>> administer it more or less by paying an account, that it thus lives in its
>> budget and quota in its metered world".
>>
>> That though is very involved with identity, that in terms of "I the account
>> has provided this sum, make this sum paid with respect to an agreement",
>> is that authority to make agreements must make that it results that the
>> operation of the system, is entirely transparent, and defined in terms of
>> the roles and delegation, conventions in operation.
>>
>> I.e., I personally don't want to administer a copy of usenet, but, it's here
>> pretty much sorted out that I can administer one once then that it's to
>> administer itself in the following, in terms of it having resources to allocate
>> and resources to disburse. Also if nobody's using it it should basically work
>> itself out to dial its lights down (while maintaining availability).
>>
>> Then a point seems "maintain and administer the operation in effect,
>> what arrangement sees via delegation, that a card number and a phone
>> number and an email account and more than less a responsible entity,
>> is so indicated for example in cryptographic identity thus that the operation
>> of this system as a service, effectively operates itself out of a kitty,
>> what makes for administration and overhead, an entirely transparent
>> model of a miniature business the system as a service".
>>
>> "... and a mailing address and mail service."
>>
>> Then, for accounts and accounts, for example is the provision of the component
>> as simply an image in cloud algorithms, where basically as above here it's configured
>> that anybody with any cloud account could basically run it on their own terms,
>> there is for here sorting out "after this delegation to some business entity what
>> results a corporation in effect, the rest is business-in-a-box and more-than-less
>> what makes for its administration in state, is for how it basically limits and replicates
>> its service, in terms of its own assets here as what administered is abstractly
>> "durable forever mailboxes with private ownership if on public or managed resources".
>>
>> A usual notion of a private email and usenet service offering and business-in-a-box,
>> here what I'm looking at is that besides archiving sci.math and copying out its content
>> under author line, is to make such an industry for example here that "once having
>> implemented an Internet service, an Internet service of them results Internet".
>>
>> I.e. here the point is to make a corporation and a foundation in effect, what in terms
>> of then about the books and accounts, is about accounts for the business accounts
>> that reflect a persistent entity, then what results in terms of computing, networking,
>> and internetworking, with a regular notion of "let's never change this arrangement
>> but it's in monthly or annual terms", here for that in overall arrangements,
>> it results what the entire system more than less runs in ways then to either
>> run out its limits or make itself a sponsored effort, about more-or-less a simple
>> and responsible and accountable set of operations what effect the business
>> (here that in terms of service there is basically the realm of agreement)
>> that basically this sort of business-in-a-box model, is then besides itself of
>> accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
>>
>> Then for a news://usenet.science, or for example sci.math.usenet.science,
>> is the idea that the entity is "some assemblage what is so that in DNS, and,
>> in the accounts payable and receivable, and, in the material matters of
>> arrangement and authority for administration, of DNS and resources and
>> accounts what result durably persisting the business, is basically for a service
>> then of what these are usual enough tasks, as that are interactive workflows
>> and for mechanical workflows.
>>
>> I.e. the point is for having the service be more than an on/off button and more or less
>> what is for a given instance of the operation, what results from some protocol
>> that provides a "durable store" of a sort of the business, that at any time basically
>> some re-routine or "eventually consistent" continuance of the operation of the
>> business, results basically a continuity in its operations, what is entirely granular,
>> that here for example the point is to "pick a DNS name, attach an account service,
>> go" it so results that in the terms, basically there are the placeholders of the
>> interactive workflows in that, and as what in terms are often for example simply
>> card and phone number terms, account terms.
>>
>> I.e. a service to replenish accounts as kitties for making accounts only and
>> exactly limited to the one service, its transfers, basically results that there
>> is the notion of an email address, a phone number, a credit card's information,
>> here a fixed limit debit account that works as of a kitty, there is a regular workflow
>> service that will read out the durable stores and according to the timeliness of
>> their events, affect the configuration and reconciliation of payments for accounts
>> (closed loop scheduling/receiving).
>>
>> https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
>> https://www.rfc-editor.org/rfc/rfc9022.txt
>>
>> Basically for dailies, monthlies, and annuals, what make weeklies,
>> is this idea of Internet-from-an-account, what is services.
>
>
> After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> of the message, a form of a data structure of a "search index".
>
> These types of files should naturally compose, and result a data structure that according to some normal
> forms of search and summary algorithms, result that a data structure results, that makes for efficient
> search of sections of the corpus for information retrieval, here that "information retrieval is the science
> of search algorithms".
>
> Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> sure/no/yes, with predicates in values.
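The yes/no/maybe terms above can be sketched as a three-valued predicate; the combination rules here (exclusion dominates, then inclusion, then maybe) are one plausible reading of the filter described, not a fixed specification:

```java
// Three-valued match for query terms: NO excludes outright, YES requires,
// MAYBE only ranks. Combining per-term results builds the overall filter.
enum Match {
    YES, NO, MAYBE;

    // Assumed combination rule: any NO sinks the result; otherwise a YES
    // dominates a MAYBE; two MAYBEs stay MAYBE.
    static Match combine(Match a, Match b) {
        if (a == NO || b == NO) return NO;
        if (a == YES || b == YES) return YES;
        return MAYBE;
    }
}
```

Such a predicate applies per message and per term, and the results compose the same way in serial or parallel.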
>
> Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> a data structure, with attributes as paths the leaves of the tree of which match.
>
> Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
> there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> MIME body of this message has a default text representation".
>
> So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> like any/each/every/all, "hits".
>
> This is where usually enough search engines' or collected search algorithms ("find") usually enough have these
> de-facto forms, "under the hood", as it were, to make it first-class that for a given message and body that
> there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> or selections of these messages, or items, for various standard algorithms that separate "to find" from
> "to serve to find".
>
> So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
> defined for a given corpus of messages, basically at the granularity of messages, how is defined how
> there is a normal form for each message its "catsum", that catsums have a natural algebra that a
> concatenation of catsums is a catsum and that some standard algorithms naturally have well-defined
> results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> combine in serial and parallel.
>
> The results should be applicable to any kind of data but here it's more or less about usenet groups.
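A minimal sketch of a "catsum" and its natural algebra, assuming the simplest summary named above, the histogram of words by occurrence; the class name and normalization are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// A "catsum" (catalog/summary) in its simplest assumed normal form: a
// histogram of words by occurrence. Concatenation of catsums is again a
// catsum (entrywise sum), so per-message summaries combine over any
// partition of the corpus, in serial or parallel.
final class Catsum {
    final Map<String, Long> counts = new HashMap<>();

    // Compile the summary once, when the message is ingested.
    static Catsum of(String text) {
        Catsum c = new Catsum();
        for (String w : text.toLowerCase().split("\\W+"))
            if (!w.isEmpty()) c.counts.merge(w, 1L, Long::sum);
        return c;
    }

    // The algebra: merging two catsums sums their histograms.
    Catsum plus(Catsum other) {
        Catsum c = new Catsum();
        c.counts.putAll(counts);
        other.counts.forEach((k, v) -> c.counts.merge(k, v, Long::sum));
        return c;
    }
}
```

Because `plus` is associative and commutative, any selection of messages yields the same combined summary regardless of evaluation order, which is what gives the standard algorithms well-defined results under partitions.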
>


https://news.novabbs.org/tech/article-flat.php?id=156000&group=sci.math#156000
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!border-1.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 10 Feb 2024 06:37:09 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<9d5d8a26-c9a1-435b-9623-cce1ee34ebd1@googlegroups.com>
<db72380a-4bc5-44b3-8538-258dd82574a1@googlegroups.com>
<c9f99c83-2dc8-4bdd-9ccc-640978c065ef@googlegroups.com>
<f77445f1-db51-425c-9167-1e0c8c95f091@googlegroups.com>
<d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com>
<6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com>
<e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Fri, 9 Feb 2024 22:37:33 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Message-ID: <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
Lines: 997
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-yo75DOQ3h6zje9ruE6qIUi0uoofBeMFS+G2QTO6EZdbzcpHpsUqCijBdvpcWqxmc5hrQZPog676cIIF!mDJFtJIFuBiMxYZ+hiV/Ze4nGWpql2/xjpE3izcxdyy/WAfs36Rag4a3guFqGyVkiE6dEXyKQGUo!6A==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Received-Bytes: 47308
 by: Ross Finlayson - Sat, 10 Feb 2024 06:37 UTC

On 02/08/2024 01:04 PM, Ross Finlayson wrote:
> On 03/08/2023 08:51 PM, Ross Finlayson wrote:
>> On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
>>> On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson
>>> wrote:
>>>> On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson
>>>> wrote:
>>>>> On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse
>>>>> wrote:
>>>>>> NNTP is not HTTP. I was using bare metal access to
>>>>>> usenet, not using Google group, via:
>>>>>>
>>>>>> news.albasani.net, unfortunately dead since Corona
>>>>>>
>>>>>> So was looking for an alternative. And found this
>>>>>> alternative, which seems fine:
>>>>>>
>>>>>> news.solani.org
>>>>>>
>>>>>> Have Fun!
>>>>>>
>>>>>> P.S.: Technical spec of news.solani.org:
>>>>>>
>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
>>>>>> Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
>>>>>> Location: 2x Falkenstein, 1x New York
>>>>>>
>>>>>> advantage of bare metal usenet,
>>>>>> you see all headers of message.
>>>>>> On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
>>>>>>> Search you mentioned and for example HTTP is adding the SEARCH verb,
>>>>> In traffic there are two kinds of usenet users,
>>>>> viewers and traffic through Google Groups,
>>>>> and, USENET. (USENET traffic.)
>>>>>
>>>>> Here now Google turned on login to view their
>>>>> Google Groups - effectively closing the Google Groups
>>>>> without a Google login.
>>>>>
>>>>> I suppose if they're used at work or whatever though
>>>>> they'd be open.
>>>>>
>>>>>
>>>>>
>>>>> Where I got with the C10K non-blocking I/O for a usenet server,
>>>>> it scales up though then I think in the runtime is a situation where
>>>>> it only runs epoll or kqueue that the test scale ups, then at the end
>>>>> or in sockets there is a drop, or it fell off the driver. I've
>>>>> implemented
>>>>> the code this far, what has all of NNTP in a file and then the
>>>>> "re-routine,
>>>>> industry-pattern back-end" in memory, then for that running usually.
>>>>>
>>>>> (Cooperative multithreading on top of non-blocking I/O.)
>>>>>
>>>>> Implementing the serial queue or "monohydra", or slique,
>>>>> makes for that then when the parser is constantly parsing,
>>>>> it seems a usual queue like data structure with parsing
>>>>> returning its bounds, consuming the queue.
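The accept/read loop with a per-connection serial queue whose parser returns its bounds might look like the following sketch (port number and buffer size are arbitrary examples, and the command handling is elided):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Non-blocking accept/read loop: bytes accumulate per connection, and the
// parser returns the bound of each complete CRLF-terminated line, consuming
// the queue; a partial line is kept for the next read.
final class NioLoop {
    // Scan buf (flipped for reading) for one CRLF-terminated line;
    // return its end bound, or -1 if no complete line is queued yet.
    static int lineBound(ByteBuffer buf) {
        for (int i = buf.position(); i + 1 < buf.limit(); i++)
            if (buf.get(i) == '\r' && buf.get(i + 1) == '\n') return i + 2;
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Selector sel = Selector.open();
        ServerSocketChannel srv = ServerSocketChannel.open();
        srv.bind(new InetSocketAddress(11119));          // example port
        srv.configureBlocking(false);
        srv.register(sel, SelectionKey.OP_ACCEPT);
        while (true) {
            sel.select();
            for (Iterator<SelectionKey> it = sel.selectedKeys().iterator(); it.hasNext(); ) {
                SelectionKey k = it.next(); it.remove();
                if (k.isAcceptable()) {
                    SocketChannel c = srv.accept();
                    if (c == null) continue;
                    c.configureBlocking(false);
                    c.register(sel, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
                } else if (k.isReadable()) {
                    SocketChannel c = (SocketChannel) k.channel();
                    ByteBuffer buf = (ByteBuffer) k.attachment();
                    if (c.read(buf) < 0) { c.close(); continue; }
                    buf.flip();
                    for (int end; (end = lineBound(buf)) >= 0; ) {
                        byte[] line = new byte[end - buf.position()];
                        buf.get(line);                   // consume the line
                        // ... hand the command line to the protocol layer ...
                    }
                    buf.compact();                       // keep any partial line
                }
            }
        }
    }
}
```

This keeps the parser constantly parsing against a usual queue-like structure, as described, with one thread servicing all connections.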
>>>>>
>>>>> Having the file buffers all down small on 4K pages,
>>>>> has that a next usual page size is the megabyte.
>>>>>
>>>>> Here though it seems to make sense to have a natural
>>>>> 4K alignment the file system representation, then that
>>>>> it is moving files.
>>>>>
>>>>> So, then with the new modern Java, it that runs in its own
>>>>> Java server runtime environment, it seems I would also
>>>>> need to see whether the cloud virt supported the I/O model
>>>>> or not, or that the cooperative multi-threading for example
>>>>> would be single-threaded. (Blocking abstractly.)
>>>>>
>>>>> Then besides I suppose that could be neatly with basically
>>>>> the program model, and its file model, being well-defined,
>>>>> then for NNTP with IMAP organization search and extensions,
>>>>> those being standardized, seems to make sense for an efficient
>>>>> news file organization.
>>>>>
>>>>> Here then it seems for serving the NNTP, and for example
>>>>> their file bodies under the storage, with the fixed headers,
>>>>> variable header or XREF, and the message body, then under
>>>>> content it's same as storage.
>>>>>
>>>>> NNTP has "OVERVIEW" then from it is built search.
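As a sketch of what OVERVIEW carries, here is a builder for one tab-separated overview line in the RFC 3977 field order (article number, Subject, From, Date, Message-ID, References, byte count, line count); the class name is illustrative:

```java
// One line of an OVER/XOVER response: the per-article summary that search
// and threading are then built from. The tab is the field separator, so
// tabs inside header values must be replaced.
final class Overview {
    static String overLine(long number, String subject, String from, String date,
                           String messageId, String references, long bytes, long lines) {
        return String.join("\t",
            Long.toString(number), clean(subject), clean(from), clean(date),
            clean(messageId), clean(references),
            Long.toString(bytes), Long.toString(lines));
    }

    private static String clean(String header) {
        return header.replace('\t', ' ');
    }
}
```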
>>>>>
>>>>> Let's see here then, if I get the load test running, or,
>>>>> just put a limit under the load while there are no load test
>>>>> errors, it seems the algorithm then scales under load to be
>>>>> making usually the algorithm serial in CPU, with: encryption,
>>>>> and compression (traffic). (Block ciphers instead of serial transfer.)
>>>>>
>>>>> Then, the industry pattern with re-routines, has that the
>>>>> re-routines are naturally co-operative in the blocking,
>>>>> and in the language, including flow-of-control and exception scope.
>>>>>
>>>>>
>>>>> So, I have a high-performance implementation here.
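A hedged reading of the "re-routine" as described, straight-line code that unwinds when a result is not ready and is simply re-invoked later, with earlier steps memoized, could be sketched as:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Pause signals "a result isn't ready yet": unwind and re-invoke later.
class Pause extends RuntimeException {}

// Sketch only: a routine written as straight-line steps. Completed steps
// are memoized, so each re-invocation replays them cheaply and
// deterministically until the whole routine runs to completion; blocking
// is cooperative, in the language's own flow-of-control and exception scope.
class Reroutine {
    private final List<Object> memo = new ArrayList<>();
    private int step;

    @SuppressWarnings("unchecked")
    <T> T step(Supplier<T> s) {
        if (step < memo.size()) return (T) memo.get(step++);  // replay
        T v = s.get();
        if (v == null) { step = 0; throw new Pause(); }       // not ready: pause
        memo.add(v);
        step++;
        return v;
    }
}
```

A caller simply retries the whole routine on `Pause`; since completed steps return instantly from the memo, re-execution costs only the not-yet-ready step.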
>>>> It seems like for NFS, then, and having the separate read and write
>>>> of the client,
>>>> a default filesystem, is an idea for the system facility: mirroring
>>>> the mounted file
>>>> locally, and, providing the read view from that via a different route.
>>>>
>>>>
>>>> A next idea then seems for the organization, the client views
>>>> themselves
>>>> organize over the durable and available file system representation,
>>>> this
>>>> provides anyone a view over the protocol with a group file convention.
>>>>
>>>> I.e., while usual continuous traffic was surfing, individual reads
>>>> over group
>>>> files could have independent views, for example collating contents.
>>>>
>>>> Then, extracting requests from traffic and threads seems usual.
>>>>
>>>> (For example a specialized object transfer view.)
>>>>
>>>> Making protocols for implementing internet protocols in groups and
>>>> so on, here makes for giving usenet example views to content generally.
>>>>
>>>> So, I have designed a protocol node and implemented it mostly,
>>>> then about designed an object transfer protocol, here the idea
>>>> is how to make it so people can extract data, for example their own
>>>> data, from a large durable store of all the usenet messages,
>>>> making views of usenet running on usenet, eg "Feb. 2016: AP's
>>>> Greatest Hits".
>>>>
>>>> Here the point is to figure that usenet, these days, can be operated
>>>> in cooperation with usenet, and really for its own sake, for leaving
>>>> messages in usenet and here for usenet protocol stores as there's
>>>> no reason it's plain text the content, while the protocol supports it.
>>>>
>>>> Building personal view for example is a simple matter of very many
>>>> service providers any of which sells usenet all day for a good deal.
>>>>
>>>> Let's see here, $25/MM, storage on the cloud last year for about
>>>> a million messages for a month is about $25. Outbound traffic is
>>>> usually the metered cloud traffic, here for example that CDN traffic
>>>> support the universal share convention, under metering. What that
>>>> the algorithm is effectively tunable in CPU and RAM, makes for under
>>>>> I/O that it's "unobtrusive" or the cooperative in routine, for CPU,
>>>>> I/O and
>>>> RAM, then that there is for seeking that Network Store or Database Time
>>>> instead effectively becomes File I/O time, as what may be faster,
>>>> and more durable. There's a faster database time for scaling the
>>>> ingestion
>>>> here with that the file view is eventually consistent. (And reliable.)
>>>>
>>>> Checking the files would be over time for example with "last checked"
>>>> and "last dropped" something along the lines of, finding wrong offsets,
>>>> basically having to make it so that it survives neatly corruption of
>>>> the
>>>> store (by being more-or-less stored in-place).
>>>>
>>>> Content catalog and such, catalog.
>>> Then I wonder and figure the re-routine can scale.
>>>
>>> Here for the re-routine, the industry factory pattern,
>>> and the commands in the protocols in the templates,
>>> and the memory module, with the algorithm interface,
>>> in the high-performance computer resource, it is here
>>> that this simple kind of "writing Internet software"
>>> makes pretty rapidly for adding resources.
>>>
>>> Here the design is basically of a file I/O abstraction,
>>> that the computer reads data files with mmap to get
>>> their handlers, what results that for I/O map the channels
>>> result transferring the channels in I/O for what results,
>>> in mostly the allocated resource requirements generally,
>>> and for the protocol and algorithm, it results then that
>>> the industry factory pattern and making for interfaces,
>>> then also here the I/O routine as what results that this
>>> is an implementation, of a network server, mostly is making
>>> for that the re-routine, results very neatly a model of
>>> parallel cooperation.
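The mmap-for-handles idea can be illustrated with a read-only mapping that hands out zero-copy slices; a sketch (the absolute `slice(int, int)` call assumes JDK 13+, and the class name is illustrative):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Map a data file read-only and hand out bounded views without copying:
// the OS page cache then serves repeated reads, which is the sense in
// which channel transfers mostly follow from the file abstraction.
final class MappedStore {
    private final MappedByteBuffer map;

    MappedStore(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // A view of [offset, offset + length) backed by the same mapping.
    java.nio.ByteBuffer slice(int offset, int length) {
        return map.slice(offset, length);
    }
}
```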
>>>
>>> I think computers still have file systems and file I/O but
>>> in abstraction just because PAGE_SIZE is still relevant for
>>> the network besides or I/O, if eventually, here is that the
>>> value types are in the commands and so on, it is besides
>>> that in terms of the resources so defined it still is in a filesystem
>>> convention that a remote and unreliable view of it suffices.
>>>
>>> Here then the source code also being "this is only 20-50k",
>>> lines of code, with basically an entire otherwise library stack
>>> of the runtime itself, only the network and file abstraction,
>>> this makes for also that modularity results. (Factory Industry
>>> Pattern Modules.)
>>>
>>> For a network server, here, that, mostly it is high performance
>>> in the sense that this is about the most direct handle on the channels
>>> and here mostly for the text layer in the I/O order, or protocol layer,
>>> here is that basically encryption and compression usually in the layer,
>>> there is besides a usual concern where encryption and compression
>>> are left out, there is that text in the layer itself is commands.
>>>
>>> Then, those being constants under the resources for the protocol,
>>> it's what results usual protocols like NNTP and HTTP and other protocols
>>> with usually one server and many clients, here is for that these
>>> protocols
>>> are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
>>>
>>> These are here defined "all Java" or "Pure Java", i.e. let's be clear
>>> that
>>> in terms of the reference abstraction layer, I think computers still use
>>> the non-blocking I/O and filesystems and network to RAM, so that as
>>> the I/O is implemented in those it actually has those besides instead
>>> for
>>> example defaulting to byte-per-channel or character I/O. I.e. the usual
>>> semantics for servicing the I/O in the accepter routine and what makes
>>> for that the platform also provides a reference encryption
>>> implementation,
>>> if not so relevant for the block encoder chain, besides that for example
>>> compression has a default implementation, here the I/O model is as
>>> simply
>>> in store for handles, channels, ..., that it results that data
>>> especially delivered
>>> from a constant store can anyways be mostly compressed and encrypted
>>> already or predigested to serve, here that it's the convention, here
>>> is for
>>> resulting that these client-server protocols, with usually reads >
>>> postings
>>> then here besides "retention", basically here is for what it is.
>>>
>>> With the re-routine and the protocol layer besides, having written the
>>> routines in the re-routine, what there is to write here is this industry
>>> factory, or a module framework, implementing the re-routines, as they're
>>> built from the linear description a routine, makes for as the routine
>>> progresses
>>> that it's "in the language" and that more than less in the terms, it
>>> makes for
>>> implementing the case of logic for values, in the logic's
>>> flow-of-control's terms.
>>>
>>> Then, there is that actually running the software is different than just
>>> writing it, here in the sense that as a server runtime, it is to be
>>> made a
>>> thing, by giving it a name, and giving it an authority, to exist on
>>> the Internet.
>>>
>>> There is basically that for BGP and NAT and so on, and, mobile fabric
>>> networks,
>>> IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main
>>> space, with
>>> respect to what are CIDR and 24 bits rule and what makes for TCP/IP,
>>> here
>>> entirely the course is using the TCP/IP stack and Java's TCP/IP
>>> stack, with
>>> respect to that TCP/IP is so provided or in terms of process what
>>> results
>>> ports mostly and connection models where it is exactly the TCP after
>>> the IP,
>>> the Transport Control Protocol and Internet Protocol, have here both
>>> this
>>> socket and datagram connection orientation, or stateful and stateless or
>>> here that in terms of routing it's defined in addresses, under that
>>> names
>>> and routing define sources, routes, destinations, ..., that routine
>>> numeric
>>> IP addresses result in the usual sense of the network being behind an IP
>>> and including IPv4 network fabric with respect to local routers.
>>>
>>> I.e., here to include a service framework is "here besides the
>>> routine, let's
>>> make it clear that in terms of being a durable resource, there needs
>>> to be
>>> some lockbox filled with its sustenance that in some locked or constant
>>> terms results that for the duration of its outlay, say five years, it
>>> is held
>>> up, then, it will be so again, or, let down to result the carry-over
>>> that it
>>> invested to archive itself, I won't have to care or do anything until
>>> then".
>>>
>>>
>>> About the service activation and the idea that, for a port, the
>>> routine itself
>>> needs only run under load, i.e. there is effectively little traffic
>>> on the old archives,
>>> and usually only the some other archive needs any traffic. Here the
>>> point is
>>> that for the Java routine there is the system port that was accepted
>>> for the
>>> request, that inetd or the systemd or means the network service was
>>> accessed,
>>> made for that much as for HTTP the protocol is client-server also for
>>> IP the
>>> protocol is client-server, while the TCP is packets. This is a
>>> general idea for
>>> system integration while here mostly the routine is that being a detail:
>>> the filesystem or network resource that results that the re-routines
>>> basically
>>> make very large CPU scaling.
>>>
>>> Then, it is basically containerized this sense of "at some domain
>>> name, there
>>> is a service, it's HTTP and NNTP and IMAP besides, what cares the
>>> world".
>>>
>>> I.e. being built on connection oriented protocols like the socket layer,
>>> HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to
>>> certificates,
>>> it's more than less sensible that most users have no idea of
>>> installing some
>>> NNTP browser or pointing their email to IMAP so that the email browser
>>> browses the newsgroups and for postings, here this is mostly only talk
>>> about implementing NNTP then IMAP and HTTP that happens to look like
>>> that,
>>> besides for example SMTP or NNTP posting.
>>>
>>> I.e., having "this IMAP server, happens to be this NNTP module", or
>>> "this HTTP server, happens to be a real simple mailbox these groups",
>>> makes for having partitions and retentions of those and that basically
>>> NNTP messages in the protocol can be more or less the same content
>>> in media, what otherwise is of a usual message type.
>>>
>>> Then, the NNTP server-server routine is the propagation of messages
>>> besides "I shall hire ten great usenet retention accounts and gently
>>> and politely draw them down and back-fill Usenet, these ten groups".
>>>
>>> By then I would have to have made for retention in storage, such
>>> contents,
>>> as have a reference value, then for besides making that independent in
>>> reference value, just so that it suffices that it basically results
>>> "a usable
>>> durable filesystem that happens you can browse it like usenet". I.e. as
>>> the pieces to make the backfill are dug up, they get assigned
>>> reference numbers
>>> of their time to make for what here is that in a grand schema of things,
>>> they have a reference number in numerical order (and what's also the
>>> server's "message-number" besides its "message-id") as noted above this
>>> gets into the storage for retention of a file, while, most services
>>> for this
>>> are instead for storage and serving, not necessarily or at all
>>> retention.
>>>
>>> I.e., the point is that as the groups are retained from retention,
>>> there is an
>>> approach what makes for an orderly archeology, as for what convention
>>> some data arrives, here that this server-server routine is besides
>>> the usual
>>> routine which is "here are new posts, propagate them", it's "please
>>> deliver
>>> as of a retention scan, and I'll try not to repeat it, what results
>>> as orderly
>>> as possible a proof or exercise of what we'll call afterward entire
>>> retention",
>>> then will be for as of writing a file that "as of the date, from
>>> start to finish,
>>> this site certified these messages as best-effort retention".
>>>
>>> It seems then besides there is basically "here is some mbox file,
>>> serve it
>>> like it was an NNTP group or an IMAP mailbox", ingestion, in terms of
>>> that
>>> what is ingestion, is to result for the protocol that "for this
>>> protocol,
>>> there is actually a normative filesystem representation that happens to
>>> be pretty much also altogether defined by the protocol", the point is
>>> that ingestion would result in command to remain in the protocol,
>>> that a usual file type that "presents a usual abstraction, of a
>>> filesystem,
>>> as from the contents of a file", here with the notion of "for all these
>>> threaded discussions, here this system only cares some approach to
>>> these ten particular newsgroups that already have mostly their corpus
>>> though it's not in perhaps their native mbox instead consulted from
>>> services".
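The "serve an mbox file like a group" ingestion step reduces to splitting on the classic `From ` separator lines; a sketch, ignoring `>From ` unescaping and other mbox dialects:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Split an mbox stream into messages on the "From " envelope separator
// (the classic mbox convention), so each message can then be served
// under NNTP or IMAP article numbering.
final class Mbox {
    static List<String> split(Reader in) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        BufferedReader r = new BufferedReader(in);
        for (String line; (line = r.readLine()) != null; ) {
            if (line.startsWith("From ")) {       // envelope separator
                if (current != null) messages.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) messages.add(current.toString());
        return messages;
    }
}
```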
>>>
>>> Then, there's for storing and serving the files, and there is the usual
>>> notion that moving the data, is to result, that really these file
>>> organizations
>>> are not so large in terms of resources, being "less than gigabytes"
>>> or so,
>>> still there's a notion that as a durable resource they're to be made
>>> fungible here the networked file approach in the native filesystem,
>>> then that with respect to it's a backing store, it's to make for that
>>> the entire enterprise is more or less to made in terms of account,
>>> that then as a facility on the network then a service in the network,
>>> it's basically separated the facility and service, while still of course
>>> that the service is basically defined by its corpus.
>>>
>>>
>>> Then, to make that fungible in a world of account, while with an exit
>>> strategy so that the operation isn't abstract, is mostly about the
>>> domain name, then that what results the networking, after trusted
>>> network naming and connections for what result routing, and then
>>> the port, in terms of that there are usual firewalls in ports though
>>> that
>>> besides usually enough client ports are ephemeral, here the point is
>>> that the protocols and their well-known ports, here it's usually enough
>>> that the Internet doesn't concern itself so much protocols but with
>>> respect to proxies, here that for example NNTP and IMAP don't have
>>> so much anything so related that way after startTLS. For the world of
>>> account, is basically to have for a domain name, an administrator, and,
>>> an owner or representative. These are to establish authority for changes
>>> and also accountability for usage.
>>>
>>> Basically they're to be persons and there is a process to get to be an
>>> administrator of DNS, most always there are services that a usual person
>>> implementing the system might use, besides for example the numerical.
>>>
>>> More relevant though to DNS is getting servers on the network, with
>>> respect
>>> to listening ports and that they connect to clients what so discover
>>> them as
>>> via DNS or configuration, here as above the usual notion that these are
>>> standard services and run on well-known ports for inetd or systemd.
>>> I.e. there is basically that running a server and dedicated networking,
>>> and power and so on, and some notion of the limits of reliability, is
>>> then
>>> as very much in other aspects of the organization of the system, i.e.
>>> its name,
>>> while at the same time, the point that a module makes for that basically
>>> the provision of a domain name or well-known or ephemeral host, is the
>>> usual notion that static IP addresses are a limited resource and as
>>> about
>>> the various networks in IPv4 and how they route traffic, is for that
>>> these
>>> services have well-known sections in DNS for at least that the most
>>> usual
>>> configuration is none.
>>>
>>> For a usual global reliability and availability, is some notion
>>> basically that
>>> each region and zone has a service available on the IP address, for that
>>> "hostname" resolves to the IP addresses. As well, in reverse, for the IP
>>> address and about the hostname, it should resolve reverse to hostname.
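The forward/reverse round trip described here is a few lines with the standard resolver API; whether reverse (PTR) records are configured is of course deployment-dependent:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Check that "hostname" resolves to its addresses, and that each address
// resolves back to the hostname in reverse.
final class DnsCheck {
    static boolean roundTrips(String host) throws UnknownHostException {
        for (InetAddress a : InetAddress.getAllByName(host)) {      // forward
            // getCanonicalHostName performs the reverse lookup, falling
            // back to the literal address when no PTR record is configured.
            if (!a.getCanonicalHostName().equalsIgnoreCase(host)) return false;
        }
        return true;
    }
}
```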
>>>
>>> About certificates mostly for identification after mapping to port, or
>>> multi-home Internet routing, here is the point that whether the domain
>>> name administration is "epochal" or "regular", is that epochs are
>>> defined
>>> by the ports behind the numbers and the domain name system as well,
>>> where in terms of the registrar, the domain names are epochal to the
>>> registrar, with respect to owners of domain names.
>>>
>>> Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
>>> and also BGP and NAT and routing and what are local and remote
>>> addresses, here is for not-so-much "implement DNS the protocol
>>> also while you're at it", rather for what results that there is a
>>> durable
>>> and long-standing and proper doorman, for some usenet.science.
>>>
>>> Here then the notion seems to be whether the doorman basically
>>> knows well-known services, is a multi-homing router, or otherwise
>>> what is the point that it starts the lean runtime, with respect to that
>>> it's a container and having enough sense of administration its operation
>>> as contained. I.e. here given a port and a hostname and always running
>>> makes for that as long as there is the low (preferable no) idle for
>>> services
>>> running that have no clients, is here also for the cheapest doorman that
>>> knows how to standup the client sentinel. (And put it back away.)
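The cheapest doorman with no idle cost is essentially inetd-style socket activation; a sketch with hypothetical systemd unit names and paths (with `Accept=yes` the runtime is started per connection and reads it from its inherited socket, e.g. via Java's `System.inheritedChannel()`):

```ini
# /etc/systemd/system/nntp.socket  (hypothetical names and paths)
# systemd owns the well-known port; the service only runs while there
# is a client, so an idle service costs nothing.
[Unit]
Description=NNTP doorman socket

[Socket]
ListenStream=119
Accept=yes

[Install]
WantedBy=sockets.target

# /etc/systemd/system/nntp@.service  (one instance per accepted connection)
[Unit]
Description=NNTP service instance

[Service]
ExecStart=/usr/bin/java -jar /opt/nntpd/server.jar
StandardInput=socket
StandardOutput=socket
```

The cgroups, tmpfs quota, and watchdog conventions mentioned above then hang off the same unit files.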
>>>
>>> Probably the most awful thing in the cloud services is the cost for
>>> data ingress and egress. What that means is that for example using
>>> a facility that is bound by that as a cost instead of under some
>>> constant
>>> cost, is basically why there is the approach that the containers needs a
>>> handle to the files, and they're either local files or network files,
>>> here
>>> with the some convention above in archival a shared consistent view
>>> of all the files, or abstractly consistent, is for making that the
>>> doorman
>>> can handle lots of starting and finishing connections, while it is
>>> out of
>>> the way when usually it's client traffic and opening and closing
>>> connections,
>>> and the usual abstraction is that the client sentinel is never off
>>> and doorman
>>> does nothing, here is for attaching the one to some lower constant cost,
>>> where for example any long-running cost is more than some low
>>> constant cost.
>>>
>>> Then, this kind of service is often represented by nodes, in the
>>> usual sense
>>> "here is an abstract container with you hope some native performance
>>> under
>>> the hypervisor where it lives on the farm on its rack, it basically
>>> is moved the
>>> image to wherever it's requested from and lives there, have fun, the
>>> meter is on".
>>> I.e. that's just "this Jar has some config conventions and you can
>>> make the
>>> container associate it and watchdog it with systemd for example and
>>> use the
>>> cgroups while you're at it and make for tempfs quota and also the
>>> best network
>>> file share, which you might be welcome to cache if you care just in
>>> the off-chance
>>> that this file-mapping is free or constant cost as long as it doesn't
>>> egress the
>>> network", is for here about the facilities that work, to get a copy
>>> of the system
>>> what with respect to its usual operation is a piece of the Internet.
>>>
>>> For the different reference modules (industry factories) in their
>>> patterns then
>>> and under combined configuration "file + process + network + fare",
>>> is that
>>> the fare of the service basically reflects a daily coin, in the sense
>>> that it
>>> represents an annual or epochal fee, what results for the time there is
>>> what is otherwise all defined the "file + process + network + name",
>>> what results it perpetuates in operation more than less simply and
>>> automatically.
>>>
>>> Then, the point though is to get it to where "I can go to this
>>> service, and
>>> administer it more or less by paying an account, that it thus lives
>>> in its
>>> budget and quota in its metered world".
>>>
>>> That though is very involved with identity: in terms of "I, the
>>> account, as provided this sum, make this sum paid with respect to an
>>> agreement", the authority to make agreements must make it result
>>> that the operation of the system is entirely transparent, and
>>> defined in terms of the roles and delegation, conventions in
>>> operation.
>>>
>>> I.e., I personally don't want to administer a copy of usenet, but,
>>> it's here
>>> pretty much sorted out that I can administer one once then that it's to
>>> administer itself in the following, in terms of it having resources
>>> to allocate
>>> and resources to disburse. Also if nobody's using it it should
>>> basically work
>>> itself out to dial its lights down (while maintaining availability).
>>>
>>> Then a point seems "maintain and administer the operation in effect,
>>> what arrangement sees via delegation, that a card number and a phone
>>> number and an email account and more than less a responsible entity,
>>> is so indicated for example in cryptographic identity thus that the
>>> operation
>>> of this system as a service, effectively operates itself out of a kitty,
>>> what makes for administration and overhead, an entirely transparent
>>> model of a miniature business the system as a service".
>>>
>>> "... and a mailing address and mail service."
>>>
>>> Then, for accounts and accounts, for example is the provision of the
>>> component
>>> as simply an image in cloud algorithms, where basically as above here
>>> it's configured
>>> that anybody with any cloud account could basically run it on their
>>> own terms,
>>> there is for here sorting out "after this delegation to some business
>>> entity what
>>> results a corporation in effect, the rest is business-in-a-box and
>>> more-than-less
>>> what makes for its administration in state, is for how it basically
>>> limits and replicates
>>> its service, in terms of its own assets here as what administered is
>>> abstractly
>>> "durable forever mailboxes with private ownership if on public or
>>> managed resources".
>>>
>>> A usual notion of a private email and usenet service offering and
>>> business-in-a-box,
>>> here what I'm looking at is that besides archiving sci.math and
>>> copying out its content
>>> under author line, is to make such an industry for example here that
>>> "once having
>>> implemented an Internet service, an Internet service of them results
>>> Internet".
>>>
>>> I.e. here the point is to make a corporation and a foundation in
>>> effect, what in terms
>>> of then about the books and accounts, is about accounts for the
>>> business accounts
>>> that reflect a persistent entity, then what results in terms of
>>> computing, networking,
>>> and internetworking, with a regular notion of "let's never change
>>> this arrangement
>>> but it's in monthly or annual terms", here for that in overall
>>> arrangements,
>>> it results what the entire system more than less runs in ways then to
>>> either
>>> run out its limits or make itself a sponsored effort, about
>>> more-or-less a simple
>>> and responsible and accountable set of operations what effect the
>>> business
>>> (here that in terms of service there is basically the realm of
>>> agreement)
>>> that basically this sort of business-in-a-box model, is then besides
>>> itself of
>>> accounts, toward the notion as pay-as-you-go and "usual credits and
>>> their limits".
>>>
>>> Then for a news://usenet.science, or for example
>>> sci.math.usenet.science,
>>> is the idea that the entity is "some assemblage what is so that in
>>> DNS, and,
>>> in the accounts payable and receivable, and, in the material matters of
>>> arrangement and authority for administration, of DNS and resources and
>>> accounts what result durably persisting the business, is basically
>>> for a service then of what are usual enough tasks, as interactive
>>> workflows and mechanical workflows".
>>>
>>> I.e. the point is for having the service be more than an on/off
>>> button, and more or less
>>> what is for a given instance of the operation, what results from some
>>> protocol
>>> that provides a "durable store" of a sort of the business, that at
>>> any time basically
>>> some re-routine or "eventually consistent" continuance of the
>>> operation of the
>>> business, results basically a continuity in its operations, what is
>>> entirely granular,
>>> that here for example the point is to "pick a DNS name, attach an
>>> account service,
>>> go" it so results that in the terms, basically there are the
>>> placeholders of the
>>> interactive workflows in that, and as what in terms are often for
>>> example simply
>>> card and phone number terms, account terms.
>>>
>>> I.e. a service to replenish accounts as kitties for making accounts
>>> only and
>>> exactly limited to the one service, its transfers, basically results
>>> that there
>>> is the notion of an email address, a phone number, a credit card's
>>> information,
>>> here a fixed limit debit account that works as of a kitty, there is a
>>> regular workflow
>>> service that will read out the durable stores and according to the
>>> timeliness of
>>> their events, affect the configuration and reconciliation of payments
>>> for accounts
>>> (closed loop scheduling/receiving).
>>>
>>> https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
>>> https://www.rfc-editor.org/rfc/rfc9022.txt
>>>
>>> Basically for dailies, monthlies, and annuals, what make weeklies,
>>> is this idea of Internet-from-an-account, what is services.
>>
>>
>> After implementing a store, and the protocol for getting messages,
>> what seems relevant here in the context of the SEARCH command is a
>> fungible file-format, derived from the body of the message in a
>> normal form: a data structure that represents an index and catalog
>> and dictionary and summary of the message, a form of a data structure
>> of a "search index".
>>
>> These types of files should naturally compose, and result in a data
>> structure that, according to some normal forms of search and summary
>> algorithms, makes for efficient search of sections of the corpus for
>> information retrieval, here that "information retrieval is the
>> science of search algorithms".
>>
>> Now, for what and how people search, or what is the specification of a
>> search, is in terms of queries, say,
>> here for some brief forms of queries that advise what's definitely
>> included in the search, what's excluded,
>> then perhaps what's maybe included, or yes/no/maybe, which makes for a
>> predicate that can be built,
>> that can be applied to results that compose and build for the terms of
>> a filter with yes/no/maybe or
>> sure/no/yes, with predicates in values.
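The yes/no/maybe filter above can be sketched as a tri-state predicate; the names here (Tri, TriFilter) are illustrative only, not from any library:

```java
// Tri-state match result for query predicates: a term is definitely
// included (YES), definitely excluded (NO), or tentatively included (MAYBE).
enum Tri { YES, NO, MAYBE }

final class TriFilter {
    // Combine two term verdicts: any NO excludes, any YES includes
    // over MAYBE, and two MAYBEs stay MAYBE.
    static Tri combine(Tri a, Tri b) {
        if (a == Tri.NO || b == Tri.NO) return Tri.NO;
        if (a == Tri.YES || b == Tri.YES) return Tri.YES;
        return Tri.MAYBE;
    }

    // Evaluate a document's terms against include/exclude/maybe term sets.
    static Tri match(java.util.Set<String> docTerms,
                     java.util.Set<String> include,
                     java.util.Set<String> exclude,
                     java.util.Set<String> maybe) {
        for (String t : docTerms)
            if (exclude.contains(t)) return Tri.NO;       // hard exclusion wins
        boolean hasYes = false, hasMaybe = false;
        for (String t : include) if (docTerms.contains(t)) hasYes = true;
        for (String t : maybe) if (docTerms.contains(t)) hasMaybe = true;
        if (!include.isEmpty() && !hasYes) return Tri.NO; // required term missing
        if (hasYes) return Tri.YES;
        return hasMaybe ? Tri.MAYBE : Tri.NO;
    }
}
```

Such a predicate composes over partitions of results, since combine is associative.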
>>
>> Here there is basically "free text search" and "matching summaries",
>> where text is the text and summary is
>> a data structure, with attributes as paths the leaves of the tree of
>> which match.
>>
>> Then, the message has text, its body, and headers, key-value pairs
>> or collections thereof, where as well
>> there are default summaries like "a histogram of words by occurrence"
>> or for example default text like "the
>> MIME body of this message has a default text representation".
>>
>> So, the idea developing here is to define what are "normal" forms of
>> data structures that have some "normal"
>> forms of encoding that result that these "normalizing" after
>> "normative" data structures define well-behaved
>> algorithms upon them, which provide well-defined bounds in resources
>> that return some quantification of results,
>> like any/each/every/all, "hits".
>>
>> This is where search engines or collected search
>> algorithms ("find") usually enough have these
>> de-facto forms, "under the hood", as it were, to make it first-class
>> that for a given message and body that
>> there is a normal form of a "catalog summary index" which can be
>> compiled to a constant when the message
>> is ingested, that then basically any filestore of these messages has
>> alongside it the filestore of the "catsums"
>> or as on-demand, then that any algorithm has at least well-defined
>> behavior under partitions or collections
>> or selections of these messages, or items, for various standard
>> algorithms that separate "to find" from
>> "to serve to find".
>>
>> So, ..., what I'm wondering is what would be sufficient normal forms,
>> in brief, that result that there are defined, for a given corpus of
>> messages, basically at the granularity of messages, a normal form for
>> each message, its "catsum"; that catsums have a natural algebra where
>> a concatenation of catsums is a catsum; and that some standard
>> algorithms naturally have well-defined results on their predicates
>> and quantifiers of matching, in serial and parallel, and that the
>> results combine in serial and parallel.
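A minimal sketch of such a catsum algebra, taking just the "histogram of words by occurrence" summary mentioned earlier: merge is associative and commutative, so serial and parallel folds over any partition of the corpus agree. The class is illustrative, not a proposed format:

```java
import java.util.HashMap;
import java.util.Map;

// A "catsum" here is sketched as just a word-occurrence histogram plus a
// message count; merging two catsums composes, so a concatenation of
// catsums is itself a catsum for the combined corpus.
final class Catsum {
    final Map<String, Integer> histogram = new HashMap<>();
    int messages = 0;

    // Build the catsum of a single message body.
    static Catsum of(String body) {
        Catsum c = new Catsum();
        c.messages = 1;
        for (String w : body.toLowerCase().split("\\W+"))
            if (!w.isEmpty()) c.histogram.merge(w, 1, Integer::sum);
        return c;
    }

    // Merge: sum the histograms and message counts. Associative and
    // commutative, so results combine in serial and parallel.
    static Catsum merge(Catsum a, Catsum b) {
        Catsum c = new Catsum();
        c.messages = a.messages + b.messages;
        c.histogram.putAll(a.histogram);
        b.histogram.forEach((k, v) -> c.histogram.merge(k, v, Integer::sum));
        return c;
    }
}
```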
>>
>> The results should be applicable to any kind of data but here it's
>> more or less about usenet groups.
>>
>
>
> So, if you know all about old-fashioned
> Internet protocols like DNS, then NNTP,
> IMAP, SMTP, HTTP, and so on, then where
> it's at is figuring out these various sorts
> of conventions, then to result the
> sensible, fungible, and tractable conventions
> of the data structures and algorithms, in
> the protocols, what result in keeping things
> simple and standing up a usual Internet
> messaging agentry.
>
>
> BFF: backing-file formats, "Best friends forever"
>
> Message files
> Group files
>
> Thread link files
> Date link files
>
> SFF: search-file formats, "partially digested metadata"
>
>
>
> NOOBNB: Noob Nota Bene: Cur/Pur/Raw
>
> Load Roll/Fold/Shed/Hold: throughput/offput
>
>
>
> Then, the idea is to make it so that by constructing
> the files or a logical/physical sort of distinction,
> that then results a neat tape archive then that
> those can just be laid down together and result
> a corpus, or filtered on down and result a corpus,
> where the existing standard is sort-of called "mailbox"
> or "mbox" format, with the idea muchly of
> "converting mbox to BFF".
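A sketch of the mbox side of that conversion, splitting on the "From " envelope lines that delimit messages in mbox format; actually writing the per-message files into the BFF layout is left out, this only returns the per-message texts:

```java
import java.util.ArrayList;
import java.util.List;

// Split an mbox on its "From " envelope lines into one message text per
// message, each of which would become a write-once BFF message file.
final class MboxSplit {
    static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String line : mbox.split("\n", -1)) {
            if (line.startsWith("From ")) {      // envelope line starts a message
                if (current != null) messages.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) messages.add(current.toString());
        return messages;
    }
}
```

(A real converter would also un-escape ">From " quoting, which this sketch ignores.)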
>
>
> Then, for enabling search, basically the idea or a
> design principle of the FF is that they're concatenable
> or just overlaid, and all write-once-read-many, then
> with regards to things like merges, which also should
> result as some sort of algorithm in tools, what results
> that usual sorts of tools like textutils, working
> on these files, would make it so that usual extant tools
> are native on the files.
>
> So for metadata, the idea is that there are standard
> metadata attributes like the closed categories of
> headers and so on, where the primary attributes sort
> of look like
>
> message-id
> author
>
> delivery-path
> delivery-metadata (account, GUID, ...)
>
> destinations
>
> subject
> size
> content
>
> hash-raw-id <- after message-id
> hash-invariant-id <- after removing inconstants
> hash-uncoded-id <- after uncoding out to full
>
> Because messages are supposed to be unique,
> there's an idea to sort of detect differences.
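A sketch of those ids: hash-raw-id hashes the message as received, while hash-invariant-id first strips headers that vary per server. The particular "inconstants" list here (Path, Xref) is an assumption for illustration, not a standard:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Duplicate detection: the same article received via different routes
// has different Path/Xref headers, so it gets the same invariant hash
// even when the raw hashes differ.
final class MessageHashes {
    static String sha256(String s) {
        try {
            MessageDigest d = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(d.digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always present
        }
    }

    static String hashRawId(String raw) {
        return sha256(raw);
    }

    // Drop per-hop headers so identical articles hash identically.
    static String hashInvariantId(String raw) {
        StringBuilder kept = new StringBuilder();
        for (String line : raw.split("\n", -1))
            if (!line.startsWith("Path:") && !line.startsWith("Xref:"))
                kept.append(line).append('\n');
        return sha256(kept.toString());
    }
}
```

(hash-uncoded-id, after decoding transfer encodings out to full, would follow the same pattern with a MIME decode step first.)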
>
>
> The idea is to sort of implement NNTP's OVERVIEW
> and WILDMAT, then there's IMAP, figuring that the
> first goals of SFF is to implement the normative
> commands, then with regards to implementations,
> basically working up for HTTP SEARCH, a sort of
> normative representation of messages, groups,
> threads, and so on, sort of what results a neat sort
> standard system for all sorts purposes these, "posts".
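WILDMAT, as RFC 3977 has it, is small enough to sketch: comma-separated glob patterns evaluated left to right, a leading '!' negating, and the last matching pattern winning. This version only handles '*' and '?' and does not escape every regex metacharacter, so it's a sketch rather than a conformant implementation:

```java
// Minimal WILDMAT-style matcher for newsgroup names, as used by NNTP
// commands like LIST ACTIVE and NEWNEWS.
final class Wildmat {
    // Glob match supporting '*' (any run) and '?' (any one char),
    // escaping only the literal dots common in group names.
    static boolean glob(String pat, String s) {
        return s.matches(pat.replace(".", "\\.")
                            .replace("?", ".")
                            .replace("*", ".*"));
    }

    static boolean matches(String wildmat, String name) {
        boolean result = false;
        for (String pat : wildmat.split(",")) {
            boolean negate = pat.startsWith("!");
            if (negate) pat = pat.substring(1);
            if (glob(pat, name)) result = !negate;   // last match wins
        }
        return result;
    }
}
```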
>
>
> Anybody know any "normative RFC email's in HTTP"?
> Here the idea is basically that a naive server
> simply gets pointed at BFF files for message-id
> and loads any message there as an HTTP representation,
> with regards to HTTP, HTML, and so on, about these
> sorts "sensible, fungible, tractable" conventions.
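For pointing a naive HTTP server at BFF files by message-id, the piece that has to be decided is the id-to-path mapping; this two-level fan-out scheme by hash prefix is an assumption for illustration, not from any RFC:

```java
// Map a Message-ID like <unique@host> to a stable file path under a BFF
// root, so an HTTP handler can serve the file as the message's
// representation (e.g. as message/rfc822).
final class BffPath {
    static String pathFor(String messageId) {
        // Strip the angle brackets of <unique@host>.
        String id = messageId.replaceAll("^<|>$", "");
        // Percent-encode so the id is filesystem-safe.
        String safe = java.net.URLEncoder.encode(id,
                java.nio.charset.StandardCharsets.UTF_8);
        // Fan out into 256 directories to keep listings small.
        int h = Math.floorMod(id.hashCode(), 256);
        return String.format("bff/%02x/%s.msg", h, safe);
    }
}
```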
>
>
> It's been a while since I studied the standards,
> so I'm looking to get back tapping at the C10K server
> here, basically with hi-po full throughput then with
> regards to the sentinel/doorman bit (Load R/F/S/H).
>
> So, I'll be looking for "partially digested and
> composable search metadata formats" and "informative
> and normative standards-based message and content".
>
> They already have one of those, it's called "Internet".
>
>


Re: Meta: a usenet server just for sci.math

<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156016&group=sci.math#156016

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!69.80.99.22.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 10 Feb 2024 19:49:40 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <db72380a-4bc5-44b3-8538-258dd82574a1@googlegroups.com> <c9f99c83-2dc8-4bdd-9ccc-640978c065ef@googlegroups.com> <f77445f1-db51-425c-9167-1e0c8c95f091@googlegroups.com> <d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com> <6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com> <e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com> <4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sat, 10 Feb 2024 11:49:42 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Message-ID: <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
Lines: 1162
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-RPwNX/tOGKpRagxFSiSh4LOoklCMjajvMUFpRbwH+LMm7Pb3t+n8lGdcg+LFx4vSHHi0BvOPM/Q/Z1w!GAwTW35ETiqrcT6bABsgBB77ubzKzdVQ8mBv34GsuckjI7hnUyzDfNvMda/E+PW9hREhq+NNH46s!/Q==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Sat, 10 Feb 2024 19:49 UTC

On 02/09/2024 10:37 PM, Ross Finlayson wrote:
> On 02/08/2024 01:04 PM, Ross Finlayson wrote:
>> On 03/08/2023 08:51 PM, Ross Finlayson wrote:
>>> On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson
>>> wrote:
>>>> On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson
>>>> wrote:
>>>>> On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson
>>>>> wrote:
>>>>>> On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse
>>>>>> wrote:
>>>>>>> NNTP is not HTTP. I was using bare metal access to
>>>>>>> usenet, not using Google group, via:
>>>>>>>
>>>>>>> news.albasani.net, unfortunately dead since Corona
>>>>>>>
>>>>>>> So was looking for an alternative. And found this
>>>>>>> alternative, which seems fine:
>>>>>>>
>>>>>>> news.solani.org
>>>>>>>
>>>>>>> Have Fun!
>>>>>>>
>>>>>>> P.S.: Technical spec of news.solani.org:
>>>>>>>
>>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
>>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
>>>>>>> Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
>>>>>>> Location: 2x Falkenstein, 1x New York
>>>>>>>
>>>>>>> advantage of bare metal usenet,
>>>>>>> you see all headers of message.
>>>>>>> On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
>>>>>>>> Search you mentioned and for example HTTP is adding the SEARCH
>>>>>>>> verb,
>>>>>> In traffic there are two kinds of usenet users,
>>>>>> viewers and traffic through Google Groups,
>>>>>> and, USENET. (USENET traffic.)
>>>>>>
>>>>>> Here now Google turned on login to view their
>>>>>> Google Groups - effectively closing the Google Groups
>>>>>> without a Google login.
>>>>>>
>>>>>> I suppose if they're used at work or whatever though
>>>>>> they'd be open.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Where I got with the C10K non-blocking I/O for a usenet server,
>>>>>> it scales up though then I think in the runtime is a situation where
>>>>>> it only runs epoll or kqueue that the test scale ups, then at the end
>>>>>> or in sockets there is a drop, or it fell off the driver. I've
>>>>>> implemented
>>>>>> the code this far, what has all of NNTP in a file and then the
>>>>>> "re-routine,
>>>>>> industry-pattern back-end" in memory, then for that running usually.
>>>>>>
>>>>>> (Cooperative multithreading on top of non-blocking I/O.)
>>>>>>
>>>>>> Implementing the serial queue or "monohydra", or slique,
>>>>>> makes for that then when the parser is constantly parsing,
>>>>>> it seems a usual queue like data structure with parsing
>>>>>> returning its bounds, consuming the queue.
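A minimal sketch of that serial parse queue: bytes from non-blocking reads are appended at the tail, and the parser scans for a CRLF command boundary, returning one complete command and consuming it from the head. The class name Slique is just taken from the coinage above:

```java
import java.nio.ByteBuffer;

// Serial queue fed by non-blocking reads; the parser consumes one
// CRLF-terminated command at a time, leaving partial input buffered.
final class Slique {
    private ByteBuffer buf = ByteBuffer.allocate(4096);

    void append(byte[] bytes) {
        if (buf.remaining() < bytes.length) {    // grow the buffer if needed
            ByteBuffer bigger = ByteBuffer.allocate(
                    Math.max(buf.capacity() * 2, buf.position() + bytes.length));
            buf.flip(); bigger.put(buf); buf = bigger;
        }
        buf.put(bytes);
    }

    // Return the next CRLF-terminated line without its CRLF,
    // or null if no complete command is buffered yet.
    String nextCommand() {
        for (int i = 1; i < buf.position(); i++) {
            if (buf.get(i - 1) == '\r' && buf.get(i) == '\n') {
                byte[] line = new byte[i - 1];
                buf.flip(); buf.get(line); buf.get(); buf.get(); // consume CRLF
                buf.compact();                                    // keep the rest
                return new String(line, java.nio.charset.StandardCharsets.US_ASCII);
            }
        }
        return null;
    }
}
```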
>>>>>>
>>>>>> Having the file buffers all down small on 4K pages,
>>>>>> has that a next usual page size is the megabyte.
>>>>>>
>>>>>> Here though it seems to make sense to have a natural
>>>>>> 4K alignment the file system representation, then that
>>>>>> it is moving files.
>>>>>>
>>>>>> So, then with the new modern Java, that runs in its own
>>>>>> Java server runtime environment, it seems I would also
>>>>>> need to see whether the cloud virt supported the I/O model
>>>>>> or not, or that the cooperative multi-threading for example
>>>>>> would be single-threaded. (Blocking abstractly.)
>>>>>>
>>>>>> Then besides I suppose that could be neatly with basically
>>>>>> the program model, and its file model, being well-defined,
>>>>>> then for NNTP with IMAP organization search and extensions,
>>>>>> those being standardized, seems to make sense for an efficient
>>>>>> news file organization.
>>>>>>
>>>>>> Here then it seems for serving the NNTP, and for example
>>>>>> their file bodies under the storage, with the fixed headers,
>>>>>> variable header or XREF, and the message body, then under
>>>>>> content it's same as storage.
>>>>>>
>>>>>> NNTP has "OVERVIEW" then from it is built search.
>>>>>>
>>>>>> Let's see here then, if I get the load test running, or,
>>>>>> just put a limit under the load while there are no load test
>>>>>> errors, it seems the algorithm then scales under load to be
>>>>>> making usually the algorithm serial in CPU, with: encryption,
>>>>>> and compression (traffic). (Block ciphers instead of serial
>>>>>> transfer.)
>>>>>>
>>>>>> Then, the industry pattern with re-routines, has that the
>>>>>> re-routines are naturally co-operative in the blocking,
>>>>>> and in the language, including flow-of-control and exception scope.
>>>>>>
>>>>>>
>>>>>> So, I have a high-performance implementation here.
>>>>> It seems like for NFS, then, and having the separate read and write
>>>>> of the client,
>>>>> a default filesystem, is an idea for the system facility: mirroring
>>>>> the mounted file
>>>>> locally, and, providing the read view from that via a different route.
>>>>>
>>>>>
>>>>> A next idea then seems for the organization, the client views
>>>>> themselves
>>>>> organize over the durable and available file system representation,
>>>>> this
>>>>> provides anyone a view over the protocol with a group file convention.
>>>>>
>>>>> I.e., while usual continuous traffic was surfing, individual reads
>>>>> over group
>>>>> files could have independent views, for example collating contents.
>>>>>
>>>>> Then, extracting requests from traffic and threads seems usual.
>>>>>
>>>>> (For example a specialized object transfer view.)
>>>>>
>>>>> Making protocols for implementing internet protocols in groups and
>>>>> so on, here makes for giving usenet example views to content
>>>>> generally.
>>>>>
>>>>> So, I have designed a protocol node and implemented it mostly,
>>>>> then about designed an object transfer protocol, here the idea
>>>>> is how to make it so people can extract data, for example their own
>>>>> data, from a large durable store of all the usenet messages,
>>>>> making views of usenet running on usenet, eg "Feb. 2016: AP's
>>>>> Greatest Hits".
>>>>>
>>>>> Here the point is to figure that usenet, these days, can be operated
>>>>> in cooperation with usenet, and really for its own sake, for leaving
>>>>> messages in usenet and here for usenet protocol stores as there's
>>>>> no reason it's plain text the content, while the protocol supports it.
>>>>>
>>>>> Building personal view for example is a simple matter of very many
>>>>> service providers any of which sells usenet all day for a good deal.
>>>>>
>>>>> Let's see here, $25/MM, storage on the cloud last year for about
>>>>> a million messages for a month is about $25. Outbound traffic is
>>>>> usually the metered cloud traffic, here for example that CDN traffic
>>>>> support the universal share convention, under metering. What that
>>>>> the algorithm is effectively tunable in CPU and RAM, makes for under
>>>>>> I/O that it's "unobtrusive" or the cooperative in routine, for CPU
>>>>>> I/O and
>>>>> RAM, then that there is for seeking that Network Store or Database
>>>>> Time
>>>>> instead effectively becomes File I/O time, as what may be faster,
>>>>> and more durable. There's a faster database time for scaling the
>>>>> ingestion
>>>>> here with that the file view is eventually consistent. (And reliable.)
>>>>>
>>>>> Checking the files would be over time for example with "last checked"
>>>>> and "last dropped" something along the lines of, finding wrong
>>>>> offsets,
>>>>> basically having to make it so that it survives neatly corruption of
>>>>> the
>>>>> store (by being more-or-less stored in-place).
>>>>>
>>>>> Content catalog and such, catalog.
>>>> Then I wonder and figure the re-routine can scale.
>>>>
>>>> Here for the re-routine, the industry factory pattern,
>>>> and the commands in the protocols in the templates,
>>>> and the memory module, with the algorithm interface,
>>>> in the high-performance computer resource, it is here
>>>> that this simple kind of "writing Internet software"
>>>> makes pretty rapidly for adding resources.
>>>>
>>>> Here the design is basically of a file I/O abstraction,
>>>> that the computer reads data files with mmap to get
>>>> their handlers, what results that for I/O map the channels
>>>> result transferring the channels in I/O for what results,
>>>> in mostly the allocated resource requirements generally,
>>>> and for the protocol and algorithm, it results then that
>>>> the industry factory pattern and making for interfaces,
>>>> then also here the I/O routine as what results that this
>>>> is an implementation, of a network server, mostly is making
>>>> for that the re-routine, results very neatly a model of
>>>> parallel cooperation.
>>>>
>>>> I think computers still have file systems and file I/O but
>>>> in abstraction just because PAGE_SIZE is still relevant for
>>>> the network besides or I/O, if eventually, here is that the
>>>> value types are in the commands and so on, it is besides
>>>> that in terms of the resources so defined it still is in a filesystem
>>>> convention that a remote and unreliable view of it suffices.
>>>>
>>>> Here then the source code also being "this is only 20-50k",
>>>> lines of code, with basically an entire otherwise library stack
>>>> of the runtime itself, only the network and file abstraction,
>>>> this makes for also that modularity results. (Factory Industry
>>>> Pattern Modules.)
>>>>
>>>> For a network server, here, that, mostly it is high performance
>>>> in the sense that this is about the most direct handle on the channels
>>>> and here mostly for the text layer in the I/O order, or protocol layer,
>>>> here is that basically encryption and compression usually in the layer,
>>>> there is besides a usual concern where encryption and compression
>>>> are left out, there is that text in the layer itself is commands.
>>>>
>>>> Then, those being constants under the resources for the protocol,
>>>> it's what results usual protocols like NNTP and HTTP and other
>>>> protocols
>>>> with usually one server and many clients, here is for that these
>>>> protocols
>>>> are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
>>>>
>>>> These are here defined "all Java" or "Pure Java", i.e. let's be clear
>>>> that
>>>> in terms of the reference abstraction layer, I think computers still
>>>> use
>>>> the non-blocking I/O and filesystems and network to RAM, so that as
>>>> the I/O is implemented in those it actually has those besides instead
>>>> for
>>>> example defaulting to byte-per-channel or character I/O. I.e. the usual
>>>> semantics for servicing the I/O in the accepter routine and what makes
>>>> for that the platform also provides a reference encryption
>>>> implementation,
>>>> if not so relevant for the block encoder chain, besides that for
>>>> example
>>>> compression has a default implementation, here the I/O model is as
>>>> simply
>>>> in store for handles, channels, ..., that it results that data
>>>> especially delivered
>>>> from a constant store can anyways be mostly compressed and encrypted
>>>> already or predigested to serve, here that it's the convention, here
>>>> is for
>>>> resulting that these client-server protocols, with usually reads >
>>>> postings
>>>> then here besides "retention", basically here is for what it is.
>>>>
>>>> With the re-routine and the protocol layer besides, having written the
>>>> routines in the re-routine, what there is to write here is this
>>>> industry
>>>> factory, or a module framework, implementing the re-routines, as
>>>> they're
>>>> built from the linear description a routine, makes for as the routine
>>>> progresses
>>>> that it's "in the language" and that more than less in the terms, it
>>>> makes for
>>>> implementing the case of logic for values, in the logic's
>>>> flow-of-control's terms.
>>>>
>>>> Then, there is that actually running the software is different than
>>>> just
>>>> writing it, here in the sense that as a server runtime, it is to be
>>>> made a
>>>> thing, by giving it a name, and giving it an authority, to exist on
>>>> the Internet.
>>>>
>>>> There is basically that for BGP and NAT and so on, and, mobile fabric
>>>> networks,
>>>> IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main
>>>> space, with
>>>> respect to what are CIDR and 24 bits rule and what makes for TCP/IP,
>>>> here
>>>> entirely the course is using the TCP/IP stack and Java's TCP/IP
>>>> stack, with
>>>> respect to that TCP/IP is so provided or in terms of process what
>>>> results
>>>> ports mostly and connection models where it is exactly the TCP after
>>>> the IP,
>>>> the Transport Control Protocol and Internet Protocol, have here both
>>>> this
>>>> socket and datagram connection orientation, or stateful and
>>>> stateless or
>>>> here that in terms of routing it's defined in addresses, under that
>>>> names
>>>> and routing define sources, routes, destinations, ..., that routine
>>>> numeric
>>>> IP addresses result in the usual sense of the network being behind
>>>> an IP
>>>> and including IPv4 network fabric with respect to local routers.
>>>>
>>>> I.e., here to include a service framework is "here besides the
>>>> routine, let's
>>>> make it clear that in terms of being a durable resource, there needs
>>>> to be
>>>> some lockbox filled with its sustenance that in some locked or constant
>>>> terms results that for the duration of its outlay, say five years, it
>>>> is held
>>>> up, then, it will be so again, or, let down to result the carry-over
>>>> that it
>>>> invested to archive itself, I won't have to care or do anything until
>>>> then".
>>>>
>>>>
>>>> About the service activation and the idea that, for a port, the
>>>> routine itself
>>>> needs only run under load, i.e. there is effectively little traffic
>>>> on the old archives,
>>>> and usually only the some other archive needs any traffic. Here the
>>>> point is
>>>> that for the Java routine there is the system port that was accepted
>>>> for the
>>>> request, that inetd or the systemd or means the network service was
>>>> accessed,
>>>> made for that much as for HTTP the protocol is client-server also for
>>>> IP the
>>>> protocol is client-server, while the TCP is packets. This is a
>>>> general idea for
>>>> system integration while here mostly the routine is that being a
>>>> detail:
>>>> the filesystem or network resource that results that the re-routines
>>>> basically
>>>> make very large CPU scaling.
>>>>
>>>> Then, it is basically containerized this sense of "at some domain
>>>> name, there
>>>> is a service, it's HTTP and NNTP and IMAP besides, what cares the
>>>> world".
>>>>
>>>> I.e. being built on connection oriented protocols like the socket
>>>> layer,
>>>> HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to
>>>> certificates,
>>>> it's more than less sensible that most users have no idea of
>>>> installing some
>>>> NNTP browser or pointing their email to IMAP so that the email browser
>>>> browses the newsgroups and for postings, here this is mostly only talk
>>>> about implementing NNTP then IMAP and HTTP that happens to look like
>>>> that,
>>>> besides for example SMTP or NNTP posting.
>>>>
>>>> I.e., having "this IMAP server, happens to be this NNTP module", or
>>>> "this HTTP server, happens to be a real simple mailbox these groups",
>>>> makes for having partitions and retentions of those and that basically
>>>> NNTP messages in the protocol can be more or less the same content
>>>> in media, what otherwise is of a usual message type.
>>>>
>>>> Then, the NNTP server-server routine is the propagation of messages
>>>> besides "I shall hire ten great usenet retention accounts and gently
>>>> and politely draw them down and back-fill Usenet, these ten groups".
>>>>
>>>> By then I would have to have made for retention in storage, such
>>>> contents,
>>>> as have a reference value, then for besides making that independent in
>>>> reference value, just so that it suffices that it basically results
>>>> "a usable
>>>> durable filesystem that happens you can browse it like usenet". I.e. as
>>>> the pieces to make the backfill are dug up, they get assigned
>>>> reference numbers
>>>> of their time to make for what here is that in a grand schema of
>>>> things,
>>>> they have a reference number in numerical order (and what's also the
>>>> server's "message-number" besides its "message-id") as noted above this
>>>> gets into the storage for retention of a file, while, most services
>>>> for this
>>>> are instead for storage and serving, not necessarily or at all
>>>> retention.
>>>>
>>>> I.e., the point is that as the groups are retained from retention,
>>>> there is an
>>>> approach what makes for an orderly archeology, as for what convention
>>>> some data arrives, here that this server-server routine is besides
>>>> the usual
>>>> routine which is "here are new posts, propagate them", it's "please
>>>> deliver
>>>> as of a retention scan, and I'll try not to repeat it, what results
>>>> as orderly
>>>> as possible a proof or exercise of what we'll call afterward entire
>>>> retention",
>>>> then will be for as of writing a file that "as of the date, from
>>>> start to finish,
>>>> this site certified these messages as best-effort retention".
>>>>
>>>> It seems then besides there is basically "here is some mbox file,
>>>> serve it
>>>> like it was an NNTP group or an IMAP mailbox", ingestion, in terms of
>>>> that
>>>> what is ingestion, is to result for the protocol that "for this
>>>> protocol,
>>>> there is actually a normative filesystem representation that happens to
>>>> be pretty much also altogether defined by the protocol", the point is
>>>> that ingestion would result in command to remain in the protocol,
>>>> that a usual file type that "presents a usual abstraction, of a
>>>> filesystem,
>>>> as from the contents of a file", here with the notion of "for all these
>>>> threaded discussions, here this system only cares some approach to
>>>> these ten particular newsgroups that already have mostly their corpus
>>>> though it's not in perhaps their native mbox instead consulted from
>>>> services".
>>>>
>>>> Then, there's for storing and serving the files, and there is the usual
>>>> notion that moving the data, is to result, that really these file
>>>> organizations
>>>> are not so large in terms of resources, being "less than gigabytes"
>>>> or so,
>>>> still there's a notion that as a durable resource they're to be made
>>>> fungible here the networked file approach in the native filesystem,
>>>> then that with respect to it's a backing store, it's to make for that
>>>> the entire enterprise is more or less to be made in terms of account,
>>>> that then as a facility on the network then a service in the network,
>>>> it's basically separated the facility and service, while still of
>>>> course
>>>> that the service is basically defined by its corpus.
>>>>
>>>>
>>>> Then, to make that fungible in a world of account, while with an exit
>>>> strategy so that the operation isn't abstract, is mostly about the
>>>> domain name, then that what results the networking, after trusted
>>>> network naming and connections for what result routing, and then
>>>> the port, in terms of that there are usual firewalls in ports though
>>>> that
>>>> besides usually enough client ports are ephemeral, here the point is
>>>> that the protocols and their well-known ports, here it's usually enough
>>>> that the Internet doesn't concern itself so much protocols but with
>>>> respect to proxies, here that for example NNTP and IMAP don't have
>>>> so much anything so related that way after STARTTLS. For the world of
>>>> account, is basically to have for a domain name, an administrator, and,
>>>> an owner or representative. These are to establish authority for
>>>> changes
>>>> and also accountability for usage.
>>>>
>>>> Basically they're to be persons and there is a process to get to be an
>>>> administrator of DNS, most always there are services that a usual
>>>> person
>>>> implementing the system might use, besides for example the numerical.
>>>>
>>>> More relevant though to DNS is getting servers on the network, with
>>>> respect
>>>> to listening ports and that they connect to clients what so discover
>>>> them as
>>>> via DNS or configuration, here as above the usual notion that these are
>>>> standard services and run on well-known ports for inetd or systemd.
>>>> I.e. there is basically that running a server and dedicated networking,
>>>> and power and so on, and some notion of the limits of reliability, is
>>>> then
>>>> as very much in other aspects of the organization of the system, i.e.
>>>> its name,
>>>> while at the same time, the point that a module makes for that
>>>> basically
>>>> the provision of a domain name or well-known or ephemeral host, is the
>>>> usual notion that static IP addresses are a limited resource and as
>>>> about
>>>> the various networks in IPv4 and how they route traffic, is for that
>>>> these
>>>> services have well-known sections in DNS for at least that the most
>>>> usual
>>>> configuration is none.
>>>>
>>>> For a usual global reliability and availability, is some notion
>>>> basically that
>>>> each region and zone has a service available on the IP address, for
>>>> that
>>>> "hostname" resolves to the IP addresses. As well, in reverse, for
>>>> the IP
>>>> address and about the hostname, it should resolve reverse to hostname.
>>>>
>>>> About certificates mostly for identification after mapping to port, or
>>>> multi-home Internet routing, here is the point that whether the domain
>>>> name administration is "epochal" or "regular", is that epochs are
>>>> defined
>>>> by the ports behind the numbers and the domain name system as well,
>>>> where in terms of the registrar, the domain names are epochal to the
>>>> registrar, with respect to owners of domain names.
>>>>
>>>> Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
>>>> and also BGP and NAT and routing and what are local and remote
>>>> addresses, here is for not-so-much "implement DNS the protocol
>>>> also while you're at it", rather for what results that there is a
>>>> durable
>>>> and long-standing and proper doorman, for some usenet.science.
>>>>
>>>> Here then the notion seems to be whether the doorman basically
>>>> knows well-known services, is a multi-homing router, or otherwise
>>>> what is the point that it starts the lean runtime, with respect to that
>>>> it's a container and having enough sense of administration its
>>>> operation
>>>> as contained. I.e. here given a port and a hostname and always running
>>>> makes for that as long as there is the low (preferable no) idle for
>>>> services
>>>> running that have no clients, is here also for the cheapest doorman
>>>> that
>>>> knows how to standup the client sentinel. (And put it back away.)
>>>>
>>>> Probably the most awful thing in the cloud services is the cost for
>>>> data ingress and egress. What that means is that for example using
>>>> a facility that is bound by that as a cost instead of under some
>>>> constant
>>>> cost, is basically why there is the approach that the containers
>>>> need a
>>>> handle to the files, and they're either local files or network files,
>>>> here
>>>> with some convention above in archival a shared consistent view
>>>> of all the files, or abstractly consistent, is for making that the
>>>> doorman
>>>> can handle lots of starting and finishing connections, while it is
>>>> out of
>>>> the way when usually it's client traffic and opening and closing
>>>> connections,
>>>> and the usual abstraction is that the client sentinel is never off
>>>> and doorman
>>>> does nothing, here is for attaching the one to some lower constant
>>>> cost,
>>>> where for example any long-running cost is more than some low
>>>> constant cost.
>>>>
>>>> Then, this kind of service is often represented by nodes, in the
>>>> usual sense
>>>> "here is an abstract container with you hope some native performance
>>>> under
>>>> the hypervisor where it lives on the farm on its rack, it basically
>>>> is moved the
>>>> image to wherever it's requested from and lives there, have fun, the
>>>> meter is on".
>>>> I.e. that's just "this Jar has some config conventions and you can
>>>> make the
>>>> container associate it and watchdog it with systemd for example and
>>>> use the
>>>> cgroups while you're at it and make for tempfs quota and also the
>>>> best network
>>>> file share, which you might be welcome to cache if you care just in
>>>> the off-chance
>>>> that this file-mapping is free or constant cost as long as it doesn't
>>>> egress the
>>>> network", is for here about the facilities that work, to get a copy
>>>> of the system
>>>> what with respect to its usual operation is a piece of the Internet.
>>>>
>>>> For the different reference modules (industry factories) in their
>>>> patterns then
>>>> and under combined configuration "file + process + network + fare",
>>>> is that
>>>> the fare of the service basically reflects a daily coin, in the sense
>>>> that it
>>>> represents an annual or epochal fee, what results for the time there is
>>>> what is otherwise all defined the "file + process + network + name",
>>>> what results it perpetuates in operation more than less simply and
>>>> automatically.
>>>>
>>>> Then, the point though is to get it to where "I can go to this
>>>> service, and
>>>> administer it more or less by paying an account, that it thus lives
>>>> in its
>>>> budget and quota in its metered world".
>>>>
>>>> That though is very involved with identity, that in terms of "I the
>>>> account
>>>> as provided this sum make this sum paid with respect to an agreement",
>>>> is that authority to make agreements must make that it results that the
>>>> operation of the system, is entirely transparent, and defined in
>>>> terms of
>>>> the roles and delegation, conventions in operation.
>>>>
>>>> I.e., I personally don't want to administer a copy of usenet, but,
>>>> it's here
>>>> pretty much sorted out that I can administer one once then that it's to
>>>> administer itself in the following, in terms of it having resources
>>>> to allocate
>>>> and resources to disburse. Also if nobody's using it it should
>>>> basically work
>>>> itself out to dial its lights down (while maintaining availability).
>>>>
>>>> Then a point seems "maintain and administer the operation in effect,
>>>> what arrangement sees via delegation, that a card number and a phone
>>>> number and an email account and more than less a responsible entity,
>>>> is so indicated for example in cryptographic identity thus that the
>>>> operation
>>>> of this system as a service, effectively operates itself out of a
>>>> kitty,
>>>> what makes for administration and overhead, an entirely transparent
>>>> model of a miniature business the system as a service".
>>>>
>>>> "... and a mailing address and mail service."
>>>>
>>>> Then, for accounts and accounts, for example is the provision of the
>>>> component
>>>> as simply an image in cloud algorithms, where basically as above here
>>>> it's configured
>>>> that anybody with any cloud account could basically run it on their
>>>> own terms,
>>>> there is for here sorting out "after this delegation to some business
>>>> entity what
>>>> results a corporation in effect, the rest is business-in-a-box and
>>>> more-than-less
>>>> what makes for its administration in state, is for how it basically
>>>> limits and replicates
>>>> its service, in terms of its own assets here as what administered is
>>>> abstractly
>>>> "durable forever mailboxes with private ownership if on public or
>>>> managed resources".
>>>>
>>>> A usual notion of a private email and usenet service offering and
>>>> business-in-a-box,
>>>> here what I'm looking at is that besides archiving sci.math and
>>>> copying out its content
>>>> under author line, is to make such an industry for example here that
>>>> "once having
>>>> implemented an Internet service, an Internet service of them results
>>>> Internet".
>>>>
>>>> I.e. here the point is to make a corporation and a foundation in
>>>> effect, what in terms
>>>> of then about the books and accounts, is about accounts for the
>>>> business accounts
>>>> that reflect a persistent entity, then what results in terms of
>>>> computing, networking,
>>>> and internetworking, with a regular notion of "let's never change
>>>> this arrangement
>>>> but it's in monthly or annual terms", here for that in overall
>>>> arrangements,
>>>> it results what the entire system more than less runs in ways then to
>>>> either
>>>> run out its limits or make itself a sponsored effort, about
>>>> more-or-less a simple
>>>> and responsible and accountable set of operations what effect the
>>>> business
>>>> (here that in terms of service there is basically the realm of
>>>> agreement)
>>>> that basically this sort of business-in-a-box model, is then besides
>>>> itself of
>>>> accounts, toward the notion as pay-as-you-go and "usual credits and
>>>> their limits".
>>>>
>>>> Then for a news://usenet.science, or for example
>>>> sci.math.usenet.science,
>>>> is the idea that the entity is "some assemblage what is so that in
>>>> DNS, and,
>>>> in the accounts payable and receivable, and, in the material matters of
>>>> arrangement and authority for administration, of DNS and resources and
>>>> accounts what result durably persisting the business, is basically
>>>> for a service
>>>> then of what these are usual enough tasks, as that are interactive
>>>> workflows
>>>> and for mechanical workflows".
>>>>
>>>> I.e. the point is for having the service then an on/off button and
>>>> more or less
>>>> what is for a given instance of the operation, what results from some
>>>> protocol
>>>> that provides a "durable store" of a sort of the business, that at
>>>> any time basically
>>>> some re-routine or "eventually consistent" continuance of the
>>>> operation of the
>>>> business, results basically a continuity in its operations, what is
>>>> entirely granular,
>>>> that here for example the point is to "pick a DNS name, attach an
>>>> account service,
>>>> go" it so results that in the terms, basically there are the
>>>> placeholders of the
>>>> interactive workflows in that, and as what in terms are often for
>>>> example simply
>>>> card and phone number terms, account terms.
>>>>
>>>> I.e. a service to replenish accounts as kitties for making accounts
>>>> only and
>>>> exactly limited to the one service, its transfers, basically results
>>>> that there
>>>> is the notion of an email address, a phone number, a credit card's
>>>> information,
>>>> here a fixed limit debit account that works as of a kitty, there is a
>>>> regular workflow
>>>> service that will read out the durable stores and according to the
>>>> timeliness of
>>>> their events, affect the configuration and reconciliation of payments
>>>> for accounts
>>>> (closed loop scheduling/receiving).
>>>>
>>>> https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
>>>> https://www.rfc-editor.org/rfc/rfc9022.txt
>>>>
>>>> Basically for dailies, monthlies, and annuals, what make weeklies,
>>>> is this idea of Internet-from-an-account, what is services.
>>>
>>>
>>> After implementing a store, and the protocol for getting messages,
>>> then what seems relevant here in the
>>> context of the SEARCH command, is a fungible file-format, that is
>>> derived from the body of the message
>>> in a normal form, that is a data structure that represents an index
>>> and catalog and dictionary and summary
>>> of the message, a form of a data structure of a "search index".
>>>
>>> These types files should naturally compose, and result a data
>>> structure that according to some normal
>>> forms of search and summary algorithms, result that a data structure
>>> results, that makes for efficient
>>> search of sections of the corpus for information retrieval, here that
>>> "information retrieval is the science
>>> of search algorithms".
>>>
>>> Now, for what and how people search, or what is the specification of a
>>> search, is in terms of queries, say,
>>> here for some brief forms of queries that advise what's definitely
>>> included in the search, what's excluded,
>>> then perhaps what's maybe included, or yes/no/maybe, which makes for a
>>> predicate that can be built,
>>> that can be applied to results that compose and build for the terms of
>>> a filter with yes/no/maybe or
>>> sure/no/yes, with predicates in values.
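
That yes/no/maybe predicate is easy enough to make concrete; a small
sketch, where the names and the scoring-by-maybe idea are mine, not
any fixed query format:

```python
from dataclasses import dataclass

@dataclass
class Query:
    """yes/no/maybe term filter: 'yes' terms must all appear, 'no'
    terms must not appear, 'maybe' terms only affect ranking."""
    yes: frozenset = frozenset()
    no: frozenset = frozenset()
    maybe: frozenset = frozenset()

    def matches(self, words: set) -> bool:
        # the predicate that can be built and applied to results
        return self.yes <= words and not (self.no & words)

    def score(self, words: set) -> int:
        # rank surviving hits by how many 'maybe' terms they contain
        return len(self.maybe & words)

q = Query(yes=frozenset({"usenet"}), no=frozenset({"spam"}),
          maybe=frozenset({"nntp", "imap"}))
```

Queries like these compose: the conjunction of two filters is just the
union of their yes and no sets.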
>>>
>>> Here there is basically "free text search" and "matching summaries",
>>> where text is the text and summary is
>>> a data structure, with attributes as paths the leaves of the tree of
>>> which match.
>>>
>>> Then, the message has text, its body, and headers, key-value pairs
>>> or collections thereof, where as well
>>> there are default summaries like "a histogram of words by occurrence"
>>> or for example default text like "the
>>> MIME body of this message has a default text representation".
>>>
>>> So, the idea developing here is to define what are "normal" forms of
>>> data structures that have some "normal"
>>> forms of encoding that result that these "normalizing" after
>>> "normative" data structures define well-behaved
>>> algorithms upon them, which provide well-defined bounds in resources
>>> that return some quantification of results,
>>> like any/each/every/all, "hits".
>>>
>>> This is where usually enough search engines' or collected search
>>> algorithms ("find") usually enough have these
>>> de-facto forms, "under the hood", as it were, to make it first-class
>>> that for a given message and body that
>>> there is a normal form of a "catalog summary index" which can be
>>> compiled to a constant when the message
>>> is ingested, that then basically any filestore of these messages has
>>> alongside it the filestore of the "catsums"
>>> or as on-demand, then that any algorithm has at least well-defined
>>> behavior under partitions or collections
>>> or selections of these messages, or items, for various standard
>>> algorithms that separate "to find" from
>>> "to serve to find".
>>>
>>> So, ..., what I'm wondering are what would be sufficient normal forms
>>> in brief that result that there are
>>> defined for a given corpus of messages, basically at the granularity
>>> of messages, how is defined how
>>> there is a normal form for each message its "catsum", that catsums have
>>> a natural algebra that a
>>> concatenation of catsums is a catsum and that some standard algorithms
>>> naturally have well-defined
>>> results on their predicates and quantifiers of matching, in serial and
>>> parallel, and that the results
>>> combine in serial and parallel.
>>>
>>> The results should be applicable to any kind of data but here it's
>>> more or less about usenet groups.
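
Taking the default "histogram of words by occurrence" summary as the
example, a catsum with that natural algebra is just a word-count
multiset: counter addition is associative and commutative, so partitions
combine in serial or in parallel to the same result. A sketch (this is
one reading of "catsum", not a fixed format):

```python
from collections import Counter

def catsum(text: str) -> Counter:
    """Minimal catalog-summary: a histogram of words by occurrence,
    compiled to a constant when the message is ingested."""
    return Counter(text.lower().split())

def merge(*catsums: Counter) -> Counter:
    """Concatenation of catsums is a catsum: Counter addition is
    associative, so any partition of the corpus folds to the same
    summary."""
    total = Counter()
    for c in catsums:
        total += c
    return total

a = catsum("the quick fox")
b = catsum("the slow fox")
```
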
>>>
>>
>>
>> So, if you know all about old-fashioned
>> Internet protocols like DNS, then NNTP,
>> IMAP, SMTP, HTTP, and so on, then where
>> it's at is figuring out these various sorts
>> conventions then to result a sort-of, the
>> sensible, fungible, and tractable, conventions
>> of the data structures and algorithms, in
>> the protocols, what result keeping things
>> simple and standing up a usual Internet
>> messaging agentry.
>>
>>
>> BFF: backing-file formats, "Best friends forever"
>>
>> Message files
>> Group files
>>
>> Thread link files
>> Date link files
>>
>> SFF: search-file formats, "partially digested metadata"
>>
>>
>>
>> NOOBNB: Noob Nota Bene: Cur/Pur/Raw
>>
>> Load Roll/Fold/Shed/Hold: throughput/offput
>>
>>
>>
>> Then, the idea is to make it so that by constructing
>> the files or a logical/physical sort of distinction,
>> that then results a neat tape archive then that
>> those can just be laid down together and result
>> a corpus, or filtered on down and result a corpus,
>> where the existence standard is sort of called "mailbox"
>> or "mbox" format, with the idea muchly of
>> "converting mbox to BFF".
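
A first cut of "converting mbox to BFF" can lean on the stdlib mailbox
module; the per-message-id directory with id/hd/bd files follows the
layout sketched further down the thread, and the name-mangling here is
my own illustration:

```python
import mailbox, os, re

def safe_dirname(message_id: str) -> str:
    # message-ids carry characters unfriendly to filesystems
    return re.sub(r"[^A-Za-z0-9._-]", "_", message_id.strip().strip("<>"))

def mbox_to_bff(mbox_path: str, out_root: str) -> int:
    """Convert an mbox into one write-once directory per message-id,
    holding 'id', 'hd' (headers), and 'bd' (body)."""
    count = 0
    for msg in mailbox.mbox(mbox_path):
        mid = msg["Message-ID"] or f"<missing-{count}@local>"
        d = os.path.join(out_root, safe_dirname(mid))
        os.makedirs(d, exist_ok=True)
        body = msg.get_payload(decode=True)
        if body is None:  # multipart: keep the raw payload for the sketch
            body = msg.as_bytes().split(b"\n\n", 1)[-1]
        with open(os.path.join(d, "id"), "w") as fp:
            fp.write(mid + "\n")
        with open(os.path.join(d, "hd"), "w") as fp:
            fp.writelines(f"{k}: {v}\n" for k, v in msg.items())
        with open(os.path.join(d, "bd"), "wb") as fp:
            fp.write(body)
        count += 1
    return count
```
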
>>
>>
>> Then, for enabling search, basically the idea or a
>> design principle of the FF is that they're concatenable
>> or just overlaid and all write-once-read-many, then
>> with regards to things like merges, which also should
>> result as some sort of algorithm in tools, what results,
>> that of course usual sorts tools like textutils, working
>> on these files, would make it so that usual extant tools,
>> are native on the files.
>>
>> So for metadata, the idea is that there are standard
>> metadata attributes like the closed categories of
>> headers and so on, where the primary attributes sort
>> of look like
>>
>> message-id
>> author
>>
>> delivery-path
>> delivery-metadata (account, GUID, ...)
>>
>> destinations
>>
>> subject
>> size
>> content
>>
>> hash-raw-id <- after message-id
>> hash-invariant-id <- after removing inconstants
>> hash-uncoded-id <- after uncoding out to full
>>
>> Because messages are supposed to be unique,
>> there's an idea to sort of detect differences.
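
For the first two of those hash ids, a sketch: raw is the article as
received, invariant is after dropping the per-hop headers, so the same
article arriving over two feeds hashes alike. The header set stripped
here is illustrative, and folded continuation headers are ignored for
brevity (the "uncoded" id, after decoding MIME out to full, is omitted):

```python
import hashlib, re

def _sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def hash_raw_id(raw: bytes) -> str:
    """Digest of the article exactly as received."""
    return _sha(raw)

# headers that vary per hop; an illustrative, not normative, list
TRANSIT = re.compile(rb"(?i)^(path|xref|x-trace|nntp-posting-\S*):")

def hash_invariant_id(raw: bytes) -> str:
    """Digest after removing the inconstants, to detect when two
    copies of one message-id really differ."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    keep = [ln for ln in head.split(b"\r\n") if not TRANSIT.match(ln)]
    return _sha(b"\r\n".join(keep) + sep + body)

a1 = b"Path: a!b\r\nSubject: hi\r\nMessage-ID: <1@x>\r\n\r\nbody\r\n"
a2 = b"Path: c!d!e\r\nSubject: hi\r\nMessage-ID: <1@x>\r\n\r\nbody\r\n"
```
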
>>
>>
>> The idea is to sort of implement NNTP's OVERVIEW
>> and WILDMAT, then there's IMAP, figuring that the
>> first goals of SFF is to implement the normative
>> commands, then with regards to implementations,
>> basically working up for HTTP SEARCH, a sort of
>> normative representation of messages, groups,
>> threads, and so on, sort of what results a neat sort
>> standard system for all sorts purposes these, "posts".
>>
>>
>> Anybody know any "normative RFC email's in HTTP"?
>> Here the idea is basically that a naive server
>> simply gets pointed at BFF files for message-id
>> and loads any message there as an HTTP representation,
>> with regards to HTTP, HTML, and so on, about these
>> sorts "sensible, fungible, tractable" conventions.
>>
>>
>> It's been a while since I studied the standards,
>> so I'm looking to get back tapping at the C10K server
>> here, basically with hi-po full throughput then with
>> regards to the sentinel/doorman bit (Load R/F/S/H).
>>
>> So, I'll be looking for "partially digested and
>> composable search metadata formats" and "informative
>> and normative standards-based message and content".
>>
>> They already have one of those, it's called "Internet".
>>
>>
>
>
>
> Reading up on anti-spam, it seems that Usenet messages have
> a pretty simple format, then with regards to all of Internet
> messages, or Email and MIME and so on, gets into basically
> the nitty-gritty of the Internet Protocols like SMTP, IMAP, NNTP,
> and HTTP, about figuring out what's the needful then for things
> like Netnews messages, Email messages, HTTP messages,
> and these kinds of things, basically for message multi-part.
>
> https://en.wikipedia.org/wiki/MIME
>
> (DANE, DKIM, DMARC, ....)
>
> It's kind of complicated to implement correctly the parsing
> of Internet messages, so, it should be done up right.
>
> The compeering would involve the conventions of INND.
> The INND software is very usual, vis-a-vis Tornado or some
> commercial cousins, these days.
>
> The idea seems to be "run INND with cleanfeed", in terms
> of control and junk and the blood/brain barrier or here
> the text/binaries barrier, I'm only interested in setting up
> for text and then maybe some "richer text" or as with
> regards to Internet protocols for messaging and messages.
>
> Then the idea is to implement this "clean-room", so it results
> a sort of plain description of data structures logical/physical
> then a reference implementation.
>
> The groups then accepted/rejected for compeering basically
> follow the WILDMAT format, which is pretty reasonable
> in terms of yes/no/maybe or sure/no/yes sorts of filters.
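
The WILDMAT rule is small enough to state in code: comma-separated
wildcard patterns, "!" negates, and the last pattern that matches
decides. A minimal sketch using fnmatch-style matching:

```python
from fnmatch import fnmatchcase

def wildmat(patterns: str, group: str) -> bool:
    """Minimal WILDMAT: comma-separated wildcard patterns, '!'
    negates, and the LAST matching pattern wins (yes/no)."""
    verdict = False
    for pat in patterns.split(","):
        pat = pat.strip()
        negate = pat.startswith("!")
        if negate:
            pat = pat[1:]
        if fnmatchcase(group, pat):
            verdict = not negate
    return verdict
```

So "sci.*,!sci.math.*" accepts sci.physics and rejects the sci.math
hierarchy below it, which is the usual compeering shape.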
>
> https://www.eyrie.org/~eagle/software/inn/docs-2.6/newsfeeds.html
>
> https://www.eyrie.org/~eagle/software/inn/docs-2.6/libstorage.html
>
> https://www.eyrie.org/~eagle/software/inn/docs-2.6/storage.conf.html#S2
>
> It refers to the INND storage API token so I'll be curious about
> that and BFF. The tradspool format, here as it partitions under
> groups, is that BFF instead partitions under message-ID, that
> then groups files have pointers into those.
>
> message-id/
>
> id <- "id"
>
> hd <- "head"
> bd <- "body"
>
> td <- "thread", reference, references
> rd <- "replied to", touchfile
>
> ad <- "author directory", ... (author id)
> yd <- "year to date" (date)
>
> xd <- "expired", no-archive, ...
> dd <- "dead", "soft-delete"
> ud <- "undead", ...
>
> The files here basically indicate by presence then content,
> what's in the message, and what's its state. Then, the idea
> is that some markers basically indicate any "inconsistent" state.
>
> The idea is that the message-id folder should be exactly on
> the order of the message size, only. I.e. besides head and body,
> the other files are only presence indicators or fixed size.
> And, the presence files should be limited to fit in the range
> of the alphabet, as above it results single-letter named files.
>
> Then the idea is that the message-id folder is created on the
> side with id,hd,bd then just moved/renamed into its place,
> then by its presence the rest follows. (That it's well-formed.)
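
That build-aside-then-rename step looks about like this; the rename is
atomic on one filesystem, so a folder's presence implies id/hd/bd are
all in place (a sketch, with my own names):

```python
import os, tempfile

def ingest(store: str, message_id: str, head: bytes, body: bytes) -> str:
    """Write id/hd/bd into a scratch directory on the same filesystem,
    then rename into place, so presence implies well-formed."""
    final = os.path.join(store, message_id.strip("<>").replace("/", "_"))
    scratch = tempfile.mkdtemp(dir=store)
    for name, data in (("id", message_id.encode() + b"\n"),
                       ("hd", head), ("bd", body)):
        with open(os.path.join(scratch, name), "wb") as fp:
            fp.write(data)
    os.rename(scratch, final)  # atomic on one filesystem
    return final
```
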
>
> The idea here again is that the storage is just stored deflated already,
> with the idea that then as the message is served up with threading,
> where to litter the thread links, and whether to only litter the
> referring post's folder with the referenced post's ID, or that otherwise
> there's this idea that it's a poor-man's sort of write-once-read-many
> organization, that's horizontally scalable, then that any assemblage
> of messages can be overlaid together, then groups files can be created
> on demand, then that as far as files go, the natural file-system cache,
> caches access to the files.
>
> The idea that the message is stored compressed is that many messages
> aren't much read, and most clients support compressed delivery,
> and the common deflate format allows "stitching" together in
> a reference algorithm, what results the header + glue + body.
> This will save much space and not be too complicated to assemble,
> where compression and encryption are a lot of the time,
> in Internet protocols.
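
The stitching trick works because a raw-deflate sync flush ends its
blocks byte-aligned with BFINAL=0, so segments stored that way can be
byte-concatenated later without recompressing, with one empty final
block appended to close the stream. A sketch with zlib:

```python
import zlib

def segment(data: bytes) -> bytes:
    """Compress one piece as a restartable raw-deflate segment:
    the sync flush ends byte-aligned with BFINAL=0, so segments
    concatenate into one valid stream."""
    co = zlib.compressobj(wbits=-15)  # raw deflate, no zlib header
    return co.compress(data) + co.flush(zlib.Z_SYNC_FLUSH)

# an empty final block terminates the stitched stream
_FINAL = zlib.compressobj(wbits=-15).flush(zlib.Z_FINISH)

def stitch(*segments: bytes) -> bytes:
    """header + glue + body, assembled from stored compressed pieces."""
    return b"".join(segments) + _FINAL

head = segment(b"Subject: hi\r\n\r\n")
body = segment(b"hello, usenet\r\n")
```
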
>
> The message-id is part of the message, so there's some idea that
> it's also related to de-duplication under path, then that otherwise
> when two messages with the same message-id arrive but with otherwise
> different content, which is wrong, there's the question of what to do
> when there are conflicts in content.
>
> All the groups files basically live in one folder, then with regards
> to their overviews, as that it sort of results just a growing file,
> where the idea is that "fixed length records" pretty directly relate
> a simplest sort of addressing, in a world where storage has grown
> to be unbounded, if slow, that it also works well with caches and
> mmap and all the usual facilities of the usual general purpose
> scheduler and such.
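
The fixed-length-record addressing is just arithmetic: record i lives
at byte i * size, which is what suits mmap and the page cache. A sketch
(the field widths here are mine, purely illustrative):

```python
import struct

# one fixed-length record: message-id (64 bytes, NUL-padded),
# date (8 bytes, YYYYMMDD), offset of the full entry elsewhere
REC = struct.Struct("<64s8sQ")

def pack_record(mid: str, yyyymmdd: str, offset: int) -> bytes:
    return REC.pack(mid.encode()[:64], yyyymmdd.encode(), offset)

def record(buf: bytes, i: int):
    """Record i lives at byte i * REC.size: simplest addressing."""
    mid, date, off = REC.unpack_from(buf, i * REC.size)
    return mid.rstrip(b"\0").decode(), date.decode(), off
```
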
>
> Relating that to time-series data then and currency, is a key sort
> of thing, about here that the idea is to make for time-series
> organization that it's usually enough hierarchical YYYYMMDD,
> or for example YYMMDD, if for example this system's epoch
> is Jan 1 2000, with a usual sort of idea then to either have
> a list of message ID's, or, indices that are offsets to the group
> file, or, otherwise as to how to implement access in partition
> to relations of the items, for browsing and searching by date.
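
For the hierarchical date partition, a message's Date: header maps
straight to a YY/MM/DD path; the Jan 1 2000 epoch below is the one
mused about above, otherwise the shape is illustrative:

```python
from email.utils import parsedate_to_datetime

def date_partition(date_header: str, epoch_year: int = 2000) -> str:
    """YYMMDD-style hierarchical partition for a Date: header,
    years counted from an assumed system epoch of Jan 1 2000."""
    dt = parsedate_to_datetime(date_header)
    return f"{dt.year - epoch_year:02d}/{dt.month:02d}/{dt.day:02d}"
```
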
>
> Then it seems for authors there's a sort of "author-id" to get
> sorted, so that basically like threads is for making the
> set-associativity of messages and threads, and groups, to authors,
> then also as with regards to NOOBNB that there are
> New/Old/Off authors and Bot/Non/Bad authors,
> keeping things simple.
>
> Here the idea is that authors, who reply to other authors,
> are related variously, people they reply to and people who
> reply to them, and also the opposite, people who they
> don't reply to and people who don't reply to them.
> The idea is that common interest is reflected in replies,
> and that can be read off the messages, then also as
> for "direct" and "indirect" replies, either down the chain
> or on the same thread, or same group.
>
> (Cliques after Kudos and "Frenemies" after "Jabber",
> are about same, in "tendered response" and "tendered reserve",
> in groups, their threads, then into the domain of context.)
>
> So, the first part of SFF seems to be making OVERVIEW,
> which is usual key attributes, then relating authorships,
> then as about content. As well for supporting NNTP and IMAP,
> is for some default SFF supporting summary and retrieval.
>
> groups/group-id/
>
> ms <- messages
>
> <- overview ?
> <- thread heads/tails ?
> <- authors ?
> <- date ranges ?
>
> It's a usual idea that BFF, the backing file-format, and
> SFF, the search file-format, has that they're distinct
> and that SFF is just derived from BFF, and on-demand,
> so that it works out that search algorithms are implemented
> on BFF files, naively, then as with regards to those making
> their own plans and building their own index files as then
> for search and pointing those back to groups, messages,
> threads, authors, and so on.
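
Deriving OVERVIEW naively from BFF then looks about like this per
article: the standard overview order is number, Subject, From, Date,
Message-ID, References, :bytes, :lines, tab-separated, with tabs and
line breaks scrubbed out of the field values (a sketch, not a full
OVER implementation):

```python
def over_line(artnum: int, headers: dict, body: bytes) -> str:
    """One OVER/XOVER response line derived on demand from a
    message's headers and body."""
    def clean(v: str) -> str:
        # tabs and line breaks are not allowed inside overview fields
        return v.replace("\t", " ").replace("\r", " ").replace("\n", " ")
    fields = [str(artnum)]
    fields += [clean(headers.get(h, ""))
               for h in ("Subject", "From", "Date", "Message-ID", "References")]
    fields += [str(len(body)), str(body.count(b"\n"))]
    return "\t".join(fields)
```
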
>
>
> The basic idea of expiry or time-to-live is basically
> that there isn't one, yet, it's basically to result that
> the message-id folders get tagged in usual rotations
> over the folders in the arrival and date partitions,
> then marked out or expunged or what, as with regards
> to the write-once-read-many or regenerated groups
> files, and the presence or absence of messages by their ID.
> (And the state of authors, in time and date ranges.)
>
>
>


Re: Meta: a usenet server just for sci.math

<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156052&group=sci.math#156052

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!nntp.comgw.net!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!69.80.99.26.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sun, 11 Feb 2024 22:17:56 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <c9f99c83-2dc8-4bdd-9ccc-640978c065ef@googlegroups.com> <f77445f1-db51-425c-9167-1e0c8c95f091@googlegroups.com> <d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com> <6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com> <e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com> <4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sun, 11 Feb 2024 14:18:16 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
Lines: 138
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-EMXIOttO/fVofd0AMyTFVzNBjwQ1bfmsb8txKRiCgXtGCse8Uv1GXIYvqUMe4R4SI+nAM8+C5NpqVmj!ZVlVW2kzYEz9IWJV9QnQJI/bo5XffO8pAmObRTeDBBD4gf3H1d97l4NulKnuqEEtsfgVOmNM19gb
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Received-Bytes: 7735
 by: Ross Finlayson - Sun, 11 Feb 2024 22:18 UTC

So I'm looking at my hi-po C10K low-load/constant-load
Internet text protocol server, then with respect to
encryption and compression as usual, then I'm looking
to make that in the framework, to have those basically
be out-of-band, with respect to things like
encryption and compression, or things like
transport and HTTP or "upgrade".

I.e., the idea here is to implement the servers first
as "TLS-terminated" or un-encrypted, then, with
respect to having enough awareness in the protocol,
to make for adapting to encrypting and compressing
and upgrading front-ends, with regards to the
publicly-facing endpoints and the internally-facing
endpoints, which you would know about if you're
usually enough familiar with client-server frameworks
and service-oriented architecture and these kinds of
things.

The idea then is to offload the TLS-termination
to a sort of dedicated layer, then, as with regards
to a generic sort of "out-of-band" state machine,
for the establishment and maintenance of the connections,
where still I'm mostly interested in "stateful" or
"connection-oriented" protocols vis-a-vis the "datagram"
protocols, or about endpoints and sockets vis-a-vis
endpoints and datagrams, those usually enough sharing
an address family while varying in their transport (packets).

Then there's sort of whether to host TLS-termination
inside the runtime as usual, or next to it, either
in-process or out-of-process, similarly with
compression, and including for example concepts
of caching, and upgrade, and these sorts of things,
while keeping it so that the "protocol module" is
all self-contained and behaves according to protocol,
for the great facility of the standardization and deployment
of Internet protocols in a friendly sort of environment,
vis-a-vis the DMZ to the wider Internet, as basically with
the idea of only surfacing one well-known port and otherwise
abstracting away the rest of the box altogether,
to reduce the attack surface and its vectors, for
a usual goal of threat-modeling: reducing it.

So people would usually enough just launch a proxy,
but I'm mostly interested in supporting TLS and
perhaps compression in the protocol only as altogether
a pass-through layer, then as with regards to connecting
that as in-process as possible, so passing I/O handles,
otherwise with a usual notion of domain sockets
or just plain Address Family UNIX sockets.
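
As a sketch of that hand-off, assuming Java 16+'s `UnixDomainSocketAddress` (the socket path here is hypothetical, not from the original): a TLS-terminating front end passes already-decrypted protocol text to the back end over an AF_UNIX socket.

```java
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UnixHandoff {
    // One round-trip: the "front end" writes a decrypted protocol line
    // to the back end over an AF_UNIX socket; returns what arrived.
    static String handoffOnce(Path sock, String line) throws Exception {
        Files.deleteIfExists(sock);
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(sock);
        try (ServerSocketChannel backend = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            backend.bind(addr);
            try (SocketChannel front = SocketChannel.open(addr);
                 SocketChannel accepted = backend.accept()) {
                front.write(ByteBuffer.wrap(line.getBytes(StandardCharsets.US_ASCII)));
                ByteBuffer buf = ByteBuffer.allocate(512);
                accepted.read(buf);
                buf.flip();
                return StandardCharsets.US_ASCII.decode(buf).toString();
            }
        } finally {
            Files.deleteIfExists(sock);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handoffOnce(Path.of("/tmp/nntp-backend.sock"), "GROUP sci.math\r\n"));
    }
}
```

The same shape works for passing along the plaintext side of a STARTTLS upgrade, since nothing in the back end then has to know about TLS at all.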

There's basically the question of whether the publicly-facing
endpoint actually just serves on the usual un-encrypted port, for the
insensitive types of things, and the usual encrypted
port, or whether it's mostly in the protocol that
STARTTLS or "upgrade" occurs, "in-band" or "out-of-band",
and with respect to there usually being no notion at all
of STREAMS or "out-of-band" in STREAMS, sockets,
Address Family UNIX.

The usual notion here is making it like so:

NNTP
IMAP -> NNTP
HTTP -> IMAP -> NNTP

for a Usenet service, then, with respect to
there being such high affinity with SMTP, and
with regards to HTTP more generally as
the most usual fungible de facto client-server
protocol, is connecting those locally after
TLS-termination, while still having the TLS layer
between the Internet and the server.

So in this high-performance implementation it
sort of relies directly on the commonly implemented
and ubiquitously available non-blocking I/O of
the runtime, here as about keeping it altogether
simple, with respect to the process model,
and the runtime according to the OS/virt/scheduler's
login and quota and bindings, and back-end,
that in some runtimes like an app-container,
that's supposed to live all in-process, while with
respect to off-loading load to right-sized resources,
it's sort of general.

Then I've written this mostly in Java and plan to
keep it this way, where the Direct Memory for
the service of non-blocking I/O is pretty well
understood, vis-a-vis actually just writing this
closer to the user-space libraries, here as with
regards to usual notions of cross-compiling and
so on. Here it's kind of simplified because this
entire stack has no dependencies outside the
usual Virtual Machine, it compiles and runs
without a dependency manager at all, though
it does get involved in parsing the content,
while simply the framework of ingesting, storing,
and moving blobs is just damn fast, and
very well-behaved in the resources of the runtime.
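
A minimal sketch of that setup, just the JDK and no dependencies: a `Selector`, a non-blocking `ServerSocketChannel`, and Direct Memory for the I/O buffers. This is only the registration skeleton, not the server's actual event loop.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class NioSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Direct (off-heap) memory for the service of non-blocking I/O,
        // reused across reads rather than allocated per connection.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
        System.out.println(buffer.isDirect());

        // A real loop would selector.select(), then accept/read/write
        // per ready key; here we just show the registration is in place.
        System.out.println(selector.keys().size());

        server.close();
        selector.close();
    }
}
```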

So, setting up TLS termination for these sorts of
protocols, where the protocol either does or
doesn't have an explicit STARTTLS up front
or always just opens with the handshake,
basically has me looking at how to
instrument and connect that for the Hopper
as above, and, besides passing native
file and I/O handles and buffers, what least
is needful to result a useful approach for TLS on/off.

So, this is a sort of approach, figuring for
"nesting the protocols", where similarly is
the goal of having the fronting of the backings,
sort of like so, ...

NNTP
IMAP -> NNTP
HTTP -> NNTP
HTTP -> IMAP -> NNTP

with the front being in the protocol, then
that HTTP has a sort of normative protocol
for the IMAP and NNTP protocols, and IMAP
has one for the NNTP protocol, treating groups
like mailboxes, and commands as under the usual
sorts of HTTP verbs and resources.
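
A sketch of that fronting, with a hypothetical resource scheme `/groups/{group}[/{number}]` (the paths and the mapping itself are assumptions for illustration, not any standard):

```java
public class HttpToNntp {
    // Map an HTTP request line onto NNTP commands, treating groups as
    // resources the way the post treats groups like mailboxes.
    // The /groups/... scheme is hypothetical.
    static String toNntp(String method, String path) {
        String[] parts = path.split("/"); // "", "groups", group, [number]
        if ("GET".equals(method) && parts.length == 3 && "groups".equals(parts[1]))
            return "GROUP " + parts[2];
        if ("GET".equals(method) && parts.length == 4 && "groups".equals(parts[1]))
            return "GROUP " + parts[2] + "\r\nARTICLE " + parts[3];
        throw new IllegalArgumentException("no mapping: " + method + " " + path);
    }

    public static void main(String[] args) {
        System.out.println(toNntp("GET", "/groups/sci.math"));      // GROUP sci.math
        System.out.println(toNntp("GET", "/groups/sci.math/1234")); // GROUP + ARTICLE
    }
}
```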

Similarly, the same server can just serve each of
the relevant protocols on each of the relevant ports.

If you know these things, ....

Re: Meta: a usenet server just for sci.math

<8M6cnT0jLY_g9Ff4nZ2dnZfqn_qdnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156072&group=sci.math#156072

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!news.nntp4.net!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 12 Feb 2024 18:54:53 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<c9f99c83-2dc8-4bdd-9ccc-640978c065ef@googlegroups.com>
<f77445f1-db51-425c-9167-1e0c8c95f091@googlegroups.com>
<d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com>
<6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com>
<e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Mon, 12 Feb 2024 10:55:13 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <8M6cnT0jLY_g9Ff4nZ2dnZfqn_qdnZ2d@giganews.com>
Lines: 53
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-DVh5ZQnh4AYEY19/tFYQ5Mq2SVTQUTPZ2nxTNvMVJvZeqXxfT1N/cFkfyBFzb0rC6BI/PoZTR57MMa8!IGeSawRlKB34csjibbir3hNYITuRWl5Pj00Ms2tzaxT8rIkmmkmmsKeGPFnr0CHmsK30O9r/Zy2i!Iw==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Mon, 12 Feb 2024 18:55 UTC

Looking at how Usenet moderated groups operate,
well, first there's PGP and control messages, then
later it seems there's this sort of Stump/Webstump
setup, or as with regards to moderators.isc.org,
what is usual with regards to control messages
and usual notions of control and cancel messages,
and as with regards to newsgroups that actually
want to employ Usenet moderation sort of standardly.

(Usenet trust is mostly based on PGP, or
Philip Zimmermann's 'Pretty Good Privacy',
though there are variations and over time.)

http://tools.ietf.org/html/rfc5537

http://wiki.killfile.org/projects/usenet/faqs/nam/

Reading into RFC 5537 gets into some detail, like
limits in the header fields with respect to References
or threads:

https://datatracker.ietf.org/doc/html/rfc5537#section-3.4.4

https://datatracker.ietf.org/doc/html/rfc5537#section-3.5.1

So, the agents are described as

Posting
Injecting
Relaying
Serving
Reading

Moderator
Gateway

then with respect to these sorts of separations of duties,
the usual notions of Internet protocols, their agents
and behavior in the protocol, old IETF MUST/SHOULD/MAY
and so on.

So, the goal here seems to be to define a
profile of "connected core services" of sorts
of Internet protocol messaging, then this
"common central storage" of this BFF/SFF,
and then reference implementations then
for reference editions, these sorts of things.

Of course there already is one, it's called
"Internet mail and news".

Re: Meta: a usenet server just for sci.math

<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156114&group=sci.math#156114

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Wed, 14 Feb 2024 19:03:43 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<c9f99c83-2dc8-4bdd-9ccc-640978c065ef@googlegroups.com>
<f77445f1-db51-425c-9167-1e0c8c95f091@googlegroups.com>
<d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com>
<6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com>
<e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Wed, 14 Feb 2024 11:03:46 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
Lines: 523
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-PCp32NgQpPH+DG6smKvGoKE6Z5GcdxSv1x92U36fsYf1nDq9fLREh+D+Px++l5qD87pp7cxR5AVQjDw!0XazAEER/TI8EDH2OI3sEupR43wDF7z2IfZDvxrkj4FDPsQAv8B2CvPa9Yn9zXrAybc00eGgVfFJ
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Wed, 14 Feb 2024 19:03 UTC

On 02/11/2024 02:18 PM, Ross Finlayson wrote:
> [...]

So one thing I want here is to make it so that data can
be encrypted very weakly at rest, then, that the SSL
or TLS, for TLS 1.2 or TLS 1.3, results in the symmetric
key bits for the records always being the same as this
very-weak key.

This way pretty much the entire CPU load of TLS is
eliminated, while still the data is encrypted very-weakly
which at least naively is entirely inscrutable.

The idea is that in TLS 1.2 there's this

client random cr ->
<- server random sr
client premaster cpm ->

these going into PRF(cpm, "master secret", cr + sr), taking
the first 48 bytes, then whether renegotiation keeps the
same client random and client premaster, then that the
server can compute the server random to make it so the
very-weak key is derived, or for example whatever results
in least effort.

Maybe not, sort of depends.

Then TLS 1.3 has this HKDF, the HMAC-based Key Derivation
Function; it can again provide a salt or server random, then as with
regards to that filling out in the algorithm to result in the
very-weak key, for a least-effort block cipher that's also
zero-effort, being a pass-through no-op, so the block cipher
stays out of the way of the data already concatenably-compressed
and very-weakly encrypted at rest.

Then it looks like I'd be trying to make hash collisions, which
is practically intractable, about what goes into the seeds,
whether it can result in things like "the server random is
zero minus the client random, their sum is zero" and
this kind of thing.

I suppose it would be demonstrative to set up a usual
sort of "TLS man-in-the-middle" Mitm just to demonstrate
that, given the client trusts any of Mitm's CAs and the
server trusts any of Mitm's CAs, Mitm sits in the middle
and can intercept all traffic.

So, the TLS 1.2 PRF, or pseudo-random function, is as of
"a secret, a seed, and an identifying label". It's all SHA-256
in TLS 1.2. Then it's iterative over the seed: A(0) is the
seed, A(i) is HMAC(secret, A(i-1)), and each round's
HMAC(secret, A(i) ++ seed) is concatenated ++ until there are
enough bytes to result in the key material. Then in TLS the seed
is defined as label ++ seed, so, to figure out how to make it
so that "master secret" ++ (client random + server random) makes it
possible to make a spigot of the hash algorithm, of zeros,
or an initial segment long enough for all key sizes,
to split out of that the server write MAC and encryption keys,
then to very-weakly encrypt the data at rest with that.
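
The iteration just described, P_SHA256 from RFC 5246, can be written directly against the JDK's HMAC; a sketch for inspecting what key material falls out of given randoms, not a full TLS key schedule:

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class Tls12Prf {
    // P_SHA256 (RFC 5246, section 5): A(0) = seed, A(i) = HMAC(secret, A(i-1));
    // output is HMAC(secret, A(1) ++ seed) ++ HMAC(secret, A(2) ++ seed) ++ ...
    static byte[] pSha256(byte[] secret, byte[] seed, int length) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] out = new byte[length];
        byte[] a = seed;
        int off = 0;
        while (off < length) {
            a = hmac.doFinal(a);               // A(i)
            hmac.update(a);
            byte[] block = hmac.doFinal(seed); // HMAC(secret, A(i) ++ seed)
            int n = Math.min(block.length, length - off);
            System.arraycopy(block, 0, out, off, n);
            off += n;
        }
        return out;
    }

    // PRF(secret, label, seed) = P_SHA256(secret, label ++ seed); the master
    // secret is PRF(premaster, "master secret", cr ++ sr), first 48 bytes.
    static byte[] prf(byte[] secret, String label, byte[] seed, int length) throws Exception {
        byte[] lab = label.getBytes(StandardCharsets.US_ASCII);
        byte[] labelSeed = new byte[lab.length + seed.length];
        System.arraycopy(lab, 0, labelSeed, 0, lab.length);
        System.arraycopy(seed, 0, labelSeed, lab.length, seed.length);
        return pSha256(secret, labelSeed, length);
    }
}
```

With this in hand one can experiment with whether any choice of server random steers the derived key material anywhere useful, which, as noted, amounts to finding hash collisions.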

Then the client would still be sending up with the client
MAC and encryption keys, about whether it's possible
to set up part of the master key or the whole thing.
Whether a client could fabricate the premaster secret
so that the data resulted very-weakly encrypted on its
own terms doesn't seem feasible, as the client random
is sent first, but cooperating could help make it so,
with regards to the client otherwise picking a weak
random secret overall.


Re: Meta: a usenet server just for sci.math

<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156224&group=sci.math#156224

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!paganini.bofh.team!3.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!69.80.99.22.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 17 Feb 2024 19:38:30 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <f77445f1-db51-425c-9167-1e0c8c95f091@googlegroups.com> <d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com> <6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com> <e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com> <4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sat, 17 Feb 2024 11:38:33 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
Lines: 64
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-wLSnnDnPOBBQJuGmuMHYDJszVmwXo6FMnz24cxF1NRqehLfIlh3H7xqkYES5akXbGGjpUm51pSX03SZ!/Ux8rwvxkl0kRWf4/qkui0Jw7ct8Kl0Nr8pX7WSQOljAtTJuYGHbBuSFkWAf7CH5mxfN7aP97TeQ!kA==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Received-Bytes: 5152
 by: Ross Finlayson - Sat, 17 Feb 2024 19:38 UTC

"Search", then, here the idea is to facilitate search, variously.

SEARCH: it's an HTTP verb, with an indicated request body.
What are its semantics? It's undefined, just a request/response
with a request body.

SEARCH: it's an IMAP command.

WILDMAT: sometimes "find" is exactly the command that's
run on file systems, and its predicates are similar to
WILDMAT, as with regards to match/dont/match/dont/...,
about what are "accepter/rejector networks", for the usual
notions of formal automata of the accepter and rejector,
and the binary propositions that result in match/dont,
with regards usually to the relation called "match".
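
That last-match-wins chain of match/dont can be sketched as follows, covering only the `*` and `?` wildcards and `!` negation of the NNTP wildmat (RFC 3977); a sketch, not a full implementation:

```java
import java.util.regex.Pattern;

public class Wildmat {
    // One wildmat element: '*' matches any run, '?' any single character.
    static boolean matchOne(String pat, String s) {
        StringBuilder re = new StringBuilder();
        for (char c : pat.toCharArray()) {
            if (c == '*') re.append(".*");
            else if (c == '?') re.append('.');
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return s.matches(re.toString());
    }

    // Comma-separated elements, '!' negates, the last matching element
    // wins: the accepter/rejector chain of match/dont/match/dont.
    static boolean matches(String wildmat, String s) {
        boolean matched = false;
        for (String pat : wildmat.split(",")) {
            boolean negate = pat.startsWith("!");
            if (negate) pat = pat.substring(1);
            if (matchOne(pat, s)) matched = !negate;
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(matches("comp.*,!comp.sys.*", "comp.lang.java")); // true
        System.out.println(matches("comp.*,!comp.sys.*", "comp.sys.mac"));   // false
    }
}
```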

After BFF, a sort of normative file format with the
property of being concatenable, resulting in set-like
semantics, is the idea that "SFF" or "search file format"
is for _summaries_ and _digests_ and _intermediate_
forms, what result in data that's otherwise derived from
"the data", derived on demand or cached opportunistically,
about the language of "Information Retrieval", after the
language of "summary" and "digest".

The word "summary" basically reflects on statistics,
that a "summary statistic" in the otherwise memoryless,
like a mean, is for histograms and for "match", about
making what is summary data.

For some people the search corpus is indices, for
something like the open-source search engines,
which are just runtimes that have the usual sorts of
binary data structures for log N lookups;
here though the idea is a general form as for
"summary", that is tractable as files, then what
can be purposed to being inputs to the usual sorts of
"key-value" or "content", "hits", in documents.

For some people the search corpus is the fully-normalized
database, then all sorts of usual queries that result in
denormalized data and summaries and the hierarchical
and these kinds of things.

So, here the sort of approach is for the "Library/Museum",
about the "Browse, Exhibits, Tours, Carrels", that search
and summary and digest and report is a lot of different
things, with the idea that "SFF" files, generally, make it
sensible, fungible, and tractable, how to deal with all this.

It's not really part of "NNTP, IMAP, HTTP", yet at the same
time it's a very generic sort of thing, here with the idea
that by designing some reference algorithms that result in
making partially digested summary with context,
those just being concatenable, then the usual
idea of the Search Query being Yes/No/Maybe or Sure/No/Yes,
that being about the same as Wildmat, for various attributes
and content, and the relations in documents and among them,
gets into these ideas about how tooling generally results,
making for files that then have simple algorithms that
work on them, variously repurposable to compiled indices
for the usual "instant gratification" types.

Re: Meta: a usenet server just for sci.math

<Ji-dnWkHsp47xE_4nZ2dnZfqnPH2V6Hs@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156255&group=sci.math#156255

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sun, 18 Feb 2024 19:25:26 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com>
<6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com>
<e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sun, 18 Feb 2024 11:25:29 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <Ji-dnWkHsp47xE_4nZ2dnZfqnPH2V6Hs@giganews.com>
Lines: 221
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-243CocZsrED5Py/MFA7XFHVdbheoKAniDXQxq5JDNkKhNau/Dvcnl5S0vZb0+oVBsfMlCDe6Ff46OXr!vEMD+POqkxwrDuN/bspTUp0omInfejbYfwGmoXNajPMpZxbW/iyKqGYv/LB2K/tEt2jelQOOvz2q!OQ==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Sun, 18 Feb 2024 19:25 UTC

On 02/17/2024 11:38 AM, Ross Finlayson wrote:
> [...]

So, if Luhn kind of started "automatic content analysis",
then I wonder after "standardized content analysis",
and there is some, from the European Union as you might
imagine, those great croons to harmonisation.

https://ecrea.eu/page-18206/12952085

Then it seems there are notions of "content analysis",
where here the concept of "SFF" is "content summary
statistics, in fungible composable data structures
with embedded attributes", then that "content
analysis" after that is subjective, for each of
various objectives.

So, first it seems presence indicators, where,
the granularity here is basically the document,
or that each post is a document, then with
regards to internally within document,
contexts in those.

"Contexts their content", then, basically gets
into surfacing document IDs as attributes,
then as with regards to threads and so on,
that those are larger documents, groups,
and so on, those being related and associated,
about structural attributes, then as with
regards to quantitative attributes, then
as with regards to qualitative attributes.

Y. Zhang's "Qualitative Analysis of Content",
"cited by 4352", is a nice sort of reading,
Zhang and Wildemuth 2009. https://www.ischool.utexas.edu/yanz/

"... Schamber (1991) ..."
"Theory saturation was achieved as mentions
of criteria became increasingly redundant."

https://www.csescienceeditor.org/article/working-toward-standards-for-plain-language-summaries/

So, if Luhn started, ....

https://courses.ischool.berkeley.edu/i256/f06/papers/luhn58.pdf

"Statistical information [summary] derived from
word frequency and distribution is used by the
machine to compute [...] the ''auto-abstract''."

So, significant words in a sentence, each not more than
four words away from another significant word,
indicate significance.

(via
https://blog.fastforwardlabs.com/2016/03/25/h.p.-luhn-and-the-heuristic-value-of-simplicity.html
)

"[Latent Dirichlet Allocation] borrows Luhn's basic insight ...."

(Here Dirichlet would often refer to the pigeonhole principle,
or the Dirichlet problem,
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation .
"Topic modeling is a classic solution to the problem
of information retrieval using linked data and
semantic web technology ...."
)
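Luhn's clustering rule can be sketched directly; here `significant` is assumed to be a precomputed set of mid-frequency words, an illustrative input rather than anything from Luhn's paper:

```python
def luhn_score(sentence_words, significant, gap=4):
    """Luhn's heuristic: bracket runs of significant words
    separated by at most `gap` insignificant words, and score
    each cluster as (significant count)^2 / cluster span."""
    best = 0.0
    start = None      # index where the current cluster begins
    last_sig = None   # index of the most recent significant word
    count = 0         # significant words in the current cluster
    for i, w in enumerate(sentence_words):
        if w in significant:
            if start is not None and i - last_sig > gap:
                # close the previous cluster and start a new one
                span = last_sig - start + 1
                best = max(best, count * count / span)
                start, count = i, 0
            if start is None:
                start, count = i, 0
            last_sig = i
            count += 1
    if start is not None:
        span = last_sig - start + 1
        best = max(best, count * count / span)
    return best
```

The sentence's score is then the best cluster's score, and the auto-abstract keeps the top-scoring sentences.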

There's a usual idea of making a histogram of words
for any document, here with the idea for something
like ye olde Quikview, which extracts first the text and
character data from any content, and maybe its source
line or addressing, then results in a histogram of
the words: this is a sort of fundamental unit
of summary, usually just an intermediate
result that's discarded after "greater indexing",
but here the idea is that any corpus in BFF
results in any kind of effort resulting in SFF, which is
pretty usual.
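A minimal sketch of that fundamental unit, a counts-map whose instances concatenate by addition, which is what makes it fungible in the SFF sense:

```python
from collections import Counter
import re

def counts_map(text):
    """Histogram of words for a document: the fundamental
    summary unit, concatenable with other histograms."""
    return Counter(re.findall(r"[\w']+", text.lower()))
```

Since Counter addition is the concatenation, summaries of the parts sum to the summary of the whole corpus.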

Then, the power of words is basically for relating
words to words, and words that are in the same
content, and variously their meanings, and then
figuring out which words are just meaningless phrases
or style, and which are meaningful phrases or compounds:
these are the kinds of things about relating documents
and their topics, according to the words in the content.

There is a usual notion of inversion, well of course there
are lots, yet here one idea is that sometimes somebody
says a word once and that's its great significance,
otherwise someone uses a word all the time and it loses it,
about here these kinds of things, and the work that
goes into computing for both ways, so that the sort
patterns have enough summary data
to result in valid summaries under either inversion.

"Summary" here of course is both the quantitative
about statistical summaries of statistics, those being
statistics, and qualitative as about "terms that relate".

Most search engines are "search and hide engines",
here that's left off as this is "SFF raw" as it were.

https://en.wikipedia.org/wiki/Tf%E2%80%93idf

Term frequency / Inverse Document Frequency
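A sketch of one common tf-idf weighting; there are several variants, and this particular idf form is an illustrative choice, not the only definition:

```python
import math
from collections import Counter

def tf_idf(term, doc_words, corpus):
    """Term frequency in one document, discounted by how
    many documents of the corpus contain the term at all."""
    tf = Counter(doc_words)[term] / len(doc_words)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf
```

A term appearing in every document scores zero, while a term concentrated in few documents scores high: Luhn's "significance" made quantitative.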

That seems pretty interesting. I haven't been
studying my information theory for a while,
after signals theory and Wolfowitz and Goldman
and so on in information theory.

So, it's pretty clear that the document summary
begins with header attributes then gets into the
content, for the granularity of the locality of
attaching summary and SFF to the location of
a document, so that ranging over the corpus
is the natural operation of ranging over the
content and its derivative data, in this write-once-read-many
approach with the discardable being what is derivable.

The histogram then, or, for closed categories,
is just the counts by the categories, a "counts-map";
then for words, here words establish their
identity by their content: there isn't yet any notion
of attaching words or hittables to languages or
dictionaries, though it's about the most usual
thing that the documents build their languages,
with the most usual and immediate
definitions being associations in the texts themselves,
and according to the inference of availability in time,
that definition evolves over time, indicated by
introduction, then about how to basically work
up, for natural language parsing, terms that are
introduced variously, and how to result them a definition.

Then the source, or its link, with the same concept
as parsing any kind of source language: as
character data it's got a line and character number,
in regions inside the source, usually linear, here
for quite simple documents with a representation
as lines of text, vis-a-vis semantic or graphical placement.
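For that line-and-character addressing, a small sketch that stamps each extracted word with its source location (the function name and tuple shape are illustrative):

```python
def word_offsets(text):
    """Address each word by (line, column) in the source,
    for linking summary entries back to their location."""
    out = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        col = 0
        for word in line.split():
            col = line.index(word, col)       # find word at/after col
            out.append((word, lineno, col + 1))
            col += len(word)
    return out
```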


Re: Meta: a usenet server just for sci.math

<HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156257&group=sci.math#156257

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sun, 18 Feb 2024 20:14:46 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<d301abd4-28ad-4a1f-a7d7-b3d1b6e0fda8@googlegroups.com>
<6d9b97df-542e-483e-9793-0993781ca2aa@googlegroups.com>
<e4dab628-6f94-454f-b965-d06b17f652a3@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sun, 18 Feb 2024 12:14:49 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>
Lines: 106
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-InToT/+fHfjZYglbFi6p/aXZk+VetAUH81qaZEJ9J+2fJsI2BeG3IiN5D5/VKtieHGKmcYNJ+23GTp1!ZL8EtbHVgZh3kR2mp/X4ujH7dZ3PSINc5i2VrVHFPc1SIawSVCs5ld3BuufghUVJkYG2dyhFb80h!0Q==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Sun, 18 Feb 2024 20:14 UTC

On 02/17/2024 11:38 AM, Ross Finlayson wrote:
> "Search", then, here the idea is to facilitate search, variously.
>
> SEARCH: it's an HTTP verb, with an indicate request body.
> What are its semantics? It's undefined, just a request/response
> with a request body.
>
> SEARCH: it's an IMAP command.
>
> WILDMAT: sometimes "find" is exactly the command that's
> running on file systems, and its predicates are similar with
> WILDMAT, as with regards to match/dont/match/dont/...,
> about what is "accepter/rejector networks", for the usual
> notions of formal automata of the accepter and rejector,
> and the binary propositions what result match/dont,
> with regards usually to the relation called "match".
>
> After BFF a sort of "a normative file format with the
> properties of being concatenable resulting set-like
> semantics", is the idea that "SFF" or "search file format"
> is for _summaries_ and _digests_ and _intermediate_
> forms, what result data that's otherwise derived from
> "the data", derived on demand or cached opportunistically,
> about the language of "Information Retrieval", after the
> language of "summary" and "digest".
>
> The word "summary" basically reflects on statistics,
> that a "summary statistic" in otherwise the memoryless,
> like a mean, is for histograms and for "match", about
> making what is summary data.
>
> For some people the search corpus is indices, for
> something like the open-source search engines,
> which are just runtimes that have usual sorts
> binary data structures for log N lookups,
> here though the idea is a general form as for
> "summary", that is tractable as files, then what
> can be purposed to being inputs to usual sorts
> "key-value" or "content", "hits", in documents.
>
> For some people the search corpus is the fully-normalized
> database, then all sorts usual queries what result
> denormalized data and summaries and the hierarchical
> and these kinds things.
>
> So, here the sort of approach is for the "Library/Museum",
> about the "Browse, Exhibits, Tours, Carrels", that search
> and summary and digest and report is a lot of different
> things, with the idea that "SFF" files, generally, make it
> sensible, fungible, and tractable, how to deal with all this.
>
> It's not really part of "NNTP, IMAP, HTTP", yet at the same
> time, it's a very generic sort of thing, here with the idea
> that by designing some reference algorithms that result
> making partially digested summary with context,
> those just being concatenable, then that the usual
> idea of the Search Query being Yes/No/Maybe or Sure/No/Yes,
> that being about same as Wildmat, for variously attributes
> and content, and the relations in documents and among them,
> gets into these ideas about how tooling generally results,
> making for files what then have simple algorithms that
> work on them, variously repurposable to compiled indices
> for usual "instant gratification" types.
>
>

Well, for extraction and segmentation, what's
involved is a model of messages and
then a sort of model of MIME, with
regards to "access-patternry", then for
extraction and characterization and
segmentation and elision, these kinds of
things that result the things.

Extraction is sort of after message attributes
or the headers, then the content encoding and
such, then as with regards to the embedding
of documents in otherwise the document.

Characterization here really reflects on character
encodings, with the idea that a corpus of words
has a range of an alphabet, and that these days
of all the code pages and glyph-maps of the world,
the members of alphabets indicate,
for any given textual representation as character data,
matches to the respective code pages or planes
or regions of Unicode, these days, with respect
to legacy encodings and such.

So, extraction and characterization then get
into quite usual patterns of language, with things
like punctuation and syntax, bracketing and groupings,
commas and joiners and separators, the parenthetical,
comments, quoting, and these kinds of things, in
quite most all usual languages.

For message formats and MIME, then, and content encoding
then extraction, characterization after alphabet and
punctuation gets pretty directly into the lexical,
syntax, and grammar, with regards to texts.
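The first extraction step, headers apart from body, can be sketched for plain RFC 822-style messages; this is a simplified reading that unfolds folded headers and ignores MIME multiparts:

```python
def extract(message_text):
    """Split a plain RFC 822-style message into a dict of
    lowercased, unfolded headers and the body text."""
    head, _, body = message_text.partition("\n\n")
    headers = {}
    last = None
    for line in head.splitlines():
        if line[:1] in (" ", "\t") and last:
            # folded continuation line: append to previous header
            headers[last] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip().lower()
            headers[last] = value.strip()
    return headers, body
```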

"Theory saturation ...."

Re: Meta: a usenet server just for sci.math

<FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com>

https://news.novabbs.org/tech/article-flat.php?id=156268&group=sci.math#156268

Newsgroups: sci.math
Subject: Re: Meta: a usenet server just for sci.math
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sun, 18 Feb 2024 19:00:20 -0800
 by: Ross Finlayson - Mon, 19 Feb 2024 03:00 UTC

On 02/18/2024 12:14 PM, Ross Finlayson wrote:
> On 02/17/2024 11:38 AM, Ross Finlayson wrote:
>> [...]
>
> [...]

It seems like Gert Webelhuth has a good book called
"Principles and Parameters of Syntactic Saturation",
which discusses linguistics pretty thoroughly.

global.oup.com/academic/product/principles-and-parameters-of-syntactic-saturation-9780195070415?cc=us&lang=en&
books.google.com/books?id=nXboTBXbhwAC

Reading about this notion of "saturation", on the one
hand it seems to indicate lack of information, on the
other hand it seems to be capricious selective ignorance.

www.tandfonline.com/doi/full/10.1080/23311886.2020.1838706
doi.org/10.1080/23311886.2020.1838706
Saturation controversy in qualitative research: Complexities and
underlying assumptions. A literature review
Favourate Y. Sebele-Mpofu

Here it's called "censoring samples", which is often enough
with respect to "outliers". Here it's also called "retro-finitist".
The author details that it's a big subjective mess and, from a
statistical-design sort of view, it's not saying much.

Here this is starting a bit simpler with for example a sort of
goal to understand annotated and threaded plain text
conversations, in the usual sort of way of establishing
sequence, about the idea for relational algebra, to be
relating posts and conversations in threads, in groups
in time, as with regards to simple fungible BFF's, as
with regards to simple fungible SFF's, what result highly
repurposable presentation, via storage-neutral means.

It results sort of bulky to start making the in-place
summary file formats, with regards to, for example,
the resulting size of larger summaries, yet at the same
time, the extraction and segmentation, after characterization,
and elision:

extraction: headers and body
characterization: content encoding
extraction: text extraction
segmentation: words are atoms, letters are atoms, segments are atoms
elision: hyphen-ization, 1/*comment*/2

then has for natural sorts bracketing and grouping,
here for example as with paragraphs and itemizations,
for the plainest sort of text having default characterization.
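The elision step above, rejoining words hyphenated across line breaks and dropping comment-like spans, can be sketched as follows; the particular regexes are illustrative assumptions:

```python
import re

def segment(text):
    """Segmentation with elision: rejoin words hyphenated
    across line breaks, drop /*comments*/, split to atoms."""
    text = re.sub(r"-\n\s*", "", text)                  # hyphen-ization
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)   # 1/*comment*/2
    return text.split()
```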

In this context it's particularly attribution which is a content
convention, the "quoting depth" character, for example,
in a world of spaces and tabs, with regards to enumerating
branches, what result relations that are to be summarized
together, and apart. I.e. there's a notion with the document,
that often enough the posts bring their own context,
for being self-contained, in the threaded organization,
and how best to guess attribution, given good-faith attribution,
in the most usual sorts of contexts, of plain-text extraction.
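Guessing quoting depth from the ">" convention, tolerating the space-and-tab variants mentioned, might look like:

```python
def quote_depth(line):
    """Count leading '>' markers, tolerating whitespace
    between them; return (depth, remaining text)."""
    depth = 0
    rest = line
    while True:
        rest = rest.lstrip(" \t")
        if rest.startswith(">"):
            depth += 1
            rest = rest[1:]
        else:
            return depth, rest
```

Lines grouped by equal depth then summarize together, and apart from the other depths.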

Then, SEARCH here is basically that "search finds hits",
or what matches, according to WILDMAT and IMAP SEARCH
and variously Yes/No/Maybe as a sort of WILDMAT search,
then for _where_ it finds hits, here in the groups', the threads',
the authors', and the dates', for browsing into those variously.

That speaks to a usual form of relation for navigation,

group -> threads
thread -> authors
author -> threads
date -> threads

and these kinds of things, about the many relations that
in summary are all derivable from the above described BFF
files, which are plain messages files with dates linked in from
the side, threading indicated in the message files, and authors
linked out from the messages.
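Those relations, all derivable from the flat message files, can be sketched as built from plain post records; the record fields here are assumptions about the BFF layout:

```python
from collections import defaultdict

def navigation(posts):
    """Derive the navigation relations (group -> threads,
    thread -> authors, author -> threads, date -> threads)
    from flat post records; all rederivable from BFF."""
    rel = {k: defaultdict(set) for k in
           ("group_threads", "thread_authors",
            "author_threads", "date_threads")}
    for p in posts:  # p: dict with group, thread, author, date
        rel["group_threads"][p["group"]].add(p["thread"])
        rel["thread_authors"][p["thread"]].add(p["author"])
        rel["author_threads"][p["author"]].add(p["thread"])
        rel["date_threads"][p["date"]].add(p["thread"])
    return rel
```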

I.e., here the idea then for content is that specific mentions
of technical words basically relate to a "tag cloud", about
finding related messages, authors, threads, and groups,
among the things.

Re: Meta: a usenet server just for sci.math

<UA6dncqEGcKv70j4nZ2dnZfqnPGL6btP@giganews.com>

https://news.novabbs.org/tech/article-flat.php?id=156337&group=sci.math#156337

Newsgroups: sci.math
Subject: Re: Meta: a usenet server just for sci.math
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Tue, 20 Feb 2024 19:47:07 -0800
 by: Ross Finlayson - Wed, 21 Feb 2024 03:47 UTC

About a "dedicated little OS" to run a "dedicated little service".

"Critix"

1) some boot code
power-on self-test, EFI/UEFI, certificates and boot

2) a virt model / a machine model
maybe running in a virt
maybe running on metal

3) a process/scheduler model
it's processes, a process model
goal is, "some of POSIX"

Resources

Drivers

RAM
Bus
USB, ... serial/parallel, device connections, ....
DMA
framebuffer
audio dac/adc

Disk

hard
memory
network

Login

identity
resources

Networking

TCP/IP stack
UDP, ...
SCTP, ...
raw, ...

naming

Windowing

"video memory and what follows SVGA"
"Java, a plain windowing VM"

PCI <-> PCIe

USB 1/2 USB 3/4

MMU <-> DMA

Serial ATA

NIC / IEEE 802

"EFI system partition"

virtualization model
emulator

clock-accurate / bit-accurate
clock-inaccurate / voltage

mainboard / motherboard
circuit summary

emulator environment

CPU
main memory
host adapters

PU's
bus

I^2C

clock model / timing model
interconnect model / flow model
insertion model / removal model
instruction model

Re: Meta: a usenet server just for sci.math

<v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>

https://news.novabbs.org/tech/article-flat.php?id=156339&group=sci.math#156339

Newsgroups: sci.math
Subject: Re: Meta: a usenet server just for sci.math
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Tue, 20 Feb 2024 20:38:35 -0800
 by: Ross Finlayson - Wed, 21 Feb 2024 04:38 UTC

Alright then, about the SFF, "summary" file-format,
"sorted" file-format, "search" file-format, the idea
here is to figure out normal forms of summary,
that go with the posts, with the idea that "a post's
directory is on the order of contained size of the
size of the post", while, "a post's directory is on
a constant order of entries", here is for sort of
summarizing what a post's directory looks like
in "well-formed BFF", then as with regards to
things like Intermediate file-formats as mentioned
above here with the goal of "very-weakly-encrypted
at rest as constant contents", then here for
"SFF files, either in the post's directory or
on the side, and about how links to them get
collected to directories in a filesystem structure
for the conventions of the concatenation of files".

So, here the idea so far is that BFF has a normative
form for each post, which has a particular opaque
globally-universal unique identifier, the Message-ID,
then that the directory looks like MessageId/ then its
contents were as these files.

id hd bd yd td rd ad dd ud xd
id, header, body, year-to-date, thread, referenced, authored, dead,
undead, expired

or just files named

i h b y t r a d u x

which according to the presence of the files and
their contents, indicate that the presence of the
MessageId/ directory indicates the presence of
a well-formed message, contingent not being expired.
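A sketch of checking that a MessageId/ directory indicates a well-formed message; the exact semantics of d (dead), u (undead), and x (expired) are read off the listing above, and the undead-overrides-dead rule is an assumption:

```python
import os

REQUIRED = {"i", "h", "b"}   # id, header, body files

def well_formed(message_dir):
    """A MessageId/ directory is a well-formed message when
    the core files are present and no expiry marker applies;
    'u' (undead) is assumed to override 'd' (dead)."""
    names = set(os.listdir(message_dir))
    if not REQUIRED <= names:
        return False
    if "x" in names:             # expired
        return False
    return "d" not in names or "u" in names
```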

.... Where hd and bd are the message split into its parts,
with regards to the composition of messages by
concatenating those back together with the computed
message numbers and this kind of thing, with regards to
the site, and the idea that they're stored at-rest pre-compressed,
then knowledge of the compression algorithm makes for
concatenating them in message-composition as compressed.

Then, there are variously already relations of the
posts, according to groups, then here as above that
there's perceived required for date, and author.
I.e. these are files on the order of the counts of posts,
or span in time, or count of authors.

(About threading and relating posts, is the idea of
matching subjects not-so-much but employing the
References header, then as with regards to IMAP and
parity as for IMAP's THREADS extension, ...,
www.rfc-editor.org/rfc/rfc5256.html , cf SORT and THREAD.
There's a usual sort of notion that sorted, threaded
enumeration is either in date order or thread-tree
traversal order, usually more sensibly date order,
with regards to breaking out sub-threads, variously.
"It's all one thread." IMAP: "there is an implicit sort
criterion of sequence number".)

Then, similarly is for defining models for the sort, summary,
search, SFF, that it sort of (ha) rather begins with sort,
about the idea that it's sort of expected that there will
be a date order partition either as symlinks or as an index file,
or as with regards to that a message's date is also stored in
the yd file, then as with regards to "no file-times can be
assumed or reliable", with regards to "there's exactly one
file named YYYY-MM-DD-HH-MM-SS in MessageId/", these
kinds of things. There's a real goal that it works easy
with shell built-ins and text-utils, or "command line",
to work with the files.

So, sort pretty well goes with filtering.
If you're familiar with the context, of, "data tables",
with a filter-predicate and a sort-predicate,
they're different things but then go together.
It's figured that they get front-ended according
to the quite most usual "column model" of the
"table model" then "yes/no/maybe" row filtering
and "multi-sort" row sorting. (In relational algebra, ...,
or as rather with 'relational algebra with rows and nulls',
this most usual sort of 'composable filtering' and 'multi-sort').
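The filter-predicate plus multi-sort front end can be sketched with stable sorts composing the multi-sort; the shape of rows, predicates, and sort keys is illustrative:

```python
def filter_sort(rows, predicates, sort_keys):
    """Composable filtering then multi-sort: keep rows that
    pass every predicate, then apply sort keys in reverse so
    the first key is most significant (stable sorts compose)."""
    rows = [r for r in rows if all(p(r) for p in predicates)]
    for key, reverse in reversed(sort_keys):
        rows.sort(key=key, reverse=reverse)
    return rows
```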

Then in IMAP, the THREAD command is "a variant of
SEARCH with threading semantics for the results".
Both posts and emails work off the References
header, though it looks like in the wild there is
something like "a vendor does poor-man's subject
threading for you and stuffs in an X-References".
The idea here is that instead of concatenation,
intermediate results get sorted and threaded together,
then those get interleaved and stably sorted
together, with regards to search results in
or among threads.

(Cf www.jwz.org/doc/threading.html via
www.rfc-editor.org/rfc/rfc5256.html ,
with regards to In-Reply-To and References.
There are some interesting articles there
about "mailbox summarization".)
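A minimal sketch of References-based threading, as a small piece of what the jwz / RFC 5256 algorithms do (no subject-gathering, no dummy containers for missing messages; a message whose parent isn't present is simply treated as a root here):

```python
# Minimal sketch of References-based threading: the parent is the last
# entry in References. This is only a fragment of the jwz / RFC 5256
# algorithm -- no subject-gathering or dummy-container handling.
def thread(messages):
    """messages: dict of message-id -> list of referenced ids.
    Returns dict parent-id -> list of child ids; roots under None."""
    children = {}
    for mid, refs in messages.items():
        parent = refs[-1] if refs else None
        if parent is not None and parent not in messages:
            parent = None  # referenced message absent: treat as a root
        children.setdefault(parent, []).append(mid)
    return children
```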

About the summary of posts, one way to start,
as for example an interesting article about mailbox
summarization gets into, is all the necessary
text-encodings to result in UTF-8, of Unicode, after
UCS-2 or UCS-4 or ASCII or CP-1252, in the case of BE
or LE BOMs, or anything to do with summarizing the
character data of any of the headers or the body of
the text, figuring of course that everything's delivered
as it arrives, given the usual opacity of everything
vis-a-vis its inspection.

This could be a normative sort of file that goes in the
MessageId/ folder.

cd: character-data, a summary of whatever form of character
encoding, or requirements of unfolding or unquoting, in
the headers or the body or anywhere involved: a stamp
indicating each of the encodings or character sets.

Then, the idea is that it's a pretty deep inspection to
figure out the various attributes and their
encodings, and the body, and the contents, with regards
to a sort of "normalized string indicating the
character encodings necessary to extract attributes,
and given attributes, and the body, and given sections",
for such matters of indicating the needful for things
like sort and collation, in internationalization and
localization, aka i18n and l10n. (Given that the messages
are stored as they arrived, undisturbed.)

The idea is that "the cd file doesn't exist for messages
in plain ASCII7, but for anything else it breaks out
what's needed to get the text out". This is where text
is often in a sort of format like this.

Ascii
it's keyboard characters
ISO8859-1/ISO8859-15/CP-1252
it's Latin-1, often though with the Windows guys' CP-1252
Sideout
it's Ascii plus 128-255 as gigglies or upper glyphs
Wideout
it's 0-255 with any 256 wide characters in upper Unicode planes
Unicode
it's Unicode

Then there are all sorts of encodings, according to
the rules of Messages with regards to header and body
and content and transfer-encoding and all these sorts
of things; in the end, it's Unicode.
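The taxonomy above can be sketched as a classifier over raw bytes. "Sideout" and "Wideout" are this thread's own terms, so the sketch only distinguishes the three cases with standard definitions; the cut-offs are one reading, not a standard:

```python
# Sketch of the "cd" classification above, over raw message bytes.
# Only the standard-defined cases are distinguished here; the
# "Sideout"/"Wideout" terms above are this thread's own and are
# folded into the Latin-1-family bucket as an assumption.
def classify(data):
    if all(b < 0x80 for b in data):
        return "Ascii"           # 7-bit clean: no cd file needed
    try:
        data.decode("utf-8")
        return "Unicode"         # valid UTF-8 beyond ASCII
    except UnicodeDecodeError:
        return "ISO8859-1/ISO8859-15/CP-1252"  # 128-255 bytes, not UTF-8
```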

Then, another thing to get figured out is lengths,
the size of contents or counts or lengths, figuring
that it's a great boon to message-composition to
allocate exactly what it needs, when, as a sum
of invariant lengths.

Then the MessageId/ files still have the un-used 'l'
and 's'; though 'l' looks too close to '1', here it's
sort of unambiguous.

ld: lengthed, the coded and uncoded lengths of attributes and parts

The idea here is to make it easiest for something like
"consult the lengths and allocate it raw, concatenate
the message into it, consult the lengths and allocate
it uncoded, uncode the message into it".
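The "consult the lengths and allocate it raw" idea can be sketched directly: sum the invariant lengths, allocate once, and concatenate into the buffer, where an exact fill doubles as a cheap validation of the ld data.

```python
# Sketch of the "ld" idea: record the invariant lengths of the parts,
# allocate exactly their sum once, and concatenate the parts into it,
# checking that the fill is exact (a cheap validation of the lengths).
def compose(parts, lengths):
    buf = bytearray(sum(lengths))   # single exact allocation
    pos = 0
    for part, n in zip(parts, lengths):
        assert len(part) == n, "ld is invalid for this message"
        buf[pos:pos + n] = part
        pos += n
    assert pos == len(buf)          # buffer filled exactly
    return bytes(buf)
```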

So, getting into the SFF, is that basically
"BFF indicates well-formed messages or their expiry",
"SFF is derived via a common algorithm for all messages",
and "some SFF lives next to BFF and is also write-once-read-many",
vis-a-vis that "generally SFF is discardable because it's derivable".

Re: Meta: a usenet server just for sci.math

<dd6dnUFHbe_LN0r4nZ2dnZfqnPSdnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156400&group=sci.math#156400

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 22 Feb 2024 20:11:02 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
<HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>
<FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com>
<v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Thu, 22 Feb 2024 12:11:07 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <dd6dnUFHbe_LN0r4nZ2dnZfqnPSdnZ2d@giganews.com>
Lines: 89
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-lgdgr9xaCAzhsPCySwtWxNHNSZgZb5Om/NE/LtLwx9TXthNtGgGCTYoe1eJ7TJ+85Jo6TNJXDbPDKdl!B+OFdt+39LIq1zXR/QPuhoHnswyqNHr07fKFeWhY1JwJuWViXtujfyXj5P6q0K80NtMD6Rc3uMeA!sA==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Thu, 22 Feb 2024 20:11 UTC

Then, it seems that cd and ld should be part of the BFF,
the backing file-format, or else generated on demand:
with regards to the structural content of the messages
and the composition of their wire forms, they're
intermediate values which indicate a sort of validation.
Of course they'd have to be validated in a sense, for
the idea that routines can then rely on them.

Here the character determination is basically a
specification, after validation, of the text encodings
and what's to result: such a specification starts in
"closed categories", as with a registry of the names of
things, associated with specific normative algorithms
that result in a common text encoding.

So, here cd starts with "7-bit clean ASCII". Then as above
there are the most usual character sets involved, as what
these days fall into Unicode: all the character encodings
in the world, their normalized names and glyphs and codes,
as these days fall into the great effort that is
"Unicode", with the ubiquitous encoding UTF-8, or UTF-16
or UTF-32 and other such notions for UCS-2 or UCS-4, and
their variants, since UTF-8 for example in some settings
has a further encoding. Here it's mostly entirely
tractable everywhere: "printable ASCII" or
"UTF-8, excluding non-printable characters".

So, the idea for the contents of the specification
gets into dealing with messages. The messages
have headers, they have bodies, and there are overall
or default or implicit or specific or self-declaring
sorts of textual data: the code-pages, the
representations, the encodings, and the forms. This is
all called "textual" data.

Then here the usual idea for messages is that, while
Usenet messages are particularly simple compared with
Email messages or the usual serialization of HTTP
messages, each is a header with a multi-set of attributes
and a body, interpreted as by the relevant content headers
or by default or implicitly, with respect to the system
encoding and locale, and other usual expectations of
defaults, vis-a-vis explicits.

So, the idea of BFF's cd is to be a specification of
all the normative character encodings of the textual,
for a given edition or revision of the character
encodings, here as simplified being "Internet Messages".
This is associated with the headers overall, the headers
apiece or their segmented values apiece, the body
overall, the parts of the body apiece or their segmented
values apiece, and the message altogether.

Then, the lengths, or BFF's ld, also follow
a particular normative reading of "the bytes" or "the
wire", and "the characters" in "their character encoding",
and they must be valid, to be reliable for allocating the
buffer for the wire data and filling the buffer exactly,
according to the lengths, the sizes. The mal-formed or
the ambiguous or the mistaken, or any ways otherwise the
invalid, is basically left for the summary to follow:
the contents of the otherwise-opaque at-rest transport
format get the extraction to result in the attributes,
in scalars, the values, for locale and collation.

Then, I know quite well all the standards of the textual;
now it's to learn enough about the Internet Message,
for Email and Usenet and MIME and HTTP's usual,
for example like "Usenet messages end on the wire
with a dot that's otherwise in an escapement, erm",
these kinds of things, resulting in this sort of BFF
message format. Though it does give an entire directory
on the file system to each message in the representation,
with a write-once-read-many expectation as is pretty
usual, and soft-delete, and operations message-wise,
here it's getting into the particulars of "cd" and "ld",
these data derived from the Message, resulting in
a usual means for the validity and the transparency
of the textual in the content of the message.

This is of course, "Meta", to sci.math, and
humor is irrelevant to sci.math, but it's an
exercise in the study of Internet Protocols.

Re: Meta: a usenet server just for sci.math

<qVidndp4aaV-rUf4nZ2dnZfqn_GdnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156450&group=sci.math#156450

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.22.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 24 Feb 2024 18:09:38 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com> <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com> <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com> <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com> <v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com> <dd6dnUFHbe_LN0r4nZ2dnZfqnPSdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sat, 24 Feb 2024 10:09:38 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <dd6dnUFHbe_LN0r4nZ2dnZfqnPSdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <qVidndp4aaV-rUf4nZ2dnZfqn_GdnZ2d@giganews.com>
Lines: 276
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-kf6zC0BkGyV85YnJ3IO9kOvalsyf9y/qXw5wepiLRQgUhh8WkXLkKIPhT2F0BAQfCU1BP9ndoYE4Y1V!mQF0skEXhjbc3iUPCEQyZJzJdGHY4esYsyYc3xxyB84K7rfjZIP52F2oseJE9ZYRDPkZ/XgURJRl
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Sat, 24 Feb 2024 18:09 UTC

IETF RFC

NNTP

3977 https://datatracker.ietf.org/doc/html/rfc3977
8054 https://www.rfc-editor.org/rfc/rfc8054

SMTP

5321 https://datatracker.ietf.org/doc/html/rfc5321
2821 https://www.ietf.org/rfc/rfc2821.txt
2822 https://datatracker.ietf.org/doc/html/rfc2822 <- Internet Message Format

IMAP

3501 https://datatracker.ietf.org/doc/html/rfc3501
2683 https://datatracker.ietf.org/doc/html/rfc2683
4978 https://datatracker.ietf.org/doc/html/rfc4978
3516 https://datatracker.ietf.org/doc/html/rfc3516

POP3

1939 https://www.ietf.org/rfc/rfc1939.txt

MIME

2045 https://datatracker.ietf.org/doc/html/rfc2045
2049 https://datatracker.ietf.org/doc/html/rfc2049
2046 https://datatracker.ietf.org/doc/html/rfc2046

DEFLATE

1950 https://datatracker.ietf.org/doc/html/rfc1950
1951 https://datatracker.ietf.org/doc/html/rfc1951

HTTP

7231 https://datatracker.ietf.org/doc/html/rfc7231
7230 https://datatracker.ietf.org/doc/html/rfc7230

"dot-stuffing":

https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.1.2

If posting is permitted, the article MUST be in the format specified
in Section 3.6 and MUST be sent by the client to the server as a
multi-line data block (see Section 3.1.1). Thus a single dot (".")
on a line indicates the end of the text, and lines starting with a
dot in the original text have that dot doubled during transmission.

https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.2.2

If transmission of the article is requested, the client MUST send the
entire article, including headers and body, to the server as a
multi-line data block (see Section 3.1.1). Thus, a single dot (".")
on a line indicates the end of the text, and lines starting with a
dot in the original text have that dot doubled during transmission.

Well I was under the impression that there was something
of the dynamic in the headers, vis-a-vis the body, and
that often enough it's always ARTICLE, not HEAD, BODY,
or STAT, which is why having hd and bd as separate files
is a question. Still, though, it can be nice to have
them separate.
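With hypothetical separate hd (headers) and bd (body) files at rest, serving the three retrieval commands is just selection or concatenation, which is the argument for keeping them separate:

```python
# Sketch, assuming hypothetical separate "hd" (headers) and "bd" (body)
# files at rest: HEAD and BODY each serve one file; ARTICLE serves the
# concatenation, with the blank line that separates headers from body.
def respond(command, hd, bd):
    if command == "HEAD":
        return hd
    if command == "BODY":
        return bd
    if command == "ARTICLE":
        return hd + b"\r\n" + bd
    raise ValueError(command)
```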

Then, for the message content at rest, there's
"dot-stuffing", basically an artifact of "dot alone on
a line ends a post, in a terminal window telnet'ed to
an NNTP server". POST and IHAVE and so on are supposed
to deliver it, and it's supposed to be returned as part
of the end of ARTICLE and also BODY and also HEAD, but
it's not supposed to be counted in :bytes, while the
spec says not to rely on "bytes" anyway.

I.e., this is about "the NNTP of the thing" vis-a-vis
just a message store; here is for studying SMTP and
seeing what Email says about it.

SMTP: SMTP indicates the end of the mail data by sending a
line containing only a "." (period or full stop). A transparency
procedure is used to prevent this from interfering with the user's
text (see section 4.5.2).

- Before sending a line of mail text, the SMTP client checks the
first character of the line. If it is a period, one additional
period is inserted at the beginning of the line.

- When a line of mail text is received by the SMTP server, it checks
the line. If the line is composed of a single period, it is
treated as the end of mail indicator. If the first character is a
period and there are other characters on the line, the first
character is deleted.

So here dot-stuffing in NNTP is sort of different from
dot-stuffing in SMTP, with regards to that I want the
data to be a constant at rest; then there's also having
a text edition at rest, i.e. "uncompressed", which makes
it the same for any kind of message, vis-a-vis the
"end of data" or "dot-stuffing", ....

POP3: When all lines of the response have been sent, a
final line is sent, consisting of a termination octet (decimal code
046, ".") and a CRLF pair. If any line of the multi-line response
begins with the termination octet, the line is "byte-stuffed" by
pre-pending the termination octet to that line of the response.
Hence a multi-line response is terminated with the five octets
"CRLF.CRLF".

POP3 RETR: "After the initial +OK, the
POP3 server sends the message corresponding to the given
message-number, being careful to byte-stuff the termination
character (as with all multi-line responses)."

I don't mind just concatenating the termination sequence at
the end, it's a constant of fixed size, but I want the content
to be un-stuffed at rest, ....
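The stuff-on-send / unstuff-on-receive round-trip (the same transparency procedure across NNTP, SMTP, and POP3 multi-line data) can be sketched like this, keeping the at-rest content un-stuffed as above:

```python
# Dot-stuffing as in NNTP/SMTP/POP3 multi-line data blocks: on the
# wire, a leading "." is doubled and a lone "." line terminates the
# block; at rest the content stays un-stuffed, so stuffing is applied
# on send and stripped on receive.
def stuff(text):
    lines = text.split("\r\n")
    wire = ["." + ln if ln.startswith(".") else ln for ln in lines]
    return "\r\n".join(wire) + "\r\n.\r\n"   # CRLF . CRLF terminator

def unstuff(wire):
    end = "\r\n.\r\n"
    body = wire[:-len(end)] if wire.endswith(end) else wire
    lines = body.split("\r\n")
    return "\r\n".join(ln[1:] if ln.startswith("..") else ln
                       for ln in lines)
```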

"In order to simplify parsing, all POP3 servers are
required to use a certain format for scan listings. A
scan listing consists of the message-number of the
message, followed by a single space and the exact size of
the message in octets. Methods for calculating the exact
size of the message are described in the "Message Format"
section below. "

https://datatracker.ietf.org/doc/html/rfc2822#section-3.5
"Lines in a message MUST be a maximum of 998 characters
excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
characters excluding the CRLF."

Hmm..., what I'm trying to figure out is how to store the
data at rest, in pieces that just concatenate back
together to form the message composition, where variously
the parts are compressible or already compressed; and,
about the uncompressed, whether to have dot-stuffing in
the compressed and no dot-stuffing in the otherwise
plain-text at rest, with regards to Usenet and Email
messages, and other usual bodies like HTTP with respect
to MIME and MIME multipart and so on. This is where
there's something like "oh, about three and a half
terabytes, uncompressed, a copy of text Usenet", and
figuring out how to have it all fit, exploded out on a
modern filesystem, in this write-once-read-many approach
(or, often enough, write-once-read-never), such that
ingesting the data is expeditious and it's very normative
and tractable at rest.

It gets into ideas like this: "name the files that are
fragments of deflate/gzip something like h7/b7, where 7
is almost Z", and "build the Huffman tables over sort of
the whole world, as it's figured they're sort of constant
over time, for lots of repeated constants in the headers",
this kind of thing. Mostly though it's the idea of having
the file fragments be concatenable with some reference
files to stream them.
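The concatenable-fragments idea works with gzip as-is: RFC 1952 allows multiple members in one stream, and the catenation of two gzip streams decompresses to the catenation of the plaintexts. So header and body fragments compressed separately at rest can be streamed out by plain concatenation:

```python
# Concatenable compressed fragments: gzip (RFC 1952) permits multiple
# members in one stream, and the catenation of two gzip streams
# decompresses to the catenation of the two plaintexts. Separately
# compressed fragments at rest thus stream out by plain concatenation.
import gzip

def fragments_to_stream(fragments):
    """fragments: list of separately gzip-compressed byte strings."""
    return b"".join(fragments)   # still one valid multi-member stream
```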

Then, as this is sort of an aside from cd and ld, the
characters and lengths of the summary metadata, there's
also the extraction of the data, vis-a-vis the data at
rest. The idea is that whole extraction is "stream a
concatenation of the data at rest", while for overview
and search it's usually extracting attributes' values
and having those populate overviews, or for example
renditions of threads; and there's the idea here of
basically having NNTP, then IMAP sitting in front of
that, and then also HTTP variously in front of that,
with NNTP and IMAP and HTTP having a very high affinity
with respect to the usual operation of their protocols,
and also the content, here then with regards to MIME,
and "MIME at rest", and this kind of thing.

One thing about summary, then, is that there's derived
data that makes for extraction and summary, sort, and
search; then there's access, which gets into values that,
stored as files, are not write-once-read-many. Then,
whether to have these in the same directory as MessageId/,
or to keep the volatiles on their own, gets into the
write-once-read-many and object stores, with regards to
atomicity and changes, and this kind of thing. Basically
the idea for access is that that's IMAP and the status
of messages apiece for the login, for example, and then
hit counters, here with head-hits and body-hits for
article-hits, to help establish relevance of articles by
accesses or hits, views. This would feed back into the
NOOBNB idea, with regards to figuring out views, and some
way, like viewing a related item, to validate a view,
this kind of thing.

It's sort of figured that the author-article pair is the
datum, then for those to get aggregated, with respect
to calling the login an author, here that all logins are
authors. Basically the idea is that the client
requesting the article would make it so, then for things
like "the IMAP fronting the NNTP and delegating the
author on down into the NNTP", and these kinds of things.

For MIME the idea seems to actually be to break the
parts out into files in a subdirectory, where something
like "bm" indicates "body-MIME", so that MIME bodies
have a natural enough filesystem representation,
where it results in a good idea to handle their
transfer and content encodings, for the various transfer
and content encodings, and for delivering parts, ....
Then the usual idea of the MIME body as the
single-part MIME object, binary, is basically
for blobs, ..., then as with regards to those also
prepared "b7-at-rest" for delivering any kind of object,
here with its routing as a message besides as just
a usual kind of object-store.
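Breaking MIME parts out into a subdirectory can be sketched with the stdlib email parser; the "bm" name and the part-numbering scheme here are hypothetical, following the idea above:

```python
# Sketch of breaking MIME parts out into a "bm" (body-MIME)
# subdirectory using the stdlib email parser. The bm/ name and the
# "<n>.<subtype>" file naming are hypothetical, per the idea above.
import email
import os

def explode(msgdir, raw_bytes):
    msg = email.message_from_bytes(raw_bytes)
    bm = os.path.join(msgdir, "bm")
    os.makedirs(bm, exist_ok=True)
    names = []
    for i, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue               # container parts have no payload file
        name = "%d.%s" % (i, part.get_content_subtype())
        with open(os.path.join(bm, name), "wb") as f:
            f.write(part.get_payload(decode=True) or b"")
        names.append(name)
    return names
```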


Re: Meta: a usenet server just for sci.math

<RrKcnUb4NpNfvUH4nZ2dnZfqnPudnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156480&group=sci.math#156480

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 26 Feb 2024 05:25:22 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
<HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>
<FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com>
<v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
<dd6dnUFHbe_LN0r4nZ2dnZfqnPSdnZ2d@giganews.com>
<qVidndp4aaV-rUf4nZ2dnZfqn_GdnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sun, 25 Feb 2024 21:25:37 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <qVidndp4aaV-rUf4nZ2dnZfqn_GdnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <RrKcnUb4NpNfvUH4nZ2dnZfqnPudnZ2d@giganews.com>
Lines: 465
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-iITARuYEZaHUvKAwps2t1HSZHKyeAFNkhD1CXqIyOIyCQT+eBq5VCjPdZGezwoKU05RoCkV4ngO0uVg!ciSWVNP+mk7z9EQD4wIudMb69Qk5Hwg4u0vtbVak4l7xWcdWhh2e7BcP85Tv6yrMz4Lt1k2v5Q==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Mon, 26 Feb 2024 05:25 UTC

On 02/24/2024 10:09 AM, Ross Finlayson wrote:
> IETF RFC
>
> NNTP
>
> 3977 https://datatracker.ietf.org/doc/html/rfc3977
> 8054 https://www.rfc-editor.org/rfc/rfc8054
>
> SMTP
>
> 5321 https://datatracker.ietf.org/doc/html/rfc5321
> 2821 https://www.ietf.org/rfc/rfc2821.txt
> 2822 https://datatracker.ietf.org/doc/html/rfc2822 <- Internet Message
> Format
>
> IMAP
>
> 3501 https://datatracker.ietf.org/doc/html/rfc3501
> 2683 https://datatracker.ietf.org/doc/html/rfc2683
> 4978 https://datatracker.ietf.org/doc/html/rfc4978
> 3516 https://datatracker.ietf.org/doc/html/rfc3516
>
> POP3
>
> 1725 https://www.ietf.org/rfc/rfc1939.txt
>
>
> MIME
>
> 2045 https://datatracker.ietf.org/doc/html/rfc2045
> 2049 https://datatracker.ietf.org/doc/html/rfc2049
> 2046 https://datatracker.ietf.org/doc/html/rfc2046
>
> DEFLATE
>
> 1950 https://datatracker.ietf.org/doc/html/rfc1950
> 1951 https://datatracker.ietf.org/doc/html/rfc1951
>
> HTTP
>
> 7231 https://datatracker.ietf.org/doc/html/rfc7231
> 7230 https://datatracker.ietf.org/doc/html/rfc7230
>
> "dot-stuffing":
>
> https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.1.2
>
>
> If posting is permitted, the article MUST be in the format specified
> in Section 3.6 and MUST be sent by the client to the server as a
> multi-line data block (see Section 3.1.1). Thus a single dot (".")
> on a line indicates the end of the text, and lines starting with a
> dot in the original text have that dot doubled during transmission.
>
> https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.2.2
>
> If transmission of the article is requested, the client MUST send the
> entire article, including headers and body, to the server as a
> multi-line data block (see Section 3.1.1). Thus, a single dot (".")
> on a line indicates the end of the text, and lines starting with a
> dot in the original text have that dot doubled during transmission.
>
>
>
> Well I was under the impression that there was something of
> the dynamic in the headers, vis-a-vis the body, and that often
> enough it's always ARTICLE not HEAD, BODY, or STAT, why with
> regards to having hd and bd being separate files, is a thing.
> Still though it can be nice to have them separate.
>
> Then, for the message content at rest, there's "dot-stuffing",
> this is basically an artifact of "dot alone on a line ends a post,
> in a terminal window telnet'ed to an NNTP server", here with
> regards to that POST and IHAVE and so on are supposed to deliver
> it, and it's supposed to be returned both as part of the end of
> the ARTICLE and also BODY but also HEAD, but it's supposed to
> not be counted in :bytes, while though the spec says not to rely
> on "bytes" because for example it's not ignored.
>
> I.e. this is about "the NNTP of the thing" vis-a-vis, that as just a
> message store, here is for studying SMTP and seeing what Email
> says about it.
>
> SMTP: SMTP indicates the end of the mail data by sending a
> line containing only a "." (period or full stop). A transparency
> procedure is used to prevent this from interfering with the user's
> text (see section 4.5.2).
>
> - Before sending a line of mail text, the SMTP client checks the
> first character of the line. If it is a period, one additional
> period is inserted at the beginning of the line.
>
> - When a line of mail text is received by the SMTP server, it checks
> the line. If the line is composed of a single period, it is
> treated as the end of mail indicator. If the first character is a
> period and there are other characters on the line, the first
> character is deleted.
>
>
>
> So here it's like dot-stuffing in NNTP, is sort of different than
> dot-stuffing in SMTP, with regards to that I want the data to
> be a constant at rest, then here about though then there's
> also for having a text edition at rest, i.e. that "uncompressed"
> makes for that it's the same for any kind of messages, vis-a-vis
> the "end of data" or "dot-stuffing", ....
>
>
> POP3: When all lines of the response have been sent, a
> final line is sent, consisting of a termination octet (decimal code
> 046, ".") and a CRLF pair. If any line of the multi-line response
> begins with the termination octet, the line is "byte-stuffed" by
> pre-pending the termination octet to that line of the response.
> Hence a multi-line response is terminated with the five octets
> "CRLF.CRLF".
>
> POP3 RETR: "After the initial +OK, the
> POP3 server sends the message corresponding to the given
> message-number, being careful to byte-stuff the termination
> character (as with all multi-line responses)."
>
> I don't mind just concatenating the termination sequence at
> the end, it's a constant of fixed size, but I want the content
> to be un-stuffed at rest, ....
>
> "In order to simplify parsing, all POP3 servers are
> required to use a certain format for scan listings. A
> scan listing consists of the message-number of the
> message, followed by a single space and the exact size of
> the message in octets. Methods for calculating the exact
> size of the message are described in the "Message Format"
> section below. "
>
> https://datatracker.ietf.org/doc/html/rfc2822#section-3.5
> "Lines in a message MUST be a maximum of 998 characters
> excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
> characters excluding the CRLF."
>
>
> Hmm..., what I'm trying to figure out is how to store the data
> at rest, in its pieces, that just concatenate back together to
> form message composition, here variously that parts are
> compressible or already compressed, and about the uncompressed,
> whether to have dot-stuffing in the compressed and not-dot-stuffing
> in the otherwise plain-text at rest, with regards to Usenet and Email
> messages, and other usual bodies like HTTP with respect to MIME
> and MIME multipart and so on. This is where there's something
> like "oh about three and a half terabytes, uncompressed, a copy
> of text Usenet", and figuring out how to have it so that it all fits
> exploded all out on a modern filesystem, in this write-once-read-many
> approach, (or, often enough, write-once-read-never), and that
> ingesting the data is expeditious and it's very normative and tractable
> at rest.
>
> It gets into ideas like this, "name the files that are fragments of
> deflate/gzip to something like h7/b7, where 7 is almost Z",
> and "build the Huffman tables over sort of the whole world
> as it's figured that they're sort of constant over time, for lots
> of repeated constants in the headers", this kind of thing.
> Mostly though it's the idea of having the file fragments
> being concatenable with some reference files to stream them.
>
> Then, as this is sort of an aside from the cd and ld, the
> characters and lengths, of the summary metadata, as well
> is about the extraction of the data, vis-a-vis the data at rest.
> The idea is that whole extraction is "stream a concatenation
> of the data at rest", while there's usually for overview and
> search to be extracting attributes' values and resulting those
> populate overviews, or for example renditions of threads,
> and about the idea here of basically having NNTP, and then
> IMAP sitting in front of that, and then also HTTP variously
> in front of that, with that NNTP and IMAP and HTTP have
> a very high affinity with respect to the usual operation of
> their protocols, and also the content, here then with regards
> to MIME, and for "MIME at rest", and this kind of thing.
>
>
>
>
> One thing about summary, then is about, that's there's
> derived data what is to make for extraction and summary,
> sort, and search, then about access, which gets into values
> that stored as files, are not write-once-read-many. Then,
> whether to have this in the same directory as MessageId,
> or to have the volatiles as they are, gets into the write-once-read-many
> and about object stores and this kind of thing, with regards
> to atomicity and changes, and this kind of thing. Basically
> the idea for access is that that's IMAP and the status of
> messages apiece for the login, for example, and then
> hit counters, here with head-hits and body-hits for article-hits,
> to help get an idea of hits to help establish relevance
> of articles by accesses or hits, views. This would feed back
> into the NOOBNB idea, with regards to figuring out views,
> and some way to indicate like by viewing a related item,
> to validate a view, this kind of thing.
>
> It's sort of figured that the author-article pair is the
> datum, then for those to get aggregated, with respect
> to calling the login an author, here that all logins are
> authors. Basically the idea with that is that the client
> requesting the article would make it so, then for things
> like "the IMAP fronting the NNTP and delegating the
> author on down into the NNTP", and these kinds of things.
>
>
> For MIME the idea seems to actually be to break the
> parts on out into files into a subdirectory, that something
> like "bm" indicates "body-MIME", then that MIME bodies
> have a natural enough filesystem-representation,
> where it results a good idea to make their transfer
> and content encoding for the various transfer and
> content encodings, and for delivering parts, ....
> Then the usual idea of the MIME body as the
> single-part MIME object, binary, basically is
> for blobs, ..., then as with regards to those prepared
> also "b7-at-rest" for delivering any kind of object,
> here with its routing as a message besides as just
> a usual kind of object-store.
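[A sketch of that decomposition, using Python's email package to break a multipart message out into per-part files under a "bm" (body-MIME) subdirectory; the directory layout and sample message here are only illustrative:]

```python
import os
import tempfile
from email import message_from_bytes
from email.policy import default

raw = (b"MIME-Version: 1.0\r\n"
       b"Content-Type: multipart/mixed; boundary=XYZ\r\n\r\n"
       b"--XYZ\r\n"
       b"Content-Type: text/plain\r\n\r\n"
       b"hello\r\n"
       b"--XYZ\r\n"
       b"Content-Type: application/octet-stream\r\n"
       b"Content-Transfer-Encoding: base64\r\n\r\n"
       b"AAEC\r\n"
       b"--XYZ--\r\n")

msg = message_from_bytes(raw, policy=default)
root = tempfile.mkdtemp()          # stand-in for the MessageId/ folder
bm = os.path.join(root, "bm")      # "bm" for body-MIME, as above
os.makedirs(bm)

parts = [p for p in msg.walk() if not p.is_multipart()]
for i, part in enumerate(parts):
    # Store each leaf part decoded, i.e. with its transfer encoding
    # removed, so it can be re-encoded however delivery wants it.
    with open(os.path.join(bm, str(i)), "wb") as f:
        f.write(part.get_payload(decode=True))
```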
>
>
> https://datatracker.ietf.org/doc/html/rfc2046#section-5
>
>
> The idea here is that it's great that messages, usually,
> can just be considered exactly as they arrive, the
> ingestion having added a Path element, say,
> serialized and stored as they arrived from the wire,
> and retrieved and returned as back to it. Then,
> messages in various structures, eventually have
> parts and entities and messages in them and
> transfer and content encodings that were applied
> and data that is or isn't compressible and will or won't
> be served as textual or as binary, or as reference, in
> getting into the linked-content and "Content-ID",
> the idea that large blobs of data are also aside.
>
> Then, this idea is to store the entities and parts
> and contained messages and blobs, at rest, as
> where their content encoding and transfer encoding,
> make for the repurposable and constant representations
> at-rest, then that when it results either extraction, or,
> retrieval, that the point here is that extraction is
> "inside the envelope", then with the idea that
> message-composition, should have it so that
> largely the server just spews retrievals as
> concatenating the parts at rest, or putting them
> in content and transfer encodings, with regards
> to eventually the transfer encoding, then the compression
> layer as here is pretty usual, then the encryption and
> compression layers on out, the idea being to make
> those modular, factorizable, in terms of message-composition,
> that it gets pretty involved yet then results handling
> any kinds of Internet message content like this at all.
>
>
> Hmm, ..., "quoted-printable".
>
> https://datatracker.ietf.org/doc/html/rfc2049#section-4
>
> "The process of composing a MIME entity can be modeled as being done
> in a number of steps. Note that these steps are roughly similar to
> those steps used in PEM [RFC-1421] ..."
>
> (PEM, "Privacy Enhanced Mail", ....)
>
>
> So, it's being kind of sorted out mostly how to get
> the messages flowing pass-through, as much as possible,
> this still being the BFF, with regards then to extraction,
> and use cases for SFF.
>
>
> About "the three and a half terabytes uncompressed
> the Usenet archive", ....
>
>
>


Re: Meta: a usenet server just for sci.math

<Uq-dnY0hiYUo03z4nZ2dnZfqnPudnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156557&group=sci.math#156557

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Fri, 01 Mar 2024 03:42:45 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
<HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>
<FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com>
<v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
<dd6dnUFHbe_LN0r4nZ2dnZfqnPSdnZ2d@giganews.com>
<qVidndp4aaV-rUf4nZ2dnZfqn_GdnZ2d@giganews.com>
<RrKcnUb4NpNfvUH4nZ2dnZfqnPudnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Thu, 29 Feb 2024 19:43:01 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <RrKcnUb4NpNfvUH4nZ2dnZfqnPudnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <Uq-dnY0hiYUo03z4nZ2dnZfqnPudnZ2d@giganews.com>
Lines: 533
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-vRF8nCvdLUxzA3JcD+XGIAhoxCbWERtfPgQ2qkd0FNFkY3FOUgointJx1ra17/GOj7Tx80JKEmXmdlx!iEh/qb78G7vQl1xpj02rK+tCCgSYejSz+tF2XPvnqdnMvtiEACQYw0rrbp3fNkqhA3OGgRhgMpc=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Fri, 1 Mar 2024 03:43 UTC

On 02/25/2024 09:25 PM, Ross Finlayson wrote:
> On 02/24/2024 10:09 AM, Ross Finlayson wrote:
>> IETF RFC
>>
>> NNTP
>>
>> 3977 https://datatracker.ietf.org/doc/html/rfc3977
>> 8054 https://www.rfc-editor.org/rfc/rfc8054
>>
>> SMTP
>>
>> 5321 https://datatracker.ietf.org/doc/html/rfc5321
>> 2821 https://www.ietf.org/rfc/rfc2821.txt
>> 2822 https://datatracker.ietf.org/doc/html/rfc2822 <- Internet Message
>> Format
>>
>> IMAP
>>
>> 3501 https://datatracker.ietf.org/doc/html/rfc3501
>> 2683 https://datatracker.ietf.org/doc/html/rfc2683
>> 4978 https://datatracker.ietf.org/doc/html/rfc4978
>> 3516 https://datatracker.ietf.org/doc/html/rfc3516
>>
>> POP3
>>
>> 1939 https://www.ietf.org/rfc/rfc1939.txt
>>
>>
>> MIME
>>
>> 2045 https://datatracker.ietf.org/doc/html/rfc2045
>> 2049 https://datatracker.ietf.org/doc/html/rfc2049
>> 2046 https://datatracker.ietf.org/doc/html/rfc2046
>>
>> DEFLATE
>>
>> 1950 https://datatracker.ietf.org/doc/html/rfc1950
>> 1951 https://datatracker.ietf.org/doc/html/rfc1951
>>
>> HTTP
>>
>> 7231 https://datatracker.ietf.org/doc/html/rfc7231
>> 7230 https://datatracker.ietf.org/doc/html/rfc7230
>>
>> "dot-stuffing":
>>
>> https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.1.2
>>
>>
>> If posting is permitted, the article MUST be in the format specified
>> in Section 3.6 and MUST be sent by the client to the server as a
>> multi-line data block (see Section 3.1.1). Thus a single dot (".")
>> on a line indicates the end of the text, and lines starting with a
>> dot in the original text have that dot doubled during transmission.
>>
>> https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.2.2
>>
>> If transmission of the article is requested, the client MUST send the
>> entire article, including headers and body, to the server as a
>> multi-line data block (see Section 3.1.1). Thus, a single dot (".")
>> on a line indicates the end of the text, and lines starting with a
>> dot in the original text have that dot doubled during transmission.
>>
>>
>>
>> Well I was under the impression that there was something of
>> the dynamic in the headers, vis-a-vis the body, and that often
>> enough it's always ARTICLE not HEAD, BODY, or STAT, why with
>> regards to having hd and bd being separate files, is a thing.
>> Still though it can be nice to have them separate.
>>
>> Then, for the message content at rest, there's "dot-stuffing",
>> this is basically an artifact of "dot alone on a line ends a post,
>> in a terminal window telnet'ed to an NNTP server", here with
>> regards to that POST and IHAVE and so on are supposed to deliver
>> it, and it's supposed to be returned both as part of the end of
>> the ARTICLE and also BODY but also HEAD, but it's supposed to
>> not be counted in :bytes, while though the spec says not to rely
>> on "bytes" because for example it's not ignored.
>>
>> I.e. this is about "the NNTP of the thing" vis-a-vis, that as just a
>> message store, here is for studying SMTP and seeing what Email
>> says about it.
>>
>> SMTP: SMTP indicates the end of the mail data by sending a
>> line containing only a "." (period or full stop). A transparency
>> procedure is used to prevent this from interfering with the user's
>> text (see section 4.5.2).
>>
>> - Before sending a line of mail text, the SMTP client checks the
>> first character of the line. If it is a period, one additional
>> period is inserted at the beginning of the line.
>>
>> - When a line of mail text is received by the SMTP server, it checks
>> the line. If the line is composed of a single period, it is
>> treated as the end of mail indicator. If the first character is a
>> period and there are other characters on the line, the first
>> character is deleted.
>>
>>
>>
>> So here it's like dot-stuffing in NNTP, is sort of different than
>> dot-stuffing in SMTP, with regards to that I want the data to
>> be a constant at rest, then here about though then there's
>> also for having a text edition at rest, i.e. that "uncompressed"
>> makes for that it's the same for any kind of messages, vis-a-vis
>> the "end of data" or "dot-stuffing", ....
>>
>>
>> POP3: When all lines of the response have been sent, a
>> final line is sent, consisting of a termination octet (decimal code
>> 046, ".") and a CRLF pair. If any line of the multi-line response
>> begins with the termination octet, the line is "byte-stuffed" by
>> pre-pending the termination octet to that line of the response.
>> Hence a multi-line response is terminated with the five octets
>> "CRLF.CRLF".
>>
>> POP3 RETR: "After the initial +OK, the
>> POP3 server sends the message corresponding to the given
>> message-number, being careful to byte-stuff the termination
>> character (as with all multi-line responses)."
>>
>> I don't mind just concatenating the termination sequence at
>> the end, it's a constant of fixed size, but I want the content
>> to be un-stuffed at rest, ....
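[A sketch of keeping the content un-stuffed at rest and stuffing only on the wire, per the NNTP/SMTP/POP3 transparency procedure quoted above; it assumes CRLF line endings and content without a trailing CRLF:]

```python
def dot_stuff(text: bytes) -> bytes:
    """Stuff for transmission: double a leading dot on each line,
    then append the CRLF.CRLF terminator (RFC 3977 / RFC 5321)."""
    lines = text.split(b"\r\n")
    stuffed = b"\r\n".join(b"." + ln if ln.startswith(b".") else ln
                           for ln in lines)
    return stuffed + b"\r\n.\r\n"

def dot_unstuff(wire: bytes) -> bytes:
    """Undo stuffing on receipt: strip the terminator, un-double dots."""
    term = b"\r\n.\r\n"
    body = wire[:-len(term)] if wire.endswith(term) else wire
    lines = body.split(b"\r\n")
    return b"\r\n".join(ln[1:] if ln.startswith(b".") else ln
                        for ln in lines)

at_rest = b".hidden dot line\r\nplain line"
assert dot_unstuff(dot_stuff(at_rest)) == at_rest
```

[The terminator is a constant of fixed size concatenated at the end, as said; the at-rest copy stays un-stuffed.]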
>>
>> "In order to simplify parsing, all POP3 servers are
>> required to use a certain format for scan listings. A
>> scan listing consists of the message-number of the
>> message, followed by a single space and the exact size of
>> the message in octets. Methods for calculating the exact
>> size of the message are described in the "Message Format"
>> section below. "
>>
>> https://datatracker.ietf.org/doc/html/rfc2822#section-3.5
>> "Lines in a message MUST be a maximum of 998 characters
>> excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
>> characters excluding the CRLF."
>>
>>
>> Hmm..., what I'm trying to figure out is how to store the data
>> at rest, in its pieces, that just concatenate back together to
>> form message composition, here variously that parts are
>> compressible or already compressed, and about the uncompressed,
>> whether to have dot-stuffing in the compressed and not-dot-stuffing
>> in the otherwise plain-text at rest, with regards to Usenet and Email
>> messages, and other usual bodies like HTTP with respect to MIME
>> and MIME multipart and so on. This is where there's something
>> like "oh about three and a half terabytes, uncompressed, a copy
>> of text Usenet", and figuring out how to have it so that it all fits
>> exploded all out on a modern filesystem, in this write-once-read-many
>> approach, (or, often enough, write-once-read-never), and that
>> ingesting the data is expeditious and it's very normative and tractable
>> at rest.
>> [...]
>
>
>
>
> https://en.wikipedia.org/wiki/Maildir
>
> "Supported mailbox formats are Maildir, mbox, MH, Babyl, and MMDF."
> https://docs.python.org/3/library/mailbox.html
>
>
> Wow, technology's arrived at 3-D C-D's that store
> an entire petabit, hundreds of thousands of gigabytes,
> on one 3-D C-D.
>
> So big it's like "yeah it's only bits not bytes,
> but it's more than a quadrillion bits, on one 3-D C-D".
>
> Not sure if petabits or pebibits, ....
>
> Here the idea is that maildir has /tmp, /new, /cur,
> in that just being files apiece with the contents,
> that the idea is that BFF has directories apiece,
> then that it seems needful to have at least one
> file that is the message itself, and perhaps a
> compressed edition, then that software that
> expects a maildir, could just have symlinks
> built for it, then figuring maildir apps could
> move symlinks from /new to /cur, while the
> BFF just sits at rest.
>
> These days a usual notion of a store is an object-store,
> or a volume that is like ext3 or ext4 filesystem, say.
>
> Then, for sort of making it so, that BFF, is designed,
> so that other "one message one file" organizations
> can sit next to it, basically involves watching the
> /new folder, and having that BFF folders have a sort
> of ingestion program, ...
>
> bff-drop/
> bff-depo/
> bff-repo/
>
> figuring that bff-deposit is where BFF aware inputs
> deposit their messages, then for moving the MessageId/
> folder ensuite into bff-repo, then for the idea that
> basically a helper app, makes symlinks from maildir layout,
> into bff-repo, where one of the files in MessageId/
> is the "plain message", and the symlinks build the conventions
> of the maildir and this kind of thing.
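[A sketch of that helper: for each MessageId/ folder in bff-repo, build a symlink in the maildir's new/ pointing at the plain-message file, so maildir-aware tools see ordinary messages while the BFF just sits at rest. The folder names and the "plain" filename are hypothetical conventions:]

```python
import os
import tempfile

def link_into_maildir(bff_repo: str, maildir: str) -> int:
    """Symlink each bff-repo/<MessageId>/plain into maildir/new/."""
    new = os.path.join(maildir, "new")
    os.makedirs(new, exist_ok=True)
    count = 0
    for mid in os.listdir(bff_repo):
        plain = os.path.join(bff_repo, mid, "plain")
        link = os.path.join(new, mid)
        if os.path.isfile(plain) and not os.path.lexists(link):
            os.symlink(os.path.abspath(plain), link)
            count += 1
    return count

# demo with a throwaway repo holding one message
repo = tempfile.mkdtemp()
md = tempfile.mkdtemp()
os.makedirs(os.path.join(repo, "m1"))
with open(os.path.join(repo, "m1", "plain"), "w") as f:
    f.write("hello\n")
assert link_into_maildir(repo, md) == 1
```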
>
> The idea then is that tools that use maildir, basically
> "don't maintain the maildir" in this kind of setup,
> and that instead of /tmp -> /new -> ingestion, there's
> instead a BFF file-watch on /tmp, that copies it to bff-drop/,
> and a file-watch on bff-repo/, that builds a symlink in /new.
>
> (What this may entail for this one-message-one-directory
> approach, is to have one message, one directory, two donefiles,
> for a usual sort of touchfile convention to watch for,
> and delete, after the first donefile indicates readiness.)
>
> Or, the idea would be that procmail, or what drops mail
> into maildir, would be configured that its /new is simply
> pointed at bff-drop/, while other IMAP and so applications
> using maildir, would point at a usual /new and /cur, in maildir,
> that is just symlinks that a BFF file-watch on bff-drop,
> maintains in the same convention.
>
> Then it varies whether applications using maildir also accept
> the files at-rest being compressed; here most of the
> idea of bff-depo, is to deposit and decompose the messages
> into the MessageId/ folder, then to move that up, then to
> touch the MessageId/id file, which is the touchfile convention
> when it exists and is fully-formed.
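[A sketch of that deposit-then-move convention: decompose into a staging folder, move the whole MessageId/ folder with os.rename (atomic within one filesystem), then touch the MessageId/id donefile last. Directory and file names here follow the hypothetical bff-depo/bff-repo layout above:]

```python
import os
import tempfile

def deposit(bff_root: str, message_id: str, plain: bytes) -> str:
    """Decompose into a staging dir (bff-depo), then move the whole
    MessageId/ folder into bff-repo and touch the 'id' donefile."""
    depo = os.path.join(bff_root, "bff-depo")
    repo = os.path.join(bff_root, "bff-repo")
    os.makedirs(depo, exist_ok=True)
    os.makedirs(repo, exist_ok=True)

    staging = os.path.join(depo, message_id)
    os.makedirs(staging)
    with open(os.path.join(staging, "plain"), "wb") as f:
        f.write(plain)                    # the message at rest

    final = os.path.join(repo, message_id)
    os.rename(staging, final)             # atomic on one filesystem
    with open(os.path.join(final, "id"), "w"):
        pass                              # touchfile: fully formed
    return final

root = tempfile.mkdtemp()
d = deposit(root, "abc123", b"Subject: hi\r\n\r\nhello\r\n")
assert os.path.exists(os.path.join(d, "id"))
```

[Watchers then key off the existence of MessageId/id, never seeing a half-written folder.]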
>
> The idea here of decomposing the messages is that basically
> the usual idea is to just deliver them exactly as arrive, but
> the idea is that parts variously would have different modes
> of compression, or encryption, to decompose them "to rest",
> then to move them altogether to bff-repo, "at rest".
>
> The ext3 filesystem supports about 32K sub-directories. So, where this
> setup is "one message one directory", vis-a-vis, "one message
> one file", while it can work out that there's a sort of
> object-store view that's basically flat because MessageId's
> are unique, still it's for a hierarchical directory partitioning,
> figuring that a good uniformizing hash-code will balance
> those out. Here the idea is to run md5sum, result 128 bits,
> then just split that into parts and xor them together.
>
> Let's see, 2^(4x4) = 2^16, while 32K is 2^15, so four
> hexadecimal characters, each one 4 bits, already name
> more than 32K sub-directories; with 32 of those in an md5sum,
> just splitting the md5 sum into 4-many 8-hexchar alphanumeric
> names, putting the MessageId/ folders under those,
> figuring messages would be sparse in those, then though
> that as they approach about 4 billion, is for figuring out
> what is reaching the limits of the file system, about PATH_MAX,
> NAME_MAX, according to symlinks, max directories, max files,
> filesystem limits, and filesystem access times, these kinds of things.
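[The hash partitioning above can be sketched directly: md5 gives 128 bits, split into four 32-bit words and xor'd down to one 8-hexchar bucket name:]

```python
import hashlib

def shard(message_id: str) -> str:
    """Fold an md5 (128 bits) into one 8-hexchar bucket name by
    splitting it into four 32-bit words and xor-ing them together."""
    digest = hashlib.md5(message_id.encode()).digest()
    words = [int.from_bytes(digest[i:i + 4], "big")
             for i in range(0, 16, 4)]
    folded = 0
    for w in words:
        folded ^= w
    return format(folded, "08x")

# MessageId/ folders would then live under e.g. bff-repo/<shard>/<MessageId>/
print(shard("<Uq-dnY0hiYUo03z4nZ2dnZfqnPudnZ2d@giganews.com>"))
```

[Note a full 8-hexchar fan-out names up to 2^32 buckets, so on ext3 one would nest or truncate the shard; md5 being uniformizing, the buckets balance out.]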
>
> Then, for filesystems though that support it, is basically
> for either nesting subdirectories, or having a flat directory
> where various modern filesystems or object-stores result
> as many sub-directories as until they fill the disk.
>
> The idea is that filesystems and object-stores have their
> various guarantees, and limits, here getting into the
> "write once read many" and "write once read never"
> usual files, then about the entirely various use cases
> of the ephemeral data what's derived and discardable,
> that BFF always has a complete message in the various
> renditions, then to work the extraction and updates,
> at any later date.
>
>
>
> IETF RFC
>
> NNTP
>
> https://datatracker.ietf.org/wg/nntpext/documents/
>
> 3977 https://datatracker.ietf.org/doc/html/rfc3977
> 8054 https://www.rfc-editor.org/rfc/rfc8054
> 6048 https://datatracker.ietf.org/doc/html/rfc6048
>
> SMTP
>
> 5321 https://datatracker.ietf.org/doc/html/rfc5321
> 2821 https://www.ietf.org/rfc/rfc2821.txt
> 2822 https://datatracker.ietf.org/doc/html/rfc2822 <- Internet Message
> Format
> 3030 https://www.ietf.org/rfc/rfc3030.txt
>
> IMAP
>
> 3501 https://datatracker.ietf.org/doc/html/rfc3501
> 2683 https://datatracker.ietf.org/doc/html/rfc2683
> 4978 https://datatracker.ietf.org/doc/html/rfc4978
> 3516 https://datatracker.ietf.org/doc/html/rfc3516
>
> POP3
>
> 1939 https://www.ietf.org/rfc/rfc1939.txt
>
>
> Message Encapsulation / PEM
>
> 934 https://datatracker.ietf.org/doc/html/rfc934
> 1421 https://datatracker.ietf.org/doc/html/rfc1421
> 1422 https://datatracker.ietf.org/doc/html/rfc1422
> 1423 https://datatracker.ietf.org/doc/html/rfc1423
> 1424 https://datatracker.ietf.org/doc/html/rfc1424
> 7468 https://datatracker.ietf.org/doc/html/rfc7468
>
> Language
>
> 4646 https://datatracker.ietf.org/doc/html/rfc4646
> 4647 https://datatracker.ietf.org/doc/html/rfc4647
>
> MIME
>
> 2045 https://datatracker.ietf.org/doc/html/rfc2045
> 2049 https://datatracker.ietf.org/doc/html/rfc2049
> 2046 https://datatracker.ietf.org/doc/html/rfc2046
> 2047 https://datatracker.ietf.org/doc/html/rfc2047
> 4288 https://datatracker.ietf.org/doc/html/rfc4288
> 4289 https://datatracker.ietf.org/doc/html/rfc4289
> 1521 https://datatracker.ietf.org/doc/html/rfc1521
> 1522 https://datatracker.ietf.org/doc/html/rfc1522
> 2231 https://datatracker.ietf.org/doc/html/rfc2231
>
> BASE64
>
> 4648 https://datatracker.ietf.org/doc/html/rfc4648
>
> DEFLATE
>
> 1950 https://datatracker.ietf.org/doc/html/rfc1950
> 1951 https://datatracker.ietf.org/doc/html/rfc1951
>
> HTTP
>
> 7231 https://datatracker.ietf.org/doc/html/rfc7231
> 7230 https://datatracker.ietf.org/doc/html/rfc7230
>


Re: Meta: a usenet server just for sci.math

<sfSdnV8ZFPs0zHz4nZ2dnZfqn_sS-Uoj@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156558&group=sci.math#156558

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.23.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Fri, 01 Mar 2024 03:55:21 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com> <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com> <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com> <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com> <UA6dncqEGcKv70j4nZ2dnZfqnPGL6btP@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Thu, 29 Feb 2024 19:55:37 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <UA6dncqEGcKv70j4nZ2dnZfqnPGL6btP@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <sfSdnV8ZFPs0zHz4nZ2dnZfqn_sS-Uoj@giganews.com>
Lines: 171
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-iy2CQ7cxK1s1nubmZVgReaVNSey+P3xUQjOqgnra9dNxLCuizow8htCDE7NAa3uPuT2buyVUb64Gy0+!NFdZ5kp3wx8aCx3rqMmAY3e6KmfA/X+y2Szvnin6lbVryfyayxfbtXJl2X352bdZN7YSvnIoGDE=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Fri, 1 Mar 2024 03:55 UTC

On 02/20/2024 07:47 PM, Ross Finlayson wrote:
> About a "dedicated little OS" to run a "dedicated little service".
>
>
> "Critix"
>
> 1) some boot code
> power on self test, EFI/UEFI, certificates and boot, boot
>
> 2) a virt model / a machine model
> maybe running in a virt
> maybe running on metal
>
> 3) a process/scheduler model
> it's processes, a process model
> goal is, "some of POSIX"
>
> Resources
>
> Drivers
>
> RAM
> Bus
> USB, ... serial/parallel, device connections, ....
> DMA
> framebuffer
> audio dac/adc
>
>
> Disk
>
> hard
> memory
> network
>
>
> Login
>
> identity
> resources
>
>
>
> Networking
>
> TCP/IP stack
> UDP, ...
> SCTP, ...
> raw, ...
>
> naming
>
>
> Windowing
>
> "video memory and what follows SVGA"
> "Java, a plain windowing VM"
>
>
>
> PCI <-> PCIe
>
> USB 1/2 USB 3/4
>
> MMU <-> DMA
>
> Serial ATA
>
> NIC / IEEE 802
>
> "EFI system partition"
>
> virtualization model
> emulator
>
> clock-accurate / bit-accurate
> clock-inaccurate / voltage
>
>
> mainboard / motherboard
> circuit summary
>
> emulator environment
>
> CPU
> main memory
> host adapters
>
> PU's
> bus
>
> I^2C
>
> clock model / timing model
> interconnect model / flow model
> insertion model / removal model
> instruction model
>
>

I got looking into PC architecture, wondering
how it's changed since I studied internals, and it really
seems it's stabilized a lot.

UEFI ACPI SMBIOS

DRAM
DMA
virtualized addressing

CPU

System Bus

Intel CSI QPI UPI
AMD HyperTransport
ARM CoreLink

PCI
PCIe

Host Adapters
ATA
NVMe
USB
NIC

So I'm wondering to myself, well, first I wonder
about writing UEFI plugins to sort of enumerate
the setup, for example to print it out, to
see what keys are in the TPM, to see
the partition table and what goes on
in terms of the device tree, basically for
diagnostics: boot services, then runtime services
after UEFI exits, after having loaded into memory
the tables of the "runtime services", which are
mostly sort of a table in memory with offsets
of the things, and maybe how they're ID'd, as
with regards to the System Bus and the Host Adapters.

Then it's a pretty simplified model and gets
into things like wondering what all else is
going on in the device tree and I2C the
blinking lights and perhaps the beep, or bell.

A lot of times it looks like the video is onboard,
out of the CPU, vis-a-vis the UEFI video output,
or what appears to be going on; I'm wondering
about it.

So I'm wondering how to make a simulator,
an emulator, uh, of these things above,
and then basically the low-speed things
and the high-speed things, and, their logical
protocols vis-a-vis the voltage and the
bit-and-clock accurate and the voltage as
symbols vis-a-vis symbolically the protocols,
how to make it so to have a sort of simulator
or emulator of this sort of usual system,
with a usual idea to target code to it to
that kind of system or a virt over the virtualized
system to otherwise exactly that kind of system, ....

Re: Meta: a usenet server just for sci.math

<q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156614&group=sci.math#156614

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 02 Mar 2024 21:43:57 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<4446504b-9c13-4c30-b6ba-1e2d4b44bde2@googlegroups.com>
<a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
<HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>
<FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com>
<v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Sat, 2 Mar 2024 13:44:02 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com>
Lines: 315
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-LjjbzHNMZHvvTFLVhVO7qP45IxOS0g/ijzuCBDpRNllZhbSbJvkxAI0m/Tzu4MPjYl+7/tx8jno1fiZ!Dd9/CaeNia3HoyjxL0Fq4r6X4Us99IzEgrbxoR91vg4NDyPMduqOP9vHDgn8amH+BGnMB1e0ip65!JA==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Received-Bytes: 14954
 by: Ross Finlayson - Sat, 2 Mar 2024 21:44 UTC

On 02/20/2024 08:38 PM, Ross Finlayson wrote:
>
>
> Alright then, about the SFF, "summary" file-format,
> "sorted" file-format, "search" file-format, the idea
> here is to figure out normal forms of summary,
> that go with the posts, with the idea that "a post's
> directory is on the order of contained size of the
> size of the post", while, "a post's directory is on
> a constant order of entries", here is for sort of
> summarizing what a post's directory looks like
> in "well-formed BFF", then as with regards to
> things like Intermediate file-formats as mentioned
> above here with the goal of "very-weakly-encrypted
> at rest as constant contents", then here for
> "SFF files, either in the post's-directory or
> on the side, and about how links to them get
> collected to directories in a filesystem structure
> for the conventions of the concatenation of files".
>
> So, here the idea so far is that BFF has a normative
> form for each post, which has a particular opaque
> globally-universal unique identifier, the Message-ID,
> then that the directory looks like MessageId/ then its
> contents were as these files.
>
> id hd bd yd td rd ad dd ud xd
> id, header, body, year-to-date, thread, referenced, authored, dead,
> undead, expired
>
> or just files named
>
> i h b y t r a d u x
>
> which according to the presence of the files and
> their contents, indicate that the presence of the
> MessageId/ directory indicates the presence of
> a well-formed message, contingent not being expired.
>
> ... Where hd bd are the message split into its parts,
> with regards to the composition of messages by
> concatenating those back together with the computed
> message numbers and this kind of thing, with regards to
> the site, and the idea that they're stored at-rest pre-compressed,
> then knowledge of the compression algorithm makes for
> concatenating them in message-composition as compressed.
>
> Then, there are variously already relations of the
> posts, according to groups, then here as above that
> there's perceived required for date, and author.
> I.e. these are files on the order the counts of posts,
> or span in time, or count of authors.
>
> (About threading and relating posts, is the idea of
> matching subjects not-so-much but employing the
> References header, then as with regards to IMAP and
> parity as for IMAP's THREADS extension, ...,
> www.rfc-editor.org/rfc/rfc5256.html , cf SORT and THREAD.
> There's a usual sort of notion that sorted, threaded
> enumeration is either in date order or thread-tree
> traversal order, usually more sensibly date order,
> with regards to breaking out sub-threads, variously.
> "It's all one thread." IMAP: "there is an implicit sort
> criterion of sequence number".)
>
>
> Then, similarly is for defining models for the sort, summary,
> search, SFF, that it sort of (ha) rather begins with sort,
> about the idea that it's sort of expected that there will
> be a date order partition either as symlinks or as an index file,
> or as with regards to that messages date is also stored in
> the yd file, then as with regards to "no file-times can be
> assumed or reliable", with regards to "there's exactly one
> file named YYYY-MM-DD-HH-MM-SS in MessageId/", these
> kinds of things. There's a real goal that it works easy
> with shell built-ins and text-utils, or "command line",
> to work with the files.
>
>
> So, sort pretty well goes with filtering.
> If you're familiar with the context, of, "data tables",
> with a filter-predicate and a sort-predicate,
> they're different things but then go together.
> It's figured that they get front-ended according
> to the quite most usual "column model" of the
> "table model" then "yes/no/maybe" row filtering
> and "multi-sort" row sorting. (In relational algebra, ...,
> or as rather with 'relational algebra with rows and nulls',
> this most usual sort of 'composable filtering' and 'multi-sort').
>
> Then in IMAP, the THREAD command is "a variant of
> SEARCH with threading semantics for the results".
> This is where both posts and emails work off the
> References header, but it looks like in the wild there
> is something like "a vendor does poor-man's subject
> threading for you and stuffs in a X-References",
> this kind of thing, here with regards to that
> instead of concatenation, is that intermediate
> results get sorted and threaded together,
> then those, get interleaved and stably sorted
> together, that being sort of the idea, with regards
> to search results in or among threads.
>
> (Cf www.jwz.org/doc/threading.html as
> via www.rfc-editor.org/rfc/rfc5256.html ,
> with regards to In-Reply-To and References.
> There are some interesting articles there
> about "mailbox summarization".)
>
> About the summary of posts, one way to start
> as for example an interesting article about mailbox
> summarization gets into, is, all the necessary text-encodings
> to result UTF-8, of Unicode, after UCS-2 or UCS-4 or ASCII,
> or CP-1252, in the base of BE or LE BOMs, or anything to
> do with summarizing the character data, of any of the
> headers, or the body of the text, figuring of course
> that everything's delivered as it arrives, as with regards
> to the opacity usually of everything vis-a-vis its inspection.
>
> This could be a normative sort of file that goes in the messageId/
> folder.
>
> cd: character-data, a summary of whatever form of character
> encoding or requirements of unfolding or unquoting or in
> the headers or the body or anywhere involved indicating
> a stamp indicating each of the encodings or character sets.
>
> Then, the idea is that it's a pretty deep inspection to
> figure out how the various attributes, what are their
> encodings, and the body, and the contents, with regards
> to a sort of, "a normalized string indicating the necessary
> character encodings necessary to extract attributes and
> given attributes and the body and given sections", for such
> matters of indicating the needful for things like sort,
> and collation, in internationalization and localization,
> aka i18n and l10n. (Given that the messages are stored
> as they arrived and undisturbed.)
>
> The idea is that "the cd file doesn't exist for messages
> in plain ASCII7, but for anything anywhere else, breaks
> out what results how to get it out". This is where text
> is often in a sort of format like this.
>
> Ascii
> it's keyboard characters
> ISO8859-1/ISO8859-15/CP-1252
> it's Latin1 often though with the Windows guys
> Sideout
> it's Ascii with 0-127 gigglies or upper glyphs
> Wideout
> it's 0-256 with any 256 wide characters in upper Unicode planes
> Unicode
> it's Unicode
>
> Then there are all sorts of encodings, this is according to
> the rules of Messages with regards to header and body
> and content and transfer-encoding and all these sorts
> of things, it's Unicode.
>
> Then, another thing to get figured out is lengths,
> the size of contents or counts or lengths, figuring
> that it's a great boon to message-composition to
> allocate exactly what it needs for when, as a sum
> of invariant lengths.
>
> Then the MessageId/ files still has un-used 'l' and 's',
> then though that 'l' looks too close to '1', here it's
> sort of unambiguous.
>
> ld: lengthed, the coded and uncoded lengths of attributes and parts
>
> The idea here is to make it easiest for something like
> "consult the lengths and allocate it raw, concatenate
> the message into it, consult the lengths and allocate
> it uncoded, uncode the message into it".
>
> So, getting into the SFF, is that basically
> "BFF indicates well-formed messages or their expiry",
> "SFF is derived via a common algorithm for all messages",
> and "some SFF lives next to BFF and is also write-once-read-many",
> vis-a-vis that "generally SFF is discardable because it's derivable".
>
>

So, figuring that BFF then is about designed,
basically for storing Internet messages with
regards to MessageId, then about ContentId
and external resources separately, then here
the idea again becomes how to make for
the SFF files, what results, intermediate, tractable,
derivable, discardable, composable data structures,
in files of a format with regards to write-once-read-many,
write-once-read-never, and, "partition it", in terms of
natural partitions like time intervals and categorical attributes.
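The well-formedness convention above (presence of the MessageId/ directory and its single-letter files indicating a well-formed message, contingent not being expired) can be sketched as a simple presence check. This is one hypothetical reading, assuming 'i', 'h', 'b' (id, header, body) are the required entries and 'x' marks expiry; the post doesn't pin down which files are mandatory.

```python
import os

# Single-letter file names from the BFF sketch above; which of them
# are strictly required is my assumption, not stated in the post.
REQUIRED = {"i", "h", "b"}   # id, header, body
EXPIRED = "x"

def is_well_formed(message_dir: str) -> bool:
    """A MessageId/ directory counts as well-formed when the required
    entries exist and no expiry marker is present."""
    try:
        entries = set(os.listdir(message_dir))
    except FileNotFoundError:
        return False
    if EXPIRED in entries:
        return False
    return REQUIRED.issubset(entries)
```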


Re: Meta: a usenet server just for sci.math

<QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156641&group=sci.math#156641

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.23.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 04 Mar 2024 19:23:36 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com> <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com> <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com> <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com> <v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com> <q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Mon, 4 Mar 2024 11:23:40 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com>
Lines: 535
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-nEwqu5GIrykM4RYePzGJRMUfqYc8Ua9R5MbPAki8D507ghSNE9TK9cfZbctejqveLgEHW+b0HJM3Vfe!Oq/Y4LFwHUIUvyHepHTiZjC8vPKt5nbIlPgypo3SPk8exYRdfErv/EsfLAFggBAx9v6H/rlwFtY=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Mon, 4 Mar 2024 19:23 UTC

So, figuring that BFF then is about designed,
basically for storing Internet messages with
regards to MessageId, then about ContentId
and external resources separately, then here
the idea again becomes how to make for
the SFF files, what results, intermediate, tractable,
derivable, discardable, composable data structures,
in files of a format with regards to write-once-read-many,
write-once-read-never, and, "partition it", in terms of
natural partitions like time intervals and categorical attributes.

There are some various great open-source search
engines, here with respect to something like Lucene
or SOLR or ElasticSearch.

The idea is that there are attributes searches,
and full-text searches, those resulting hits,
to documents apiece, or sections of their content,
then backward along their attributes, like
threads and related threads, and authors and
their cliques, while across groups and periods
of time.

There's not much of a notion of "semantic search",
though, it's expected to sort of naturally result,
here as for usually enough least distance, as for
"the terms of matching", and predicates from what
results a filter predicate, here with what I call,
"Yes/No/Maybe".

Now, what is, "yes/no/maybe", one might ask.
Well, it's the query specification, of the world
of results, to filter to the specified results.
The idea is that there's an accepter network
for "Yes" and a rejector network for "No"
and an accepter network for "Maybe" and
then rest are rejected.

The idea is that the search, is a combination
of a bunch of yes/no/maybe terms, or,
sure/no/yes, to indicate what's definitely
included, what's not, and what is, then that
the term, results that it's composable, from
sorting the terms, to result a filter predicate
implementation, that can run anywhere along
the way, from the backend to the frontend,
this way being a, "search query specification".

There are notions like, "*", and single match
and multimatch, about basically columns and
a column model, of documents, that are
basically rows.

The idea of course is to build an arithmetic expression,
that also is exactly a natural expression,
for "matches", and "ranges".

"AP"|Archimedes|Plutonium in first|last

Here, there is a search, for various names, that
it composes this way.

AP first
AP last
Archimedes first
Archimedes last
Plutonium first
Plutonium last

As you can see, these "match terms", just naturally
break out, then that what gets into negations,
break out and double, and what gets into ranges,
then, well that involves for partitions and ranges,
duplicating and breaking that out.
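The breakout of match terms above is just a cross-product of values and fields, which a short sketch makes explicit (the function name is mine):

```python
from itertools import product

def expand_terms(values, fields):
    """Expand a multimatch spec like  "AP"|Archimedes|Plutonium in first|last
    into its cross-product of (value, field) match terms, as in the
    breakout above."""
    return [(v, f) for v, f in product(values, fields)]
```

For the example in the post, expand_terms(["AP", "Archimedes", "Plutonium"], ["first", "last"]) yields the six terms listed.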

It results though a very fungible and normal form
of a search query specification, that rebuilds the
filter predicate according to sorting those, then
has very well understood runtime according to
yes/no/maybe and the multimatch, across and
among multiple attributes, multiple terms.

This sort of enriches a usual sort of query
"exact full hit", with this sort "ranges and conditions,
exact full hits".

So, the Yes/No/Maybe, is the generic search query
specification, overall, just reflecting an accepter/rejector
network, with a bit on the front to reflect keep/toss,
so that it's very practical and of course totally commonplace
and easily written broken out as find or wildmat specs.
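As a minimal sketch of that accepter/rejector composition, with each network given as a plain predicate function, and taking "No" as overriding "Yes" (the post doesn't fix the precedence, so that part is my assumption):

```python
def make_filter(yes, no, maybe):
    """Compose the yes/no/maybe networks into one filter predicate:
    'No' rejects outright, 'Yes' accepts, 'Maybe' accepts,
    and the rest are rejected."""
    def predicate(doc):
        if no(doc):
            return False
        if yes(doc):
            return True
        return maybe(doc)
    return predicate
```

Because the result is just another predicate, it can run anywhere along the way, from the back-end to the front-end, as the post describes.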

For then these the objects and the terms relating
the things, there's about maintaining this, while
refining it, that basically there's an ownership
and a reference count of the filter objects, so
that various controls according to the syntax of
the normal form of the expression itself, with
most usual English terms like "is" and "in" and
"has" and "between", and "not", with & for "and"
and | for "or", makes that this should be the kind
of filter query specification that one would expect
to be general purpose on all such manners of
filter query specifications and their controls.

So, a normal form for these filter objects, then
gets relating them to the SFF files, because, an
SFF file of a given input corpus, satisfies some
of these specifications, the queries, or for example
doesn't, about making the language and files
first of the query, then the content, then just
mapping those to the content, which are built
off extractors and summarizers.

I already thought about this a lot. It results
that it sort of has its own little theory,
thus what can result its own little normal forms,
for making a fungible SFF description, what
results for any query, going through those,
running the same query or as so filtered down
the query for the partition already, from the
front-end to the back-end and back, a little
noisy protocol, that delivers search results.

The document is element of the corpus.
Here each message is a corpus. Now,
there's a convention in Internet messages,
not always followed, being that the ignorant
or lacking etiquette or just plain different,
don't follow it or break it, there's a convention
of attribution in Internet messages the
content that's replied to, and, this is
variously "block" or "inline".

From the outside though, the document here
has the "overview" attributes, the key-value
pairs of the headers those being, and the
"body" or "document" itself, which can as
well have extracted attributes, vis-a-vis
otherwise its, "full text".

https://en.wikipedia.org/wiki/Search_engine_indexing

The key thing here for partitioning is to
make for date-range partitioning, while,
the organization of the messages by ID is
essentially flat, and constant rate to access one
but linear to trawl through them, although parallelizable,
for example with a parallelizable filter predicate
like yes/no/maybe, before getting into the
inter-document of terms, here the idea is that
there's basically

date partition
group partition

then as with regards to

threads
authors

that these are each having their own linear organization,
or as with respect to time-series partitions, and the serial.

Then, there are two sorts of data structures
to build with:

binary trees,
bit-maps.

So, the idea is to build indexes for date ranges
and then just search separately, either linear
or from an in-memory currency, the current.

I'm not too interested in "rapid results" as
much as "thoroughly parallelizable and
effectively indexed", and "providing
incremental results" and "full hits".

The idea here is to relate date ranges,
to an index file for the groups files,
then to just search the date ranges,
and for example as maybe articles expire,
which here they don't as it's archival,
to relate dropping old partitions with
updating the groups indexes.
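That date-range scheme can be sketched as a small in-memory index: each partition covers one interval and owns its message ids, a search touches only overlapping partitions, and dropping old partitions is one deletion plus an index update. The class and method names here are illustrative, not from the post.

```python
from datetime import date

class DateIndex:
    """Sketch of date-range partitioning for the groups index."""

    def __init__(self):
        self.partitions = {}  # (start, end) -> list of message ids

    def add(self, start: date, end: date, message_ids):
        self.partitions[(start, end)] = list(message_ids)

    def search(self, start: date, end: date):
        # touch only partitions whose interval overlaps the query range
        hits = []
        for (s, e), ids in sorted(self.partitions.items()):
            if s <= end and start <= e:
                hits.extend(ids)
        return hits

    def drop_before(self, cutoff: date):
        # dropping an old partition is just removing its entry
        self.partitions = {k: v for k, v in self.partitions.items()
                           if k[1] >= cutoff}
```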

For NNTP and IMAP then there's,
OVERVIEW and SEARCH. So, the
key attributes relevant those protocols,
are here to make it so that messages
have an abstraction of an extraction,
those being fixed as what results,
then those being very naively composable,
with regards to building data structures
of those, what with regards to match terms,
evaluate matches in ranges on those.

Now, NNTP is basically write-once-read-many,
though I suppose it's mostly write-once-read-
maybe-a-few-times-then-never, while IMAP
basically adds to the notion of the session,
what's read and un-read, and, otherwise
with regards to flags, IMAP flags. I.e. flags
are variables, all this other stuff being constants.

So, there's an idea to build a sort of, top-down,
or onion-y, layered, match-finder. This is where
it's naively composable to concatenate the
world of terms, in attributes, of documents,
in date ranges and group partitions, to find
"there is a hit" then to dive deeper into it,
figuring the idea is to horizontally scale
by refining date partitions and serial collections,
then parallelize those, where as well that serial
algorithms work the same on those, eg, by
concatenating those and working on that.
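The "copy the match document to each partition in parallel, then concatenate the hits" step above can be sketched directly; a serial scan over the concatenated partitions would give the same result, which is the point about serial algorithms working the same. The helper names are mine.

```python
from concurrent.futures import ThreadPoolExecutor

def search_partitions(match, partitions):
    """Run the same match predicate over each partition in parallel
    and concatenate the per-partition hits, preserving partition order."""
    def scan(docs):
        return [d for d in docs if match(d)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(scan, partitions)  # keeps input order
    return [hit for part in results for hit in part]
```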

This is where a group and a date partition
each have a relatively small range, of overview
attributes, and their values, then that for
noisy values, like timestamps, to detect those
and work out what are small cardinal categories
and large cardinal ergodic identifiers.
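One way to sketch that detection: count distinct values per overview attribute within a partition, and split on a distinct-to-total ratio, so that group names land in the small cardinal categories and timestamps or message-ids land with the large cardinal identifiers. The threshold is my own choice of heuristic.

```python
def split_attributes(rows, threshold=0.5):
    """Separate overview attributes into small-cardinality categories
    (few distinct values) and noisy, near-unique identifiers,
    per the distinction drawn above."""
    categories, identifiers = [], []
    keys = rows[0].keys() if rows else []
    for key in keys:
        distinct = len({row[key] for row in rows})
        if distinct / len(rows) <= threshold:
            categories.append(key)
        else:
            identifiers.append(key)
    return categories, identifiers
```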

It's sort of like, "Why don't you check out the
book Information Retrieval and read that again",
and, in a sense, it's because I figure that Google
has littered all their no-brainer patterns with junk patents
that instead I expect to clean-room and prior-art this.
Maybe that's not so, I just wonder sometimes how
they've arrived at monopolizing what's a totally
usual sort of "fetch it" routine.

So, the goal is to find hits, in conventions of
documents, inside the convention of quoting,
with regards to
bidirectional relations of correspondence, and,
unidirectional relations of nesting, those
being terms for matching, and building matching,
then that the match document, is just copied
and sent to each partition in parallel, each
resulting its hits.


Re: Meta: a usenet server just for sci.math

<iWGdndi1WIIve3T4nZ2dnZfqnPednZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156779&group=sci.math#156779

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!69.80.99.27.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 07 Mar 2024 16:09:22 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com> <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com> <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com> <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com> <UA6dncqEGcKv70j4nZ2dnZfqnPGL6btP@giganews.com> <sfSdnV8ZFPs0zHz4nZ2dnZfqn_sS-Uoj@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Thu, 7 Mar 2024 08:09:25 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <sfSdnV8ZFPs0zHz4nZ2dnZfqn_sS-Uoj@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <iWGdndi1WIIve3T4nZ2dnZfqnPednZ2d@giganews.com>
Lines: 495
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-yqakLnHhhXhLmgUJPS72/7hGT6Mr3kktVD9ZPmBQzkEp2YDWiZGTx8Q8rNlLw7w7x7501gNjBf3ksDD!UlRjhPBVHAPevV2phOlQ7XQXik2mQJS46p36UE3eTmcOZH19iQSIQxZrPoJtIZyMU8pC2H+YPpw0
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Thu, 7 Mar 2024 16:09 UTC

On 02/29/2024 07:55 PM, Ross Finlayson wrote:
> On 02/20/2024 07:47 PM, Ross Finlayson wrote:
>> About a "dedicated little OS" to run a "dedicated little service".
>>
>>
>> "Critix"
>>
>> 1) some boot code
>> power on self test, EFI/UEFI, certificates and boot, boot
>>
>> 2) a virt model / a machine model
>> maybe running in a virt
>> maybe running on metal
>>
>> 3) a process/scheduler model
>> it's processes, a process model
>> goal is, "some of POSIX"
>>
>> Resources
>>
>> Drivers
>>
>> RAM
>> Bus
>> USB, ... serial/parallel, device connections, ....
>> DMA
>> framebuffer
>> audio dac/adc
>>
>>
>> Disk
>>
>> hard
>> memory
>> network
>>
>>
>> Login
>>
>> identity
>> resources
>>
>>
>>
>> Networking
>>
>> TCP/IP stack
>> UDP, ...
>> SCTP, ...
>> raw, ...
>>
>> naming
>>
>>
>> Windowing
>>
>> "video memory and what follows SVGA"
>> "Java, a plain windowing VM"
>>
>>
>>
>> PCI <-> PCIe
>>
>> USB 1/2 USB 3/4
>>
>> MMU <-> DMA
>>
>> Serial ATA
>>
>> NIC / IEEE 802
>>
>> "EFI system partition"
>>
>> virtualization model
>> emulator
>>
>> clock-accurate / bit-accurate
>> clock-inaccurate / voltage
>>
>>
>> mainboard / motherboard
>> circuit summary
>>
>> emulator environment
>>
>> CPU
>> main memory
>> host adapters
>>
>> PU's
>> bus
>>
>> I^2C
>>
>> clock model / timing model
>> interconnect model / flow model
>> insertion model / removal model
>> instruction model
>>
>>
>
>
>
>
> I got looking into PC architecture wondering
> how it was since I studied internals and it really
> seems it's stabilized a lot.
>
> UEFI ACPI SMBIOS
>
> DRAM
> DMA
> virtualized addressing
>
> CPU
>
> System Bus
>
> Intel CSI QPI UPI
> AMD HyperTransport
> ARM CoreLink
>
>
> PCI
> PCIe
>
> Host Adapters
> ATA
> NVMe
> USB
> NIC
>
> So I'm wondering to myself, well first I wonder
> about writing UEFI plugins to sort of enumerate
> the setup and for example print it out and for
> example see what keys are in the TPM and for
> example the partition table and what goes in
> in terms of the device tree and basically for
> diagnostic, boot services then runtime services
> after UEFI exits after having loaded into memory
> the tables of the "runtime services" which are
> mostly sort of a table in memory with offsets
> of the things and maybe how they're ID's as
> with regards to the System Bus the Host Adapters.
>
>
> Then it's a pretty simplified model and gets
> into things like wondering what all else is
> going on in the device tree and I2C the
> blinking lights and perhaps the beep, or bell.
>
> A lot of times it looks like the video is onboard
> out the CPU, vis-a-vis the UEFI video output
> or what appears to be going on, I'm wondering
> about it.
>
>
> So I'm wondering how to make a simulator,
> an emulator, uh, of these things above,
> and then basically the low-speed things
> and the high-speed things, and, their logical
> protocols vis-a-vis the voltage and the
> bit-and-clock accurate and the voltage as
> symbols vis-a-vis symbolically the protocols,
> how to make it so to have a sort of simulator
> or emulator of this sort of usual system,
> with a usual idea to target code to it to
> that kind of system or a virt over the virtualized
> system to otherwise exactly that kind of system, ....
>
>
>

Critix

boot protocols

UEFI ACPI SMBIOS

CPU and instruction model

bus protocols

low-speed protocols
high-speed protocols

Looking at the instructions, it looks pretty much
that the kernel code is involved inside the system
instructions, to support the "bare-metal" and then
also the "virt-guests", then that communication
is among the nodes in AMD, then, the HyperTransport
basically is indicated as, IO, then for there to be figured
out that the guest virts get a sort of view of the "hardware
abstraction layer", then with regards to the segments and
otherwise the mappings, for the guest virts, vis-a-vis,
the mappings to the memory and I/O, getting figured
out these kinds of things as an example of what gets
into a model of a sort of machine, as a sort of emulator,
basically figuring to be bit-accurate and ignore being
clock-accurate.

The "BIOS and kernel guide" gets into the order of
system initialization and the links, and DRAM.
It looks that there are nodes basically being parallel
processors, and on those cores, being CPUs or
processors.

Then each of the processors has its control and status
registers, then with regards to tables, and with regards
to memory and cache, about those the segments,
figuring to model the various interconnections this
way in a little model of a mainboard CPU. "Using L2
Cache as General Storage During Boot".

Then it gets into enumerating and building the links,
and setting up the buffers, to figure out what's going
on the DRAM and DMA, and, PCI and PCIe, and, then
about what's ATA, NVMe, and USB, these kinds of things.

Nodes' cores share registers or "software must ensure...",
with statics and scopes. Then it seems the cache lines
and then the interrupt vectors or APIC IDs get enumerated,
setting up the routes and tables.

Then various system and operating modes proceed,
where there's an idea that the basic difference
among executive, scheduler, and operating system,
basically is in with respect to the operating mode,
with respect to old real, protected, and, "unreal",
I suppose, modes, here that basically it's all really
simplified about protected mode and guest virts.

"After storing the save state, execution starts ...."

Then there's described "spring-boarding" into SMM
that the BSP and BSM, a quick protocol then that
all the live nodes enter SMM, basically according
to ACPI and the APIC.

"The processor supports many power management
features in a variety of systems."

This gets into voltage proper, here though that
what results is bit-accurate events.

"P-states are operational performance states
characterized by a unique frequency and voltage."

The idea here is to support very-low-power operation
vis-a-vis modest, usual, and full (P0). Then besides
consumption, is also reducing heat, or dialing down
according to temperature. Then there are C-states
and S-states, then mostly these would be as by
the BIOS, what gets surfaced as ACPI to the kernel.

There are some more preliminaries, the topology
gets setup, then gets involved the DCT DIMM DRAM
frequency and for DRAM, lighting up RAM, that
basically to be constant rate, about the DCT and DDR.

There are about 1000 model-specific registers what
seem to be for the BIOS to inspect and figure out
the above pretty much and put the system into a
state for regular operation.

Then it seems like an emulator would be setting
that up, then as with regards to usually enough
"known states" and setting up for simulating the
exercise of execution and I/O.
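As a minimal sketch of that "known state" setup, an emulator might hold the model-specific registers as a sparse map of address to 64-bit value, with RDMSR/WRMSR-like accessors. The addresses and reset values here are placeholders, not real MSRs, and the class is my own framing.

```python
class MsrFile:
    """Sparse model-specific-register store for an emulator sketch."""

    MASK = (1 << 64) - 1  # MSRs are 64-bit

    def __init__(self, reset_values=None):
        # initialize to a "known state" before simulated boot
        self.regs = dict(reset_values or {})

    def rdmsr(self, addr: int) -> int:
        # unimplemented MSRs read as zero in this sketch
        return self.regs.get(addr, 0)

    def wrmsr(self, addr: int, value: int) -> None:
        self.regs[addr] = value & self.MASK
```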

instructions

system-purpose

interrupt

CLGI CLI STI STGI
HLT
IRET IRETD IRETQ
LIDT SIDT
MONITOR MWAIT
RSM
SKINIT

privileges

ARPL
LAR
RDPKRU WRPKRU
VERR VERW

alignment

CLAC STAC

jump/routine

SYSCALL SYSRET
SYSENTER SYSEXIT

task, stack, tlb, gdt, ldt, cache

CLTS
CLRSSBSY SETSSBSY
INCSSP
INVD
INVLPG INVLPGA INVLPGB INVPCID TLBSYNC
LGDT SGDT
LLDT SLDT
LMSW
LSL
LTR STR
RDSSP
RSTORSSP SAVEPREVSSP
WBINVD WBNOINVD
WRSS WRUSS

load/store
MOV CRn MOV DRn
RDMSR WRMSR
SMSW
SWAPGS

virtual

PSMASH PVALIDATE
RMPADJUST RMPUPDATE
RMPQUERY
VMLOAD VMSAVE
VMMCALL VMGEXIT
VMRUN

perf

RDPMC
RDTSC RDTSCP

debug

INT 3

general-purpose

context
CPUID
LLWPCB LWPINS LWPVAL SLWPCB
NOP
PAUSE


Re: Meta: a usenet server just for sci.math

<iWGdndu1WIJLe3T4nZ2dnZfqnPcAAAAA@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156780&group=sci.math#156780

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.23.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 07 Mar 2024 16:09:58 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com> <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com> <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com> <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com> <v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com> <q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com> <QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Thu, 7 Mar 2024 08:10:01 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <iWGdndu1WIJLe3T4nZ2dnZfqnPcAAAAA@giganews.com>
Lines: 609
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-xW7OiZQGMMK5TH5Q0NweYvVGtrrgPVbXXfRwA8GJlZoltzLMTYkMcFYRaT3kBnbMjJ/mcYsPVtTR8fY!SSYZVvhRJS6vxB+ZHYKjCBgGkEmPAW08GYC69yy+YIkUO9ZjjOOfFjfmCU5L3x0BoJVJSom23Idw
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Thu, 7 Mar 2024 16:10 UTC

On 03/04/2024 11:23 AM, Ross Finlayson wrote:
>
> So, figuring that BFF then is about designed,
> basically for storing Internet messages with
> regards to MessageId, then about ContentId
> and external resources separately, then here
> the idea again becomes how to make for
> the SFF files, what results, intermediate, tractable,
> derivable, discardable, composable data structures,
> in files of a format with regards to write-once-read-many,
> write-once-read-never, and, "partition it", in terms of
> natural partitions like time intervals and categorical attributes.
>
>
> There are some various great open-source search
> engines, here with respect to something like Lucene
> or SOLR or ElasticSearch.
>
> The idea is that there are attributes searches,
> and full-text searches, those resulting hits,
> to documents apiece, or sections of their content,
> then backward along their attributes, like
> threads and related threads, and authors and
> their cliques, while across groups and periods
> of time.
>
> There's not much of a notion of "semantic search",
> though, it's expected to sort of naturally result,
> here as for usually enough least distance, as for
> "the terms of matching", and predicates from what
> results a filter predicate, here with what I call,
> "Yes/No/Maybe".
>
> Now, what is, "yes/no/maybe", one might ask.
> Well, it's the query specification, of the world
> of results, to filter to the specified results.
> The idea is that there's an accepter network
> for "Yes" and a rejector network for "No"
> and an accepter network for "Maybe" and
> then rest are rejected.
>
> The idea is that the search, is a combination
> of a bunch of yes/no/maybe terms, or,
> sure/no/yes, to indicate what's definitely
> included, what's not, and what is, then that
> the term, results that it's composable, from
> sorting the terms, to result a filter predicate
> implementation, that can run anywhere along
> the way, from the backend to the frontend,
> this way being a, "search query specification".
>
>
> There are notions like, "*", and single match
> and multimatch, about basically columns and
> a column model, of documents, that are
> basically rows.
>
>
> The idea of course is to build an arithmetic expression,
> that also is exactly a natural expression,
> for "matches", and "ranges".
>
> "AP"|Archimedes|Plutonium in first|last
>
> Here, there is a search, for various names, that
> it composes this way.
>
> AP first
> AP last
> Archimedes first
> Archimedes last
> Plutonium first
> Plutonium last
>
> As you can see, these "match terms", just naturally
> break out, then that what gets into negations,
> break out and double, and what gets into ranges,
> then, well that involves for partitions and ranges,
> duplicating and breaking that out.
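The break-out above can be sketched as a cross product of values and fields; this is a minimal illustration, not the author's implementation, and the `expand` helper is a name assumed here.

```python
from itertools import product

# Hypothetical expansion of a term like
#   "AP"|Archimedes|Plutonium in first|last
# into the six flat (field, value) match terms listed above.
def expand(values, fields):
    return [(f, v) for v, f in product(values, fields)]

terms = expand(["AP", "Archimedes", "Plutonium"], ["first", "last"])
# six pairs, e.g. ("first", "AP"), ("last", "AP"), ("first", "Archimedes"), ...
```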
>
> It results though a very fungible and normal form
> of a search query specification, that rebuilds the
> filter predicate according to sorting those, then
> has very well understood runtime according to
> yes/no/maybe and the multimatch, across and
> among multiple attributes, multiple terms.
>
>
> This sort of enriches a usual sort of query
> "exact full hit", with this sort "ranges and conditions,
> exact full hits".
>
> So, the Yes/No/Maybe, is the generic search query
> specification, overall, just reflecting an accepter/rejector
> network, with a bit on the front to reflect keep/toss,
> that's it's very practical and of course totally commonplace
> and easily written broken out as find or wildmat specs.
>
> For then these the objects and the terms relating
> the things, there's about maintaining this, while
> refining it, that basically there's an ownership
> and a reference count of the filter objects, so
> that various controls according to the syntax of
> the normal form of the expression itself, with
> most usual English terms like "is" and "in" and
> "has" and "between", and "not", with & for "and"
> and | for "or", makes that this should be the kind
> of filter query specification that one would expect
> to be general purpose on all such manners of
> filter query specifications and their controls.
>
> So, a normal form for these filter objects, then
> gets relating them to the SFF files, because, an
> SFF file of a given input corpus, satisfies some
> of these specifications, the queries, or for example
> doesn't, about making the language and files
> first of the query, then the content, then just
> mapping those to the content, which are built
> off extractors and summarizers.
>
> I already thought about this a lot. It results
> that it sort of has its own little theory,
> thus what can result its own little normal forms,
> for making a fungible SFF description, what
> results for any query, going through those,
> running the same query or as so filtered down
> the query for the partition already, from the
> front-end to the back-end and back, a little
> noisy protocol, that delivers search results.
>
>
>
>
> The document is element of the corpus.
> Here each message is a corpus. Now,
> there's a convention in Internet messages,
> not always followed, being that the ignorant
> or lacking etiquette or just plain different,
> don't follow it or break it, there's a convention
> of attribution in Internet messages the
> content that's replied to, and, this is
> variously "block" or "inline".
>
> From the outside though, the document here
> has the "overview" attributes, the key-value
> pairs of the headers those being, and the
> "body" or "document" itself, which can as
> well have extracted attributes, vis-a-vis
> otherwise its, "full text".
>
> https://en.wikipedia.org/wiki/Search_engine_indexing
>
>
> The key thing here for partitioning is to
> make for date-range partitioning, while,
> the organization of the messages by ID is
> essentially flat, and constant rate to access one
> but linear to trawl through them, although parallelizable,
> for example with a parallelizable filter predicate
> like yes/no/maybe, before getting into the
> inter-document of terms, here the idea is that
> there's basically
>
> date partition
> group partition
>
> then as with regards to
>
> threads
> authors
>
> that these are each having their own linear organization,
> or as with respect to time-series partitions, and the serial.
>
> Then, there are two sorts of data structures
> to build with:
>
> binary trees,
> bit-maps.
>
> So, the idea is to build indexes for date ranges
> and then just search separately, either linear
> or from an in-memory currency, the current.
>
> I'm not too interested in "rapid results" as
> much as "thoroughly parallelizable and
> effectively indexed", and "providing
> incremental results" and "full hits".
>
> The idea here is to relate date ranges,
> to an index file for the groups files,
> then to just search the date ranges,
> and for example as maybe articles expire,
> which here they don't as it's archival,
> to relate dropping old partitions with
> updating the groups indexes.
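The date-range index idea above can be sketched as interval pruning before search; the partition keys and names here are assumptions for illustration only.

```python
from datetime import date

# Sketch: prune partitions by date range before searching, assuming each
# partition is keyed by (group, start, end) as in a groups index file.
partitions = {
    ("sci.math", date(2023, 1, 1), date(2023, 12, 31)): "part-2023",
    ("sci.math", date(2024, 1, 1), date(2024, 12, 31)): "part-2024",
}

def matching_partitions(group, lo, hi):
    for (g, start, end), name in partitions.items():
        if g == group and start <= hi and lo <= end:  # intervals overlap
            yield name

list(matching_partitions("sci.math", date(2024, 3, 1), date(2024, 3, 31)))
```

Dropping an old partition is then just deleting its entry (or moving its folder) and updating the groups index, matching the archival-retention point above.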
>
> For NNTP and IMAP then there's,
> OVERVIEW and SEARCH. So, the
> key attributes relevant those protocols,
> are here to make it so that messages
> have an abstraction of an extraction,
> those being fixed as what results,
> then those being very naively composable,
> with regards to building data structures
> of those, what with regards to match terms,
> evaluate matches in ranges on those.
>
> Now, NNTP is basically write-once-read-many,
> though I suppose it's mostly write-once-read-
> maybe-a-few-times-then-never, while IMAP
> basically adds to the notion of the session,
> what's read and un-read, and, otherwise
> with regards to flags, IMAP flags. I.e. flags
> are variables, all this other stuff being constants.
>
>
> So, there's an idea to build a sort of, top-down,
> or onion-y, layered, match-finder. This is where
> it's naively composable to concatenate the
> world of terms, in attributes, of documents,
> in date ranges and group partitions, to find
> "there is a hit" then to dive deeper into it,
> figuring the idea is to horizontally scale
> by refining date partitions and serial collections,
> then parallelize those, where as well that serial
> algorithms work the same on those, eg, by
> concatenating those and working on that.
>
> This is where a group and a date partition
> each have a relatively small range, of overview
> attributes, and their values, then that for
> noisy values, like timestamps, to detect those
> and work out what are small cardinal categories
> and large cardinal ergodic identifiers.
>
> It's sort of like, "Why don't you check out the
> book Information Retrieval and read that again",
> and, in a sense, it's because I figure that Google
> has littered all their no-brainer patterns with junk patents
> that instead I expect to clean-room and prior-art this.
> Maybe that's not so, I just wonder sometimes how
> they've arrived at monopolizing what's a totally
> usual sort of "fetch it" routine.
>
>
> So, the goal is to find hits, in conventions of
> documents, inside the convention of quoting,
> with regards to
> bidirectional relations of correspondence, and,
> unidirectional relations of nesting, those
> being terms for matching, and building matching,
> then that the match document, is just copied
> and sent to each partition in parallel, each
> resulting its hits.
>
> The idea is to show a sort of search plan, over
> the partitions, then that there's incremental
> progress and expected times displayed, and
> incremental results gathered, digging it up.
>
> There's basically for partitions "has-a-hit" and
> "hit-count", "hit-list", "hit-stream". That might
> sound sort of macabre, but it means search hits
> not mob hits, then for the keep/toss and yes/no/maybe,
> that partitions are boundaries of sorts, on down
> to ideas of "document-level" and "attribute-level"
> aspects of, "intromissive and extromissive visibility".
>
>
> https://lucene.apache.org/core/3_5_0/fileformats.html
>
> https://solr.apache.org/guide/solr/latest/configuration-guide/index-location-format.html
>
>
> It seems sensible to adapt Lucene's index file format,
> then with regards to default attributes
> and this kind of thing, and the idea that threads are
> documents for searching in threads and finding the
> content actually aside the quotes.
>
> The Lucene's index file format, isn't a data structure itself,
> in terms of a data structure built for b-tree/b-map, where
> the idea is to result a file, that's a serialization of a data
> structure, within it, the pointer relations as to offsets
> in the file, so that, it can be loaded into memory and
> run, or that, I/O can seek through it and run, but especially
> that, it can be mapped into memory and run.
>
> I.e., "implementing the lookup" as following pointer offsets
> in files, vis-a-vis a usual idea that the pointers are just links
> in the tree or off the map, is one of these "SFF" files.
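The "pointer offsets in files" idea can be sketched as a packed node layout that a reader seeks or mmaps through without deserializing the whole tree; the 12-byte node format here is an assumption for illustration, not the SFF format itself.

```python
import struct

# Sketch: each node is packed as (key, left-offset, right-offset), with
# -1 for "no child", so lookup follows file offsets instead of pointers.
NODE = struct.Struct("<iii")

def lookup(buf, key, offset=0):
    while offset != -1:
        k, left, right = NODE.unpack_from(buf, offset)
        if key == k:
            return True
        offset = left if key < k else right
    return False

# A tiny serialized tree: root 10 at offset 0, child 5 at 12, child 20 at 24.
buf = NODE.pack(10, 12, 24) + NODE.pack(5, -1, -1) + NODE.pack(20, -1, -1)
lookup(buf, 20)   # -> True
```

The same `buf` could be a `mmap` object, which is the "mapped into memory and run" case above.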
>
> So, for an "index", it's really sort of only the terms then
> that they're inverted from the documents that contain
> them, to point back to them.
>
> Then, because there are going to be index files for each
> partition, is that there are terms and there are partitions,
> with the idea that the query's broken out by organization,
> so that search proceeds only when there's matching partitions,
> then into matching terms.
>
> AP 2020-2023
>
> * AP
> !afore(2020)
> !after(2023)
>
> AP 2019, 2024
>
> * AP
> !afore(2019)
> !after(2019)
>
> * AP
> !afore(2024)
> !after(2024)
>
>
> Here for example the idea is to search the partitions
> according to they match "natural" date terms, vis-a-vis,
> referenced dates, and matching the term in any fields,
> then that the range terms result either one query or
> two, in the sense of breaking those out and resulting
> that then their results get concatenated.
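Breaking the disjoint ranges into separately partitionable queries, as in the "AP 2019, 2024" example, can be sketched like so; the dict shape is an assumption for illustration.

```python
# Sketch: each year in a disjoint range becomes its own query, a match
# term plus !afore/!after bounds, whose results get concatenated.
def range_queries(term, years):
    return [{"match": term, "not_afore": y, "not_after": y} for y in years]

range_queries("AP", [2019, 2024])
# -> two queries, one bounded to 2019 and one to 2024
```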
>
> You can see that "in", here, as "between", for example
> in terms of range, is implemented as "not out", for
> that this way the Yes/No/Maybe, Sure/No/Yes, runs
>
> match _any_ Sure: yes
> match _any_ No: no
> match _all_ Yes: yes
> no
>
> I.e. it's not a "Should/Must/MustNot Boolean" query.
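The Sure/No/Yes order above can be sketched as a filter predicate; this is a minimal reading of the rules as stated (any Sure hit keeps, any No hit tosses, else all Yes terms must hit, default no), with the empty-Yes behavior an assumption.

```python
# Hypothetical sketch of the Yes/No/Maybe accepter/rejector:
#   match _any_ Sure: yes; match _any_ No: no; match _all_ Yes: yes; else no.
def evaluate(doc, sure, no, yes):
    """doc: dict of attribute -> text; each term is (attribute, substring)."""
    def hits(term):
        attr, sub = term
        return sub in doc.get(attr, "")
    if any(hits(t) for t in sure):
        return True
    if any(hits(t) for t in no):
        return False
    return bool(yes) and all(hits(t) for t in yes)

doc = {"subject": "Re: Meta", "body": "a usenet server"}
evaluate(doc, sure=[], no=[("subject", "spam")], yes=[("body", "usenet")])  # -> True
```

Because the predicate is just terms plus this fixed order, it can be serialized and run anywhere along the way, backend to frontend, as the text describes.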
>
> What happens is that this way everything sort
> of "or's" together "any", then when are introduced
> no's, then those double about, when introduced
> between's, those are no's, and when disjoint between's,
> those break out otherwise redundant but separately
> partitionable, queries.
>
> AP not subject|body AI
>
> not subject AI
> not body AI
> AP
>
> Then the filter objects have these attributes:
> owner, refcount, sure, not, operand, match term.
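Those attributes can be sketched as a record; the field names follow the text (with `invert` standing in for "not", a reserved word), and everything else is assumption.

```python
from dataclasses import dataclass
from typing import Optional

# A minimal sketch of the filter object named above.
@dataclass
class FilterTerm:
    owner: Optional[str]   # who holds this filter object
    refcount: int          # shared references to it
    sure: bool             # Sure vs Yes/No
    invert: bool           # the "not"
    operand: str           # attribute the term applies to
    match: str             # the match term itself

f = FilterTerm(owner="session-1", refcount=1, sure=False,
               invert=True, operand="subject", match="AI")
```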
>
> This is a fundamental sort of accepter/rejector that
> I wrote up quite a bit on sci.logic, and here a bit.
>
> Then this is that besides terms, a given file, has
> for partitions, to relate those in terms of dates,
> and skip those that don't apply, having that inside
> the file, vis-a-vis, having it alongside the file,
> pulling it from a file. Basically a search is to
> identify SFF files as they're found going along,
> then search through those.
>
> The term frequency / inverse document frequency,
> gets into summary statistics of terms in documents
> the corpus, here as about those building up out
> of partitions, and summing the summaries
> with either concatenation or categorical closures.
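Summing per-partition summaries into a corpus-level statistic can be sketched with document-frequency counters; the numbers and the exact idf smoothing are assumptions for illustration.

```python
import math
from collections import Counter

# Sketch: per-partition counts are discrete and composable, so summing
# Counters concatenates partitions before computing inverse document frequency.
part1 = {"docs": 100, "df": Counter({"ap": 10, "usenet": 3})}
part2 = {"docs": 50,  "df": Counter({"ap": 5})}

def idf(term, parts):
    n = sum(p["docs"] for p in parts)
    df = sum(p["df"][term] for p in parts)
    return math.log(n / (1 + df))

idf("ap", [part1, part2])
```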
>
> So, about the terms, and the content, here it's
> plainly text content, and there is a convention
> the quoting convention. This is where, a reference
> is quoted in part or in full, then the content is
> either after-article (the article convention), afore-article
> (the email convention) or "amidst-article", inline,
> interspersed, or combinations thereof.
>
> afore-article: reference follows
> amidst-article: article split
> after-article: reference is quoted
>
> The idea in the quoting convention, is that
> nothing changes in the quoted content,
> which is indicated by the text convention.
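Classifying an article against the quoting convention can be sketched by where the `>`-prefixed block sits; this assumes the common `>` quote character, whereas the text contemplates discovering the convention per article.

```python
# Sketch: lines starting with ">" are quoted reference, the rest is new
# content; the position of the new content gives afore/amidst/after-article.
def classify(lines):
    flags = [line.lstrip().startswith(">") for line in lines if line.strip()]
    if not any(flags):
        return "no-quote"
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    if all(flags[first:last + 1]):           # one contiguous quote block
        return "after-article" if last < len(flags) - 1 else "afore-article"
    return "amidst-article"

classify(["> quoted text", "reply below the quote"])  # -> "after-article"
```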
>
> This gets into the idea of sorting the hits for
> relevance, and origin, about threads, or references,
> when terms are introduced into threads, then
> to follow those references, returning threads,
> that have terms for hits.
>
> The idea is to implement a sort of article-diff,
> according to discovering quoting character
> conventions, about what would be fragments,
> of articles as documents, and documents,
> their fragments by quoting, referring to
> references, as introduce terms.
>
> The references thread then as a data structure,
> has at least two ways to look at it. The reference
> itself is indicated by a directed-acyclic-graph or
> tree built as links, it's a primary attribute, then
> there's time-series data, then there's matching
> of the subject attribute, and even as that search
> results are a sort of thread.
>
> In this sense then a thread, is abstractly of threads,
> threads have heads, about that hits on articles,
> are also hits on their threads, with each article
> being head of a thread.
>
>
> About common words, basically gets into language.
> These are the articles (the definite and indefinite
> articles of language), the usual copulas, the usual
> prepositions, and all such words of parts-of-speech
> that are syntactical and implement referents, and
> about how they connect meaningful words, and
> into language, in terms of sentences, paragraphs,
> fragments, articles, and documents.
>
> The idea is that a long enough article will eventually
> contain all the common words. It's much structurally
> about language, though, and usual match terms of
> Yes/No/Maybe or the match terms of the Boolean,
> are here for firstly exact match then secondarily
> into "fuzzy" match and about terms that comprise
> phrases, that the goal is that SFF makes data that
> can be used to relate these things, when abstractly
> each document is in a vacuum of all the languages
> and is just an octet stream or character stream.
>
> The, multi-lingual, then, basically figures to have
> either common words of multiple languages,
> and be multi-lingual, or meaningful words from
> multiple languages, then that those are loanwords.
>
> So, back to NNTP WILDMAT and IMAP SEARCH, ....
>
> https://www.rfc-editor.org/rfc/rfc2980.html#section-3.3
> https://datatracker.ietf.org/doc/html/rfc3977#section-4.2
>
> If you've ever spent a lot of time making regexes
> and running find to match files, wildmat is sort
> of sensible and indeed a lot like Yes/No/Maybe.
> Kind of like, sed accepts a list of commands,
> and sometimes tr, when find, sed, and tr are the tools.
> Anyways, implementing WILDMAT is to be implemented
> according to SFF backing it then a reference algorithm.
> The match terms of Yes/No/Maybe, don't really have
> wildcards. They match substrings. For example
> "equals" is equals and "in" is substring and "~" for
> "relates" is by default "in". Then, there's either adding
> wildcards, or adding anchors, to those, where the
> anchors would be "^" for front and "$" for end.
> Basically though WILDMAT is a sequence (Yes|No),
> indicated by Yes terms not starting with '!' and No
> terms marked with '!', then in reverse order,
> i.e., right-to-left, any Yes match is yes and any No
> match is no, and default is no. So, in Yes/No/Maybe,
> it's a stack of Yes/No/Maybes.
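The right-to-left rule above can be sketched with glob matching; this is a reference illustration of the described semantics, not the SFF-backed implementation, and uses Python's `fnmatch` as a stand-in for wildmat's wildcards.

```python
from fnmatch import fnmatchcase

# Sketch: scan the comma-separated elements right-to-left; '!'-prefixed
# elements reject; the first element that matches decides; default is no.
def wildmat(pattern, name):
    for elem in reversed(pattern.split(",")):
        negate = elem.startswith("!")
        if fnmatchcase(name, elem[1:] if negate else elem):
            return not negate
    return False

wildmat("comp.*,!comp.sys.*", "comp.lang.c")   # -> True
wildmat("comp.*,!comp.sys.*", "comp.sys.mac")  # -> False
```

Scanning right-to-left for the first match is equivalent to the usual left-to-right "last match wins" reading of wildmat.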
>
> Mostly though NNTP doesn't have SEARCH, though,
> so, .... And, wildmat is as much a match term, as
> an accepter/rejector, for accepter/rejector algorithms,
> that compose as queries.
>
> https://datatracker.ietf.org/doc/html/rfc3501#section-6.4.4
>
> IMAP defines "keys", these being the language of
> the query, then as for expressions in those. Then
> most of those get into the flags, counters, and
> with regards to the user, session, that get into
> the general idea that NNTP's session is just a
> notion of "current group and current article",
> that IMAP's user and session have flags and counters
> applied to each message.
>
> Search, then, basically is into search and selection,
> and accumulating selection, and refining search,
> that basically Sure accumulates as the selection
> and No/Yes is the search. This gets relevant in
> the IMAP extensions of SEARCH for selection,
> then with the idea of commands on the selection.
>
>
>
> Relevance: gets into "signal, and noise". That is
> to say, back-and-forth references that don't
> introduce new terms, are noise, and it's the
> introduction of terms, and following that
> their reference, that's relevance.
>
> For attributes, this basically is for determining
> low cardinality and high cardinality attributes,
> that low cardinality attributes are categories,
> and high cardinality attributes are identifiers.
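That split can be sketched by the ratio of distinct values; the 0.1 threshold is an arbitrary assumption for illustration.

```python
# Sketch: low-cardinality attributes (few distinct values, e.g. newsgroup)
# are categories; high-cardinality ones (e.g. Message-ID) are identifiers.
def classify_attribute(values, threshold=0.1):
    distinct = len(set(values)) / max(len(values), 1)
    return "category" if distinct <= threshold else "identifier"

classify_attribute(["sci.math"] * 99 + ["sci.logic"])   # -> "category"
```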
>
> This gets into "distance", and relation, then to
> find close relations in near distances, helping
> to find the beginnings and ends of things.
>
>
> So, I figure BFF is about designed, so to carry
> it out, and then get into SFF, that to have in
> the middle something MFF metadata file-format
> or session and user-wise, and the collection documents
> and the query documents, yet, the layout of
> the files and partitions, should be planned about
> that it will grow, either the number of directories
> or files, or their depth thereof, and it should be
> partitionable, so that it results being able to add
> or drop partitions by moving folders or making
> links, about that mailbox is a file and maildir is
> a directory and here the idea is "unbounded
> retention and performant maintenance".
>
> It involves read/write, instead of write-once-read-many.
> Rather, it involves growing files,
> and critical transactionality of serialization of
> parallel routine, vis-a-vis the semantics of atomic move.
>
> Then, for, "distance", is the distances of relations,
> about how to relate things, and how to find
> regions, that result a small distance among them,
> like words and roots and authors and topics
> and these kinds of things, to build summary statistics
> that are discrete and composable, then that those
> naturally build both summaries as digests and also
> histograms, not so much "data mining" as "towers of relation".
>
> So, for a sort of notion of, "network distance",
> is that basically there is time-series data and
> auto-association of equality.
>


Re: Meta: a usenet server just for sci.math

<TNqdnURkktOTEW34nZ2dnZfqn_adnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156912&group=sci.math#156912

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.27.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Tue, 12 Mar 2024 17:08:30 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com> <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com> <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com> <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com> <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com> <UA6dncqEGcKv70j4nZ2dnZfqnPGL6btP@giganews.com> <sfSdnV8ZFPs0zHz4nZ2dnZfqn_sS-Uoj@giganews.com> <iWGdndi1WIIve3T4nZ2dnZfqnPednZ2d@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Tue, 12 Mar 2024 10:08:30 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <iWGdndi1WIIve3T4nZ2dnZfqnPednZ2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <TNqdnURkktOTEW34nZ2dnZfqn_adnZ2d@giganews.com>
Lines: 678
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-5A8GAegled0vV1VTag8OAq+Ot/kcBAJWB8libDYAgZpxQoarKbUkxVA8gMpLHCtYDutFQEfJZJ9vfhI!n3iTkVje7mp2hwqvmoxebVWtgKkYemjr6JX42xyWSdgZ0lcBUwUxGx84Cr4sA/grKsamDuLsv0PB!3A==
 by: Ross Finlayson - Tue, 12 Mar 2024 17:08 UTC

On 03/07/2024 08:09 AM, Ross Finlayson wrote:
> On 02/29/2024 07:55 PM, Ross Finlayson wrote:
>> On 02/20/2024 07:47 PM, Ross Finlayson wrote:
>>> About a "dedicated little OS" to run a "dedicated little service".
>>>
>>>
>>> "Critix"
>>>
>>> 1) some boot code
>>> power on self test, EFI/UEFI, certificates and boot, boot
>>>
>>> 2) a virt model / a machine model
>>> maybe running in a virt
>>> maybe running on metal
>>>
>>> 3) a process/scheduler model
>>> it's processes, a process model
>>> goal is, "some of POSIX"
>>>
>>> Resources
>>>
>>> Drivers
>>>
>>> RAM
>>> Bus
>>> USB, ... serial/parallel, device connections, ....
>>> DMA
>>> framebuffer
>>> audio dac/adc
>>>
>>>
>>> Disk
>>>
>>> hard
>>> memory
>>> network
>>>
>>>
>>> Login
>>>
>>> identity
>>> resources
>>>
>>>
>>>
>>> Networking
>>>
>>> TCP/IP stack
>>> UDP, ...
>>> SCTP, ...
>>> raw, ...
>>>
>>> naming
>>>
>>>
>>> Windowing
>>>
>>> "video memory and what follows SVGA"
>>> "Java, a plain windowing VM"
>>>
>>>
>>>
>>> PCI <-> PCIe
>>>
>>> USB 1/2 USB 3/4
>>>
>>> MMU <-> DMA
>>>
>>> Serial ATA
>>>
>>> NIC / IEEE 802
>>>
>>> "EFI system partition"
>>>
>>> virtualization model
>>> emulator
>>>
>>> clock-accurate / bit-accurate
>>> clock-inaccurate / voltage
>>>
>>>
>>> mainboard / motherboard
>>> circuit summary
>>>
>>> emulator environment
>>>
>>> CPU
>>> main memory
>>> host adapters
>>>
>>> PU's
>>> bus
>>>
>>> I^2C
>>>
>>> clock model / timing model
>>> interconnect model / flow model
>>> insertion model / removal model
>>> instruction model
>>>
>>>
>>
>>
>>
>>
>> I got looking into PC architecture wondering
>> how it was since I studied internals and it really
>> seems it's stabilized a lot.
>>
>> UEFI ACPI SMBIOS
>>
>> DRAM
>> DMA
>> virtualized addressing
>>
>> CPU
>>
>> System Bus
>>
>> Intel CSI QPI UPI
>> AMD HyperTransport
>> ARM CoreLink
>>
>>
>> PCI
>> PCIe
>>
>> Host Adapters
>> ATA
>> NVMe
>> USB
>> NIC
>>
>> So I'm wondering to myself, well first I wonder
>> about writing UEFI plugins to sort of enumerate
>> the setup and for example print it out and for
>> example see what keys are in the TPM and for
>> example the partition table and what goes in
>> in terms of the device tree and basically for
>> diagnostic, boot services then runtime services
>> after UEFI exits after having loaded into memory
>> the tables of the "runtime services" which are
>> mostly sort of a table in memory with offsets
>> of the things and maybe how they're ID's as
>> with regards to the System Bus the Host Adapters.
>>
>>
>> Then it's a pretty simplified model and gets
>> into things like wondering what all else is
>> going on in the device tree and I2C the
>> blinking lights and perhaps the beep, or bell.
>>
>> A lot of times it looks like the video is onboard
>> out the CPU, vis-a-vis the UEFI video output
>> or what appears to be going on, I'm wondering
>> about it.
>>
>>
>> So I'm wondering how to make a simulator,
>> an emulator, uh, of these things above,
>> and then basically the low-speed things
>> and the high-speed things, and, their logical
>> protocols vis-a-vis the voltage and the
>> bit-and-clock accurate and the voltage as
>> symbols vis-a-vis symbolically the protocols,
>> how to make it so to have a sort of simulator
>> or emulator of this sort of usual system,
>> with a usual idea to target code to it to
>> that kind of system or a virt over the virtualized
>> system to otherwise exactly that kind of system, ....
>>
>>
>>
>
>
> Critix
>
> boot protocols
>
> UEFI ACPI SMBIOS
>
> CPU and instruction model
>
> bus protocols
>
> low-speed protocols
> high-speed protocols
>
>
>
> Looking at the instructions, it looks pretty much
> that the kernel code is involved inside the system
> instructions, to support the "bare-metal" and then
> also the "virt-guests", then that communication
> is among the nodes in AMD, then, the HyperTransport
> basically is indicated as, IO, then for there to be figured
> out that the guest virts get a sort of view of the "hardware
> abstraction layer", then with regards to the segments and
> otherwise the mappings, for the guest virts, vis-a-vis,
> the mappings to the memory and I/O, getting figured
> out these kinds of things as an example of what gets
> into a model of a sort of machine, as a sort of emulator,
> basically figuring to be bit-accurate and ignore being
> clock-accurate.
>
> The "BIOS and kernel guide" gets into the order of
> system initialization and the links, and DRAM.
> It looks that there are nodes basically being parallel
> processors, and on those cores, being CPUs or
> processors.
>
> Then each of the processors has its control and status
> registers, then with regards to tables, and with regards
> to memory and cache, about those the segments,
> figuring to model the various interconnections this
> way in a little model of a mainboard CPU. "Using L2
> Cache as General Storage During Boot".
>
> Then it gets into enumerating and building the links,
> and setting up the buffers, to figure out what's going
> on the DRAM and DMA, and, PCI and PCIe, and, then
> about what's ATA, NVMe, and USB, these kinds of things.
>
> Nodes' cores share registers or "software must ensure...",
> with statics and scopes. Then it seems the cache lines
> and then the interrupt vectors or APIC IDs get enumerated,
> setting up the routes and tables.
>
> Then various system and operating modes proceed,
> where there's an idea that the basic difference
> among executive, scheduler, and operating system,
> basically is in with respect to the operating mode,
> with respect to old real, protected, and, "unreal",
> I suppose, modes, here that basically it's all really
> simplified about protected mode and guest virts.
>
> "After storing the save state, execution starts ...."
>
> Then there's described "spring-boarding" into SMM
> that the BSP and BSM, a quick protocol then that
> all the live nodes enter SMM, basically according
> to ACPI and the APIC.
>
> "The processor supports many power management
> features in a variety of systems."
>
> This gets into voltage proper, here though that
> what results is bit-accurate events.
>
> "P-states are operational performance states
> characterized by a unique frequency and voltage."
>
> The idea here is to support very-low-power operation
> vis-a-vis modest, usual, and full (P0). Then besides
> consumption, is also reducing heat, or dialing down
> according to temperature. Then there are C-states
> and S-states, then mostly these would be as by
> the BIOS, what gets surfaced as ACPI to the kernel.
>
> There are some more preliminaries, the topology
> gets setup, then gets involved the DCT DIMM DRAM
> frequency and for DRAM, lighting up RAM, that
> basically to be constant rate, about the DCT and DDR.
>
> There are about 1000 model-specific registers what
> seem to be for the BIOS to inspect and figure out
> the above pretty much and put the system into a
> state for regular operation.
>
> Then it seems like an emulator would be setting
> that up, then as with regards to usually enough
> "known states" and setting up for simulating the
> exercise of execution and I/O.
>
> instructions
>
>
> system-purpose
>
>
> interrupt
>
> CLGI CLI STI STGI
> HLT
> IRET IRETD IRETQ
> LIDT SIDT
> MONITOR MWAIT
> RSM
> SKINIT
>
> privileges
>
> ARPL
> LAR
> RDPKRU WRPKRU
> VERR VERW
>
> alignment
>
> CLAC STAC
>
> jump/routine
>
> SYSCALL SYSRET
> SYSENTER SYSEXIT
>
> task, stack, tlb, gdt, ldt, cache
>
> CLTS
> CLRSSBSY SETSSBSY
> INCSSP
> INVD
> INVLPG INVLPGA INVLPGB INVPCID TLBSYNC
> LGDT SGDT
> LLDT SLDT
> LMSW
> LSL
> LTR STR
> RDSSP
> RSTORSSP SAVEPREVSSP
> WBINVD WBNOINVD
> WRSS WRUSS
>
>
> load/store
> MOV CRn MOV DRn
> RDMSR WRMSR
> SMSW
> SWAPGS
>
> virtual
>
> PSMASH PVALIDATE
> RMPADJUST RMPUPDATE
> RMPQUERY
> VMLOAD VMSAVE
> VMMCALL VMGEXIT
> VMRUN
>
>
> perf
>
> RDPMC
> RDTSC RDTSCP
>
>
> debug
>
> INT 3
>
>
>
>
> general-purpose
>
> context
> CPUID
> LLWPCB LWPINS LWPVAL SLWPCB
> NOP
> PAUSE
>
> RDFSBASE
>
> RDPID
> RDPRU
>
> UD0 UD1 UD2
>
> jump/routine
> CALL RET
> ENTER LEAVE
> INT
> INTO
> Jcc
> JCXZ JECXZ JRCXZ
> JMP
>
> register
> BOUND
> BT BTC BTR BTS
> CLC CLD CMC
> LAHF SAHF
> STC STD
> WRFSBASE WRGSBASE
>
> compare
> cmp
> CMP
> CMPS CMPSB CMPSW CMPSD CMPSQ
> CMPXCHG CMPXCHG8B CMPXCHG16B
> SCAS SCASB SCASW SCASD SCASQ
> SETcc
> TEST
> branch
> LOOP LOOPE LOOPNE LOOPNZ LOOPZ
>
>
> input/output
> IN
> INS INSB INSW INSD
> OUT
> OUTS OUTSB OUTSW OUTSD
>
> memory/cache
> CLFLUSH CLFLUSHOPT
> CLWB
> CLZERO
> LFENCE MCOMMIT MFENCE SFENCE
> MONITORX MWAITX
> PREFETCH PREFETCHW PREFETCHlevel
>
> memory/stack
> POP
> POPA POPAD
> POPF POPFD POPFQ
> PUSH
> PUSHA PUSHAD
> PUSHF PUSHFD PUSHFQ
>
> memory/segment
> XLAT XLATB
>
> load/store
> BEXTR
> BLCFILL BLCI BLCIC BLCMSK BLCS BLSFILL BLSI BLSMSK BLSR
> BSF BSR
> BSWAP
> BZHI
> CBW CWDE CDQE CWD CDQ CQO
> CMOVcc
> LDS LES LFS LGS LSS
> LEA
> LODS LODSB LODSW LODSD LODSQ
> MOV
> MOVBE
> MOVD
> MOVMSKPD MOVMSKPS
> MOVNTI
> MOVS MOVSB MOVSW MOVSD MOVSQ
> MOVSX MOVSXD MOVZX
> PDEP PEXT
> RDRAND RDSEED
> STOS STOSB STOSW STOSD STOSQ
> XADD XCHG
>
>
>
>
> bitwise/math
> and or nand nor
> complement
> roll
> AND ANDN
> LZCNT TZCNT
> NOT
> OR XOR
> POPCNT
> RCL RCR ROL ROR RORX
> SAL SAR SARX SHL SHLD SHLX SHR SHRD SHRX
> T1MSKC TZMSK
> math
> plus minus mul div muldiv
> ADC ADCX ADD
> DEC INC
> DIV IDIV IMUL MUL MULX
> NEG
> SBB SUB
>
>
>
>
>
> ignored / unimplemented
>
> bcd binary coded decimal
> AAA AAD AAM AAS
> DAA DAS
>
> CRC32
>
>
>
>
> instruction
>
> opprefixes opcode operands opeffects
>
> opcode: the op-code
> operands:
> implicits, explicits
> inputs, outputs
> opeffects: register effects
>
> operations
>
>
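The instruction anatomy outlined above (opprefixes, opcode, operands split into implicits/explicits and inputs/outputs, and opeffects) can be sketched as a record type. This is a minimal sketch; the field and class names are mine, not from the post, and the ADD entry is just an illustrative guess at one row of such a table.

```python
# Sketch of the instruction record: prefixes, opcode, operands
# (implicit vs. explicit, input vs. output), and register effects.
from dataclasses import dataclass, field

@dataclass
class Operand:
    name: str        # e.g. "r/m32", "EAX"
    implicit: bool   # encoded in the opcode vs. given explicitly
    is_input: bool
    is_output: bool

@dataclass
class Instruction:
    prefixes: list                               # opprefixes, e.g. ["LOCK"]
    opcode: str                                  # the op-code mnemonic
    operands: list                               # implicits, explicits
    effects: list = field(default_factory=list)  # opeffects: flag/register effects

# One illustrative entry: ADD r/m32, r32 reads both operands,
# writes the first, and updates the arithmetic flags.
add = Instruction(
    prefixes=[],
    opcode="ADD",
    operands=[Operand("r/m32", False, True, True),
              Operand("r32", False, True, False)],
    effects=["OF", "SF", "ZF", "AF", "CF", "PF"])
print(add.opcode, [o.name for o in add.operands if o.is_output])
```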


Re: Meta: a usenet server just for sci.math

<TNqdnUdkktOxEW34nZ2dnZfqn_YAAAAA@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156913&group=sci.math#156913

Newsgroups: sci.math
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Tue, 12 Mar 2024 17:09:00 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
<1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
<31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
<6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
<51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
<b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
<5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
<8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
<MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
<NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
<FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
<NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
<RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
<HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>
<FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com>
<v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
<q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com>
<QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com>
<iWGdndu1WIJLe3T4nZ2dnZfqnPcAAAAA@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Tue, 12 Mar 2024 10:09:01 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <iWGdndu1WIJLe3T4nZ2dnZfqnPcAAAAA@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <TNqdnUdkktOxEW34nZ2dnZfqn_YAAAAA@giganews.com>
Lines: 692
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-TgrwiTY9k25E0HoFroY0jfB8zfoVkTSbSLwl96up/zljiCL2CoPWfdl0RMrvheJ9aYFqElItlGNu8Xk!TLLX/I/jblG1vGVKv/ceSFOn9twK8BZzvIkQm+fdiGBOwhMTJ6kWD8pKIcxlgYpbEKiv7MIPcp0b!7g==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Received-Bytes: 29304
 by: Ross Finlayson - Tue, 12 Mar 2024 17:09 UTC

On 03/07/2024 08:10 AM, Ross Finlayson wrote:
> On 03/04/2024 11:23 AM, Ross Finlayson wrote:
>>
>> So, figuring that BFF then is about designed,
>> basically for storing Internet messages with
>> regards to MessageId, then about ContentId
>> and external resources separately, then here
>> the idea again becomes how to make for
>> the SFF files, what results, intermediate, tractable,
>> derivable, discardable, composable data structures,
>> in files of a format with regards to write-once-read-many,
>> write-once-read-never, and, "partition it", in terms of
>> natural partitions like time intervals and categorical attributes.
>>
>>
>> There are some various great open-source search
>> engines, here with respect to something like Lucene
>> or SOLR or ElasticSearch.
>>
>> The idea is that there are attributes searches,
>> and full-text searches, those resulting hits,
>> to documents apiece, or sections of their content,
>> then backward along their attributes, like
>> threads and related threads, and authors and
>> their cliques, while across groups and periods
>> of time.
>>
>> There's not much of a notion of "semantic search",
>> though, it's expected to sort of naturally result,
>> here as for usually enough least distance, as for
>> "the terms of matching", and predicates from what
>> results a filter predicate, here with what I call,
>> "Yes/No/Maybe".
>>
>> Now, what is, "yes/no/maybe", one might ask.
>> Well, it's the query specification, of the world
>> of results, to filter to the specified results.
>> The idea is that there's an accepter network
>> for "Yes" and a rejector network for "No"
>> and an accepter network for "Maybe" and
>> then rest are rejected.
>>
>> The idea is that the search, is a combination
>> of a bunch of yes/no/maybe terms, or,
>> sure/no/yes, to indicate what's definitely
>> included, what's not, and what is, then that
>> the term, results that it's composable, from
>> sorting the terms, to result a filter predicate
>> implementation, that can run anywhere along
>> the way, from the backend to the frontend,
>> this way being a, "search query specification".
>>
>>
>> There are notions like, "*", and single match
>> and multimatch, about basically columns and
>> a column model, of documents, that are
>> basically rows.
>>
>>
>> The idea of course is to build an arithmetic expression,
>> that also is exactly a natural expression,
>> for "matches", and "ranges".
>>
>> "AP"|Archimedes|Plutonium in first|last
>>
>> Here, there is a search, for various names, that
>> it composes this way.
>>
>> AP first
>> AP last
>> Archimedes first
>> Archimedes last
>> Plutonium first
>> Plutonium last
>>
>> As you can see, these "match terms", just naturally
>> break out, then that what's gets into negations,
>> break out and double, and what gets into ranges,
>> then, well that involves for partitions and ranges,
>> duplicating and breaking that out.
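This breakout is just the cross product of the alternated values and the alternated fields. A minimal sketch, with the parsing deliberately naive (the helper name and the exact spec syntax handling are my own assumptions):

```python
# Break a multimatch spec like '"AP"|Archimedes|Plutonium in first|last'
# out into single match terms: one per (value, field) pair.
from itertools import product

def break_out(spec):
    values_part, fields_part = spec.split(" in ")
    values = [v.strip('"') for v in values_part.split("|")]
    fields = fields_part.split("|")
    return [f"{v} {f}" for v, f in product(values, fields)]

terms = break_out('"AP"|Archimedes|Plutonium in first|last')
for t in terms:
    print(t)
# AP first, AP last, Archimedes first, Archimedes last,
# Plutonium first, Plutonium last
```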
>>
>> It results though a very fungible and normal form
>> of a search query specification, that rebuilds the
>> filter predicate according to sorting those, then
>> has very well understood runtime according to
>> yes/no/maybe and the multimatch, across and
>> among multiple attributes, multiple terms.
>>
>>
>> This sort of enriches a usual sort of query
>> "exact full hit", with this sort "ranges and conditions,
>> exact full hits".
>>
>> So, the Yes/No/Maybe, is the generic search query
>> specification, overall, just reflecting an accepter/rejector
>> network, with a bit on the front to reflect keep/toss,
>> that's it's very practical and of course totally commonplace
>> and easily written broken out as find or wildmat specs.
>>
>> For then these the objects and the terms relating
>> the things, there's about maintaining this, while
>> refining it, that basically there's an ownership
>> and a reference count of the filter objects, so
>> that various controls according to the syntax of
>> the normal form of the expression itself, with
>> most usual English terms like "is" and "in" and
>> "has" and "between", and "not", with & for "and"
>> and | for "or", makes that this should be the kind
>> of filter query specification that one would expect
>> to be general purpose on all such manners of
>> filter query specifications and their controls.
>>
>> So, a normal form for these filter objects, then
>> gets relating them to the SFF files, because, an
>> SFF file of a given input corpus, satisfies some
>> of these specifications, the queries, or for example
>> doesn't, about making the language and files
>> first of the query, then the content, then just
>> mapping those to the content, which are built
>> off extractors and summarizers.
>>
>> I already thought about this a lot. It results
>> that it sort of has its own little theory,
>> thus what can result its own little normal forms,
>> for making a fungible SFF description, what
>> results for any query, going through those,
>> running the same query or as so filtered down
>> the query for the partition already, from the
>> front-end to the back-end and back, a little
>> noisy protocol, that delivers search results.
>>
>>
>>
>>
>> The document is element of the corpus.
>> Here each message is a corpus. Now,
>> there's a convention in Internet messages,
>> not always followed, being that the ignorant
>> or lacking etiquette or just plain different,
>> don't follow it or break it, there's a convention
>> of attribution in Internet messages the
>> content that's replied to, and, this is
>> variously "block" or "inline".
>>
>> From the outside though, the document here
>> has the "overview" attributes, the key-value
>> pairs of the headers those being, and the
>> "body" or "document" itself, which can as
>> well have extracted attributes, vis-a-vis
>> otherwise its, "full text".
>>
>> https://en.wikipedia.org/wiki/Search_engine_indexing
>>
>>
>> The key thing here for partitioning is to
>> make for date-range partitioning, while,
>> the organization of the messages by ID is
>> essentially flat, and constant rate to access one
>> but linear to trawl through them, although parallelizable,
>> for example with a parallelizable filter predicate
>> like yes/no/maybe, before getting into the
>> inter-document of terms, here the idea is that
>> there's basically
>>
>> date partition
>> group partition
>>
>> then as with regards to
>>
>> threads
>> authors
>>
>> that these are each having their own linear organization,
>> or as with respect to time-series partitions, and the serial.
>>
>> Then, there are two sorts of data structures
>> to build with:
>>
>> binary trees,
>> bit-maps.
>>
>> So, the idea is to build indexes for date ranges
>> and then just search separately, either linear
>> or from an in-memory currency, the current.
>>
>> I'm not too interested in "rapid results" as
>> much as "thoroughly parallelizable and
>> effectively indexed", and "providing
>> incremental results" and "full hits".
>>
>> The idea here is to relate date ranges,
>> to an index file for the groups files,
>> then to just search the date ranges,
>> and for example as maybe articles expire,
>> which here they don't as it's archival,
>> to relate dropping old partitions with
>> updating the groups indexes.
>>
>> For NNTP and IMAP then there's,
>> OVERVIEW and SEARCH. So, the
>> key attributes relevant those protocols,
>> are here to make it so that messages
>> have an abstraction of an extraction,
>> those being fixed as what results,
>> then those being very naively composable,
>> with regards to building data structures
>> of those, what with regards to match terms,
>> evaluate matches in ranges on those.
>>
>> Now, NNTP is basically write-once-read-many,
>> though I suppose it's mostly write-once-read-
>> maybe-a-few-times-then-never, while IMAP
>> basically adds to the notion of the session,
>> what's read and un-read, and, otherwise
>> with regards to flags, IMAP flags. I.e. flags
>> are variables, all this other stuff being constants.
>>
>>
>> So, there's an idea to build a sort of, top-down,
>> or onion-y, layered, match-finder. This is where
>> it's naively composable to concatenate the
>> world of terms, in attributes, of documents,
>> in date ranges and group partitions, to find
>> "there is a hit" then to dive deeper into it,
>> figuring the idea is to horizontally scale
>> by refining date partitions and serial collections,
>> then parallelize those, where as well that serial
>> algorithms work the same on those, eg, by
>> concatenating those and working on that.
>>
>> This is where a group and a date partition
>> each have a relatively small range, of overview
>> attributes, and their values, then that for
>> noisy values, like timestamps, to detect those
>> and work out what are small cardinal categories
>> and large cardinal ergodic identifiers.
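Detecting which overview attributes are small-cardinality categories versus large-cardinality identifiers can be sketched as a ratio of distinct values to rows within a partition. The threshold here is my own guess, not anything from the post:

```python
# Classify a partition's overview attributes: few distinct values
# relative to row count -> category; nearly one per row -> identifier.
def classify_attributes(rows, threshold=0.5):
    out = {}
    n = len(rows)
    keys = set().union(*(r.keys() for r in rows))
    for key in keys:
        distinct = len({r.get(key) for r in rows})
        out[key] = "category" if distinct / n <= threshold else "identifier"
    return out

# Every row shares one group; every row has a unique message-id.
rows = [{"group": "sci.math", "message-id": f"<{i}@x>"} for i in range(100)]
print(classify_attributes(rows))
```

Noisy values like timestamps would classify as identifiers here, which is the cue to treat them as ranges rather than categories.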
>>
>> It's sort of like, "Why don't you check out the
>> book Information Retrieval and read that again",
>> and, in a sense, it's because I figure that Google
>> has littered all their no-brainer patterns with junk patents
>> that instead I expect to clean-room and prior-art this.
>> Maybe that's not so, I just wonder sometimes how
>> they've arrived at monopolizing what's a totally
>> usual sort of "fetch it" routine.
>>
>>
>> So, the goal is to find hits, in conventions of
>> documents, inside the convention of quoting,
>> with regards to
>> bidirectional relations of correspondence, and,
>> unidirectional relations of nesting, those
>> being terms for matching, and building matching,
>> then that the match document, is just copied
>> and sent to each partition in parallel, each
>> resulting its hits.
>>
>> The idea is to show a sort of search plan, over
>> the partitions, then that there's incremental
>> progress and expected times displayed, and
>> incremental results gathered, digging it up.
>>
>> There's basically for partitions "has-a-hit" and
>> "hit-count", "hit-list", "hit-stream". That might
>> sound sort of macabre, but it means search hits
>> not mob hits, then for the keep/toss and yes/no/maybe,
>> that partitions are boundaries of sorts, on down
>> to ideas of "document-level" and "attribute-level"
>> aspects of, "intromissive and extromissive visibility".
>>
>>
>> https://lucene.apache.org/core/3_5_0/fileformats.html
>>
>> https://solr.apache.org/guide/solr/latest/configuration-guide/index-location-format.html
>>
>>
>>
>> It seems sort of sensible to adapt to Lucene's index file format,
>> or, it's pretty sensible, then with regards to default attributes
>> and this kind of thing, and the idea that threads are
>> documents for searching in threads and finding the
>> content actually aside the quotes.
>>
>> The Lucene's index file format, isn't a data structure itself,
>> in terms of a data structure built for b-tree/b-map, where
>> the idea is to result a file, that's a serialization of a data
>> structure, within it, the pointer relations as to offsets
>> in the file, so that, it can be loaded into memory and
>> run, or that, I/O can seek through it and run, but especially
>> that, it can be mapped into memory and run.
>>
>> I.e., "implementing the lookup" as following pointer offsets
>> in files, vis-a-vis a usual idea that the pointers are just links
>> in the tree or off the map, is one of these "SFF" files.
>>
>> So, for an "index", it's really sort of only the terms then
>> that they're inverted from the documents that contain
>> them, to point back to them.
>>
>> Then, because there are going to be index files for each
>> partition, is that there are terms and there are partitions,
>> with the idea that the query's broken out by organization,
>> so that search proceeds only when there's matching partitions,
>> then into matching terms.
>>
>> AP 2020-2023
>>
>> * AP
>> !afore(2020)
>> !after(2023)
>>
>> AP 2019, 2024
>>
>> * AP
>> !afore(2019)
>> !after(2019)
>>
>> * AP
>> !afore(2024)
>> !after(2024)
>>
>>
>> Here for example the idea is to search the partitions
>> according to they match "natural" date terms, vis-a-vis,
>> referenced dates, and matching the term in any fields,
>> then that the range terms result either one query or
>> two, in the sense of breaking those out and resulting
>> that then their results get concatenated.
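The disjoint date terms above, and the pruning of partitions that can't match, can be sketched like so (the dict shapes and helper names are mine, assuming whole-year ranges for simplicity):

```python
# "AP 2019, 2024" -> one sub-query per year; a date partition is
# searched only when its range overlaps the sub-query's range.
def split_by_years(years):
    return [{"afore": y, "after": y} for y in sorted(years)]

def matching_partitions(partitions, query):
    return [p for p in partitions
            if not (p["end"] < query["afore"] or p["start"] > query["after"])]

queries = split_by_years([2019, 2024])
parts = [{"name": "2018-2020", "start": 2018, "end": 2020},
         {"name": "2021-2023", "start": 2021, "end": 2023},
         {"name": "2024-2026", "start": 2024, "end": 2026}]
for q in queries:
    print([p["name"] for p in matching_partitions(parts, q)])
```

Each sub-query runs against only its matching partitions, and the hit lists concatenate.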
>>
>> You can see that "in", here, as "between", for example
>> in terms of range, is implemented as "not out", for
>> that this way the Yes/No/Maybe, Sure/No/Yes, runs
>>
>> match _any_ Sure: yes
>> match _any_ No: no
>> match _all_ Yes: yes
>> no
>>
>> I.e. it's not a "Should/Must/MustNot Boolean" query.
>>
>> What happens is that this way everything sort
>> of "or's" together "any", then when are introduced
>> no's, then those double about, when introduced
>> between's, those are no's, and when disjoint between's,
>> those break out otherwise redundant but separately
>> partitionable, queries.
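The Sure/No/Yes evaluation order can be sketched as a composed predicate. A minimal sketch: the term representation (field-substring pairs) and function names are mine, only the match-any-Sure / match-any-No / match-all-Yes / default-no ordering comes from the description above:

```python
# Yes/No/Maybe as an accepter/rejector: Sure always keeps,
# No rejects, otherwise all Yes terms must match; default is no.
def make_term(fld, substring):
    return lambda doc: substring in doc.get(fld, "")

def yes_no_maybe(sure, no, yes):
    def predicate(doc):
        if any(t(doc) for t in sure):   # match _any_ Sure: yes
            return True
        if any(t(doc) for t in no):     # match _any_ No: no
            return False
        # match _all_ Yes: yes; otherwise no
        return all(t(doc) for t in yes) if yes else False
    return predicate

pred = yes_no_maybe(
    sure=[],
    no=[make_term("subject", "AI")],
    yes=[make_term("from", "AP")])
print(pred({"from": "AP", "subject": "numbers"}))   # True
print(pred({"from": "AP", "subject": "AI stuff"}))  # False
```

Because the predicate is a plain closure over sorted terms, it can run anywhere along the way, backend to frontend.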
>>
>> AP not subject|body AI
>>
>> not subject AI
>> not body AI
>> AP
>>
>> Then the filter objects have these attributes:
>> owner, refcount, sure, not, operand, match term.
>>
>> This is a fundamental sort of accepter/rejector that
>> I wrote up quite a bit on sci.logic, and here a bit.
>>
>> Then this is that besides terms, a given file, has
>> for partitions, to relate those in terms of dates,
>> and skip those that don't apply, having that inside
>> the file, vis-a-vis, having it alongside the file,
>> pulling it from a file. Basically a search is to
>> identify SFF files as they're found going along,
>> then search through those.
>>
>> The term frequency / inverse document frequency,
>> gets into summary statistics of terms in documents
>> the corpus, here as about those building up out
>> of partitions, and summing the summaries
>> with either concatenation or categorical closures.
>>
>> So, about the terms, and the content, here it's
>> plainly text content, and there is a convention
>> the quoting convention. This is where, a reference
>> is quoted in part or in full, then the content is
>> either after-article (the article convention), afore-article
>> (the email convention) or "amidst-article", inline,
>> interspersed, or combinations thereof.
>>
>> afore-article: reference follows
>> amidst-article: article split
>> after-article: reference is quoted
>>
>> The idea in the quoting convention, is that
>> nothing changes in the quoted content,
>> which is indicated by the text convention.
>>
>> This gets into the idea of sorting the hits for
>> relevance, and origin, about threads, or references,
>> when terms are introduced into threads, then
>> to follow those references, returning threads,
>> that have terms for hits.
>>
>> The idea is to implement a sort of article-diff,
>> according to discovering quoting character
>> conventions, about what would be fragments,
>> of articles as documents, and documents,
>> their fragments by quoting, referring to
>> references, as introduce terms.
>>
>> The references thread then as a data structure,
>> has at least two ways to look at it. The reference
>> itself is indicated by a directed-acyclic-graph or
>> tree built as links, it's a primary attribute, then
>> there's time-series data, then there's matching
>> of the subject attribute, and even as that search
>> results are a sort of thread.
>>
>> In this sense then a thread, is abstractly of threads,
>> threads have heads, about that hits on articles,
>> are also hits on their threads, with each article
>> being head of a thread.
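The references link structure, and "a hit on an article is a hit on every thread it heads", can be sketched by deriving child links from the References header and walking down from any head. A sketch under the usual assumption that an article's direct parent is the last entry in its References list:

```python
# Build parent->children links from References headers, then collect
# the thread headed by any given article (the head included).
from collections import defaultdict

def build_threads(articles):
    """articles: {message-id: [references, oldest first]}."""
    children = defaultdict(list)
    for mid, refs in articles.items():
        if refs:
            children[refs[-1]].append(mid)  # direct parent = last reference
    return children

def thread_of(head, children):
    out, stack = [], [head]
    while stack:
        mid = stack.pop()
        out.append(mid)
        stack.extend(children.get(mid, []))
    return out

arts = {"<a>": [], "<b>": ["<a>"], "<c>": ["<a>", "<b>"], "<d>": ["<a>"]}
print(thread_of("<a>", build_threads(arts)))  # all four articles
```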
>>
>>
>> About common words, basically gets into language.
>> These are the articles (the definite and indefinite
>> articles of language), the usual copulas, the usual
>> prepositions, and all such words of parts-of-speech
>> that are syntactical and implement referents, and
>> about how they connect meaningful words, and
>> into language, in terms of sentences, paragraphs,
>> fragments, articles, and documents.
>>
>> The idea is that a long enough article will eventually
>> contain all the common words. It's much structurally
>> about language, though, and usual match terms of
>> Yes/No/Maybe or the match terms of the Boolean,
>> are here for firstly exact match then secondarily
>> into "fuzzy" match and about terms that comprise
>> phrases, that the goal is that SFF makes data that
>> can be used to relate these things, when abstractly
>> each document is in a vacuum of all the languages
>> and is just an octet stream or character stream.
>>
>> The, multi-lingual, then, basically figures to have
>> either common words of multiple languages,
>> and be multi-lingual, or meaningful words from
>> multiple languages, then that those are loanwords.
>>
>> So, back to NNTP WILDMAT and IMAP SEARCH, ....
>>
>> https://www.rfc-editor.org/rfc/rfc2980.html#section-3.3
>> https://datatracker.ietf.org/doc/html/rfc3977#section-4.2
>>
>> If you've ever spent a lot of time making regexes
>> and running find to match files, wildmat is sort
>> of sensible and indeed a lot like Yes/No/Maybe.
>> Kind of like, sed accepts a list of commands,
>> and sometimes tr, when find, sed, and tr are the tools.
>> Anyways, implementing WILDMAT is to be implemented
>> according to SFF backing it then a reference algorithm.
>> The match terms of Yes/No/Maybe, don't really have
>> wildcards. They match substrings. For example
>> "equals" is equals and "in" is substring and "~" for
>> "relates" is by default "in". Then, there's either adding
>> wildcards, or adding anchors, to those, where the
>> anchors would be "^" for front and "$" for end.
>> Basically though WILDMAT is a sequence (Yes|No),
>> indicated by Yes terms not starting with '!' and No
>> terms marked with '!', then in reverse order,
>> i.e., right-to-left, any Yes match is yes and any No
>> match is no, and default is no. So, in Yes/No/Maybe,
>> it's a stack of Yes/No/Maybe's.
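That right-to-left rule makes for a short reference sketch (using Python's stdlib glob-style matcher for the wildcards, which is an assumption about the pattern syntax, not a full WILDMAT parser):

```python
# WILDMAT as described: check patterns right-to-left; a matching
# '!'-pattern means no, a matching plain pattern means yes, default no.
from fnmatch import fnmatchcase

def wildmat(name, patterns):
    for pat in reversed(patterns):
        negated = pat.startswith("!")
        if fnmatchcase(name, pat[1:] if negated else pat):
            return not negated
    return False

pats = ["sci.*", "!sci.math"]
print(wildmat("sci.physics", pats))  # True  (matches sci.*)
print(wildmat("sci.math", pats))     # False (rightmost match is !sci.math)
print(wildmat("comp.lang.c", pats))  # False (no match: default no)
```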
>>
>> Mostly though NNTP doesn't have SEARCH, though,
>> so, .... And, wildmat is as much a match term, as
>> an accepter/rejector, for accepter/rejector algorithms,
>> that compose as queries.
>>
>> https://datatracker.ietf.org/doc/html/rfc3501#section-6.4.4
>>
>> IMAP defines "keys", these being the language of
>> the query, then as for expressions in those. Then
>> most of those get into the flags, counters, and
>> with regards to the user, session, that get into
>> the general idea that NNTP's session is just a
>> notion of "current group and current article",
>> that IMAP's user and session have flags and counters
>> applied to each message.
>>
>> Search, then, basically is into search and selection,
>> and accumulating selection, and refining search,
>> that basically Sure accumulates as the selection
>> and No/Yes is the search. This gets relevant in
>> the IMAP extensions of SEARCH for selection,
>> then with the idea of commands on the selection.
>>
>>
>>
>> Relevance: gets into "signal, and noise". That is
>> to say, back-and-forth references that don't
>> introduce new terms, are noise, and it's the
>> introduction of terms, and following that
>> their reference, that's relevance.
>>
>> For attributes, this basically is for determining
>> low cardinality and high cardinality attributes,
>> that low cardinality attributes are categories,
>> and high cardinality attributes are identifiers.
>>
>> This gets into "distance", and relation, then to
>> find close relations in near distances, helping
>> to find the beginnings and ends of things.
>>
>>
>> So, I figure BFF is about designed, so to carry
>> it out, and then get into SFF, that to have in
>> the middle something MFF metadata file-format
>> or session and user-wise, and the collection documents
>> and the query documents, yet, the layout of
>> the files and partitions, should be planned about
>> that it will grow, either the number of directories
>> or files, or the depth thereof, and it should be
>> partitionable, so that it results being able to add
>> or drop partitions by moving folders or making
>> links, about that mailbox is a file and maildir is
>> a directory and here the idea is "unbounded
>> retention and performant maintenance".
>>
>> It involves read/write, instead of write-once-read-many.
>> Rather, it involves read/write, or growing files,
>> and critical transactionality of serialization of
>> parallel routine, vis-a-vis the semantics of atomic move.
>>
>> Then, for, "distance", is the distances of relations,
>> about how to relate things, and how to find
>> regions, that result a small distance among them,
>> like words and roots and authors and topics
>> and these kinds things, to build summary statistics
>> that are discrete and composable, then that those
>> naturally build both summaries as digests and also
>> histograms, not so much "data mining" as "towers of relation".
>>
>> So, for a sort of notion of, "network distance",
>> is that basically there is time-series data and
>> auto-association of equality.
>>
>
> Then, it's sort of figured out what is a sort
> of BFF that results then a "normal physical
> store with atomic file semantics".
>
> The partitioning seems essentially date-ranged,
> with regards to then getting figured how to
> have the groups and overview file made into
> delivering the files.
>
> The SFF seems to make for author->words
> and thread->words, author<-> thread, and
> about making intermediate files what result
> running longer searches in the unbounded,
> while also making for usual sorts simple
> composable queries.
>
>
> Then, with that making for the data, then
> is again to the consideration of the design
> of the server runtime, basically about that
> there's to be the layers of protocols, that
> result the layers indicate the at-rest formats,
> i.e. compressed or padded for encryption,
> then to make it so that the protocols per
> connection mostly get involved with the
> "attachment" per connection, which is
> basically the private data structure.
>
> This is where the attachment has for
> the protocol as much there is of the
> session, about what results that
> according to the composability of protocols,
> in terms of their message composition
> and transport in commands, is to result
> that the state-machine of the protocol
> layering is to result a sort of stack of
> protocols in the attachment, here for
> that the attachment is a minimal amount
> of data associated with a connection,
> and would be the same in a sort of
> thread-per-connection model, for
> a sort of
> intra-protocol,
> inter-protocol,
> infra-protocol,
> that the intra-protocol reflects the
> command layer, the inter-protocols
> reflect message composition and transport,
> and the infra-protocol reflects changes
> in protocol.
>
> It's similar then with the connection itself,
> intra, inter, infra, with regards to the
> semantics of flows, and session, with
> regards to single connections and their
> flows, and multiple connections and
> their session.
>
> Then, the layering of protocol seems
> much about one sort of command set,
> and various sorts transport encoding,
> while related the session, then another
> notion of layering of protocol involves
> when one protocol is used to fulfill
> another protocol directly, figuring
> that instead that's "inside" what reflects
> usually upstream/downstream, or request/
> response, here about IMAP backed by NNTP
> and mail2news and this kind of thing.
>
>


Re: Meta: a usenet server just for sci.math

<8wKdneDRJ5x_z274nZ2dnZfqn_adnZ2d@giganews.com>


https://news.novabbs.org/tech/article-flat.php?id=156977&group=sci.math#156977

Newsgroups: sci.math news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.26.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 14 Mar 2024 19:41:22 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math,news.software.nntp
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com> <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com> <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com> <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com> <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com> <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com> <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com> <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com> <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com> <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com> <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com> <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com> <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com> <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com> <v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com> <q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com> <QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com> <iWGdndu1WIJLe3T4nZ2dnZfqnPcAAAAA@giganews.com> <TNqdnUdkktOxEW34nZ2dnZfqn_YAAAAA@giganews.com>
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Thu, 14 Mar 2024 12:41:18 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <TNqdnUdkktOxEW34nZ2dnZfqn_YAAAAA@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <8wKdneDRJ5x_z274nZ2dnZfqn_adnZ2d@giganews.com>
Lines: 179
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-26udhPLjjjq6er081QJF8Wo57R446yI8HsXUBZ7UIl/MnEmzhSLO/CtQ9KScPUdJSTVyO4Yf2AHsG9a!1kZPU/Plvys62GOI+j9rv8J+OGOJsjBp7hfPURTR+w3R/W+/vGTh9E4KY4s5+TRjteSc2H3wx4EF
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Ross Finlayson - Thu, 14 Mar 2024 19:41 UTC

On 03/12/2024 10:09 AM, Ross Finlayson wrote:

> So, the usual abstraction of request/response,
> and the usual abstraction of header and body,
> and the usual abstraction of composition and transport,
> and the usual abstraction of multiplexing mux/demux,
> and the usual abstraction of streaming and stuffing,
> and the usual abstraction of handles and layers,
> in the usual abstraction of connections and resources,
> of a usual context of attachments and sessions,
> in the usual abstraction of route links and handles,
> makes for a usual abstraction of protocol,
> for connection-oriented architectures.
>
>


"Protocol" and "Negotiation"

The usual sort of framework, for request/response or
message-oriented protocols, often has a serialization
layer, which means from the wire to an object representation,
and from an object to a wire representation.

So, deserializing, involves parsing the contents as arrive
on the wire, and resultingly constructing an object. Then,
serializing is the complementary converse notion, iterating
over the content of the object and emitting it to the wire.

Here the wire is an octet-sequence, for a connection that's
bi-directional there is the request or client wire and response
or server wire, then that usual matters of protocol, are
communicating sequential processes, either taking turns
talking on the wire, "half-duplex", or, multiplexing events
as independently, "full-duplex".

So, the message deserialization and message composition
result in the protocol, as those get nested, in what's
generally called "header and body". A command or
request has a header and body; in some protocols
that's all there is to it, while in other protocols
the command is its own sort of header, and its body is the
header and body of a contained message, treating messages
as first class. That is basically how all sorts of notions
of header and body, and of body and payload, result: these are the
usual kinds of ideas and words that apply to pretty much all
these kinds of things, and it's usually simplified as much as
possible, so that frameworks implement all this, and then
people implementing a single function don't need to know
anything about it at all, working instead just in terms of objects.
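A hedged sketch of that "header and body" nesting, where the command's
body is itself a contained header+body message (the 2-byte length
prefix is an illustrative assumption):

```python
def compose(header: bytes, body: bytes) -> bytes:
    """Frame a message as length-prefixed header followed by body."""
    return len(header).to_bytes(2, "big") + header + body

def decompose(wire: bytes):
    """Split a framed message back into (header, body)."""
    n = int.from_bytes(wire[:2], "big")
    return wire[2:2 + n], wire[2 + n:]

# The contained message is itself a header+body, nested in the outer body,
# treating messages as first class.
inner = compose(b"Subject: hi", b"hello world")
outer = compose(b"CMD POST", inner)

cmd, payload = decompose(outer)
subject, text = decompose(payload)
```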

Protocol usually also involves the stateful, or session:
anything that's static or "more global" with respect to
the scope, the state, the content, the completions,
the protocol, the session.

The idea then I've been getting into is a sort of framework,
which more or less supports the protocol in its terms, and
the wire in its terms, and the resources in their terms, where
here "the resources" usually refers to one of two things:
the "logical resource" that is a business object or has an identifier,
and the "physical" or "computational resource", which is among
the resources that fulfill transfer or changes of the state of
the "logical resources". So, usually when I say "resources"
I mean capacity, and when I say "objects" it means what's
often called "business objects", or the stateful representations
of identified logical values over their lifecycle of being objects.

So, one of the things that happens in the frameworks
is the unbounded: what happens when messages
or payloads get large, in terms of the serial action that
reads or writes them off the wire into an object, is
that it fills all the "ephemeral" resources, vis-a-vis
the "durable" resources, where the goal is to pass to the
"streaming" of these, by coordinating the (de)serialization
and (de)composition, which makes it like so.

start ... end

start ... first ... following ... end

Then another usual notion besides "streaming", a large
item broken into smaller, is "batching", small items
gathered into larger.
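These two complementary notions can be sketched directly (the names
and the byte budget are illustrative assumptions):

```python
def stream(item: bytes, chunk: int):
    """Streaming: one large item broken into smaller pieces."""
    return [item[i:i + chunk] for i in range(0, len(item), chunk)]

def batch(items, limit: int):
    """Batching: small items gathered into larger groups, each at most `limit` bytes."""
    groups, cur, size = [], [], 0
    for it in items:
        if cur and size + len(it) > limit:
            groups.append(cur)
            cur, size = [], 0
        cur.append(it)
        size += len(it)
    if cur:
        groups.append(cur)
    return groups
```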

So what I'm figuring for the framework and the protocols
and the negotiation, is what results in a first-class sort of
abstraction of serialization and composition together,
in terms of composing the payload and serializing the message,
of the message's header and body, such that the payload is the message.

This might be familiar in packets, as nested packets
and collected packets, with regards to the fact that in the model
of the Ethernet network, packets are finite and small,
and that a convention of sockets, for example, establishes
a connection-oriented protocol, where then
either the packets have external organization of their
reassembly, or internal organization of their reassembly,
their sequencing, their serialization.
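"Internal organization of reassembly" can be sketched as each packet
carrying its own sequence number, so a receiver can reorder and
reassemble (the 4-byte sequence prefix is an assumption for
illustration):

```python
def packetize(data: bytes, size: int):
    """Break data into packets, each prefixed with its own sequence number."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [seq.to_bytes(4, "big") + c for seq, c in enumerate(chunks)]

def reassemble(packets):
    """Reorder by the internal sequence number, then concatenate the payloads."""
    ordered = sorted(packets, key=lambda p: int.from_bytes(p[:4], "big"))
    return b"".join(p[4:] for p in ordered)
```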

Of course the entire usual idea of encapsulation is to
keep these things ignorant of each other, as otherwise it results in
a coupling of the things, and things that are
coupled must be de-coupled and re-coupled, as the sequential
must be serialized and deserialized or even scattered and
gathered. That is about the idea of the least sort of
"protocol or streaming" or "convention of streaming":
the parsing picks up start/first/following/end,
vis-a-vis that when it fits in start/end, then that's
"under available ephemeral resources", and when
the message, as it starts getting parsed, gets large,
that makes for "over available ephemeral resources",
where it's to be coordinated with its receiver or handler,
whether there's enough context to go from batch-to-streaming
or streaming-to-batch, or to spool it off in what results in
anything other than an ephemeral resource, so it doesn't
block the messages that do fit "under ephemeral resources".
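A sketch of that start/first/following/end convention: a message that
fits under the ephemeral budget is delivered whole in start/end, while
a larger one is handed over in first/following pieces (the threshold
and event names are illustrative assumptions):

```python
EPHEMERAL_LIMIT = 16  # assumed ephemeral-resource budget, in bytes

def frame(payload: bytes):
    """Under the limit: start/end. Over the limit: start/first/following.../end."""
    if len(payload) <= EPHEMERAL_LIMIT:
        return [("start", payload), ("end", b"")]
    pieces = [payload[i:i + EPHEMERAL_LIMIT]
              for i in range(0, len(payload), EPHEMERAL_LIMIT)]
    events = [("start", b""), ("first", pieces[0])]
    events += [("following", p) for p in pieces[1:]]
    events.append(("end", b""))
    return events
```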

So, it gets into the whole idea of the difference between
"request/response" of a command invocation in a protocol,
and, "commence/complete", of an own sort of protocol,
within otherwise the wire protocol, of the receives and
handlers, either round-tripping or one-way in the half-duplex
or full-duplex, with mux/demux both sides of request/response
and commence/complete.
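The two shapes of invocation can be sketched side by side (names here
are assumptions, a sketch rather than the framework's API): a
request/response exits as start -> success/fail, while a commence may
start well-formed and still fail while underway:

```python
def request_response(handler, payload):
    """One round trip: the result (or exception) is the whole story."""
    return handler(payload)

def commence_complete(handler, payload):
    """An own little protocol: commence is emitted, then complete or failure."""
    events = [("commence", None)]            # well-formed, so it starts
    try:
        events.append(("complete", handler(payload)))
    except Exception as exc:                 # it failed while underway
        events.append(("failed", str(exc)))
    return events
```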

This then becomes a matter relevant to protocol usually:
how to define, within the protocol's command + payload,
within the protocol's header + body, with a stream-of-sequences
being a batch-of-bytes and vice-versa, the conventions
and protocols of the utilization and disposition of resources,
computational and business, which results in defining how to implement
streaming and batching as conventions inside protocols,
according to inner and outer bodies and payloads.

The big deal with that is implementing it in the (de)serializers,
the (de)composers, then that a complete operation can
exit as of start -> success/fail, while a commence might start but
then fail while it's underway, vis-a-vis that it's "well-formed".

So, what this introduces, is a sort of notion, of, "well-formedness",
which is pretty usual, "well-formed", "valid", these being the things,
then "well-flowing", "viable", or "versed" or these automatic sorts
of notions of batching and streaming, with regards to all-or-none and
goodrows/badrows.
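The all-or-none versus goodrows/badrows dispositions for a batch can
be sketched as follows (the `valid` predicate is a stand-in
assumption):

```python
def all_or_none(rows, valid):
    """All-or-none: any invalid row rejects the whole batch."""
    return rows if all(valid(r) for r in rows) else []

def good_bad(rows, valid):
    """Goodrows/badrows: split the batch instead of rejecting it whole."""
    good = [r for r in rows if valid(r)]
    bad = [r for r in rows if not valid(r)]
    return good, bad
```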

Thusly, getting into the framework and the protocols, and the
layers, and granular and smooth or discrete and indiscrete:
I've been studying request/response and the stateful in session,
and streaming and batching, and the computational and business,
for a long time. Basically any protocol has a wire protocol
and a logical protocol above that; then streaming or batching
is either "in the protocol" or "beneath the protocol" (or "over the
protocol", of course, the most usual notion of event streams and their
batches). Here the idea is to fill out, according to message
composition, what then can result "under the protocol": a simplest
definition of (de)serialization and (de)composition,
for the well-formedness and well-flowingness, the valid and versed,
that for half-duplex and full-duplex protocols or the (de)multiplexer,
makes it possible to have a most usual means to declare,
under strong types, "implement streaming", in otherwise
a very simple framework, that has a most usual adapter,
the receiver or handler, when the work is "within available
ephemeral resources", and falls back to the valid/versed
when not, all through the same layers and multiplexers,
for pretty much any sort of usual connection-oriented protocol.

Hi-Po I/O

Re: Meta: a usenet server just for sci.math

<Hp-cnUAirtFtx2P4nZ2dnZfqnPednZ2d@giganews.com>

https://news.novabbs.org/tech/article-flat.php?id=157164&group=sci.math#157164

Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
From: ross.a.finlayson@gmail.com (Ross Finlayson)
Date: Fri, 22 Mar 2024 21:30:45 -0700
In-Reply-To: <q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com>
Message-ID: <Hp-cnUAirtFtx2P4nZ2dnZfqnPednZ2d@giganews.com>
 by: Ross Finlayson - Sat, 23 Mar 2024 04:30 UTC

On 03/02/2024 01:44 PM, Ross Finlayson wrote:
> On 02/20/2024 08:38 PM, Ross Finlayson wrote:
>>
>>
>> Alright then, about the SFF, "summary" file-format,
>> "sorted" file-format, "search" file-format, the idea
>> here is to figure out normal forms of summary,
>> that go with the posts, with the idea that "a post's
>> directory is on the order of contained size of the
>> size of the post", while, "a post's directory is on
>> a constant order of entries", here is for sort of
>> summarizing what a post's directory looks like
>> in "well-formed BFF", then as with regards to
>> things like Intermediate file-formats as mentioned
>> above here with the goal of "very-weakly-encrypted
>> at rest as constant contents", then here for
>> "SFF files, either in the post's-directory or
>> on the side, and about how links to them get
>> collected to directories in a filesystem structure
>> for the conventions of the concatenation of files".
>>
>> So, here the idea so far is that BFF has a normative
>> form for each post, which has a particular opaque
>> globally-universal unique identifier, the Message-ID,
>> then that the directory looks like MessageId/ then its
>> contents were as these files.
>>
>> id hd bd yd td rd ad dd ud xd
>> id, header, body, year-to-date, thread, referenced, authored, dead,
>> undead, expired
>>
>> or just files named
>>
>> i h b y t r a d u x
>>
>> which according to the presence of the files and
>> their contents, indicate that the presence of the
>> MessageId/ directory indicates the presence of
>> a well-formed message, contingent not being expired.
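A sketch of reading that convention back (the minimum file set and the
check itself are assumptions about how the BFF directory would be
consulted; the single-letter names follow the post, with 'x' as
expired):

```python
import os
import tempfile

REQUIRED = {"i", "h", "b"}  # id, header, body -- assumed minimum for well-formedness

def well_formed(msg_dir: str) -> bool:
    """Presence of the MessageId/ files indicates a well-formed message,
    contingent on not being expired ('x' present)."""
    present = set(os.listdir(msg_dir))
    return REQUIRED <= present and "x" not in present
```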
>>
>> ... Where hd bd are the message split into its parts,
>> with regards to the composition of messages by
>> concatenating those back together with the computed
>> message numbers and this kind of thing, with regards to
>> the site, and the idea that they're stored at-rest pre-compressed,
>> then knowledge of the compression algorithm makes for
>> concatenating them in message-composition as compressed.
>>
>> Then, there are variously already relations of the
>> posts, according to groups, then here as above that
>> there's perceived required for date, and author.
>> I.e. these are files on the order of the counts of posts,
>> or span in time, or count of authors.
>>
>> (About threading and relating posts, is the idea of
>> matching subjects not-so-much but employing the
>> References header, then as with regards to IMAP and
>> parity as for IMAP's THREADS extension, ...,
>> www.rfc-editor.org/rfc/rfc5256.html , cf SORT and THREAD.
>> There's a usual sort of notion that sorted, threaded
>> enumeration is either in date order or thread-tree
>> traversal order, usually more sensibly date order,
>> with regards to breaking out sub-threads, variously.
>> "It's all one thread." IMAP: "there is an implicit sort
>> criterion of sequence number".)
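A much-simplified sketch of References-based threading in the spirit of
RFC 5256 and the jwz algorithm cited above, taking the last Message-ID
in References as the parent (a simplification of both, not either
algorithm in full):

```python
def thread(messages):
    """messages: {message_id: [references...]} -> {message_id: parent_id or None}."""
    parent = {}
    for mid, refs in messages.items():
        # Per convention, the last entry in References is the direct parent.
        parent[mid] = refs[-1] if refs else None
    return parent
```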
>>
>>
>> Then, similarly is for defining models for the sort, summary,
>> search, SFF, that it sort of (ha) rather begins with sort,
>> about the idea that it's sort of expected that there will
>> be a date order partition either as symlinks or as an index file,
>> or as with regards to that messages date is also stored in
>> the yd file, then as with regards to "no file-times can be
>> assumed or reliable", with regards to "there's exactly one
>> file named YYYY-MM-DD-HH-MM-SS in MessageId/", these
>> kinds of things. There's a real goal that it works easy
>> with shell built-ins and text-utils, or "command line",
>> to work with the files.
>>
>>
>> So, sort pretty well goes with filtering.
>> If you're familiar with the context, of, "data tables",
>> with a filter-predicate and a sort-predicate,
>> they're different things but then go together.
>> It's figured that they get front-ended according
>> to the quite most usual "column model" of the
>> "table model" then "yes/no/maybe" row filtering
>> and "multi-sort" row sorting. (In relational algebra, ...,
>> or as rather with 'relational algebra with rows and nulls',
>> this most usual sort of 'composable filtering' and 'multi-sort').
>>
>> Then in IMAP, the THREAD command is "a variant of
>> SEARCH with threading semantics for the results".
>> This is where both posts and emails work off the
>> References header, but it looks like in the wild there
>> is something like "a vendor does poor-man's subject
>> threading for you and stuffs in a X-References",
>> this kind of thing, here with regards to that
>> instead of concatenation, is that intermediate
>> results get sorted and threaded together,
>> then those, get interleaved and stably sorted
>> together, that being sort of the idea, with regards
>> to search results in or among threads.
>>
>> (Cf www.jwz.org/doc/threading.html as
>> via www.rfc-editor.org/rfc/rfc5256.html ,
>> with regards to In-Reply-To and References.
>> There are some interesting articles there
>> about "mailbox summarization".)
>>
>> About the summary of posts, one way to start
>> as for example an interesting article about mailbox
>> summarization gets into, is, all the necessary text-encodings
>> to result UTF-8, of Unicode, after UCS-2 or UCS-4 or ASCII,
>> or CP-1252, in the base of BE or LE BOMs, or anything to
>> do with summarizing the character data, of any of the
>> headers, or the body of the text, figuring of course
>> that everything's delivered as it arrives, as with regards
>> to the opacity usually of everything vis-a-vis its inspection.
>>
>> This could be a normative sort of file that goes in the messageId/
>> folder.
>>
>> cd: character-data, a summary of whatever form of character
>> encoding or requirements of unfolding or unquoting or in
>> the headers or the body or anywhere involved indicating
>> a stamp indicating each of the encodings or character sets.
>>
>> Then, the idea is that it's a pretty deep inspection to
>> figure out how the various attributes, what are their
>> encodings, and the body, and the contents, with regards
>> to a sort of, "a normalized string indicating the necessary
>> character encodings necessary to extract attributes and
>> given attributes and the body and given sections", for such
>> matters of indicating the needful for things like sort,
>> and collation, in internationalization and localization,
>> aka i18n and l10n. (Given that the messages are stored
>> as they arrived and undisturbed.)
>>
>> The idea is that "the cd file doesn't exist for messages
>> in plain ASCII7, but for anything anywhere else, breaks
>> out what results how to get it out". This is where text
>> is often in a sort of format like this.
>>
>> Ascii
>> it's keyboard characters
>> ISO8859-1/ISO8859-15/CP-1252
>> it's Latin1 often though with the Windows guys
>> Sideout
>> it's Ascii with 0-127 gigglies or upper glyphs
>> Wideout
>> it's 0-256 with any 256 wide characters in upper Unicode planes
>> Unicode
>> it's Unicode
>>
>> Then there are all sorts of encodings; this is according to
>> the rules of Messages with regards to header and body
>> and content and transfer-encoding and all these sorts of
>> things, it's Unicode.
>>
>> Then, another thing to get figured out is lengths,
>> the size of contents or counts or lengths, figuring
>> that it's a great boon to message-composition to
>> allocate exactly what it needs for when, as a sum
>> of invariant lengths.
>>
>> Then the MessageId/ files still has un-used 'l' and 's',
>> then though that 'l' looks too close to '1', here it's
>> sort of unambiguous.
>>
>> ld: lengthed, the coded and uncoded lengths of attributes and parts
>>
>> The idea here is to make it easiest for something like
>> "consult the lengths and allocate it raw, concatenate
>> the message into it, consult the lengths and allocate
>> it uncoded, uncode the message into it".
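A sketch of that "consult the lengths and allocate it raw" idea, with a
bytearray standing in for a raw buffer (an illustration of the
allocate-once concatenation, not the 'ld' file format itself):

```python
def compose_exact(parts):
    """Consult the invariant lengths, allocate the exact total once,
    then concatenate the parts into the raw buffer."""
    total = sum(len(p) for p in parts)  # sum of invariant lengths
    buf = bytearray(total)              # allocate exactly what's needed
    at = 0
    for p in parts:
        buf[at:at + len(p)] = p
        at += len(p)
    return bytes(buf)
```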
>>
>> So, getting into the SFF, is that basically
>> "BFF indicates well-formed messages or their expiry",
>> "SFF is derived via a common algorithm for all messages",
>> and "some SFF lives next to BFF and is also write-once-read-many",
>> vis-a-vis that "generally SFF is discardable because it's derivable".
>>
>>
>
>
>
> So, figuring that BFF then is about designed,
> basically for storing Internet messages with
> regards to MessageId, then about ContentId
> and external resources separately, then here
> the idea again becomes how to make for
> the SFF files, what results, intermediate, tractable,
> derivable, discardable, composable data structures,
> in files of a format with regards to write-once-read-many,
> write-once-read-never, and, "partition it", in terms of
> natural partitions like time intervals and categorical attributes.
>
>
> There are some various great open-source search
> engines, here with respect to something like Lucene
> or SOLR or ElasticSearch.
>
> The idea is that there are attributes searches,
> and full-text searches, those resulting hits,
> to documents apiece, or sections of their content,
> then backward along their attributes, like
> threads and related threads, and authors and
> their cliques, while across groups and periods
> of time.
>
> There's not much of a notion of "semantic search",
> though, it's expected to sort of naturally result,
> here as for usually enough least distance, as for
> "the terms of matching", and predicates from what
> results a filter predicate, here with what I call,
> "Yes/No/Maybe".
>
> Now, what is, "yes/no/maybe", one might ask.
> Well, it's the query specification, of the world
> of results, to filter to the specified results.
> The idea is that there's an accepter network
> for "Yes" and a rejector network for "No"
> and an accepter network for "Maybe" and
> then rest are rejected.
>
> The idea is that the search, is a combination
> of a bunch of yes/no/maybe terms, or,
> sure/no/yes, to indicate what's definitely
> included, what's not, and what is, then that
> the term, results that it's composable, from
> sorting the terms, to result a filter predicate
> implementation, that can run anywhere along
> the way, from the backend to the frontend,
> this way being a, "search query specification".
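One plausible reading of that accepter/rejector network, sketched in
Python; the precedence (No rejects first, then Yes accepts, then Maybe,
then the rest are rejected) is an assumption about the composition:

```python
def yes_no_maybe(yes, no, maybe):
    """Build a filter predicate from lists of term predicates."""
    def accept(doc):
        if any(t(doc) for t in no):        # rejector network for "No"
            return False
        if any(t(doc) for t in yes):       # accepter network for "Yes"
            return True
        return any(t(doc) for t in maybe)  # accepter for "Maybe"; rest rejected
    return accept
```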
>
>
> There are notions like, "*", and single match
> and multimatch, about basically columns and
> a column model, of documents, that are
> basically rows.
>
>
> The idea of course is to build an arithmetic expression,
> that also is exactly a natural expression,
> for "matches", and "ranges".
>
> "AP"|Archimedes|Plutonium in first|last
>
> Here, there is a search, for various names, that
> it composes this way.
>
> AP first
> AP last
> Archimedes first
> Archimedes last
> Plutonium first
> Plutonium last
>
> As you can see, these "match terms", just naturally
> break out, then that what's gets into negations,
> break out and double, and what gets into ranges,
> then, well that involves for partitions and ranges,
> duplicating and breaking that out.
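The break-out of the quoted query '"AP"|Archimedes|Plutonium in
first|last' into its match terms is just the cross product of values
and fields, sketched as:

```python
from itertools import product

def match_terms(values, fields):
    """Expand value alternatives against field alternatives into match terms."""
    return [f"{v} {f}" for v, f in product(values, fields)]
```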
>
> It results though a very fungible and normal form
> of a search query specification, that rebuilds the
> filter predicate according to sorting those, then
> has very well understood runtime according to
> yes/no/maybe and the multimatch, across and
> among multiple attributes, multiple terms.
>
>
> This sort of enriches a usual sort of query
> "exact full hit", with this sort "ranges and conditions,
> exact full hits".
>
> So, the Yes/No/Maybe, is the generic search query
> specification, overall, just reflecting an accepter/rejector
> network, with a bit on the front to reflect keep/toss,
> that it's very practical and of course totally commonplace
> and easily written broken out as find or wildmat specs.
>
> For then these the objects and the terms relating
> the things, there's about maintaining this, while
> refining it, that basically there's an ownership
> and a reference count of the filter objects, so
> that various controls according to the syntax of
> the normal form of the expression itself, with
> most usual English terms like "is" and "in" and
> "has" and "between", and "not", with & for "and"
> and | for "or", makes that this should be the kind
> of filter query specification that one would expect
> to be general purpose on all such manners of
> filter query specifications and their controls.
>
> So, a normal form for these filter objects, then
> gets relating them to the SFF files, because, an
> SFF file of a given input corpus, satisfies some
> of these specifications, the queries, or for example
> doesn't, about making the language and files
> first of the query, then the content, then just
> mapping those to the content, which are built
> off extractors and summarizers.
>
> I already thought about this a lot. It results
> that it sort of has its own little theory,
> thus what can result its own little normal forms,
> for making a fungible SFF description, what
> results for any query, going through those,
> running the same query or as so filtered down
> the query for the partition already, from the
> front-end to the back-end and back, a little
> noisy protocol, that delivers search results.
>
>

