Re: Code improvement question

<mailman.247.1700004500.3828.python-list@python.org>

From: python@mrabarnett.plus.com (MRAB)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Tue, 14 Nov 2023 23:25:10 +0000
 by: MRAB - Tue, 14 Nov 2023 23:25 UTC

On 2023-11-14 23:14, Mike Dewhirst via Python-list wrote:
> I'd like to improve the code below, which works. It feels clunky to me.
>
> I need to clean up user-uploaded files whose size I don't know in
> advance.
>
> After cleaning they might be as big as 1 MB, but that would be super rare.
> Perhaps only for testing.
>
> I'm extracting CAS numbers. The pattern runs from xx-xx-x up to
> xxxxxxx-xx-x, e.g. 1012300-77-4
>
> def remove_alpha(txt):
>     """  r'[^0-9\- ]':
>     [^...]: Match any character that is not in the specified set.
>     0-9: Match any digit.
>     \: Escape character.
>     -: Match a hyphen.
>     Space: Match a space.
>     """
>     cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
>     bits = cleaned_txt.split()
>     pieces = []
>     for bit in bits:
>         # minimum size of a CAS number is 7 so drop smaller clumps of digits
>         pieces.append(bit if len(bit) > 6 else "")
>     return " ".join(pieces)
>
>
> Many thanks for any hints
>
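Why don't you use re.findall? It pulls out the CAS numbers directly, so
there's no need to strip the other characters first and then filter by
length. Going by your xx-xx-x pattern, the last group is a single digit:

re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]\b', txt)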
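For anyone wanting to try the re.findall approach, here is a minimal sketch. The name extract_cas_numbers and the sample text are made up for illustration, and the pattern assumes the xx-xx-x to xxxxxxx-xx-x shape described in the question (2 to 7 digits, then 2 digits, then 1 digit). It joins the matches with spaces so the result resembles what remove_alpha returned.

```python
import re

# Assumed CAS shape from the question: 2-7 digits, 2 digits, 1 digit.
CAS_PATTERN = re.compile(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]\b')


def extract_cas_numbers(txt):
    """Return every CAS-shaped token found in txt, in order of appearance.

    Hypothetical helper name; it only checks the shape of the number,
    not whether it is a registered or valid CAS number.
    """
    return CAS_PATTERN.findall(txt)


# Made-up sample text for illustration.
sample = "Water (7732-18-5), junk 12-3, and 1012300-77-4 appear here."
print(" ".join(extract_cas_numbers(sample)))
```

One caveat: \b treats a hyphen as a boundary, so a longer hyphenated string such as 1-7732-18-5 would still yield 7732-18-5. If that matters for your files, the pattern would need lookarounds to reject adjacent hyphens.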

