[$Bill] extract images from website

From: ammammata@tiscali.it (Ammammata)
Newsgroups: alt.sports.basketball.nba.la-lakers
Subject: [$Bill] extract images from website
Date: Fri, 12 Apr 2024 09:49:34 +0200
Message-ID: <uvap2g$fsuu$1@solani.org>
 by: Ammammata - Fri, 12 Apr 2024 07:49 UTC

http://dlib.coninet.it/bookreader.php?&c=1&f=7664&p=7#page/1/mode/1up

this is just an example; it's a sports-only newspaper
it looks like standard access is more or less blocked by a mandatory
registration, but if you have the link you can actually get the scans

I browse it manually by changing the parameter in the link (f=<issue>),
since it lacks a previous/next button

I'd like to know whether it's possible to extract all the scanned pages
with a batch file and save them to my disk, for faster searching

'f' goes from 1 to 14000, with several unused numbers
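
A minimal Python sketch of stepping through 'f' instead of editing the link by hand. The URL shape follows the viewer links quoted in this thread; the names viewer_url and all_viewer_urls are made up for illustration, and it is an assumption that the p= parameter of the example link can be left out. It only builds viewer URLs, it does not fetch the scans.

```python
# Sketch only: build the viewer URL for every value of 'f'.
# Assumption: the p= parameter of the example link can be omitted.
VIEWER = "http://dlib.coninet.it/bookreader.php?&c=1&f={f}#page/1/mode/1up"


def viewer_url(f):
    """Return the viewer URL for the number f."""
    return VIEWER.format(f=f)


def all_viewer_urls(first=1, last=14000):
    """Yield one viewer URL per value of f, unused numbers included."""
    for f in range(first, last + 1):
        yield viewer_url(f)
```

Unused numbers are not filtered here, since the thread does not say how the site answers for them.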

pages are shown at 25%, 50%, 100% and 200%; the last would be the
best (I presume the image is always the same and the site resizes it
on the fly)

any help is appreciated :)

Go Lakers! who cares about the final standings? being in the play-in
does matter, then you must win them all...

--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
-=- -=- -=- -=- -=- -=- -=- -=- - -=-
............ [ al lavoro ] ...........

Re: [$Bill] extract images from website

From: ammammata@tiscali.it (Ammammata)
Newsgroups: alt.sports.basketball.nba.la-lakers
Subject: Re: [$Bill] extract images from website
Date: Sat, 27 Apr 2024 23:12:18 -0000 (UTC)
Message-ID: <XnsB162C42B80D9ammammatatiscalineti@127.0.0.1>
References: <uvap2g$fsuu$1@solani.org>
 by: Ammammata - Sat, 27 Apr 2024 23:12 UTC

$Bill wrote:

> The image URLs look like this:
> http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_178_0001
> Not sure how you'd have to loop through them and ignore the empty ones.

Hi

the sample page corresponds to this link:

http://dlib.coninet.it/bookreader.php?&c=1&f=441#page/1/mode/1up

0001 should be the publication code
1929 is the year
178 is the issue
0001 is the page, I presume
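
The breakdown above can be sketched in Python as a small parser. The field meanings are the presumptions stated in this post, not something confirmed by the site, and ImageId, parse_image_id and format_image_id are hypothetical names.

```python
from typing import NamedTuple


class ImageId(NamedTuple):
    publication: str  # "0001", presumed publication code
    year: int         # e.g. 1929
    issue: int        # e.g. 178
    page: int         # e.g. 1, presumed page number


def parse_image_id(image_id):
    """Split an id such as '0001_1929_178_0001' into its four fields."""
    publication, year, issue, page = image_id.split("_")
    return ImageId(publication, int(year), int(issue), int(page))


def format_image_id(parts):
    """Rebuild the id string with the zero padding seen in the sample."""
    return "{}_{:04d}_{:03d}_{:04d}".format(
        parts.publication, parts.year, parts.issue, parts.page
    )
```

Keeping the publication code as a string preserves its leading zeros in case it turns out not to be a plain number.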

I will create a test list of links for ONE year (1929), all the
available issues (from 1 to 178) and some pages (say 1 to 8; the
missing ones will return an error), then wget will do the downloading
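
For readers without Excel, the same test list could be produced with a few lines of Python. This is only a sketch of the plan described above: the base URL is the one spotted by $Bill, the ranges are the ones named in this post, and build_links, write_links and the file name links.txt are illustrative assumptions.

```python
BASE = "http://dlib.coninet.it/view_foto_inside.php?tipo=o&id="


def build_links(publication=1, year=1929, issues=range(1, 179), pages=range(1, 9)):
    """Return one image URL per (issue, page) pair for a single year."""
    links = []
    for issue in issues:
        for page in pages:
            # Padding follows the sample id 0001_1929_178_0001.
            image_id = "{:04d}_{:04d}_{:03d}_{:04d}".format(
                publication, year, issue, page
            )
            links.append(BASE + image_id)
    return links


def write_links(path, links):
    """Write the links one per line, so wget can read them with -i."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(links) + "\n")
```

After write_links("links.txt", build_links()), running wget -i links.txt would work through the list; a pause between requests (wget's --wait option) is kinder to the server.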

thank you for spotting the image links :)


Re: [$Bill] extract images from website

From: ammammata@tiscali.it (Ammammata)
Newsgroups: alt.sports.basketball.nba.la-lakers
Subject: Re: [$Bill] extract images from website
Date: Sat, 27 Apr 2024 23:35:58 -0000 (UTC)
Message-ID: <XnsB162104587EB3ammammatatiscalineti@127.0.0.1>
References: <uvap2g$fsuu$1@solani.org> <XnsB162C42B80D9ammammatatiscalineti@127.0.0.1>
 by: Ammammata - Sat, 27 Apr 2024 23:35 UTC

On Sun 28 Apr 2024 01:12:18a, *Ammammata* posted the message
news:XnsB162C42B80D9ammammatatiscalineti@127.0.0.1 to
alt.sports.basketball.nba.la-lakers. Let's see what was written:

> I will create a test list of links

some Excel VBA:

Sub create_links()

    ' sample link:
    ' http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_178_0001

    ' each variable needs its own "As" clause, otherwise it becomes a Variant
    Dim g As Integer, y As Integer, i As Integer, p As Integer
    Dim g1 As String, y1 As String, i1 As String, p1 As String
    Dim image As String, link As String

    ' https://www.automateexcel.com/vba/write-to-text-file/
    Dim FileName As String
    FileName = "c:\BibDig\1953-54\test.txt"

    Open FileName For Output As #1

    For g = 1 To 1                  ' publication code
        g1 = Format(g, "0000")
        For y = 1929 To 1929        ' year
            y1 = Format(y, "0000")
            For i = 1 To 178        ' issue
                i1 = Format(i, "000")
                Debug.Print i1 ' just to check it's working... ;-)
                For p = 1 To 8      ' page
                    p1 = Format(p, "0000")
                    image = g1 & "_" & y1 & "_" & i1 & "_" & p1
                    link = "http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=" & image
                    Print #1, link
                Next p
            Next i
        Next y
    Next g

    Close #1

End Sub

I ran the above code and got a text file like:

http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0001
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0002
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0003
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0004
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0005
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0006
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0007
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_001_0008
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0001
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0002
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0003
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0004
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0005
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0006
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0007
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_002_0008
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_003_0001
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_003_0002
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_003_0003
http://dlib.coninet.it/view_foto_inside.php?tipo=o&id=0001_1929_003_0004

then wget started downloading: missing images are two bytes long and
contain "no" but the rest is fine (I'll rename the downloaded files):

view_foto_inside.php@tipo=o&id=0001_1929_051_0001          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_051_0002          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_051_0003          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_051_0004          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_051_0005          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_051_0006          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_051_0007          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_051_0008          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0001  2,248,887  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0002  2,257,880  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0003  2,336,223  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0004  2,264,053  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0005          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0006          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0007          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_052_0008          2  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0001  2,148,825  28/04/2024 01:27  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0002  2,157,392  28/04/2024 01:28  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0003  2,374,169  28/04/2024 01:28  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0004  2,366,378  28/04/2024 01:28  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0005  2,403,519  28/04/2024 01:28  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0006  2,310,746  28/04/2024 01:28  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0007  1,913,336  28/04/2024 01:28  -a--
view_foto_inside.php@tipo=o&id=0001_1929_053_0008  2,113,558  28/04/2024 01:28  -a--
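
The cleanup and renaming mentioned above could be scripted along these lines. This is a hedged Python sketch, not what was actually run: it assumes the wget file names shown in the listing and the two-byte "no" placeholder, and since the thread does not say what image format the scans use, the extension is guessed from the file's first bytes. tidy_downloads and guess_extension are hypothetical names.

```python
import os

PLACEHOLDER = b"no"  # body of a missing page, as reported in this thread
PREFIX = "view_foto_inside.php@tipo=o&id="  # wget file name prefix from the listing


def guess_extension(data):
    """Guess a file extension from the first bytes of the image data."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    return ".bin"  # unknown format: keep the data, fix the name later


def tidy_downloads(folder):
    """Delete placeholder files and rename real scans to '<id><ext>'."""
    kept = []
    for name in sorted(os.listdir(folder)):
        if not name.startswith(PREFIX):
            continue  # leave unrelated files alone
        path = os.path.join(folder, name)
        with open(path, "rb") as handle:
            head = handle.read(16)
        if head == PLACEHOLDER:
            os.remove(path)  # two-byte "no" file, i.e. a missing page
            continue
        new_name = name[len(PREFIX):] + guess_extension(head)
        os.rename(path, os.path.join(folder, new_name))
        kept.append(new_name)
    return kept
```

Trying it on a copy of the download folder first is safer, since the placeholders are deleted rather than moved.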

ok, it's 1.30 AM, time to go to bed :)

next step: work out more precisely what the digits in 0001_1929_178_0001 mean

