NES games on Apple IIgs

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dog_cow@macgui.com (D Finnigan)
Newsgroups: comp.sys.apple2
Subject: NES games on Apple IIgs
Date: Fri, 19 Apr 2024 16:42:33 -0000 (UTC)
Organization: Mac GUI
Lines: 155
Message-ID: <dog_cow-1713544951@macgui.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 19 Apr 2024 18:42:34 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="150edc286f130dcb2685913b6d9fd5ae";
logging-data="3266156"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19r7InQv7n337uex/3kQkFZ"
User-Agent: Mac GUI Usenet
Cancel-Lock: sha1:kAGWxOZQgZz8PuuBVgRHjK6lVOE=

Last week, Lucas Scharenbroich (of Super Mario GTE fame) released silky-gs,
which he calls "an NES runtime for the Apple IIgs that provides a NES PPU
and APU compatibility layer."

https://github.com/lscharen/silky-gs

Here are some of his remarks on development:

---

Fixed a long-standing rendering issue.  Left half of the video is the
"jitter", right side is fixed.

Since in practice I can only render graphics on byte boundaries on the IIgs,
the NES graphics are effectively snapped to even pixel boundaries.  I
noticed that there were quite a few cases where the sprites looked "jittery"
or didn't exactly line up with their expected positions.  Turns out there
were two bugs.

First, I had been naively clamping both the scroll position and the sprite
horizontal coordinates.  This is not correct when both of the values are odd
in their original NES coordinates, e.g. scroll_x = 1 and sprite_x = 49.  In
this case the sprite should be placed at x = 50 (byte 25) to align with the
background, but was being clamped to x = 48 instead.  Fixing this corrected
the calculation, but the jittering was still present.
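As a rough illustration of the first bug (not code from silky-gs), the sketch
below compares snapping the sprite coordinate on its own with snapping it
relative to the already-snapped background. The function names and the exact
formula are assumptions chosen to reproduce the scroll_x = 1, sprite_x = 49
example above; the real runtime is written in 65816 assembly and may compute
this differently.

```python
def naive_sprite_x(sprite_x: int) -> int:
    """Snap the sprite's own coordinate down to an even pixel."""
    return sprite_x & ~1


def aligned_sprite_x(scroll_x: int, sprite_x: int) -> int:
    """Snap the sprite relative to the already-snapped background."""
    scroll_even = scroll_x & ~1          # scroll actually used for the background
    world_x = scroll_x + sprite_x        # sprite position within the level
    return (world_x & ~1) - scroll_even  # always an even screen position


# The example from the post: both coordinates are odd.
print(naive_sprite_x(49))        # 48 (misaligned with the background)
print(aligned_sprite_x(1, 49))   # 50, i.e. byte 25 at two pixels per byte
```

When only one of the two values is odd, both functions agree in this sketch;
they differ only when both are odd.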

The second bug was more subtle.  In the Super Mario Bros ROM, the sprite
data is uploaded to the PPU at the start of each frame and then the scroll
positions are set just before exiting the NMI handler mid-frame right after
the status bar.  In order to optimize my rendering, I had been ignoring the
sprite data upload and reading the data directly from the game's RAM just
before blitting to the screen.  However, my code only gets control after the
ROM has executed, and by that point it has already calculated the sprite
positions for the next frame.  So my code was always reading the sprite data
one frame ahead of the scroll position.
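A toy model may make the timing problem easier to see. Everything here is
hypothetical: FakeRom, its fields, and the simplified ordering inside nmi()
are stand-ins for illustration, not the actual ROM or runtime code. A sprite
sits at a fixed level position of 100 while the screen scrolls one pixel per
frame, so a consistent sprite/scroll pair always adds up to 100.

```python
class FakeRom:
    """Hypothetical ROM whose NMI handler drives the whole game."""

    def __init__(self) -> None:
        self.frame = 0
        self.ram_sprite_x = 100  # sprite table in game RAM
        self.oam_sprite_x = 0    # copy handed to the PPU for this frame
        self.ppu_scroll_x = 0    # scroll value for this frame

    def nmi(self) -> None:
        # 1. Upload the sprite data prepared during the previous NMI.
        self.oam_sprite_x = self.ram_sprite_x
        # 2. Set the scroll position that goes with that sprite data.
        self.ppu_scroll_x = self.frame
        # 3. Game logic prepares the NEXT frame's sprites in RAM.
        self.frame += 1
        self.ram_sprite_x = 100 - self.frame


rom = FakeRom()
for _ in range(3):
    rom.nmi()

# Reading the uploaded copy matches the scroll; reading RAM is a frame ahead.
print(rom.oam_sprite_x + rom.ppu_scroll_x)  # 100
print(rom.ram_sprite_x + rom.ppu_scroll_x)  # 99
```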

I resolved this by falling back and actually performing the data copy.  I'll
find a way to remove it later because it's quite the performance hog: even
copying the 256 bytes of data in an unrolled loop takes ~2,500 cycles, and
this copy happens during every VBL interrupt instead of just when the IIgs
code tries to render the screen.  This ends up burning nearly 100,000 extra
cycles per second -- a fair chunk of the entire CPU budget, but working
around it will be a game-specific tweak and not a generic improvement to the
runtime.

--------------

To show concretely how much CPU time the ROM code consumes, here's Balloon
Fight running at stock speed.  The border turns red when the framework calls
the NES interrupt vector and is cleared once control returns.

Once the gameplay starts, more than half of the CPU time is spent running the
ROM code, so I'm actually pretty happy getting the frame rates we do.

Incidentally, this is why an accelerator really helps on this code. The NES
code all fits in 16KB or 32KB of memory, so it's very cache-friendly, and
it's pure code that doesn't touch anything that requires the system to slow
down to 1MHz.

--------------

This might be interesting / useful for people and I need to start
documenting things anyway. Here is a breakdown of how the dirty rendering
works on the IIgs side of things. I have linked into the relevant bits of
code and provided some sidebar comments on what might be possible
optimizations. Most of these are ideas I'm planning to look into
post-release.

We'll assume that the ROM code does not update any tiles in the frame and is
only moving sprites around.

1. The IIgs fires a native VBL interrupt and begins executing the interrupt
   handler.
   - The interrupt handler calls NES_TriggerNMI, which simulates the NMI
     interrupt on the NES
   - NES code runs for one frame (this could take a while)
   - Return from the native interrupt

At this point the sprite information is sitting in NES RAM at $0200. This
is technically supposed to be uploaded to the PPU OAM memory via DMA, but
the runtime cheats to avoid copying 256 bytes. The IIgs frame is built as
follows.

1. From the main event loop the framework calls the NES_RenderFrame function
2. This function disables interrupts and scans the NES sprite information
   - First it clears 30 bytes used for a bitmap that tracks which scanlines
     the sprites are on
   - Then it scans all 64 sprites (probably some loop unrolling / register
     optimizations here)
     - Each game defines a macro for game-specific exclusions (like fully
       transparent sprites)
     - Sprites outside of the visible IIgs screen area are skipped
     - A couple of table lookups are used to set the bits in the bitmap
3. It freezes a few essential variables so the VBL interrupts don't change
   them while rendering
4. Any PPU tiles that changed since the last IIgs render are copied into
   shadow memory (we assume no tiles change, so almost no time spent here)
5. Calls back to the game-specific RenderScreen function
   - This does the necessary work to set up the graphics screen (this can be
     done once and then skipped for a static screen; optimization not yet
     implemented)
   - Then calls the dirty screen rendering function
     - SHR shadowing is turned off
     - A macro is used to walk the current and previous frames' bitmaps and
       draw the background on lines that are occupied by sprites on the
       current frame and the prior frame. These are the lines that need to
       be erased before drawing the current frame.
     - The sprites are drawn. (this can definitely be optimized. No compiled
       sprites or any serious work at simplifying yet)
     - SHR shadowing is turned on
     - The background is drawn only on lines occupied by sprites in the
       prior frame, but not the current frame.
     - Finally, the lines that the new sprites were drawn on are exposed via
       a PEI Slam.
6. Cleanup is done to get ready for the next frame (as above, possible to be
   deferred until a non-dirty update happens)

The macro that does the bitmap walking is moderately expensive. It has to
look at 25 bytes (25 * 8 = 200 scanlines) three times. Not terrible, but a
couple thousand cycles for sure.
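
The scanline bitmap and its three walks can be sketched roughly as follows.
This is an illustration under stated assumptions, not the silky-gs code: the
function names are made up, sprites are assumed to be 8 lines tall, the Y
byte is treated as the sprite's top scanline, and the bitmap is sized to the
200 lines that the walk examines rather than the 30 bytes the runtime
clears. The only layout detail relied on is that each NES OAM entry is four
bytes (Y, tile, attributes, X).

```python
SCREEN_LINES = 200   # visible IIgs scanlines (25 bytes * 8 bits)
SPRITE_HEIGHT = 8    # assumption: 8x8 sprites only


def build_bitmap(oam: bytes) -> bytearray:
    """Set one bit for every scanline touched by a sprite."""
    bitmap = bytearray(SCREEN_LINES // 8)
    for i in range(0, len(oam), 4):      # 4 bytes per sprite: Y, tile, attr, X
        top = oam[i]
        if top >= SCREEN_LINES:          # outside the visible area: skip
            continue
        for line in range(top, min(top + SPRITE_HEIGHT, SCREEN_LINES)):
            bitmap[line >> 3] |= 1 << (line & 7)
    return bitmap


def lines(bitmap: bytes) -> list:
    """Expand a bitmap into the list of scanline numbers it marks."""
    return [n for n in range(SCREEN_LINES) if bitmap[n >> 3] & (1 << (n & 7))]


def plan_frame(cur: bytes, prev: bytes):
    """Return (erase, restore, expose) scanline lists for one dirty update."""
    erase = bytes(c | p for c, p in zip(cur, prev))            # old or new sprites
    restore = bytes(p & ~c & 0xFF for c, p in zip(cur, prev))  # old sprites only
    return lines(erase), lines(restore), lines(cur)            # cur = lines to expose


# One sprite moves down four lines between frames.
prev = build_bitmap(bytes([16, 0, 0, 50]))
cur = build_bitmap(bytes([20, 0, 0, 50]))
erase, restore, expose = plan_frame(cur, prev)
print(erase[0], erase[-1])      # 16 27
print(restore)                  # [16, 17, 18, 19]
print(expose[0], expose[-1])    # 20 27
```

Only 12 of the 200 scanlines are touched in this example, which is the point
of the dirty-line approach.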

----------

So I started looking at the Balloon Fight ROM and it has turned my
experience on its head!

In the Super Mario Bros ROM the Reset vector calls a small routine that
initializes the hardware and memory to a known state and then jumps into an
infinite loop. All of the game logic is driven by the VBL/NMI interrupt
which exits cleanly via an RTI once the logic for the frame is finished.

This is a nice setup because my framework can call the reset vector, get
control back and then set up a native VBL interrupt that simply calls the
ROM routine. A clean 1-to-1 mapping.

In Balloon Fight, the Reset code initializes the system as expected, but
then continues on to the game logic. The code is full of places where it
busy-waits on the hardware VBL flag to clear before continuing on with the
main program’s execution.

The Balloon Fight VBL/NMI logic is just a tiny routine that copies the
current sprite data to the PPU and updates the sound registers.

This is exactly the opposite of how the SMB code was structured, so I’ll
need to find a way to break out of the program code so my framework can get
control back in time to do the actual drawing on the IIgs side of things.

Since the BF ROM has a “wait for vbl” subroutine, I can patch that out to
behave as a “yield” back to my code, but I’ll need to add some simple
context switching management since I’ll no longer be calling an interrupt
vector in the ROM on each native VBL interrupt, but instead returning
control to the point that was yielded on the previous frame.
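
The "yield" idea maps closely onto coroutines. The sketch below is only an
analogy: a Python generator stands in for the saved ROM context, and
balloon_fight_main and run_frames are hypothetical names. A 65816
implementation would have to save and restore the stack pointer and
registers itself.

```python
def balloon_fight_main(log):
    """Hypothetical ROM main program: never returns, pauses at each VBL wait."""
    log.append("reset: init hardware")
    frame = 0
    while True:
        log.append(f"game logic for frame {frame}")
        yield              # patched "wait for vbl": hand control back
        frame += 1


def run_frames(count):
    """Framework side: resume the ROM once per native VBL, then render."""
    log = []
    rom = balloon_fight_main(log)   # the generator holds the saved context
    for _ in range(count):
        next(rom)                   # continue from last frame's yield point
        log.append("render on IIgs side")
    return log


for entry in run_frames(2):
    print(entry)
```

The reset code runs only on the first resume; every later resume picks up
just after the previous frame's yield, which is the context switch described
above.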

Doable, but a surprising twist to be sure.
