• xor_mgambrell_0 xor_mgambrell_0

    Let's talk about hacking roms. Well, more precisely, let's talk about NOT hacking roms.

    I can do it, but I'm not great at it.. and I find most emulator debuggers quite disagreeable. I think it would take me a long time to do some of the hacks I need to do. I have some workarounds that have proved quite versatile... they are MORE versatile than hacks, in some cases. And less, in some others. But they can be way faster than hacking, in some cases.

    First, I should mention, classic game genie and action replay cheats are easier than hacks. Yeah, sure, always having 99 lives isn't quite as elegant as it would be to simply never lose any of your 3 starting lives. And blinking invincible as if you just got hit for the entire game isn't quite as elegant as simply clipping through bad guys without taking damage. But it does get the job done.

    Moving on to more interesting things: I call them "input macros". As a simple example, I have a game with a hidden mode that requires pressing select five times on a menu. Now, I don't want to even have to expose the select button to the user as a binding. So what I do instead is configure the triangle button to set up a sequence that seizes control of the input and sends a series of five select button presses across several frames. It can get more complex than that. In Turrican, we have dozens of screens full of code doing this kind of stuff. For instance, to drop a bomb you have to crouch and shoot. Ideally we'd have a single button that does this instead, so the macro has to crouch for the right (minimum) number of frames, press the shoot button at the right time, and stay crouched long enough for it to register.
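
    As a sketch of what this looks like (the names and pad-bit values here are hypothetical; the real thing lives inside the porting layer), the macro is just a queue of per-frame pad states that overrides the user's input until it drains:

```cpp
#include <cstdint>
#include <deque>

// Hypothetical pad bits -- the real values depend on the console.
enum PadBits : uint16_t { PAD_NONE = 0, PAD_SELECT = 1 << 0 };

class InputMacro {
public:
    // Queue "press select five times": one frame held, one frame
    // released per press, so each press registers separately.
    void queueSelectFiveTimes() {
        for (int i = 0; i < 5; i++) {
            frames_.push_back(PAD_SELECT); // held this frame
            frames_.push_back(PAD_NONE);   // released next frame
        }
    }
    bool active() const { return !frames_.empty(); }
    // Called once per emulated frame: while the macro is active it
    // seizes the input; otherwise the user's pad state passes through.
    uint16_t filter(uint16_t userPad) {
        if (frames_.empty()) return userPad;
        uint16_t out = frames_.front();
        frames_.pop_front();
        return out;
    }
private:
    std::deque<uint16_t> frames_;
};
```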

    Another input-related thing we need to do is to fix menus so they behave in more sensible ways, using the console's normal accept/cancel buttons (which varies by console and region of course, most noticeably differing between Nintendo and everyone else). This requires unhooking the normal input bindings completely and simulating a different kind of setup depending on whether we're in a menu. Detecting if we're in a menu... can be hard. In theory somewhere there's probably a variable that says what game mode we're in.. maybe, maybe not. Maybe you can only tell by noticing that certain code only runs when you're in a menu, and you can detect when that PC is getting hit. That's too close to hacking for my tastes though. Here's what I do instead: I scan the VRAM for particular patterns of tiles. That works for any screens that have some unique static elements. If that's not good enough, and it's a transitory condition, you can set up a state machine to catch transitions between menus.

    Scanning VRAM is, by comparison, a dumb mechanical thing: the game does this, so I do that. If you try to analyze the game's programming instead, you can get into mind games really quickly. The game will be reusing code for something, or it will do every screen in a completely different way (that's totally normal; each screen may have been programmed by a different person or be pushing the hardware enough to be basically a different program), so your reverse engineering workload ends up proportional to the number of scenes. But scanning VRAM is the same trick every time, and you can build tools to help you do it.
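
    A minimal version of the scan, assuming a tilemap of 16-bit entries and a hand-picked run of static tiles (the offsets and tile values would come from the tooling mentioned above; they're placeholders here):

```cpp
#include <cstddef>
#include <cstdint>

// Decide "are we on this menu?" by checking whether a known static run
// of tile indices sits at a fixed spot in the tilemap portion of VRAM.
bool tilesMatchAt(const uint16_t* vram, size_t vramWords,
                  size_t offset, const uint16_t* pattern, size_t n) {
    if (offset + n > vramWords) return false;
    for (size_t i = 0; i < n; i++)
        if (vram[offset + i] != pattern[i]) return false;
    return true;
}
```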

    The final workaround I want to mention is the inverse of scanning VRAM -- it's modifying it! Suppose a game prints something static in Japanese and you want it to be in English. The tiles are compressed, the tilemap data's compressed... you're gonna crack all that, PLUS the code that draws it, and redirect it all somewhere? NOPE! Detect it, and replace it. This is easy if you have spare tiles in VRAM. But you don't always...

    I came up with a neat way to fix that which has totally worked for me on the PC Engine. It's not directly scalable to the other consoles, but I'll tell you how it might be (not proven yet). The PC Engine is odd for having twice as much address space for tiles as it has actual VRAM. So you can just park extra tiles in the 2nd half of that address space and rewrite the tilemap to reference them. A big concern when patching art like this is whether it will fade out with the rest of the screen. You could patch it with pngs that don't use the proper console palette, but what's going to make it fade out? It can be really tricky (in an emergency you can fade it yourself by checking a known-white color in the palette).
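
    To make the tilemap-rewriting step concrete, here's a sketch assuming the PC Engine's BAT layout (a 12-bit tile index and 4-bit palette number per 16-bit entry). Keeping the palette bits intact is what lets the patched art fade along with everything else:

```cpp
#include <cstdint>

// A BAT (tilemap) entry: low 12 bits tile index, high 4 bits palette.
// The 12-bit index addresses twice as many tiles as fit in real VRAM,
// so the emulator can back the phantom upper half with parked tiles.
constexpr uint16_t TILE_MASK = 0x0FFF;
constexpr uint16_t PAL_MASK  = 0xF000;

// Redirect a BAT entry to a parked tile, preserving its palette bits.
uint16_t redirectEntry(uint16_t batEntry, uint16_t parkedTileIndex) {
    return (batEntry & PAL_MASK) | (parkedTileIndex & TILE_MASK);
}
```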

    Here's the plan I have for other consoles (and I'm definitely going to have to attempt it on the SNES soon). Who says the hardware can't have a separate bitplane tacked onto it in a way that the CPU can't even access? I can access it from the emulation framework. So I will just "add" tiles by putting them in an auxiliary bank of VRAM and then set a bit in the separate fake bitplane, which will signal the system's PPU renderer to fetch the tiles from the auxiliary bank.

    This is kind of an approximation of how consoles have extended their PPU capabilities before (the GBA and the NDS did quite a bit of this). There's no reason you can't apply it more generally, though. I feel sure you could do it on a playstation, for instance. In fact every HD textures approach for 3d consoles is working in a philosophically similar way. There's additional resolution packed away in a buffer not accessible to the CPU (because it wouldn't know how). I look forward to running into the problem on a PSX game, but I have to find a PSX game to port first...

    posted in Blogs read more
  • xor_mgambrell_0 xor_mgambrell_0

    One of the funnest parts of my job is picking out icons for projects on our gitlab server. I try to balance encapsulating the spirit of the game with finding something iconic while trying to keep it varied so it's not a wall of 8bit sprites or anime girls. I don't always do a good job but I do put some serious consideration into it. I made a collage of jumbled up icons. (icons.png)

    posted in Blogs read more
  • xor_mgambrell_0 xor_mgambrell_0

    So I've got this game that plays 4 movies at once. Well... due to how the engine works, it actually plays 8 movies at once--a 2nd instance is spun up paused and queued after the 1st instance so that it's hot and ready to go in order to effect loops. This is actually important... many games in this engine are using it for looping effects, instead of sprite animations. It's super annoying.

    Needless to say, I did not plan for this possibility when porting the engine. When I first started handling this game, I added support for hardware-accelerated movie decoding, because I knew it was doing 2 at once at least (the 4/8 was a surprise) and game consoles can barely manage 1. Well... wouldn't you know it, but some hardware-accelerated video decoders use, like, 13 threads per video. And game consoles are always limited in resources (the overhead and unpredictability required to create the abstraction of unlimited resources is an unacceptable imposition--you are supposed to accept the limitations in exchange for the benefits) so of course the threads are limited, also. Now the console-makers aren't stupid. There's plenty of threads for a game to play a movie or two. But 8 movies? Actually there's a reason the movie decoder uses 13 threads that really pisses me off and would piss you off too if you heard it, but it's too revealing to go into. Really unflattering to all involved. To be fair, I am a little wasteful of threads, too. In fact it takes me 3 threads to play a video on my own, on top of the 13 for the hardware decoder. One for mux, one for video, one for audio. But you can tell what some of these 13 threads are used for, and it's so lazy.

    Man, let me tell you some of the other things I did for this game. To cut down on time wasted copying 1080p frames (did I mention these were 1080p and even bigger videos?) from a hardware-accelerated output memory, to an engine-internal soft-surface, and then BACK to a texture.. what I do is allocate a texture, specify it as the hardware-accelerated decode target, and then return the mapped pointer as the engine-internal soft-surface... and then, after all that, add some magic logic in my port layer that detects soft-surfaces which are actually mapped texture pointers which then elides the copy when creating the GPU texture from the soft-surface. Whew! That really took some doing. Probably one of the most amazing things I ever did, but what's even more amazing about it is that I didn't do it in a fork of our porting framework but feel it's generally stable enough to keep around forever. WOW that does not happen often.

    So... since I added the hardware decoder in order to decode movies fast enough, I no longer have the threads to do it (as soon as the game gets to the near-endgame point where 8 play at once).

    I thought about re-engineering my system so it didn't need 3 of my own threads per video, which would actually get me down to near the limit... I should really not be so sloppy... but I ran the numbers and even though it would put me in the safe zone (which is why I thought of it in the first place) it wouldn't put me FAR ENOUGH in the safe zone. Who knows what this game may do on top of 8 videos? Ahhh, now's the time to say: I've got like 30 threads reserved... for playing sound effects.... sigh...... ok, let me explain.

    This engine doesn't really distinguish between music and sound effects. Sound effects are streamed just like music. And sometimes a game will use a long sound effect for something like "guiselect.wav" or "chitchatter.wav" for no good reason--that is, it will have a long silent tail or a super long reverb or somesuch. Yeah, I could fix that, but I'm trying to build a rugged engine port here. And don't forget, games may be playing some ambient sounds--doubled up for the looping the way this engine does. So I did some measurement and figured out 30 should be safe. I didn't know I'd need 100 threads for movies when I figured that.

    And anyway that re-engineering is... pretty not very safe. I don't want to do this kind of thing now. This game has been a thorn in our side for months now.

    So my next step was to find out if I can fix this doubling-up for the loops that the engine does. Turns out I could. I had to get help from the engine devs to figure out how, but if I simply played the second video after the first one ended, it would look nice enough -- on this game, at least. I didn't want to break any other games using this engine, so I did this hack just for this game. That combined with ANOTHER hack to lower the 30 multimedia threads to 20 (since by now I'd seen the characteristics of this game enough to know it would be safe) got me safely into the safe zone. Yeah, there's safe zones and then there's the safe zone in the safe zone. I honestly couldn't believe I could spin up new movies without even hitching a frame, but that's how it was. The videos, despite being 1080p, are actually a little low on the framerate side, so that helps things.

    Now after all this, would you believe.... in the very same scene that plays 8 videos... microseconds after it would have exhausted its threads and crashed... it now HANGS because of a new bug in my hardware-accelerated video decoding that caused threads to go into busy loops trying to consume frames without leaving the main thread free to produce them? Well, that was just my bad. Only a game that used so many videos at once would have ever smoked that out. Took some time to find but wasn't so hard to fix. Would not have been a problem if I hadn't had to hardware-accelerate the videos for this game.

    (Actually I'm glad I finally did the hardware-acceleration--I felt like such a loser leaning on libvpx to do it, even though I have nice things to say about the library. It's just not the Console Way to use that, though)

    This @*#@ing game also smoked out a 1/10000 times problem in my basic video playback logic which would rarely hang due to a race condition between the engine consuming the last frame and the porting framework producing it. It just plays so many videos that it actually made it happen reliably. So I had previously fixed it (just for this game) but made a mistake in it: I'd added a std::atomic<int> which wasn't initialized to zero. Yeah, those aren't initialized to zero. In general it's completely impossible to predict what anything in c++ is initialized to nowadays, due to some combination of the semantics being completely off the rails in complexity and the language design being absolutely hostile to users (I mean, even 20 years ago we were joking that everyone knew only half of c++, but it's a different half for everyone); and I am only very slowly learning how to properly initialize everything. I mean, you don't want to type std::vector<int> v = {} -- isn't that stupid? But you better type std::atomic<int> a = {0} or else it could be garbage (std::atomic<int> a = 0 is a compile error for reasons so complex explaining them would be a great term paper for a graduate student)
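
    The pitfall in miniature (for what it's worth, C++20's P0883 finally made the default constructor value-initialize, but on older toolchains you have to spell it out yourself):

```cpp
#include <atomic>

// Before C++20, a default-constructed std::atomic<int> held an
// *indeterminate* value. The member initializer below is the
// boring-but-portable fix.
struct Shared {
    std::atomic<int> bad;        // pre-C++20: could be garbage
    std::atomic<int> good{0};    // always fine: explicitly zeroed
};
```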

    By the way, while I'm typing this, I'm nursing the game through an autoplay mode with my left pinkie finger. On top of all this other crap (and more problems just for this game I haven't told you about) the game is simply running out of memory due to some unlucky timing of spikes before garbage collections. So I'm testing some complete playthroughs now to see if some finetuning is going to work to fix that.

    Now I'm adding "playthroughs" and "finetuning" and "autoplay" to my dictionary file in this text editor. Man, what an annoying day.

    Well, I've got 866MB memory free and the game's 2/3 over. Time to focus on ending this thing (and then test the campaign again without rebooting the game, just in case). Seeya!

    posted in Blogs read more
  • xor_mgambrell_0 xor_mgambrell_0

    I've got this game with 12 hours and 23 minutes of extra music. It's pretty good music, so it's worth it.... I'm just listening to some of it for the first time as I write this....

    This is supposed to be split up between two SKUs somehow, and released on switch, so the media size is an issue (since it affects the manufacturing cost). We also have a to-be-determined amount of extra junk in the form of movies and collectible art. So I don't know yet what format we're going to ship that music in. And there's no need to even encode it on the big console versions, since the blu ray discs are huge. I want to encode the music as best as possible for the switch version, but it's not clear what quality level to use. Currently I'm assuming it will be ogg, but I really should make some effort to support opus... but that library is kind of hard to compile. I gave up on it once. Hmm...

    We're using git for source control, so I didn't want to repeatedly assault my repository with multiple versions of encoded music as I continually fool around and change my mind. I'm constantly worried about git repositories getting too large to be practical. So I commit them as wav to another repository as a submodule, knowing I could completely wipe that repository and recreate it if for some reason I need to. I used wav instead of flac for convenience.. it was already gonna be a big repository; cutting the size to 2/3 won't really make a big difference.

    So I made this little tool that will convert all the wav files into encoded forms in whatever formats and bitrates are needed. For development purposes I have it set to make low quality oggs just so our test builds aren't huge. OK, problem mostly solved.

    Now today I get a bunch more music and one of them is a 170MB file... even flac'ed, that's more than 100MiB. Did I mention I'm using github for this, and there's a file size limit of exactly 100MiB for any file in the repository? I am pretty sure there is also a limit to how big a single push can be, so I commit and push them in ~100MB batches.

    So.. hmm... what to do? Well, since I have that encoding tool... I just split the wav in two halves and commit those. The tool now knows how to merge a -PART1.wav and a -PART2.wav file into a single one before encoding it. Problem solved!

    posted in Blogs read more
  • xor_mgambrell_0 xor_mgambrell_0

    I ran into a problem today with a game that we thought was already totally DONE and basically ready to go to manufacturing. Sony bounced it back because it sat too long at a black screen while booting up (like, almost a minute). They gave me the clue I needed to pretty much immediately figure it out: this only happened if you started the game right after inserting the disc for the first time. After the game is fully copied to the hard disk, the problem doesn't occur.

    This isn't the first game I've had go on a PS4 disc, but I have never actually had to worry about this before. I mean, it's always a concern if you're making real CD games but my stuff has been all digital-first with physical editions as an afterthought and so I simply don't even bother to think about it nowadays. And it's a relief not to.

    This one game posed a problem though because it was loading 70 giant high-def pngs during this time. ... on as many threads as the system has. 70MB worth. It would ordinarily take longer to decode them than to read them so the threads are useful. It's not a great idea on a console to let the OS read the media from so many threads, but it doesn't really matter for this kind of thing since the reads are done whole-file-at-a-time and there's no reason for the game or the OS to do anything smart about scheduling them. Actually in practice that work (when the game is fully installed) happens in the blink of an eye.

    Except... in this case... PlayGo was busy installing the game to the hard disk. So the disc was being asked to constantly seek between the pngs and the task of reading the whole contents. That turned something which was supposed to take a blink of an eye, plus an invisible background process (lasting about 10-20 seconds, I believe), into 1 minute of thrashing.

    Actually I didn't even know PlayGo was gonna do that. I always thought that it would finish installing the game to the hard disk before it would boot, just like it would for a downloaded game (since I do not configure PlayGo to do anything smart.) But apparently not.

    So after considering waiting for PlayGo to report that it was done, or adding a loading screen, I came up with something super safe and easy: my engine has the capability to load rom files into a memory cache by glob. This happens in one thread. So I use the synchronous version of this to load all the pngs into memory cache. This takes about 17 seconds since it still has to fight with PlayGo, but it's only one thread so it isn't as bad. Once this is done the pngs are loaded through unmodified code that transparently uses the cached files in my virtual filesystem--and that's the part that takes a blink of an eye. So I transformed umpteen threads thrashing the disc to 2 and cut the runtime to 33%.

    The moral of the story is that when dealing with media access on game consoles you are expected to do this kind of stuff. The OS isn't gonna bail you out.

    posted in Blogs read more
  • xor_mgambrell_0 xor_mgambrell_0

    Some platforms have really weird patching systems. If you do something like frameshift all your data by 1 byte in an update, the patch engine will fall apart and the patch will be 100% the same size as the new file. And there are limits to how large patches can be, too. This sounds like it might be amateurish, but there are reasons for it: making a fully-powered patching system has performance implications, specifically on the disk IO system. And consoles are all about the bargain of the platform guaranteeing resources while the developer guarantees how he'll use them. So the patching apparatus cannot be infinitely powered--like to the extent where it's a fully optimal difference which requires unbounded time to uncrunch patch data while referencing sectors from all over the version0 datafile. Especially that last part: in the worst-case scenario, the individual seek and IO time will get out of control, as will the wear and tear on the flash media. I call it read amplification because that's basically what it is.

    There's a piece of the puzzle you're missing if there was something creepy in the above that was giving you a bad puzzled feeling. And it's this: the premise for a console platform's patching engine is that the version0 datafile is always present. It's in the physical media! The patch must be applied on the fly. There's no separate storage for it. Even on modern consoles, this same principle may be in effect.

    On the developer's side of that guarantee, then: we are supposed to read the docs about how the platform's patching apparatus works, and then design disk file formats which don't cause problems for it. This always catches Unity developers by surprise--the default way for doing things there makes a complete mess out of the data every time you touch anything. It's always sad to see the horror slowly dawn on them. Well, it catches everyone by surprise at least a little the first time, but at least the developer of a proprietary engine would have had this kind of thing on their radar already (and it's not as likely to be their first time doing anything.)

    But I'm personally in a bit of a pickle. See, I'm porting other people's engines, and they've unknowingly made poor choices about the structure of the packfiles. I don't want to hack the engine (part of my business plan is to do everything I can to touch engines and games as little as possible, to keep costs low.) But what do I do when I need to patch one of these games? I ran into this scenario last week. So here's how I handled it.

    I have a file format which consists of an 8 byte record per 'sector', where the 'sector' size can be tuned somewhat depending on the nature of the data. This sector consists of what I call a 'literal prefix', a 'chunk', and a 'literal suffix'. Tracking that requires 8 bytes per sector which can be directly-indexed by sector. The literal prefix and suffix are stored in the patch file at the end together. The 'chunk' comes from the version0 datafile. The sector may naturally be 100% literal prefix or 100% chunk--and commonly is. After all, we're dealing with packfiles, and unless internal data has been re-indexed somehow so a bunch of IDs and references are changed (we'll hope that doesn't happen *), most of the data is the SAME--only moved around. And most importantly, the implications on read amplification are contained: reading one sector of data from the patched file results in reading no more than two sectors of data from the version0 file.
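
    Here's a sketch of what such a record and the sector reconstruction might look like. The field widths are my assumptions for illustration, not the actual shipped format:

```cpp
#include <cstdint>
#include <cstring>

// One fixed-size 8-byte record per sector, directly indexable by
// sector number. Whatever isn't prefix or chunk is the literal suffix.
struct SectorRec {
    uint16_t prefixLen;   // bytes of literal data at the sector start
    uint16_t chunkLen;    // bytes copied from the version0 file
    uint32_t chunkOfs;    // where in version0 the chunk begins
};                        // suffix = sectorSize - prefixLen - chunkLen

// Reconstruct one patched sector on the fly. 'literals' holds all the
// prefixes and suffixes back to back at the end of the patch file;
// 'litOfs' is this sector's precomputed running offset into it.
void readSector(const SectorRec& r, size_t sectorSize,
                const uint8_t* v0, const uint8_t* literals, size_t litOfs,
                uint8_t* out) {
    size_t suffixLen = sectorSize - r.prefixLen - r.chunkLen;
    std::memcpy(out, literals + litOfs, r.prefixLen);
    std::memcpy(out + r.prefixLen, v0 + r.chunkOfs, r.chunkLen);
    std::memcpy(out + r.prefixLen + r.chunkLen,
                literals + litOfs + r.prefixLen, suffixLen);
}
```

    Note how the read-amplification bound falls out of the layout: one patched sector touches at most one contiguous run of the version0 file plus the (in-memory) literal blob.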

    I made an effort to find the best 'chunk' that I possibly could. I do exhaustively search for those, using an acceleration data structure that consists of a list of offsets of strings in the version0 file beginning with each 16bit value. So that cuts the number of strings in the version0 file I have to search to roughly 1/65536th of the total. Using that filter, I just look for the longest matching string I can. Even this would be excruciatingly slow (hours?) but I found another technique: it so happens that the best string in version0 for version1's sector 124 is most commonly the string immediately succeeding version1's sector 123 chunk (as long as the literal suffix for sector 123 was empty--the sector is concluded with a chunk). That is, in other words, I can find the next piece of data from the packfile's inner-packed file without even needing to know anything about the structure of the packfile. With THAT (and a 32 thread processor) I can handle a 1GB packfile in 30 seconds or so. The whole algorithm is trivially parallelizable. Even the acceleration table is parallelizable (each of 256 threads indexes the strings starting with that byte). That table is about 10GB for a 1GB packfile, when it's all said and done.
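
    A toy version of the acceleration structure and the longest-match search it filters (the real one is parallelized and, as noted, far more memory-hungry):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// For each 16-bit value: every offset in version0 where a string
// starts with those two bytes.
using PrefixIndex = std::array<std::vector<uint32_t>, 65536>;

void buildIndex(const uint8_t* v0, size_t n, PrefixIndex& idx) {
    for (size_t i = 0; i + 1 < n; i++)
        idx[(v0[i] << 8) | v0[i + 1]].push_back((uint32_t)i);
}

// Longest match of 'data' against version0, considering only the
// candidates that share the first two bytes.
size_t longestMatch(const uint8_t* v0, size_t n, const PrefixIndex& idx,
                    const uint8_t* data, size_t m, uint32_t* bestOfs) {
    if (m < 2) return 0;
    size_t best = 0;
    for (uint32_t ofs : idx[(data[0] << 8) | data[1]]) {
        size_t len = 0;
        while (ofs + len < n && len < m && v0[ofs + len] == data[len])
            len++;
        if (len > best) { best = len; *bestOfs = ofs; }
    }
    return best;
}
```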

    I did evaluate all the existing patch file formats, but none of them were really designed for on-the-fly use. For sensible reasons. I'm sure in exotic filesystems there may have been something like I needed but it would have been impossible to reuse that stuff. I didn't even look for it.

    Now to finally complete the picture, what we do is make version1 of the game out of the version0 packfile + the patch file. The patch that gets shipped to users is no larger than our patch file, then.

    Actually the format I devised is kind of bloated, so I do compress the whole thing. Could compress to anywhere from 10% to 99% of the original file size.. depends on details. Since this is done on the entire file, and you may have noticed my read-amplification analysis didn't discuss reading the literal parts, it follows that my patch files are loaded completely into memory. My defense of this is that any engines we have that make such good use of a packfile that they are enormous enough to cause problems, will also be rather efficient about NOT loading too much data in memory, meaning I will have plenty of room to hold a patchfile. Hope I'm not wrong...

    I had to modify my engine to try opening myfile.ext.patch any time myfile.ext is opened (actually this analysis is done when our filesystem is built during content building) and then have the ability to do this patching on the fly. But there's one more read-amplification issue I had to deal with: since my patch file is sector-based, and usually games will read entire multi-sector files from the packfile all in one operation, I don't want to turn that into several individual-sector reads from the physical filesystem. To make sure this doesn't happen, I interposed another layer of logic that 'coalesces' adjacent reads from the version0 file and doesn't execute them until a read comes from somewhere else.
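
    The coalescing layer might look something like this (a simplified sketch tracking only offsets and lengths; the real one sits on actual IO handles):

```cpp
#include <cstdint>
#include <vector>

// Pending reads against the version0 file are held back; a new read
// that starts exactly where the pending one ends is merged into it
// instead of being issued as a separate physical read.
struct Coalescer {
    std::vector<std::pair<uint64_t, uint64_t>> issued; // (offset, len)
    bool havePending = false;
    uint64_t pOfs = 0, pLen = 0;

    void read(uint64_t ofs, uint64_t len) {
        if (havePending && ofs == pOfs + pLen) { pLen += len; return; }
        flush();
        pOfs = ofs; pLen = len; havePending = true;
    }
    // Called when a read comes from somewhere else (or at end).
    void flush() {
        if (havePending) issued.push_back({pOfs, pLen});
        havePending = false;
    }
};
```

    With this, a multi-sector file read that expands into a run of adjacent version0 chunks still hits the physical filesystem once.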

    Whew, that was a lot of work! I hope it works out. Part of the concept here is that no work is required IN ADVANCE, so despite my testing, I won't be 100% sure about it until I actually run into the scenario where I have to do the patch (this whole endeavour was precautionary before we shipped a game with huge packfiles).

    • In case this does happen, I have another strategy planned. What I'll do is find the best-matching sector's worth of data in the version0 file and then store the XOR of it vs version1 in the patch. The whole patch will then be compressed (and it's hopefully mostly zeroes), but it needs to be done one sector at a time, since the mountain of zeroes will be too big for RAM. I think this approach has promise, but it's just not what I ended up doing. Plus, I think the computation to find "best-matching sector" is pretty rough. I don't want a patch file that takes days to compute... unless I'm really, REALLY desperate. Since I have the on-the-fly patching built into my engine now, I could make an alternate patch format without too much work. Note that in order to avoid read-amplification issues, I would need to STRONGLY prefer sequences of sectors. I mean, I know it's paranoid, but I'd rather miss once and have an inefficient sector than even have a chance of a perverse case turning one long sequential read into dozens of short ones.

    posted in Blogs read more
  • xor_mgambrell_0 xor_mgambrell_0

    So a game was running slow on the switch. After a while I realized it was because of some text I'd added which was getting rendered every frame. Like... one sentence of text. Using a spritefont and drawn in a batch. What the heck? This shouldn't be so slow.

    After digging in, I realized it was due to the kerning support I'd added to imgui which necessitates lookups of kerning pairs between letters. On top of this, imgui's basic glyph lookup was slow. (Using imgui was a mistake, but that's a story for another day and I'm stuck with it for now.)

    Both these things are ABSURDLY slow because they use a binary search through a rather sizable list. Binary searches are slow because by definition they hop around memory without paying any heed to where they are in a cache. Every single hop is to a fresh cache line. This is 100% the worst possible case for a memory subsystem. And that's the weakest point of mobile platforms, which the switch is. Turns out this just cost me a few FPS but it was enough to drop it below 60 and thus to 30.

    I'll have to redo this some day. But not now... I just optimized a different thing entirely to earn back some FPS from a different budget and so the game regained 60fps. However, I do know how to do it. And as luck would have it... I had to do basically the same thing for another text renderer just the same week!

    In this other case, I'm high-level emulating the Turbo CD "system card" bios. This is so we can release games that are emulated without having to distribute the copyrighted bios, which there's no way we could ever get the rights to. This proceeded pretty well. But finally I discovered at the very end of the game we were porting, it would use a built-in SJIS font that the bios has for rendering the credits. Now, I really don't want to include that copyrighted font verbatim. On the other hand, it takes a LOT of work to make a Japanese font. So I wondered... had they just used someone else's public domain font? So I spent hours scouring the web and could find no proof that they had. Fortunately while doing so I found 100s of BDF-format dot fonts, a few of which were 16x16 (which is what I needed), and 3 of which had the needed attribute of leaving one column empty (so that the glyphs would be spaced at least a little if put on a grid).

    So it was time to make a BDF-processing tool, and a necessary part of that is creating the glyph lookup table so that I could get from an input SJIS code to a specific offset in the dot font bitmap data. And now I will tell you the proper way to do it.

    SJIS is a two-byte code. UTF16 and even UTF-8 can be handled the same way (albeit 3-byte codes are common in UTF-8 when it comes to Japanese). You can make a 64k-entry table if you want (the entry is either the offset to the glyph or missing-glyph sentinel). This involves just one random access cache hit per lookup. The table entries really need to be 4 bytes so it ends up being 256KB. Not bad, but we could do better.

    So generally what you do is you simply make a 256-entry "1st-level" table. The first byte of the code is the key index and the value is the number of a 2nd-level table. These can be bytes, so long as you know how to calculate the offset of a 2nd-level table from the number [it's 256+number*sizeof(2nd-level table)]. So the first level table is tiny, although it's still basically a random access each time. The 2nd level table can be 2 byte entries (to index another table of offset to the final glyph) or 4 byte entries (to directly index the offset of the final glyph).

    My glyphs were all fixed-size (32 bytes apiece) so I could calculate them from a number, so I ended up with 2 byte entries. Note that in MANY cases your glyphs will be fixed-size: insofar as they are records containing e.g. texcoords. If you're careful, a spritefont renderer can operate completely in-cache, and really only the final access of the glyph record is random (the gpu batch data you're producing is written consecutively, so it's actually the ideal case for any CPU and is basically negligible compared to all the rest, in this case where it's not much data).
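
    Putting the two levels together, a sketch (using 0xFFFF as the missing-glyph sentinel is my choice; SJIS uses few enough lead bytes that the byte-sized table numbers can't overflow):

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr uint16_t NO_GLYPH = 0xFFFF;

struct GlyphIndex {
    std::array<uint8_t, 256> level1{};            // lead byte -> table #
    std::vector<std::array<uint16_t, 256>> level2;

    GlyphIndex() { level2.push_back(makeEmpty()); }  // table 0 = missing

    void add(uint16_t code, uint16_t glyph) {
        uint8_t hi = code >> 8, lo = code & 0xFF;
        if (level1[hi] == 0) {                    // unseen lead byte:
            level1[hi] = (uint8_t)level2.size();  // allocate a table
            level2.push_back(makeEmpty());
        }
        level2[level1[hi]][lo] = glyph;
    }
    // Two random accesses instead of a binary search's dozen-plus.
    // Glyphs are fixed-size, so byte offset = lookup(code) * 32.
    uint16_t lookup(uint16_t code) const {
        return level2[level1[code >> 8]][code & 0xFF];
    }
private:
    static std::array<uint16_t, 256> makeEmpty() {
        std::array<uint16_t, 256> t;
        t.fill(NO_GLYPH);
        return t;
    }
};
```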

    But anyway that's more optimization than I really care to think about now. By using the 2-level index you change a dozen or more random accesses to 2 or 3 and due to the way these memory bandwidth-constrained devices work, that can directly map to a 4x speedup. I did it with a 2-level index in this case because I actually put the data in my fake bios and I didn't want to worry about what would happen if it grew too large.

    One day I'll improve that imgui text renderer some more so I can actually use it seriously; in that case I will probably make a direct 64k-entry map from 2byte codes to glyph data. But wait--what about the kerning! We're not through yet. I brought all this up for a reason! The kerning data is kind of a multi-level index, isn't it? 1st index is the 1st character and 2nd index is the 2nd character. Now in this case you can end up with some giant tables (if any 2-byte character can be kerned with any other 2-byte character, you can end up with 64k*64k = 4G entries in the worst case...) but still it's only a small number of random accesses. In practice the table won't be that big, and if it is too big you can work it into 3 levels somehow. Not a big deal. I'll write about how that goes when I get around to doing it.


    posted in Blogs read more
  • xor_mgambrell_0

    Not a day goes by I don't have to do something like this...

    This time, I had to read a png myself and retrieve the palette-indexed data, because I'm using it to patch tiles in the 8-bit system VRAM. I can't (straightforwardly) draw it on top of the finished output because it wouldn't fade along with the rest of the screen unless I took some really wacky measures. My content building process munges pngs phenomenally into a custom format which uses LZ4 to compress (pngs are very slow to decode and LZ4 advertises "faster than memcpy"). This works well except on big photorealistic stuff, which it can inflate by 3x relative to the pngs; everything else is fairly comparable in size but also loads faster. So I fiddle with my build scripts to replace the munged pngs with specially chosen pristine ones.

    Anyway the point of all that was to show what a highly managed process importing PNGs is in my engine, such that I always end up with 32bpp data. And yet in this case I need palettes. And I don't want to re-paletteize the data because sometimes these patches need to use the palette that's already there (replacing tiles is an order of magnitude easier than replacing palettes, which are computed on the fly [fading, you know] rather than just parked there like tiles and BG entries). So rather than generalize my 32bpp texture system, I added a new 8bpp image loading system.

    stb_image is what I use for loading unmunged pngs, but it can only load images into 32bpp format. I don't want to add lodepng... I don't know, I just don't want another library. Corona is overkill (and another library I'm not using yet). SDL_image is not going to happen because I'm not using SDL. But I already have libpng available as an importable module in my engine (some games need it), so I might as well use that.

    Problem is, libpng is like... a nuts-and-bolts library, not a user-friendly library at all. There are a lot of user-unfriendly parts of it (that is to say, all the parts) but the one in particular that's got me down today is that it could give me an image with a palette of size 16 or 256 depending on how I saved the PNG. So far, out of 2 images, I have saved the PNG both ways. Now, libpng has a lot of fun transformations it can do on its pixel formats, but not the one I need (or I can't figure out which one it is). So it's time to write my own 4bpp-to-8bpp converter! It took a few tries, but I came up with this, which is so neat and nutty it's what inspired me to write this post today:

    if (bit_depth == 4)
      for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
          int xx = width - x - 1;
          row_pointers[y][xx] = (row_pointers[y][xx/2] >> (((xx&1)^1) << 2)) & 0xF;
        }

    Here we work from right to left (don't ask me why I don't just reverse the loop, I have no idea) because we're expanding the data: the store is always further toward the end than the load, so we never step on our own feet. Then we just have to pick the (N/2)th packed byte for the Nth pixel, shift it down by 4 if the pixel is in the upper nybble, and mask it down to one nybble in case it wasn't. Easy! To tell you the truth, I didn't even check if there was a png transformation for this; I just did it this way myself immediately because I knew it would be fun.
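The same expansion can be pulled out into a standalone row routine, which makes it easier to test in isolation. This is a sketch assuming the usual PNG packing (leftmost pixel in the upper nybble) and a row buffer already big enough to hold one byte per pixel:

```c
#include <stdint.h>

/* Expand one row of packed 4bpp pixels to 8bpp, in place.
   Assumes row has space for `width` bytes. */
static void expand_4bpp_row(uint8_t *row, int width)
{
    /* Walk right to left: the store index is always >= the load
       index, so we never clobber a byte we still need to read. */
    for (int x = width - 1; x >= 0; x--)
        row[x] = (uint8_t)((row[x / 2] >> (((x & 1) ^ 1) << 2)) & 0xF);
}
```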

    posted in Blogs
  • xor_mgambrell_0

    There aren't a lot of PS5 games. Not as many as you'd expect. I mean, you can get a lot of games on your PS5 but most of them are PS4 games.

    That may or may not have to do with this:

    In order to make a PS5 game, you have to implement this complicated thing Sony is trying to promote called "activities" or possibly "stories". I don't know exactly what it's called; it's complicated.

    The idea is that it should let people jump into the game from the OS with minimum fuss -- no splash screens (insofar as that's possible), no title screens, no menus, no file selects. It's like a menu your game has to support, outside the game. You could call it kind of like Windows jump lists. I might have a limited view of this feature (I think it's way more powerful) because I have made limited use of it.

    If you haven't heard of this thing, don't feel bad. I don't know if anyone actually uses it.

    OK, so you want to do this. Prepare to turn your game inside out. You need to boot straight into gameplay, so any accidental dependencies your game had on earlier screens need to be handled differently. You have to tell the OS when the player has begun an activity (so it puts it in the list of resumable activities) and tell the OS when the player has ended it. You need to be able to handle an instruction from the OS to jump from whatever state the game is in IMMEDIATELY to the user's selected activity. The meaning of jumping to the activity is left up to the game: it could be a save file, or it could spawn you in the starting area rather than where you left off last time so you can begin doing some more chores. That helps a bit.

    But I'm working on these emulated games and I can't turn them inside out. I can only make tiny laser incisions. Here's some of the things I've had to do:

    1. I need to decide when the player has "begun an activity" (namely, "goal: beat the game".) The game has an attract mode that shows a gameplay demo, so I can't use anything in-game to identify that the player has done this. Before the game begins, it shows the weapon loadout selection menu. This it does NOT show in the attract mode. It DOES show it inbetween levels and when you die. So what I do is this: I check a tile in a certain spot to see if it's part of the loadout menu. When I find it, I tick a state machine. When I later stop finding it, I tick it again: and at that point the "activity" begins. I can't trigger "begin" when the user exits the title menu because we're supposed to skip as many menus as possible (making as many assumptions as we please), so I have to wait until after the menu is done and the gameplay has really begun.

    2. I need to jump to the activity begin point even if the player has never begun the activity in the actual game. That means I possibly don't even have a savestate. So I have prepared a savestate just before the first level fades in, along with an arbitrarily-selected loadout. What else am I supposed to do?

    3. I need to jump to the activity begin point if the player has actually begun the activity. In this case, I use the last savestate he loaded or saved. If that one's not there, I punt to item 2.

    4. I need to decide when the player has "ended an activity". I have previously prepared hooks for when each level is beaten, for achievements, but these hooks don't exist in the "un-enhanced" version of the game shipped with the SKU. In case that's what the player's playing, I still have to detect the end of the game. The achievement hook I created before fired when a specific cutscene is shown at the end of the game. It's good, dramatic timing. So I decided to reproduce that for the activity ending. To do this I repeatedly memcmp the beginning of VRAM, where I happen to know the tiles for that cutscene are. The moment they appear, the game is over. Well... the cutscene is fading in, so the screen is black at that time (that's not ideal for achievements, because screenshots get taken at that moment), but that's what we've got for now.
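The detection in item 1 boils down to a tiny state machine ticked once per emulated frame. This is a hypothetical sketch (the names and the tile check are mine, not the actual code): the menu tile appearing is the first tick, and it disappearing is the second, at which point gameplay has really begun:

```c
#include <stdbool.h>

/* Illustrative state machine for detecting the start of an "activity"
   by watching a menu tile appear and then disappear in VRAM. */
typedef enum { WAIT_FOR_MENU, IN_MENU, ACTIVITY_BEGUN } ActivityState;

/* Call once per frame with the result of the VRAM tile check.
   Returns true on the one frame the activity begins. */
static bool tick_activity(ActivityState *s, bool menu_tile_present)
{
    switch (*s) {
    case WAIT_FOR_MENU:
        if (menu_tile_present)
            *s = IN_MENU;          /* first tick: the menu appeared */
        return false;
    case IN_MENU:
        if (!menu_tile_present) {  /* second tick: the menu is gone */
            *s = ACTIVITY_BEGUN;
            return true;           /* gameplay has really begun     */
        }
        return false;
    default:
        return false;              /* already begun; nothing to do  */
    }
}
```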

    In another game I'm working on, which wasn't emulated, I had to skip the language selection menu and just assume the user wanted English. What else am I supposed to do? What else is anyone doing? I should probably get a PS5 so I can check some games. In retrospect I should have made an effort to base it on the user's OS-selected language, but what about people using languages I don't support?


    posted in Blogs
  • xor_mgambrell_0

    Well, I'm not working on my own game right now. What if I tell you some interesting stories each week from the other people's games I'm working on, which are keeping me from the real business of making my own game? I hope you don't mind.

    This week I had a few walls of text in FIGS languages that lost formatting during localization. I needed to break the paragraphs at "Unfortunately" and "Watching". I felt a little powerful for recognizing those words immediately. Italian "Guardare"... that one tripped me up for half a beat. Sounds like 'block' or 'pause'... those I have seen before. Ah, but "regard"... get it?

    My general process for double-checking stuff like this (triple check is to ask a native speaker) is to paste it in google translate and see if the splits make sense in English. This is not hard for the FIGS languages... but... with Japanese... ok, let me show you what happens.

    Here's a Japanese sentence we might have to wrap ourselves:

    "You have saved this world from the terrifying hands of Dr. Hankyo!"

    Now what I'm going to do is split it at the most reasonable points and show you how google translate re-interprets the sentence:

    あなたは -- 恐ろしいハンギョ博士の魔の手からこの世界を救ったのです!
    You -- He saved this world from the terrifying hands of Dr. Hankyo!

    あなたは恐ろし -- いハンギョ博士の魔の手からこの世界を救ったのです!
    You are scared -- He saved this world from the magical hands of Dr. Hankyo!

    あなたは恐ろしいハンギョ -- 博士の魔の手からこの世界を救ったのです!
    You are scary hangyo -- He saved this world from the hands of the doctor's devil!

    あなたは恐ろしいハンギョ博士の -- 魔の手からこの世界を救ったのです!
    You are a terrifying Dr. Hankyo -- I saved this world from the hands of the devil!

    あなたは恐ろしいハンギョ博士の魔 -- の手からこの世界を救ったのです!
    You are the terrifying Dr. Hankyo's demon -- I saved this world from my hands!

    あなたは恐ろしいハンギョ博士の魔の手からこの -- 世界を救ったのです!
    You are this from the terrifying Dr. Hankyo's devil's hand -- You saved the world!

    あなたは恐ろしいハンギョ博士の魔の手からこの世界を救 -- ったのです!
    You save this world from the terrifying hands of Dr. Hankyo -- It was!

    The guidance on Japanese typesetting will tell you it doesn't really matter much, as long as maybe you cut it between kanji and kana, but these examples run the whole gamut of persons, tenses, senses, and storylines, so wtf?

    Anyway I had to prepare all these FIGS translations in a Photoshop file so I could export the 1080p pngs for display in-game without needing a Photoshop-calibre typesetting engine. I'd say this should be someone else's job, but apparently it's my job to do whatever everybody else doesn't do in cases where it would take me longer to tell them to do it, explain how to do it, and then check their work. Photoshop does have its own wrapping logic which is probably good but in this context I had to be creative about where to wrap it or it would turn into too many lines.

    posted in Blogs