4dsdev

Main - Reverse-engineering - DSi Font File Format


nocash
Posted on 06-19-16 02:59 PM (rev. 4 of 06-19-16 04:02 PM) Link | #1044
One more DSi mystery solved: the 843 Kbyte "\sys\TWLFontTable.dat" font file. As one might assume, it contains fonts, and it's been fairly obvious that it's compressed somehow, but without knowing how the compression actually works, or even where the compressed data starts.
The answer is surprisingly simple (and a bit weird): the fonts are compressed backwards; the compressed stream starts at the end of the file and ends at the start.

The exact decompression function can be found in Flipnote (EUR) at address 20BF8E4h (which mainly does error checking, and then calls the actual decompression function at 20BF938h). Flipnote also contains several custom fonts; the TWLFontTable.dat file is used only for Flipnote's "Help" function.
The decompression is about the same as the BIOS/SWI "LZ77" decompression function, but processed backwards, with an extra footer, using 8bit flags, 4bit len(+3), and, oddly, 12bit disp(+3) instead of disp(+1).
It's also intended to store the compressed and decompressed data in the same memory block, so that the compressed data gets overwritten during decompression. That would cause problems with worst-case compression ratios (which can require 9 compressed bytes for 8 uncompressed bytes), and it would actually cause problems at the beginning of the file (assuming that the bytes in the chunk headers don't occur elsewhere in the font data). To avoid overwriting still unprocessed data in such cases, the first few bytes (usually 15h bytes) are left uncompressed, and the decompression stops when it reaches that location.
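
To illustrate the backwards processing, here is a minimal Python sketch of the core loop. It is not the Flipnote routine: the function name and parameters are made up for illustration, the footer is not parsed (its layout isn't described in this post), and the output goes to a separate buffer instead of overwriting the input. It also assumes that the flag bits are consumed MSB first and that the first token byte read (the one at the higher address) carries the 4bit len, mirroring the BIOS LZ77 layout.

```python
def unlz_backward(comp: bytes, out_len: int) -> bytes:
    """Decode a backwards LZ stream (hypothetical helper, see assumptions above).

    comp    : the compressed region only (no footer, no uncompressed head)
    out_len : number of bytes this region expands to
    """
    out = bytearray(out_len)
    src = len(comp)   # read pointer, moves from the end towards the start
    dst = out_len     # write pointer, also moves from the end towards the start
    while dst > 0 and src > 0:
        src -= 1
        flags = comp[src]                     # 8bit flags, one bit per item
        for bit in range(7, -1, -1):          # assumed: MSB first
            if dst <= 0 or src <= 0:
                break
            if flags & (1 << bit):
                # back-reference: 4bit len(+3), 12bit disp(+3)
                src -= 2
                token = (comp[src + 1] << 8) | comp[src]
                length = (token >> 12) + 3
                disp = (token & 0xFFF) + 3
                for _ in range(length):
                    if dst <= 0:
                        break
                    dst -= 1
                    out[dst] = out[dst + disp]  # copy from already decoded data
            else:
                # literal byte
                src -= 1
                dst -= 1
                out[dst] = comp[src]
    return bytes(out)


# Hand-built stream: flags 10h, literals "Z","Y","X", then token 3000h (len 6, disp 3)
print(unlz_backward(bytes([0x00, 0x30, 0x58, 0x59, 0x5A, 0x10]), 9))  # b'XYZXYZXYZ'
```

With this split, the bytes that were left uncompressed at the start of the file would simply be prepended to the returned data by the caller.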

In decompressed form, the font tiles look like this:
http://problemkaputt.de/dsifontl.png - Large Font, 16x21pix, 2bpp, 7365 characters
http://problemkaputt.de/dsifontm.png - Medium Font, 12x16pix, 2bpp, 7365 characters
http://problemkaputt.de/dsifonts.png - Small Font, 10x12pix, 2bpp, 7365 characters
The three fonts seem to contain exactly the same characters (and differ only in font size). The font widths are proportional, e.g. characters in the 16x21pix font can be at most 16 pixels wide (plus spacing), but most characters use fewer than 16 pixels (as defined in the Character Width chunk).
From what I can see, the font seems to cover English letters, European letters with accent marks, Greek, Cyrillic, several symbols & punctuation marks, and... something for some East Asian language(s) that I am not really familiar with.
If I had to guess: it's probably Japanese. Or it might be some rather incomplete mixup of Japanese, Chinese, and/or Korean.

Anyways, I still have some questions...

1) Can somebody confirm if it's Japanese and/or Chinese and/or Korean?

2) If it's ONLY Japanese: Can anybody check if Chinese/Korean DSi consoles have a different file, or extra font files?

3) Can somebody check how many letters are actually legible? The symbols seem to have undergone some ugly resampling process, resulting in rather smeared/dirty character glyphs. The Large font might still be 95% legible (?) but the Small font... is anybody able to decipher more than 10% of those characters? Or well, maybe you can intuitively identify everything if you are familiar with the language, even if the letters are only smeared gray rectangles with a few black dots here and there?

4) Does the 3DS have the same font (in the DSi partition)? And do all DSi's have the same font version? That is, if it's the same as mine, then the RSA signature at the beginning of the file should start with 23h,8Bh,F9h,08h,...

5) Is there more info about the Nitro Font format? The stuff that's known to me (see link below) is still a bit incomplete for some chunk header entries.

PS. some more info about the overall .dat file format, and about the decompressed Nitro Font format can be found here:
http://problemkaputt.de/gbatek.htm#dsisdmmcfirmwarefontfile

Opposing Force
Posted on 06-19-16 04:35 PM Link | #1045
That TWLFontTable.dat file is the same in all of my 3DS and DSi NAND dumps. CRC32: f1953b32
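
For anyone who wants to compare their own dump, a small Python sketch along these lines should do. The helper names are made up for illustration; the only values taken from this thread are the CRC32 above and the 23h,8Bh,F9h,08h signature start from the first post.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """CRC32 of the whole file as 8 lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def signature_prefix(data: bytes, count: int = 4) -> str:
    """First few bytes of the file in the 'NNh' notation used in this thread."""
    return ",".join(f"{b:02X}h" for b in data[:count])


# Sanity check with the standard CRC32 test vector
print(crc32_hex(b"123456789"))  # cbf43926
```

Reading the dumped file into a bytes object and passing it to both helpers should give f1953b32 and 23h,8Bh,F9h,08h if the file matches the ones discussed here.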

gain
Posted on 06-19-16 04:52 PM (rev. 2 of 06-19-16 04:56 PM) Link | #1046
This program seems to have an implementation of NFTR decoding: https://github.com/pleonex/tinke/blob/353c96ba39fe59cc5662cf524c63385a757b936b/Plugins/Fonts/Fonts/NFTR.cs

So does nvwr: https://code.google.com/archive/p/nvwr/source/default/source

And this tool has documentation in Spanish: http://www.romhacking.net/download/utilities/879/

54634564
Posted on 06-21-16 05:43 PM Link | #1047
Definitely has Japanese. Highlighted here are the katakana and hiragana syllabaries used in Japanese:
[image]

Shortly after that highlighted bit, you have the large block of kanji. Most kanji are shared with other Asian languages, but I'm not seeing stuff like the Hangul used in Korean. Chinese and Korean units probably have their own font.

nocash
Posted on 06-22-16 08:45 PM (rev. 2 of 06-22-16 08:49 PM) Link | #1049
Thanks for confirming that it's really Japanese! I didn't know that kanji are shared between different languages; I thought Japanese would be totally different from Chinese and Korean, and that Chinese and Japanese would each have more than 6000 symbols, summing up to more than 12000 for both languages. But with shared symbols... they would sum to something way less?

I guess nobody has dumped the eMMC filesystem from a Chinese/Korean DSi or 3DS yet, or has anybody done so?
Aside from the font file, the Chinese DS was also known to contain an extra font in Wifi FLASH memory. I am not absolutely sure if or how games were allowed to use that font; if they were, then Chinese DSi and 3DS consoles should also include that old DS font somewhere for backwards compatibility.
And maybe the same goes for Korea, but I don't even know when (or if) the DS was ever sold in Korea, and if so, whether it contained some similar extra font in Wifi FLASH as in China.

Oh, and one note on the character sets supported in the TWLFontTable.dat file. It has been said to support
  ASCII       ;english
  ISO 8859-1  ;english+european
  ISO 8859-7  ;english+greek
  CP 932      ;english+sjis
  CP 1252     ;english+european
  CP 1253     ;english+greek
  JIS X0201   ;english+jis8bit
  JIS X0208   ;english+jis16bit
but that isn't exactly right. The Char Map chunks contain only one set of character numbers (though split into separate chunks). The character numbers are probably 16bit Unicode (as indicated by the "Encoding" byte in the Font Info chunk). There isn't anything related to ISO/CP/JIS character numbers in the file. Of course, one could use some external translation table to convert such numbers to Unicode.
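
As a rough illustration of such an external translation step, Python's built-in codecs can act as the table. This is only a sketch: the function name is made up, and it assumes that the Char Map chunks are indexed by plain 16bit Unicode numbers, as guessed above.

```python
def to_unicode_number(raw: bytes, codec: str) -> int:
    """Translate one character from a legacy encoding to a 16bit Unicode number."""
    text = raw.decode(codec)
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    code = ord(text)
    if code > 0xFFFF:
        raise ValueError("character does not fit into 16 bits")
    return code


print(hex(to_unicode_number(b"\x82\xa0", "cp932")))   # 0x3042 (hiragana A)
print(hex(to_unicode_number(b"\xe1", "iso8859_7")))   # 0x3b1  (greek alpha)
print(hex(to_unicode_number(b"\x80", "cp1252")))      # 0x20ac (euro sign)
```

The resulting number would then be looked up in the Char Map chunks to find the tile number.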

For the meaning of the various bytes in the chunks, the Glyph chunk is fairly simple:
Character Glyph (Tile Bitmaps)
  00h  4    Chunk ID "PLGC" (Character Glyph)
  04h  4    Chunk Size (10h+NumTiles*siz+padding)
  08h  1    Tile Width in pixels
  09h  1    Tile Height in pixels
  0Ah  2    Tile Size in bytes (siz=(width*height*bpp+7)/8)
  0Ch  1    Underline location
  0Dh  1    Max proportional Width including left/right spacing
  0Eh  1    Tile Depth (bits per pixel) (usually 1 or 2, sometimes 3)
  0Fh  1    Tile Rotation (0=None/normal, other=?)
  10h  ...  Tile Bitmaps
  ...  ...  Padding to 4-byte boundary (zerofilled)
The rotation byte is usually zero (the meaning of other rotation values is rather unclear).
Byte 0Dh seems to be the "max proportional width" (i.e. the biggest value from the Character Width chunk). I'm not 100% sure, but I've examined about 10 nitrofont files, and it matches that rule in all of those files.
Byte 0Ch might be the underline location. But it's hard to tell for sure because the "upper part" of the character is usually "square", so the byte could be related either to the underline location or to the bitmap width (unless some games actually draw underlined text (?), in which case one could check whether those games use that byte).
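
The Glyph chunk layout above can be read roughly like this in Python. It is a sketch under a few assumptions that aren't stated in the table: multi-byte fields are little-endian, the pixels of a tile form one continuous bitstream without per-row padding (which is what the siz formula suggests), and bits are taken MSB first. The function and key names are made up for illustration.

```python
import struct


def parse_glyph_header(chunk: bytes) -> dict:
    """Read the 10h-byte header of a Character Glyph chunk."""
    (chunk_id, size, width, height, tile_size,
     underline, max_width, bpp, rotation) = struct.unpack_from("<4sIBBHBBBB", chunk, 0)
    if chunk_id != b"PLGC":
        raise ValueError("not a Character Glyph chunk")
    return {"size": size, "width": width, "height": height,
            "tile_size": tile_size, "underline": underline,
            "max_width": max_width, "bpp": bpp, "rotation": rotation}


def decode_tile(chunk: bytes, hdr: dict, index: int) -> list:
    """Return one tile as a list of rows, each a list of pixel values."""
    start = 0x10 + index * hdr["tile_size"]
    data = chunk[start:start + hdr["tile_size"]]
    rows, bitpos = [], 0
    for _ in range(hdr["height"]):
        row = []
        for _ in range(hdr["width"]):
            value = 0
            for _ in range(hdr["bpp"]):
                bit = (data[bitpos >> 3] >> (7 - (bitpos & 7))) & 1  # assumed MSB first
                value = (value << 1) | bit
                bitpos += 1
            row.append(value)
        rows.append(row)
    return rows


# Tiny made-up chunk: one 4x2 tile at 2bpp (siz = (4*2*2+7)/8 = 2 bytes, plus 2 padding)
demo = struct.pack("<4sIBBHBBBB", b"PLGC", 0x14, 4, 2, 2, 0, 4, 2, 0) + bytes([0x1B, 0xE4, 0, 0])
hdr = parse_glyph_header(demo)
print(decode_tile(demo, hdr, 0))  # [[0, 1, 2, 3], [3, 2, 1, 0]]
```

For the three fonts in TWLFontTable.dat, the siz formula gives 84, 48 and 30 bytes per tile (16x21, 12x16 and 10x12 pixels at 2bpp).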

Character width is also quite simple...
Character Width
  00h      4    Chunk ID "HDWC" (Character Width)
  04h      4    Chunk Size (10h+NumTiles*3+padding)
  08h      2    First Tile Number (should be 0000h)
  0Ah      2    Last Tile Number (should be NumTiles-1)
  0Ch      4    Unknown/unused (zero)
  10h+N*3  1    Left Spacing (to be inserted left of character bitmap)
  11h+N*3  1    Width of Character Bitmap (excluding left/right spacing)
  12h+N*3  1    Total Width of Character (including left/right spacing)
  ...      ...  Padding to 4-byte boundary (zerofilled)
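
A matching Python sketch for the width table could look like this. Again the names are made up, little-endian fields are assumed, and the three per-character bytes are read as plain unsigned values (whether the left spacing is meant to be signed isn't stated here).

```python
import struct


def parse_width_chunk(chunk: bytes) -> list:
    """Return a list of (left_spacing, bitmap_width, total_width) per tile."""
    chunk_id, size, first, last, _unused = struct.unpack_from("<4sIHHI", chunk, 0)
    if chunk_id != b"HDWC":
        raise ValueError("not a Character Width chunk")
    entries = []
    for n in range(last - first + 1):
        # three bytes per tile, starting at offset 10h
        left, bitmap_width, total_width = chunk[0x10 + n * 3:0x13 + n * 3]
        entries.append((left, bitmap_width, total_width))
    return entries


# Made-up chunk with two tiles (10h header + 2*3 bytes + 2 padding bytes)
demo = struct.pack("<4sIHHI", b"HDWC", 0x18, 0, 1, 0) + bytes([1, 5, 7, 0, 3, 4, 0, 0])
print(parse_width_chunk(demo))  # [(1, 5, 7), (0, 3, 4)]
```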

But the Font Info chunk is pretty unclear:
Font Info Chunk
  00h      4    Chunk ID "FNIF" (Font Info)
  04h      4    Chunk Size (1Ch or 20h)
  08h      1    Unknown/unused (zero)
  09h xxx  1    Height                               ;or Height+/-1
  0Ah xxx  1    Unknown (usually 00h, or sometimes 1Fh)
  0Bh      2    Unknown/unused (zero)
  0Dh xxx  1    Width                                ;\or Width+1
  0Eh xxx  1    Width_bis (?)                        ;/
  0Fh      1    Encoding (0/UTF8, 1/UNICODE, 2/SJIS, 3/CP1252) (usually 1)
  10h      4    Offset to Character Glyph chunk, plus 8
  14h      4    Offset to Character Width chunk, plus 8
  18h      4    Offset to first Character Map chunk, plus 8
  1Ch      (1)  Tile Height                          ;\present only
  1Dh xxx  (1)  Max Width or so +/-?                 ; when above
  1Eh      (1)  Underline location                   ; Chunk Size = 20h
  1Fh      (1)  Unknown/unused (zero)                ;/(version 0102h)
Most of the entries seem to duplicate stuff that's also found in the Character Glyph chunk (i.e. the width/height/underline stuff), which would be rather useless if it's really like that. And it would be even weirder because the newer 20h-byte chunk version introduced even more of those useless entries.
And in case of the "xxx" marked entries, the values aren't even exact dupes (in the files that I've tested, they are sometimes 1 bigger or smaller than in the Glyph chunk; there might be some reason for that, or maybe somebody just didn't initialize those bytes properly).
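
For completeness, here is a hedged Python sketch that reads the Font Info fields as listed above, without trying to interpret the unclear ones. Little-endian fields are assumed, the key names are made up, and the "plus 8" offsets are converted back to chunk start offsets by subtracting 8.

```python
import struct

ENCODINGS = {0: "UTF8", 1: "UNICODE", 2: "SJIS", 3: "CP1252"}


def parse_font_info(chunk: bytes) -> dict:
    """Read a Font Info chunk (1Ch or 20h bytes) into a dict."""
    (chunk_id, size, _zero08, height, unknown_0a, _zero0b, width, width_bis,
     encoding, glyph_ptr, width_ptr, map_ptr) = struct.unpack_from("<4sIBBBHBBBIII", chunk, 0)
    if chunk_id != b"FNIF":
        raise ValueError("not a Font Info chunk")
    info = {
        "size": size,
        "height": height,            # may be off by one versus the Glyph chunk
        "unknown_0a": unknown_0a,
        "width": width,              # may be off by one versus the Glyph chunk
        "width_bis": width_bis,
        "encoding": ENCODINGS.get(encoding, encoding),
        "glyph_chunk": glyph_ptr - 8,   # stored offsets are "plus 8"
        "width_chunk": width_ptr - 8,
        "first_map_chunk": map_ptr - 8,
    }
    if size == 0x20:
        # extra entries, present only in the 20h-byte version
        tile_height, max_width, underline, _zero1f = struct.unpack_from("<BBBB", chunk, 0x1C)
        info.update(tile_height=tile_height, max_width=max_width, underline=underline)
    return info


# Made-up 1Ch-byte chunk for a 16x21 font
demo = struct.pack("<4sIBBBHBBBIII", b"FNIF", 0x1C, 0, 21, 0, 0, 16, 16, 1, 0x34, 0x1000, 0x2000)
print(parse_font_info(demo)["glyph_chunk"])  # 44
```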

I haven't checked the open source stuff recently (but I used sources like those last year when trying to understand the nitrofont format, and at least back then, they couldn't explain all bytes in the Font Info chunk either).

Anyways, at the moment, the biggest font mystery would be what's going on in China and Korea.

