
Casio Forum - Other questions


Druzyek (offline) Member Points: 44 Challenges: 0

gint programming questions

Posted on 03/12/2024 22:32

Hello! I'm planning on making a new program with fxSDK and gint, and I have a few questions. I have an fx-9750GIII and I hope to get an fx-CG50 soon, so I'm curious about a few things for both calculators:

- How can I allocate as much RAM as possible for the program to use? I read a few years ago that malloc is a little buggy, but can I get a big chunk of heap that way? I imagine if the heap is mostly empty but has a small allocation in the middle that will limit the largest piece I can allocate. Also, I read about the extra RAM in the SPU and understand the 24-bit accesses in case I need to do that.

- Does anything persist in RAM when an add-in exits?

- If the add-in allocates memory with malloc but doesn't free it, does the OS recover it or is it just lost?

- Does the add-in execute directly from flash or do I lose RAM as the program gets bigger?

- How can I read and write files in flash like Python scripts?

Thanks in advance!


Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 03/12/2024 22:41


Welcome back! I recognize your avatar but haven't seen it in a while. :o

- How can I allocate as much RAM as possible for the program to use?

About 90-100 kB is available on the G-III, up to 600-700 kB for the CG-50. On the CG-50 there are also large swaths of unused RAM of which we commonly grab 2-3 MB but that technically total 6 MB.

The OS malloc is known to be a bit unreliable with fragmentation on the fx-CG, but most of the default heap memory is managed by gint anyway. On the G-III I don't have clean reports of fragmentation issues, however almost all the heap you can get is from the OS, so you don't have as much control over it and no statistics.

Also, I read about the extra RAM in the SPU and understand the 24-bit accesses in case I need to do that.

This is a great resource on the G-III, the largest one you have access to, although hard to use efficiently at the same time. You can read this documentation page about SPU memory.

The easiest piece of SPU memory to use is called PRAM0 and it's quite normal except that you can only use 32-bit accesses. So you can't use it to store, e.g., strings for strlen() and co., which frequently use 8-bit accesses. In practice, this means you are limited to using this memory with code that you write yourself.
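As an illustration of what "code that you write yourself" can look like, here is a minimal sketch (not part of gint) that keeps byte data in a 32-bit-only region such as PRAM0 by always reading or writing whole words. It assumes the big-endian byte order of the SuperH, and the function names are hypothetical; PRAM0's address is given in the SPU memory documentation page.

```c
#include <stddef.h>
#include <stdint.h>

/* Read byte i of a buffer that must only be accessed 32 bits at a time.
   SuperH is big-endian, so byte 0 is the most significant byte of word 0. */
uint8_t word_buf_get8(const volatile uint32_t *buf, size_t i)
{
    unsigned shift = 24 - 8 * (unsigned)(i & 3);
    return (uint8_t)(buf[i >> 2] >> shift);
}

/* Write byte i with a 32-bit read-modify-write of the word containing it */
void word_buf_set8(volatile uint32_t *buf, size_t i, uint8_t value)
{
    unsigned shift = 24 - 8 * (unsigned)(i & 3);
    uint32_t w = buf[i >> 2];
    w &= ~((uint32_t)0xff << shift);   /* clear the target byte */
    w |= (uint32_t)value << shift;     /* insert the new byte */
    buf[i >> 2] = w;
}
```

On the calculator `buf` would point into PRAM0; on a PC the same functions can be tested against an ordinary array.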

The rest is XRAM0, YRAM0 and YRAM1, and they work the same except that when you write 32 bits of data only 24 bits are retained.

volatile uint32_t *YRAM0 = (void *)0xfe280000;
*YRAM0 = 0x11223344;
*YRAM0; // 0x11223300

I have yet to see these used in an add-in. It's feasible, but a pain.
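A minimal sketch of one way to live with the 24-bit truncation: pack three bytes into the upper 24 bits of each word and unpack them after reading. The helper names are hypothetical, and `spu24_model()` only imitates the behavior shown above (0x11223344 reads back as 0x11223300) so that the logic can be checked on a PC.

```c
#include <stdint.h>

/* Pack three bytes into the 24 bits that survive a write to XRAM0/YRAM0/YRAM1 */
uint32_t pack24(uint8_t a, uint8_t b, uint8_t c)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8);
}

/* Recover the three bytes from a word read back from 24-bit memory */
void unpack24(uint32_t w, uint8_t out[3])
{
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
}

/* Host-side model of the hardware truncation (0x11223344 -> 0x11223300),
   only useful for testing off-calculator */
uint32_t spu24_model(uint32_t w)
{
    return w & 0xffffff00u;
}
```

On the calculator this would be used as `YRAM0[i] = pack24(a, b, c);` and later `unpack24(YRAM0[i], out);`, with `YRAM0` declared as above. Storing only three bytes per word wastes a quarter of the space, but it keeps every access a plain 32-bit one.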

- Does anything persist in RAM when an add-in exits?

As per the runtime contract, nothing. The OS doesn't spend time needlessly clearing out memory so you're likely to find scraps of previous executions' data. But that's not a behavior you should rely upon. You can store data safely using the filesystem, or if you want more speed, the main memory (but gint has no wrappers for MCS syscalls, you'll need to world switch and call them manually).

- If the add-in allocates memory with malloc but doesn't free it, does the OS recover it or is it just lost?

It remains unavailable until the add-in exits, then it is recovered by the OS. "Recovery" is however a misleading term here. The heap just ceases to exist when the add-in quits, and then it is recreated anew on top of whatever there is in memory when another add-in starts.

- Does the add-in execute directly from flash or do I lose RAM as the program gets bigger?

The add-in is mapped by the MMU directly from Flash at no cost in RAM usage. The add-in file is limited to 512 kB (G-III) / 2 MB (CG-50) after which it does not show up in the main menu anymore, and if you need more room you need to start using external files and loading them to RAM yourself, at which point there is a cost in RAM usage.

- How can I read and write files in flash like Python scripts?

You can use the standard POSIX file API (open() and co) or the standard libc API (fopen() and co) on both the G-III and CG-50. Do remember however to world switch during file operations, otherwise the add-in can crash.
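A minimal sketch with the POSIX API, assuming fxlibc's open() accepts the usual O_WRONLY / O_CREAT / O_TRUNC flags and a mode argument (worth checking in its headers); the function names and return conventions are hypothetical. On the calculator, both functions should be called through a world switch rather than directly.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes `size` bytes to `path`, replacing any existing file.
   Returns 0 on success, -1 on failure. */
int save_blob(const char *path, const void *data, size_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if(fd < 0)
        return -1;
    ssize_t written = write(fd, data, size);
    int closed = close(fd);
    return (written == (ssize_t)size && closed == 0) ? 0 : -1;
}

/* Reads up to `size` bytes from `path`.
   Returns the number of bytes read, or -1 on failure. */
ssize_t load_blob(const char *path, void *data, size_t size)
{
    int fd = open(path, O_RDONLY);
    if(fd < 0)
        return -1;
    ssize_t got = read(fd, data, size);
    close(fd);
    return got;
}
```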
My graph (January 28): (MPM ; serial gint ; (Rogue Life || HH2) ; PythonExtra ; ? ; Boson X ; gint 3 pass ; ...) || (shoutbox v5 ; v5)
Fcalva (offline) Member Points: 618 Challenges: 10

Posted on 03/12/2024 22:43


Hey, welcome here
- You can't get more than the standard heap on the G-III (and the SPU memory only exists on the G-III / fx-CG models), though you can get an extra ~2 MB on the CG-50 (where the heap is already 512 KiB)
- It's complicated, but nothing persists in any area you should be using (basically, it only persists if you corrupt important stuff)
- The OS recovers all memory from leaks because it's not the OS that handles it (and with the old SDK I think it just resets malloc()s anyway)
- The add-in is loaded into RAM, but in a reserved area, so you can't lose or gain heap/stack (see Lephe's message)
- For Python file I/O, you would have to look at Slyvtt's experimental branch of PythonExtra (the community Python). It works, just don't forget to .close() your files because the garbage collection seems a bit flaky
In general, I would recommend not searching too hard for more RAM, but rather just optimizing your program.
Edit: I misunderstood the last question; you have standard C I/O with gint/fxlibc (see stdio.h).
Average Noctua appreciator
Calcs: G35+EII, G90+E (bricked)
Calamari (online) Member Points: 408 Challenges: 0

Posted on 03/12/2024 22:54


How can I allocate as much RAM as possible for the program to use?

I've been using malloc and it seems good. There might be a better method, but what I do to get the largest contiguous block of RAM is start from 64KB on FX (can test GINT_HW_FX) otherwise 512KB (CG). Then I try to malloc in a loop, decrementing by 1KB each time malloc fails. Then once it succeeds, I free it, add 1024 and then loop from there, this time decrementing with the alignment I actually wanted (4 bytes, 8 bytes, etc). This seems to be quite fast. Example: https://git.planet-casio.com/calamari/Duvet/src/branch/main/src/sheet.c#L168
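The strategy above can be sketched as follows. This is a hypothetical reimplementation for illustration, not the code from Duvet linked above; it only finds the size, which the caller then allocates once and keeps. `align` must be at least 1.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if malloc() can currently satisfy `size` bytes.
   The volatile pointer keeps the compiler from eliding the malloc/free pair. */
static int malloc_fits(size_t size)
{
    void * volatile p = malloc(size);
    if(p == NULL)
        return 0;
    free(p);
    return 1;
}

/* Largest multiple of `align` (<= start) that malloc() accepts, or 0.
   Coarse pass steps down by `coarse` bytes, fine pass by `align` bytes. */
size_t largest_malloc_size(size_t start, size_t coarse, size_t align)
{
    size_t size = start;
    while(size >= coarse && !malloc_fits(size))
        size -= coarse;

    /* size + coarse either failed or exceeds start, so search below it */
    size_t hi = size + coarse;
    if(hi > start)
        hi = start;

    size = hi - hi % align;
    while(size > 0 && !malloc_fits(size))
        size -= align;
    return size;
}
```

For example `size_t n = largest_malloc_size(512 * 1024, 1024, 4); void *big = malloc(n);` on the CG, or a 64 KB starting point on the fx models. The probe is only meaningful if nothing else allocates between the search and the final malloc().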

Of course if you're allocating a linked list or such then you can just malloc each element individually. Although, I ended up not doing that in Duvet in order to minimize overhead, hence the single large allocation.

Does anything persist in RAM when an add-in exits?

I'd recommend avoiding exiting at all, unless you really need to exit due to a fatal error or such. If you exit, then users have to run something else before they can run your add-in again. Instead, let them go back to the main menu, for example by pressing MENU. Check out getkey_opt and gint_osmenu. When they're at the main menu, if they choose to run something else, your memory is "freed" automatically. Otherwise, they can come back to your add-in and your memory is intact. There are some display details you'll have to deal with when they come back, but it's manageable. Here's how I did it in a recent add-in: https://git.planet-casio.com/calamari/Duvet/src/branch/main/src/kbd.c#L85
“Remember to have fun doing this, or it ain't worth it.” — Robert Alan Koeneke
“They call me the king of the spreadsheets, got 'em all printed out on my bedsheets.” — “Weird Al” Yankovic
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 05/12/2024 22:06


Lephenixnoir wrote:
Welcome back! I recognize your avatar but haven't seen it in a while. :o
That's right! You were a great help figuring out how to install fxSDK a few years ago. I've gotten a lot better at Linux since then! Thanks for your information this time too.

The OS malloc is known to be a bit unreliable with fragmentation on the fx-CG, but most of the default heap memory is managed by gint anyway. On the G-III I don't have clean reports of fragmentation issues, however almost all the heap you can get is from the OS, so you don't have as much control over it and no statistics.
So if I understand correctly, the OS will use a variable amount of RAM for itself but doesn't seem to have fragmentation issues, so I can use Calamari's strategy or similar if I want the biggest chunk of RAM possible? Does gint make any calls to malloc that I need to account for before I gobble up all the RAM?

You mentioned that you haven't seen any add-ins use the 24-bit SPU memory. What about the OS? I was wondering if I could leave data there after my add-in ends and expect it to stay there unchanged until it starts again (assuming a second add-in doesn't overwrite it).

Can you explain or link me to a little more about world switching?
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 05/12/2024 22:22


The OS uses a fixed amount of RAM and what you can get from the OS malloc() is also a fixed amount. That much is not a variable. Fragmentation is this issue where if you fill the heap then over the next cycles of freeing and allocating you end up with swiss cheese with no large blocks and you might be unable to get large allocations. This is a concern you might have on the G-III where gint's heap (which is the default) is small and you quickly end up using the OS heap. By contrast, this is less of a problem on the CG-50 because gint's heap and other extra memory is much larger than the OS heap.

(...) so I can use Calamari's strategy or similar if I want the biggest chunk of RAM possible? Does gint make any calls to malloc that I need to account for before I gobble up all the RAM?

If you want small bits of RAM, use malloc(); it will handle things for you. If you need a large chunk of RAM early on and want to hold on to it to make sure you keep access to large blocks, then:

- On the G-III, do a single malloc() of the largest size you can, which I'm expecting will be ~90 kB. No point in allocating multiple times as the buffers won't be contiguous. If that's not enough, address directly in PRAM0, but you'll need to stick to 32-bit accesses.
- On the CG, use gint's kmalloc_max() function to get the largest block available in the _uram arena if ~400 kB is enough for you. Otherwise, address directly at 0x8c200000 where you have a few MB.
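When addressing a region directly (PRAM0 on the G-III, or the memory at 0x8c200000 on the CG), nothing manages allocations for you. A minimal bump allocator is one simple way to hand out pieces of such a region. This sketch and its names are hypothetical, and the 2 MB size in the usage line below is an assumption based on the "2-3 MB" commonly grabbed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over a raw, directly-addressed region */
typedef struct {
    uint8_t *base;   /* start of the region */
    size_t size;     /* total size in bytes */
    size_t used;     /* bytes handed out so far, including padding */
} bump_arena_t;

void bump_init(bump_arena_t *a, void *base, size_t size)
{
    a->base = base;
    a->size = size;
    a->used = 0;
}

/* Returns n bytes aligned to `align` (a power of two), or NULL if full */
void *bump_alloc(bump_arena_t *a, size_t n, size_t align)
{
    uintptr_t cur = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (cur + align - 1) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if(offset > a->size || n > a->size - offset)
        return NULL;
    a->used = offset + n;
    return (void *)aligned;
}

/* Forget every allocation at once */
void bump_reset(bump_arena_t *a)
{
    a->used = 0;
}
```

Usage: `bump_arena_t arena; bump_init(&arena, (void *)0x8c200000, 2 * 1024 * 1024);` then `bump_alloc(&arena, n, 4)` for each buffer. Individual blocks cannot be freed, only the whole arena through `bump_reset()`, which keeps the allocator trivial and immune to fragmentation.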

You mentioned that you haven't seen any add-ins use the 24-bit SPU memory. What about the OS? I was wondering if I could leave data there after my add-in ends and expect it to stay there unchanged until it starts again (assuming a second add-in doesn't overwrite it).

I don't think the OS uses it, but other add-ins might. And it disappears the moment you press OFF. I advise against relying on long-term storage in RAM in general.

Can you explain or link me to a little more about world switching?

Basically gint is its own mini-OS and you can't use arbitrary official-OS functions while using gint. Usually this comes up when using the filesystem or when using specific syscalls for which gint has no equivalent. To use such functions, you need to leave gint temporarily, run them, then come back into gint, a maneuver known as a world switch. The world switch mechanism, which is hidden from users, ensures that gint and the official OS won't try to use the same piece of hardware at the same time.

In practice, you just have to isolate your file-manipulation code in dedicated functions and call these functions through gint_world_switch() instead of directly.
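A sketch of that pattern: the file access lives in its own function returning int, and the gint side only reaches it through gint_world_switch(). `USE_GINT` is a hypothetical macro you would define for calculator builds, and the GINT_CALL() usage reflects <gint/gint.h> as I understand it; check the header for which argument types GINT_CALL() accepts (a cast may be needed for const pointers).

```c
#include <stdio.h>

/* Writes `text` to `path`, replacing the file; 0 on success, -1 on failure.
   Returns int so it can be used as a world-switch target. */
int write_text_file(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if(fp == NULL)
        return -1;
    int ok = (fputs(text, fp) >= 0);
    if(fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

#ifdef USE_GINT  /* hypothetical macro, defined only for calculator builds */
#include <gint/gint.h>

/* Leave gint, run the file access under the OS, then come back to gint */
int save_text(const char *path, const char *text)
{
    return gint_world_switch(GINT_CALL(write_text_file, path, text));
}
#endif
```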
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 06/12/2024 01:53


Ok, great. That's just about everything I was wondering about. On the G-III, do I need a world switch to malloc the ~90K from the OS heap instead of gint's heap? Where is gint's heap? Thanks!
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 06/12/2024 08:27


You don't need to world switch to use malloc(). By default, malloc() will take memory from gint's heap and go to the OS heap when that is full. You can control which heap ("arena") to allocate from by using gint's slightly-lower-level kmalloc() function directly.
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 26/01/2025 18:31


I've been reading the optimization reference thread which is really amazing! I had a few more questions mainly out of curiosity (I know we can get good performance without going to great lengths but it's still good to know):

- Does gint handle initializing everything for the internal and SPU RAMs?

- One of the posts mentioned that gint's DMA functions use ILRAM. Does this happen for every LCD update? Is part of the ILRAM free if I want to poke some machine code in there and run it?

- Can code execute from the internal XRAM and YRAM?

- Does XRAM and YRAM access compete with DMA during LCD refresh?

- How does gint set up the pages for PSRAM0/1? The post mentions that several pages are available and can be selected.

The "documentation" link in this post that should go to Renesas is now 404. I think it's the same document on the bible site here.
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 26/01/2025 18:49


- Does gint handle initializing everything for the internal and SPU RAMs?

There's almost nothing to do, but yeah, it's supposed to handle that.

- One of the posts mentioned that gint's DMA functions use ILRAM. Does this happen for every LCD update? Is part of the ILRAM free if I want to poke some machine code in there and run it?

Only dma_memset() uses ILRAM. You can still put functions in ILRAM by tagging them with GILRAM (from <gint/defs/attributes.h>) and the linker will put them at addresses that don't collide with the dma_memset() buffer.
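For example, a minimal sketch tagging a small hot loop with GILRAM. `USE_GINT` is a hypothetical macro for calculator builds; the fallback definition only exists so that the same file compiles on a PC, where there is no ILRAM.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef USE_GINT                     /* hypothetical macro for calculator builds */
#include <gint/defs/attributes.h>   /* provides GILRAM */
#else
#define GILRAM                      /* host build: no ILRAM, expand to nothing */
#endif

/* Small hot loop that the linker places in ILRAM on the calculator */
GILRAM uint32_t sum32(const uint32_t *src, size_t count)
{
    uint32_t sum = 0;
    for(size_t i = 0; i < count; i++)
        sum += src[i];
    return sum;
}
```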

- Can code execute from the internal XRAM and YRAM?

I think so, yes, but I'm not sure what the benefits would be. ILRAM should be faster for code.

- How does gint set up the pages for PSRAM0/1? The post mentions that several pages are available and can be selected.

gint assigns all the configurable memory to PRAM0 and XRAM0. There's no point in using other configurations unless you're programming the SPU DSP, which we don't know how to do or if it even exists.

The "documentation" link in this post that should go to Renesas is now 404. I think it's the same document on the bible site here.

I can't quite see what link you're referring to. The ones I find already link to the bible?
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 26/01/2025 19:15


Sorry, wrong link. It was on an old DSP thread linked to from there. It's the one labeled "manuel du SH4-AL DSP."
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 26/01/2025 19:28


Updated, thanks!
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 16/03/2025 16:06


What's the best way to handle unaligned RAM access? I want to write a mixture of bytecode and data to the RAM at 0x8c200000 so I can't use a struct and don't want to waste a whole word for things like a single byte. I tried, for example, casting a pointer to int to a pointer to char, incrementing it by 5, then recasting the pointer to int. I was hoping the compiler would spot the misalignment and switch to loading the value pointed to byte by byte since one or both pointers are misaligned. If you do this on MIPS, for example, it will switch to different instructions that are slower but allow for unaligned reads. Instead, sh gcc tries to load from both addresses which should cause an exception. Apparently, casting a pointer to an address that violates alignment is UB, so this is correct output: https://godbolt.org/z/odYGTMEeP

I read that memcpy is usually best since the compiler may spot aligned accesses and optimize the function call out to a few instructions. I was able to get it to work for sh gcc on Godbolt but haven't been able to generate an assembly listing with fxsdk to check that it works there too: https://godbolt.org/z/cqG8fTqWT

I tried adding -Wa,-aghlns=$<.lst to target_compile_options in CMakeLists.txt. This option generates a separate listing file for each C source file in make, but I'm not sure what the equivalent is here. It appears to generate a single file called '$<.lst' for each source file, so the contents only show the listing for the last file compiled. If I can get this to work, I can double check that it optimizes out memcpy. The listing shows data but none of it is interpreted as instructions:
1217                   .text
1218                   .section    .gnu.lto_.ipa_sra.ff8d44b206af85f5,"e",@progbits
1219 0000 789C6D4D         .ascii  "x\234mM\211\021\2030\f\263d\223\247\260Jw\204\315k%\224\203^"
1219      89118330
1219      0CB36493
1219      A7B04A77
1219      84CD6B25
1220 0017 95933FC5         .ascii  "\225\223?\305\326\033va55\341\0303\367\231b\246\205\031\227Q"
1220      D61B7661
1220      3535E118
1220      33F79962
1220      A6851997
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 18/03/2025 20:11


That's a tricky question! This might seem hopeless at first. All unaligned accesses are UB:

ISO C99 6.5.3.2 § 4 wrote:
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.⁸³⁾

⁸³⁾ (...) Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

And, of course, the alignment of int is 4 on our platform:

_Static_assert(_Alignof(int) == 4, "int is 4-aligned on SuperH");

Note that creating unaligned pointers is not UB in itself, it's accessing it that's a problem. In general it is legal, somehow, to make pointers go out-of-bounds and unaligned and later access them if you bring them back. I believe there are good reasons for this, but I don't remember which right now.

Anyway, the next stop is whether GCC has an extension for it. Turns out, it has! You can specify arbitrary alignment on any piece of data with the aligned() attribute. Although again, the semantics are not very clear:

extern int x __attribute__((aligned(1)));
int f(void) { return x; }

_f:
    mov.l    .L2,r1
    rts    
    mov.l    @r1,r0
.L2:
    .long    _x

This is because aligned() specifies a minimum alignment, and both the compiler and linker will go out of their way to keep things aligned. You don't have a very solid semantic safety net here either since the standard refutes the existence of unaligned objects anyway.

The one thing that is guaranteed to destroy alignment constraints is the packed attribute on structures.

struct s { int field; } __attribute__((packed));
extern struct s ss;
int g(void) { return ss.field; }

_g:
    mov.l    .L5,r1
    mov.b    @r1,r2
    shll16    r2
    shll8    r2
    mov.b    @(1,r1),r0
    extu.b    r0,r3
    shll16    r3
    or    r2,r3
    mov.b    @(2,r1),r0
    extu.b    r0,r0
    swap.b    r0,r2
    or    r3,r2
    mov.b    @(3,r1),r0
    extu.b    r0,r0
    rts    
    or    r2,r0
.L5:
    .long    _ss

This gives you the following way to access completely unaligned values:

#define UDEREF4(PTR) (((struct s *)(PTR))->field)
int f1(char *buf) { return UDEREF4(buf+2); }

_f1:
    add    #2,r4
    mov.b    @(1,r4),r0
    mov.b    @r4,r1
    extu.b    r0,r2
    mov.b    @(2,r4),r0
    shll16    r1
    shll8    r1
    extu.b    r0,r0
    shll16    r2
    or    r1,r2
    swap.b    r0,r1
    mov.b    @(3,r4),r0
    or    r2,r1
    extu.b    r0,r0
    rts    
    or    r1,r0

Having to specify it manually is annoying, but important: if you know that the field is 2-aligned, you can use a different definition that'll go much faster.

struct s2 { int field; } __attribute__((packed, aligned(2)));
#define UDEREF4_2(PTR) (((struct s2 *)(PTR))->field)
int f2(char *buf) { return UDEREF4_2(buf+2); }

_f2:
    add    #2,r4
    mov.w    @r4,r1
    mov.w    @(2,r4),r0
    shll16    r1
    extu.w    r0,r0
    rts    
    or    r1,r0

In fact, for unaligned reads in general you can use inline assembly to leverage the SH4-only instruction movua.l, which, well, reads 4 bytes at an unaligned location. I believe it has 2 pipeline LS cycles and then it pieces reads together back in hardware.

#define UDEREF4(PTR) ({ \
    int _x; \
    /* movua.l can only write R0 ("z"); "memory" keeps it after prior stores */ \
    __asm__("movua.l @%1, %0": "=z"(_x): "r"(PTR): "memory"); \
    _x; })
int f3(char *buf) { return UDEREF4(buf+2); }

_f3:
    add    #2,r4
    movua.l @r4, r0
    rts    
    nop

I believe this will be the fastest method for 2-byte and 4-byte unaligned reads regardless of alignment. For writes, you can use the structure method, and there being aware of 2-alignment when it is available will be useful.
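For the bytecode use case, the structure trick can be wrapped into small helpers. This is a sketch with hypothetical names; it also adds GCC's may_alias attribute (not discussed above) so that accessing a plain unsigned char buffer through the struct type does not run afoul of strict aliasing.

```c
#include <stdint.h>

/* packed: no alignment assumed for `v`, so GCC emits byte/word accesses.
   may_alias: the struct may be used to access any buffer (GCC extension). */
struct __attribute__((packed, may_alias)) unaligned32 { uint32_t v; };
struct __attribute__((packed, may_alias)) unaligned16 { uint16_t v; };

static inline uint32_t load32(const void *p)
{
    return ((const struct unaligned32 *)p)->v;
}

static inline void store32(void *p, uint32_t x)
{
    ((struct unaligned32 *)p)->v = x;
}

static inline uint16_t load16(const void *p)
{
    return ((const struct unaligned16 *)p)->v;
}

static inline void store16(void *p, uint16_t x)
{
    ((struct unaligned16 *)p)->v = x;
}
```

For example, emitting an opcode followed by a 32-bit immediate could look like `*p++ = OP_LIT; store32(p, value); p += 4;` (OP_LIT being a hypothetical opcode). Reads could use the movua.l macro instead of load32() once that path is confirmed to work with the fxSDK toolchain.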

You mention using memcpy(). This is compelling and basically "the" way in C++, although only if you start with a pointer to char. Creating an unaligned pointer to int is pretty much a non-starter because, despite the caveat I mentioned above about this being legal, statically determining (un)alignment is of course undecidable, and so the compiler is liable to assume that any random pointer to int that it can't trace is aligned.

Getting the compiler to optimize out the memcpy() is the hard part:

int h(char *str) { int x; memcpy(&x, str, sizeof x); return x; }

_h:
    sts.l    pr,@-r15
    mov    r4,r5
    mov.l    .L10,r0
    add    #-4,r15
    mov    #4,r6
    jsr    @r0
    mov    r15,r4
    mov.l    @r15,r0
    add    #4,r15
    lds.l    @r15+,pr
    rts    
    nop
.L10:
    .long    _memcpy

And even if you could make it here you'd not even be out of the woods yet. This optimization relies on the compiler identifying memcpy() as a builtin, for which the fxSDK has bad news in the form of the -fno-builtin flag, which it enables to bypass a linker bug about incorrect rescanning of LTO archives. (I believe this is fixed now but haven't updated yet.)

You can force using the builtin by writing __builtin_memcpy() in the code but this doesn't help here either.

Morality: I recommend inline assembly and the structure trick. I don't believe you'll be able to avoid using special syntax instead of the * and -> operators. The rest should work nicely though.
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 21/03/2025 19:07


Thanks! This is great information, especially the movua.l instruction which looks like what I was looking for. If memcpy doesn't get optimized out, I'll see if I can create a macro for the inline assembly or get the compiler to inline a separate assembly function for unaligned access.

Did you generate the assembly in your post with the fxSDK? The arguments I have in CMakeLists.txt are -Wall -Wextra -Os -g -flto. Are there any other options I need to add to get sh gcc to compile a test function with exactly the same options as the fxSDK?

Unrelated to this, when I start my add-in then turn off the calculator for a while, the MENU key doesn't work when I turn it back on. It briefly flashes another screen including the battery indicator then immediately goes back to the add-in. Is this a known problem? If not, I'll work on a minimal example.
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 21/03/2025 19:55


I tested the examples with sh-elf-gcc -S whatever.c -o - -O1.

If you want to find the exact options used by the fxSDK, compile your add-in with fxsdk build-{fx,cg} VERBOSE=1, which instructs CMake to print the full commands. There's nothing special about the options though, the most important are the -f* but I don't believe any is relevant here apart from -fno-builtin as mentioned above.

Unrelated to this, when I start my add-in then turn off the calculator for a while, the MENU key doesn't work when I turn it back on. It briefly flashes another screen including the battery indicator then immediately goes back to the add-in. Is this a known problem? If not, I'll work on a minimal example.

It is not a known problem. A MWE would be welcome. However I'm slightly worried as to whether this is a solvable one, as RAM just doesn't retain data for that long while turned off. Should your observation be caused by memory being wiped, we might not be able to do anything about it.
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 23/03/2025 02:01


It looks like the debug info from -g goes into the listing, and -flto outputs data to the listing instead of instructions. The listings are normal when I leave those two out. I went down the rabbit hole of assembling a function with movua.l in an assembly file and trying to have the compiler inline that which I understand now is not possible. The inline __asm__ will work I think.

For testing, I switched from sh-elf-gcc to sh4-linux-gnu-gcc so I have a regular int main with printf and can run the program with qemu as if it were native. I know the calculator is big-endian, but I haven't figured out how to get qemu to run a program compiled with -mb. qemu does not generate exceptions on misaligned accesses as I hoped, but it's still convenient for testing.

I read the post about memory timings and how slow writes to RAM are. Does the processor block until the write is finished or does it start executing the next instructions and block if it encounters a second write before the first one finishes? Maybe the structure trick for writing to RAM isn't much slower than aligned writes considering how slow writes are.

Can I catch misaligned access exceptions and recover? One thing I'm working on is a Forth for the calculator. Unlike C, Forth lets you read and write whatever data of whatever size and type arbitrarily. I think it's fine to have users align the data in their own programs as long as the calculator can recover gracefully from exceptions. Like BASIC, it shouldn't be possible to crash the calculator.
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 23/03/2025 09:15


I read the post about memory timings and how slow writes to RAM are. Does the processor block until the write is finished or does it start executing the next instructions and block if it encounters a second write before the first one finishes?

There are multiple layers here. Logic-wise, the CPU keeps running and only stalls if another instruction with an LS (memory access) cycle comes up. However, it also stalls on instruction fetches, which need access to the bus. An instruction fetch (IF) happens at every 4-aligned instruction, which is basically every other instruction, plus after jumps. So the CPU could stall quickly, before you even reach the next write. I would think that most of the time you'll write to cache, which takes a single cycle, and that the copy-back of a full 32-byte cache line to RAM is what is slow. However, checking my memory benchmarks (https://bible.planet-casio.com/lephenixnoir/en/sh7305/calc/memory-perf#8-mib-graph-90e-ram columns 3 to 5) shows that access size does matter, which seems to contradict this intuition. Morality: it's complicated, but a bunch of byte writes is definitely 4 times slower than a bunch of long writes according to benchmark.

Can I catch misaligned access exceptions and recover?

Yes. gint has a mechanism that allows you to run a small function during exceptions: https://git.planet-casio.com/Lephenixnoir/gint/src/commit/badbd0fd2bd8ac796fd55d49b93691741bd8a139/include/gint/exc.h#L31-L54

I'll illustrate this with gintctl's memory viewer, where you get a "TLB Error" message if you navigate to an MMU-managed area where nothing is mapped. You can use this mechanism to log to a global variable and decide how to recover. Call gint_exc_catch() to set it up, and call it again later with a NULL parameter to disable it (original gintctl code):

    /* First do a naive access to the first byte, and record possible
       exceptions - TLB miss read and CPU read error.
       I use a volatile asm statement so that the read can't be optimized
       away or moved by the compiler. */
    exception = 0;
    gint_exc_catch(catch_exc);
    uint8_t z;
    __asm__ volatile("mov.l    %1, %0" : "=r"(z) : "m"(*mem));
    gint_exc_catch(NULL);

    sprintf(header, "%08X:", (uint32_t)mem);

    if(exception == 0x040)
    {
        sprintf(bytes, "TLB error");
        *ascii = 0;
        return 1;
    }
    if(exception == 0x0e0)
    {
        sprintf(bytes, "Read error");
        *ascii = 0;
        return 1;
    }

The exception-catching function is called if an exception occurs, and receives the exception code. You must check it against the alignment error codes (reads or writes), otherwise you will ignore too many exceptions (original gintctl code). You can use gint_exc_skip(N) to skip N instructions before continuing. By default, some exceptions will continue to the next line, ignoring the faulty instruction, while others will try to re-execute the faulty instruction. Memory access exceptions are of the second type, so you need to gint_exc_skip(1) otherwise you will find yourself in an infinite loop.

/* Code of exception that occurs during a memory access */
static uint32_t exception = 0;
/* Exception-catching function */
static int catch_exc(uint32_t code)
{
    if(code == 0x040 || code == 0x0e0)
    {
        exception = code;
        gint_exc_skip(1);
        return 0;
    }
    return 1;
}

Unlike interrupt callbacks, this function runs in "interrupt mode" and does not have access to any drivers or hardware. It is only designed for recording that something happened. If you don't have a good opportunity to check the error flag in the main thread you can try to setjmp() from the catch-function to an error handler, which I think should work (but is untested).
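Adapted to misaligned accesses, a catch function might look like the sketch below. It assumes the standard SH-4 exception codes for CPU address errors (0x0e0 on reads, 0x100 on writes) and that gint_exc_catch() and gint_exc_skip() are both declared in <gint/exc.h>; `USE_GINT` is a hypothetical build macro and the other names are made up. Since 0x0e0 is the same code as the "Read error" case in the gintctl code above, the handler is only enabled around the interpreter's own memory primitives.

```c
#include <stddef.h>
#include <stdint.h>

/* SH-4 exception codes for CPU address errors (misaligned accesses) */
#define EXC_ADDRESS_ERROR_READ  0x0e0
#define EXC_ADDRESS_ERROR_WRITE 0x100

/* Code of the last alignment exception caught, 0 if none */
static volatile uint32_t align_exception = 0;

/* 1 if `code` is one of the two address-error codes */
static int is_alignment_error(uint32_t code)
{
    return code == EXC_ADDRESS_ERROR_READ || code == EXC_ADDRESS_ERROR_WRITE;
}

#ifdef USE_GINT  /* hypothetical macro, defined only for calculator builds */
#include <gint/exc.h>

/* Same shape as gintctl's catch_exc(): record, skip, report as handled */
static int catch_align(uint32_t code)
{
    if(!is_alignment_error(code))
        return 1;              /* not ours: leave it to gint */
    align_exception = code;
    gint_exc_skip(1);          /* skip the faulting access instead of retrying */
    return 0;
}

/* Enable the handler only around interpreter memory primitives */
void align_guard_begin(void)
{
    align_exception = 0;
    gint_exc_catch(catch_align);
}

void align_guard_end(void)
{
    gint_exc_catch(NULL);
}
#endif
```

After each Forth memory primitive, the interpreter can check `align_exception`, report the error to the user, and reset it to 0.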
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 23/03/2025 15:02


Morality: it's complicated, but a bunch of byte writes is definitely 4 times slower than a bunch of long writes according to benchmark.
Fair enough. This is a good place to start. I'll do some measurements of actual code once I have things working.

Thanks for the examples on catching exceptions.

Here is an example of the menu bug. https://github.com/JoeyShepard/prog-tools/blob/main/menu-bug/src/main.c I start the add-in, press menu and ON a few times to make sure they work then leave the calculator off while the add-in is running. I tried turning it back on after 5 minutes then again after 10 minutes and the menu key worked so I waited an hour and a half and the bug appeared when I turned it on. This is similar to what happened before.
Lephenixnoir (offline) Administrator Points: 24934 Challenges: 174

Posted on 23/03/2025 17:16


Thanks for the example. The RAM used in the 90+E/CG-50 just gets wiped after about an hour of not being powered. The OS knows that and saves a lot of stuff during poweroff. You'll notice that pressing ON quickly after OFF gives you the OS menu instantly, but doing so after a while takes a few more seconds with a spinner going off (even if it's not a full restart). I'm afraid gint can't reasonably do the same because apps tend to put data a bit wherever (rightly so, it's quite useful...) but then we can't easily save that to ROM during poweroff.
Druzyek (offline) Member Points: 44 Challenges: 0

Posted on 23/03/2025 22:41


Is there anything I can do as an alternative? I don't mind saving some state to ROM manually before gint_poweroff. Is there a way for my add-in to reload itself after the calculator comes back on? It can then load what it saved to ROM if necessary.