Posted on 03/12/2024 22:32
Posted on 03/12/2024 22:41
Welcome back! I recognize your avatar but haven't seen it in a while. :o
About 90-100 kB is available on the G-III, up to 600-700 kB for the CG-50. On the CG-50 there are also large swaths of unused RAM of which we commonly grab 2-3 MB but that technically total 6 MB.
The OS malloc is known to be a bit unreliable with fragmentation on the fx-CG, but most of the default heap memory is managed by gint anyway. On the G-III I don't have clean reports of fragmentation issues, however almost all the heap you can get is from the OS, so you don't have as much control over it and no statistics.
SPU memory is a great resource on the G-III, the largest one you have access to, although it is also hard to use efficiently. You can read this documentation page about SPU memory.
The easiest piece of SPU memory to use is called PRAM0 and it's quite normal except that you can only use 32-bit accesses. So you can't use it to store, e.g., strings for strlen() and co., which frequently use 8-bit accesses. In practice, this means you are limited to using this memory with code that you write yourself.
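A minimal sketch of what "32-bit accesses only" means for your own code: bytes can still be stored in such memory if every load and store goes through a whole 32-bit word. The function names are hypothetical, no PRAM0 address is hard-coded (the caller supplies the base pointer), and bytes are numbered most-significant first within each word so the result does not depend on host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Read byte number `i` from memory that only tolerates 32-bit accesses.
   Byte 0 is the most significant byte of word 0. */
uint8_t word_get_byte(volatile uint32_t const *base, size_t i)
{
    uint32_t word = base[i / 4];                 /* one 32-bit load */
    unsigned shift = 24 - 8 * (unsigned)(i % 4);
    return (uint8_t)(word >> shift);
}

/* Write byte number `i` with a 32-bit read-modify-write. */
void word_set_byte(volatile uint32_t *base, size_t i, uint8_t value)
{
    unsigned shift = 24 - 8 * (unsigned)(i % 4);
    uint32_t word = base[i / 4];                 /* one 32-bit load */
    word &= ~((uint32_t)0xff << shift);          /* clear the target byte */
    word |= (uint32_t)value << shift;            /* insert the new byte */
    base[i / 4] = word;                          /* one 32-bit store */
}
```

The pointers are volatile so that the compiler keeps each access as a single word-sized load or store rather than narrowing it; that is the intent, but it is worth checking the generated assembly on the real toolchain.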
The rest is XRAM0, YRAM0 and YRAM1, and they work the same except that when you write 32 bits of data only 24 bits are retained.
*YRAM0 = 0x11223344;
*YRAM0; // 0x11223300
I have yet to see these used in an add-in. It's feasible, but a pain.
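To illustrate why this is "feasible, but a pain", here is a small host-side sketch that models the behavior shown above (only the upper 24 bits of each 32-bit write survive) and packs three payload bytes per word. `sim_24bit_store`, `pack3` and `unpack3` are hypothetical names; the model is taken solely from the 0x11223344 → 0x11223300 example and has not been checked against the hardware.

```c
#include <stdint.h>

/* Model of a store to XRAM0/YRAM0/YRAM1 as described in the text:
   the low 8 bits of the written word are lost. */
uint32_t sim_24bit_store(uint32_t value)
{
    return value & 0xffffff00u;
}

/* Pack three payload bytes into the 24 bits that are retained. */
uint32_t pack3(uint8_t b0, uint8_t b1, uint8_t b2)
{
    return ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) | ((uint32_t)b2 << 8);
}

/* Recover payload byte `n` (0, 1 or 2) from a word read back. */
uint8_t unpack3(uint32_t word, unsigned n)
{
    return (uint8_t)(word >> (24 - 8 * n));
}
```

With this layout, each word carries 3 useful bytes, so a buffer of N words holds 3N bytes and every access needs a divide/modulo by 3 to locate its word.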
As per the runtime contract, nothing. The OS doesn't spend time needlessly clearing out memory so you're likely to find scraps of previous executions' data. But that's not a behavior you should rely upon. You can store data safely using the filesystem, or if you want more speed, the main memory (but gint has no wrappers for MCS syscalls, you'll need to world switch and call them manually).
It remains unavailable until the add-in exits, then it is recovered by the OS. "Recovery" is however a misleading term here. The heap just ceases to exist when the add-in quits, and then it is recreated anew on top of whatever there is in memory when another add-in starts.
The add-in is mapped by the MMU directly from Flash at no cost in RAM usage. The add-in file is limited to 512 kB (G-III) / 2 MB (CG-50) after which it does not show up in the main menu anymore, and if you need more room you need to start using external files and loading them to RAM yourself, at which point there is a cost in RAM usage.
You can use the standard POSIX file API (open() and co) or the standard libc API (fopen() and co) on both the G-III and CG-50. Do remember however to world switch during file operations, otherwise the add-in can crash.
Posted on 03/12/2024 22:43
Hey, welcome here
- You can't have more than the standard heap on the G-III (and the SPU memory is only available on the G-III and the fx-CGs), though you can get an extra ~2 MB on the CG-50 (where the heap is already 512 KiB).
- It's complicated, but not in any way you should rely on (basically, it only persists if you corrupt important stuff).
- The OS recovers all memory from leaks because it's not the OS that handles it (and with the old SDK I think it just resets malloc()s anyway).
- The add-in is loaded into RAM but in a reserved area, so you can't lose/gain heap/stack (see Lephe's message).
- For Python file I/O, you would have to look at Slyvtt's experimental branch of PythonExtra (community Python). It works though, just don't forget to .close() your files because the garbage collection seems a bit flaky.
In general, I would recommend not searching too hard for more RAM, but rather just optimizing your program.
Edit: I misunderstood the last question, you have standard C I/O with gint/fxlibc (see stdio.h).
Caltos: G35+EII, G90+E (bricked
Posted on 03/12/2024 22:54
I've been using malloc and it seems good. There might be a better method, but what I do to get the largest contiguous block of RAM is start from 64KB on FX (can test GINT_HW_FX) otherwise 512KB (CG). Then I try to malloc in a loop, decrementing by 1KB each time malloc fails. Then once it succeeds, I free it, add 1024 and then loop from there, this time decrementing with the alignment I actually wanted (4 bytes, 8 bytes, etc). This seems to be quite fast. Example: https://git.planet-casio.com/calamari/Duvet/src/branch/main/src/sheet.c#L168
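The search described above can be sketched roughly as follows. This is not the Duvet code from the link, just a reading of the description: a coarse pass stepping down by 1 kB, then a fine pass stepping down by the wanted alignment. The allocator is passed in as function pointers (a hypothetical choice) so the logic can be tried on a PC; `capped_alloc` and `demo_limit` are host-only stand-ins that mimic a heap with a size limit.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t size);
typedef void (*free_fn)(void *ptr);

/* Host-only stand-in: fails above `demo_limit` bytes, like a small heap. */
size_t demo_limit = 50000;
void *capped_alloc(size_t size)
{
    return (size > demo_limit) ? NULL : malloc(size);
}

/* Returns the largest block found (at most `start` bytes) and stores its
   size in *out_size, or returns NULL if nothing could be allocated. */
void *alloc_largest(size_t start, size_t align, size_t *out_size,
                    alloc_fn alloc, free_fn release)
{
    size_t size = start;
    void *block = NULL;
    if(align == 0)
        align = 1;

    /* Coarse pass: step down by 1 kB until an allocation succeeds */
    while(size > 0 && (block = alloc(size)) == NULL)
        size = (size > 1024) ? size - 1024 : 0;
    if(block == NULL)
        return NULL;

    /* Fine pass: only needed when the very first attempt did not fit */
    if(size < start) {
        release(block);
        size += 1024;
        while((block = alloc(size)) == NULL) {
            if(size <= align)
                return NULL;
            size -= align;
        }
    }

    *out_size = size;
    return block;
}
```

On the calculator the hooks would presumably just be malloc and free, with `start` chosen per model as described in the post (64 kB on FX, 512 kB on CG).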
Of course if you're allocating a linked list or such then you can just malloc each element individually. Although, I ended up not doing that in Duvet in order to minimize overhead, hence the single large allocation.
I'd recommend avoiding exiting at all, unless you really need to because of a fatal error or the like. If you exit, the user has to run something else before they can run your add-in again. Instead, let them go back to the main menu, for example by pressing MENU. Check out getkey_opt and gint_osmenu. When they're at the main menu, if they choose to run something else, your memory is "freed" automatically. Otherwise, they can come back to your add-in and your memory is intact. There are some display details you'll have to deal with when they come back, but it's manageable. Here's how I did it in a recent add-in: https://git.planet-casio.com/calamari/Duvet/src/branch/main/src/kbd.c#L85
“They call me the king of the spreadsheets, got 'em all printed out on my bedsheets.” — “Weird Al” Yankovic
Posted on 05/12/2024 22:06
You mentioned that you haven't seen any add-ins use the 24-bit SPU memory. What about the OS? I was wondering if I could leave data there after my add-in ends and expect it to stay there unchanged until it starts again (assuming a second add-in doesn't overwrite it).
Can you explain or link me to a little more about world switching?
Posted on 05/12/2024 22:22
The OS uses a fixed amount of RAM and what you can get from the OS malloc() is also a fixed amount. That much is not a variable. Fragmentation is this issue where if you fill the heap then over the next cycles of freeing and allocating you end up with swiss cheese with no large blocks and you might be unable to get large allocations. This is a concern you might have on the G-III where gint's heap (which is the default) is small and you quickly end up using the OS heap. By contrast, this is less of a problem on the CG-50 because gint's heap and other extra memory is much larger than the OS heap.
If you want small bits of RAM, use malloc(). It will handle things for you. If you need a large chunk of RAM early on and want to hold on to it to make sure you keep access to large blocks, then:
- On the G-III, do a single malloc() of the largest size you can, which I'm expecting will be ~90 kB. No point in allocating multiple times as the buffers won't be contiguous. If that's not enough, address directly in PRAM0, but you'll need to stick to 32-bit accesses.
- On the CG, use gint's kmalloc_max() function to get the largest block available in the _uram arena if ~400 kB is enough for you. Otherwise, address directly at 0x8c200000 where you have a few MB.
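Once you hold one large block, whichever way you obtained it, a common way to carve it up without going back to malloc() is a bump ("arena") allocator. The sketch below is generic C and not part of gint; `arena_t` and the `arena_*` functions are hypothetical names, and the block is assumed to start on a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;   /* start of the large block */
    size_t size;     /* total bytes in the block */
    size_t used;     /* bytes handed out so far */
} arena_t;

void arena_init(arena_t *a, void *block, size_t size)
{
    a->base = block;
    a->size = size;
    a->used = 0;
}

/* Hand out `n` bytes, rounded up to a multiple of 4 so that every
   returned pointer stays 4-aligned. Returns NULL when the arena is full. */
void *arena_alloc(arena_t *a, size_t n)
{
    size_t rounded = (n + 3) & ~(size_t)3;
    if(rounded < n || rounded > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Release everything at once; individual frees are not supported. */
void arena_reset(arena_t *a)
{
    a->used = 0;
}
```

The trade-off is that nothing can be freed individually, which is also why this scheme cannot fragment.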
I don't think the OS uses it, but other add-ins might. And it disappears the moment you press OFF. I advise against relying on long-term storage in RAM in general.
Basically gint is its own mini-OS and you can't use arbitrary official-OS functions while using gint. Usually this comes up when using the filesystem or when using specific syscalls for which gint has no equivalent. To use such functions, you need to leave gint temporarily, run them, then come back into gint, a maneuver known as a world switch. The world switch mechanism, which is hidden from users, ensures that gint and the official OS won't try to use the same piece of hardware at the same time.
In practice, you just have to isolate your file-manipulation code in dedicated functions and call these functions through gint_world_switch() instead of directly.
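As a rough illustration of "isolating file-manipulation code in dedicated functions", the sketch below keeps all stdio calls inside two small functions. `save_blob` and `load_blob` are hypothetical names. The comment about how they would be invoked under gint is an assumption about gint's API (a `GINT_CALL()` wrapper passed to `gint_world_switch()`), to be checked against the gint headers; the code itself is plain C and runs as-is on a PC.

```c
#include <stddef.h>
#include <stdio.h>

/* All filesystem work for saving lives in this one function.
   Assumption: under gint it would be invoked as something like
   gint_world_switch(GINT_CALL(save_blob, path, data, size))
   rather than called directly. */
int save_blob(char const *path, void const *data, size_t size)
{
    FILE *fp = fopen(path, "wb");
    if(!fp)
        return -1;
    size_t written = fwrite(data, 1, size, fp);
    int rc = fclose(fp);
    return (written == size && rc == 0) ? 0 : -1;
}

/* Same idea for loading: open, read, close, all in one place. */
int load_blob(char const *path, void *data, size_t size)
{
    FILE *fp = fopen(path, "rb");
    if(!fp)
        return -1;
    size_t got = fread(data, 1, size, fp);
    fclose(fp);
    return (got == size) ? 0 : -1;
}
```

Keeping each function self-contained (open, transfer, close) means no file handle stays open across the boundary between the two worlds.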
Posted on 06/12/2024 01:53
Ok, great. That's just about everything I was wondering about. On the G-III, do I need a world switch to malloc the ~90K from the OS heap instead of gint's heap? Where is gint's heap? Thanks!
Posted on 06/12/2024 08:27
You don't need to world switch to use malloc(). By default, malloc() will take memory from gint's heap and go to the OS heap when that is full. You can control which heap ("arena") to allocate from by using gint's slightly-lower-level kmalloc() function directly.
Posted on 26/01/2025 18:31
I've been reading the optimization reference thread which is really amazing! I had a few more questions mainly out of curiosity (I know we can get good performance without going to great lengths but it's still good to know):
- Does gint handle initializing everything for the internal and SPU RAMs?
- One of the posts mentioned that gint's DMA functions use ILRAM. Does this happen for every LCD update? Is part of the ILRAM free if I want to poke some machine code in there and run it?
- Can code execute from the internal XRAM and YRAM?
- Does XRAM and YRAM access compete with DMA during LCD refresh?
- How does gint set up the pages for PSRAM0/1? The post mentions that several pages are available and can be selected.
The "documentation" link in this post that should go to Renesas is now 404. I think it's the same document on the bible site here.
Posted on 26/01/2025 18:49
There's almost nothing to do, but yeah, it's supposed to handle that.
Only dma_memset() uses ILRAM. You can still put functions in ILRAM by tagging them with GILRAM (from <gint/defs/attributes.h>) and the linker will put them at addresses that don't collide with the dma_memset() buffer.
I think so, yes, but I'm not sure what the benefits would be. ILRAM should be faster for code.
gint assigns all the configurable memory to PRAM0 and XRAM0. There's no point in using other configurations unless you're programming the SPU DSP, which we don't know how to do (or whether it even exists).
I can't quite see what link you're referring to. The ones I find already link to the bible?
Posted on 26/01/2025 19:15
Sorry, wrong link. It was on an old DSP thread linked to from there. It's the one labeled "manuel du SH4-AL DSP."
Posted on 26/01/2025 19:28
Updated, thanks!
Posted on 16/03/2025 16:06
What's the best way to handle unaligned RAM access? I want to write a mixture of bytecode and data to the RAM at 0x8c200000 so I can't use a struct and don't want to waste a whole word for things like a single byte. I tried, for example, casting a pointer to int to a pointer to char, incrementing it by 5, then recasting the pointer to int. I was hoping the compiler would spot the misalignment and switch to loading the value pointed to byte by byte since one or both pointers are misaligned. If you do this on MIPS, for example, it will switch to different instructions that are slower but allow for unaligned reads. Instead, sh gcc tries to load from both addresses which should cause an exception. Apparently, casting a pointer to an address that violates alignment is UB, so this is correct output: https://godbolt.org/z/odYGTMEeP
I read that memcpy is usually best since the compiler may spot aligned accesses and optimize the function call out to a few instructions. I was able to get it to work for sh gcc on Godbolt but haven't been able to generate an assembly listing with fxsdk to check that it works there too: https://godbolt.org/z/cqG8fTqWT
I tried adding -Wa,-aghlns=$<.lst to target_compile_options in CMakeLists.txt. This option generates a separate listing file for each C source file in make, but I'm not sure what the equivalent is here. It appears to generate a single file called '$<.lst' for each source file, so the contents only show the listing for the last file compiled. If I can get this to work, I can double check that it optimizes out memcpy. The listing shows data but none of it is interpreted as instructions:
1218 .section .gnu.lto_.ipa_sra.ff8d44b206af85f5,"e",@progbits
1219 0000 789C6D4D .ascii "x\234mM\211\021\2030\f\263d\223\247\260Jw\204\315k%\224\203^"
1219 89118330
1219 0CB36493
1219 A7B04A77
1219 84CD6B25
1220 0017 95933FC5 .ascii "\225\223?\305\326\033va55\341\0303\367\231b\246\205\031\227Q"
1220 D61B7661
1220 3535E118
1220 33F79962
1220 A6851997
Posted on 18/03/2025 20:11
That's a tricky question! This might seem hopeless at first. All unaligned accesses are UB:
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.⁸³⁾
⁸³⁾ (...) Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.
And, of course, the alignment of int is 4 on our platform.
Note that creating unaligned pointers is not UB in itself; it's accessing memory through them that's a problem. In general it is legal, somehow, to make pointers go out-of-bounds and unaligned and later access them if you bring them back. I believe there are good reasons for this, but I don't remember which right now.
Anyway, the next stop is whether GCC has an extension for it. Turns out, it has! You can specify arbitrary alignment on any piece of data with the aligned() attribute. Although again, the semantics are not very clear:
extern int x __attribute__((aligned(1)));  /* declaration reconstructed from context */
int f(void) { return x; }
_f:
mov.l .L2,r1
rts
mov.l @r1,r0
.L2:
.long _x
This is because aligned() specifies a minimum alignment, and both the compiler and linker will go out of their way to keep things aligned. You don't have a very solid semantic safety net here either since the standard refutes the existence of unaligned objects anyway.
The one thing that is guaranteed to destroy alignment constraints is the packed attribute on structures.
struct s { int field; } __attribute__((packed));  /* definition reconstructed from context */
extern struct s ss;
int g(void) { return ss.field; }
_g:
mov.l .L5,r1
mov.b @r1,r2
shll16 r2
shll8 r2
mov.b @(1,r1),r0
extu.b r0,r3
shll16 r3
or r2,r3
mov.b @(2,r1),r0
extu.b r0,r0
swap.b r0,r2
or r3,r2
mov.b @(3,r1),r0
extu.b r0,r0
rts
or r2,r0
.L5:
.long _ss
This gives you the following way to access completely unaligned values:
#define UDEREF4(PTR) (((struct s *)(PTR))->field)  /* reconstructed; same shape as UDEREF4_2 */
int f1(char *buf) { return UDEREF4(buf+2); }
_f1:
add #2,r4
mov.b @(1,r4),r0
mov.b @r4,r1
extu.b r0,r2
mov.b @(2,r4),r0
shll16 r1
shll8 r1
extu.b r0,r0
shll16 r2
or r1,r2
swap.b r0,r1
mov.b @(3,r4),r0
or r2,r1
extu.b r0,r0
rts
or r1,r0
Having to specify it manually is annoying, but important. If you know that the field is 2-aligned, you can use a different call that'll go much faster.
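struct s2 { int field; } __attribute__((packed, aligned(2)));  /* definition reconstructed from context */
#define UDEREF4_2(PTR) (((struct s2 *)(PTR))->field)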
int f2(char *buf) { return UDEREF4_2(buf+2); }
_f2:
add #2,r4
mov.w @r4,r1
mov.w @(2,r4),r0
shll16 r1
extu.w r0,r0
rts
or r1,r0
In fact, for unaligned reads in general you can use inline assembly to leverage the SH4-only instruction movua.l, which, well, reads 4 bytes at an unaligned location. I believe it takes 2 pipeline LS cycles and then pieces the reads back together in hardware.
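/* "=z" pins the result to r0, the only destination register movua.l accepts */
#define UDEREF4(PTR) ({ \
    int _x; \
    __asm__("movua.l @%1, %0\n\t": "=z"(_x): "r"(PTR)); \
    _x; })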
int f3(char *buf) { return UDEREF4(buf+2); }
_f3:
add #2,r4
movua.l @r4, r0
rts
nop
I believe this will be the fastest method for 2-byte and 4-byte unaligned reads regardless of alignment. For writes, you can use the structure method, and there being aware of 2-alignment when it is available will be useful.
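For the write side, a minimal sketch of the structure method might look like the following. The macro and struct names are hypothetical. `may_alias` is an addition not present in the post; it is there to keep type-based alias analysis from interfering when the buffer is also accessed as plain bytes. Both attributes are GCC/Clang extensions.

```c
#include <stdint.h>

/* Alignment 1: the compiler must assume nothing about the address. */
struct unaligned_u32 {
    uint32_t field;
} __attribute__((packed, may_alias));

/* Alignment 2: lets the compiler use 16-bit accesses when that is known. */
struct halfaligned_u32 {
    uint32_t field;
} __attribute__((packed, aligned(2), may_alias));

#define UREAD4(PTR)           (((struct unaligned_u32 *)(PTR))->field)
#define UWRITE4(PTR, VALUE)   (((struct unaligned_u32 *)(PTR))->field = (VALUE))
#define UWRITE4_2(PTR, VALUE) (((struct halfaligned_u32 *)(PTR))->field = (VALUE))

/* Store a 32-bit value at an arbitrary byte offset in a buffer. */
void store_at(unsigned char *buf, unsigned offset, uint32_t value)
{
    UWRITE4(buf + offset, value);
}

/* Load it back from the same arbitrary offset. */
uint32_t load_at(unsigned char *buf, unsigned offset)
{
    return UREAD4(buf + offset);
}
```

`UWRITE4_2` is only valid when the address really is a multiple of 2; passing an odd address to it brings back the original problem.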
You mention using memcpy(). This is compelling and basically "the" way in C++. Although that's only if you start with a pointer to char. Creating an unaligned pointer to int is pretty much a non-starter because, despite the caveat I mentioned above about this being legal, statically determining (un)alignment is of course undecidable and so the compiler is liable to assume that any random pointer to int that it can't trace is aligned.
Getting the compiler to optimize out the memcpy() is the hard part:
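int h(char *buf) { int x; memcpy(&x, buf, 4); return x; }  /* source reconstructed from the listing */
_h: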
sts.l pr,@-r15
mov r4,r5
mov.l .L10,r0
add #-4,r15
mov #4,r6
jsr @r0
mov r15,r4
mov.l @r15,r0
add #4,r15
lds.l @r15+,pr
rts
nop
.L10:
.long _memcpy
And even if you could make it here you'd not even be out of the woods yet. This optimization relies on the compiler identifying memcpy() as a builtin, for which the fxSDK has bad news in the form of the -fno-builtin flag, which it enables to bypass a linker bug about incorrect rescanning of LTO archives. (I believe this is fixed now but haven't updated yet.)
You can force using the builtin by writing __builtin_memcpy() in the code but this doesn't help here either.
Moral of the story: I recommend inline assembly and the structure trick. I don't believe you'll be able to avoid using special syntax instead of the * and -> operators. The rest should work nicely though.
Posted on 21/03/2025 19:07
Thanks! This is great information, especially the movua.l instruction which looks like what I was looking for. If memcpy doesn't get optimized out, I'll see if I can create a macro for the inline assembly or get the compiler to inline a separate assembly function for unaligned access.
Did you generate the assembly from your post with FXSDK? The arguments I have in CMakeLists.txt are -Wall -Wextra -Os -g -flto. Are there any other options I need to add to get sh gcc to compile a test function with the exact same options as FXSDK?
Unrelated to this, when I start my add-in then turn off the calculator for a while, the MENU key doesn't work when I turn it back on. It briefly flashes another screen including the battery indicator then immediately goes back to the add-in. Is this a known problem? If not, I'll work on a minimal example.
Posted on 21/03/2025 19:55
I tested the examples with sh-elf-gcc -S whatever.c -o - -O1.
If you want to find the exact options used by the fxSDK, compile your add-in with fxsdk build-{fx,cg} VERBOSE=1, which instructs CMake to print the full commands. There's nothing special about the options though, the most important are the -f* but I don't believe any is relevant here apart from -fno-builtin as mentioned above.
It is not a known problem. A MWE would be welcome. However I'm slightly worried as to whether this is a solvable one, as RAM just doesn't retain data for that long while turned off. Should your observation be caused by memory being wiped, we might not be able to do anything about it.
Posted on 23/03/2025 02:01
It looks like the debug info from -g goes into the listing, and -flto outputs data to the listing instead of instructions. The listings are normal when I leave those two out. I went down the rabbit hole of assembling a function with movua.l in an assembly file and trying to have the compiler inline that which I understand now is not possible. The inline __asm__ will work I think.
For testing, I switched from sh-elf-gcc to sh4-linux-gnu-gcc so I have a regular int main with printf and can run the program with qemu as if it were native. I know the calculator is big-endian, but I haven't figured out how to get qemu to run a program compiled with -mb. qemu does not generate exceptions on misaligned accesses as I hoped, but it's still convenient for testing.
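One way to sidestep the endianness mismatch when testing on a little-endian host is to read and write multi-byte values with explicit shifts, which gives the calculator's big-endian layout on any machine and also works at any alignment. A minimal sketch, with hypothetical function names:

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from any address, one byte at a time. */
uint32_t read_be32(unsigned char const *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value in big-endian order to any address. */
void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

This is essentially what the compiler generated for the packed-struct read earlier in the thread, so it should cost about the same on the calculator while behaving identically under qemu.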
I read the post about memory timings and how slow writes to RAM are. Does the processor block until the write is finished or does it start executing the next instructions and block if it encounters a second write before the first one finishes? Maybe the structure trick for writing to RAM isn't much slower than aligned writes considering how slow writes are.
Can I catch misaligned access exceptions and recover? One thing I'm working on is a Forth for the calculator. Unlike C, Forth lets you read and write whatever data of whatever size and type arbitrarily. I think it's fine to have users align the data in their own programs as long as the calculator can recover gracefully from exceptions. Like BASIC, it shouldn't be possible to crash the calculator.
Posted on 23/03/2025 09:15
There are multiple layers here. Logic-wise the CPU keeps running and stalls if another instruction with an LS (memory access) cycle occurs. However it also stalls on instruction fetches, which need access to the bus. IF happens every 4-aligned instruction which is basically every other instruction, plus after jumps. So the CPU could stall quickly before you reach the next write. I would think that most of the time you'll write to cache, which takes a single cycle, and the copy-back of a full 32-byte cache line to the RAM is what is slow. However, checking my memory benchmarks (https://bible.planet-casio.com/lephenixnoir/en/sh7305/calc/memory-perf#8-mib-graph-90e-ram columns 3 to 5) shows that access size does matter, which seems to contradict this intuition. Moral of the story: it's complicated, but a bunch of byte writes is definitely 4 times slower than a bunch of long writes according to the benchmark.
Yes. gint has a mechanism that allows you to run a small function during exceptions: https://git.planet-casio.com/Lephenixnoir/gint/src/commit/badbd0fd2bd8ac796fd55d49b93691741bd8a139/include/gint/exc.h#L31-L54
I'm illustrating that on the example of gintctl's memory viewer where you get a "TLB Error" message if you navigate to an MMU-managed area where nothing is mapped. You can use this mechanism to log to a global and decide how to recover. Use gint_exc_catch() to set it up, and later again with a NULL parameter to disable it (original gintctl code):
/* (...) exceptions - TLB miss read and CPU read error.
   I use a volatile asm statement so that the read can't be optimized
   away or moved by the compiler. */
exception = 0;
gint_exc_catch(catch_exc);
uint8_t z;
__asm__ volatile("mov.l %1, %0" : "=r"(z) : "m"(*mem));
gint_exc_catch(NULL);
sprintf(header, "%08X:", (uint32_t)mem);
if(exception == 0x040)
{
    sprintf(bytes, "TLB error");
    *ascii = 0;
    return 1;
}
if(exception == 0x0e0)
{
    sprintf(bytes, "Read error");
    *ascii = 0;
    return 1;
}
The exception-catching function is called if an exception occurs, and receives the exception code. You must check it against the alignment error codes (reads or writes), otherwise you will ignore too many exceptions (original gintctl code). You can use gint_exc_skip(N) to skip N instructions before continuing. By default, some exceptions will continue to the next line, ignoring the faulty instruction, while others will try to re-execute the faulty instruction. Memory access exceptions are of the second type, so you need to gint_exc_skip(1) otherwise you will find yourself in an infinite loop.
static uint32_t exception = 0;
/* Exception-catching function */
static int catch_exc(uint32_t code)
{
    if(code == 0x040 || code == 0x0e0)
    {
        exception = code;
        gint_exc_skip(1);
        return 0;
    }
    return 1;
}
Unlike interrupt callbacks, this function runs in "interrupt mode" and does not have access to any drivers or hardware. It is only designed for recording that something happened. If you don't have a good opportunity to check the error flag in the main thread you can try to longjmp() from the catch function to an error handler set up with setjmp(), which I think should work (but is untested).
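The non-local jump idea mentioned above relies on the standard setjmp()/longjmp() pair. The sketch below shows only the C control flow, on a PC, with the fault simulated by an ordinary function call; whether jumping out of gint's exception context actually works remains untested, as stated above. All names are hypothetical, and 0x0e0 is simply the "read error" code from the gintctl snippet.

```c
#include <setjmp.h>
#include <stdint.h>

static jmp_buf recovery_point;
volatile uint32_t last_fault = 0;

/* Stand-in for an exception-catching function: record the code, then
   jump back to the recovery point instead of returning. */
static void simulated_fault(uint32_t code)
{
    last_fault = code;
    longjmp(recovery_point, 1);
}

/* Simulated 32-bit fetch that "faults" on a misaligned byte address. */
static uint32_t checked_fetch(uint32_t const *cells, uint32_t byte_addr)
{
    if(byte_addr & 3)
        simulated_fault(0x0e0);
    return cells[byte_addr / 4];
}

/* Returns 0 and stores the value on success, -1 if a fault was caught. */
int try_fetch(uint32_t const *cells, uint32_t byte_addr, uint32_t *out)
{
    if(setjmp(recovery_point) != 0)
        return -1;               /* arrived here via longjmp() */
    *out = checked_fetch(cells, byte_addr);
    return 0;
}
```

For an interpreter such as a Forth, a single recovery point set up around the inner loop would let a faulting word abort back to the prompt, provided the mechanism holds up on the real hardware.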
Posted on 23/03/2025 15:02
Thanks for the examples on catching exceptions.
Here is an example of the menu bug. https://github.com/JoeyShepard/prog-tools/blob/main/menu-bug/src/main.c I start the add-in, press menu and ON a few times to make sure they work then leave the calculator off while the add-in is running. I tried turning it back on after 5 minutes then again after 10 minutes and the menu key worked so I waited an hour and a half and the bug appeared when I turned it on. This is similar to what happened before.
Posted on 23/03/2025 17:16
Thanks for the example. The RAM used in the 90+E/CG-50 just gets wiped after about an hour of not being powered. The OS knows that and saves a lot of stuff during poweroff. You'll notice that pressing ON quickly after OFF gives you the OS menu instantly, but doing so after a while takes a few more seconds with a spinner going off (even if it's not a full restart). I'm afraid gint can't reasonably do the same because apps tend to put data a bit wherever (rightly so, it's quite useful...) but then we can't easily save that to ROM during poweroff.
Posted on 23/03/2025 22:41
Is there anything I can do as an alternative? I don't mind saving some state to ROM manually before gint_poweroff. Is there a way for my add-in to reload itself after the calculator comes back on? It can then load what it saved to ROM if necessary.