Alrighty, then. After a good night's sleep and with my blood/caffeine ratio well on its way to zero, I'm ready to try writing up what I know about the animation format. Some of this has already been mentioned in this thread, but I'll go ahead and write it up again (or use copy and paste, hehe) so it's all in one place. Let me know if any of this is unclear, contradictory, needs further explaining, etc. Here goes.
Animation format:
Number of frames (4 bytes)
Pretty self-explanatory
Animated articulation identifier (? bytes)
Just a placeholder name since I don't know what to call these. The length of this part seems to depend on the model, possibly the smallest multiple of 4 bytes that can hold two bits for every articulation in the model. (I'll look into that. I didn't notice that it was different for Serge and Kid until just now.) Serge has 8 bytes here, Kid has 0xC. This set of bytes tells the game which of the character's articulations will be animated by commands in frames after the first. I'll use Serge's idle animation (which has 5665600015150000 for this segment) as an example of how to read them.
First read the bytes in little endian order, so they become 0000151500606556. Then convert them to binary, which makes them
0000000000000000000101010001010100000000011000000110010101010110.
This is much easier to work with if you reverse the order and put the least significant bit on the left, making it
0110101010100110000001100000000010101000101010000000000000000000.
From here, I generally split them into two rows, with even-numbered bits on top and odd-numbered bits on bottom (counting from 0, so the leftmost bit goes on top). The set we're working with looks like this:
ABCDEFGHIJKLMNOPQRSTUVW
01111101000100001110111000000000
10000010001000000000000000000000
The letters I've placed above the binary there represent which of Serge's articulations each column applies to. Only the first 23 columns (A through W) have a letter; the remaining columns are all 0 here. A 1 in the top row indicates that that articulation will be given rotation commands in later frames of the animation, while a 1 in the bottom row indicates that it will be given translation commands. In other words, articulations B, C, D, E, F, H, L, Q, R, S, U, V, and W will be rotated in the animation, and A, G, and K will be translated.
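For anyone who'd rather see the steps above as code, here's a minimal Python sketch. The function names are made up for illustration, the articulation count of 23 comes from the A through W lettering, and the length rule is only the unconfirmed "multiple of 4" guess mentioned earlier.

```python
def guess_identifier_length(articulation_count):
    """Smallest multiple of 4 bytes holding 2 bits per articulation (unconfirmed guess)."""
    byte_count = (2 * articulation_count + 7) // 8
    return ((byte_count + 3) // 4) * 4


def decode_identifier(raw, articulation_count):
    """Return (rotated, translated) lists of articulation indexes."""
    # Reading the bytes as a little-endian integer puts articulation A in the lowest bits.
    value = int.from_bytes(raw, "little")
    rotated, translated = [], []
    for i in range(articulation_count):
        if (value >> (2 * i)) & 1:        # even bit: rotation commands in later frames
            rotated.append(i)
        if (value >> (2 * i + 1)) & 1:    # odd bit: translation commands in later frames
            translated.append(i)
    return rotated, translated


def letters(indexes):
    """Turn indexes 0, 1, 2... into the letters A, B, C... used in the post."""
    return "".join(chr(ord("A") + i) for i in indexes)


serge = bytes.fromhex("5665600015150000")
rotated, translated = decode_identifier(serge, 23)
print(guess_identifier_length(23))  # 8
print(letters(rotated))             # BCDEFHLQRSUVW
print(letters(translated))          # AGK
```

Shifting and masking the integer gives the same result as reversing the binary string and splitting it into two rows by hand.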
Frame pointers (4 bytes each)
The pointers are relative to the start of the animation header. Not much else to be said about them.
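Here's a rough Python sketch of reading the header fields described so far. It assumes the 4-byte fields are unsigned little-endian and that there is one pointer per frame, neither of which is confirmed above. The identifier length has to be passed in, `parse_header` is a made-up name, and the sample data is invented for the example.

```python
import struct


def parse_header(data, identifier_length):
    """Split an animation header into (frame_count, identifier, pointers)."""
    # Assumption: the frame count is an unsigned little-endian 32-bit value.
    (frame_count,) = struct.unpack_from("<I", data, 0)
    identifier = data[4:4 + identifier_length]
    # Assumption: one 4-byte pointer per frame, right after the identifier.
    pointers = struct.unpack_from("<%dI" % frame_count, data, 4 + identifier_length)
    return frame_count, identifier, list(pointers)


# Invented two-frame header with an 8-byte identifier (not real game data).
sample = (
    struct.pack("<I", 2)
    + bytes.fromhex("5665600015150000")
    + struct.pack("<2I", 20, 296)
)
frame_count, identifier, pointers = parse_header(sample, 8)
print(frame_count, identifier.hex(), pointers)  # 2 5665600015150000 [20, 296]
```

Since the pointers are relative to the start of the animation header, frame i would begin at data[pointers[i]] when data starts at the header.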
Frames of animation (Variable length. See below.)
Would you believe this is the easy part? The commands given to articulations in the animation are basically identical to the way that the model's "skeleton" was set up in Section 2. Each command is 6 bytes long, 2 bytes for each axis. The order of the axes is probably XYZ, as it was in Section 2, but I have a hard time thinking in 3D, so anyone's welcome to double check that. Read as signed 16-bit numbers, the values a command can take range from 0xF000 to 0x0FFF, or -4096 to 4095. In the case of rotations, 4096 means 360 degrees, so 1024 is a quarter turn.
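A single command could be decoded along these lines in Python. This is only a sketch: it assumes each 2-byte value is stored little-endian and that the axis order really is XYZ, and the names `decode_command` and `to_degrees` are made up.

```python
import struct


def decode_command(raw):
    """Decode one 6-byte command into three signed 16-bit values."""
    # Assumption: little-endian byte order, axes in X, Y, Z order.
    return struct.unpack("<3h", raw)


def to_degrees(value):
    """Convert a rotation value to degrees, with 4096 meaning 360 degrees."""
    return value * 360.0 / 4096


# Invented command: 0x0400, 0xF000, 0x0FFF
x, y, z = decode_command(bytes.fromhex("000400f0ff0f"))
print(x, y, z)        # 1024 -4096 4095
print(to_degrees(x))  # 90.0
```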
First frame of animation ( (0xC * # of articulations in the model) bytes)
The first frame of the animation gives two commands to each articulation of the model, presumably in the order that the articulations were defined in Section 2 of the model. Each articulation has 0xC bytes of commands given to it here - 6 bytes of rotation commands followed by 6 bytes of translation commands. The rotations and translations are relative to the model's "default" position as it was set in Section 2.
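As a sketch of that layout in Python (assuming little-endian 16-bit values, with a made-up function name and invented sample data for two articulations):

```python
import struct


def parse_first_frame(data, offset, articulation_count):
    """Read 0xC bytes per articulation: 6 of rotation, then 6 of translation."""
    frame = []
    for i in range(articulation_count):
        # Assumption: six signed little-endian 16-bit values per articulation.
        values = struct.unpack_from("<6h", data, offset + 0xC * i)
        frame.append({"rotation": values[:3], "translation": values[3:]})
    return frame


# Invented data: articulation 0 is rotated, articulation 1 is translated.
sample = struct.pack("<12h", 1024, 0, 0, 0, 0, 0, 0, 0, 0, 10, -20, 30)
frame = parse_first_frame(sample, 0, 2)
print(frame[0]["rotation"])     # (1024, 0, 0)
print(frame[1]["translation"])  # (10, -20, 30)
print(hex(0xC * 23))            # 0x114, first frame size for 23 articulations
```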
Other frames of animation ( (6 * # of 1s in the binary of the animated articulation identifier) bytes each)
For each frame after the first, the animation contains only the commands that were flagged in the animated articulation identifier, starting with articulation A's rotation, then its translation, then B's rotation, then its translation, etc. For the example of Serge's idle animation given above, the first command in each of these frames will be the translation command for articulation A, the second command will be the rotation for articulation B, the third command will be the rotation for articulation C, and so on. Note that each command is still relative to the positioning of the articulation in the model's default position (how it was set in Section 2). They are NOT relative to the position of the model in the first frame of the animation or the previous frame of the animation.
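The command order and frame size for these later frames can be worked out from the identifier alone. Here's a small Python sketch using Serge's idle animation; `later_frame_layout` is a made-up name, and the articulation count of 23 comes from the A through W lettering.

```python
def later_frame_layout(identifier, articulation_count):
    """List (articulation index, kind) in the order commands appear in later frames."""
    value = int.from_bytes(identifier, "little")
    layout = []
    for i in range(articulation_count):
        # Rotation comes before translation for the same articulation.
        if (value >> (2 * i)) & 1:
            layout.append((i, "rotation"))
        if (value >> (2 * i + 1)) & 1:
            layout.append((i, "translation"))
    return layout


layout = later_frame_layout(bytes.fromhex("5665600015150000"), 23)
print(layout[:3])       # [(0, 'translation'), (1, 'rotation'), (2, 'rotation')]
print(6 * len(layout))  # 96 bytes per later frame (16 commands)
```

Index 0 is articulation A, so the output matches the order given above: A's translation, then B's rotation, then C's rotation.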
Assorted notes
The animations in the game don't ever seem to give an articulation both rotations and translations in a given animation. In addition, it seems that any articulation that was translated or rotated at all in the first frame of an animation is always marked in the animated articulation identifier, and therefore always appears in frames of animation after the first. Both of these appear to be quirks of either convention or whatever program Square was using to animate the models. The game's engine doesn't seem to have a problem with articulations receiving both rotation and translation commands, and it's entirely possible to set an articulation's position in the first frame of animation and lock it there for the duration of the animation by not marking it in the animated articulation identifier. I used the second trick pretty frequently in my demo video.
EDIT: I accidentally gave my explanation of the articulation identifier as though Serge's was 0xC bytes long, when it's actually 8. Way too many zeroes in there. Corrected.