Moho Forum

Posted: **Fri Oct 01, 2004 7:34 pm**

PLEASE, can somebody help me (or give me some clues) to modify the "Bone Audio Wiggle" script so that it rotates a LAYER instead of a bone? A "Layer Audio Wiggle" would be very useful in some complex cases where you can't (or don't want to) bind a layer to a bone for various reasons... I don't think it would be very difficult (only some changes to the existing script). I've tried to do it by looking at other scripts, but it doesn't work because I don't know how to write Lua (a pity).

...well, THANKS!

Posted: **Fri Oct 01, 2004 9:23 pm**

I can't write it for you right now, but if you take a look at the Rotate Layer Z tool you'll see how to change a layer's rotation value. Take some of that code and replace the bone rotation code, and you should have your script.

Posted: **Wed Oct 13, 2004 3:45 pm**

Well, thanks for your clue, LM! It was very easy, and it's now very useful for me. I can now change the opacity of the layer with the sound, and scale bones too, but I don't know how to move bones or how to scale layers, because that requires changing the horizontal and vertical parameters at the same time... Well, I'll continue investigating. THANKS!!

Posted: **Wed Oct 13, 2004 8:02 pm**

Ramon-

From your previous request, I whipped up a script that allows you to do X, Y, and Z rotations, in different amounts, with an audio file. Don't have time now, but I'll put it up when I get back from work. I was also trying to include layer scaling, but I've obviously got some horrible misunderstanding about that part of the API, because it crashes Moho with a memory error every time. I'll post my tentative code for that bit as well, and maybe someone can point out where I've gone astray.

--Brian

Posted: **Wed Oct 13, 2004 9:14 pm**

Brian, I'd like to see that crashing script as well. We've tried to make Moho respond to bad scripts with error messages instead of crashing. I'm sure there are lots of ways a bad script can crash Moho, but we would like to minimize them.

Posted: **Fri Oct 15, 2004 3:54 am**

No problem. The part that is making it crash (the layer scaling bit) is commented out.

I have another question about how the getAudioAmplitude function is working. As a number, I am never seeing the amount get much over .03, which strikes me as a bit odd. I was wondering if you are averaging audio sample amplitudes over the length of the frame, which could give you kinda wonky results depending on the frequencies being sampled. I would imagine it might track more accurately (and perhaps be easier to program) if you simply used the maximum amplitude during the specified time. The numbers are so small that it makes me think that you are doing some sort of averaging, and just recording the peaks would be, IMHO, better for this application. Especially for audio files that include a lot of transients, an average is going to be skewed WAY low. I just tried 2 different sine wave tones, both normalized for maximum volume. The first, a 4000 Hz tone, gave me an amplitude of approx. .40. Another sine wave of 80 Hz gave me an average of .64. I'm not a software engineer, but I am a fairly old school audio engineer, and that looks distinctly odd to me. This would explain a lot about old comments I had seen about the difficulty of getting the auto lipsync to track properly on some files. Filtering for the audio peaks should solve that.
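To make the peak-versus-average point concrete, here is a minimal sketch in Python (not Moho's Lua, so it can be run outside Moho). It assumes a 44.1 kHz sample rate and samples in the range -1 to 1; `peak`, `mean_abs`, and `sine` are made-up helper names. How Moho actually computes its amplitude is not confirmed by this.

```python
import math

def peak(samples):
    """Largest absolute sample value in the frame."""
    return max(abs(s) for s in samples)

def mean_abs(samples):
    """Plain average of the absolute sample values."""
    return sum(abs(s) for s in samples) / len(samples)

def sine(freq_hz, seconds=1.0, rate=44100):
    """Full-scale sine tone; the 44.1 kHz sample rate is an assumption."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

tone = sine(80)
# A short loud burst in an otherwise silent frame:
# the average collapses, the peak does not.
burst = [0.0] * 900 + [0.9, -0.9] * 50

print(peak(tone), mean_abs(tone))
print(peak(burst), mean_abs(burst))
```

For a full-scale sine the mean absolute value works out to 2/π, roughly 0.64, while the peak stays near 1.0. For the burst, the average is a tenth of the peak.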

On that subject, and I guess this is actually a feature request, adding a user input offset to compensate for varying audio file volumes to get things to track properly would be a great help. I know you're probably going nuts trying to finish this thing, but if you could implement these suggestions I think it would make the audio aspects of the program much more functional.

Okay, I had to interrupt typing this post to try and implement this within the script, which I did. It is now WAY more accurate. A little slow running, but I'm sure it would be much faster done in C.

Anyway, here are the files. LayerSound is the layer twisting script. I've also patched the peak searching code into the Bone Audio Wiggle script. It's mo' better. And a tip for using the Bone Audio Wiggle (old or *new and improved*): use it on a bone attached to a character's "jawline" if you're using the auto (or any other kind of) lipsync. Makes things easier to manage.

--Brian

Posted: **Fri Oct 15, 2004 6:50 am**

I'm definitely not an audio guy, so I'll have to check into your suggestions, but it seems to me that if you look for the peak amplitude in each frame, aren't you going to pick up any pops and crackles? Hopefully your audio file doesn't have any, but if it does, this seems like a real problem.

Posted: **Fri Oct 15, 2004 8:10 am**

Well, pops and crackles could be a problem. However, they're generally not so much of a problem in this case. If you have that sort of background noise that's as loud as the actual audio, most people would give that file up as a lost cause. That gets into "vinyl that's so scratched up you can't possibly listen to it" territory.

To clarify, things like clicks and pops fall generally under the heading of transients. That means they are sound events that take place during a very short period of time. Things like snare hits are also pretty transient events, so this is where we have to be careful. But there's some wiggle room. If there is such a thing, an average click or pop is, if you look in detail at the waveform, going to be only a few samples long at most. Filtering the peak data to cull peaks that are, say, over the previous peak, but of less than 3 or 4 samples in length should work. Most of these sorts of transients are VERY short in length. I'll whip up some code and try it on some ugly files.
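One way to read "cull peaks shorter than 3 or 4 samples" is to take the highest level that is held for at least a few consecutive samples. The Python sketch below (not Moho's Lua) does that; `sustained_peak` and `min_len` are hypothetical names, and this is only one interpretation of the idea, not code from any of the scripts in this thread.

```python
def sustained_peak(samples, min_len=3):
    """Highest level reached by at least min_len consecutive samples.

    A click that is only one or two samples long cannot raise the result,
    because every window containing it also contains quieter samples.
    """
    mags = [abs(s) for s in samples]
    if len(mags) < min_len:
        return 0.0
    return max(min(mags[i:i + min_len])
               for i in range(len(mags) - min_len + 1))

# One-sample click (1.0) sitting between quieter material.
samples = [0.1, 0.2, -0.2, 0.2, 1.0, 0.1, 0.3, -0.3, 0.35, 0.1]
print(max(abs(s) for s in samples))  # plain peak, dominated by the click
print(sustained_peak(samples))       # peak with the click culled
```

A caveat: real waveforms cross zero, so a very high-frequency tone at a low sample rate may not hold its level for `min_len` samples either. The window length would need tuning against actual files.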

An alternate solution would take an average level of perhaps 10 audio samples, progressively keeping the max until you get to the end of the frame's worth of audio. That should lessen the effect of unwanted transients and still get the "feel" of the audio across to the program. A less optimal solution, I think, but overall more workable. I'll haveta give a peek over at Sourceforge and see what's particularly good for transient removal. This is an area where I think a "good enough" approach would work just fine.
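The "average 10 samples, keep the max" idea could look roughly like this in Python (not Moho's Lua). `max_block_average` is a made-up name, and the block size of 10 is simply the figure mentioned above.

```python
def max_block_average(samples, block=10):
    """Average the absolute level over short blocks and keep the loudest block."""
    mags = [abs(s) for s in samples]
    best = 0.0
    for start in range(0, len(mags), block):
        chunk = mags[start:start + block]  # the last block may be shorter
        best = max(best, sum(chunk) / len(chunk))
    return best

# Silence, then a steady 0.5 level, then a single full-scale click.
samples = [0.0] * 10 + [0.5] * 10 + [1.0] + [0.0] * 9
print(max_block_average(samples))
```

Here the lone click only contributes a tenth of its height to its block, so the steady section wins.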

I had only thought of this in the first place 'cause when I was debugging the script, I was running prints of most of the variables and the amplitude variable just looked wrong right off the bat.

If anyone has any godawful audio files for me to test this out on, send 'em to MoreMoho (at) yahoo.com. I haven't got anything quite wretched enough lying around. If no one else does, I'll ask my brother. He loves sampling from horribly scratched records for some reason.

Also, if you get the time to explain, roughly, how I was able to script badly enough to blow Moho up, I'd sure appreciate it.

--Brian

Posted: **Fri Oct 15, 2004 9:08 am**

7feet wrote: Also, if you get the time to explain, roughly, how I was able to script badly enough to blow Moho up, I'd sure appreciate it.

I tried your LayerSound script, and it didn't crash Moho. Is that the one that was supposed to crash? Are there particular settings I need to use?

As far as scratched up vinyl, that's just the worst case. What about someone whispering into a microphone and getting a few pops with their 's' and 'p' sounds? The maximum amplitude isn't going to be very representative of the actual sound in each frame. Yeah, I know, on a good recording you wouldn't have that, but pops happen.

I see what you're saying about different frequencies, but who's going to be pumping a pure 80 Hz sine wave into Moho? (I guess you are, but I mean for real world animation purposes.)

I'm not convinced that using the maximum is the best thing to do. Think about a loud sound that lasts for 10 frames, then cuts off. If frame 11 has one last sample of the loud sound, is that really representative of the sound for frame 11? In this case, an average is clearly better.

Posted: **Fri Oct 15, 2004 12:48 pm**

First, like I said, the part that crashed it I commented out inside the script. Couldn't have put it up for other folks to try otherwise. The rest of it works pretty nice, I think. But if you un-comment the layer scaling section Moho goes belly up right off. I've been trying to give the beta the test-to-destruction workout, but I didn't mean to go that far.

The sample I've been using for testing is your very own Pres. Clinton bit from the tutorials. Okay, they probably had him on a pretty nice mic. But even someone who is recording their own dialog on a crappy headset mic in front of their computer is probably going to reject a take that is so rife with annoying P-pops and sibilance that it's unlistenable. Let alone unprofessional. But in this case, it's also really not that much of a problem. These sounds are, on a small time basis, at least related to the sounds you are trying to catch.

I picked the 2 frequencies I did because they are, for the most part, the fundamental frequencies of bass drum and snare hits. They are both fairly short time frame events, but also well over the time limit of many annoying transients. I whipped up the LayerSound script as a rough analog to some of the audio based AfterEffects plugs, which seem to be used more for tracking music.

Most vocal frequencies tend to center around 2.5 kHz. P-pops are down around that 80 cycle tone, sibilance most prominent at pretty high registers, say 8k and up. Filtering for the midrange shouldn't be that difficult. Like I said, I'll try to track down a handy-dandy algorithm or 2.

Most audio compression, variations of which you use to control these things, is basically just averaging, but specific and context based. On a vocal, do I see a spike below 100 Hz that lasts for less than .05 seconds? Yup, that's probably a pop, filter it out. Anything on a vocal above 6K that's not Maria Callas? Crap, make it go away. Those are 2 audio benchmarks you can hardly go wrong with. Sibilance is people, but P-pops are a mic artifact with a signature that's hard to miss.

Like I said in the last post, I don't have any handy problem files. Still think that finding peaks while discarding transients would be a good, "best fit" solution. I'm not really a coder, but I could certainly work up some logically proper pseudocode to do the job if you thought it was worthwhile to send me a look at the relevant source. I know this is your baby (and a big, fat smilin' brat it is), but this is the one area I know I could actually be useful in.

Forgot about this...

but who's going to be pumping a pure 80 Hz sine wave into Moho?

Your average garden variety Jeep Beat is constructed from a low end percussive sample overlaid with a 40 - 80 Hz sine wave, lasting very vaguely on the order of a tenth of a second (from listening out my window). That's what makes the car go boom. I only patched in a pure tone to confirm what I thought, which was that the way Moho is determining the amplitude of a file had little relation to the actual volume. I got an average "amp" of

.03 with the Clinton sample (max, min of around .02)
.64 with the 80 Hz sine (last 2 are consistent within reason per frame)
.40 with the 4KHz sine

.03? A maximum of 3% possible volume on an audio file, properly normalized, that you can hear clear as a bell? Sorry, man, but that is clearly not a proper result. I just ran off a test in the script, with the Clinton sample, and I get results for the frame's amplitude running from .011 to 1.0, with entirely proper tracking.

So that's the "I" in the "I did not have" from that sample. The maximum amplitude spikes appear to take up a rather small amount of room in the waveform. There is much more in the way of lower amplitude, "intermediate" waveforms.

This is a closeup, of about 8 milliseconds, within the "I". Each horizontal line represents one audio sample. You can see that averaging by audio sample will directly skew the used amplitude down. You can also see from the previous image how, depending on where the frame divisions fall, you can end up with varying results, sometimes wildly. In any case, a raw averaging is never going to give you a result that means much.

Summation -- Pops, hisses and crackles rarely come to be the loudest events in an audio file. I think that worry belongs in the same category as people who won't wear a seatbelt because 2% of people in a crash are hurt worse instead of less if they do.

--Brian

Added-- A loud sound that lasts for 10 frames. Hmmm. An explosion? A 1950's horror flick chick screaming? Perhaps extreme examples, but if the effect runs into the next frame in these cases, is anyone going to notice? I still go with greatest benefit, greatest part of the time. Most audio related apps pride themselves on transient detection in areas like this, bad audio be damned.

Posted: **Fri Oct 15, 2004 5:17 pm**

OK, there's something funny going on. How exactly did you get the 3% average amplitude? For one particular frame, for the entire sound file, or what?

Because that's not what I'm seeing here. When I run the original bone wiggle script, I get some frames with an amplitude as low as 0.3%, and some as high as 31%, with lots of frames in the teens or twenties. The average amplitude I get for the entire clip is 9%, but I don't see how the amplitude of the entire sound clip has any value whatsoever.

How exactly did you come up with the 3% number?

If you want to try it, run this version of the bone wiggle script:

http://www.lostmarble.com/misc/lm_bonesound.lua

It prints out the amplitude it gets for each frame. The final number is the total averaged amplitude of the entire clip.

Posted: **Fri Oct 15, 2004 5:21 pm**

As far as the crash in the script, it's this line:

vec:Set(self.startScale)

There is no self.startScale variable defined. Earlier in your script, you create a local variable called startScale, so the line should be:

vec:Set(startScale)

Posted: **Sat Oct 16, 2004 12:39 am**

I'm surprised I didn't see any mention in this thread of RMS (root mean square). This is where you take the root of the average of the square of all the samples in a section of audio to get the overall "energy level" of that section.
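Written out, for a section of $N$ samples $x_1, \dots, x_N$ (symbols chosen here for illustration):

$$
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}}
$$

For a full-scale sine wave this comes to $1/\sqrt{2} \approx 0.707$, compared with a peak of 1.0.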

Posted: **Sat Oct 16, 2004 4:26 am**

Sorry, LM, I was just bloody tired and shoulda gone to sleep. I misread the numbers, was eyeballing an average, and should have said .3 anyway. I was also really talking about the max level. It just seemed a little odd that on a file that was normalized that you were only getting amplitude levels up to 30%. The particular reason, in this case, is if I set a maximum rotation value of 90 degrees, I'm never going to get much more than 30 degrees of movement.
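The arithmetic behind that complaint, as a small Python sketch (not Moho's Lua). It assumes the script simply multiplies the amplitude by the maximum rotation, which is a guess at the mapping rather than the actual script code. `rotation_degrees` and the `gain` parameter are hypothetical; `gain` stands in for a user-set volume compensation.

```python
def rotation_degrees(amplitude, max_rotation=90.0, gain=1.0):
    """Scale an amplitude in [0, 1] to an angle, clamping after the gain is applied."""
    level = min(1.0, max(0.0, amplitude * gain))
    return level * max_rotation

print(rotation_degrees(0.3))            # no gain: roughly a third of the range
print(rotation_degrees(0.3, gain=3.0))  # with gain: most of the range is used
print(rotation_degrees(0.5, gain=3.0))  # clamped at the maximum rotation
```

With amplitudes topping out near 0.3, a 90 degree maximum only ever yields about 27 degrees unless something like the gain is applied.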

Thanks AcouSvnt, good call. To paraphrase Dr. McCoy, I'm a knob twiddler, not a programmer. RMS was just letters on a dial, never really put much thought into how it was mathematically derived. So, I whipped up a quick RMS calculator, and whadayaknow, the numbers came out pretty close to what the GetAudioAmplitude function was returning. Then I broke down each frame's duration into 20 separate blocks, kept the results from the block with the max RMS, and the results looked better. At a hundred blocks, even better, but doing all that calculation in Lua was getting pretty pokey. Instead of doing all those calculations samplewise in the script, I think I'm gonna go back and break each frame up into millisecond blocks and just call the GetAudioAmplitude function, and that should work a lot closer as well as much faster. I think the problem is that getting the RMS for a whole 2 or 3 frame section skews the results way down and loses the sense of the peak volume you are trying to track.
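A rough Python sketch of the block-wise approach described above (not the actual Lua script, which isn't reproduced in this thread). `rms` and `max_block_rms` are made-up names, and the frame is assumed to be a non-empty list of samples in the range -1 to 1.

```python
import math

def rms(samples):
    """Root mean square of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def max_block_rms(frame, blocks=20):
    """Split one frame's samples into blocks and keep the loudest block's RMS."""
    size = max(1, len(frame) // blocks)
    # If the frame length is not a multiple of size, the last block is shorter.
    return max(rms(frame[i:i + size]) for i in range(0, len(frame), size))

# Mostly quiet frame with a short loud section at the end.
frame = [0.1, -0.1] * 95 + [0.8, -0.8] * 5
print(rms(frame))            # whole-frame RMS, pulled down by the quiet part
print(max_block_rms(frame))  # RMS of the loudest of 20 blocks
```

The whole-frame figure lands near 0.2 while the loudest block reads about 0.8, which is the "skewed way down" effect in miniature.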

Thanks, fixed the bug, so the layer scaling bit works. Added separate X and Y scaling factors, and a radio button to let you choose between RMS and absolute peak levels for the volume detection. Here it is. This is still the somewhat slower version, but I'll update it momentarily.

A question on the API. What would be the function (if there is one) that I could use to directly access information in an audio file bytewise? Microsoft's Linguistic Sound Editor (part of Agent) actually does a pretty good job of extracting phoneme info from a dialogue file automatically, encoding the position of the phonemes at the end of the file, and it's free. I've been wanting to write a script that would decode that data and convert it to a switch file within Moho. End up with free, phoneme based auto lipsync. I just need to know how to get at the raw data. If you could let me know how to do that one thing, I would love to write the thing right now. It was the first thing I thought of when I heard about the scripting, and I think I could actually make it work now.

Thanks a ton

--Brian

Posted: **Sat Oct 16, 2004 5:33 am**

7feet wrote: It just seemed a little odd that on a file that was normalized that you were only getting amplitude levels up to 30%. The particular reason, in this case, is if I set a maximum rotation value of 90 degrees, I'm never going to get much more than 30 degrees of movement.

Keep in mind that 30% is the average over a frame (1/24th of a second). Peak amplitudes are probably higher, but when you average it out you're going to cut off the peaks and troughs.

7feet wrote: A question on the API. What would be the function (if there is one) that I could use to directly access information in an audio file bytewise.

Sorry, Moho doesn't provide a way to get at the raw data. The Lua API is meant to be pretty high-level. Besides, the phoneme data wouldn't be part of the actual audio stream anyway. You might be able to just open the file directly in Lua and get the information you want.

Take a look at the lm_openfile.lua script. It shows you how to read data from a text file. Reading the phonemes encoded in a WAV file is going to be significantly more complicated, but that might be a very basic starting point.
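As a starting point for the WAV side, a WAV file is a RIFF container made of tagged chunks, so a first step could be listing which chunks a file contains besides the audio. The Python sketch below (not Moho's Lua) does only that. Which chunk, if any, the Linguistic Sound Editor uses for its phoneme data is not known here, and the `xtra` chunk in the demo is invented purely for illustration.

```python
import io
import struct

def list_riff_chunks(stream):
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE stream."""
    riff, _total, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        chunk_id, size = struct.unpack("<4sI", header)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        # Chunk bodies are padded to an even number of bytes.
        stream.seek(size + (size & 1), io.SEEK_CUR)
    return chunks

def chunk(chunk_id, body):
    """Build one RIFF chunk (demo helper)."""
    pad = b"\x00" if len(body) % 2 else b""
    return chunk_id + struct.pack("<I", len(body)) + body + pad

# Tiny in-memory WAV: format chunk, 4 silent samples, and a made-up extra chunk.
fmt = struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16)
body = (b"WAVE" + chunk(b"fmt ", fmt)
        + chunk(b"data", b"\x00\x00" * 4)
        + chunk(b"xtra", b"abc"))
demo = b"RIFF" + struct.pack("<I", len(body)) + body

print(list_riff_chunks(io.BytesIO(demo)))
```

For a file on disk, the same function can be given `open(path, "rb")` in place of the `io.BytesIO` object.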

Help to modificate an existing script (PLEASE)