Lip Synch in ASP 7

kenmead · Post by **kenmead** » Fri Jun 11, 2010 6:47 pm

O.K. I've been waiting for automated lip synch for a long time. It's the most tedious part of my workflow, as I create at least one three-minute voiced animation per week. I currently use Papagayo. Although it works extremely well and allows a high level of accuracy, it takes forever to line up each and every word. Three questions: 1) How accurate is the ASP 7 lip sync in comparison with manual syncing via Papagayo? Because the Pro version allows text entry just like Papagayo, I'm assuming it's fairly accurate... correct? 2) Does it allow any re-alignment of the words after the initial processing for slight tweaks? 3) What is the "AST Production Sync Library" that is referred to in the product description on the web site?

heyvern · Post by **heyvern** » Fri Jun 11, 2010 7:37 pm

Not sure which version of AS you want to know about so I will cover both.

The AS Pro lip sync is a bit better than Debut. It allows you to enter text in the audio properties that helps with lip sync (I believe that is the library you mentioned).

From my testing and personal preference, and considering how much time it takes to lip sync by hand, I think the auto lip syncing is very very very good. It works fast. All of this depends on the lip sync file of course. Long stretched out unusual speech patterns may not match up as well but overall it's really quite good.

The Debut auto lip sync is also very good. Not sure what makes it different in final results. It does NOT have the text entry for the audio properties. However it worked great in my testing.

This is NOTHING like Papagayo. Not at all. In Papagayo you have more precise control. In AS the auto lip sync is very basic. You assign a switch layer to the audio and keys are generated for that switch layer based on the audio only in Debut, or audio and text in Pro.

Once the keys are in, you can go back and move keys, or change the switch layers to make it better or fix mistakes.

------------

In my testing for both, I used the Integrated Audio Recording inside AS Pro and Debut. I recorded my voice speaking and changed the pitch for fun. Then in Pro I typed in the words I spoke for that audio file and linked it to a mouth switch layer. In Debut I just linked the switch layer.

In both cases the results were 99.99% usable without any other changes... in my opinion and based on a situation that required FAST results that still looked good. In some spots I did have to assign switch layers by hand. Mainly due to a "pop" or extraneous sound in the recording that caused a mouth change. Or in the beginning/end if it didn't put a rest pose. Sometimes the mouth was "hanging open" because it keyed the first phoneme right at the beginning or left the last phoneme at the end instead of inserting a rest.
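The "hanging open" fix described above can be sketched in a few lines. This is only an illustration of the idea, not anything AS does internally: the `(frame, mouth)` key list, the `pad_with_rest` name, the `"rest"` label, and the `hold` spacing are all assumptions made up for the example.

```python
def pad_with_rest(keys, rest="rest", hold=3):
    """Return a copy of keys that starts and ends on the rest mouth.

    keys is a hypothetical list of (frame, mouth) pairs sorted by frame.
    hold is the assumed gap, in frames, between a phoneme and the added rest.
    """
    if not keys:
        return []
    padded = list(keys)

    # If the first key is a phoneme and there is room before it,
    # add a rest key so the mouth is closed before speech starts.
    first_frame, first_mouth = padded[0]
    if first_mouth != rest and first_frame > 0:
        padded.insert(0, (max(0, first_frame - hold), rest))

    # If the last key is a phoneme, close the mouth shortly after it.
    last_frame, last_mouth = padded[-1]
    if last_mouth != rest:
        padded.append((last_frame + hold, rest))
    return padded
```

For example, `pad_with_rest([(10, "AI"), (14, "O")])` adds a rest key before frame 10 and another after frame 14, while a list that already starts and ends on rest comes back unchanged.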

-------------------------

Doing lip sync by hand will always give you better results. Automated lip sync is never going to beat that personal touch. However if you need quick easy lip sync that looks dang good, the auto lip sync in AS is fantastic and a real time saver. Even just getting the rough keys in first so you can tweak it by hand to make it perfect saves a ton of time.

-vern

J. Baker · Post by **J. Baker** » Fri Jun 11, 2010 9:10 pm

I always tell everyone to learn to scrub the audio in AS and insert your mouth switch layers by hand. I find it to be the most accurate method, and fast once you learn the process. For me, lip-syncing is the easiest and fastest part of the whole animation process.

heyvern · Post by **heyvern** » Fri Jun 11, 2010 10:27 pm

As I said, hand done lip sync is going to be better. However even if it is simple to just scrub the timeline and add the keys... it still takes more time than "automatic". If you have a TON of lip sync to do VERY FAST, auto lip sync is a great way to get the job done. It takes seconds not minutes... or even seconds not HOURS if you have a lot to do.

-vern

J. Baker · Post by **J. Baker** » Sat Jun 12, 2010 12:19 am

heyvern wrote:As I said, hand done lip sync is going to be better. However even if it is simple to just scrub the timeline and add the keys... it still takes more time than "automatic". If you have a TON of lip sync to do VERY FAST, auto lip sync is a great way to get the job done. It takes seconds not minutes... or even seconds not HOURS if you have a lot to do.

-vern

I tried the auto at one time but it didn't work as well as I'd like. But anything "auto" isn't going to be as good as doing it manually. But it should fit most users' needs.

sbtamu · Post by **sbtamu** » Sat Jun 12, 2010 12:33 am

I noticed that when I used a soft (low-volume) voice the auto lip sync would just be a grin and teeth, but if I imported a loud male voice I would get decent-looking auto sync mouths. Is this normal for auto sync?

J. Baker · Post by **J. Baker** » Sat Jun 12, 2010 12:47 am

sbtamu wrote: I noticed that when I used a soft (low-volume) voice the auto lip sync would just be a grin and teeth, but if I imported a loud male voice I would get decent-looking auto sync mouths. Is this normal for auto sync?

Very well could be as it's probably checking for levels in the audio. The louder the level, the bigger the mouth shape.
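If that guess is right, the behaviour might look something like the sketch below. To be clear, this is not AS's actual algorithm (the thread doesn't say how it works); the mouth names, the `mouths_by_level` function, and the `full_scale` parameter are all hypothetical, and samples are assumed to be floats between -1.0 and 1.0.

```python
import math

# Hypothetical mouth names, ordered from closed to wide open.
MOUTHS = ["closed", "small", "medium", "wide"]

def rms(samples):
    """Root-mean-square level of a chunk of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mouths_by_level(samples, sample_rate, fps=24, full_scale=1.0):
    """Pick one mouth per animation frame from the loudness of that frame."""
    per_frame = max(1, sample_rate // fps)  # audio samples per animation frame
    keys = []
    for start in range(0, len(samples), per_frame):
        level = min(1.0, rms(samples[start:start + per_frame]) / full_scale)
        # Louder chunk -> higher index -> bigger mouth.
        index = min(len(MOUTHS) - 1, int(level * len(MOUTHS)))
        keys.append(MOUTHS[index])
    return keys
```

Under this assumption, a quiet recording whose samples stay around 0.05 never gets past the first mouth, while a loud one reaches the wide shapes, which would be consistent with what sbtamu saw.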

sbtamu · Post by **sbtamu** » Sat Jun 12, 2010 12:54 am

J. Baker wrote:
sbtamu wrote: I noticed that when I used a soft (low-volume) voice the auto lip sync would just be a grin and teeth, but if I imported a loud male voice I would get decent-looking auto sync mouths. Is this normal for auto sync?
Very well could be as it's probably checking for levels in the audio. The louder the level, the bigger the mouth shape.

Is there any way to use, let's say, my granddaughter's voice, amplified with Audacity, then lip sync it and get a good-looking mouth, and then lower the volume without altering the mouth?

heyvern · Post by **heyvern** » Sat Jun 12, 2010 2:02 am

You can lower the audio in the audio properties tab of the audio layer. You can keyframe this value as well to do fades. So yes, if you boost the audio you can lower it in AS.

The default is 1. To lower it set it to 0.7 or some fraction like that.
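Numerically, the boost-then-lower workflow is just two gain stages. Here is a rough sketch assuming float samples between -1.0 and 1.0; `peak_normalize` stands in for whatever amplification you do in your audio editor and `apply_layer_volume` for the layer's volume value, and both names are made up for illustration.

```python
def peak_normalize(samples, target_peak=0.95):
    """Boost (or cut) samples so the loudest one hits target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to boost
    gain = target_peak / peak
    return [s * gain for s in samples]

def apply_layer_volume(samples, volume=1.0):
    """Scale playback level; 1.0 leaves the audio unchanged."""
    return [s * volume for s in samples]

# Boost a quiet recording for lip sync, then play it back at 70%.
boosted = peak_normalize([0.1, -0.2, 0.05])
played = apply_layer_volume(boosted, 0.7)
```

The point is that the lip sync keys come from the boosted file, so turning the volume down afterwards only changes what you hear.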

-------------------------------------------------------

I did my tests with the built-in microphone in a MacBook with 3 computers running and an overhead fan on high. I was about 1 foot away from the mic. I was able to get usable lip sync.

If your source audio is not loud enough to get a good lip sync you should boost it. If your audio is not suitable you need to lip sync by hand. Garbage in, garbage out. Not a comment on your skills, but the truth is that audio for ANY animated production needs to be CLEAN and as good as you can get it, and not just for lip sync.

Another problem that can occur is not having "good" phonemes for the character's mouth shapes. I did several tests of the exact same audio with different characters. Some had wonderfully done mouth shapes and the lip sync looked fantastic. Some had very badly done mouth shapes with very little difference between them. The same audio and auto lip sync looked HORRIBLE with those characters. However, doing it by hand wouldn't improve it very much.

--------------------------------------------------

I don't want to beat a dead horse but "auto lip sync" is "automated". Automation NEVER has the touch of a human hand and an artist's eye. Automation can never replace an "artist". However, not everyone has the time or the skills to do hand lip sync.

The entire point of auto lip syncing in AS is for those who want something done quickly and easily with good results. I just did a bunch of quick tests with the audio recording and my headset. Lip synced all of them in just a few minutes. Very fast very simple. Usable results. The lip sync looked as good as any Saturday morning crap or 12fps anime cartoon.

Like I said there may be some spots that involve HUMAN DECISIONS. For example:

One recording I said

"this is a test this is a test, testing 1 2 3"

The word "testing" is made up of the same "phonemes": T-S-T. AS only has the ten default phoneme positions. It doesn't have an "NG" phoneme. "T-S-T" with auto lip sync is going to use the "etc" mouth position THREE TIMES IN A ROW. To make it look right, I, as a human, had to choose alternate phonemes that better represented the mouth at those points. By adding a closed "E" shape I was able to break up that static word and improve the look.
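Spots like that are easy to find mechanically, even if the fix is still a human decision. A small sketch, assuming the generated keys are available as a plain list of mouth names in order (the list format and the `repeated_runs` helper are hypothetical, not part of AS):

```python
from itertools import groupby

def repeated_runs(keys, min_run=3):
    """Find places where the same mouth is keyed min_run or more times in a row.

    Returns (start_index, run_length, mouth) tuples.
    """
    runs = []
    index = 0
    for mouth, group in groupby(keys):
        length = len(list(group))
        if length >= min_run:
            runs.append((index, length, mouth))
        index += length
    return runs

# "testing" keyed as T-S-T lands on the same "etc" mouth three times.
flagged = repeated_runs(["rest", "etc", "etc", "etc", "rest"])
```

Each flagged run is a candidate for swapping in an alternate shape by hand, such as the closed "E" mentioned above.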

HOWEVER, I only spent a few seconds in that one spot. Everything else was as good as I would have done it myself. Each word and phoneme was broken down correctly. If I was in a hurry it would be done. If I chose to, I could spend more time tweaking it.

-vern

sbtamu · Post by **sbtamu** » Sat Jun 12, 2010 2:18 am

Yes, TY heyvern. I boosted the low voice of the woman in Audacity, imported and lip synced it, then used the properties tab to lower the volume, and got nice results. Not as good as doing it manually, but it will work.

Thanks