The example above uses a single photo to create both the animation and the background.

The voice is synthesized, but it could be replaced with an actor's or owner's voice.

(This is just a mock-up to give you an idea of how the concept works.)