The example above uses a single photo to create both the animation and the background.
The voice is synthesized, but it could be replaced with an actor's or the owner's voice.
(This is just a mock-up to give you an idea of how the concept works.)