The example above shows two photos used to create the animation and the background.
The voice is synthesized, but it could be replaced with actors' or owners' voices.
(This is just a mock-up to give you an idea of how this concept works.)