Fig.1. Audio without books - no better than the books on their own. Research shows that what works is when the two work together.
Too many companies are currently touting software that can take a 45-minute lecture and package it in a form that makes it bite-sized and tagged - putting it through the Magimix, diced. I can't say it will necessarily improve or add to the learning experience, though I do like to stop, start, rewind, play over, repeat, take notes ... go back to the start.
The definitive research on use of audio and text to enhance effective learning was done in the 1990s and published in various papers starting with 'When two sensory modes are better than one' (1997).
It is worth the read, and was written with the then-emerging multimedia world in mind.
It takes skill and thought to get it right. We've all heard of 'Death by PowerPoint'; we used to try to avoid 'Death by talking head', which doesn't add much. What you want is the voice-over explaining actions as they take place, with text superimposed where the action takes place. Even captions and subtitles can cause a cognitive split, increase mental overload and diminish the effectiveness of the learning experience.
Tindall-Ford, S., Chandler, P., & Sweller, J. 1997, 'When two sensory modes are better than one', Journal of Experimental Psychology: Applied, 3(4), pp. 257-287.