First of all, thank you for the correction: the user manual is indeed wrong. A mono connector will short-circuit your microphone, resulting in a clean zero-energy (silent) input.
Then, unfortunately, I need to repeat my caveat: ultimately, all we know about the microphone circuitry in the A13 chip is described in chapter 23 of the A13 User Manual: http://dl.linux-sunxi.org/A13/A13%20User%20Manual%20-%20v1.2%20(2013-01-08).pdf
Whenever you connect a microphone to the external mic jack, you connect directly to the A13 chip, and, as said above, in working with the chip we found that pretty much any electret microphone will work. These are the types of microphones usually included in headsets or cell phones. See also: https://en.wikipedia.org/wiki/Electret_microphone
Whenever you use the internal microphone, however, its output goes through an Automatic Gain Control (AGC) circuit. This circuit keeps the low-energy parts of the signal low (to suppress noise), amplifies the mid-energy parts (assuming they are the wanted signal from farther away), and dampens the high-energy parts (to prevent clipping). For a description of how an AGC works, see: https://en.wikipedia.org/wiki/Dynamic_range_compression So the internal microphone signal isn’t simply amplified (i.e., scaled up proportionally); rather, its dynamic range is compressed.
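To make the three regions of such a circuit concrete, here is a minimal sketch of a static AGC level curve in Python. The thresholds and the boost are hypothetical values picked for illustration, not the actual characteristics of the AGC in MOVI, and a real AGC tracks a smoothed envelope with attack/release times, which this sketch ignores.

```python
# Hypothetical parameters, chosen only to illustrate the shape of the curve;
# they are NOT the settings of the AGC in MOVI's microphone path.
NOISE_FLOOR_DB = -60.0  # below this, input is treated as noise and kept low
BOOST_DB = 20.0         # gain applied to mid-level input (e.g. distant speech)
CEILING_DB = -6.0       # output never exceeds this level, so it cannot clip


def agc_output_db(input_db: float) -> float:
    """Static input-to-output level curve (both in dBFS) of a simple AGC.

    A real AGC applies this to a smoothed envelope with attack/release
    time constants; that time behaviour is left out here.
    """
    if input_db < NOISE_FLOOR_DB:
        # 1:2 downward expansion: every dB below the floor costs an extra dB,
        # so background noise stays low. Continuous with the branch below.
        return input_db + BOOST_DB + (input_db - NOISE_FLOOR_DB)
    # Boost the middle range, but limit loud input at the ceiling.
    return min(input_db + BOOST_DB, CEILING_DB)


for level in (-70.0, -60.0, -40.0, -20.0, 0.0):
    print(f"{level:6.1f} dB in -> {agc_output_db(level):6.1f} dB out")
```

With these numbers, inputs spanning 70 dB (-70 to 0 dBFS) come out spanning 54 dB (-60 to -6 dBFS): quiet, distant speech is lifted and loud speech is capped, which is what "compressed dynamic range" means here.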
So, now to your question about integrating MOVI into a box. Whether you need dynamic range compression for a microphone that you connect externally to MOVI depends first and foremost on your application. Speech recognition experts distinguish two use cases: far field and near field. Far-field speech recognition is what MOVI performs when you talk to it from a distance using the onboard microphone. Near-field speech recognition is what happens when you use a microphone close to your mouth, e.g., a headset microphone.
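One way to see why far field is harder: under the free-field inverse-distance law, the speech level at the microphone drops by 20·log10(d_far/d_near) dB. The sketch below assumes an ideal point source with no room reflections, and the distances are hypothetical examples (a headset boom at about 5 cm versus a speaker 1 to 3 m away).

```python
import math


def level_drop_db(near_m: float, far_m: float) -> float:
    """Level difference (dB) between two distances, free-field inverse-distance law."""
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far_m / near_m)


# Hypothetical distances: headset mic at ~5 cm vs. a speaker across the room.
HEADSET_M = 0.05
for far in (1.0, 2.0, 3.0):
    drop = level_drop_db(HEADSET_M, far)
    print(f"{far:.0f} m: about {drop:.0f} dB quieter than at 5 cm")
```

Every doubling of distance costs about 6 dB, so a far-field microphone sees a signal tens of dB weaker than a headset does, and in a real room reverberation and noise make this worse.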
For near-field speech recognition, you simply connect a headset-type microphone to MOVI’s external mic jack and you are done. If you want to do far-field speech recognition with an external microphone, however, it’s a bit more complicated:
First, you need to choose a microphone that can actually pick up the signal over a distance. A microphone capsule like the one we integrated in MOVI obviously works. Then I would recommend using an automatic gain control circuit, just as we did in MOVI. The chip we integrated in MOVI is the MAX9814. It’s very easy to use, and Maxim Integrated offers an evaluation kit for it. Adafruit also sells a small breakout board with it: https://www.adafruit.com/products/1713
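If you go the MAX9814 route, its maximum gain and attack/release ratio are set by strapping two pins. The mapping below is my reading of Maxim’s datasheet (please verify it against the revision you build from), and the microphone signal levels are hypothetical round numbers. The sketch only converts the gain setting to a linear factor to show why a fixed gain is not enough.

```python
# MAX9814 pin strapping, as read from Maxim's datasheet; verify before use.
GAIN_DB = {"VDD": 40, "GND": 50, "unconnected": 60}                  # GAIN pin
RELEASE_PER_ATTACK = {"GND": 500, "VDD": 2000, "unconnected": 4000}  # A/R pin (1:N)


def amplified_mv(mic_mv: float, gain_pin: str) -> float:
    """Amplitude (mV) after the fixed gain, before the AGC reduces it."""
    return mic_mv * 10 ** (GAIN_DB[gain_pin] / 20)


# Hypothetical mic signals: ~1 mV for distant speech, ~20 mV for speech up close.
for mic_mv in (1.0, 20.0):
    out = amplified_mv(mic_mv, "unconnected")
    print(f"{mic_mv:5.1f} mV in -> {out:8.0f} mV out at 60 dB")
```

At a fixed 60 dB, the distant 1 mV signal becomes a usable 1 V, but the close-up 20 mV signal would nominally become 20 V, far beyond any supply rail. That is exactly the case where the chip’s AGC steps in and reduces the gain instead of letting the output clip.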
Like any other chip of this kind, however, the MAX9814 outputs a line-level signal. Therefore, you will need a circuit like the one already depicted above to attenuate the signal back down to microphone level:
+Line level in --||----R1----+- +Mic level output
Ground (input)----+-------------- Ground (output)
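A rough way to size R1: if the mic input presents an approximately resistive impedance Z_in to ground, then R1 and Z_in form a voltage divider with ratio Z_in / (R1 + Z_in), and the series capacitor together with R1 + Z_in sets a high-pass corner at 1 / (2π·C·(R1 + Z_in)). The sketch below assumes that model. Z_in, the 10 mV target and the 1 µF capacitor are hypothetical values (check the A13 manual or measure), and -10 dBV is the common consumer line level.

```python
import math


def divider_ratio(r1_ohm: float, zin_ohm: float) -> float:
    """Output/input voltage ratio of R1 in series with an input impedance Z_in."""
    return zin_ohm / (r1_ohm + zin_ohm)


def r1_for_ratio(ratio: float, zin_ohm: float) -> float:
    """Series resistor that gives the requested divider ratio (0 < ratio < 1)."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return zin_ohm * (1.0 / ratio - 1.0)


def highpass_corner_hz(c_farad: float, r1_ohm: float, zin_ohm: float) -> float:
    """-3 dB corner of the coupling capacitor against R1 + Z_in."""
    return 1.0 / (2.0 * math.pi * c_farad * (r1_ohm + zin_ohm))


LINE_VRMS = 10 ** (-10 / 20)  # consumer line level, -10 dBV ~= 0.316 V rms
TARGET_MIC_VRMS = 0.010       # hypothetical target: ~10 mV rms at the mic input
ZIN_OHM = 2_000.0             # HYPOTHETICAL mic-input impedance; look it up or measure

ratio = TARGET_MIC_VRMS / LINE_VRMS
r1 = r1_for_ratio(ratio, ZIN_OHM)
print(f"attenuation {20 * math.log10(ratio):.1f} dB, R1 ~ {r1 / 1000:.1f} kOhm")
print(f"high-pass corner with 1 uF: {highpass_corner_hz(1e-6, r1, ZIN_OHM):.1f} Hz")
```

The capacitor also keeps any DC bias on the amplifier output away from the mic input (and vice versa); as long as the corner frequency stays well below the speech band, it does not affect recognition.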