The size of the proof is irrelevant as far as its trustworthiness is concerned, it is only a measure of the amount of effort it took. The proof can be extracted and independently checked, eg. by a small, special-purpose, independently-developed proof checker. This is what evaluation labs would presumably do.
The size definitely tells you about scalability of what can be verified. Basically, verification is out of the question for an OS kernel that isn't a microkernel.
This doesn't mean you cannot verify bigger systems. Once the kernel is verified, verifying user-level bits of the trusted computing base is easier, as everything in userland is subject to the (verififed) kernel mechanisms. We're working on a real-world system where the whole TCB is verifiable (and may very well be verified).
Gernot