The network I run isn't nearly as big as a half-million-post network (we've got about 700 blogs alongside a very active BuddyPress network), but maybe there are some things you can take away from our development procedures.
- We've got a production environment and a staging/acceptance environment running on the same VM, so the two are guaranteed to be identical. Development is done on local machines. About 8 people can push to our repository on GitHub. When it's time to migrate code to staging, or to release a stable version to production, I shell into that server and git pull from our central repository. Except in emergencies, or when I'm tweaking a release, no one touches code on the live servers. This keeps the commit trees from getting out of sync.
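For what it's worth, the release step is nothing fancier than a pull on the server. A minimal sketch, with made-up paths, user, and branch names:

```bash
# Staging and production live on the same VM; paths, user, and branch
# names here are hypothetical.
ssh deploy@example.com

# Migrate the development branch to staging:
cd /var/www/staging && git pull origin master

# Release a stable version to production:
cd /var/www/production && git pull origin stable
```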
- We don't really attempt any kind of database syncing between dev environments. We do nightly backups, of course, and those dumps are available to developers if they need to refresh their local installations. In practice that means that every month or two (less often if I don't think of it) I do a manual mysql import of last night's backup into my dev environment or into the staging environment. On occasion it can be a pain to have the databases out of sync, but my thought is this: if you are developing in such a way that it requires a particular piece of data in the database, you are probably developing wrong. This is of a piece with your point about code being environment-agnostic.
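The refresh itself is just a plain dump-and-import. A rough sketch, with hypothetical database names and paths (credentials assumed to come from ~/.my.cnf):

```bash
# Nightly backup (run from cron on the server):
mysqldump --single-transaction wordpress | gzip > /backups/wordpress-$(date +%F).sql.gz

# Occasional manual refresh of a dev or staging database from last night's dump:
gunzip -c /backups/wordpress-$(date +%F).sql.gz | mysql wordpress_dev
```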
- When we DO decide to import a database dump into a new machine, we have two options. One is to do the import wholesale and to edit the hosts file on the local machine so that the production domain points to localhost. This has the advantage of being easy and relatively foolproof in the execution. On the other hand, it freaks me out, because I'll often get distracted and forget whether I'm editing a local or a remote copy of the website (since the URLs are the same). The other option is to run a script on the local copy of the database before or after importing. We have one that does a pretty thorough job: it looks through every field in every table, unserializes if necessary, replaces the old domain with the one you specify, and resaves. Obviously, when you are working with a huge database, this can take a while, but since we don't do it very often, it's not a big deal. I'd be happy to share this script with you if you're interested. I launch it manually, but it'd be easy to hook it into an automated chain. Both options are sketched below.
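To make the two options concrete (the domain is made up, and since our in-house script isn't reproduced here, I'm showing WP-CLI's wp search-replace as a publicly available equivalent that does the same serialization-aware replace):

```bash
# Option 1: point the production domain at the local copy
echo '127.0.0.1  network.example.com' | sudo tee -a /etc/hosts

# Option 2: search-replace the domain in every field of every table,
# unserializing and reserializing where necessary
wp search-replace 'http://network.example.com' 'http://network.local' --all-tables --dry-run
wp search-replace 'http://network.example.com' 'http://network.local' --all-tables
```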
- There are occasional exceptions to environment-agnosticism, where the particular data in the database really is of paramount importance. One kind of scenario is where you have to do something simple like activate a plugin - simply pushing the code up is not enough; you actually have to change a setting. The other kind of case is where you have bugs in data. I recently had a situation where I migrated a few tens of thousands of email subscription records to a totally different format, and I found out after migrating and launching that in a few edge cases my migration script had the wrong logic. In that case, I did a fresh import of the database and wrote the script to fix the problem all on my local installation. Then, when I committed, I made a note in my commit message that the script would have to be triggered after the site got upgraded (we have a convention of putting ACTION_REQUIRED in the commit message - that way I can easily git log | grep ACTION_REQUIRED to get a sense of what has to be done at release time before taking the site out of maintenance mode). The same goes for plugins that need to be activated, themes that need to be made available, settings that need to be set, etc.
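In practice the convention looks something like this (the commit message and tag name are made up for illustration):

```bash
# Flag a commit that needs a manual step at release time:
git commit -m "Migrate email subscriptions. ACTION_REQUIRED: run the subscription fixup script after upgrading."

# At release time, while the site is still in maintenance mode:
git log | grep ACTION_REQUIRED

# Or, to limit the list to commits made since the last release:
git log v1.4.0..HEAD | grep ACTION_REQUIRED
```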
- Because we have a fair amount of configuration data in wp-config.php and other config files, we abstracted the environment-specific data (which really boils down to DB_NAME, DB_USER, and DB_PASSWORD), defined those constants in a separate file, and included that file at the top of wp-config.php. That way we get to keep the main config file in the repo.
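Concretely, the tracked wp-config.php starts with something like `require dirname(__FILE__) . '/wp-config-local.php';`, and each machine carries its own untracked copy of that file. A sketch of setting one up (the file name is our convention, not WordPress's):

```bash
# Define the environment-specific constants in an untracked file:
cat > wp-config-local.php <<'PHP'
<?php
define( 'DB_NAME',     'wordpress_local' );
define( 'DB_USER',     'wp' );
define( 'DB_PASSWORD', 'secret' );
PHP

# Keep the machine-specific file out of the repository:
echo 'wp-config-local.php' >> .gitignore
```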
- We do not have a good system in place for quality assurance. Especially with BuddyPress, there are so many different kinds of content that it's not possible for our small team to check every little thing every time we do a release (time between releases probably averages 1-2 weeks, sometimes much less). I try to mitigate this by having multiple instances of the site on my local machine, in a configuration borrowed from the way that WP versions itself: a master branch where new development goes, and a stable branch for bugfixes. All of the developers but me develop on the master branch, toward the next feature release. When they commit something that I think should go into the stable branch, I use git cherry-pick (see the sketch below). We generally run the master branch in the staging environment, because that's where we can get the most eyeballs on it (especially from the non-coding members of the team, who don't maintain local dev environments), and the features in the master/dev branch are generally the ones that need the most testing anyway. If WordPress were a different kind of software, we would be hardcore about having unit tests, and maybe that's something we'll move toward in the future - but at the moment it's all human-powered.
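The branch mechanics, in brief (the commit hash is a placeholder):

```bash
# New development happens on master; production runs stable.
git checkout master
# ...the team commits feature work here...

# When a commit on master is a bugfix that stable needs, copy just that commit:
git checkout stable
git cherry-pick abc1234
git push origin stable
```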
Hope some of that helps.