Build internal tools that are easy and safe to use

Building internal tools, scripts and, in general, "facilities" that ease the development process is something almost all of us do. Whether it is a DFS shared folder containing frequently used applications, scripts, common documentation or UML diagrams, almost every company has one.

When companies grow and have multiple developers, other tools start to surface: reporting tools, internal wikis or a CMS, a continuous integration server, some virtual machines, or testing and staging environments.

And when you work on high-scalability projects, many more tools and scripts are required: custom source code management tools or commit hooks, deployment scripts, automated testing launchers and helper tools, ways to restart or "wipe" caches, web servers and builds...

But no matter what you are working on, you need two basic things: to make those tools as easy to use as possible, and to make them as resistant to human error as possible.

I'll go with the easy one first.

If your tools are not easy to use, some engineers might get stuck with them and either get delayed (because they have to rely on other engineers to use them properly) or, in the worst case, look for an alternative shortcut that might even be dangerous (for example, doing a manual deploy instead of using the deploy script). It is important that scripts be verbose, explain everything properly, and give simple instructions or steps:

For example, favor "press Y to update XXXX with root credentials, or type the IP of the desired address" over "type update_staging, passing as parameters the IP address, the root username, the root password and the debug flag". Let the script run and ask for confirmation, and make it read the root credentials from a config file instead. This way engineers don't need to memorize them (or worse, write them down somewhere, like on a post-it).
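A minimal Python sketch of that kind of prompt, assuming the credentials live in a hypothetical INI file with a `[staging]` section holding `root_user` and `root_password`. The function names and the file layout are illustrative, not an existing tool; the `ask` parameter defaults to `input` and only exists so the prompt can be scripted in tests.

```python
import configparser
import ipaddress
from typing import Callable, Optional, Tuple

def load_credentials(path: str, section: str = "staging") -> Tuple[str, str]:
    """Read root credentials from an INI-style file (hypothetical layout)."""
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(f"config file not found: {path}")
    return parser[section]["root_user"], parser[section]["root_password"]

def choose_target(default_host: str, ask: Callable[[str], str] = input) -> Optional[str]:
    """'Y' picks the default host, an IP picks another one, empty input aborts."""
    answer = ask(f"Press Y to update {default_host}, or type the IP of another host: ").strip()
    if answer.lower() == "y":
        return default_host
    if not answer:
        return None
    # Reject typos early: ip_address() raises ValueError on a malformed address.
    return str(ipaddress.ip_address(answer))

def update_staging(config_path: str, default_host: str,
                   ask: Callable[[str], str] = input) -> Optional[str]:
    """Ask for confirmation, then update the chosen host using stored credentials."""
    user, _password = load_credentials(config_path)
    host = choose_target(default_host, ask)
    if host is None:
        print("Aborted, nothing was changed.")
        return None
    print(f"Updating {host} as {user} (credentials read from {config_path})")
    # The actual update (ssh, rsync, service restart...) would go here.
    return host
```

Defaulting to the safe answer (empty input aborts) and validating the typed IP before doing anything keeps a stray Enter or a typo from touching the wrong machine, and nobody ever has to type the root password.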

If something is cumbersome, try to build a "validated" shortcut that does everything properly and more easily. Think of it this way: if your tool is "awesome" but harder to use than the old approach, you've failed at your goal.

And now, to the big part: Building safe tools.

If you have a DFS, either create a "public" subfolder as the only place where everybody has write access, or individual folders per user, but never ever give everybody full access everywhere. More than once I've seen documents, code and other files deleted by mistake on drives without any backup procedure.

If you have a deployment script, you should for example enforce:

  • Only allowing deploys to the intended servers (staging servers for a staging deploy, testing servers for a testing build...)
  • Separating configuration deployment from code deployment, to avoid overwriting production config with testing config
  • Deploying through impersonation instead of with the user's own credentials, so nobody needs extra permissions just to deploy
  • Deploying from an intermediate location that requires a clean checkout of the code. This avoids the "works on my machine" effect (it works on your machine but nowhere else) and minimizes deploying "garbage" (files only present in the dev. environment, temporary files, etcetera)
  • Making a build, or storing the last successful build, and deploying that. Unstable code should only be present in your development environment; even code for testing on a CI machine should be expected to compile without errors (or at least have no syntax errors, in the case of scripting languages)
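Several of these rules can be enforced by the script itself rather than by convention. The Python sketch below uses a hypothetical `ALLOWED_HOSTS` inventory and a hypothetical `config/<environment>.ini` naming scheme: it refuses hosts that don't belong to the requested environment, derives the config file from the environment instead of accepting it as a parameter, and only plans a deploy of a build known to have succeeded.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical inventory: which hosts belong to each environment.
ALLOWED_HOSTS: Dict[str, Set[str]] = {
    "testing": {"test01", "test02"},
    "staging": {"stage01"},
    "production": {"web01", "web02"},
}

@dataclass(frozen=True)
class DeployPlan:
    environment: str
    host: str
    build_id: str
    config_file: str  # derived from the environment, never chosen by the caller

def plan_deploy(environment: str, host: str, last_good_build: Optional[str],
                config_dir: str = "config") -> DeployPlan:
    """Validate a deploy request and return what would be deployed where."""
    allowed = ALLOWED_HOSTS.get(environment)
    if allowed is None:
        raise ValueError(f"unknown environment: {environment!r}")
    # Only the servers meant for this environment are valid targets.
    if host not in allowed:
        raise PermissionError(f"{host} is not a {environment} server, refusing to deploy")
    # Deploy the last successful build, never whatever sits in a working copy.
    if not last_good_build:
        raise RuntimeError("no successful build available to deploy")
    return DeployPlan(environment, host, last_good_build,
                      f"{config_dir}/{environment}.ini")
```

Because the caller never picks the config file, a staging deploy cannot accidentally ship testing config, and pointing a staging deploy at a production host fails before anything is copied.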

An example of what can happen if your tools are not safe is what happened to me a few months after starting my current job.

Back then we were using PHPUnit + Selenium + MySQL and not much more for testing. We had one shared DB with some data for development, and when the "build script" was switched to testing mode, a specific DB for each user would be truncated and recreated to run the test suites.

The test script implicitly required the developer to always switch to testing mode, but without any safeguard or check. Guess what happened when I once forgot to switch to test mode... instead of creating a specific empty DB for me, it truncated and recreated all tables... on the shared development DB! :O

We had a backup and only lost around 2 hours restoring everything, but it wasn't the nicest way to be reminded to improve our scripts. Of course, the script was then improved to a) explicitly check for testing mode and b) refuse to run the DB recreation subscript when not in testing mode.
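Those two fixes translate into two independent guards placed before any destructive statement runs. This is a sketch under assumptions: it uses SQLite instead of MySQL so it runs self-contained, and the `test_<user>` naming convention, the settings keys and `UnsafeTestRun` are hypothetical.

```python
import os
import sqlite3
from typing import Dict

class UnsafeTestRun(RuntimeError):
    """Raised when the DB recreation step is requested outside a test run."""

def guard_test_db(settings: Dict[str, str]) -> str:
    """Return the name of the DB that may be wiped, or raise."""
    mode = settings.get("mode")
    # (a) Explicit check: a missing or different mode is never treated as testing.
    if mode != "testing":
        raise UnsafeTestRun(f"mode is {mode!r}, refusing to recreate any database")
    # (b) Even in testing mode, only this user's own test DB may be recreated.
    expected = "test_" + settings["user"]  # hypothetical naming convention
    if settings.get("database") != expected:
        raise UnsafeTestRun(f"{settings.get('database')!r} is not {expected!r}")
    return expected

def recreate_test_db(settings: Dict[str, str], schema: str,
                     data_dir: str) -> sqlite3.Connection:
    """Drop every table in the user's test DB and recreate it from `schema`."""
    db_name = guard_test_db(settings)  # runs before any connection is opened
    conn = sqlite3.connect(os.path.join(data_dir, db_name + ".sqlite3"))
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.executescript(schema)
    conn.commit()
    return conn
```

The guard deliberately has no default: a missing `mode` is treated as "not testing", which is exactly the situation that wiped the shared DB in the story above.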

The same goes for many other things: source code repositories (back them up, restrict access and, for example, do not allow deleting the trunk branch), application configuration (an intern might be doing their best, but if they mess up the configuration of a company-wide application, everyone is delayed until it is repaired), data management (clearly separate scenarios and avoid overlaps, whether by mistake or on purpose)...
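Protecting the main branch is something a server-side hook can enforce. Assuming Git rather than Subversion (in Git, a "trunk" branch would be `refs/heads/trunk`), a `pre-receive` hook receives one `<old-value> <new-value> <ref-name>` line per updated ref on standard input, a deletion shows up as an all-zeros new value, and a non-zero exit status rejects the push. The protected branch list below is hypothetical.

```python
import sys
from typing import Iterable, List

# Hypothetical list of branches that must never be deleted.
PROTECTED_REFS = {"refs/heads/trunk", "refs/heads/master"}

def is_zero_id(object_id: str) -> bool:
    """Git reports a deleted ref with an all-zeros object name as the new value."""
    return set(object_id) == {"0"}

def rejected_updates(lines: Iterable[str]) -> List[str]:
    """Return an error message for every protected ref the push tries to delete."""
    errors = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore malformed or empty lines
        _old, new, ref = parts
        if is_zero_id(new) and ref in PROTECTED_REFS:
            errors.append(f"deleting {ref} is not allowed")
    return errors

def main() -> int:
    errors = rejected_updates(sys.stdin)
    for error in errors:
        print(error, file=sys.stderr)
    return 1 if errors else 0  # any non-zero exit makes Git reject the whole push
```

Installed as the repository's `hooks/pre-receive` file (executable, with a `#!/usr/bin/env python3` shebang), the script would end with `sys.exit(main())`; that call is left out here so the functions can be imported and tested on their own.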

Lastly, this will also help in case of disgruntled ex-employees. If access is layered (inner circles with greater access, where reaching each circle always requires access to the next outer one), removing someone's credentials should cut off all access. For example, if you don't know the dev. DB credentials and run a script instead, then once you are fired you cannot be a bad boy and execute a DROP DATABASE: you no longer have any user to access level A, so you cannot jump from there to level B and run the reset database script.

Remember, "Better safe than sorry".


Posted by Kartones on 2011-05-29