I recently came upon the problem of installing native node.js modules on windows again. I just wanted to record the trick that I keep forgetting:
Add --msvs_version=2012 to the end of your install command and it will just magically work on modern operating systems with modern versions of Visual Studio.
It looks like the best way to make sure this setting persists so you don’t have to remember it every time is to set the Environment Variable GYP_MSVS_VERSION to the value 2012 (source: http://stackoverflow.com/a/25071620/272958).
N.B. If you have a different version of Visual Studio installed, you may need to use a different number in there.
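For example (the module name here is just a stand-in; substitute whichever native module you’re installing, and match the number to your installed Visual Studio):

```shell
# "serialport" is a stand-in for whichever native module you're installing;
# the flag tells node-gyp which Visual Studio toolchain to build with.
npm install serialport --msvs_version=2012

# Persist the setting on Windows so the flag isn't needed every time:
setx GYP_MSVS_VERSION 2012
```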
esdiscuss.org is a node.js website hosted on nodejitsu and backed by a Mongo database provided by Mongo Lab. It has a worker process that pulls messages from a pipermail archive and puts them into Mongo (status page: http://pipermail-bot.jit.su/) and a web process that displays this data as HTML.
Unfortunately, the last few weeks have seen pretty high downtime for esdiscuss.org. Last night it was down for about 5 hours. Tracking down the issues, it seemed to be related to the server losing the database connection and never recovering it. I attempted to add reconnect logic to mitigate this but got it slightly wrong and ended up opening too many connections. As a result Mongo Lab (very sensibly) blocked my IP address. This resulted in the website going offline.
After talking to Mongo Lab support, I got the IP address unblocked and fixed my reconnect logic so it would never try more frequently than once per minute. With this new logic in place, I’ve also upgraded to a paid-for Mongo instance, which means a dedicated process. Hopefully this should improve stability and performance.
Thank you to Mongo Lab support for being really helpful with debugging the issue and apologies to anyone who tried to visit the site while it was down.
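The throttling fix can be sketched roughly like this (a hypothetical illustration, not the actual esdiscuss code; the connect function and the timing source are stand-ins, injected so the logic is testable):

```javascript
// Sketch of the throttled-reconnect logic described above:
// never attempt a reconnect more often than once per minute.
function makeReconnector(connect, now) {
  var MIN_INTERVAL = 60 * 1000; // one minute in milliseconds
  var lastAttempt = -Infinity;
  return function reconnect() {
    if (now() - lastAttempt < MIN_INTERVAL) return false; // too soon: skip
    lastAttempt = now();
    connect();
    return true;
  };
}
```

In production you would pass the real connection function and Date.now; injecting the clock makes the once-per-minute guarantee easy to unit test.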
Duplex Streams are data streams that are both readable and writable. Many languages that have some kind of stream concept build two way network connections as an object with an input stream property and an output stream property. An example of this can be seen in node.js (process.stdin is the input and process.stdout is the output). Many node.js users choose instead to have the item itself be both a readable and writable stream (e.g. substack’s dnode).
This is the classic server.pipe(client).pipe(server).
The other way that people use Duplex Streams is for transformations. You can pipe raw network data into the stream and receive parsed output on the other side, or you can pipe structured data in and get raw bytes out (that you can send over a network or write to a file).
This is the classic inputfile.pipe(parser).pipe(update).pipe(stringify).pipe(outputFile).
The problem with these two different concepts being represented by exactly the same programming concept is that they have different desirable error handling characteristics.
For the client/server architecture you want your errors thrown immediately (crashing the application) if they are not handled. From the point of view of handling, you want to make the consumers of the API handle the errors as close as possible to where they were thrown. That means you do no automatic forwarding and require users to listen to the 'error' event immediately. This is how node currently works.
The rest of this post represents how I feel streams should work; it may not always reflect the views of the node community at large (that’s up to you to decide).
For the transformation streams, the desirable error handling behavior is totally different. When readable streams are piped, you want their errors to go with the data. Consider the following function to return a parsed stream of a file:
function read(path) {
  return fs.createReadStream(path)
    .pipe(parseRawData())
}

If there is an error parsing the data, the returned stream emits an error. This is what you want. If there is an error reading the file (e.g. the file does not exist), the error will crash the application. This is not what you want. The behavior you would want is something like:
function read(path) {
  var src = fs.createReadStream(path)
  var dest = src.pipe(parseRawData())
  src.on('error', dest.emit.bind(dest, 'error'))
  return dest
}

That’s a far too convoluted way to do what is almost always what you want with transform streams. Until node.js has something native built in, I’m going to use my extension library barrage. It adds a new method called syphon which acts like pipe, except it also forwards errors. This lets me re-write the fixed version of read as:
function read(path) {
  return barrage(fs.createReadStream(path))
    .syphon(parseRawData())
}

This is way closer to the original method we wanted to write, but it handles errors properly. I’d really like syphon to be added to node natively :)
Once you’ve fixed the error forwarding so you can choose between pipe and syphon depending on whether the errors need to be forwarded or thrown, there are a few other things that really should be more convenient: collecting the entire output of a readable stream, and waiting for a writable stream to finish.
I solve both of these in barrage as barrage(readable).buffer(callback) and barrage(writable).wait(callback).
Both methods offer the guarantee that the callback will never be called more than once and both these methods return promises if you omit the callback, because that’s my personal preference.
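For illustration, here is roughly what such a buffer helper might look like under the hood (a sketch of the idea, not barrage’s actual source):

```javascript
// Sketch of a buffer(readable, callback) convenience: collect every chunk,
// then call back exactly once with either the first error or the
// concatenated result. Not barrage's real implementation.
function buffer(readable, callback) {
  var chunks = [];
  var called = false;
  function done(err, result) {
    if (called) return; // guarantee: callback fires at most once
    called = true;
    callback(err, result);
  }
  readable.on('data', function (chunk) { chunks.push(chunk); });
  readable.on('error', done);
  readable.on('end', function () {
    done(null, Buffer.concat(chunks));
  });
}
```

The `called` flag is what provides the at-most-once guarantee even if a stream misbehaves and emits both 'error' and 'end'.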
Semantic Versioning is extremely useful when you are attempting to manage your dependencies. Module authors are not allowed to make breaking changes in the minor or patch version numbers. If you make a breaking change then you must increment the major version (where the version numbers are of the form major.minor.patch).
This is important because it lets you keep most of your dependencies up to date without continually, manually updating the version numbers. It’s a problem though if you want to go with the strategy of releasing early.
Releasing early prototypes is a great way to get feedback and speed the development of really useful tools. The problem though, is that the API is likely to change very frequently when you are still in the early stages of development. This could lead to really high version numbers very quickly if you’re not careful. I’m not keen to be releasing version 100.0.0 as the first stable release of any products.
Fortunately, this problem was foreseen by the people who designed SemVer. Point 4 of the Semantic Versioning 2.0.0 spec states:
Major version zero (0.y.z) is for initial development. Anything may change at any time. The public API should not be considered stable.
What this means is that you can develop your packages without excessive increments of version numbers and just release version 1.0.0 once you consider the product relatively stable. Having released version 1.0.0 you then need to stick to Semantic Versioning rules. I also recommend that you always aim to release a version 1.0.0 before too long. You don’t want to leave packages whose APIs are effectively locked sitting at version 0.y.z.
For my own personal use I like to attach some semantics to versions even within version 0.y.z. As such, I increment the minor version for any change which I expect to be backwards incompatible or which introduces new features, and I increment the patch version for bug fixes that I expect to be largely backwards compatible.
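That convention could be written down as a tiny helper like this (a hypothetical illustration of the rule above, not part of any released tool):

```javascript
// Hypothetical helper encoding the 0.y.z convention described above:
// bump the minor for breaking changes or new features, the patch for
// bug fixes that should be largely backwards compatible.
function bumpZeroVersion(version, change) {
  var parts = version.split('.').map(Number);
  if (parts[0] !== 0) throw new Error('only applies to 0.y.z versions');
  if (change === 'breaking' || change === 'feature') {
    return '0.' + (parts[1] + 1) + '.0';
  }
  return '0.' + parts[1] + '.' + (parts[2] + 1);
}
```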
I don’t stick to this religiously though, and I’m not required to by SemVer. If I have a package that you want to use, and it looks stable, and it’s still not at version 1.0.0 yet, please open an issue and I’ll release version 1.0.0 if I feel it’s reached a stable point (or I might transfer ownership to you if I’m no longer interested in maintaining it).
Deploying a static website to Amazon S3 with express, stop and mandate.
There are two key points I want to get across in this blog post.
There are lots of tools around that help you build static websites. They work on the principle of you putting your source files in one folder and then they generate some output in another folder. They aren’t dynamic, so they don’t tend to support things like connecting to a database or making web requests while rendering a page. Many have some form of plugin system (e.g. docpad) which lets you extend their functionality into really complex beasts.
All these systems are arguably also dynamic websites though. Most even come with a built-in server so you can see your edits live rather than having to manually re-compile. What this means is that you can use any static site generator as if it were dynamic. Using a dynamic website generator gives you many more options though. Dynamic website libraries like express get a lot more users and have carefully crafted APIs, whereas the purely static site generators almost always have a tiny handful of users and poorly thought out APIs.
Amazon S3 means that it’s now possible to host websites with vast amounts of traffic for a few cents per month. It also offers fantastic performance and uptime. This means that static websites are almost always the best way to go if you don’t have any content that needs to be dynamic (or if your dynamic content can be provided from systems like Disqus).
I’ve created stop which is a small command line application that downloads an entire website into a static folder. It does this by taking a starting point you provide and then parsing any HTML it downloads to find links to follow. This lets you develop and test a dynamic website, before downloading it into a local, static folder.
Stop also does a few other helpful things on the side. If you pass the minify-css and minify-js options it will minify CSS and JavaScript respectively. Minifying as you download the site and make it static is a really clean and simple way to do this and saves you cluttering up your application logic with minification.
Stop can take a URL, a port number or a JavaScript file (that uses node.js to launch a website) as its source. If you add a .stop.toml file in the root of your project with the following contents then all you will need to do to make your static site is type stop in a command line.
source="./server.js"
destination="./out"
[options]
minify-js=true
minify-css=true

Having got our static website in the out folder, we need to upload it to Amazon S3. If there are only a few files and it’s a one-off, you can do that through the AWS Management Console. I like to keep everything automated though, so I use mandate. To use mandate you just have to create a file in the root of your project called .mandate.toml with content that looks something like:
source="./out"
[aws]
bucket="htmlparser.forbeslindesay.co.uk"
key="<YOUR AWS KEY HERE>"
secret="<YOUR AWS SECRET HERE>"

This takes everything from the source folder and uploads it to the named bucket. If you use IAM from Amazon you can create an individual user just for that one bucket by giving it a user policy of:
{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::htmlparser.forbeslindesay.co.uk",
        "arn:aws:s3:::htmlparser.forbeslindesay.co.uk/*"
      ]
    }
  ]
}

where htmlparser.forbeslindesay.co.uk is replaced with your bucket name.
Having done all that, you can deploy a new version of your static website just by typing
stop
mandate

We now have a very nearly fully automated system. The only additional thing that I like to do is version my static assets so that I can set very long cache times on them. What that means is that everything except the html files will have a url like example.com/static/0.0.0/foo.js. That won’t work if I upload two versions with the same version number. As such I have to make sure the version is updated each time. I use versionify for this, which prompts me to update the version if I haven’t already.
In order to avoid uploading every version of every static asset every time I deploy, I like to delete the output folder after I’m done. If I was only working on unix based OSes I could just use rm -rf but that doesn’t work on windows, so I use rimraf which does the same thing using node.js to be cross platform.
Pulling that all together, each release should run:
rimraf out
versionify
stop
mandate
rimraf out

Clearly I don’t want to type that every time I do a release, so instead I use npm’s scripts feature to automate this:
package.json:
{
  "name": "htmlparser",
  "version": "1.0.0",
  "versionify": "1.0.0",
  "private": true,
  "dependencies": {},
  "devDependencies": {
    "stop": "~2.1.1",
    "rimraf": "~2.2.0",
    "mandate": "~0.1.1",
    "versionify": "~1.0.1"
  },
  "scripts": {
    "prerelease": "versionify && rimraf out",
    "release": "stop && mandate",
    "postrelease": "rimraf out"
  }
}

Then I can install everything using npm install and release a new version just by typing npm run release (and it will prompt me to update the version automatically).
Different people seem to go with vastly different release strategies for software. When and how often you release has knock-on effects on how you test and how you develop or consider new features.
Both these points should almost always be under your control: releasing should be quick and easy, and your users should receive updates automatically. If they aren’t true, do everything in your power to make them true. If you’re going through an App Store, point 2 should be implicitly true. Point 1 is usually true in that submitting is pretty quick and easy. It might take a few weeks to actually get reviewed and published, but there’s nothing to stop you following up with a new release in the mean time: it’s a pipeline.
If you manage the whole software stack, make downloading updates fully transparent (think Google Chrome). If it’s a library, then use and rely on SemVer. Make publishing a new version a single command: foo publish v1.0.0
The problem with frequent releases is that it’s difficult to prevent bugs slipping into your software. You can guard against that with automated tests. A module is never finished unless it has a series of automated tests. I don’t care how big or small your software product is; running a manual test twice is doing it wrong (Golden Rule No. 1).
Some things are legitimately difficult to test. If you have a complex project run on many platforms in many different situations there will be bugs slipping through the cracks. People try and solve this by making the release cycles longer to allow for more time to test. The issue is, it still allows bugs through the cracks. Unless your software bugs might cost lives (or millions of dollars) people will forgive the occasional bug. In fact, they’ll forgive a lot of bugs, provided those bugs are fixed quickly.
Always fix bugs before writing new features (Golden Rule No. 2). This is important for multiple reasons. Nothing will annoy your users more than having something that doesn’t work remain broken while being told about exciting new features they’re not going to use.
Users will also normally forgive bugs if they know that releases typically come out every few days and always fix bugs before adding features. This brings me to my closing argument.
Someone is always waiting for your bug fix or new feature. If you wrote code this week, you should be planning to release this week (Golden Rule No. 3).
SemVer, or Semantic Versioning, is an important tool to help us build large modular systems. Most modern programming languages have a package manager of some description to go with them. This lets you install dependencies easily, and it allows those dependencies to have their own dependencies and so on. The advantage of this is that modules can share code, which saves time and speeds up bug fixing.
OK, so let’s imagine you’re building a large web-app and you’ve got lots of dependencies because you know that’s better than re-writing or copy and paste programming. You have broadly 3 options for how you lock down your dependency versions:

* use * to always get the latest version
* use an exact version, such as 1.0.0
* use a range, such as ~1.0.0

Option 1 is bad news. If you use option 1 then every time someone changes their module, yours might break. It means you automatically always get the latest version and you don’t keep any information about what version your app was actually tested on.
Option 2 has two issues. Firstly there’s a false sense of security. You may have used exact specifiers for your dependencies, but the dependencies of your dependencies probably haven’t, so this doesn’t really fully protect you. The other problem is that you won’t automatically get bug fixes. You’ll only get bug fixes if you explicitly upgrade.
Option 3 is usually the best option. If you write your version specifier as ~1.0.0 you will get the latest version that’s been released and begins 1.0. This means that if the versions that have been released were:

0.9.0
1.0.0
1.0.5
1.1.0

you would get version 1.0.5. The key to semantic versioning is that the first number represents large or breaking changes, the second number represents new features that might cause minor backwards incompatibilities (but probably won’t), and the third number represents bug fixes. As such, this specifier leads to you getting all bug fixes automatically, but forces you to manually update if you want new features.
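The way a ~ specifier resolves can be sketched as follows (a simplified model of the rule just described; real npm semver ranges handle more cases, such as pre-releases, than this does):

```javascript
// Simplified model of a ~1.0.0 ("tilde") specifier: match the same
// major.minor, with any patch version at least as new.
function satisfiesTilde(version, range) {
  var r = range.split('.').map(Number);
  var v = version.split('.').map(Number);
  return v[0] === r[0] && v[1] === r[1] && v[2] >= r[2];
}

// Given the releases listed above (in ascending order), ~1.0.0
// resolves to the newest matching version:
var released = ['0.9.0', '1.0.0', '1.0.5', '1.1.0'];
var resolved = released.filter(function (v) {
  return satisfiesTilde(v, '1.0.0');
}).pop();
```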
As a library author the rules are simple: increment the major version when you make a breaking change, the minor version when you add features, and the patch version when you fix bugs.
You might ask yourself, “Does it really matter that much?”. The answer is that it does. If you break things without incrementing the correct version number, then other software that relies on semantic versioning will break. If you frequently increment too high a number, your bug fixes will be slow to propagate and people will run into errors that have been fixed for ages.
There’s one problem I keep seeing from people who normally follow SemVer. What happens when you realise your back-room project has become hugely popular and that your hastily concocted API just isn’t cutting it anymore? You decide it’s time for a total redesign. You’re going to create a totally new version and everyone’s code will break unless they manually change things to be compatible with the new API.
When you’re faced with doing this, you have to be prepared. You will get users who get stung by the change. There will be people who used * as their version specifier when they depended on your library. These people have never learned about SemVer so they just feel cheated that you broke their module. Expect to have to patiently explain SemVer to a few people in a couple of GitHub issues. Perhaps link to this, or another, blog post.
When faced with this prospect, I see lots of library authors asking “why not release it under a new name?”. This is what @mishoo decided to do with uglify-js. Instead of publishing it as uglify-js@2.0.0 he published it as uglify-js2. The long-term problem with starting out this way is that there are still a number of modules that depend on the uglify-js2 version, even though he did ultimately switch back to publishing as uglify-js@2.x.x. You might argue that this was a problem that could have been avoided if he had just stuck to his original plan of calling it uglify-js2.
There are 2 key problems with this:
Discoverability is a problem because all the people using your old version may never find out about the new version. Tools like gemnasium and david won’t report the version as out of date, because they won’t know about the new version. This means people will miss out on awesome features. Perhaps worse though are the newcomers. They may find the old version and stop there. For example, a google search for uglifyjs displays the original GitHub repository for uglify-js before the new one for uglify-js2. This is because @mishoo still hasn’t fixed the mistake of creating a whole new GitHub repo for the new version (rather than making use of branching). Many newcomers would just see that first result and use it. Even the ones who see the second result might see that the first is more popular and assume that the second is some botched attempt at creating a better copy of UglifyJS, and just use what appears to be the original, more popular, official version.
The second problem is that I envisage uglify-js being one of the most popular JavaScript minifiers for at least another couple of years. It’s fully possible that if we keep using JavaScript, and uglify-js stays up to date, it could end up being the most popular minifier for another 10 or 20 years. That’s a long time for us to live with adding a 2 to the end of the name. It might not seem like much, but everyone adding a 2 to the end forever more is a definite pain, and can definitely be avoided. The small pain of breaking a few apps/code briefly now vs. slowing down software development forever is, in my opinion, not a tough choice.
It’s important to use SemVer, even if those around you sometimes get it wrong. If we don’t, then people won’t learn. If you’re writing a library, and you don’t let people get stung because they didn’t follow SemVer, then those people will never learn the valuable lesson about how to specify version numbers. Most of the people who get stung probably won’t be big professional developers, they’ll be people just starting to learn, and it’s better that they learn now than later.
If you’re consuming a library, you should use SemVer, and trust your unit tests (make sure you always write some). This way you’ll find out, and can teach the library authors about SemVer, when they publish versions with breaking changes.
I break this rule myself when depending on my own libraries, where I use * as the version specifier. This rarely results in breaking changes; it’s always me, or another library author, who discovers them, not an end user; and it’s a pain to always have to worry about updating them manually.

Other than this, if you see a library of mine that doesn’t use SemVer, it’s because I wrote it before I knew about SemVer.
Debugging a stack overflow exception is usually pretty simple, providing you have a stack trace. Just look for where the loop is and you’re sorted. That’s no help of course if you don’t have a stack trace, and the gods of C# didn’t see fit to provide you with one when you get a StackOverflowException.
The workaround is to somehow put a break-point just before stack overflow is triggered. This can be done with a conditional break-point. Start by creating a new break-point then right click on it:
Click Condition, and then in the box that appears, type something like:
(new StackTrace()).GetFrames().Length > 60

You may also need to add the following using directive to the top of your file:

using System.Diagnostics;

There you have it: that break-point will only be triggered if the stack depth is over 60. You can increase the number if that doesn’t give you enough information. The maximum depth (the one that causes a stack overflow) is pretty enormous.
Text files may seem like the most fantastic, ubiquitous and simple format. Almost all programs seem to be able to cope with reading and writing them interchangeably, which is great. They aren’t quite as simple as you might think though. Back in the dark ages of computing, when space was at a premium and even text files seemed large, most people in computing spoke English, so they thought that a single byte would be enough to store all the letters. This led to a nice, efficient, simple encoding called ASCII.
It turned out that one byte doesn’t store enough characters (it only gives you 256 and that includes lowercase and uppercase letters and all punctuation). To fill the gap, hundreds of different formats sprang up. They were all slightly different. Many of these will work fine even if you interpret them as ASCII but some won’t.
This was no good: you had to try and guess the format of any text file before you could read or understand it. A solution was needed that could match ASCII for small file size most of the time, but could also encode all the many characters of foreign cultures (and ancient ones).
A format was conceived that used one byte for the most commonly used symbols, but used multiple bytes when more were needed. This format was called UTF8. The 8 stands for the fact that 8 bits are used most of the time. One bit in that first byte is used to indicate whether the character is a multi-byte character. If it is, further bytes will need to be read to determine what that character is.
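That bit-level rule can be sketched with a tiny classifier (a simplification for illustration; it checks the marker bits the scheme above describes):

```javascript
// Classify a single byte per the UTF8 scheme described above: a high bit
// of 0 means a complete one-byte (ASCII) character; bytes starting with
// 11 open a multi-byte sequence; bytes starting with 10 continue one.
function byteRole(b) {
  if ((b & 0x80) === 0x00) return 'ascii';        // 0xxxxxxx
  if ((b & 0xc0) === 0x80) return 'continuation'; // 10xxxxxx
  return 'lead';                                  // 11xxxxxx
}
```

For example ‘é’ encodes as the two bytes 0xC3 0xA9: a lead byte followed by a continuation byte.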
UTF8 is simple, high performance, and can encode practically any character just fine. It’s by far the most popular, so the web is using it almost exclusively.
Yes, I said almost. Web browsers don’t default to UTF8 mode; they instead try to guess. They often guess wrong, so you have to tell them. Always add the following to the head of any HTML file you ever write, and you’ll be fine:
<meta charset="utf-8" />

If you forget to do that, or think you don’t need to use UTF8, bad things will happen to you. If you (or someone you know) is responsible for writing software which either reads or writes text files, tell them to do so in UTF8 by default, and only use other formats when explicitly told they have to.
There is one little problem with using UTF8 (usually on Windows). It’s called the BOM (Byte Order Mark). In order to deal with the difficulty of guessing the file type, some Windows users of UTF8 decided to add a special marker character to the start of each document to indicate that the file was UTF8. Most text editors will understand this marker and not display it. Many other programs will fail when they see it. If you get a couple of weird-looking characters at the start of your document, that’s why. If you’re building a text editor, please get rid of those characters.
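If you do have to cope with BOMs, stripping one is straightforward: in UTF8 the BOM is the byte sequence EF BB BF at the very start of the file. A minimal sketch:

```javascript
// Strip a UTF8 byte order mark (EF BB BF) from the start of a buffer,
// if present, before treating the contents as text.
function stripBom(buf) {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return buf.slice(3);
  }
  return buf;
}
```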
Browserify now supports standalone builds thanks to integration with my umd (universal module definition) library. Universal module definition is a simple tool to help you (as a library author) deal with the fact that all your users are probably stuck working on legacy systems with legacy build managers (such as AMD).
The goal on your part is to write your code once, in the easiest module system, and have all your users get a copy that’s “optimised” for them. You want to have one file that will work in all the build systems out there. By using the umd library or the standalone option in browserify, you get just this.
Consider an example package made of two files:
beep.js
var shout = require('./shout.js');
module.exports = function beep() {
  console.log(shout('beep'));
};

shout.js
module.exports = function shout(str) {
  return str.toUpperCase() + '!';
};

If you’re in the browserify/CommonJS world, you can just require('./beep.js')(); and 'BEEP!' will get logged. If you’re not though, you’ll want a standalone build:
$ browserify beep.js --standalone beep-boop > bundle.js

Or, using the API:

var fs = require('fs');
var browserify = require('browserify');
var b = browserify('./beep.js');
b.bundle({standalone: 'beep-boop'}).pipe(fs.createWriteStream(__dirname + '/bundle.js'));

You’ve now generated the bundle and can provide it to your users.
Depending on the environment you’re in, there are a number of ways you can consume the package:
If you’re in a CommonJS like environment that’s just not quite fully browserify compatible (perhaps you’re using clever transforms), you can just require('bundle'):
app.js
require('bundle.js')();

Logs:

BEEP!

If you’re using RequireJS it works just as well:
index.html
<script src="require.js"></script>
<script src="app.js"></script>

app.js
require(['bundle.js'], function (beep) {
  beep();
});

Logs:

BEEP!

If you’re using Secure Ecma Script you can just call ses.makeBeepBoop:
ses.makeBeepBoop()();

Logs:

BEEP!

If you’re not using a module system at all, you can still access the package. If you’re in a browser it will be at window.beepBoop and if you’re in another environment it’ll be at global.beepBoop. You don’t need to be explicit about the window prefix though:
index.html
<script src="bundle.js"></script>
<script src="app.js"></script>

app.js

beepBoop();

Logs:

BEEP!

This UMD implementation is highly robust. It’s clever enough to prevent any UMD definitions inside files you’ve browserified from getting confused. They won’t see define, ses, bootstrap etc. so they’ll just use the CommonJS option (assuming they have one).
If we don’t support your chosen module system yet, I’m happy to extend things so that UMD does. Just submit a pull request for umd/template.js that adds support for your library and I’ll be sure to accept it (providing it’s not going to break any of the other solutions).