• How to use Node.js Streams (And how not to!)

    Oct 30 2019

    When I first started to understand Node.js streams, I thought they were pretty amazing. I love JavaScript Promises, but they only resolve to one result. Streams, however, can provide a constant stream of data, as you might expect!

    Functional Reactive Programming is all the rage these days. Libraries like MobX, RxJS and Highland.js make it easy to structure your front-end application as data flowing in one direction down through a chain of pipes.

    You can pipe a stream to another stream so that the output of the first becomes the input to the next. Sounds like a really neat way to structure an application, right?

    I've already rewritten a lot of my JavaScript code to use Promises. Are streams the next step in the evolution? Is it time to rewrite all our applications to use Node streams? (Spoiler: NO!)

    Unix pipes are the best

    I love working with pipes in Linux (or Unix). It's really nice to be able to take a text file, pipe that into a command, pipe the output to another command, and pipe the output from that into a final text file.

    Here's an example of using the power of pipes on the command line. It takes a text file with a list of words, sorts the list, counts how many times each word appears, then sorts the counts to show the top 5 words:

    $ cat words.txt | sort | uniq -c | sort -nr | head -n5

    It's not important for you to understand these commands, just understand that data is coming in to each command as "Standard Input" (or stdin), and the result is coming out as "Standard Output" (or stdout). The output of each command becomes the input to the next command. It's a chain of pipes.

    So can we use Node.js in the middle of this chain of pipes? Of course we can! And Node streams are the best way to do that.

    Going down the pipe

    Node.js streams are a great way to work with a massive set of data, more data than could possibly fit into memory. You can read data from stdin a piece at a time, process it, then write it to stdout.

    For example, how would we make a Node CLI application that capitalizes text? Seems simple enough. Let's start with an application that just takes stdin and pipes it directly to stdout. This code does almost nothing (similar to the Unix cat command):

    process.stdin.pipe(process.stdout);

    Now we can start using our Node.js application in the middle of our pipeline:

    $ cat words.txt | node capitalize.js | sort | uniq -c | sort -nr | head -n5

    Pretty simple, right? Well, we're not doing anything useful yet. So how do we capitalize each line before we output it?

    npm to the rescue

    Creating our own Node streams is a bit of a pain, so there are some good libraries on npm to make this a lot easier. (I used to heavily use a package called event-stream, until a hacker snuck some code into it to steal bitcoins!)

    First, we'll use the split package, a stream that splits an input into lines, so that we can work with the data one line at a time. If we don't do this, we might end up with chunks containing multiple lines, partial lines, or even partial multi-byte Unicode characters! It's a lot safer to use split and be sure we're working with a single, complete line of text each time.

    We can also use a package called through which lets us easily create a stream to process data. We can receive data from an input stream, manipulate the data, and pipe it to an output stream.

    const split = require('split');
    const through = require('through');
    
    process.stdin
    	.pipe(split()) // split the input into lines
    	.pipe(
    		through(function(line) {
    			// emit the uppercased line to the next stream in the pipeline
    			this.emit('data', line.toUpperCase());
    		})
    	)
    	.pipe(process.stdout);

    There is a bug in the code above: split strips the newline characters out, and we never add them back in. No problem. We can create as many reusable streams as we want, to split our code up:

    const through = require('through');
    const split = require('split');
    
    function capitalize() {
    	return through(function(data) {
    		this.emit('data', data.toUpperCase());
    	});
    }
    
    function join() {
    	return through(function(data) {
    		// re-add the newline that split() stripped out
    		this.emit('data', data + '\n');
    	});
    }
    
    process.stdin
    	.pipe(split())
    	.pipe(capitalize())
    	.pipe(join())
    	.pipe(process.stdout);

    Isn't that lovely? Well, I used to think so. There's something satisfying about having the main flow of your application expressed through a list of chained pipes. You can pretty easily imagine your data coming in from stdin, being split into lines, capitalized, joined back together with newlines, and streamed to stdout.

    Down the pipe, into the sewer

    For a few years, I was really swept up in the idea of using streams to structure my code. Borrowing from some Functional Reactive Programming concepts, it can seem elegant to have data flowing through your application, from input to output. But does it really simplify your code? Or is it just an illusion? Do we really benefit from having all our business logic tied up in stream boilerplate?

    It's worse than it looks too. What if we emit an error in the middle of our pipeline? Can we just catch the error by adding an error listener to the bottom of the pipeline?

    process.stdin
    	.pipe(split())
    	.pipe(capitalize())
    	.pipe(join())
    	.pipe(process.stdout)
    	.on('error', e => console.error(e)); // this won't catch anything!

    Nope! It won't work, because errors don't propagate down the pipe. It's nothing like Promises, where you can chain .then calls and throw a .catch at the end to catch all the errors in between. No, you have to add an error handler after each .pipe to be sure:

    process.stdin
    	.pipe(split())
    	.pipe(capitalize())
    	.on('error', e => console.error(e))
    	.pipe(join())
    	.on('error', e => console.error(e))
    	.pipe(process.stdout);

    Yikes! If you forget to do this, you could end up with an "Unhandled stream error in pipe." with no stack trace. Good luck trying to debug that in production!
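
    If you're on a recent version of Node (10 or newer), there is some built-in help for this: stream.pipeline() connects a chain of streams and reports an error from any one of them to a single callback. A minimal sketch, reusing the capitalize() and join() helpers from above:

    const { pipeline } = require('stream');
    const split = require('split');
    
    // pipeline() wires the chain together; the final callback fires
    // if any stream in the chain emits an error
    pipeline(
    	process.stdin,
    	split(),
    	capitalize(),
    	join(),
    	process.stdout,
    	e => {
    		if (e) console.error(e);
    	}
    );

    That takes care of the error handling, but it doesn't make the stream boilerplate itself go away.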

    Conclusions and recommendations

    I used to love streams but I've had a change of heart recently. Now, my advice is to use data and error listeners instead of through streams, and write to the output instead of piping. Try to keep the number of streams to a minimum, ideally just an input stream and an output stream.

    Here's a different way we can write the same example from above, but without all the hassle:

    const split = require('split');
    const input = process.stdin.pipe(split());
    const output = process.stdout;
    
    function capitalize(line) {
    	return line.toUpperCase();
    }
    
    input.on('data', line => {
    	output.write(capitalize(line));
    	output.write('\n');
    });
    
    input.on('error', e => console.error(e));

    Notice I'm still piping to the split library, because that's straightforward. But after that, I'm using a listener on the input's data event to receive the data, and using write() to send the result to stdout.

    And notice that my capitalize() function no longer has anything to do with streams. That means I can easily reuse it in other places where I don't want to use streams, and that's a really good thing!
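
    If you're on a newer version of Node, you can go one step further and drop the event listeners too, using async iteration. The built-in readline module splits input into lines, and readline interfaces are async iterable as of roughly Node 11.14. A rough sketch, assuming a Node version with that support:

    const readline = require('readline');
    
    async function main() {
    	// readline splits stdin into lines, like split() did
    	const lines = readline.createInterface({ input: process.stdin });
    
    	for await (const line of lines) {
    		process.stdout.write(line.toUpperCase() + '\n');
    	}
    }
    
    main().catch(e => console.error(e));

    The business logic stays plain JavaScript, and any errors funnel into an ordinary catch.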

    I still think Node streams are interesting but they are not the future of JavaScript. If used carefully, you can make pretty powerful command-line tools with Node.js. Just be careful not to overdo it!



  • Keeping a Live Eye on Logs

    Oct 22 2008

    In web development, you can get really used to getting good error messages. If you're coding in PHP, you come to rely on the web server telling you if a variable is undefined or a file is missing.

    Likewise, when you're debugging, sometimes the easiest sanity check is to output some text. You'll find me using echo "wtf?"; die; quite often when I'm trying to figure out why a block of code isn't being executed.

    But sometimes, even these rudimentary methods of developing aren't available to you. Maybe error messages are turned off on the web server you're stuck debugging. Or maybe you have a lot of Ajax requests flying around and are tired of peeking at the result of each request.

    Log files to the rescue. Nearly every framework and server has logging functionality. Apache usually logs every request, so you can see when images, static files, and other requests are made. CodeIgniter has several levels of logging, including DEBUG, ERROR and INFO. Whatever framework you use, or if you're using one you built yourself, you'll probably find some way of writing log messages to a file.

    And if you have a log file that's being updated, you have a way of watching things happen in real time. I'm not talking about opening up the log file every now and then and looking back through it, I'm talking about watching it being updated as it happens.

    Let's say you have a web server running Linux and are trying to debug something using CodeIgniter. You SSH into the web server, find the log file, and tail -f it. tail is a command which will dump the end of a file out for you. If you add the -f parameter, it will "follow" the file, printing out any new lines as they are added to the log file.

    $ cd thefutureoftheweb.com/system/logs
    $ tail -f log-2008-10-23.php
    

    If you're working locally on Windows, make sure you have Cygwin installed. (Cygwin gives you a Linux-style console in Windows so you can do awesome stuff like this.) With Cygwin, you'll have tail available and can use it in exactly the same way. Mac and Linux users should have tail installed already.

    Back at my old job, we used to keep console windows open using tail to watch the live web servers, especially after an update. This way we could see errors appearing to users live and could investigate and solve problems that would otherwise a) get buried in the logs, or b) go completely unnoticed.

    It can really help to get in the groove of having one or two terminal windows open tailing the logs of the servers you're currently working on. And then if you get the urge to write echo "wtf?"; die; you can instead write log_message('info', 'wtf?'); (or equivalent in your framework of choice).
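
    And if the log is noisy, you can pipe tail into grep to watch only the lines you care about. For example, to watch just the errors scroll by (adjust the pattern to whatever your framework writes):

    $ tail -f log-2008-10-23.php | grep ERROR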

    Happy tailing!

  • Update a Dev Site Automatically with Subversion

    Jan 19 2008

    If you're using Subversion during development (and you really should be using some kind of version control system), you can wire it up so that your development site will be updated automatically every time you commit a file. And it's easy!

    Well, it's really easy if your Subversion server and development web server are the same machine. If they're not, it's still possible, but outside the scope of this article. You'll also want to be familiar with the command line, shell scripting and Subversion before attempting this stuff.

    The first thing is to make sure your development server is a Subversion working copy, or in other words, that you can go into the dev site folder and run "svn update" to update the site. So if you've been using "svn export" or something painful like FTP, you may need to replace the dev site with a folder created using "svn checkout".
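
    For example, something like this checks out a working copy into your dev site folder (substitute your own repository URL and path):

    $ svn checkout http://svn.example.com/myproject/trunk /var/www/dev-site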

    Okay, once you can update the dev site using Subversion, all you need to do is edit or create a file called "post-commit" inside the "hooks" folder of the Subversion repository. If you look in that folder, there will probably be a bunch of example files like "post-commit.tmpl" showing what you can do. Create the post-commit file by copying the example, like "cp post-commit.tmpl post-commit", then edit it. Make sure the file is executable (e.g. "chmod +x post-commit"), or Subversion will quietly skip it.

    Inside that file, there will be some example code like:

    /usr/lib/subversion/hook-scripts/commit-email.pl "$REPOS" "$REV" commit-watchers@example.org
    

    You'll want to remove or comment out this line and stick in your own scripting. You can put any commands in here that you want to run after each commit. For example, to update your dev site, you might have something like this:

    cd /var/www/path/to/website
    svn update >> /path/to/logfile
    

    That's it!
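
    Putting it all together, the whole post-commit file might look something like this (just a sketch; adjust the paths to your setup). Subversion passes the repository path and revision number in as arguments, and runs the hook with an empty environment, so it's safest to use full paths:

    #!/bin/sh
    # Subversion passes these in automatically
    REPOS="$1"
    REV="$2"
    
    # update the dev site and log the output
    cd /var/www/path/to/website
    /usr/bin/svn update >> /path/to/logfile 2>&1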

    If you run into problems and you used the logfile like in the example, you can have a look in there and see if there are any error messages. I often have problems with permissions, so you may want to change the permissions in the dev folder (e.g. "chmod -R 770 *").

    This works really well when more than one person is working on a set of files. Instead of 7000 files like "file.html.backup_jesse_19-01-2008", you can just commit and see the changes instantly. It might seem annoying to have to commit files every time you make a change, but it's no harder than uploading files over FTP every time, and arguably easier.

  • Redirecting after POST

    May 22 2007

    When working with forms, we have to think about what will happen when someone clicks back or forward or refresh. For example, if you submit a form and right afterwards refresh the page, the browser will ask if you want to resend the data (usually in a pretty long alert box talking about making purchases).

    People don't always read alert boxes, and often get used to clicking OK all the time (I know I fall in this category), so sometimes comments and other things get submitted more than once.

    To solve this, you can simply do an HTTP redirect after processing the POST data. This is possible with any server-side language, but in PHP it would look something like this:

    if (count($_POST)) {
        // process the POST data
        add_comment($_POST);
    
        // redirect to the same page without the POST data
        header("Location: ".$_SERVER['PHP_SELF']);
        die;
    }
    

    This example assumes that you process the form data on the same page that you actually want to go to after submitting. You could just as easily redirect to a second landing page.
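
    For example, redirecting to a hypothetical thanks.php landing page would just be:

    header("Location: thanks.php");
    die;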

    On this site, on each blog post page, I have a form that submits to the same blog post page. After processing the comment, I send a redirect again to the same page. If you add a comment and then refresh, or click back and then forward, the comment won't be submitted twice. (However, if you click back and then click Add Comment again, it will. I really should filter out duplicates, but that's another topic.)

    This works because you essentially replace a POST request with a GET request. Your browser knows that POST requests are not supposed to be cached, and that you should be warned before repeating a POST request. After the redirect, the page is the result of a simple GET request. Refreshing the page simply reloads the GET request, leaving the POST request lost between the pages in your browser history.

  • Helping visitors with .htaccess

    May 21 2007

    When I changed all my URLs, I put in place something to email me whenever there was a 404 (page not found). This way, if I screwed up something with my forwarding, I'd know.

    It turned out that people were getting 404s mostly for one of two mistakes. Either there were spaces in the URL (a copy-and-paste error, perhaps?), or there was a trailing period at the end of the URL (probably from it being part of a sentence, with the period becoming part of the URL when it was auto-linked).

    The two broken URLs look like this:

    http://www.thefutureoftheweb.com/blog/helping-visitors-with-htac cess
    
    http://www.thefutureoftheweb.com/blog/helping-visitors-with-htaccess.
    

    I thought I'd make things easier for people by auto-correcting these two mistakes. I added a few lines to my .htaccess file like so:

    RewriteEngine on
    
    # remove spaces in URL as a favour to visitors
    RewriteCond %{REQUEST_URI} .+\s+.+
    RewriteRule ^(.+)\s+(.+)$ /$1$2 [L,R=301]
    
    # remove trailing periods on URL as a favour to visitors
    RewriteCond %{REQUEST_URI} \.+$
    RewriteRule ^(.*)\.+$ /$1 [L,R=301]
    

    Unlike my major changes the other day, I'm not obliged to maintain these rewrites forever — after all, they are still mistakes in the URL, and I'm not giving out URLs with spaces and dots in them on purpose. However, I'd rather bring people to the correct page if I can.

  • PHP vs. Ruby on Rails: Update

    May 15 2007

    It's now been six months since I announced I had switched from PHP to Ruby on Rails. Reader Richard Wayne Garganta wrote me to ask:

    So, now that you have worked with rails a while - are you still in love with it? I tried it a while back and found out my biggest problem was deployment. I have considered returning to it. Your opinion?

    Originally, I decided to start using rails because I was getting bored doing server-side development. I asked myself, why was it so much fun to program in JavaScript, but so boring to program in PHP? I figured it might be the programming language itself that was boring me, so I took a stab at learning Ruby on Rails.

    Ruby is a really great dynamic language that is fun to work with (though somewhat tricky to get used to). Rails is a great framework, especially when it comes to using ActiveRecord to simplify working with complex data models. I think that Ruby on Rails is a great way to build a complex web site.

    Rails also makes it much easier to set up tests (which I was pretty lazy about) and to have separate development and live environments. There are a lot of great solutions to common web development problems that make Rails a lot more fun to work with, especially on big projects.

    Eventually I realised that server-side programming in Rails, while a bit easier and more fun, was still server-side programming. Even though I didn't have to worry about writing SQL and building forms, the types of problems and challenges were basically the same as in any server-side language. I realised that even a new language and framework wouldn't change server-side programming altogether. At the end of the day, I still have a lot more fun coding with JavaScript, Ajax, CSS and HTML.

    Lately, though, I've been having a lot more fun coding simple templates using PHP. There's something kind of simple and sweet about making a page dynamic just by putting a few lines of code in a standalone template. MVC is the only way to build a large site or application that is easy to manage, but if you're doing something simple, it's definitely overkill. It's no surprise that the 37signals and Ruby on Rails web sites run on PHP.

    So the moral of the story? Use Ruby on Rails for applications, use PHP for simple web sites, and don't use either of them if your passion is client-side development. :)

  • Vanilla on Rails: The Coexistence of PHP and Ruby

    Dec 4 2006

    I'm going to debunk another myth that might keep you from trying out Ruby on Rails (or any other new server language). MYTH: Once you start using Rails, you have to do everything in Rails.

    I wanted to integrate a forum into my new Rails site. So I took a look at the Rails forums out there and found a whopping three: Beast, RForum and Opinion. Unfortunately, they all suck. Ok, to be fair, they're all rather new, and are still in development. But they still suck.

    At first, I was okay with using a crappy forum. But I didn't just need a standalone forum — I needed to totally integrate the forum into my existing site, users and all. But rails apps want to run on their own server. I don't know why, they're all claustrophobic or something. When you try to get two of them to share a server, one tends to peck the other's eyes out like it's a rails-powered cockfight.

    So after way too much time spent cleaning up feathers and blood, I decided to give one of the classic PHP forums a chance. I decided Vanilla was going to be my forum of choice, and I got to work.

    It turned out that it took me less time to integrate Vanilla with my Rails app than all the time I had spent recoding routes.rb to make Beast work. Here are some tips for getting a PHP app to coexist with Rails:

    1. Install the forum inside /public/

      PHP apps are very happy to live inside a subdirectory inside another site. They love it. So what better place to drop one than inside a Rails app's public directory? (Try doing that with a Rails forum.)

      This also has the benefit of having a single domain, which makes sharing cookies slightly easier.

    2. Let the rails app and php forum share a database

      This isn't a requirement, but it just simplifies things slightly. Vanilla uses a table naming convention like LMU_*, and I'm sure most other (non-rails) forums do the same, so the two can coexist fairly easily. This way you can do stuff like joins across your own tables and the forum tables.

    3. Manipulate the forum tables from Rails

      Rather than share a single 'users' table (which is pretty much the only way you'd be able to get two rails apps to coexist in a single database), just add rows to the forum's user table every time a user signs up on your rails app. And of course, don't forget to use a foreign key to relate the two.

      Vanilla makes single sign-on easy through its "Remember me" feature. Basically, if 2 cookies are set ('lussumocookieone' for the UserID, and 'lussumocookietwo' for the VerificationKey by default), then the user is automatically logged in. So all you have to do is look up (or write) the VerificationKey in LMU_Users and set these cookies whenever a user logs in to your rails app (see the sketch after this list).

    4. Delete PHP's cookies to destroy a PHP session

      This is a really easy hack. Just erase the 'PHPSESSID' and '_session_id' cookies, and the user will be logged out from the forum. This way you can have a single sign-out too.
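
    To make the single sign-on in tip 3 a bit more concrete, the login side in Rails might look something like this sketch. (It's only an illustration: forum_user_id and find_or_create_verification_key are hypothetical names for your foreign key and for whatever code looks up or writes the VerificationKey in LMU_Users; the cookie names are Vanilla's defaults.)

    # call this after a user successfully logs in to the rails app
    def set_forum_cookies(user)
    	cookies['lussumocookieone'] = user.forum_user_id.to_s
    	cookies['lussumocookietwo'] = find_or_create_verification_key(user)
    end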

    Okay, this post got pretty specific. But the lessons can be applied to the coexistence of any server-side apps. Communicate via the database and cookies, and you can pretty much do anything.

  • Switching from PHP to Ruby on Rails

    Nov 30 2006

    Normally, I try to avoid server-side programming topics. But this time, I thought I'd share my story to perhaps inspire some of you to try something new.

    I switched to working in Ruby on Rails this month. Lots of people have done the switch, and even more have written about how Ruby on Rails makes coding web sites a lot more fun and easy (it's true!). But that's not what this story is about.

    Until I switched, I was a PHP programmer. It's not just that I did a lot of work in PHP, it's that this is what clients expected of me, or so I thought.

    Unfortunately, I was getting really bored of doing PHP programming. I've never been passionate about PHP, it's just something I know that I use to get the job done. Code Igniter had gotten me excited about coding in PHP again, but it just wasn't enough.

    I've had a lustful eye on Ruby on Rails since I first heard about it a year or two ago. Yet it stayed on my long-term To Do List, never quite becoming a reality. "When I have more time, I'll figure it out and start using it," I thought. "One of these days I'll do a small project for myself in Rails so I can learn it," I told myself for months.

    I felt like I was stuck. PHP was what I knew, what I had used for years, and what I was best at. I had never used Rails, so I certainly didn't feel qualified enough to sell myself as a Rails developer. On top of this, I had three major projects coming up that were all supposed to be done in PHP.

    Then one day, I asked myself why I was so willing and excited to work in JavaScript but not PHP. Why is one fun and the other painful? I had always thought of the difference as just client versus server, but then I figured out it might just be the language itself. So I decided the only way to keep my sanity was to switch. There was no way I would keep doing this PHP thing.

    So, I got in touch with my clients and asked if they'd be willing to build the projects with Ruby on Rails instead of PHP. They couldn't care less. All they wanted was the finished project, and it didn't matter to them how it was done. One client even said the only reason she had mentioned PHP was because it seemed like the most common, but really didn't care. They didn't even mind that I was just starting to learn, because they knew it would make the project more fun for me, and they trusted me.

    So I did it. I've been coding in Rails since the start of the month, and it's been a great time. Sure, there was a learning curve. It took me some time to figure out how to do the simplest of things. But I read through the book, I experimented, I searched the web for answers, and now I'm cruising. I'm about 80% as good in Rails as I am in PHP, except with Rails everything takes half the time so in the end it's actually faster.

    So what's the moral of the story? If there's something new you want to start doing, or if you're getting bored, just go change things. Today. Create your own opportunities. And stop finding excuses in those around you for your inability to change, because there's a good chance they will totally support you.

  • Java UI Toolkits (for the Web)

    May 22 2006

    One thing that was big at JAX was these toolkits that allow Java developers to program user interfaces the way they're used to, using libraries similar to SWT or Swing.

    Two big ones I saw were Rich AJAX Platform (RAP), based on SWT, and wingS, based on Swing.

    Now, Google has released the Google Web Toolkit (GWT). Not tightly based on a previous Java UI framework, GWT seems to be another good option for developing complex web user interfaces in Java.

    These toolkits make it a lot easier for Java developers to build web user interfaces without having to master CSS and JavaScript. Java developers can stay in Java and have web interfaces generated for them. The result will be more rich web-based applications, something we will all benefit from.

    Personally, I still have a lot of fun working with JavaScript and CSS by hand. I don't know if I'll ever start using one of these code-generating frameworks. I suspect those of us who have the patience and passion to put up with this stuff are in the minority.

  • Code Igniter

    May 14 2006

    I'm absolutely in love. While I was bored at JAX, I searched around for a PHP framework like Ruby on Rails. I already knew about CakePHP, but I wasn't convinced. I looked at a few others, but nothing caught my eye. Then I discovered Code Igniter.

    Code Igniter comes from the people who make Expression Engine. I had already heard great things about that, and I had even considered purchasing a license. Code Igniter, however, is free and open source. It's quite new (first beta was released in February) but it is incredibly professional and already very stable.

    Code Igniter does absolutely everything I want it to, and nothing I don't want it to. It's incredibly simple and clean, so there are no surprises or weird tricks. It forces you to organize your code using an MVC structure (actually, a VC structure — using a model is optional). This keeps your code cleaner and easier to maintain. It also comes with a number of libraries that help with common web development things like email and uploaded files.

    This weekend, I rewrote my whole custom-made blog code for this site. It only took about 4 or 5 hours, and it was actually fun to do. It also reduced the amount of code I had, and makes it much, much easier to maintain and change in the future. For example, until now I was too lazy to add contact pages properly, so I just added blog articles for Contact Me, etc. and pointed links at these. Now, I've changed the pages to use /contact/me and /contact/hire, and I could easily reuse my blog template. This change took about 10 minutes.

    By default, URLs are of the form /class/function/parameters. But if you want to do something different (I use /blog/2006/5/article-name), you can set up routing rules for anything you want. Actually, Code Igniter is totally flexible to let you do whatever you want. Anytime I got stuck, I poked around in the documentation and found that there was something in place specifically for my problem.

    Also wonderful: only the minimal amount of PHP is loaded to create each page. You can load classes globally, if you need them, but by default, you only load what you need when you need it. This keeps every page as fast as possible, something I was worried about with other frameworks like CakePHP.

    Okay, that's enough ranting. If you use PHP, check out Code Igniter. There are some videos you can watch to see just how easy Code Igniter really is. The user guide is also a pleasant read and explains everything really well.

  • XAMPP Review

    Apr 21 2005

    I installed XAMPP Version 1.4.13 on Windows XP. The first thing that impressed me about XAMPP was that the installation didn't require a reboot. Every other lazy program seems to need a reboot after installation, but this one doesn't. And it's server software!

    XAMPP is started in a console window. This way it's not running when you don't need it. When you first view localhost in the browser, there is an interface to let you test your installation and view demo applications. There are also links to phpMyAdmin, so it's easy to go in and start setting up your database. I'm also impressed at the performance. My laptop didn't seem to mind running the server.

    If you're doing PHP and MySQL development, try this out. It's only 25 megs, it takes no time to get up and running, and it's easy to uninstall -- just delete the folder. You have nothing to lose.

  • XAMPP

    Apr 21 2005

    Though it came out a few years ago, I just heard about XAMPP, a free package combining PHP, Perl, MySQL and Apache. It comes with phpMyAdmin, FileZilla FTP Server, Webalizer and lots more.

    The best part is, it's designed to be run on development computers. It installs into a single directory, and makes no changes to the registry or anything. It's perfect. I'm really impressed that this exists. I've always avoided working locally on my desktop, but now I'm going to give it a try.