the Future of the Web
  • Saving data to a file with PHP

    Feb 24 2008

    Lately, I've been skipping MySQL in situations where I just want to store a few variables, like configuration options, and don't want the hassle of setting up a database.

    You can easily store data in a file by using serialize and unserialize to turn a PHP value, such as an array, into a string and back, and then reading and writing that string in a file.
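
    To see what that looks like, here is a minimal sketch of the round trip. The $options array and its keys are made up for illustration.

    ```php
    <?php
    // Hypothetical sample data; any array of scalars or nested arrays works.
    $options = array('width' => 1024, 'theme' => 'dark');

    // serialize() turns the array into a plain string...
    $string = serialize($options);
    echo $string, "\n";
    // a:2:{s:5:"width";i:1024;s:5:"theme";s:4:"dark";}

    // ...and unserialize() turns that string back into an equal array.
    $restored = unserialize($string);
    var_dump($restored === $options); // bool(true)
    ```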

    Here are a few functions that do just that:

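    function get_data($filename) {
        // no file yet means no options have been saved
        if (!file_exists($filename)) {
            return array();
        }
    
        // an empty or unreadable file also counts as "no options"
        $contents = (string) file_get_contents($filename);
        $data = ($contents === '') ? array() : unserialize($contents);
        return is_array($data) ? $data : array();
    }
    
    function get_option($filename, $key) {
        $data = get_data($filename);
        // return null for an unknown key instead of raising a notice
        return isset($data[$key]) ? $data[$key] : null;
    }
    
    function set_option($filename, $key, $value) {
        $data = get_data($filename);
        $data[$key] = $value;
    
        // write to disk (creates the file if needed), locking it while writing
        file_put_contents($filename, serialize($data), LOCK_EX);
    }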
    
    // ideally, keep this file outside the web root so it can't be downloaded
    $config = '../config.dat';
    
    set_option($config, 'width', 1024);
    echo get_option($config, 'width'); // will echo 1024
    

    So there you have it. Feel free to use or modify this code as much as you like. If anyone has an idea for rewriting it to be cleaner, please share in the comments.
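
    One possible direction for a cleaner version, offered only as a sketch: wrap the filename in a small class so it doesn't have to be passed to every call. The class name OptionFile, the $default parameter and the temp-file path are all hypothetical and not part of the code above.

    ```php
    <?php
    // Hypothetical wrapper class; the names here are illustrative only.
    class OptionFile
    {
        private $filename;

        public function __construct($filename)
        {
            $this->filename = $filename;
        }

        // Read the whole file; a missing, empty or corrupt file yields array().
        private function load()
        {
            if (!is_file($this->filename)) {
                return array();
            }
            $raw = file_get_contents($this->filename);
            if ($raw === false || $raw === '') {
                return array();
            }
            $data = unserialize($raw);
            return is_array($data) ? $data : array();
        }

        // Return the stored value, or $default when the key is absent.
        public function get($key, $default = null)
        {
            $data = $this->load();
            return array_key_exists($key, $data) ? $data[$key] : $default;
        }

        // Store one value and rewrite the file under an exclusive lock.
        public function set($key, $value)
        {
            $data = $this->load();
            $data[$key] = $value;
            file_put_contents($this->filename, serialize($data), LOCK_EX);
        }
    }

    // Assumed location: a scratch file in the system temp directory.
    $config = new OptionFile(sys_get_temp_dir() . '/option_file_demo.dat');
    $config->set('width', 1024);
    echo $config->get('width'), "\n";       // 1024
    echo $config->get('height', 768), "\n"; // 768, the fallback value
    ```

    The file is still re-read on every call, which keeps the behaviour the same as the functions above; caching the array in the object would be a further step.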

  • Easy web scraping with PHP

    Feb 17 2008

    Web scraping is a technique of web development where you load a web page and "scrape" the data off the page to be used elsewhere. It's not pretty, but sometimes scraping is the only way to access data or content from a web site that doesn't provide RSS or an open API.

    I'm not going to discuss the legal aspects of scraping here, except to note that it may be considered copyright infringement in some situations. However, there are also perfectly legitimate reasons to scrape, such as when you have permission.

    To make things really easy, we're going to let the power of regular expressions do all the work for us. If you're not familiar with regular expressions, you may want to google for a tutorial. Here is the documentation for PHP regular expression syntax.

    First, we start off by loading the HTML using file_get_contents. Next, we use preg_match_all with a regular expression to turn the data on the page into a PHP array.

    This example will demonstrate scraping this web site's blog page to extract the most recent blog posts. This is just for demo purposes - of course, the RSS feed is much better suited for this.

    // get the HTML
    $html = file_get_contents("http://www.thefutureoftheweb.com/blog/");
    

    Here is what the HTML looks like for the blog posts:

    <ul id="main">
        <li>
            <h1><a href="[link]">[title]</a></h1>
            <span class="date">[date]</span>
            <div class="section">
                [content]
            </div>
        </li>
    </ul>
    

    So we will use a regular expression that looks for all the li elements and captures the content using parentheses at the appropriate places (link, title, date & content).

    preg_match_all(
        '/<li>.*?<h1><a href="(.*?)">(.*?)<\/a><\/h1>.*?<span class="date">(.*?)<\/span>.*?<div class="section">(.*?)<\/div>.*?<\/li>/s',
        $html,
        $posts, // will contain the blog posts
        PREG_SET_ORDER // formats data into an array of posts
    );
    
    foreach ($posts as $post) {
        $link = $post[1];
        $title = $post[2];
        $date = $post[3];
        $content = $post[4];
    
        // do something with data
    }
    

    There's a lot going on inside that regular expression, but there are really only a few "tricks" that are used. Any time I want to say "skip over whatever is between" I use .*?, which matches as little as possible. Any time I want to say "match whatever is in here and keep it" I use (.*?). And lastly, the s at the end tells PHP to allow the dot . to match newlines. That's about all there is to it.
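
    Here is a small sketch of those tricks in isolation. The sample strings are invented for the demo and have nothing to do with the real page.

    ```php
    <?php
    // Hypothetical snippet: two links on separate lines.
    $html = "<a href=\"/one\">One</a>\n<a href=\"/two\">Two</a>";

    // Greedy (.*) grabs as much as it can, so the first group runs all
    // the way into the second link.
    preg_match('/<a href="(.*)">(.*)<\/a>/s', $html, $greedy);

    // Lazy (.*?) stops at the first place the rest of the pattern fits.
    preg_match('/<a href="(.*?)">(.*?)<\/a>/s', $html, $lazy);
    echo $lazy[1], ' ', $lazy[2], "\n"; // /one One

    // Without the s modifier the dot refuses to cross a newline.
    $block = "<div>\nhello\n</div>";
    var_dump(preg_match('/<div>(.*?)<\/div>/', $block));  // int(0)
    var_dump(preg_match('/<div>(.*?)<\/div>/s', $block)); // int(1)
    ```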

    The regular expression will only match blog posts, because they are the only <li> elements that contain an <h1>, <span class="date"> and <div class="section">.
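
    To try the pattern without hitting the network, here is a sketch that runs it against a hand-written page. The sample markup, including the navigation list, is an assumption modelled on the structure shown earlier, not the site's real HTML.

    ```php
    <?php
    // Hypothetical stand-in for the downloaded page: one navigation item
    // followed by two posts.
    $html = implode("\n", array(
        '<ul id="nav"><li><a href="/about">About</a></li></ul>',
        '<ul id="main">',
        '  <li>',
        '    <h1><a href="/blog/first">First post</a></h1>',
        '    <span class="date">Feb 17 2008</span>',
        '    <div class="section">Hello</div>',
        '  </li>',
        '  <li>',
        '    <h1><a href="/blog/second">Second post</a></h1>',
        '    <span class="date">Feb 24 2008</span>',
        '    <div class="section">World</div>',
        '  </li>',
        '</ul>',
    ));

    preg_match_all(
        '/<li>.*?<h1><a href="(.*?)">(.*?)<\/a><\/h1>.*?<span class="date">(.*?)<\/span>.*?<div class="section">(.*?)<\/div>.*?<\/li>/s',
        $html,
        $posts,
        PREG_SET_ORDER
    );

    foreach ($posts as $post) {
        echo $post[3], ': ', $post[2], ' (', $post[1], ")\n";
    }
    // Feb 17 2008: First post (/blog/first)
    // Feb 24 2008: Second post (/blog/second)
    ```

    One thing worth noticing: the first match actually starts at the navigation <li>, because .*? happily skips forward to the first <h1>. The captured groups still come from the post, so the result is two posts, but it shows how much this approach leans on the page layout.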

    Web scraping is highly unreliable - if the HTML structure were to change, this code would break instantly. However, it's often quite easy to write this code, and it usually produces a perfectly usable hack solution.
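
    Since a silent failure is the main risk, one option is to at least notice when nothing matches. This is only a sketch: the extract_posts helper, the exception message and the test markup are hypothetical.

    ```php
    <?php
    // Hypothetical helper: returns the matched posts, or throws when the
    // page no longer has the expected structure.
    function extract_posts($html)
    {
        $pattern = '/<li>.*?<h1><a href="(.*?)">(.*?)<\/a><\/h1>.*?'
                 . '<span class="date">(.*?)<\/span>.*?'
                 . '<div class="section">(.*?)<\/div>.*?<\/li>/s';

        $count = preg_match_all($pattern, $html, $posts, PREG_SET_ORDER);

        // preg_match_all returns false on error and 0 when nothing matched.
        if ($count === false || $count === 0) {
            throw new RuntimeException('No posts found; has the HTML changed?');
        }
        return $posts;
    }

    // file_get_contents returns false when the download fails, so the
    // real call would need a check along these lines:
    // $html = file_get_contents("http://www.thefutureoftheweb.com/blog/");
    // if ($html === false) { /* report the failure and stop */ }

    try {
        extract_posts('<ul id="main"><li><h2>Redesigned markup</h2></li></ul>');
    } catch (RuntimeException $e) {
        echo $e->getMessage(), "\n"; // No posts found; has the HTML changed?
    }
    ```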

  • See all the articles

    Feb 12 2008

    I've just added a new page where you can see a listing of all the articles I've written (this article is my 181st). This might be an easier way to see older articles than going page by page or month by month. Check it out: All Articles

  • IBM: Where and when to use Ajax

    Feb 6 2008

    My second IBM developerWorks article is now online: Where and when to use Ajax in your applications.

    It's not a very technical article, so you can read it even if you've never programmed before. I talk about the benefits of using Ajax, and point out some problem areas that need special attention so that Ajax doesn't end up ruining your web site. It's essentially a summary of my Unobtrusive Ajax book.

    The article was fun to write and I hope you enjoy reading it!

  • Code Igniter 1.6.0 Released

    Feb 4 2008

    The long-awaited new Code Igniter version just came out last week - check out the announcement, the download, the Change Log, and (if you're updating) the update instructions.

    I'm just installing it now, but looking at the Change Log it seems it'll be a really stable release, considering how stable the last release was.


  • Jesse Skinner

Copyright © 2013 The Future of the Web