Archive for January, 2010

The Hype Surrounding Apple’s iPad

Thursday, January 28th, 2010

Geez, Apple’s hype machine is ridiculous. Pretty much every other news story and tweet I’ve read has had some sort of reference to the iPad. Just take a look at TechCrunch and Techmeme over the last day and a bit – I’m hoping no company was stupid enough to release anything that day. :P

Building a Web Scraper Using JavaScript and jQuery

Monday, January 25th, 2010

When building web applications, sometimes there’s a need to fetch data from other sources. Perhaps you’re building a custom RSS feed of news items based on your interests or you want to aggregate data from several sites. In any case, it’s not always possible to do this elegantly; you may not have direct access to the raw data and there may be no API to use. For these situations there’s one general (albeit fragile) solution: manually parse the end result when a page is loaded in a browser.

There are many different ways to build a web scraper. A server-side language such as PHP will have a much easier time as it has fewer limitations and there are existing libraries (such as cURL) to accomplish the task. I’ll be using JavaScript (with the jQuery library), however, as I want a standalone client independent of server technology.

Let’s start with a random snippet of code:

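function fetchPage(url) {
    $.ajax({
        type: "GET",
        url: url,
        error: function(request, status) {
            alert('Error fetching ' + url);
        },
        success: function(data) {
            // Assumes the response arrives as an object wrapping the HTML
            // (as with the YQL plugin mentioned below); with a plain
            // same-domain $.ajax() call, data is already the response body.
            parse(data.responseText);
        }
    });
}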

The above function attempts to make an AJAX call to a specified web page, fetch the HTML of said page and pass it to a parsing function. Simple enough, right?

Unfortunately there’s a little problem with this. Scripts running in the browser have to deal with something called the same origin policy, which basically restricts them from reading data from other domains. From a security standpoint this is obviously a good thing; however, it can be a headache for web app developers (indeed, there are ideas such as the Origin HTTP header to solve this). (The following bit can be ignored if the scraper is on the same domain as the data source – but in such a case why is said scraper even necessary?)
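To make "same origin" a bit more concrete: two URLs share an origin only when their scheme, host and port all match. Here's a rough sketch of that comparison using the standard URL class. The sameOrigin() helper and the example URLs are made up for illustration, and this is not how a browser actually enforces the policy.

```javascript
// Hypothetical helper (not part of jQuery or any browser API): compares
// the origins of two absolute URLs.
function sameOrigin(a, b) {
    // URL.origin combines the scheme, host and port.
    return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("http://example.com/a.html", "http://example.com/b/c.html")); // true
console.log(sameOrigin("http://example.com/", "https://example.com/"));              // false (scheme differs)
console.log(sameOrigin("http://example.com/", "http://example.com:8080/"));          // false (port differs)
console.log(sameOrigin("http://example.com/", "http://www.example.com/"));           // false (host differs)
```

So a scraper served from one host counts as "external" to pretty much any page worth scraping.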

In this case, there are a couple of solutions (that I can think of). Both of them take advantage of the fact that JavaScript can load JSON from a different domain (via JSONP, where the data comes back wrapped in a call to a callback function).

  • The first one is writing a complementary server-side script that takes a few arguments (such as the target URL), makes the call, parses the result into JSON format and passes it back to the calling function. It’s a simple idea, but I never really liked it because it introduces a major dependency (which, in my case, can be a pain to deal with). However, it’s definitely a viable solution which works extremely well.
  • The second is using an existing system such as Yahoo’s YQL to fetch the required data and return it in a structured form. This is the method that I’ll be using.

YQL is an interesting little beast. From their site:

The Yahoo! Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services. With YQL, apps run faster with fewer lines of code and a smaller network footprint.

I haven’t had too much time to look into it so I do what all programmers do: Google the living crap out of a problem to find a solution. :P In this case, Chris Heilmann has a great post on how to load external content via various methods including YQL. To simplify things, James Padolsey wrote a plugin that detects an external AJAX call and passes it to YQL automatically.
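For the curious, here's roughly what a YQL request for a page's HTML might look like when built by hand. This is only a sketch: it assumes the public YQL endpoint and its "html" table as I understand them, buildYqlUrl() is a made-up helper, and it doesn't escape double quotes inside the target URL.

```javascript
// Assumption: the public YQL endpoint and its "html" table.
// buildYqlUrl() is a hypothetical helper, not part of jQuery or YQL.
function buildYqlUrl(targetUrl) {
    // The YQL statement itself reads a lot like SQL.
    var query = 'select * from html where url="' + targetUrl + '"';
    return "http://query.yahooapis.com/v1/public/yql" +
        "?q=" + encodeURIComponent(query) +
        "&format=xml" +
        // When passed to $.getJSON(), jQuery swaps the trailing "?" for a
        // generated callback name, which turns the request into JSONP.
        "&callback=?";
}

console.log(buildYqlUrl("http://example.com/"));
```

The resulting URL could then be handed to $.getJSON(), which is the sort of thing the plugin takes care of behind the scenes.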

Anyway, that solves the problem of fetching data from an external source. All that’s left is extracting the relevant pieces for whatever application is being built. Consider the following example:

function parse(data) {
    alert($(data).find("h1").text());
}

The parse() function takes the responseText data from the earlier fetchPage() function and then proceeds to pick at it slowly and painfully. What’s really cool about it is that pretty much all of jQuery’s selectors can be used to select relevant data. In the above case, I’m extracting the text inside the <h1> tags found on a page and outputting it as an alert in the browser (note that text() combines the text of every match, so use a selector such as "h1:first" if only the first one is wanted). Obviously there are more complex uses for this but they are outside the scope of this post. :P

And…that’s it! :D Those two code blocks combined pretty much handle all of the grunt-work in extracting data from external sources. Again, this is a rather fragile solution – it can break if the target page’s HTML changes (one possible solution is to pick unique identifiers or classes). In addition, some sites may have restrictions on their data – but I’m sure everyone reads those long-winded documents. ;)

“What’s Your Fastest Text Input Tool?”

Sunday, January 24th, 2010

Here’s an interesting post on Lifehacker: What’s Your Fastest Text Input Tool?

I’ve been using full-sized keyboards for a *very* long time now so that blows everything else out of the water for me. When I’m motivated I can type at speeds exceeding 100 words per minute, although that’s typically not necessary with the stuff I do.

I’d rank pen and paper as second. It’s pretty much a necessity in university when you’re taking down notes and whatnot. Although some people can get away with using laptops in class, I find that it’s extremely hard to copy out diagrams and mathematical equations. Perhaps this is some indication that I should learn LaTeX or something? :P

Next up would probably be a close race between my Wacom tablet (using Windows’ handwriting recognition software) and my iPod Touch. I’m still rather slow at both, although that might just be because of a lack of practice.

In last place is a standard phone with a three-letters-per-number keypad. Yeah. I don’t use my phone very often and I text even less. :P

Fairy Tail (Chapter 167)

Tuesday, January 19th, 2010

Looks like the latest chapter of Fairy Tail came out a day or so ago (geez, MangaStream is ridiculously fast when it comes to releasing stuff – it didn’t appear on One Manga until much later).

Anyway, I really like how Gajeel is desperately looking for a cat. He must have gotten pwned pretty badly with all those scratches on his face. ;) Something tells me there’s more to them than meets the eye; the conversation between Charle and Happy rings a few bells.

We see Mystogan reveal some of his history with Wendy as well. I’m pretty sure this is the first time he’s done so in the series. :P It kind of reduces the impact of Gildarts’ arrival in the previous chapter though.

Speaking of Gildarts, I find it interesting to see that he has Ivan Dreyar listed under “dislikes” on his guild card at the beginning of the chapter. There must have been some interesting things going on between them during their time together at Fairy Tail.

It looks like we’re in for a new plot arc as well, although it seems kind of sudden. What’s with this “Anima” that Mystogan was talking about and how can it destroy Magnolia before everyone can escape? Eh, we’ll see…

The “A” in “AJAX”

Saturday, January 16th, 2010

Consider the following code snippet using JavaScript (with the jQuery library):

function blah() {
    var pagedata;
    $.ajax({
        type: "GET",
        url: "test.php",
        error: function(request, error) {
            alert("Error: " + error);
        },
        success: function(data) {
            pagedata = data;
        }
    });
    return pagedata;
}

Looks reasonable enough. The purpose of the function is to make a GET request to some page (“test.php” in this case), store the result in a temporary variable and return said variable (of type string).

However, it turns out that this doesn’t work too well. When the function is called, chances are pretty good that the returned value is undefined rather than the expected string. Why? At first I thought it was because of some weird variable scoping rules in JavaScript, but that didn’t really make too much sense.

It turns out that I forgot about the “A” in “AJAX” (i.e. Asynchronous JavaScript and XML). The function above doesn’t execute in a top-to-bottom manner; $.ajax() returns right away and the success callback only runs later, once the response arrives, by which point blah() has already returned. Generally this isn’t too much of a problem (indeed, it’s a valued property by those making web applications) but in this case it isn’t what I want.
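The ordering is easy to see without a server at all. Below is a small sketch where fakeGet(), a made-up stand-in for $.ajax(), delivers its "response" through a callback on a later turn of the event loop (the same pattern, minus the network).

```javascript
// fakeGet() is a hypothetical stand-in for $.ajax(): it hands back some
// data through a callback on a later turn of the event loop.
function fakeGet(url, success) {
    setTimeout(function() {
        success("contents of " + url);
    }, 0);
}

function blah() {
    var pagedata;
    fakeGet("test.php", function(data) {
        pagedata = data; // runs only after blah() has already returned
    });
    return pagedata;     // still undefined at this point
}

console.log(blah()); // undefined
```

The assignment does eventually happen; it's just that nobody is around to see it by then.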

Anyway, there’s a simple solution to this. What I want to do is make the call synchronous with the rest of the script. Fortunately, jQuery makes this extremely easy in the $.ajax() method by providing an option (async: false) to turn this off. Thus:

function blah() {
    var pagedata;
    $.ajax({
        async: false,
        type: "GET",
        url: "test.php",
        error: function(request, error) {
            alert("Error: " + error);
        },
        success: function(data) {
            pagedata = data;
        }
    });
    return pagedata;
}

This locks up the browser while the call is being made so it’s probably not the best solution. It shouldn’t matter too much with small requests though (or requests made with the user’s consent via the UI or something).
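If locking up the browser is a concern, one common alternative is to keep the call asynchronous and pass in a callback instead of returning a value. A minimal sketch of the idea follows, again using a made-up fakeGet() in place of $.ajax() so that it runs on its own; the callback parameter is my own addition and not part of the function above.

```javascript
// fakeGet() is a hypothetical stand-in for $.ajax(), as before.
function fakeGet(url, success) {
    setTimeout(function() {
        success("contents of " + url);
    }, 0);
}

// Instead of returning the data, hand it to whoever asked for it.
function blah(callback) {
    fakeGet("test.php", function(data) {
        callback(data);
    });
}

blah(function(pagedata) {
    console.log("Got: " + pagedata);
});
console.log("blah() returned without blocking");
```

The trade-off is that any code depending on the result has to live inside (or be called from) the callback.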

Avatar

Monday, January 11th, 2010

Throughout the last month or so, I’ve been hearing a lot about James Cameron’s Avatar (words such as “revolutionizing”, “awesomesauce”, “epic”, etc. come to mind). So I finally got to see it last Friday at West Edmonton Mall’s Scotiabank Theatre. I’d say the movie lives up to its hype. :P

The first thing I found out was that the entire movie is in 3-D. I was never a big fan of wearing those funky glasses (and indeed, they’re pretty annoying for anyone wearing actual glasses) but it’s a minor issue for me. Perhaps one day we’ll have true 3-D in the theatres (think holo-projectors). I can dream, right?

Anyway, the movie starts off with Jake Sully – a paraplegic (former) marine – joining the avatar program on Pandora, an Earth-like moon with its own unique environment. The goal of the program is to improve relations with Pandora’s natives so that a human mining operation can continue in peace. Throughout the movie we see Jake slowly preferring his avatar’s lifestyle to his own (Jake’s avatar is a biologically-engineered version of one of the natives, the Na’vi). One major recurring theme is how damaging human tendencies can be to an environment (in this case, making war on the natural inhabitants all in the name of profit); in this aspect I was drawing comparisons between Avatar and Princess Mononoke.

The visuals and animation were mind-blowing. I expected nothing less from a movie whose official budget of $237M makes it one of the most expensive ever made. The 3-D aspect was interesting and really brought out Pandora’s natural beauty; throughout the movie the atmosphere seemed realistic, the native species lifelike. I’d say Avatar is one of the few movies that can truly be compared to a high fantasy novel. It definitely had an impact on my imagination in any case. ;)

Most of the major characters were well-developed. They each had their own beliefs and acted accordingly no matter the consequences. Trudy – a fighter pilot – best summarizes this when she abandons the first human-Na’vi conflict with a “fuck this” and subsequently aids Jake and the rest of the avatar team in the second. It was very possible to relate to them (which adds to the emotional factor when some of them die).

All in all, I’d say Avatar is one of those movies that everyone should watch. Heck, I wouldn’t mind watching it a second time if I can find an IMAX theatre near me. ;)