Sunday, December 19, 2010

The Rights and Wrongs of Dynamic Pages

I was recently reading this article over at The Economist. The content of the article aside, it made one other thing come to mind: with great power comes great responsibility.

As I began reading the article, and thus scrolling, all of a sudden my screen began to look like a cluttered mess. A full-screen-width bar dropped down from the top with please-for-the-love-of-God-share-this-article-on-all-your-social-networking-sites buttons and a search box. Besides being redundant, since those features are all already embedded on the page, it was distracting. About the same time a square box slid up from the bottom telling me I needed to subscribe to the magazine, again covering up the content of the article. And again, there is already an advertisement-like area near the top of the page offering four free issues and telling you to subscribe.

The passive versions of these features I am fine with, but the two slide-in boxes are too much because they appear after you have already begun to read, distracting your attention and covering up the content you're there to see in the first place. It is a very in-your-face type of pressure that most people do not approve of, just like extremely loud commercials.

Okay, so if The Economist is the Comcast of internet news, what's an example of dynamic pages done right in that area? I think the New York Times does it right. Go read an article (this one I chose at random) or simply scroll through it. Nothing pops up to annoy you; everything happens on page load. The one exception is that when you reach the bottom of the article a box slides in--and not over the content you're reading!--letting you know of related articles you may be interested in. This is actually helpful rather than self-serving like The Economist's dynamic content.

So, while we are all enamored with the eye candy of modern Ajax development, remember to view it with a critical eye and note those who are using it well. I think I'll go read some more NYT.

Wednesday, December 1, 2010

AJAX Uploads in Django (with a little help from jQuery)

*** April 4, 2011: This post is a bit outdated and does not work with Django 1.3 final's stricter CSRF enforcement. I have a new post that is much more up-to-date, cleaner, and easier to follow, especially because it uses the github repo I created that holds the changes that need to be made to the file uploader's javascript.***

Part of my current project is creating an area where files need to be uploaded in a snazzier way than the normal "browse for a single file" sort of forms. The idea is to have an upload button that opens the same OS chrome file dialog, but one that allows multiple files to be selected. Additionally, the even snappier part is that HTML5's drag-and-drop functionality should also be available where supported.

After some searching I found a couple of sites that pointed me in the right direction.

  • AJAX Upload handles the client-side end of the upload process, including the required multiple-select and drag-and-drop (for browsers that support it). It even handles graceful fallback to iframe uploads for browsers that do not support those advanced features (Opera, IE, etc.).
  • On the Django side, I found this site, which gives some pointers to get it working with Django 1.2's CSRF token. The problem I encountered is simply passing the CSRF token to Ajax Upload via its params will not work, because Ajax Upload sends it in the querystring and Django expects it as POST data.

Because neither of those gave me the whole picture, I had to piece things together on my own. Here is the straight-skinny on my findings for getting Ajax Upload to work with Django.

Overview

Ajax Upload handles the client side very seamlessly and only gives one challenge to the programmer: it passes the file either as the raw request, for the "advanced" mode, or as the traditional form file for the "basic" mode. Thus, on the Django side, the receiving function must be written to process both cases. In the raw request version, reading the request without blowing up memory was a bit of a challenge. I first tried reading the data into Django's SimpleUploadedFile. It is an in-memory class, though, so it runs into issues with large files. Next I tried reading/writing the data through Python functions, which had similar problems. The "works every time" solution requires Django 1.3, which is to use its new file-like interface on the HTTP request to read the data in chunks. If you're using Django 1.2 and figure out another way to read the request data for all file sizes please comment! I discuss these solutions a little more in depth at this Stack Overflow question of mine.
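To make the memory problem concrete, the dead-end first attempt looked roughly like the sketch below (raw_post_data is the Django 1.2-era attribute holding the raw request body; the helper name is made up for illustration). The whole upload ends up in RAM as one string, which is exactly what falls over on large files. The chunked version that actually works is in the view code later in this post.

# the dead end: raw_post_data reads the entire request body at once, and
# SimpleUploadedFile keeps the whole payload in memory
from django.core.files.uploadedfile import SimpleUploadedFile

def naive_save( request, filename ):
  upload = SimpleUploadedFile( filename, request.raw_post_data )  # whole file in RAM
  with open( filename, "wb" ) as dest:
    for chunk in upload.chunks( ):
      dest.write( chunk )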

Setup

First, install AJAX Upload by placing its JS and CSS files wherever is appropriate and linking to them in your Django templates. Also make sure you have set up Django's file upload handlers. Next is setting it up on the site.
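As for the upload handlers, Django's defaults are usually fine; this settings.py sketch just spells those defaults out, plus a commented-out variant if you would rather have every upload streamed to a temporary file instead of buffered in memory. Treat it as an illustration of a typical setup, not something Ajax Upload itself requires.

# Django's default upload handlers: small uploads stay in memory,
# larger ones are streamed to a temporary file on disk
FILE_UPLOAD_HANDLERS = (
  "django.core.files.uploadhandler.MemoryFileUploadHandler",
  "django.core.files.uploadhandler.TemporaryFileUploadHandler",
)

# or, to always stream uploads to disk:
# FILE_UPLOAD_HANDLERS = (
#   "django.core.files.uploadhandler.TemporaryFileUploadHandler",
# )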

The Web (Client) Side

HTML

This is the HTML code that will house the upload button/drag area so place it appropriately.

<div id="file-uploader">       
    <noscript>          
        <p>Please enable JavaScript to use file uploader.</p>
        <!-- or put a simple form for upload here -->
    </noscript>         
</div>

Javascript

You probably want to dump this in the same HTML/template file, but it is up to you.

function generate_ajax_uploader( $url, $csrf_token, $success_func )
{
  var uploader = new qq.FileUploader( {
    action: $url,
    element: $('#file-uploader')[0],
    onComplete: function( id, fileName, responseJSON ) {
      /* you probably want to handle the case when responseJSON.success is false,
         which happens when the Django view could not save the file */
      if( responseJSON.success )
        $success_func( responseJSON ) ;
    },
    params: {
      'csrfmiddlewaretoken': $csrf_token, /* MUST call it csrfmiddlewaretoken to work with my later changes to Ajax Upload */
    },
  } ) ;
}

A little explanation is probably needed here.

  • I have wrapped the code inside of a function that generates the uploader because I use it on a couple different pages that require different URLs. You can easily strip off the function part and simply place it in a regular <script> block for use on a single page.
  • It is probably simplest to use the url template tag to fill in the action, which is the URL that receives the ajax data and does the server-side processing. I use the url template tag to construct the $url parameter to this function, but if you are removing the function part, put the tag directly after action:, or you can hard-code the URL you want as a string.
  • I use the onComplete callback to pass the returned json to another "success function" that parses the json and adds information to a table on the page. Again, this is not necessary, but I thought it would be useful to show how this could work. The upload plugin itself will say whether the file upload was a success based on the returned json.
  • jQuery is used to grab the appropriate part of the div. If you are not using jQuery use whatever method is appropriate for your system to get the file-uploader DOM element. Using regular Javascript you could do document.getElementById('file-uploader'), as Valum uses in the examples on his site.

Ajax Upload Modifications

Unfortunately, I found the easiest way to get the CSRF token piece going was to modify Ajax Upload itself (I hate editing libraries since the changes have to be re-applied at each new release). Around line 1100 of fileuploader.js you will find the line "var form = ..." within UploadHandlerForm's _createForm method. Replace this line with the following:

var form = null ; 
if( params.csrfmiddlewaretoken )
{
  var csrf = '<div style="display:none"><input type="hidden" name="csrfmiddlewaretoken" value="' + params.csrfmiddlewaretoken + '" /></div>' ;
  form = qq.toElement('<form method="post" enctype="multipart/form-data">' + csrf + '</form>');
  delete params.csrfmiddlewaretoken
}
else
  form = qq.toElement('<form method="post" enctype="multipart/form-data"></form>');

All this code does is search for the CSRF token, and if it is present insert it into the form in the way Django expects to receive it.

The Server (Django) Side

Django URLs

It is best to have two views for this setup to work: one to display the upload page and one to process the uploaded file. First, the URLs:

url( r'^project/ajax_upload/$', ajax_upload, name="ajax_upload" ),
url( r'^project/$', upload_page, name="upload_page" ),

Views

First is the upload_page view, which is going to display the page with which the user interacts. This is a simple skeleton; add whatever your template needs.

from django.middleware.csrf import get_token
from django.shortcuts import render_to_response
from django.template import RequestContext

def upload_page( request ):
  ctx = RequestContext( request, {
    'csrf_token': get_token( request ),
  } )
  return render_to_response( 'upload_page.html', context_instance=ctx )

Next is the view to handle the upload. Remember that this code must handle two situations: the case of an AJAX-style upload for the "advanced" mode and a form upload for the "basic" mode.

def save_upload( uploaded, filename, raw_data ):
  ''' raw_data: if True, uploaded is an HttpRequest object with raw post data
      as the file, rather than a Django UploadedFile from request.FILES '''
  try:
    from io import FileIO, BufferedWriter
    with BufferedWriter( FileIO( filename, "wb" ) ) as dest:
      # if the "advanced" upload, read directly from the HTTP request 
      # with the Django 1.3 functionality
      if raw_data:
        foo = uploaded.read( 1024 )
        while foo:
          dest.write( foo )
          foo = uploaded.read( 1024 ) 
      # if not raw, it was a form upload so read in the normal Django chunks fashion
      else:
        for c in uploaded.chunks( ):
          dest.write( c )
    return True
  except IOError:
    # could not open the file most likely
    return False

from django.http import HttpResponse, HttpResponseBadRequest, HttpResponseNotAllowed, Http404

def ajax_upload( request ):
  if request.method == "POST":
    # AJAX Upload will pass the filename in the querystring if it is the "advanced" ajax upload
    if request.is_ajax( ):
      # the file is stored raw in the request
      upload = request
      is_raw = True
      try:
        filename = request.GET[ 'qqfile' ]
      except KeyError: 
        return HttpResponseBadRequest( "AJAX request not valid" )
    # not an ajax upload, so it was the "basic" iframe version with submission via form
    else:
      is_raw = False
      if len( request.FILES ) == 1:
        # FILES is a dictionary in Django but Ajax Upload gives the uploaded file an
        # ID based on a random number, so it cannot be guessed here in the code.
        # Rather than editing Ajax Upload to pass the ID in the querystring, note that
        # each upload is a separate request so FILES should only have one entry.
        # Thus, we can just grab the first (and only) value in the dict.
        upload = request.FILES.values( )[ 0 ]
      else:
        raise Http404( "Bad Upload" )
      filename = upload.name

    # save the file
    success = save_upload( upload, filename, is_raw )

    # let Ajax Upload know whether we saved it or not
    import json
    ret_json = { 'success': success, }
    return HttpResponse( json.dumps( ret_json ) )
  # anything other than POST is not a valid way to call this view
  return HttpResponseNotAllowed( [ 'POST' ] )
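If you want to sanity-check the view without wiring up the browser side, a quick Django test along these lines exercises the "basic" (form) branch. This is only a sketch: the 'ajax_upload' name refers to the url() above, the 'file' field name is arbitrary, and save_upload will write hello.txt into the working directory.

from StringIO import StringIO
from django.core.urlresolvers import reverse
from django.test import TestCase

class AjaxUploadTest( TestCase ):
  def test_basic_form_upload( self ):
    # no X-Requested-With header, so the view takes the "basic" form branch
    upload = StringIO( "hello world" )
    upload.name = "hello.txt"  # the test client uses .name as the upload's filename
    response = self.client.post( reverse( "ajax_upload" ), { "file": upload } )
    self.assertContains( response, '"success": true' )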

And that's it, go have some fun!

***Edit: A few errors in the source have been fixed; thank you for your comments!***

Monday, November 29, 2010

Accessing a guest's Apache server from the host in VirtualBox

To clarify the title of this post: I am running the latest VirtualBox (3.2.6) on a Windows 7 host with a Linux (Kubuntu) guest.  I am using the guest as a development box for a website I am working on and would like to access it from Win7 to test with Internet Explorer (yuck).  This is something I've wanted to get working for a while but never really looked into.  Today I decided to break down and investigate.  It turns out to be a fairly simple process.

Option 1 - Port Forwarding

I first found this blog post that discusses setting up the proper port settings for VirtualBox.  I tried this and ran into some errors, including the one from the first comment.  Just copying and pasting from the site, I started up VB and immediately got the errors.  I realized this is because I am using one of the Intel network adapters, not PCNet as in the example.  The secret here is figuring out the appropriate device name to use in the paths given by the post.  The way I found these was to look at the VBox.log file for the appropriate machine.  In it you will find many lines, but you're looking for level 2 devices that look something like this:

[/Devices/i8254/] (level 2)

In this case it is the device for the Intel network adapters.  So, adjusting the code from the blog, you would run each of these at the command prompt...

VBoxManage setextradata YourGuestName "VBoxInternal/Devices/i8254/0/LUN#0/Config/apache/HostPort" 8888
VBoxManage setextradata YourGuestName "VBoxInternal/Devices/i8254/0/LUN#0/Config/apache/GuestPort" 80
VBoxManage setextradata YourGuestName "VBoxInternal/Devices/i8254/0/LUN#0/Config/apache/Protocol" TCP

One thing to be careful of here is the already mentioned errors.  These will prevent you from starting up the VM at all.  The most common error is verr_pdm_device_not_found, which stems from having the incorrect device for the current settings--so using pcnet in the above code but having an Intel adapter chosen in the settings--or having an incorrect name in the path, say you missed the 'i' in i8254.  The way to fix this is to close out of VirtualBox--VMs and the program itself--and edit the xml configuration file for the machine.  It will have <ExtraDataItem> entries near the top corresponding to the settings you changed above.  Delete these three lines, start up VB, and your VM should now be able to start itself.

The blog goes on to say you should then use localhost:8888 to access the web server but I never got this to work.  I admit I did not try very hard, so the solution here could be a simple fix.  I'm guessing it is some issue with Windows 7 networking handling the localhost, which is not something I felt like messing with, so I found a different solution.

Option 2 - Bridged Adapter

My guest's network settings were set to connect via NAT.  After changing to a bridged adapter, I was able to browse to the guest's IP from my host (192.168.1.5 in this case) and everything worked fine.  Much simpler, and no messing with VBoxManage.

Option 3 - Host-only Adapter

The Host-only Adapter will also work, but it will not give outside internet access to your guest.  The plus side is that no physical network device is required, but it is hard for me to imagine a computer without a network device in this day and age.



Overall Option 2 is the best as I see it: simple and effective.

Monday, November 8, 2010

Github: too many forks?

I recently created a Github account (check me out!) and have become familiar with the site because it hosts a lot of Django-based modules.  At first blush it seems like a programmer utopia: the code is open source and easily viewable, forking is encouraged and there is a somewhat intuitive interface for following forks, wikis and issues are automatically created for each project, and, thanks to git, forks are (usually) easy to merge.  There are a lot of users, a lot of activity, and a lot of prolific and talented programmers on Github.

So what's not to love?

Really, Github does seem to be the cat's meow of the open source world at the moment, with a nod to BitBucket, where Mercurial-based projects are hosted.  As I've said, I've only recently begun to use Github, and I've discovered it has some significant oddities for someone just joining the ecosystem.  I'm surprised there hasn't been more discussion of this; a quick search of the interwebs only really turned up this post by Andrew Wilkinson.

I'll summarize his points and then add some spice.
  • Coders are "rock stars" that are emphasized over their projects, and the interface is designed from a contributor's point of view.  He later points out that projects of any decent size typically have more users than contributors, so the interface is a bit quirky for someone just wanting to use the project.
  • Each fork gets its own issues and wiki, making it confusing where to discuss the project.
  • Determining which fork to use is not trivial.
I don't have much of a problem with the first point, the interface issues, because it sounds like Github is working on it; they've even made some recent improvements on this front.

The second is definitely a problem, but I think it is a necessary one.  A fork potentially has its own features and bugs, so these need to be put somewhere.  However, anything not specific to the fork should be put in the wiki/issue tracker/whatever of the main project.  I think this would be less of an issue if the next point were fixed.

Finally, I come to the heart of my Github confusion:  which fork does one use?  By 'use' I mean either fork and contribute to or, as a user, download and install.  The developers have said they are working on a way to better identify the "main" project, but the solution is yet to be seen.

I'll show how I find the "main" project, and then examine the issue itself.  There are two "find main" methods that probably need to be used together.  The Network graph (read up on it here; you really need to understand these graphs to understand my later figures) is not the place I start because it is relative to the current project; it will be used in a bit.  What I mean by "relative to the current project" is that, if you're looking at a project that is, say, a fork of the original project, all the commits for the original up to the point the fork occurred are put into the forked project's timeline.  Thus, I first try to find the "grandfather" or original project that started the chain.  I do this by following the "forked from" links until I get to the one that is not a fork of another.

Currently looking at dcramer's version of the project, a fork of robhudson's, which happens to be the grandfather.




After getting to the grandfather, I then look at the Network graph, which I consider clearer now because all forks show only their own commits.  Typically the grandfather is the "main" version of the project, but occasionally a grandfather will become dormant and a fork will become the main line of development.  Look to see whether the grandfather is still being updated and forked, and whether other forks are merging back.  If not, see if another fork has taken over this position.  If not, find the fork with the bug fixes and improvements that seem best, as it will probably become the "main" version as other people reach the same conclusion as you (hopefully).  Another thing that may be helpful is checking the number of watchers of a project/fork; this can give you an idea of its popularity.
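If you wanted to automate that watchers-and-activity heuristic, something like the sketch below would do it.  Note that it uses GitHub's current REST API, which postdates this post, so the endpoint and field names are illustrative of the idea rather than what was available at the time.

# rank a project's forks by watcher count and last push as a rough proxy
# for which fork is most alive (modern GitHub API; Python 2 to match the era)
import json
import urllib2

def rank_forks( owner, repo ):
  url = "https://api.github.com/repos/%s/%s/forks?per_page=100" % ( owner, repo )
  forks = json.load( urllib2.urlopen( url ) )
  forks.sort( key=lambda f: ( f[ "watchers_count" ], f[ "pushed_at" ] ), reverse=True )
  return forks

for fork in rank_forks( "robhudson", "django-debug-toolbar" ):
  print fork[ "full_name" ], fork[ "watchers_count" ], fork[ "pushed_at" ]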

Okay, so hopefully you can see this is about as clear as mud and a rather inexact science.  Wilkinson's "rock star" description is apt: projects are identified first by the coder--robhudson's django-debug-toolbar--rather than by the project itself.  Admittedly this makes sense based on how git and forks work, but it leaves the interface muddied.  Whom do I trust? robhudson or dcramer?  Side note: I'm glad people generally use their names or sensible nicknames as identifiers; if "l33tskillz393" had a fork I don't think I'd even give it the time of day.

To further clarify, let's look at some pictures of the Network graphs for a couple of projects I've looked at recently.

(django-pagination)

What I have identified as the grandfather and main branch is the line on the top.  There are more forks not shown here, but none below hgrimelid's have any "recent" commits.  It is pretty clear here that the grandfather branch is the "main" version of the project: past forks have either died or merged their changes back in (merges can be seen on the blue and neon green lines in the upper left).  There are some recent forks off the latest grandfather commit, possibly with important bug fixes or features, which makes the decision a bit less clear.  Go with the main branch and assume important changes will be merged in a future version, or go with a fork and hope it doesn't turn into a dead end?

Let's look at another with a slightly different situation.

(django-sorting)
The grandfather is again on top.  But this time the grandfather looks a bit outdated: no updates in months!  It looks like a very active project though, with plenty of forks--and forks of forks!--being made and updated.  I do not think there is a clear choice.

Github has highlighted an unforeseen problem with distributed version control when the participants aren't under some guiding light, such as working for the same company.  The traditional model of a project is that there is some entity--a person, a committee, a company--that determines a project's versions, features, etc.  Distributed source control may be used to develop the project, but at some point someone says "this is the next version" and everyone trusts that authority.  This can be seen commercially in, say, how Microsoft releases new versions of Windows every so often.  In the more complicated world of distributed version control, look at what git was created to develop in the first place: the Linux kernel.  It gets all kinds of forks and such, but in the end the idea is that they get merged back into the mainline kernel, Torvalds baptizes it, and distributions push this authoritative version out to users.  In this traditional model, users and programmers really only care about the project as a whole, not the forks that went into it.

Github flips this on its head, I assume because of the "rock star" approach.  Each programmer is given equal stage and there is no definitive project.  Without an authority people go on their merry way and we get these spider webs of Network graphs.

I'm going to eagerly await Github's "find the main branch" solution.  My admittedly rather warm-and-fuzzy suggestion is to strongly encourage merging forks.  Traditionally forks have been a big deal because they split a project's development due to legal reasons or vision disagreements or whatever.  But the assumption is there is no other recourse and no reconciliation.  Perhaps this trained behavior is one reason for the lack of merges?  Anyway, traditional forks usually make significant changes to a project that are not necessarily meant to play nice with the original project.  In the Github world, forks are THE way to update projects.  Thus forks are not splits; they are the way you do even small-scale things like fix bugs and add minor features.  These are things that should bubble up to the main project, not languish in a soon-to-be-forgotten fork.  It seems from my own experience that people fork, fix the bugs they need for their personal use, and then forget about the project altogether.  If you look at the figures I've provided you see an abundance of forks, but merges are rare.

If Github could convince all the forkers to be mergers it would be a much happier place.

Thursday, June 10, 2010

Traders don't know jack (about technology)

As I was driving home today I caught a story on NPR about the practice of high frequency trading, or HFT. For those of you who don't know, HFT in a nutshell is some (supposedly, we'll get to this in a second) clever guys writing custom software that makes stock trades in microseconds based on trading volumes and such. The idea is to make fractions of a penny billions of times. It is an interesting topic and for more you should listen to the story.

What caught my ear, though, is that they begin talking about the new server farm the NYSE is building in New Jersey that will handle all its transactions. The story then goes on to talk about how the HFT people are falling over themselves trying to get office space as close to the NYSE building as possible or, even better, inside the building itself. It is explained that the closer to the NYSE servers these guys get, the faster their programs can communicate with the NYSE, leading to better-informed trades.

If you're a computer scientist something should sound amiss there. Closer to the NYSE servers makes your trades go faster? Buzzzz, wrong! This is most definitely true if you have a direct connection to the NYSE, but I am assuming these guys don't; they're using the internet like the rest of us. This means their information packets have to go from their computers out to their service provider to get routed, bounce around the interwebs, and finally come shooting back to the NYSE, which is only a few feet away from the original computer. Cell phones work in a similar manner, so if you're not a tech person and want a more concrete example, get someone in the same room as you and call their cell phone from yours. Note the delay between when the person says something and when you hear it coming through the phone. It takes time for the voice to travel from the phone to a tower, go through some routing and other towers, and then to the other phone, which is analogous to how internet traffic bounces around.

So, Wall Street proves its stupidity yet again. I think I'll invest with the guys that are building their office next to their ISP.

Monday, May 10, 2010

Simple Page Navigation -- why is it so hard?

I am a big fan of Men's Health but have come to really loathe their online articles. I began looking at this article and the first page required me to scroll a bit to read the text. One would think the "next page" sort of links would be at the bottom, thanks to the unofficial convention amongst major web sites. This is not the case, however: the next/previous buttons are at the TOP of the article, requiring you to first scroll to read the page (this is okay in my book despite the no-scrolling zealotry) and then scroll back up to move to the next page. Terrible HCI design. There needs to be plenty of room for creativity on the web, but I'm afraid the art departments are still winning the battle over the (probably non-existent) user experience folks.

I've been noticing poor navigation on plenty of other sites too, so I'm not trying to pick on MH. The worst offenders seem to be "top 10" sort of lists. If I get the time, I'd like to dive into this a little more.

Sunday, March 28, 2010

Have Games Gone Anywhere in the Last Decade?

I was just browsing through the newest issue of PC Gamer and came across this month's "10 Years Ago" feature. It is a small column that gives a little overview of what was going on in the PC Gamer issue from a decade ago. The thing that struck me is this particular issue had the "PC Gamer Readers' All-Time Top 50" list. And guess what the top five were?

Half-life, StarCraft, Diablo, Warcraft II, and Civ II.

Other than the fact that 3 of the top 5 are Blizzard games (time to buy some Activision stock), something struck me: these are pretty much the games everyone is playing TEN YEARS LATER. These games are still played heavily and their sequels (including World of Warcraft from the Warcraft series) are some of the most played and highest-selling games in recent gaming history. And what are some of the most hotly anticipated games in the pipeline? Starcraft II, Diablo III, the next WoW update, Civ V, and I'm sure there will be more Half-life 2 updates and sequels at some point.

So what does this say about the games industry and us as gamers? It seems the games industry has become pretty much the same landscape as Hollywood: unimaginative blockbusters are rolled out based on existing franchises to bring in the dough, while smaller indie titles that show the spark, magic, and imagination that make a good game great rarely get the attention they deserve.

Friday, January 8, 2010

Security Done Wrong

If you know much at all about computer security, and more specifically authentication and passwords, you know that the problem is immensely difficult. You have to deal with the tension between impossible-to-remember strong passwords and easily-remembered-but-easily-cracked user-created ones. It is not an easy problem to solve, and it seems little progress has been made on it in recent years.

I did come across a case where someone has taken security to a ridiculous extreme. I was trying to order some transcripts from the University of Louisville, and I haven't been a student there since 2002. At first it didn't seem so bad: I was able to dredge up my student ID number to get my username and then got my password reset. Great, that was easy, now just log in and request a transcript... Nope. I have to have a PIN to log in to the registration system, not my regular password that is used for everything else. No recovery option available either. Ugh. I grabbed the phone and called the registrar to get it straightened out. The man was very friendly, took my student ID and some authentication information. Good, sounds like he can help.

"Now the PIN is a 6-digit number that you made up when you became a student. If you were going to make a 6-digit number what would it be?"

My mind went blank. Ummmm?! "Maybe it was my birthday?"

"No sir, that is a date." This is when I knew I was in trouble. I tried the 6-digit form of my birthday and no dice.

"I really have no idea what it would be."

"Well then you will have to mail in the transcript request form."

"There's no way of retrieving or resetting it?!"

"No sir it was set by you and we cannot do anything to it."

What a terrible system. If you are a student using the system every semester to register and such (assuming registration uses the PIN) this would be fine. But is it a realistic expectation that I will remember a number I made up 8(!) years ago from a totally different place in my life?

"Sorry Mr. President we can't launch the Earth saving device until we get your identity authentication passphrase! You set it 45 years ago when you turned 18 and it's impossible to recover. Now what is it before humanity is destroyed?"