footnotes

HTML5!1

1. Has more footnotes than any other implementation. Evar.

(screencap from pluploader)

Drags a droppin' and events a bubblin'...

In fiddling more and more with whiz bang HTML drag and drop (in Safari 4.x and Firefox 3.5), some things caught me by surprise, primarily because I already had an idea about "how drag and drop works" that wasn't from the web world. Specifically, in BrowserPlus we invented a very simple model for a web developer to express interest in capturing desktop sourced file drags. Our model was motivated more by ease of implementation and simplicity than by deep adherence to the "precedent" set by browser vendors. At that point there wasn't all that much in the way of precedent....

Anyhow, I wanted to document how both browser native DnD and BrowserPlus work, if for nothing else as a note to myself. Let's begin with a live sample that you might enjoy if you're on a late model Safari or Firefox (untested elsewhere, YMMV).

A DnD Sample

So pick up a file from your desktop and hover it on over the sample. Notice a couple things:

  • Drag events propagate along the node hierarchy - if you hover over blue, the drag handler attached to yellow (his parent) is receiving the drag. Blue himself has no handler.
  • Overlapping non-descendants can block events - Red is yellow's sister. She has her own drag handler set in order to update the status display, but if she did not, we wouldn't see an event in yellow when hovering over the area in red that overlaps with yellow.
  • Your JavaScript must handle bubbled enter/leave events from children - If you have a node that you wish to be a drop target with any number of visible children, you must handle the fact that a transition from the target (yellow) to the child (blue) would result in an enter in the latter followed by a leave in the former.
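One common way to cope with the bubbled enter/leave pairs is to count them. What follows is a minimal sketch, not code from the sample page: makeDropTracker and its callbacks are hypothetical names, and it assumes the ordering described above (the enter on the child arrives before the leave on the parent).

```javascript
// Hypothetical helper (not from the original sample): counts drag depth so
// that enter/leave pairs bubbling up from children collapse into a single
// logical enter and a single logical leave for the drop target.
function makeDropTracker(onEnter, onLeave) {
  var depth = 0;
  return {
    enter: function () {
      depth++;
      if (depth === 1) onEnter();   // first enter: the drag arrived
    },
    leave: function () {
      if (depth === 0) return;      // ignore a stray leave
      depth--;
      if (depth === 0) onLeave();   // last leave: the drag really left
    },
    reset: function () { depth = 0; }  // call after a drop
  };
}

// Browser wiring, assuming "yellow" is the drop target element and
// highlight/unhighlight are your own functions:
//   var t = makeDropTracker(highlight, unhighlight);
//   yellow.ondragenter = function (e) { e.preventDefault(); t.enter(); };
//   yellow.ondragleave = function (e) { t.leave(); };
//   yellow.ondrop = function (e) { e.preventDefault(); t.reset(); unhighlight(); };
```

With this in place, moving from yellow onto blue bumps the count to two and back to one, so the target never thinks the drag left. If a browser delivers the leave before the enter, the count briefly hits zero and the highlight will flicker.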

What would BrowserPlus do?

Again, realize that the Drag and drop implementation in BrowserPlus, for better or worse, was driven by two key goals:

  1. satisfy the real world requirements of our users.
  2. develop something possible to implement in some sane fashion (from the other side of a plugin API, btw).

Given this, it's pretty simple to explain the BrowserPlus model - Any node that is designated as a "drop target" can receive drop events, any node that isn't is 100% transparent. So if you were to designate only yellow above as a drop target, then the drop behavior would be identical regardless of the existence of blue and red.

Unorganized thoughts...

With HTML DnD you have to think too hard. In every implementation I've seen that leverages drag and drop, there's a *lot* going on inside the "drop area". So each "typical" web app that uses DnD needs to understand how to effectively make a node "drop transparent". Restated: the HTML interface to drag and drop is very powerful and flexible, but, unless I'm missing something, it makes the simple case way too hard. Here's a good example of world class UI that leverages drag and drop (yeah, I'm biased):

There's a containing div and a bunch of stylized descendants contained within. Essentially what we'd want is some simple way to mark all of them "drop transparent"...

Allowing file drops defies user expectation. For years now we've been dropping files on our browser to *load* them (Safari is a great PDF reader). Now with the ability for web pages to capture our drops, we've got to work harder to prevent poor usability and confusion... What do I mean when I drop 'Hilaiel L 2008 Taxes.pdf' on my browser window? Specifically, 1mm can separate attaching a photo to your email from displaying that photo and discarding your email (most sites will ask for user confirmation, that's one simple way to mitigate this confusion). There isn't a great answer here that I can see.

Finally, I realize not all of this is new, but it's new to me. I eagerly welcome simple code samples which robustly implement the dumb and simple BrowserPlus model, using HTML DnD...

--ll

Would the real "HTML5" File please stand up?

In fiddling around with HTML5 desktop sourced drag and drop, present in Safari Version 4.0.3 (6531.9), I’m faced with the interesting challenge of understanding when I can trust that a drop is really a drop – that a File is the result of user interaction. For a little context, here’s a bit of code cobbled up by Gordon Durand that’ll let us capture desktop sourced drops in the latest Snow Leopard:

<html>
<head>
<script>

function dodragenter(event)
{
  document.getElementById("output").textContent = "Drop it!  I dare you!";
}

function dodragleave(event)
{
  document.getElementById("output").textContent = "";
}

function dodrop(event)
{
  var files = event.dataTransfer.files;
  var uris = event.dataTransfer.getData("text/uri-list").split("\n");
  var msg;
  msg = "File Count: " + files.length + "\n";

  for (var i = 0; i < files.length; i++) {
    msg += (" File " + i + ":\n");
    msg += ("\tfileData.fileName: " + files[i].fileName + ", fileData.fileSize: " + files[i].fileSize + "\n");
    msg += ("\turi: " + uris[i] + "\n");
    msg += "\n";
  }
  document.getElementById("output").textContent = msg;
}

</script>
</head>
<body>

<div id="output" style="min-height: 100px; white-space: pre; border: 1px solid black;"
     ondragenter="event.stopPropagation(); event.preventDefault(); dodragenter(event);"
     ondragover="event.stopPropagation(); event.preventDefault();"
     ondrop="event.stopPropagation(); event.preventDefault(); dodrop(event);"
     ondragleave="dodragleave(event)">
</div>

</body>
</html>

What’s a file look like?

So the first question is, in what ways can we introspect a javascript object? Here’s an interesting start:

// examine a supposed File object and return its vitals
function getObjectInfo(o)
{
  var msg;
  msg = "typeof(o):     " + typeof o + "\n";
  msg += "o.toString():  " + o.toString() + "\n";
  msg += "String(o):     " + String(o) + "\n";
  if (o) msg += "o.constructor: " + o.constructor + "\n";
  if (o) msg += "o.prototype:   " + o.prototype + "\n";
  if (o)  msg += "o members:     ";
  var comma = "";
  for (var m in o) { msg += comma + m; comma = ", "; }

  msg += "\n\ncontents: (" + o.fileName + " is " + o.fileSize + " bytes)\n";

  return "<pre>" + msg + "</pre>";
}

That is, we’re checking .constructor, .prototype and the members of this magical file object. Here’s what a “real” file object looks like, one caught by a drag:

typeof(o):     object
o.toString():  [object File]
String(o):     [object File]
o.constructor: [object FileConstructor]
o.prototype:   undefined
o members:     fileSize, fileName

contents: (silly_hat.jpg is 1380 bytes)

Hrm, can we fake that? Here’s a first try:

var fakeFile = {
  fileSize: 1234,
  fileName: "/etc/passwd"
};
fakeFile.constructor = File;
fakeFile.toString = function() { return "[object File]"; }

The fakeFile variable ends up looking like this:

typeof(o):     object
o.toString():  [object File]
String(o):     [object File]
o.constructor: [object FileConstructor]
o.prototype:   undefined
o members:     fileSize, fileName, constructor, toString

contents: (/etc/passwd is 1234 bytes)

The only telltale here is that the constructor and toString overrides, which I added to make a plain ol' object instance built from a literal look a lil' bit more like one of these mystical files, now show up as extra members. I, personally, can’t figure out how to hide this constructor or toString function from enumeration. So I’m concluding here that we’ve got a reasonable way to filter out wholly synthetic fake File objects.
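A rough way to spot those overrides directly is to ask which members live on the object itself. This is only a sketch: ownMembers is a hypothetical helper, FakeCtor stands in for the browser's File constructor so the snippet runs outside a browser, and I have not checked how a real File reports its members to hasOwnProperty in Safari 4.0.3.

```javascript
// Returns the names of enumerable members defined directly on the object,
// as opposed to those inherited from its prototype chain.
function ownMembers(o) {
  var names = [];
  for (var m in o) {
    if (Object.prototype.hasOwnProperty.call(o, m)) names.push(m);
  }
  return names;
}

// Rebuild the fake from above. FakeCtor is a hypothetical placeholder for
// the browser's File constructor.
function FakeCtor() {}
var fakeFile = { fileSize: 1234, fileName: "/etc/passwd" };
fakeFile.constructor = FakeCtor;
fakeFile.toString = function () { return "[object File]"; };

// An object that carries its own toString or constructor is suspicious:
// ordinary objects inherit both rather than defining them.
var members = ownMembers(fakeFile);
var suspicious = members.indexOf("toString") !== -1 ||
                 members.indexOf("constructor") !== -1;
```

Here suspicious comes out true, because both overrides were assigned straight onto the literal.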

Now what about a different tactic: overwriting an existing File object? We’ll take something real (the first object) and overwrite the fileSize and fileName members. And that doesn’t work: fileName and fileSize are actually specified as read only. Here’s the IDL from WebKit:

module html {

    interface [
        GenerateConstructor
    ] File {
        readonly attribute DOMString fileName;
        readonly attribute unsigned long long fileSize;
    };
}
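To get a feel for what "readonly attribute" means from script, here is a small stand-in, under loud assumptions: makeReadOnlyFile and isMutable are hypothetical names, the object is an ordinary JavaScript object rather than a real host File, and Object.defineProperty needs an engine newer than the Safari discussed here.

```javascript
// Hypothetical stand-in for a host File object: getter-only properties
// approximate the IDL's "readonly attribute".
function makeReadOnlyFile(name, size) {
  var f = {};
  Object.defineProperty(f, "fileName", {
    get: function () { return name; },
    enumerable: true
  });
  Object.defineProperty(f, "fileSize", {
    get: function () { return size; },
    enumerable: true
  });
  return f;
}

// Try to overwrite a member and report whether the write took effect.
// Non-strict code ignores the write silently; strict code throws.
function isMutable(obj, key) {
  var before = obj[key];
  try { obj[key] = "__IMutedYou__"; } catch (e) { /* strict mode throws */ }
  return obj[key] !== before;
}

var f = makeReadOnlyFile("silly_hat.jpg", 1380);
var nameChanged = isMutable(f, "fileName");   // false: the write is rejected
var plainChanged = isMutable({ fileName: "x" }, "fileName");  // true
```

The write-and-compare probe distinguishes the read-only stand-in from a plain object literal, which accepts the write.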

Testing for an authentic File

This is deeply tied to Safari 4.0.3, and subject to break in the future, but for now this is the (slightly redundant) test that I came up with to verify that we’re really talking about a File that was obtained as a result of a drag.

function isRealFile(f)
{
  var hasFileName = false, hasFileSize = false, numMembers = 0,
      mutableMembers = false;

  for (var m in f) {
      numMembers++;
      if (m === 'fileName') hasFileName = true;
      else if (m === 'fileSize') hasFileSize = true;
      var before = f[m];
      f[m] = "__IMutedYou__";
      if (before !== f[m]) mutableMembers = true;
  }


  // reject null first: typeof null is 'object' and f.toString would throw
  if (!f || typeof(f) !== 'object' ||
      !f.toString || f.toString.constructor !== Function ||
      f.toString() !== '[object File]' ||
      f.constructor != File ||
      f.prototype != undefined ||
      numMembers != 2 ||
      !hasFileName ||
      !hasFileSize ||
      mutableMembers)
  {
      return false;
  }

  return true;
}

Cobbling together an Attack?

The ultimate question is once we have one of these magic File objects (resulting from a user drop), what can we do with it that’s interesting? In 15 minutes of looking it seems like all we can really do is display the size, uri, and filename… Could that be correct?

The point here is that it’s very interesting that we can now accept drops from the desktop, but what will we be able to do with them and what will the security model be?

Personally, I’m grateful to see progress in this area, but am a little curious about the utility of these first steps, and the strategy for the next ones…

lloyd

NPAPI, the HTML5 File object, and the glory of UNIX

Web Processification

Earlier today I was impressing my wife with some unix foo by automatically swapping FIRST LAST —> LAST, FIRST formatted data while sorting and finding duplicated entries (ok, so she was only mildly impressed). The shell command looked a little like this:

[lth@lappro ~] $ ./first_to_last.rb < new_names.txt | sort -f | uniq > back_to_you_bob.txt

Not rocket surgery, certainly, but still a great example of combining several small tools to achieve an end. This simple act re-filled me with awe at the beauty of design with lots of little processes. Processes enforce proper protocols and data hiding (unless you’re talking about cross process windows or some other such tomfoolery), they guarantee 100% resource reclamation, they afford a robust system where if one guy falls down the party can still pick up and continue.

But I’m like 2 years late here to this re-realization for the web: we already all understand the beauty of lots of little processes. The Chrome browser started by moving plugins (and just about every other distinct subtask) into their own processes. IE8 joined the fray by running each tab in a separate process. Safari 4.0.3 on Snow Leopard running 64bit puts NPAPI plugins on the other side of a process break. The fine folks at Mozilla are talking through it and will inevitably follow suit.

So this is awesome. Now our browsers don’t crash as much, instead they tell us “The Yahoo! BrowserPlus plugin has crashed” (for instance; usually it’s the Adobe Flash plugin, right?). They tell us “the page is unresponsive”. Blame is appropriately assigned so we, as users, can complain to the right people. Fewer human hours are wasted and more broken pages are fixed. Followed by an era of increased world peace and obesity. Great.

The File is the Thing

In that shell command above, it’s really the file abstraction that deserves all the glory. Once we’ve got all these processes, how do they communicate? Well, first and foremost they need to be able to move data back and forth. The simplest way to do this is in an opaque blob. A file if you will. A container that is capable of holding data that can be moved back and forth between loosely coupled components of the software system.

If you take a look at innovation on the web today, we’ve got plenty of separate processes, we’ve got HTML5 goodness running in browser, we’ve got Gears running in a combination of plugins and extensions, we’ve got BrowserPlus running in an NPAPI plugin (or ActiveX, but they always have thought different), we’ve got Adobe Flash in another plugin, and if you’re retro there’s a little Java soup in there too (right, Silverlight too, don’t forget! this is important!).

Now in most cases, all of these runtimes/tools are running in different processes. And we have two choices: first they could be extremely loosely coupled and only share a control channel that’s brokered by the browser (as the world is today), or they could be more deeply integrated. It could be possible to build a web application using multiple client technologies that pass data back and forth. Imagine using HTML5 drag and drop to attain file handles, passing those into, say, BrowserPlus to process or alter the file contents, then taking the result and uploading it peer to peer using, say flash?

A simple file abstraction shared by all of these platforms is the key to interoperability.

Why do we care?

Why do we care if they interoperate? Simply because we need to nurture experimentation, and if there is no clean way for extensions to hook into browser supported drag and drop, then we’ll see monstrous hacks arise to make them interoperate so that non browser companies can continue participating in the innovation (without building yet another browser, so yesterday).

HTML5 and the File

Some very cool happs are goin' on over at the W3C regarding files, uploading, and drag and drop. Excellent work in a pretty package.

I especially like the way the file abstraction works, and strongly suggest security policies similar to what’s present in BrowserPlus. Specifically the notion that untrusted javascript may not construct a file, but may freely pass it around between privileged APIs that can do stuff with it. A key idea here being we leverage the user gesture of dropping or (multi) selecting: once that’s done they’ve implicitly authorized the page to do stuff with that file. Yeah, I know, people also like going phishing.
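To make that policy concrete, here’s a minimal sketch of the opaque handle idea in plain javascript. Everything in it is hypothetical (makeFileBroker, issue, resolve); it isn’t how BrowserPlus or any browser implements this. In a real system the issue side would live in privileged code that page script can’t reach, and would only run in response to a user drop or selection.

```javascript
// Hypothetical sketch: privileged code keeps the real paths in a private
// table and hands untrusted script only an opaque, frozen handle.
function makeFileBroker() {
  var paths = [];     // private: index -> real path
  var handles = [];   // private: index -> handle issued for that path

  return {
    // Privileged side: called only in response to a user drop or selection.
    issue: function (path) {
      var handle = Object.freeze({
        toString: function () { return "[object FileHandle]"; }
      });
      paths.push(path);
      handles.push(handle);
      return handle;
    },
    // A privileged API: resolves a handle only if this broker issued it.
    // Identity is compared, so a look-alike object built by a page fails.
    resolve: function (handle) {
      var i = handles.indexOf(handle);
      if (i === -1) throw new Error("handle was not issued by a user gesture");
      return paths[i];
    }
  };
}
```

The page can hold a handle and pass it from one privileged API to another, but it can’t read the path out of it, and it can’t mint a handle that resolve will accept.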

The One (File) Two (Stream) Punch

So is every browser going to implement powerful client-side movie editing? Or is this a bit bigger than what we want to build in the browser? Let’s assume I can name a feature, X, that’s interesting in a browser, but more than we could ever expect Microsoft, Google, Apple, Mozilla, and Opera to go and implement. The web is the platform, right? We still want to be able to build this in open web technologies (uh, the user drops the movie on the page).

I suggest a simple fix on the NPAPI side. Here’s the patch:

--- npruntime.h.orig    2009-09-04 17:47:45.000000000 -0600
+++ npruntime.h 2009-09-04 17:50:20.000000000 -0600
@@ -128,6 +128,7 @@
     NPVariantType_Int32,
     NPVariantType_Double,
     NPVariantType_String,
+    NPVariantType_Path,
     NPVariantType_Object
 } NPVariantType;

@@ -160,12 +161,14 @@
 #define NPVARIANT_IS_INT32(_v)   ((_v).type == NPVariantType_Int32)
 #define NPVARIANT_IS_DOUBLE(_v)  ((_v).type == NPVariantType_Double)
 #define NPVARIANT_IS_STRING(_v)  ((_v).type == NPVariantType_String)
+#define NPVARIANT_IS_PATH(_v)  ((_v).type == NPVariantType_Path)
 #define NPVARIANT_IS_OBJECT(_v)  ((_v).type == NPVariantType_Object)

 #define NPVARIANT_TO_BOOLEAN(_v) ((_v).value.boolValue)
 #define NPVARIANT_TO_INT32(_v)   ((_v).value.intValue)
 #define NPVARIANT_TO_DOUBLE(_v)  ((_v).value.doubleValue)
 #define NPVARIANT_TO_STRING(_v)  ((_v).value.stringValue)
+#define NPVARIANT_TO_PATH(_v)    ((_v).value.stringValue)
 #define NPVARIANT_TO_OBJECT(_v)  ((_v).value.objectValue)

 #define NP_BEGIN_MACRO  do {
@@ -190,6 +193,7 @@
         Boolean                         NPVariantType_Bool
         Number                          NPVariantType_Double or NPVariantType_Int32
         String                          NPVariantType_String
+        File                            NPVariantType_Path
         Object                          NPVariantType_Object

         C (NPVariant with type:)   to   JavaScript
@@ -199,6 +203,7 @@
         NPVariantType_Int32             Number
         NPVariantType_Double            Number
         NPVariantType_String            String
+        NPVariantType_Path              File
         NPVariantType_Object            Object
 */

So the key is these magic File objects in javascript are translated to full paths as they are sent into plugins. This leaves all the selection, drag and drop, and upload up to the browser, and gives plugins the ability to seamlessly integrate with this brave new world.

Next we can talk about streams, which buy us an efficient way to build a playground where everyone’s workin' together.

simple, yeah? lloyd