Jul 21, 2014

Comparison of Cloud Storage HTTP APIs

I will compare and review the different APIs of end-user cloud storage providers. I will only look at the HTTP API itself, not at how it is wrapped for various languages. The motivation behind this is research for my common cloud storage API.

Is Cloud Storage a Remote Filesystem?

Most Cloud Storage providers either try to be a filesystem in the cloud or use their own concepts of files and folders.

One feature of a traditional filesystem is that it has a notion of:

File name: identifies a storage location through path components like host, directory, name and type (through the extension).
Directory: a hierarchical filesystem is organized by parent-child relationships between directories and subdirectories.

Provider	File Identifier	File Type	Hierarchy
Dropbox	Path	File Extension	Path
Google Drive	File ID	Mime-Type	Parent ID
Box	File and Folder ID	File Extension	Parent ID
One Drive	Path or Folder ID	File Extension	Parent ID
Sugar Sync	File and Folder ID	Mime-Type	Parent ID

Only Dropbox and One Drive have an API that comes close to a filesystem. The others work with the IDs of files or folders. Hierarchy is established by saving the Parent ID on the File Resource. Children of a Parent ID can usually be requested through the Folder Resource. Dropbox is a very good example of keeping the learning curve low by mimicking a filesystem. Google Drive does not even have folders (only a specific folder mime-type), which makes Google Drive difficult to understand at first sight.
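
To make the Parent ID idea concrete, here is a minimal sketch of how a filesystem-style path can be rebuilt from ID-based records. The `Item` record and `build_path` helper are hypothetical and provider-neutral; no provider returns exactly this shape.

```python
from typing import Dict, NamedTuple, Optional


class Item(NamedTuple):
    """Hypothetical, provider-neutral record for a file or folder."""
    name: str
    parent_id: Optional[str]  # None marks the root folder


def build_path(items: Dict[str, Item], item_id: str) -> str:
    """Walk the parent IDs upwards and join the names into a path."""
    parts = []
    seen = set()
    current: Optional[str] = item_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in parent IDs")
        seen.add(current)
        item = items[current]
        parts.append(item.name)
        current = item.parent_id
    # The root is assumed to have an empty name, so joining yields "/a/b".
    return "/".join(reversed(parts)) or "/"
```

With a path-based API this lookup is unnecessary, which is a large part of why Dropbox feels simpler.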

Authentication

The good news is that everyone uses OAuth 2 nowadays!

Provider	OAuth 1	OAuth 2
Dropbox	Yes	Yes
Google Drive	Yes	Yes
Box	No*	Yes
One Drive	No*	Yes
Sugar Sync	Yes	Yes

Listing files

Provider	Method and URL
Dropbox	`GET /metadata/dropbox/{path to folder}?list=true`
Google Drive	`GET /files/{folder id}/children`
Box	`GET /folders/{folder id}/items`
One Drive	`GET /{folder id}/files`
Sugar Sync	`GET /folder/{folder id}/contents`

We see that Dropbox has a separate Metadata Resource, which makes the separation between file metadata and file data obvious. As mentioned above, Google Drive does not know about folders and therefore uses the File Resource (with its children property) to access folders. Box, One Drive and Sugar Sync operate on a property of the Folder Resource (items, files and contents respectively).
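
The listing endpoints from the table above can be captured in a small lookup. This is only a sketch: the provider keys and the `list_request` helper are made up for illustration, and the paths are relative to each provider's base URL, which is not covered here.

```python
from typing import Tuple
from urllib.parse import quote

# URL templates copied from the table above; the dictionary keys are
# hypothetical names chosen for this example.
LIST_TEMPLATES = {
    "dropbox": "/metadata/dropbox/{folder}?list=true",
    "google_drive": "/files/{folder}/children",
    "box": "/folders/{folder}/items",
    "one_drive": "/{folder}/files",
    "sugar_sync": "/folder/{folder}/contents",
}


def list_request(provider: str, folder: str) -> Tuple[str, str]:
    """Return the HTTP method and relative URL for listing a folder.

    `folder` is a path for Dropbox and a folder ID for everyone else.
    """
    template = LIST_TEMPLATES[provider]
    # Keep "/" unescaped so Dropbox paths stay readable.
    return "GET", template.format(folder=quote(folder, safe="/"))
```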

Provider	Specify Fields	Paging
Dropbox	Not supported	Not supported
Google Drive	Include fields	Url Param `maxResults` and `pageToken`
Box	Include fields	Url Param `limit` and `offset`
One Drive	Not supported	Not supported
Sugar Sync	Not supported	Url Param `start` and `max`

Paging is done by using tokens (Google Drive) or a given offset and limit (Box and Sugar Sync). One Drive and Dropbox lack the ability to do paging. Dropbox does not allow you to list more than 25k resources.

Google Drive and Box let you specify which fields of the listed Resource you want included, while the others just include everything.
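
The two paging styles lead to slightly different client loops. The sketch below assumes a caller-supplied `fetch` function that performs the actual HTTP request; both helpers and their signatures are hypothetical and not part of any provider SDK.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple


def iter_offset_pages(
    fetch: Callable[[int, int], List[Any]], limit: int
) -> Iterator[Any]:
    """Offset/limit paging, in the style of Box and Sugar Sync."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        page = fetch(offset, limit)
        yield from page
        if len(page) < limit:  # a short page means there is nothing left
            return
        offset += len(page)


def iter_token_pages(
    fetch: Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]
) -> Iterator[Any]:
    """Token paging, in the style of Google Drive's pageToken."""
    token: Optional[str] = None
    while True:
        page, token = fetch(token)
        yield from page
        if token is None:  # no next-page token means this was the last page
            return
```

Token paging hides the position from the client, so the server is free to change how it tracks it; offset paging lets the client jump to an arbitrary page.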

Download File

Provider	Method and URL
Dropbox	`GET /files/dropbox/{path to file}`
Google Drive	`GET {download link}`
Box	`GET /files/{file id}/content`
One Drive	`GET /{file id}/content`
Sugar Sync	`GET /file/{file id}`

When using Google Drive, one first has to obtain the download link by issuing a metadata request: `GET /files/{file id}`. The response contains the download link. If the requested file is a Google Document, it has to be exported into a file first. It seems that using the HTTP Range header for specifying partial downloads is best practice.
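
For reference, a partial download is requested with a standard HTTP byte range. The helper below is a small sketch (the function name is made up) that builds the header; byte positions are zero-based and the end position is inclusive.

```python
from typing import Dict, Optional


def range_header(start: int, end: Optional[int] = None) -> Dict[str, str]:
    """Build a Range header for bytes start..end (inclusive).

    Leaving `end` out asks for everything from `start` to the end of the
    file, which is handy for resuming an interrupted download.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    last = "" if end is None else str(end)
    return {"Range": f"bytes={start}-{last}"}
```

The resulting dictionary can be passed as request headers to whichever HTTP client is in use.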

Provider	Partial download	Metadata included
Dropbox	HTTP Range header	HTTP `x-dropbox-metadata` header
Google Drive	HTTP Range header	Metadata request is required anyway
Box	Not supported	Not supported
One Drive	Not supported	Not supported
Sugar Sync	Not supported	Not supported

Dropbox lets you include metadata about the file in the download response (even though metadata is a separate resource, which is a bit inconsistent). Every provider returns the raw file data (without mixed-in metadata), so consumers don’t have to worry about encoding.

Upload File

Provider	Method and URL
Dropbox	`PUT/POST /files_put/dropbox/{path to file}`
Google Drive	`POST /files?uploadType={ media, multipart or resumable }`
Box	`POST /files/content`
One Drive	`PUT/POST /{folder id}/files/{file name}`
Sugar Sync	`PUT /file/{existing file id}/data`

The APIs differ quite a bit for uploading content. Dropbox does not use a RESTful URL for the upload (but otherwise follows the REST approach quite strictly).

Provider	Metadata Response	Partial upload	Request Body
Dropbox	Full metadata	Chunked Upload	File contents
Google Drive	Full metadata (unnecessary)	Three options	File contents, Multipart
Box	Full metadata	Not supported	Filename, Parent ID, Timestamps or Filepart (for POST multipart upload)
One Drive	Partial metadata	Not supported	File contents, Multipart
Sugar Sync	Not supported	Not supported	File contents

Dropbox and Google Drive provide methods to upload huge files in partial requests. Most of the providers return the full metadata for the created object. This is a bit unnecessary for Google Drive, as we already have the metadata because we have to create the object in advance.
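
The client side of a partial upload boils down to cutting the file into pieces and remembering the offset of each piece. The sketch below shows only that splitting step; `iter_chunks` is a hypothetical helper, and sending each chunk to a provider's chunked or resumable upload endpoint is left out.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs until the stream is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield offset, chunk
        offset += len(chunk)


if __name__ == "__main__":
    # Tiny demonstration with an in-memory file.
    demo = io.BytesIO(b"abcdefghij")
    for offset, chunk in iter_chunks(demo, 4):
        print(offset, len(chunk))
```

Keeping the offset next to each chunk makes it possible to resume after a failed request instead of starting over.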

File Metadata

Provider	Size	Time
Dropbox	`bytes`	`modified`, `client_mtime`
Google Drive	`fileSize`, `quotaBytesUsed`	`createdDate`, `modifiedDate`, `modifiedByMeDate`, `lastViewedByMeDate`, `markedViewedByMeDate`, `sharedWithMeDate`
Box	`size`	`created_at`, `modified_at`, `trashed_at`, `purged_at`, `content_created_at`, `content_modified_at`
One Drive	`size`	`created_time`, `updated_time`, `client_updated_time`
Sugar Sync	`size`	`timeCreated`, `lastModified`

Dropbox does not provide all the time information that might be interesting. Google Drive provides a lot of information about time-related actions (they have to be explicitly included in a metadata request though). Box differentiates between actions performed on the content and on the metadata.

Provider	Thumbnail	Hash	Deleted
Dropbox	`thumb_exists`	`hash`	`is_deleted`
Google Drive	`thumbnailLink`, `thumbnail.image`	`md5Checksum`	`explicitlyTrashed`
Box	Not supported	`sha1`	`item_status`
One Drive	Not supported	Not supported	Not supported
Sugar Sync	Not supported	Not supported	Not supported

Some providers expose hashes, which makes the developer's life a bit easier because hashes can be compared instead of timestamps. One Drive and Sugar Sync do not have the notion of a deleted resource, while the others let you request deleted resources until they are finally purged.
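
As a rough sketch of the hash comparison, the helper below checks local content against the `md5Checksum` (Google Drive) or `sha1` (Box) field from the table above. It assumes both fields hold a hex digest of the file content; the `is_unchanged` function itself is hypothetical.

```python
import hashlib
from typing import Any, Dict, Optional

# Metadata field name -> hashlib algorithm. The field names come from the
# table above; treating them as hex content digests is an assumption.
HASH_FIELDS = {"md5Checksum": "md5", "sha1": "sha1"}


def is_unchanged(metadata: Dict[str, Any], content: bytes) -> Optional[bool]:
    """Compare local content with the hash stored in remote metadata.

    Returns None when the metadata carries no known hash field, in which
    case the caller has to fall back to comparing timestamps.
    """
    for field, algorithm in HASH_FIELDS.items():
        if field in metadata:
            local = hashlib.new(algorithm, content).hexdigest()
            return local == str(metadata[field]).lower()
    return None
```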

Provider	Support revisions	Image metadata	Permissions
Dropbox	`rev`	`photo_info`, `video_info`
Google Drive	`headRevisionId`	`imageMediaMetadata`	`userPermission`, `permissions`, `shared`
Box	`version_number`	Not supported	`shared_link`, `owned_by`, `permissions`
One Drive	Not supported	Not supported	`shared_with`, `access`
Sugar Sync	`versions`	`image`	`publicLink`

Some kind of versioning is common among the providers (as usual, with the exception of One Drive). They normally use an incrementing version number. Dropbox, Google Drive and Sugar Sync know from the mime type of a resource that it is an image and provide you with information about it (width, height, encoding). Everyone implements permissions, but how they do so is highly dependent on the provider. One can say, however, that everyone lets you share a file via a public link.

Conclusion

All cloud storage APIs are doing a good job. They all try hard to help you as a developer understand their concepts (through examples or documentation). A lot of the core operations are basically the same, just with different names for the attributes and parameters (sadly, no standard has evolved yet). I personally think that Box provides the most elegant API, Dropbox is the easiest one to use and Google Drive has all the features you want.

Dropbox

I am really fond of the Dropbox API because using a path instead of an ID has proved easier for me as a developer. However, accessing a resource via its ID makes more sense from a RESTful standpoint. Dropbox uses resources but is not that consistent about it (/files_put or /commit_chunked_upload). The Dropbox API is feature-rich but still easy to use.

Google Drive

Google Drive exposes a lot of metadata and supports even more features (three different ways to upload file content!) and actions than Dropbox does. It provides excellent documentation as well. The API, however, is not that easy to use because there are a lot of counter-intuitive things (no folder resource, listing files by using a query, creating a resource first before uploading content to it, Google Docs/Spreadsheets not being directly downloadable).

Box

The Box API gets everything right. They provide a well-structured and elegant REST API that behaves as you would expect. They don’t provide all the features Dropbox and Google Drive do, but this is not really a problem because the basic and most used operations are all there.

One Drive

One Drive has spartan documentation and only provides a minimal set of features. This is not necessarily bad, as it makes it easy to grasp the structure at a glance. The file and folder IDs look a bit confusing at first sight (file.a6b2a7e8f2515e5e.A6B2A7E8F2515E5E!184). To compete directly with the others, One Drive needs to implement more features, but it has a solid API one can easily use today.

Sugar Sync

Sugar Sync has a very similar API to Box and this is a good thing! Their product is not all about storage, and you can see that reflected in their API. They explicitly rely on XML, which might put off many of today's JSON purists. But they have very good documentation and a solid set of features.

Common denominator

If one were to write a middleware that provides a common interface for those APIs (like kloudless or Cloud Elements), one would have to choose the lowest common denominator of the APIs.

This means that only the most basic form of the features can be used and only features that can be emulated should be implemented. I’ll probably write more about that in a later post.
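
To give an idea of what such a lowest common denominator could look like for metadata, here is a minimal sketch that maps the size and modification time fields from the File Metadata table onto one record. The `CommonMetadata` type, the provider keys and the `normalize` helper are all hypothetical, and the timestamp is deliberately left as the provider's raw string because the formats differ.

```python
from typing import Any, Dict, NamedTuple

# (size field, modification time field) per provider, taken from the
# File Metadata table; the dictionary keys are made up for this example.
FIELD_MAP = {
    "dropbox": ("bytes", "modified"),
    "google_drive": ("fileSize", "modifiedDate"),
    "box": ("size", "modified_at"),
    "one_drive": ("size", "updated_time"),
    "sugar_sync": ("size", "lastModified"),
}


class CommonMetadata(NamedTuple):
    size: int
    modified: str  # raw provider timestamp, not parsed


def normalize(provider: str, raw: Dict[str, Any]) -> CommonMetadata:
    """Reduce a provider-specific metadata dict to the shared fields."""
    size_field, time_field = FIELD_MAP[provider]
    # int() also covers providers that send the size as a string.
    return CommonMetadata(size=int(raw[size_field]), modified=str(raw[time_field]))
```

Everything outside this shared subset (thumbnails, hashes, revisions) would either have to be emulated or dropped.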