Introducing BRIC (Bunch of Redundant Independent Clouds)

The Plan

Online storage providers are handing out free storage like candy.  Add them all up and soon you’re talking about a serious amount of space.  So let’s have some fun by turning ten different online storage providers into a single data grid which is secure, robust, and distributed.  We call this grid a BRIC (Bunch of Redundant Independent Clouds).


It’s pretty cheap to get storage.  For example, Google Drive offers 5GB for free and you can upgrade to 100GB for just $5/month. However, you might still prefer the BRIC approach, as no single online storage provider is immune to outages, data loss, account lockouts or changing terms of service.

By adopting an open-source BRIC you avoid putting all your eggs in one basket and have improved transparency to understand exactly what is happening to your data.

Introducing Tahoe-LAFS

The BRIC solution presented here will use the open source project Tahoe-LAFS to perform the RAID-like function of striping data across different storage providers.  Here’s how Tahoe-LAFS describes itself:

Tahoe-LAFS is a Free and Open cloud storage system. It distributes your data across multiple servers. Even if some of the servers fail or are taken over by an attacker, the entire filesystem continues to function correctly, including preservation of your privacy and security.


We’ll create a private Tahoe grid where each node’s storage will be backed by an online storage provider.  Tahoe doesn’t directly support online storage providers as remote back-ends, but we can work around this problem by using sync folders, at the expense of local disk space.

Storage Providers

Here are the providers used. They each offer free plans with desktop apps or daemons which sync local folders. In total we have 92GB of free storage and we could easily obtain more by referring friends or using secondary email addresses.

  • Asus Web Storage – 5GB
  • Dropbox – 25GB
  • Google Drive – 5GB
  • Jottacloud – 5GB
  • SkyDrive – 25GB
  • SpiderOak – 2GB
  • SugarSync – 5GB
  • Symform – 10GB
  • Ubuntu One – 5GB
  • Wuala – 5GB

Before setting up Tahoe, launch the desktop apps and set up the sync folders.  For this project, it’s a good idea to put them all under the same folder such as ~/syncfolders to make file management simpler.

Patching Tahoe

Originally this solution was only supposed to run on Linux but some providers only offer sync software for OS X and Windows. So we’ll be adventurous and install Tahoe on two computers on the same internal network, one computer running OS X and the other Ubuntu.

To get started, download the latest version of Tahoe (1.9.2 at time of writing).  Before building we’ll apply some patches.  The patches do two things.

  1. Add a maximum_capacity configuration option to set the storage capacity of each node. Without the patch, nodes will keep storing data until the local hard drive is full. Instead, we want to restrict each node to storing only the amount of data offered by its associated online storage.
  2. Compute the available space by subtracting the actual number of bytes used by the storage folder from the maximum capacity.  Without the patch, the node will simply compute the free space of the local hard drive which is not the behaviour we want.

I’m not a Python programmer and have never looked at the Tahoe-LAFS source code before, so the changes are most likely sub-optimal, but do seem to work fine.
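Stripped of diff noise, the heart of the second change is just a du-style walk over the share directory. Here is a standalone sketch of that logic (the function name and signature are mine, not Tahoe’s):

```python
import os

def available_space(sharedir, maximum_capacity):
    """Bytes still free under a fixed cap: the configured maximum
    minus the bytes actually stored in the share directory."""
    total_size = 0
    for dirpath, _dirnames, filenames in os.walk(sharedir):
        for name in filenames:
            total_size += os.path.getsize(os.path.join(dirpath, name))
    return maximum_capacity - total_size
```

Note that this walks the whole share tree on every call, so the cost grows with the number of shares; fine for a hobby grid, slow for a big one.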

Patch for allmydata-tahoe-1.9.2/src/allmydata/

---      2012-10-21 19:47:56.000000000 -0700
+++   2012-10-22 12:35:12.000000000 -0700
@@ -220,6 +220,19 @@
% data)
if reserved is None:
reserved = 0
+        data = self.get_config("storage", "maximum_capacity", None)
+        capacity = None
+        try:
+            capacity = parse_abbreviated_size(data)
+        except ValueError:
+            log.msg("[storage]maximum_capacity= contains unparseable value %s" % data)
+        if capacity is None:
+            capacity = 0
discard = self.get_config("storage", "debug_discard", False,

@@ -247,6 +260,7 @@

ss = StorageServer(storedir, self.nodeid,
+                           maximum_capacity=capacity,

Patch for allmydata-tahoe-1.9.2/src/allmydata/storage/

---      2012-10-21 19:22:14.000000000 -0700
+++   2012-10-22 20:56:16.000000000 -0700
@@ -38,7 +38,7 @@
name = 'storage'
LeaseCheckerClass = LeaseCheckingCrawler

-    def __init__(self, storedir, nodeid, reserved_space=0,
+    def __init__(self, storedir, nodeid, reserved_space=0, maximum_capacity=0,
discard_storage=False, readonly_storage=False,
@@ -58,6 +58,7 @@
self.corruption_advisory_dir = os.path.join(storedir,
self.reserved_space = int(reserved_space)
+        self.maximum_capacity = int(maximum_capacity)
self.no_storage = discard_storage
self.readonly_storage = readonly_storage
self.stats_provider = stats_provider
@@ -167,6 +168,8 @@
# remember: RIStatsProvider requires that our return dict
# contains numeric values.
stats = { 'storage_server.allocated': self.allocated_size(), }
+        stats['storage_server.BRIC_available_space'] = self.get_available_space()
+        stats['storage_server.BRIC_maximum_capacity'] = self.maximum_capacity
stats['storage_server.reserved_space'] = self.reserved_space
for category,ld in self.get_latencies().items():
for name,v in ld.items():
@@ -205,7 +208,16 @@

if self.readonly_storage:
return 0
-        return fileutil.get_available_space(self.sharedir, self.reserved_space)
+        # BRIC: report capacity left under the cap, not local disk free space
+        total_size = 0
+        for dirpath, dirnames, filenames in os.walk(self.sharedir):
+            for f in filenames:
+                fp = os.path.join(dirpath, f)
+                total_size += os.path.getsize(fp)
+        return self.maximum_capacity - total_size
+#        return fileutil.get_available_space(self.sharedir, self.reserved_space)

def allocated_size(self):
space = 0

Setting up Tahoe

The next step is for you to read the Tahoe documentation. Setting up Tahoe can be tricky, so it’s best to become familiar with the concepts and terminology before proceeding.

1. Linux Configuration

On the Linux machine, create a project directory and then create an introducer and get the introducer’s URL from the file introducer.furl.  An introducer is the starting seed of our grid.

tahoe create-introducer introducer

Next, create client nodes for each of the Linux-friendly storage providers: Ubuntu One, Dropbox, Wuala and Symform.

tahoe create-client node1
tahoe create-client node2
tahoe create-client node3
tahoe create-client node4

For each client node, modify the configuration file tahoe.cfg and set the introducer URL, nickname and the maximum_capacity to match that of the storage provider. Here’s an example for Ubuntu One.

[node]
nickname = node1-ubuntuone
web.port = tcp:3456:interface=

[client]
introducer.furl = pb://p6dkdpfdvh3vpwf5yafir7tv4lwizsid@ubuntu.local:42136,

[storage]
enabled = true
maximum_capacity = 5G

Next, in each client node’s directory, create a symbolic link storage which links to the sync folder of a provider.

storage -> ~/syncfolders/wuala
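Creating the link is a one-liner; for example (the paths here are assumptions matching the layout above):

```shell
# Back node3's storage with the Wuala sync folder (hypothetical paths)
mkdir -p ~/syncfolders/wuala ~/projects/bric/tahoe/node3
ln -sfn ~/syncfolders/wuala ~/projects/bric/tahoe/node3/storage
```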

You might prefer to use a designated folder in each provider’s sync folder.

storage -> ~/syncfolders/Google Drive/bricstuff

If you intend to share the online storage for non-BRIC purposes, this will impact the computation of available space as the Tahoe node won’t know how much space is taken up by non-Tahoe data. It’s best not to.

If you prefer, you can create symbolic links in the other direction e.g.

~/syncfolders/dropbox -> ~/projects/bric/tahoe/node2/storage

2. Launching Tahoe on Linux

To launch Tahoe you specify the directory of the created Tahoe node, or run tahoe from within that node’s directory.  Start the introducer node first, and then the other clients:

tahoe start ./introducer
tahoe start ./node1
tahoe start ./node2
tahoe start ./node3
tahoe start ./node4

To check your grid is up and running, connect to a client node’s web interface via your browser: http://localhost:3456

3. Connecting the Mac

Repeat the same setup on the Mac for the Linux-unfriendly storage providers, but you don’t need to create another introducer. Just create Tahoe client nodes as normal and configure them with the introducer URL you already obtained.  If all your Tahoe nodes are up and running, you should see one active Tahoe introducer and ten active Tahoe clients.

If you check your online storage, you should see the desktop apps have already started syncing Tahoe’s house-keeping files and folders.

Storing Data in Tahoe

Uploading a single file is easy via the web interface of a Tahoe node.  Tahoe will split up the file, encrypt it, and store the chunks of data redundantly across the connected network.  You can confirm this by checking the sync folders and your online storage accounts.

You can also transfer folders and files in bulk by using Tahoe from the command line.  It’s strongly advised you consult the Tahoe documentation to avoid frustration!

Keep in mind that you have to keep track of any file or folder URIs (what Tahoe calls a FILECAP or DIRCAP) you create, otherwise you won’t be able to retrieve your data.  Creating aliases for frequently used URIs makes things easier.

Here is an example of what you might do.

cd node1
tahoe add-alias -d ./ home URI:DIR2:blahblahblah
tahoe cp -d ./ /tmp/test.txt home:
tahoe backup -d ./ ~/Documents home:docs
tahoe deep-check -d ./ home:
tahoe ls -d ./ home:

How is data stored across the BRIC?

Tahoe will store data using erasure coding rather than simple replication.  A 2002 paper, “Erasure Coding vs. Replication: A Quantitative Comparison” [PDF], demonstrated why:

We show that systems employing erasure codes have mean time to failures many orders of magnitude higher than replicated systems with similar storage and bandwidth requirements. More importantly, erasure-resilient systems use an order of magnitude less bandwidth and storage to provide similar system durability as replicated systems.

The Windows Azure team received a 2012 USENIX Best Paper Award for detailing how they use erasure coding in their storage service.

Tahoe will use encoding parameters defined in the configuration file tahoe.cfg:

# What encoding parameters should this client use for uploads?
# default is 3,7,10
#shares.needed = 3
#shares.happy = 7
#shares.total = 10

What do these default values mean? As explained in the FAQ:

The default Tahoe-LAFS parameters are 3-of-10, so the data is spread over 10 different drives, and you can lose any 7 of them and still recover the entire data. This gives much better reliability than comparable RAID setups, at a cost of only 3.3 times the storage space that a single copy takes. It takes about 3.3 times the storage space, because it uses space on each server equal to 1/3 of the size of the data, and there are 10 servers.

So from our 92 GB of total online storage space, we can expect to store about 28GB of data. You could store more data by tweaking the parameters, but at the expense of redundancy.
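As a quick sanity check on that figure: a k-of-n code writes n shares, each 1/k of the file, so raw space is divided by the expansion factor n/k (the helper name below is hypothetical):

```python
def usable_capacity(raw_gb, needed=3, total=10):
    """Usable space under k-of-n erasure coding: raw space
    divided by the expansion factor n/k."""
    return raw_gb * needed / total

print(usable_capacity(92))          # 27.6, i.e. "about 28GB"
print(usable_capacity(92, 5, 10))   # 46.0 with 5-of-10, at reduced redundancy
```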

Retrieving Data

Getting and verifying data is done via the command line.  Here’s what you might do.

cd node1
tahoe add-alias -d ./ home URI:DIR2:blahblahblah
tahoe deep-check -d ./ home:
tahoe ls -d ./ home:
tahoe cp -d ./ home:test.txt ~/recover/test.txt
tahoe cp -d ./ --recursive home:docs ~/recover/Documents

If you’ve managed to store data, verify it’s been striped across the providers, and retrieve the data without any errors, you’ve successfully built a BRIC.  Congratulations!

Some Thoughts

With the help of Tahoe-LAFS we’ve turned a hodge-podge mix of free online storage into a secure, fault-tolerant, distributed store for data. It might be fair to say that a BRIC is truly greater than the sum of its parts.

Obviously there’s a huge scope for improvement as the BRIC solution proposed here is not easy to set up, configure or use.  A few things to ponder are:

  • Duplicity supports a Tahoe-LAFS backend so you can use Duplicity for backups to the BRIC and avoid the Tahoe command line.
  • It might be possible to save local disk space and avoid the use of sync folders by making use of storage providers who support open protocols (only a few do, and often only as a paid option). For example, upon login you could auto-mount the provider via WebDAV, with the Tahoe storage symlinked to the mounted volume. Data is then stored directly on the provider rather than on local disk.
  • You could avoid having to patch the Tahoe source by linking storage to a virtual file system, with the size set to match the online storage provider. On OS X you could use disk images. However, the images have to exist somewhere so you don’t really save any local disk space with this approach.
  • Running multiple closed-source background apps that scan your hard drive is not ideal on your day-to-day computer. There’s a performance hit and a real security issue: ‘Google Drive opens backdoor to Google Accounts’. The current approach is probably best suited to a dedicated backup computer.
  • You’ll need decent upstream speed as you now have multiple sync apps competing for available bandwidth.
  • Tahoe-LAFS is complicated to use and configure correctly. There’s a lot to consider, such as the lease time of data and maintenance of the grid, e.g. rebalancing data when storage providers change.
  • If rolling your own BRIC is too much effort, you might want to look at Symform and Bitcasa, two user-friendly storage providers based upon a P2P / community storage model.  The creators of Tahoe-LAFS also have a commercial offering which supports an S3 backend, and the code remains open-source.
  • Cloud providers won’t like being reduced to commodity storage by the BRIC approach, so one day they might explicitly forbid such usage in the terms of service.  Thankfully due to redundancy, even if a provider closed your account you shouldn’t suffer any real data loss and you would have plenty of time to add an alternate storage provider.
  • Some folk at Cornell wrote a paper on using a proxy to stripe data across multiple cloud storage providers and call their proxy a RACS (Redundant Array of Cloud Storage). I’ve never seen an array of clouds in the sky, but I have seen a bunch of them, so I prefer the term BRIC 🙂

If you’ve created your own BRIC (or similar) let’s hear about it!


Posted in cloud, linux, mac | 24 Comments

PandoraJam 2.0 Update

Update: Please upgrade to 2.1 to resolve most of the issues. Launch PandoraJam and select the menu item PandoraJam –> Check for updates.  This will then start the upgrade process. The next time you launch PandoraJam you can check the version number by selecting the menu item PandoraJam –> About PandoraJam

PandoraJam is broken as of Fri Sep 28 due to recent changes on the Pandora music website (outside of our control).  Thanks for your tweets and emails.  We’ve been investigating and are working on some fixes.  We hope to be able to start rolling out some updates to restore functionality (where possible) in the very near future.  Thanks for your patience.

Posted in pandorajam | 14 Comments

The Node.js cpu blocking thing

UPDATE: TLDR “Because nothing blocks, less-than-expert programmers are able to develop fast systems” says Node’s About page, yet it is actually easy for programmers to block Node’s event loop, thereby reducing concurrency.  Go’s approach to concurrency makes it easier for programmers to take full advantage of multi-core processors. 

Recently a budding entrepreneur told me they were using CoffeeScript, Node.js and MongoDB to create their server application.  I asked if they were aware that by design Node was single-threaded, so the server would block on cpu intensive code. The response was a puzzled look followed by some mumbling about how Node doesn’t block because it uses an event loop and asynchronous callbacks.

This reminded me of Ted Dziuba’s highly inflammatory post Node.js is Cancer where many on HackerNews completely missed the issues raised. With JGC’s recent benchmarking adventure To boldly Go where Node man has gone before, and history somewhat repeating itself in the HackerNews comments, I figured it was time to create my own example to demonstrate that Node.js cpu blocking thing.

Building on previous examples, consider a simple server which, upon receiving a request, performs an entirely contrived piece of work (since “fibonacci related benchmarks should die in a fire” :-).
UPDATE: Any work that consumes cpu cycles is real work, so feel free to insert your own file encryption or photo filtering code.

Here’s the Go server code:
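(The original listing isn’t reproduced here, so the following is a minimal sketch: the port, the names, and a naive Fibonacci standing in for the contrived CPU-bound work are all assumptions. To keep the sketch runnable standalone it issues one request against itself and exits, where the original presumably just blocked in ListenAndServe.)

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
)

// fib is deliberately naive, CPU-bound work standing in for "real" work.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// handler burns CPU on every request; the Go runtime schedules one
// goroutine per request, so concurrent requests can use all cores.
func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "%d", fib(30))
}

func main() {
	http.HandleFunc("/", handler)
	ln, err := net.Listen("tcp", "localhost:8001")
	if err != nil {
		panic(err)
	}
	go http.Serve(ln, nil) // serve in the background

	// one demo request, then exit so the sketch terminates
	resp, err := http.Get("http://localhost:8001/")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // fib(30) = 832040
}
```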

Here’s the Node server code:

To benchmark, we’ll run ab on Ubuntu and ask it to connect to the Go (1.0.1) or Node (0.6.18) server running on OS X 10.7.3 with a quad-core i7 processor and 4GB RAM.  Here are the results of making 1,000 requests.

ab -n 1000

Go
Time taken for tests:   54.994 seconds
Requests per second:    18.18 [#/sec]
Time per request:       54.994 [ms]
Transfer rate:          1.70 [Kbytes/sec]

Node
Time taken for tests:   58.182 seconds
Requests per second:    17.19 [#/sec]
Time per request:       58.182 [ms]
Transfer rate:          1.28 [Kbytes/sec]

Let’s make another 1,000 requests, this time with 100 concurrent requests.

ab -n 1000 -c 100

Go (process launched with environment variable GOMAXPROCS=4)
Time taken for tests:   17.416 seconds
Requests per second:    57.42 [#/sec]
Time per request:       17.416 [ms]
Transfer rate:          5.38 [Kbytes/sec]

Node
Time taken for tests:   50.601 seconds
Requests per second:    19.76 [#/sec]
Time per request:       50.601 [ms]
Transfer rate:          1.47 [Kbytes/sec]

As you can see, when requests are sent to the server concurrently, the Go server speeds up dramatically and is much faster than the Node server. It’s almost as if the Node server can’t handle concurrent requests and is simply processing them one at a time.

So what’s happening? Well, for each incoming request, the Go server kicks off a goroutine, which takes advantage of the quad-core processor to perform processing in parallel. Goroutines are presumably multiplexed over multiple OS threads, but that’s a runtime implementation detail.  By contrast, the Node server only has a single thread of execution, so the event loop is blocked while it’s processing.

If your brain is starting to think about algorithms, real-world latency and optimisation, stop!  The benchmarks don’t really matter.  The important thing to take away here is that a Node server’s event loop can be easily blocked.  This might come as a shock to some Node beginners but it shouldn’t. Quoting from Tom’s introductory Node book:

“This single-threaded concept is really important. One of the criticisms leveled at Node.js fairly often is its lack of “concurrency.” That is, it doesn’t use all of the CPUs on a machine to run the JavaScript.”

“Because Node relies on an event loop to do its work, there is the danger that the callback of an event in the loop could run for a long time. This means that other users of the process are not going to get their requests met until that long-running event’s callback has concluded.” 

“As we’ve mentioned, Node is single-threaded. This means Node is using only one processor to do its work. However, most servers have several “multicore” processors, and a single multicore processor has many processors. A server with two physical CPU sockets might have “24 logical cores”—that is, 24 processors exposed to the operating system. To make the best use of Node, we should use those too. So if we don’t have threads, how do we do that?”

Fortunately, there’s no need to fret. To take advantage of multiple processors, simply use a package called Cluster, which forks child processes to run as individual Node servers.  So here’s the code for the Node server again, this time using Cluster.

As expected, when four Node servers are launched, the results improve substantially.

ab -n 1000 -c 100

Go (1 process launched with environment variable GOMAXPROCS=4)
Time taken for tests:   17.416 seconds
Requests per second:    57.42 [#/sec]
Time per request:       17.416 [ms]
Transfer rate:          5.38 [Kbytes/sec]

Node Cluster (4 processes launched)
Time taken for tests:   26.967 seconds
Requests per second:    37.08 [#/sec]
Time per request:       26.967 [ms]
Transfer rate:          2.75 [Kbytes/sec]

Using Cluster looks simple, but if you’re a conscientious programmer, you now have a few things to worry about.  For example “Forking a new worker when a death occurs” or “Monitoring worker health using message passing”.  Digging deeper, you might even begin to question the very use of Cluster!  Node add-ons like WebWorkers, Fibers (non-preemptive scheduling) and Threads A GoGo (real threads) offer alternative approaches. Confused yet?
UPDATE: Here’s an example!topic/nodejs/RS5Whcqbgq4/discussion

By contrast, Go has concurrency baked-in. Goroutines and channels provide a simple and elegant approach to writing fast server applications. A single Go server offers a level of concurrency matched only by a cluster of Node servers. There are many reasons to like Go which is why it’s my language of choice.

Node is great for Javascript developers but for everyone else there’s probably already a comparable solution close to home. As for buzzword entrepreneurs, there are always greener pastures!

UPDATE: Node.js developers recently experimented with threads instead of processes.!msg/nodejs/zLzuo292hX0/F7gqfUiKi2sJ

Reddit/Golang Comments
HackerNews Comments

Posted in go, golang, javascript, node | 7 Comments

50% Discount – Happy Holidays!

Here’s a crazy holiday offer to go with the eggnog and mince pies!

Use coupon MADSANTA in the Bitcartel Online Store for 50% OFF everything! (Valid 24-26 Dec)

To give a gift, simply use the web store as normal, entering your payment details, but under ‘License Options’ enter the recipient’s name. You can then print out and glue the registration code into their Christmas card!

Posted in Uncategorized | Leave a comment

PandoraJam – latest info on Twitter feed

Before sending emails to PandoraJam support, please check the following resources.

To get the latest official news about PandoraJam, check the Twitter feed.

Looking at what others are saying about PandoraJam on Twitter is a useful way to find out about potential problems and solutions.!/search/%40pandorajam

More answers, tips and helpful information can be found in our online support resources.

Thank you!

Posted in Uncategorized | 1 Comment

Dear PandoraJam Fans…

Update 3 (2 Dec 11) PandoraJam 2 works with Pandora’s HTML5 player, so this post can be disregarded. If you are using PandoraJam 1.x please upgrade for free to PandoraJam 2.

Update 2 (24 Sep 11) Good news. We’ve been able to restore most features and continue to make progress. Please keep your PandoraJam updated via the menu option ‘Check for Updates…’ or download the latest version from our website.

Update 1 (22 Sep 11) Some cheerful news! We have released a test version of PandoraJam 2 which has basic support for Pandora’s new HTML5 player. Please update the app via the menu option ‘Check for Updates…’

Dear PandoraJam Fans…
It’s probably been coming ever since Pandora announced their new HTML5 player… bad news.

Over the last few weeks we’ve been working hard at keeping PandoraJam working with Pandora’s classic Flash player.  In fact, a few more bugs had been fixed and issues resolved this week, and like car enthusiasts, there was some satisfaction in keeping the motor running and tuned up.  In the end though, it never rains but it pours.

This Wednesday evening, PandoraJam was chugging along as usual. Sweet. However, as of around midnight EST, it seems Pandora turned off the classic Flash version and is now pushing all listeners to use the new HTML5-based player.  A hint of that was detected earlier in the day, when it was noticed that a link titled ‘Old Site’ had been removed from Pandora’s HTML5 player (they have a minimum browser specification, and all supported browsers support the new player).
Overall, this is great news for Pandora fans as the classic Flash player was resource intensive, but bad news for PandoraJam. The classic Flash player no longer loads, and with it turned off, PandoraJam no longer functions.  Kaput!  We must admit, we’re somewhat surprised at how quickly Flash has been removed, as we felt Pandora would let listeners who use older operating systems and browsers stick around for, say, 12 to 18 months (there’s little development cost in keeping the old Flash player around), but alas, no.

Thus, sadly, with immediate effect we are temporarily suspending sales of PandoraJam.  Our refund policy is normally 30 days but we’re extending it out to Aug 1st (just under two months).  If you need a refund, email us (purchases and refunds are processed through FastSpring, our payment processor).  For long-time users, we hope you’ve had fun and enjoyed zillions of hours of music pleasure, and all for less than the cost of a beer and pizza 😉
Update: Sales are back online now
Will there be a PandoraJam 2 with HTML5 support?  We don’t have an answer at this point but we’re working on it. We think it should be possible, we might lose some features, but that will still be better than looking at a blank screen.  If there is a PandoraJam 2, it will be a free upgrade for all users.  We’ll keep you posted as to progress (more frequent updates on our twitter feed).
Update: For clarification, we are attempting to restore all features. Note that PandoraJam 2 requires OS X 10.5 or later.

Since the summer of 2007, we’ve been receiving emails from music fans who found PandoraJam, fell in love with Pandora, and ended up becoming Pandora One subscribers.  It’s been quite a journey to watch Pandora, a little startup in Oakland, grow into a publicly listed company on the NASDAQ, and quite remarkable that the Flash based player lasted so long at all.  Now they’re all grown up and HTML5… yet with Google Music, iCloud, Spotify, Rdio, Turntable, and others hitting the scene, it’s like the internet music party is only just getting started.
Stay wired. Namaste.
Posted in Uncategorized | 35 Comments

PandoraJam – Updated for Lion & Classic Pandora

Update: This is a full release, and no longer a beta, so the post has been modified slightly.

There is a new version of PandoraJam (1.7 build 476) available for download:

What’s new:

  • supports the Classic Flash version of the Pandora website (and not the new HTML5 version) and will try to load the Classic Flash version
  • support for Mac OS X 10.7 Lion (requires Mac OS X 10.5 or later)
  • now streams music to AppleTV
  • fixes a problem streaming music to some Airport Express units
  • auto-reloads if the app does not launch properly and thus cannot enhance the Pandora website (this means you no longer have to ‘Empty Cache’, quit and relaunch the app)
  • auto-fixes the window position (sometimes Pandora website message banners caused the entire display to be shifted down)

Some help:

You shouldn’t need to do anything special… but if PandoraJam does not load the Classic Flash website properly, try the following:
  1. Launch Safari and visit the Pandora website
  2. If you are taken to the new HTML5 website, click the ‘Old Site’ link in the top right hand corner
  3. Once the Classic Flash website player loads, close the browser window.
  4. Launch PandoraJam

It seems possible (for now) to run the new HTML5 player in a browser window logged into one account, while the PandoraJam application uses another account.

Known issue:

On Mac OS X 10.7 Lion, Adobe Flash Player version 10.3 repeats letter keystrokes. This affects the Flash Player when running in 32-bit mode (which is what PandoraJam runs in), but not in 64-bit mode. Downgrading to Flash Player version 10.2 eliminates the keystroke problem.  This is an Adobe Flash (possibly WebKit) bug and needs to be fixed by Adobe (possibly Apple).

Posted in Uncategorized | 8 Comments