Velocity 2011 (#velocityconf): Big Data / NoSQL "How to Scale Dirty and Influence People"

I had a long and (I'll admit) somewhat ranty post in response to this workshop. Luckily, my blog software lost it.

In any case, there was a lot of talk about how simple components and smaller code bases are more maintainable. However, they also talked a lot about things that are "impossible" with these code bases (i.e. commit conflicts and branching). These claims aren't strictly true. Anyone who believes them probably isn't writing very interesting software. Their claim of being able to hire "undervalued talent" is kind of weak. They are hiring people who can hack up a short script that doesn't need to be maintained. Whether the code can be maintained never comes up, because they would just throw away any code that needs to be fixed or rewritten. They gave some examples that made me cringe and shudder for reasons that appear to have been overlooked. I won't go into those because they're not something I want to promote.

The most interesting part was actually their processing system. There appears to be a nicely-distributed processing system with data retrieval on one end and data processing and copying on the other. It's nicely decoupled, but there are latency and cost issues with having to store more intermediate data. Still, storage is now more scalable than it used to be, so this could be the right solution for their use case.

The final point that I noticed was that they are doing authorization in parallel with data retrieval on their read-only API. This looks inherently dangerous to me. I could, as a completely unauthorized user, create a very effective DDoS of their back-end systems by performing expensive, unauthorized requests. There are definitely some security implications here.
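To make the concern concrete, here is a minimal Python sketch of the two orderings. Everything in it is hypothetical (the `Backend` class, the `authorize` function, the `"valid-token"` check) and nothing here reflects their actual implementation; it only assumes that the back-end fetch is the expensive part.

```python
import asyncio
from typing import Optional


class Backend:
    """Hypothetical back end that counts how many expensive queries it starts."""

    def __init__(self) -> None:
        self.calls = 0

    async def fetch(self, query: str) -> str:
        self.calls += 1
        await asyncio.sleep(0)  # stand-in for slow, costly work
        return f"results for {query}"


async def authorize(token: str) -> bool:
    """Hypothetical auth check; a real one would call an auth service."""
    await asyncio.sleep(0)  # stand-in for a network round trip
    return token == "valid-token"


async def handle_parallel(backend: Backend, token: str, query: str) -> Optional[str]:
    # Auth and retrieval start together, so the fetch runs even if auth fails.
    allowed, data = await asyncio.gather(authorize(token), backend.fetch(query))
    return data if allowed else None


async def handle_auth_first(backend: Backend, token: str, query: str) -> Optional[str]:
    # The back end is only touched once the caller is known to be allowed.
    if not await authorize(token):
        return None
    return await backend.fetch(query)
```

In both versions the unauthorized caller gets nothing back, but in the parallel version the back end has already done the work by the time the answer is "no". The parallel ordering saves latency for legitimate users, which is presumably why they chose it; the trade-off is that rejected requests are no longer cheap.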

All in all, the talk felt like an intro to low-scale distributed systems. Larger-scale, more mature distributed systems solve many more problems than theirs has solved thus far.
