10 Things About Big Data

I missed Big Data Boot Camp a couple of weeks ago in NYC, but I found this summary. Maybe I'll get to go next year.

10 Things to Know About Big Data:
  1. While big data is a heavily used buzzword, technologies such as Hadoop are still in the early stages of adoption. If you haven’t already dipped your toes in the water, don’t panic; it’s still early. But now is the time to begin exploring these technologies, which are maturing rapidly.
  2. Big data is not new. The difference now is that the technologies are widely available to everyone. In terms of volume, many enterprises have had large data stores for some time. Data variety and velocity have been around too. There just have not been affordable and accessible options for deriving value from that data.
  3. A best-of-breed approach to big data comes with an inevitable trade-off: you have to manage complexity.
  4. In-memory technologies are looming large due to the need for timely access to data. 
  5. The advantage of Hadoop and NoSQL – or schema-less – systems is that you don’t have to prepare the data, making them ideal for data of unknown value. However, Hadoop is not just an archive or a staging area. If you are using it only as a staging area, you are missing the point.
  6. Since Hadoop is an open source technology that can run on commodity hardware, it has a fairly low cost barrier to entry, but the skills required to use it are not readily available. Data scientists are hard to come by, and some companies are finding that in order to employ these experts they must open offices in California. Some decide to simply outsource the job. If you don’t have access to staff with the expertise that Hadoop and related technologies require, a pure open source Hadoop implementation is not optimal. You may be better off starting with a third-party distribution or proprietary package that gives you things like bug fixes, training, and support.
  7. SQL is not going away. Relational databases are still important and there are more solutions coming on the scene that provide front-end SQL access to big data stores of schema-less data. 
  8. Ultimately, the value of big data lies in opening it up to more people for analytics, making it more accessible and easier to use. Hadoop doesn't have the open accessibility that it needs yet, but it will soon. 
  9. Be sure you know how you are protecting data in big data systems – but also be aware of how difficult it could be to pull data out of a system if it is needed for legal proceedings. Lawyers and judges won’t care how hard it is. Ensure that you are thinking about the legal side, because it will impact you.
  10. Social media is not a threat. But remember that collecting and analyzing social media is more than just racking up “likes.” Involving employees in meaningful one-on-one social media interaction on behalf of your company can provide unique insight into your customers' sentiments and help create a closer relationship with them.
I'm just beginning to learn what a lot of this really means, thanks to my studies in Information Design. I wish I'd learned this stuff back in graduate school, but my focus was Health Communication, not Library Technologies.
