The story of big data

What are the revolutionary ways in which big data has been used across history?

In the 1690s, a theologian called Caspar Neumann sent the secretary of the London Royal Society a set of records he had collected on the number of births, marriages and deaths between the years 1687 and 1691 in the city of Breslau, in what was then Lower Silesia.

The secretary gave this material to a mathematician and astronomer called Edmund Halley, who used it to compile a table of the ages at which people died and, from that, the probable life expectancy of a person at any given age. The resulting “life table” – in effect the first actuarial data in history – was used to calculate the annuity that ought to be paid for a given sum of money.
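The arithmetic behind a life table of this kind is simple enough to sketch. The Python below is a modern restatement of the idea, not a reproduction of Halley's own tables or calculations. It assumes a table of hypothetical survivor counts indexed by age. From that table it computes the curtate life expectancy (the complete years still to be lived) at a given age. It also values a life annuity that pays 1 at the end of each year the holder survives, discounted at an assumed interest rate. The function names and the sample table are invented for illustration.

```python
# A minimal sketch of life-table arithmetic.
# ASSUMPTION: HYPOTHETICAL_SURVIVORS is made-up illustration data
# (number still alive at each age out of a starting cohort), not Halley's figures.

HYPOTHETICAL_SURVIVORS = [1000, 800, 600, 400, 200, 0]  # index = age in years


def life_expectancy(survivors: list[int], age: int) -> float:
    """Curtate life expectancy at `age`: sum of l[age + k] for k >= 1, divided by l[age]."""
    base = survivors[age]
    if base == 0:
        raise ValueError("nobody in the table survives to this age")
    return sum(survivors[age + 1:]) / base


def annuity_value(survivors: list[int], age: int, rate: float, payment: float = 1.0) -> float:
    """Present value of `payment` made at the end of each year the holder is alive.

    Year k contributes payment * v**k * l[age + k] / l[age], where v = 1 / (1 + rate)
    discounts the payment and l[age + k] / l[age] is the chance of surviving to receive it.
    """
    base = survivors[age]
    if base == 0:
        raise ValueError("nobody in the table survives to this age")
    v = 1.0 / (1.0 + rate)
    return sum(
        payment * v**k * survivors[age + k] / base
        for k in range(1, len(survivors) - age)
    )


if __name__ == "__main__":
    print(life_expectancy(HYPOTHETICAL_SURVIVORS, 0))  # 2.0
    print(round(annuity_value(HYPOTHETICAL_SURVIVORS, 0, rate=0.06), 4))
```

With the interest rate set to zero, the annuity value equals the curtate life expectancy. Any positive rate lowers it, which is why the fair price of an annuity depends on both mortality and interest.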

In the 17th century, Neumann’s mortality statistics were big data. Breslau had a population of 34,000 people, so collecting enough information to compile an effective database was an immense undertaking. By contrast, the mathematical analysis that Halley performed was relatively simple.

These days, huge amounts of data are freely available; the difficult part is extracting usable information from them.

When people talk about exploiting big data, they often mean unstructured data: information captured in forms that need further processing before it can be used. The greatest challenges are posed by material in formats that cannot readily be stored in a database, such as images, voice recordings and video. What can be done about all that?

The first-stage solution is data warehousing: aggregating the data you want to store and study, and putting it into a common format. The company that pioneered this technique is Hitachi Data Systems, which collects information from its BlueArc network storage units and content depots and indexes all of the actual content using XML. This creates a common format regardless of the data’s original form and removes the problem of comparing records across more than one database. The importance of this step becomes clear when you consider that computers may process data about 150,000 times faster when it is arranged sequentially.
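To make “putting it into a common format” concrete, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. It assumes two hypothetical sources, a camera feed and a voice archive, that describe their files with different field names. Each record is mapped onto one shared schema and collected under a single XML index. The source names, field mappings and the record layout are illustrative assumptions, not the format Hitachi actually uses.

```python
# A minimal sketch of normalising records from different sources into one XML format.
# ASSUMPTIONS: the source names, FIELD_MAPS and the <index>/<record> schema are
# hypothetical, invented for illustration.
import xml.etree.ElementTree as ET

# Hypothetical per-source mappings: source field name -> shared schema field name.
FIELD_MAPS = {
    "camera_feed": {"file": "name", "ts": "created", "len_bytes": "size"},
    "voice_archive": {"clip_id": "name", "recorded_at": "created", "bytes": "size"},
}


def to_common_xml(source: str, raw: dict) -> ET.Element:
    """Translate one raw record into a <record> element in the shared schema."""
    mapping = FIELD_MAPS[source]
    record = ET.Element("record", attrib={"source": source})
    for raw_key, common_key in mapping.items():
        child = ET.SubElement(record, common_key)
        child.text = str(raw[raw_key])
    return record


def build_index(batches: list[tuple[str, dict]]) -> ET.Element:
    """Collect records from every source under a single <index> root."""
    index = ET.Element("index")
    for source, raw in batches:
        index.append(to_common_xml(source, raw))
    return index


if __name__ == "__main__":
    index = build_index([
        ("camera_feed", {"file": "cam01.mp4", "ts": "2015-06-01T09:00", "len_bytes": 52_000_000}),
        ("voice_archive", {"clip_id": "call-778", "recorded_at": "2015-06-01T09:05", "bytes": 1_200_000}),
    ])
    print(ET.tostring(index, encoding="unicode"))
```

Once every record carries the same fields, a single query can run over the whole index instead of one query per source database.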

The second stage is to “virtualise” that data – that is, to put it in a cloud and optimise its storage. This is done by indexing the data, then moving it from expensive computer servers to “cold storage” solutions that consume no power at all. What makes Hitachi’s approach to this stage stand out is that it can virtualise all of a client’s data in the same cloud, regardless of which IT vendor’s storage system it sits on. That uniformity of format and access has long been one of Hitachi’s key advantages. Other vendors have since copied the strategy, of course, but Hitachi’s lead has allowed it to stay ahead of the curve and to do what none of its Japanese rivals has managed: thrive in markets outside Japan.
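The index-then-move idea can be illustrated with a simple age-based tiering rule. This Python sketch is an illustration under stated assumptions rather than any vendor’s actual policy: the StoredItem type, the 90-day threshold and the tier names are hypothetical. Items not read within the threshold are moved to the cold tier, while a catalogue mapping each key to its current tier keeps everything findable.

```python
# A minimal tiering sketch. ASSUMPTIONS: StoredItem, the "hot"/"cold" tier names
# and the 90-day threshold are hypothetical, chosen only for illustration.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredItem:
    key: str
    last_accessed: datetime
    tier: str = "hot"  # "hot" = fast, expensive servers; "cold" = low-power storage


def apply_tiering(items: list[StoredItem], now: datetime,
                  cold_after: timedelta = timedelta(days=90)) -> list[str]:
    """Move hot items not accessed for longer than `cold_after` to cold storage.

    Returns the keys of the items that were moved.
    """
    moved = []
    for item in items:
        if item.tier == "hot" and now - item.last_accessed > cold_after:
            item.tier = "cold"
            moved.append(item.key)
    return moved


def catalogue(items: list[StoredItem]) -> dict[str, str]:
    """Index of key -> current tier, so data stays findable wherever it lives."""
    return {item.key: item.tier for item in items}


if __name__ == "__main__":
    now = datetime(2015, 6, 1)
    items = [
        StoredItem("sales-2015.csv", now - timedelta(days=3)),
        StoredItem("cctv-2013.mp4", now - timedelta(days=700)),
    ]
    print(apply_tiering(items, now))  # ['cctv-2013.mp4']
    print(catalogue(items))
```

Keeping the catalogue separate from the data itself is what lets rarely used material sit on cheap storage without disappearing from view.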

Stage three is adapting these solutions to changing demands by introducing faster, cheaper storage and faster analytic techniques. For example, private clouds for mobile data will be handled by a scalable central repository that can synchronise files across platforms and support the sharing of applications. The fast storage is provided by Hitachi Accelerated Flash solid-state drives, which do much to remove the processing bottleneck of the older disk-based systems.

So, what’s stage four? According to Hu Yoshida, Hitachi Data Systems’ chief technology officer, it is a continuation of the race to keep up with the 27% annual rise in the quantity of big data being harvested by the world’s server farms, coupled with the need for real-time analysis of data collected through machine-to-machine communication. This developing field holds out many opportunities, such as the possibility of eliminating traffic jams and improving the safety of road users.

Added to that is the need for new types of distributed storage systems – “data lakes”, as they have been called – that users can access directly to run analyses. Then there is the continued growth of cost-efficient “hybrid” cloud computing, another technology that Hitachi has championed. A hybrid cloud stores data partly in public and partly in private form, and with 70% of organisations currently evaluating the approach, it looks set to grow rapidly over the next year or so.
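A 27% annual rise compounds quickly. The short Python calculation below assumes the rate stays constant, which is an assumption since only a single annual figure is quoted. On that basis the volume of data would double roughly every 2.9 years and grow about 3.3-fold over five years.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """How many times larger a quantity becomes after compounding for `years`."""
    return (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound rate: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log(1 + annual_rate)


RATE = 0.27  # the 27% annual rise quoted above, assumed constant

print(f"Doubling time: {doubling_time(RATE):.1f} years")      # Doubling time: 2.9 years
print(f"Growth over 5 years: {growth_factor(RATE, 5):.2f}x")  # Growth over 5 years: 3.30x
```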

Whether all of these advances will ever produce anything quite as revolutionary as Edmund Halley’s life table remains to be seen, but what we can say is that this approach to big data is, in its way, as new in the 21st century as Halley’s was in the 17th.
