Sunday, October 4, 2015

The reference architecture for Big Data

The IT department often finds itself in trouble when business departments demand a Big Data solution.


At the start of such projects, most stakeholders do not yet know which anticipated workloads the infrastructure should be aligned to. This makes the trade-off between scalability and cost difficult. A new approach to a reference architecture provides a remedy.

Anyone ordering an engine should know what the vehicle will be used for. Depending on the intended use, one chooses a high-performance diesel, a quicker petrol engine or, more recently, a more economical hybrid or electric drive. Business departments increasingly order an engine - in other words, a new infrastructure - from the IT department whose future application is largely unknown.

We are talking about the development and implementation of a business idea in the field of Big Data. A typical use case from marketing is, for example, the realization of a click-stream analysis. A typical sales application is an after-sales analysis, where IT is expected to consolidate and analyze data from the relevant systems so that predictive analytics can forecast customers' future buying behavior. In production and many other areas, too, the need for a wide variety of Big Data analytics applications is growing.
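To make the click-stream example concrete, here is a minimal sketch in PySpark. It assumes the click events already land in the data lake as JSON; the path and the field names session_id and url are illustrative assumptions, not part of any specific product:

```python
# Minimal PySpark sketch of a click-stream analysis.
# Assumption: raw click events sit in the lake as JSON, each carrying
# a session_id and a url field (names are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-analysis").getOrCreate()

# Read the raw, schema-free click events from the lake (path is hypothetical).
clicks = spark.read.json("hdfs:///datalake/raw/clickstream/")

# Page views per session - a typical first aggregation in click-stream analysis.
views_per_session = (
    clicks.groupBy("session_id")
          .agg(F.count("url").alias("page_views"))
          .orderBy(F.desc("page_views")))

views_per_session.show(10)
```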

Terra Incognita

Essentially, three goals pursued by the business departments can be identified:

1. The optimization of existing business processes

2. The development of new products and services

3. The opportunity for enhanced service orientation


Since business departments often do not know at the time of the request how an application will evolve, the IT department lacks the information needed to align the infrastructure and determine the required scalability. Nevertheless, the department expects a sustainable solution "for all cases".


Challenges of conventional server and storage configurations under Hadoop

In Hadoop environments in particular, standard servers are often connected into a cluster in a shared-nothing architecture. Depending on the application, however, quite different server and workload profiles are required. Dedicated infrastructures therefore often arise for each use case and department. The result is that completely different workloads stand side by side as stand-alone solutions.

If, for example, data that was processed in a NoSQL store is to be used for analytics, it has to be copied between systems. This increases complexity and puts a strain on the systems. In addition, large clusters designed around cheap hardware quickly reach their limits: demands on CPU and memory have changed greatly with the fast-paced development of the Hadoop ecosystem. On top of that, the effort for administering and maintaining the individual point solutions grows steadily.
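The copy step described above can look like the following minimal sketch, here using HBase (via the happybase client) as the NoSQL side and WebHDFS (via the hdfs client) as the analytics side; all host, table and path names are assumptions for illustration:

```python
# Sketch of the copy step between point solutions: exporting records from
# a NoSQL store into HDFS so an analytics cluster can read them.
# Host, table and path names are hypothetical.
import json
import happybase
from hdfs import InsecureClient

hbase = happybase.Connection("hbase-host")        # NoSQL side
hdfs = InsecureClient("http://namenode:9870")     # analytics side

table = hbase.table("customer_events")

# Every record is serialized and stored a second time - this duplication
# is exactly the complexity that stand-alone solutions create.
with hdfs.write("/datalake/staging/customer_events.jsonl", encoding="utf-8") as writer:
    for row_key, columns in table.scan():
        record = {"key": row_key.decode(),
                  **{k.decode(): v.decode() for k, v in columns.items()}}
        writer.write(json.dumps(record) + "\n")
```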

A modern reference architecture

A modern approach to such a reference architecture, based on Hadoop 2.x, comes from HP. The server and storage layers are separated and managed individually. All data is stored in one large storage pool (a data lake), and the layer is designed so that NoSQL, Hadoop and analytics workloads all run against it and can access all of the data. This reduces complexity dramatically and also requires less data center floor space, less power and less cooling.

The advantage of this configuration is that it allows asynchronous and thus demand-driven scaling. You do not need to know in detail how the application will develop; you can simply store the data, unstructured, in the data lake in advance.
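As a sketch of this "store first, decide later" pattern, raw events could be appended, schema-free, to a date-partitioned folder in the lake; the library, paths and field names below are assumptions:

```python
# Sketch of landing raw events in the data lake before any use case is fixed.
# Paths, field names and the WebHDFS endpoint are hypothetical.
import json
from datetime import date, datetime, timezone
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870")

def land_raw_event(event: dict) -> None:
    """Append one raw event, without imposing a schema, to today's partition."""
    event["ingested_at"] = datetime.now(timezone.utc).isoformat()
    path = f"/datalake/raw/events/dt={date.today().isoformat()}/events.jsonl"
    # WebHDFS can only append to existing files, so create on first write.
    exists = client.status(path, strict=False) is not None
    with client.write(path, append=exists, encoding="utf-8") as writer:
        writer.write(json.dumps(event) + "\n")

land_raw_event({"source": "webshop", "type": "click", "url": "/home"})
```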

Both workloads and storage can each be assigned optimized nodes. Depending on need, the user can give the data lake a suitable size from the start and enlarge it later as desired. The same applies to the CPU side, which can be dimensioned dynamically for higher analysis requirements. If the requirements change, the system can always be adjusted.

Increased availability and faster processing

The advantage of asynchronous scaling is that multiple workloads can run on one infrastructure: one area handles the ingestion of new data while another simultaneously runs the analysis. This enables so-called workload tiering, meaning that different workloads can be allocated dedicated cluster resources. A standard Hadoop system can be used; virtualization is not required.
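One common way to realize such workload tiering on a plain Hadoop system is via YARN scheduler queues. The following sketch submits the ingestion and the analysis workload to dedicated queues using spark-submit's --queue option; the queue names and application scripts are assumptions:

```python
# Sketch of workload tiering on a standard Hadoop/YARN cluster: two
# workloads run side by side, each pinned to a dedicated scheduler queue
# so it gets its own slice of cluster resources. Queue and script names
# are hypothetical; --queue is the standard spark-submit option for YARN.
import subprocess

def submit(app: str, queue: str) -> subprocess.Popen:
    return subprocess.Popen([
        "spark-submit",
        "--master", "yarn",
        "--queue", queue,   # dedicated resources per workload tier
        app,
    ])

# One tier ingests new data while the other analyzes - concurrently,
# on the same physical cluster, without virtualization.
ingest = submit("ingest_events.py", queue="ingest")
analytics = submit("after_sales_analysis.py", queue="analytics")

for job in (ingest, analytics):
    job.wait()
```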

This modern reference architecture increases both the availability of data and the speed of processing. It allows more flexibility and gives users who cannot pin down the use case in advance more room for development.

It is therefore suitable for small and medium-sized businesses as well as for the enterprise sector as a way to develop new business models cost-effectively and safely. The integrated solution, spanning servers, network and storage software, meets individual Big Data requirements in a future-oriented way. It is like an engine for an SUV that can be used for many different purposes.
