Friday, November 7, 2008

Data Explosion

Data explosion is the term for the increase in stored data that occurs when using multidimensional database systems. The amount of data stored in these systems is often a multiple of the size of the raw data loaded into them from the existing operational databases. Hence, the data undergoes an “explosion” to several times (or many times) its original size.
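
To get a feel for the multiplication, here is a rough Python sketch. It assumes that much of the growth comes from precomputing aggregates for every combination of dimensions, which is a common cause of data explosion but not the only one. The function name `cuboid_cell_count` and the sample sales rows are made up for illustration.

```python
from itertools import combinations


def cuboid_cell_count(rows, dims):
    """Count the aggregate cells needed if every subset of the
    dimensions is pre-aggregated (hypothetical helper)."""
    total = 0
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            # Each distinct key in this grouping needs one stored cell.
            # The empty subset yields a single grand-total cell.
            keys = {tuple(row[d] for d in subset) for row in rows}
            total += len(keys)
    return total


# Illustrative raw data: four rows from an operational system.
rows = [
    {"product": "A", "region": "East", "month": "Jan", "sales": 10},
    {"product": "A", "region": "West", "month": "Feb", "sales": 7},
    {"product": "B", "region": "East", "month": "Feb", "sales": 4},
    {"product": "B", "region": "West", "month": "Jan", "sales": 9},
]
dims = ["product", "region", "month"]

cells = cuboid_cell_count(rows, dims)
print(f"raw rows: {len(rows)}, stored aggregate cells: {cells}")
print(f"growth factor: {cells / len(rows):.2f}x")
```

With only three dimensions and four raw rows, the stored cell count is already several times the input. Adding dimensions or hierarchy levels makes the number of groupings grow much faster than the raw data.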

Data has become the driving force behind businesses today, and as such it is a highly valued asset. As business moves toward the internet, where huge volumes of data are shared and gathered every second, generating and storing that data will eventually push a data warehouse to a critical mass.
Today's data warehouses are typically implemented using multidimensional databases, which are data-aggregating systems. In other words, these databases combine data from a variety of data sources. Multidimensional databases also support networks, hierarchies, arrays and other data formats that may be difficult to model in Structured Query Language (SQL). They also offer a high degree of flexibility in the definition of their dimensions, units, and unit relationships, regardless of data format. Because they are designed to handle very large volumes of data, these databases are expected to cope with imminent data explosions.

Data duplication is a practice in many data warehouses in which the company maintains several backup copies of important and critical data, or implements data mirroring, as added assurance against data loss in case of an unforeseen physical database failure. Disaster recovery plans include retrieving data from the duplicated copies, which are stored in an alternate location. Another use of data duplication is in application development and testing environments, where several clones of the original production database are kept.

Despite the reality of data explosion in this internet age, there are several ways to handle it. New storage technologies and comprehensive software applications have made handling data explosion easier by providing effective mechanisms for creating, collecting and storing all kinds of data.
There are many ways to manage data. It can be stored in structured relational databases or in semi-structured file systems, as with email files. It can also be stored as unstructured, fixed content, as with documents and graphics files.
Exploding data growth across industries has brought about software solutions like customer relationship management (CRM), enterprise resource planning (ERP) and other mission-critical applications, which capture, create and process exponentially increasing volumes of data to keep a business's operations profitable against competitors. Most companies depend on high availability of data, preferably 24 hours a day, seven days a week, 365 days a year. They also try to implement fast network connections to handle data sharing and transmission. In today's setting, it would be difficult to find an organization, whether in healthcare, pharmaceuticals, insurance, financial services, telecommunications, retail, manufacturing or any other industry, that does not gather and use large quantities of data for its business decision making.

One example of a sector that needs large volumes of data, and may be a candidate for data explosion, is the federal government, where millions of tax returns are filed into the Internal Revenue Service systems annually. Each succeeding year, the data grows along with new registrants.
Dealing with data explosions means spending a lot of money. A company needs to purchase high-powered computer systems with large-capacity hard disks and plenty of random access memory. The network infrastructure should also be able to handle sending and receiving large volumes of data, with transmissions that may happen very frequently.

Thursday, October 23, 2008

Consumer

Consumer refers to an individual, group, or application that accesses data or information in a data warehouse.

A data warehouse is a very complex database system where every single day, data is gathered, transformed, loaded and shared across multiple platforms. These data come from different sources, whether from different geographic locations with disparate physical database server infrastructure, or from different divisions within the same physical location of the business organization.

Data in the warehouse is accessed by many data consumers. On the internet, people from all walks of life are data consumers. Young people and teenagers are primary consumers of multimedia data like compressed video and music files. They also consume large digital photo files. University professors may consume data related to research in their field of expertise. Students are consumers of data pertaining to their records within their university.

Software applications are also big consumers of data. Business intelligence is a software application that draws all sorts of disparate data from the warehouse. Processes are then executed within the business intelligence system so that the data can be aggregated and formatted into statistical reports that help the company spot industry trends and patterns. Data consumers can benefit from business intelligence by using these intelligence reports in their decision making.
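
As a toy illustration of that aggregation step, the Python sketch below groups a handful of records and computes simple statistics per group. The `summarize` function, the field names and the sample figures are all hypothetical. A real business intelligence system works on far larger data and does much more than this.

```python
from collections import defaultdict
from statistics import mean


def summarize(records, key, value):
    """Group records by `key` and compute count, total and average
    of `value` for each group (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        name: {"count": len(vals), "total": sum(vals), "average": mean(vals)}
        for name, vals in groups.items()
    }


# Illustrative records, as they might arrive from the warehouse.
sales = [
    {"region": "East", "amount": 120.0},
    {"region": "East", "amount": 80.0},
    {"region": "West", "amount": 200.0},
]

report = summarize(sales, "region", "amount")
for region, stats in sorted(report.items()):
    print(f"{region}: {stats['count']} sales, "
          f"total {stats['total']:.2f}, average {stats['average']:.2f}")
```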

In the past, the way business organizations gathered data was through non-automated sources. In those times, businesses lacked the computing resources for proper data analysis. Because of this, business decisions were mostly based on intuition.

But as information technology advanced, more and more complex data systems capable of very fast processing and accurate reporting were developed and made available to many companies. Problems still existed in data collection, however, because of the lack of infrastructure for data exchange and incompatibilities between systems, and data consumers were still frustrated by poor data quality.
With the development of modern relational database management systems, data warehouses have become more and more efficient, not just in gathering, extracting, transforming and loading data into the warehouse but in sharing the appropriate data with consumers as well. Newer tools like Enterprise Application Integration have greatly increased the speed of collecting and distributing data to consumers. Another tool, Online Analytical Processing or OLAP, has also helped in generating new reports that analyze data much faster and more efficiently. Data warehouses now work closely with business intelligence systems in the art and science of sifting through very large volumes of data, extracting the pertinent information and turning that information into knowledge that consumers can benefit from.
Today's data warehouses serve data consumers by executing queries. A query language is a means of communication between an end user and a database, expressed in a form the computer system can understand. A single query may return thousands of records from the database and present them to the requesting consumer in a report format for easy interpretation and analysis.
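
A minimal sketch of a consumer issuing a query might look like the following. It uses Python's built-in `sqlite3` module with an in-memory database standing in for the warehouse. The `sales` table, its columns and its rows are invented for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("East", 80.0), ("West", 200.0)],
)

# The query: one summary row per region, in a report-friendly order.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

for region, count, total in rows:
    print(f"{region}: {count} sales, total {total:.2f}")

conn.close()
```

The consumer states what data is wanted, and the database engine decides how to retrieve and summarize it.
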
Sharing data across a network calls for serious precautions. Not all data consumers are good people. In fact, there are millions of malicious data consumers all over the internet, and they will do all they can to gain illegal access to a database and steal private information, including bank account details and credit card numbers. Every year, millions of dollars are lost from people's accounts because of these malicious data consumers. One way to prevent private data from being stolen is to use a secure connection, with encryption tools that conceal confidential data like passwords and credit card numbers.