Friday, November 7, 2008

Data Explosion

Data explosion is the term used to describe the increase in stored data that occurs when using multidimensional database systems. The amount of data stored in these systems is often a multiple of the size of the raw data loaded from the existing operational databases. In other words, the data undergoes an “explosion” to several (or many) times its original size.
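
To get a feel for where that multiplication comes from, here is a minimal sketch in Python (the dimension sizes and row counts are made up): it takes a small, sparse fact table and counts how many rows would be stored if every rollup combination of the dimensions were pre-aggregated, compared with the raw rows alone.

    # Minimal sketch of data explosion: pre-aggregating a sparse fact table.
    # Dimension sizes and the number of raw facts are hypothetical.
    from itertools import combinations
    import random

    random.seed(1)
    dims = ["product", "store", "day"]
    facts = [(random.randrange(100), random.randrange(50), random.randrange(365))
             for _ in range(10_000)]            # 10,000 raw sales records

    stored = len(facts)                          # start with the raw detail rows
    for r in range(len(dims) + 1):
        for keep in combinations(range(len(dims)), r):
            if len(keep) == len(dims):
                continue                         # full detail is the raw data itself
            groups = {tuple(row[i] for i in keep) for row in facts}
            stored += len(groups)                # one aggregate row per distinct group

    print("raw rows:", len(facts))
    print("rows after pre-aggregating every rollup:", stored)
    print("explosion factor: %.2f" % (stored / len(facts)))

With only three dimensions the factor is modest; each additional dimension doubles the number of rollup combinations, which is why real cubes grow much faster.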

Data has become the driving force behind businesses today, and as such it is a highly valued asset. As business moves toward the internet, where huge bulks of data are shared and gathered every second, generating and storing such large volumes of data can quickly push a data warehouse toward critical mass.
Today's data warehouses are typically implemented using multidimensional databases, which are data-aggregating systems. In other words, these databases combine data from a variety of data sources. Multidimensional databases also support networks, hierarchies, arrays and other data structures that may be difficult to model in Structured Query Language (SQL). In addition, multidimensional databases offer a high degree of flexibility in the definition of their dimensions, units, and unit relationships, regardless of data format. Because they are built to handle huge volumes of data, these databases are expected to cope with imminent data explosions.

Data duplication is a practice of many data warehouses in which the company maintains several backup copies of important and critical data, or implements data mirroring, as added assurance against data loss in case of unforeseen physical database failure. Disaster recovery plans include restoring data from duplicated copies stored in an alternate location. Another use of data duplication is in application development and testing environments, where the original production database keeps several clones.
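
As a small illustration, the sketch below uses Python's built-in sqlite3 module to make an online copy of a database file that could serve as a backup or as a clone for a test environment. The file names are hypothetical, and a real warehouse would use its own backup tooling; this only shows the idea.

    # Minimal sketch: duplicating a database for backup or for a test clone.
    # File names are hypothetical; sqlite3 is Python's built-in embedded database.
    import sqlite3

    source = sqlite3.connect("production.db")
    clone = sqlite3.connect("test_clone.db")

    with clone:
        source.backup(clone)     # copies every page of the source database

    clone.close()
    source.close()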

Despite the reality of data explosion in this internet age, there are several solutions to handle it. New computing storage technologies and comprehensive software applications have made handling data explosion easier by providing effective mechanisms for creating, collecting and storing all kinds of data.
There are many ways to manage data. Data can be stored in structured relational databases or in semi-structured file systems, as with email files. It can also be stored as unstructured fixed content, as with documents and graphics files.
Exploding data growth across industries has brought about software solutions like customer relationship management (CRM), enterprise resource planning (ERP) and other mission-critical applications, which effectively capture, create and process exponentially increasing bulks of data to keep the operations of a business profitable against competitors. Most companies depend on high availability of data, preferably 24 hours a day, seven days a week, 365 days a year. They also try to implement fast network connections to handle data sharing and transmission. In today's setting, it would be difficult to find an organization - whether from healthcare, pharmaceutical, insurance, financial services, telecommunications, retail, manufacturing or many other industries - that does not gather and utilize large quantities of data in its business decision making.

An example of one sector that handles bulk data and may be a candidate for data explosion is the federal government, where thousands of tax returns are filed into the Internal Revenue Service systems annually. Each succeeding year, the data grows along with new registrants.
Dealing with data explosion means spending a lot of money. A company needs to purchase high-powered computer systems with large-capacity hard disks and ample random access memory. The network infrastructure should also be able to handle sending and receiving large volumes of data, and transmission may happen very frequently.

Thursday, October 23, 2008

Consumer

Consumer refers to an individual, group, or application that accesses data or information in a data warehouse.

A data warehouse is a very complex database system where every single day, data is gathered, transformed, loaded and shared across multiple platforms. These data come from different sources, whether from different geographic locations with disparate physical database server infrastructure, or from different divisions within the same physical location of the business organization.

Data in the warehouse are accessed by many data consumers. On the internet, people from all walks of life are data consumers. Young people and teenagers are primary consumers of multimedia such as compressed video files and music files. They also consume large digital photo files. University professors may consume data related to research in their areas of expertise. Students are consumers of data pertaining to their records within their university.

Software applications are also big consumers of data. Business intelligence software gets all sorts of disparate data from the warehouse. From these data, processes are executed within the business intelligence system so that the data can be aggregated and formatted into statistical reports that let the company spot industry trends and patterns. Data consumers can benefit from business intelligence by using the resulting reports to support decision making.
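
A very small sketch of that kind of aggregation, using Python's built-in sqlite3 module with made-up sales figures, might look like this:

    # Minimal sketch: aggregating raw facts into a summary a BI tool might report on.
    # The table and figures are made up for illustration.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("North", "Widget", 120.0), ("North", "Gadget", 75.5),
        ("South", "Widget", 200.0), ("South", "Widget", 80.0),
    ])

    # Aggregate atomic sales rows into per-region totals and averages.
    for region, total, avg in conn.execute(
            "SELECT region, SUM(amount), AVG(amount) FROM sales GROUP BY region"):
        print(f"{region}: total={total:.2f} average={avg:.2f}")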

In the past, the way business organizations gathered data was through non-automated sources. In those times, businesses lacked the computing resources for proper data analysis. Because of this, business decisions were mostly based on intuition.

But as information technology advanced, more and more complex data systems capable of very fast processing and accurate reporting were developed and made available to many companies. Still, problems existed in data collection because of the lack of infrastructure for data exchange and incompatibilities between systems, and data consumers remained frustrated with bad data quality.
With the development of the modern relational database management system, data warehouses have become more and more efficient, not just in gathering, extracting, transforming and loading data into the warehouse but in sharing the appropriate data with consumers as well. Newer tools like Enterprise Application Integration have greatly increased the speed of collecting and distributing data to consumers. Another tool, Online Analytical Processing (OLAP), has also helped in generating reports that analyze data faster and more efficiently. Data warehouses now work closely with business intelligence systems in implementing the art and science of sifting through very large bulks of data, extracting the pertinent information, and turning that information into knowledge that consumers can benefit from.
Today's data warehouses serve data consumers by executing queries. A query language is a form of communication between an end user and a database that the computer system can understand. A single query may return thousands of rows from the database and present them to the requesting consumer in a report format for easy interpretation and analysis.
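
As a sketch of that flow (Python with the built-in sqlite3 module; the orders table and names are hypothetical), a consumer's query can be executed with a parameter and rendered as a simple fixed-width report:

    # Minimal sketch: execute a consumer's query and present the rows as a report.
    # The table and data are hypothetical.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, item TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        ("Alice", "Keyboard", 2), ("Bob", "Monitor", 1), ("Alice", "Mouse", 3),
    ])

    # A parameterized query: only the requesting consumer's rows are returned.
    rows = conn.execute(
        "SELECT item, quantity FROM orders WHERE customer = ? ORDER BY item",
        ("Alice",)).fetchall()

    print(f"{'Item':<12}{'Qty':>5}")           # simple fixed-width report
    for item, quantity in rows:
        print(f"{item:<12}{quantity:>5}")
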
Sharing data across the network should be treated with serious precaution. Not all data consumers are good people. In fact, there are millions of malicious data consumers all over the internet, and they will do all they can to illegally gain access to the database and steal private information, including bank account details and credit card numbers. Every year, millions of dollars are lost from people's accounts because of these malicious data consumers. One way to prevent private data from being stolen is to use a secure connection along with encryption tools to conceal confidential data like passwords and credit card numbers.
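
One common piece of that approach is never storing or transmitting secrets in the clear. The sketch below (Python standard library only; the password is an example value) salts and hashes a password with PBKDF2 so the stored value is useless to anyone who steals it; transport security such as TLS would be layered on top of this.

    # Minimal sketch: concealing a secret before it is stored, using a salted hash.
    import hashlib
    import os

    password = b"correct horse battery staple"   # example secret only
    salt = os.urandom(16)                         # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)

    print("stored salt:  ", salt.hex())
    print("stored digest:", digest.hex())         # the plaintext is never stored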

Friday, September 26, 2008

Data Access and Data Access Tools

Data access is the process of entering a database to store or retrieve data. Data access tools are end-user oriented tools that allow users to build structured query language (SQL) queries by pointing and clicking on a list of tables and fields in the data warehouse.
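
Under the hood, such a point-and-click tool essentially assembles a SQL string from the tables and fields the user selected. Here is a minimal sketch in Python; the catalog, table and column names are hypothetical, and the selections are checked against a known catalog so the generated SQL stays well formed.

    # Minimal sketch: turn point-and-click selections into a SQL query string.
    # The catalog, table and column names are hypothetical.
    CATALOG = {
        "sales": ["region", "product", "amount", "sale_date"],
        "customers": ["customer_id", "name", "segment"],
    }

    def build_query(table, fields):
        """Build a SELECT statement from user selections, validated against the catalog."""
        if table not in CATALOG:
            raise ValueError(f"unknown table: {table}")
        bad = [f for f in fields if f not in CATALOG[table]]
        if bad:
            raise ValueError(f"unknown fields: {bad}")
        return f"SELECT {', '.join(fields)} FROM {table}"

    print(build_query("sales", ["region", "amount"]))
    # -> SELECT region, amount FROM sales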

Throughout computing history, there have been different methods and languages used for data access, and these varied depending on the type of data warehouse. A data warehouse contains a rich repository of data pertaining to organizational business rules, policies, events and histories, and warehouses store data in different and incompatible formats, so several data access tools have been developed to overcome these incompatibilities.
Recent advances in information technology have brought about new and innovative software applications with more standardized languages, formats, and methods to serve as interfaces among different data formats. Some of the more popular standards include SQL, ODBC, ADO.NET, JDBC, XML, XPath, XQuery and Web Services.
Structured Query Language (SQL) is a computer language used in relational database management systems for the retrieval and management of data. Although SQL was developed as a declarative query and data manipulation language, several vendors have created SQL DBMSs and added their own procedural constructs, data types and other proprietary features. SQL is standardized by both ANSI and ISO.
ODBC, which stands for Open Database Connectivity, is a standard application programming interface for accessing database management systems. Different computer languages can access data across different types and implementations of RDBMSs using ODBC.
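
As an illustration only, the snippet below assumes the third-party pyodbc package is installed and that a data source name (DSN) called "warehouse" has been configured; both are assumptions, not part of the original text.

    # Minimal sketch of ODBC access from Python via the third-party pyodbc package.
    # The DSN, credentials and table name are hypothetical.
    import pyodbc

    conn = pyodbc.connect("DSN=warehouse;UID=report_user;PWD=secret")
    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM sales")
    print("rows in sales:", cursor.fetchone()[0])
    conn.close()
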
JDBC, which stands for Java Database Connectivity, is to some degree the equivalent of ODBC for the Java programming language.
ADO.NET is a Microsoft proprietary software component for accessing data and data services. It is part of the Microsoft .NET Framework. ADO stands for ActiveX Data Objects.
XML, which stands for Extensible Markup Language, is primarily a general-purpose markup language. It is used to tag data so that structured data can be shared among disparate systems across the internet or any network. This makes data of any format portable among different computer systems, making XML one of the most widely used technologies in data warehousing. XML data can be queried using XQuery, which is semantically similar to SQL. The XML Path Language (XPath) is used to address portions of an XML document or to compute values such as strings, Booleans and numbers from an XML document.
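
A small sketch using Python's built-in xml.etree.ElementTree module (which supports a limited subset of XPath; the document below is made up) shows how parts of an XML document can be addressed:

    # Minimal sketch: addressing parts of an XML document with XPath-style expressions.
    # ElementTree supports only a limited XPath subset; the XML below is made up.
    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""
    <orders>
      <order id="1"><customer>Alice</customer><total>120.50</total></order>
      <order id="2"><customer>Bob</customer><total>75.00</total></order>
    </orders>
    """)

    for order in doc.findall("./order"):              # XPath-style path expression
        customer = order.findtext("customer")
        total = float(order.findtext("total"))
        print(customer, total)
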
Web services are software components that make machine-to-machine interaction interoperable over the internet. They are also commonly known as Web APIs, which are accessed over the internet and executed on a remote system.
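
As an illustration (Python standard library only; the URL is a placeholder, not a real service), calling a web service typically means sending an HTTP request and decoding a structured response:

    # Minimal sketch: calling a web service over HTTP and decoding a JSON response.
    # The URL is a hypothetical placeholder for a real endpoint.
    import json
    import urllib.request

    url = "https://api.example.com/v1/sales-summary"   # hypothetical endpoint
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)

    print(payload)
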
Many software vendors develop applications with graphical user interface (GUI) tools so that even non-programmers and non-database administrators can build queries just by clicking the mouse. These GUI data access tools give users access via a data access designer and a data access viewer. With the data access designer, an end user can create complex databases even without an extensive background; templates are available, complete with design frameworks and sample data. With the data access viewer, the user can run queries, enter data, make changes and modifications, and graphically see the results of the commands without having to worry about the complex processes happening in the background.
Data access tools make the tasks of database administrators a lot easier, especially if the database being managed is a large data warehouse. Having a graphical interface for data access gives the administrator a clearer view of the status of the database, because most programmatic query languages can look cryptic on the command line interface.

Tuesday, September 23, 2008

Combined Data

Combined data refers to individual facts concatenated to form another fact or to spot business patterns and trends.

In a company, a database contains millions of atomic data. Atomic data are pieces of information that cannot be broken down further. For example, a product name is atomic data because it can no longer be broken down, whereas a product's raw material can be broken down further into components, depending on the good. An individual product's sales figure is another piece of atomic data.

But business organizations are not just interested in the minute details; they are also interested in the bigger picture. So atomic data are combined and aggregated. When this is done, the company can determine regional or total sales, total cost of goods, selling, general and administrative expenses, operating income, receivables, inventories, depreciation, amortization, debt, taxes and other figures.
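
A tiny sketch of that combination step (pure Python; the atomic sales facts are made up) rolls individual sales into regional totals and a grand total:

    # Minimal sketch: combining atomic facts (individual sales) into aggregate figures.
    # The facts are made up for illustration.
    from collections import defaultdict

    atomic_sales = [
        ("West", "Widget", 19.99), ("West", "Gadget", 5.25),
        ("East", "Widget", 19.99), ("East", "Widget", 39.98),
    ]

    regional_totals = defaultdict(float)
    for region, _product, amount in atomic_sales:
        regional_totals[region] += amount    # individual facts combined into a bigger picture

    grand_total = sum(regional_totals.values())
    print(dict(regional_totals), "grand total:", round(grand_total, 2))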

Data mining, or extracting data from the vast repository of the data warehouse, uses combined data intensively. Software applications, in conjunction with a good relational database management system, have been developed to come up with efficient ways to store and access data gathered over time and space for statistical analysis.
Data mining is technically described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases". It is a process involving large amounts of data being sorted to pick out relevant data from potentially non-relevant sources.
One of the biggest problems with data mining is the level of data aggregation. For example, in an online survey by a private organization on the smoking trends of one region, one data set may contain records of those who currently smoke, another of those who have quit smoking, and another of those who have never smoked at all. The collection within each data set continuously grows as data from other sources keep coming in. The traditional ways to combine these data are either to use an ad hoc method or to fit each data set to a certain model and then combine them.

Newer methods have been developed to efficiently combine data from various sources. Data coming from various tables and databases can now be combined into a single information table. One method used is a likelihood procedure, which provides an estimation technique to address identifiable problems with aggregated data from some tables related to other tables.

Companies find business intelligence technologies a valuable investment. Business intelligence combines the vast repository of the business data warehouse with software systems that analyze and report on the gathered business data.

An example of a business intelligence technology is Online Analytical Processing, or OLAP. OLAP can quickly provide answers to analytic queries that are multidimensional in nature. It can combine data from different sources and generate reports for sales, marketing, financial forecasting, budgeting, and other related aspects of the business. OLAP can run complex ad hoc and analytical queries on a database configured for OLAP use, and execution must be very fast given that a server needs to answer many users at a time from different geographic locations. OLAP combines data into a matrix output format, with dimensions forming the rows and columns and the cells holding the values and measures.
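
A toy sketch of that matrix output (pure Python; the facts are made up) pivots sales facts so one dimension forms the rows, another forms the columns, and the cells hold the aggregated measure:

    # Minimal sketch: pivoting facts into an OLAP-style matrix (region rows x quarter columns).
    # The data is made up for illustration.
    facts = [
        ("North", "Q1", 100), ("North", "Q2", 150),
        ("South", "Q1", 80),  ("South", "Q2", 120),
    ]

    regions = sorted({r for r, _, _ in facts})
    quarters = sorted({q for _, q, _ in facts})
    cells = {(r, q): 0 for r in regions for q in quarters}
    for r, q, amount in facts:
        cells[(r, q)] += amount                # aggregate the measure into each cell

    print("       " + "  ".join(f"{q:>6}" for q in quarters))
    for r in regions:
        print(f"{r:<7}" + "  ".join(f"{cells[(r, q)]:>6}" for q in quarters))
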
Combined data is also heavily used in data farming, a process where high-performance computers or computing grids run simulations billions of times across a large parameter and value space to produce a landscape of output data used for analyzing trends, insights and anomalies across many dimensions. It can be compared to a real farm, where a harvest of data comes after some time.