TECHNICAL DEFINITIONS

(The Datawarehousing.com industry standard glossary of terms)

 

These definitions belong to Datawarehousing.com. By providing them, this page is intended to provoke interest in, and point readers to, the Datawarehousing.com website.

 

A

Agent

An application that searches data and sends an alert when a certain situation occurs. (See ALERT)

 

Aggregate data

Individual data items, data groups, arrays, tables etc. that can be assembled to form a whole.

 

Alerts and Alarms

Messages sent automatically by a computer system when an AGENT identifies a certain situation. For example, if stock of an item in a warehouse drops to a certain level, key personnel can be immediately informed.

 

Algorithm

A set of statements organized to solve a problem in a finite number of steps.

 

Analytical Processing

The use of a computer to produce analysis for management decisions, usually involving trend analysis, drill down analysis, demographic analysis, profiling, etc.

 

Architecture Phase

The establishment of the framework, scope, standards and procedures for a data warehouse at the enterprise level.

 

Atomic level data

Data with the lowest level of granularity. Atomic level data sits in a data warehouse and is time variant (i.e., accurate as of some moment in time now passed).

 

Attribute

A property or characteristic of an application entity. For example, the attributes of an EMPLOYEE entity in a business application may be:
ID
Firstname
Lastname
Job_Title
Email_ID
An attribute usually represents a column in a table in a relational database, or a field in a file.

 

Audit Trail

Recording of any changes made to specific data. Details can include date and time of change, how the change was detected, reason for the change and before-and-after data values.



  B

Binary Search

A dichotomizing search with steps in which the sets of remaining items are partitioned into two equal parts.
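
As an illustration only, the halving step can be sketched in Python. The function name and the sample list are assumptions made for this example, and the list is assumed to be sorted already.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2      # split the remaining range in two
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1            # discard the lower half
        else:
            high = mid - 1           # discard the upper half
    return -1


part_numbers = [3, 8, 15, 23, 42, 57]    # hypothetical sorted keys
print(binary_search(part_numbers, 23))   # 3
print(binary_search(part_numbers, 4))    # -1
```

Each pass halves the set of remaining items, so the number of steps grows with the logarithm of the list length.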

 

Bit Map

A specialized form of an index indicating the existence or non-existence of a condition for a group of blocks or records.
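
A minimal sketch of the idea in Python, assuming one bit per record and using plain integers as bit vectors. The record layout and function names are hypothetical.

```python
def build_bitmap_index(records, column):
    """Map each distinct value of column to an int used as a bit vector.

    Bit i is set when records[i][column] equals that value.
    """
    index = {}
    for position, record in enumerate(records):
        value = record[column]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(bitmap):
    """Return the record positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


orders = [
    {"region": "East", "status": "open"},
    {"region": "West", "status": "closed"},
    {"region": "East", "status": "closed"},
    {"region": "East", "status": "open"},
]
by_region = build_bitmap_index(orders, "region")
by_status = build_bitmap_index(orders, "status")

# Records that are both East and open: one bitwise AND of two bitmaps.
print(matching_rows(by_region["East"] & by_status["open"]))  # [0, 3]
```

Combining conditions with bitwise AND or OR is what makes this kind of index attractive for columns with few distinct values.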

 

Bus

The hardware connection that allows data to flow from one component to another.

 

Business Intelligence Tools

Software that allows business users to see and use large amounts of complex data.



  C

Canonical model

A data model that represents the inherent structure of data without regard to either individual use or hardware or software implementation.



Cardinality

The number of distinct values in a column. It is often expressed as a ratio: the number of distinct values divided by the total number of rows.

 

Cell

A single point in a CUBE.

 

Conceptual Schema

A consistent collection of data structures expressing the data needs of the organization. This schema is a comprehensive, base level, and logical description of the environment in which an organization exists, free of physical structure and application system considerations.

 

Condensation

The process of reducing the volume of data managed without reducing the logical consistency of the data.

 

Connector

A symbol used to indicate that one occurrence of data has a relationship with another occurrence of data. Connectors are used in conceptual data base design and can be implemented hierarchically, relationally, in an inverted fashion, or by a network.

 

Contention

The condition that occurs when two or more programs try to access the same data at the same time.

 

Cooperative Processing

The ability to distribute resources (programs, files and data bases) across the network.

 

Corporate Data

All the databases of the company. This includes legacy systems, old and new transaction systems, general business systems, client/server databases, data warehouses and data marts.

 

Corporate Information Factory (CIF)

The architectural framework that houses the ODS, data warehouse, data marts, I/T interface, and the operational environment. The CIF is held together logically by metadata and physically by a network such as the Internet.



Cube (also Hypercube, Multi-dimensional Cube)

The fundamental structure for information in an OLAP system. A structure that stores multi-dimensional information, having one CELL for each possible combination of dimensions.
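
To make "one CELL for each possible combination of dimensions" concrete, here is a small Python sketch. The dimensions, members and sales rows are invented for illustration, and a real OLAP engine would store cells far more compactly.

```python
from itertools import product

# Hypothetical dimensions and their members.
dimensions = {
    "region": ["East", "West"],
    "product": ["CD ROM", "Hard Drive"],
    "year": [2001, 2002],
}

# Hypothetical source rows: (region, product, year, sales).
rows = [
    ("East", "CD ROM", 2001, 100),
    ("East", "CD ROM", 2001, 50),
    ("West", "Hard Drive", 2002, 300),
]

# One cell for every possible combination of dimension members.
cube = {cell: 0 for cell in product(*dimensions.values())}
for region, item, year, sales in rows:
    cube[(region, item, year)] += sales

print(len(cube))                        # 8 cells: 2 x 2 x 2
print(cube[("East", "CD ROM", 2001)])   # 150
print(cube[("West", "CD ROM", 2001)])   # 0 (an empty cell)
```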



  D

Data

Facts, concepts, or instructions that a computer records, stores and processes. Used in conjunction with INFORMATION SYSTEMS, "raw data" is organized in such a way that people can understand the results.

 

Data Cleansing

Removing errors and inconsistencies from data being imported to a data warehouse.

 

Data Dictionary

A software tool for recording the definition of data, the relationship of one category of data to another, the attributes and keys of groups of data, and so forth.

 

Data Driven Development

The approach to development that centers on identifying the commonality of data through a data model and building programs that have a broader scope than the immediate application.

 

Data Driven Process

A process whose resource consumption depends on the data on which it operates.

 

Data Mart

A department-specific data warehouse.
A) Independent: fed from legacy systems within the department
B) Dependent: fed from the enterprise data warehouse (preferred)

 

Data Mining

The process of finding hidden patterns and relationships in data. For instance, a consumer goods company may track 200 variables about each consumer. Among 200 variables there are 19,900 possible pairwise relationships alone. Data mining tools will identify the significant relationships.

 

Data Scrubbing

Removing errors and inconsistencies from data being imported into a data warehouse.

 

Data Transformation

The modification or alteration of data as it is being moved into the data warehouse.

 

Data Type

A data type defines the type of data stored in a specific database column, such as date, numeric or character data. Significant differences in data types exist between different platforms' databases.

 

Data Warehouse

A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data. The data warehouse contains atomic level data and summarized data specifically structured for querying and reporting.

 

Data Warehousing

An enterprise-wide implementation that replicates data from the same publication table on different servers/platforms to a single subscription table. This implementation effectively consolidates data from multiple sources.

 

Database Schema

The logical and physical definition of a database structure.

 

Date/Time Stamp

A stamp added by an application that identifies a task or activity by the date and time it was initiated and/or completed. This can appear as part of a transaction log, message queue content, or job logs.

 

Decentralized Database

A centralized database that has been partitioned according to a business or end-user defined subject area. Typically ownership is also moved to the owners of the subject area.

 

Decentralized Warehouse

A remote data source that users can query/access via a central gateway that provides a logical view of corporate data in terms that users can understand. The gateway parses and distributes queries in real time to remote data sources and returns result sets back to users.

 

Decision Support Systems (DSS)

Software that supports exception reporting, stop light reporting, standard repository, data analysis and rule-based analysis. A database created for end-user ad-hoc query processing.

 

Denormalization

The technique of placing normalized data in a physical location that optimizes the performance of the system.

 

Derived Data

Data whose values are determined by equations or algorithms.

 

Dimension

A dimension is typically a qualitative, text-based value, such as a region or product line; date values are also dimensions. It defines the secondary headings or labels that make up the body of the report. Each of the dimensions is repeated within each group. Usually, you use items containing text values (for example, Year or Item Type) for table dimensions. For example, if you select Item Type to be your table dimension, Item Type is a dimension within each group header. Under the dimension "Item Type" appears the name of each kind of item (for example, CD ROM or Hard Drive). A fact, by contrast, is a quantifiable value, such as amount of sales, budget or revenue.

 

Drill Down/Up

The ability to move between levels of the hierarchy when viewing data with multiple levels.
A) Drill down: changing a view to a greater level of detail.
B) Drill up: changing a view to a greater level of aggregation.
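
A rough Python sketch of moving between two levels of a hierarchy (year, then quarter). The sales rows and the function name are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, quarter, amount).
sales = [
    (2001, "Q1", 100),
    (2001, "Q2", 150),
    (2002, "Q1", 120),
    (2002, "Q1", 30),
]


def summarize(rows, level):
    """Total the amounts at the chosen hierarchy level: 'year' or 'quarter'."""
    totals = defaultdict(int)
    for year, quarter, amount in rows:
        key = (year,) if level == "year" else (year, quarter)
        totals[key] += amount
    return dict(totals)


print(summarize(sales, "year"))      # drill up: one total per year
print(summarize(sales, "quarter"))   # drill down: one total per quarter
```

The same rows feed both views; only the grouping key changes as the user moves up or down the hierarchy.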



  E

EDI (Electronic Data Interchange)

A standard format for exchanging business data.

 

Encryption

The transformation of data from a recognizable format to a form unrecognizable without the algorithm used for the transformation.

 

ETL (Extract, Transform and Load)

ETL refers to the process of getting data out of one data store (extract), modifying it (transform), and inserting it into a different data store (load).
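
The three steps might be sketched as follows, using only the Python standard library. The CSV text stands in for an operational export and the in-memory SQLite database stands in for the warehouse; both, along with the table and column names, are assumptions for this example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an operational system's export.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
"""


def extract(text):
    """Extract: read raw rows out of the source data store."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy the names and convert the types."""
    return [(int(r["id"]), r["name"].strip().title(), float(r["amount"]))
            for r in rows]


def load(rows, connection):
    """Load: insert the transformed rows into the target data store."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
print(warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall())
```

Keeping the three steps as separate functions mirrors the definition and makes each stage easy to test on its own.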

 

ETT

ETL is sometimes referred to as ETT (Extraction, Transformation and Transportation). It is a series of batch interfaces between systems.

 

Executive/Enterprise Information Systems (EIS)

Tools programmed to provide canned reports or briefing books to top-level executives. They offer strong reporting and drill-down capabilities. Today these tools allow ad-hoc querying against a multi-dimensional database, and most offer analytical applications along functional lines such as sales or financial analysis. (Also known as Decision Support System.)

 

Extendibility

The ability to easily add new functionality to existing services without major software rewrites or without redefining the basic architecture.

 

External Schema

A logical description of a user's method of organizing and structuring data.



  F

Fact Table

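The central table in a dimensional model, holding the numeric measures (facts) of a business process together with keys to the related dimension tables. Its rows are typically loaded from heterogeneous sources into the data warehouse.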

 

Factless Fact

A fact table without any metrics in it.

 

Flat File

A collection of records containing no data aggregates, nested repeated data items, or groups of data items.

 

Functional Decomposition

The division of operations into hierarchical functions that form the basis for procedures.



  G

 

Global Business Models

Provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models. This type of data warehouse is difficult to build because it requires users from different divisions to come together to define a common data model for the warehouse.

 

Granularity

The level of detail of the data stored in a data warehouse.



  H

Heterogeneous Environment

Within an enterprise, a network of different types of servers and databases.

 

Heuristic

The mode of analysis in which the next step is determined by the results of the current step of analysis.

 

Hierarchy

The organization of data into a logical tree structure.

 

Homogeneous Environment

Within an enterprise, a network consisting of the same type of servers and databases.

 

Horizontal Distribution

The splitting of a table across different sites by rows. With horizontal distribution, rows of a single table reside at different sites in a distributed data base network.

 

Hub and Spoke Configuration

A configuration of interconnected systems where a single system (the hub) acts as the central point for exchanging data with and between the other systems (spokes).

 

Huffman Code

A code for data compaction in which frequently used characters are encoded with fewer bits than infrequently used characters.
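
A compact Python sketch of how such a code can be built, assuming character frequencies are taken from the text being encoded. The function name is hypothetical, and ties between equal frequencies are broken arbitrarily, so the exact bit strings may differ from other implementations.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each character of text to its bit string."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                      # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie_breaker, {char: code_so_far}).
    heap = [(freq, i, {char: ""})
            for i, (char, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


codes = huffman_codes("aaaaaaaabbbc")
for char in sorted(codes):
    print(char, codes[char])   # "a" gets 1 bit; "b" and "c" get 2 bits each
```

The most frequent character is merged last, so it ends up nearest the root of the tree and receives the shortest code.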

 

HyperCube

See CUBE.



I

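Indexing

A technique for locating records quickly by maintaining an auxiliary structure (an index) that maps key values to record locations.

 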

Information

Data that has been processed in such a way that it can increase the knowledge of the person who receives it.

 

Information Systems Architecture

The authoritative definition of the business rules, systems structure, technical framework, and product backbone for business information systems.

 

Instance

A set of values representing a specific entity belonging to a particular entity type.

 

Integrity

The property of a data base that ensures that the data contained in the data base is as accurate and consistent as possible.

 

Intelligent Data Base

A data base that contains shared logic as well as shared data and automatically invokes that logic when the data base is accessed. Logic, constraints, and controls relating to the use of data are represented in an intelligent data model.

 

Interleaved Data

Data from different tables mixed into a simple table space, where there is commonality of physical colocation based on a common key value.

 

Iterative Analysis

The mode of processing in which the next step of processing depends on the results obtained by the existing step in execution.



  J

Join

An operation that takes two relations as operands and produces a new relation by concatenating the tuples and matching the corresponding columns when a stated condition holds between the two.
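
As a rough illustration, a join over two small relations can be written in Python with each tuple held as a dict. The relation contents and column names are hypothetical, and the nested loop is the simplest approach rather than how a DBMS would necessarily execute it.

```python
def join(left, right, condition):
    """Concatenate each pair of tuples (dicts here) for which condition holds."""
    return [{**a, **b} for a in left for b in right if condition(a, b)]


employees = [
    {"emp_id": 1, "lastname": "Smith", "dept_id": 10},
    {"emp_id": 2, "lastname": "Jones", "dept_id": 20},
    {"emp_id": 3, "lastname": "Brown", "dept_id": 30},
]
departments = [
    {"dept_no": 10, "dept_name": "Sales"},
    {"dept_no": 20, "dept_name": "Finance"},
]

# The stated condition: the employee's dept_id matches the department's dept_no.
result = join(employees, departments,
              lambda e, d: e["dept_id"] == d["dept_no"])
for row in result:
    print(row["lastname"], row["dept_name"])
# Smith Sales
# Jones Finance
```

Brown has no matching department, so no concatenated tuple is produced for that row. The column names are kept distinct here so that merging the two dicts loses nothing.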



  K

Key Compression

A technique for reducing the number of bits in keys; used in making indexes occupy less space.



  L

Latency

Any delay or waiting that increases real or perceived response time beyond the desired response time.

 

Lockup

The event that occurs when an update is made against a data base record and the transaction has not yet reached a commit point.

 

Logging

The automatic recording of data with regard to the access of the data, the updates to the data, etc.

 

Logical Representation

A data view or description that does not depend on a physical storage device or a computer program.



  M

Main Storage Data Base (MSDB)

A data base that resides entirely in main storage. Such data bases are very fast to access, but require special handling at the time of update. MSDBs can manage only small amounts of data.

 

Maximum Transaction Arrival Rate (MTAR)

The rate of arrival of transactions at the moment of peak period processing.

 

MDDB

Multi-Dimensional Database.

 

Metadata or Meta Data

Metadata is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems.

 

Metadata Synchronization

The process of consolidating, relating and synchronizing data elements with the same or similar meaning from different systems. Metadata synchronization joins these differing elements together in the data warehouse to allow for easier access.

 

Metalanguage

A language used to specify other languages.

 

Methodology

A system of principles, practices, and procedures applied to a specific branch of knowledge.

 

Mid-Tier Data Warehouses

To be scalable, any particular implementation of the data access environment may incorporate several intermediate distribution tiers in the data warehouse network. These intermediate tiers act as source data warehouses for geographically isolated sharable data that is needed across several business functions.

 

Middleware

A communications layer that allows applications to interact across hardware and network environments.

 

Migration

The process by which frequently used items of data are moved to more readily accessible areas of storage and infrequently used items of data are moved to less readily accessible areas of storage.

 

Multilist Organization

A chained file organization in which the chains are divided into fragments and each fragment is indexed. This organization of data permits faster access to the data.



  N

Natural Join

A join in which the redundant logic components generated by the join are removed.

 

Network Model

A data model that provides data relationships on the basis of records or groups of records (i.e., sets) in which one record is designated as the set owner, and a single member record can belong to one or more sets.

 

Nonprocedural Language

Syntax that directs the computer as to what to do, not how to do it. Typical nonprocedural languages include RAMIS, FOCUS, NOMAD, and SQL.

 

Normalization

Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure. The condition of the data at completion of each step is described as a "normal form." Thus, when normalizing we talk about data as being in the first normal form, the second normal form, etc. Normalization theory identifies normal forms up to at least the fifth normal form, plus an adjunct form known as Boyce-Codd Normal Form (BCNF). The first three forms are sufficient to meet the needs of warehousing data models.



  O

OLAP (On-Line Analytical Processing)

Describes the systems used not for application delivery, but for analyzing the business, e.g., sales forecasting, market trends analysis, etc. These systems are also more conducive to heuristic reporting and often involve multidimensional data analysis capabilities.

 

OLTP (On-Line Transaction Processing)

Describes the activities and systems associated with a company's day-to-day operational processing and data (order entry, invoicing, general ledger, etc.).

 

Operational Data Store (ODS)

The form that a data warehouse takes in the operational environment. Operational data stores can be updated, provide rapid and consistent response time, and contain only a limited amount of historical data.

 

Overflow

The condition in which a record or a segment cannot be stored in its home because the address is already occupied.



  P

Parallel Data Organisation

An arrangement of data in which the data is spread over independent storage devices and is managed independently.

 

Parallel Search Storage

A storage device in which one or more parts of all storage locations are queried simultaneously for a certain condition or under certain parameters.

 

Parsing

The algorithm that translates syntax into meaningful machine instructions. Parsing determines the meaning of statements issued in the data manipulation language.

 

Partition

A segmentation technique in which data is divided into physically different units. Partitioning can be done at the application or the system level.

 

Performance

The length of time from the moment a request is issued until the first of the results of the request are received.

 

Periodic Discrete Data

A measurement or description of data taken at a regular time interval.

 

Prefix Data

Data in a segment or a record used exclusively for system control, usually unavailable to the user.

 

Primitive Data

Data whose existence depends on only a single occurrence of a major subject area of the enterprise.

 

Privilege Descriptor

A persistent object used by a DBMS to enforce constraints on operations.

 

Projection

An operation that takes one relation as an operand and returns a second relation that consists of only the selected attributes or columns, with duplicate rows eliminated.
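
A short Python sketch of the operation, with each row held as a dict. The relation and attribute names are made up for the example.

```python
def project(relation, attributes):
    """Keep only the named attributes and drop duplicate rows, preserving order."""
    seen = set()
    result = []
    for row in relation:
        projected = tuple(row[name] for name in attributes)
        if projected not in seen:          # eliminate duplicate rows
            seen.add(projected)
            result.append(dict(zip(attributes, projected)))
    return result


staff = [
    {"id": 1, "lastname": "Smith", "job_title": "Analyst"},
    {"id": 2, "lastname": "Jones", "job_title": "Analyst"},
    {"id": 3, "lastname": "Brown", "job_title": "Manager"},
]

print(project(staff, ["job_title"]))
# [{'job_title': 'Analyst'}, {'job_title': 'Manager'}]
```

Three input rows yield two output rows because the two analysts become identical once only job_title is kept.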

 

Proposition

A statement about entities that asserts or denies that some condition holds for those entities.



  Q

Query Language

A language that enables an end user to interact directly with a DBMS to retrieve and possibly modify data managed under the DBMS.



  R

Record

an aggregation of values of d