TECHNICAL DEFINITIONS
(The Datawarehousing.com
industry standard glossary of terms)
These definitions belong to Datawarehousing.com. By providing their definitions, this page is intended to provoke
interest in, and point readers to, the Datawarehousing.com website.
Agent
An application that searches
data and sends an alert when a certain situation occurs. (See ALERTS AND ALARMS)
Aggregate Data
Individual data items, data groups,
arrays, tables etc. that can be assembled to form a whole.
Alerts and Alarms
Messages sent automatically by a computer
system when an AGENT identifies a certain situation. For example, if stock of
an item in a warehouse drops to a certain level, key
personnel can be immediately informed.
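The following is a minimal Python sketch of an agent raising the warehouse stock alert described above. The class name `StockAgent`, the single `reorder_level` threshold and the `notify` callback are hypothetical choices for illustration. A production agent would typically run on a schedule or be triggered by data changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StockAgent:
    """Hypothetical agent: watches stock levels and alerts key personnel."""
    reorder_level: int               # assumed threshold that triggers an alert
    notify: Callable[[str], None]    # assumed delivery channel (email, pager, ...)

    def check(self, stock: Dict[str, int]) -> List[str]:
        """Search current stock levels and send an alert for each low item."""
        alerts: List[str] = []
        for item, quantity in sorted(stock.items()):
            if quantity <= self.reorder_level:
                message = (f"ALERT: stock of {item} is {quantity} "
                           f"(reorder level {self.reorder_level})")
                self.notify(message)
                alerts.append(message)
        return alerts


# Usage: collect messages in a list instead of sending them anywhere.
sent: List[str] = []
agent = StockAgent(reorder_level=10, notify=sent.append)
agent.check({"widgets": 4, "gadgets": 25})
```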
Algorithm
A set of statements organized to solve a
problem in a finite number of steps.
Analytical Processing
The use of the
computer to produce analyses for management decisions, usually involving
trend analysis, drill down analysis, demographic analysis, profiling,
etc.
Architecture Phase
The establishment of the framework, scope,
standards and procedures for a data warehouse at the enterprise level.
Atomic Level Data
Data with the lowest level of
granularity. Atomic level data sits in a data
warehouse and is time variant (i.e., accurate as of some moment in time now
passed).
Attribute
A property or characteristic of
an application entity. For example, the attributes of
an EMPLOYEE entity in a business application may be:
ID
Firstname
Lastname
Job_Title
Email_ID
An attribute usually represents a column in a table in a relational database,
or a field in a file.
Audit Trail
Recording of any changes made to specific
data. Details can include date and time of change, how the change was detected,
reason for the change and before-and-after data values.
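Below is a minimal Python sketch of appending an audit trail entry whenever a field changes. It assumes an in-memory dict as the record and a Python list as the trail. `AuditEntry` and `update_with_audit` are hypothetical names. Each entry captures the date and time, the reason, and the before-and-after values mentioned above. It omits how the change was detected.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime   # date and time of the change
    record_id: str
    field: str
    before: Any           # value before the change
    after: Any            # value after the change
    reason: str


def update_with_audit(record: Dict[str, Any], record_id: str, field: str,
                      new_value: Any, reason: str, trail: List[AuditEntry]) -> None:
    """Apply a change to one field and append an audit entry describing it."""
    old_value = record.get(field)
    record[field] = new_value
    trail.append(AuditEntry(datetime.now(timezone.utc), record_id, field,
                            old_value, new_value, reason))


trail: List[AuditEntry] = []
customer = {"email": "old@example.com"}
update_with_audit(customer, "C-001", "email", "new@example.com",
                  "customer request", trail)
```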
Binary Search
A dichotomizing search with steps in which
the sets of remaining items are partitioned into two
equal parts.
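Here is a short Python sketch of binary search over a sorted list. Each comparison discards one of the two halves. The halves are exactly equal when the remaining count is odd and differ by one otherwise. As a result, the search needs at most about log2(n) + 1 comparisons.

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in the sorted sequence items, or None if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None


position = binary_search([3, 8, 15, 21, 42], 21)
```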
Bit Map
A specialized form of an index
indicating the existence or non-existence of a condition for a group of blocks
or records.
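This is a minimal Python sketch of a bitmap index. It assumes one bit per record rather than per block, and it uses Python integers as arbitrary-length bit vectors. Combining conditions then reduces to bitwise operations. That property is what makes bitmap indexes attractive for low-cardinality columns such as region.

```python
from typing import Dict, List


def build_bitmap_index(values: List[str]) -> Dict[str, int]:
    """Map each distinct value to a bitmap: bit i is set if record i has that value."""
    index: Dict[str, int] = {}
    for record, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << record)
    return index


def matching_records(bitmap: int) -> List[int]:
    """Decode a bitmap into the record numbers whose bits are set."""
    return [record for record in range(bitmap.bit_length()) if (bitmap >> record) & 1]


regions = ["east", "west", "east", "north", "west"]
index = build_bitmap_index(regions)

# "region = east OR region = west" becomes a single bitwise OR.
east_or_west = index["east"] | index["west"]
```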
Bus
The hardware connection that
allows data to flow from one component to another.
Business Intelligence Tools
Software that allows business
users to see and use large amounts of complex data.
Canonical Model
A data model that represents the
inherent structure of data without regard to either individual use or hardware
or software implementation.
Cardinality
The number of unique rows divided by the total number of rows.
Cell
A single point in a CUBE.
Conceptual Schema
A consistent collection of data structures
expressing the data needs of the organization. This schema is a comprehensive,
base level, and logical description of the environment in which an organization
exists, free of physical structure and application system considerations.
Condensation
The process of reducing the volume of data
managed without reducing the logical consistency of the data.
Connector
A symbol used to indicate that one
occurrence of data has a relationship with another occurrence of data.
Connectors are used in conceptual data base design and can be implemented
hierarchically, relationally, in an inverted fashion, or by a network.
Contention
The condition that occurs when two or more programs try
to access the same data at the same time.
Cooperative Processing
The ability to distribute resources (programs, files and
data bases) across the network.
Corporate Data
All the databases of the company. This includes legacy systems, old
and new transaction systems, general business systems, client/server databases, data
warehouses and data marts.
Corporate Information Factory (CIF)
The architectural framework that houses
the ODS, data warehouse, data marts, i/t interface,
and the operational environment. The CIF is held
together logically by metadata and physically by a network such as the
Internet.
Cube (also Hypercube, Multi-dimensional Cube)
The fundamental structure for information
in an OLAP system. A structure that stores
multi-dimensional information, having one CELL for each possible combination of
dimensions.
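The sketch below models a three-dimensional cube as a Python dictionary keyed by (region, product, quarter), with one cell for each combination of dimension members. The dimension members and values are hypothetical. Real OLAP engines store cubes far more compactly, especially when most cells are empty.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical dimension members for a tiny sales cube.
regions = ["East", "West"]
products = ["CD ROM", "Hard Drive"]
quarters = ["Q1", "Q2"]

# One cell for every possible combination of dimension members, initialised to 0.
cube: Dict[Tuple[str, str, str], float] = {
    cell: 0.0 for cell in product(regions, products, quarters)
}

# Load fact values into individual cells.
cube[("East", "CD ROM", "Q1")] += 120.0
cube[("West", "Hard Drive", "Q2")] += 75.5
```

With two members in each of three dimensions, the cube has 2 x 2 x 2 = 8 cells.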
Data
Facts, concepts, or instructions
that a computer records, stores and processes. Used in conjunction with INFORMATION SYSTEMS, "raw data" is organized in
such a way that people can understand the results.
Data Cleansing
Removing errors and
inconsistencies from data being imported into a data
warehouse.
Data Dictionary
a software tool for recording the definition of data, the relationship of
one category of data to another, the attributes and keys of groups of data, and
so forth.
Data Driven Development
the approach to development that centers around identifying the commonality
of data through a data model and building programs that have a broader scope than
the immediate application.
Data Driven Process
a process whose resource consumption depends on the data on which it
operates.
Data Mart
A department-specific data
warehouse.
A) Independent: fed from legacy systems within the department
B) Dependent: fed from the enterprise data warehouse (preferred)
Data Mining
The process of finding hidden
patterns and relationships in data. For instance, a
consumer goods company may track 200 variables about each consumer. There are
many thousands of possible relationships among the 200 variables. Data mining tools
will identify the significant relationships.
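To give a sense of scale for the 200-variable example, count only the pairs of variables and ignore relationships that involve three or more variables. Even that count is large:

```latex
\binom{200}{2} = \frac{200 \times 199}{2} = 19{,}900
```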
Data Scrubbing
Removing errors and
inconsistencies from data being imported into a data warehouse.
Data Transformation
The modification or alteration of data as
it is being moved into the data warehouse.
Data Type
A data type defines the type of data
stored in a specific database column, such as date, numeric or character data.
Significant differences in data types exist between different platforms'
databases.
Data Warehouse
A data warehouse is a subject-oriented,
integrated, nonvolatile, time-variant collection of data. The data warehouse
contains atomic level data and summarized data specifically structured for
querying and reporting.
Data Warehousing
An enterprise-wide
implementation that replicates data from the same publication table on
different servers/platforms to a single subscription table. This implementation effectively consolidates data from multiple
sources.
Database Schema
The logical and physical
definition of a database structure.
Date/Time Stamp
A stamp added by an application that
identifies a task or activity by the date and time it was initiated and/or
completed. This can appear as part of a transaction log, in message queue content
or in job logs.
Decentralized Database
A centralized database that has been
partitioned according to a business or end-user defined subject area. Typically
ownership is also moved to the owners of the subject area.
Decentralized Warehouse
A remote data source that users can
query/access via a central gateway that provides a logical view of corporate
data in terms that users can understand. The gateway
parses and distributes queries in real time to remote data sources and returns
result sets back to users.
Decision Support Systems (DSS)
Software that supports exception
reporting, stop light reporting, standard repository, data analysis and
rule-based analysis. A database created for end-user ad-hoc query processing.
Denormalization
the technique of placing normalized data in a physical location that
optimizes the performance of the system.
Derived Data
Data whose values are determined
by equations or algorithms.
Dimension
A dimension is typically a qualitative, text value, such as a region or product line,
and also includes date values. It defines the secondary headings or labels that make
up the body of the report. Each of the dimensions is repeated within each
group. Usually, you use items containing text values (for example, Year or Item
Type) for table dimensions. For example, if you select Item Type to be your
table dimension, Item Type is a dimension within each group header. Under the
dimension "Item Type" appears the name of each kind of item (for
example, CD ROM or Hard Drive). A dimension contrasts with a fact, which is a
quantifiable value, such as amount of sales, budget or revenue.
Drill Down/Up
The ability to move between
levels of the hierarchy when viewing data with multiple levels.
A) Drill down: changing a view to a greater level
of detail
B) Drill up: changing a view to a greater level of aggregation.
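The following minimal Python sketch drills up and down a year, quarter, month hierarchy. The sales rows and the `view` helper are hypothetical. Each level groups the detail rows by a longer or shorter prefix of the hierarchy key.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical detail rows: (year, quarter, month, sales amount)
SALES: List[Tuple[int, str, str, float]] = [
    (2023, "Q1", "Jan", 100.0),
    (2023, "Q1", "Feb", 150.0),
    (2023, "Q2", "Apr", 200.0),
    (2024, "Q1", "Jan", 120.0),
]

# Hierarchy levels, from most aggregated to most detailed.
LEVELS = ["year", "quarter", "month"]


def view(level: str) -> Dict[tuple, float]:
    """Total sales at one level of the hierarchy."""
    depth = LEVELS.index(level) + 1
    totals: Dict[tuple, float] = defaultdict(float)
    for year, quarter, month, amount in SALES:
        totals[(year, quarter, month)[:depth]] += amount
    return dict(totals)


yearly = view("year")        # drill up: greatest level of aggregation
quarterly = view("quarter")  # drill down one level
monthly = view("month")      # drill down to the greatest level of detail
```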
EDI (Electronic Data Interchange)
A standard format for exchanging business data.
Encryption
the transformation of data from a recognizable format to a form
unrecognizable without the algorithm used for the transformation.
ETL (Extract, Transform and Load)
ETL refers to the process of getting data
out of one data store (extract), modifying it (transform), and inserting it into
a different data store (load).
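Here is a minimal ETL sketch in Python that uses only the standard library. Rows are extracted from a CSV string standing in for an operational export. They are transformed by cleaning names and converting amounts to integer cents. Finally they are loaded into an in-memory SQLite table standing in for the warehouse. The source data, table and column names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3, carol,42.50
"""


def extract(text: str):
    """Extract: read raw rows from the source data store."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and title-case names, convert amounts to integer cents."""
    return [
        (int(r["order_id"]), r["customer"].strip().title(),
         round(float(r["amount"]) * 100))
        for r in rows
    ]


def load(rows, conn: sqlite3.Connection) -> None:
    """Load: insert the transformed rows into the target data store."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
```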
ETT
ETL is sometimes referred to as ETT
(Extraction, Transformation and Transportation). It is a series of batch
interfaces between the systems.
Executive/Enterprise Information Systems (EIS)
Tools programmed to provide canned reports
or briefing books to top-level executives. They offer strong reporting and
drill-down capabilities. Today these tools allow ad-hoc querying against a
multi-dimensional database, and most offer analytical applications along
functional lines such as sales or financial analysis. (Also
known as Decision Support System.)
Extendibility
The ability to easily add new
functionality to existing services without major software rewrites or
redefining the basic architecture.
External Schema
a logical description of a user's method of organizing and structuring
data.
The tables which are extracted from
heterogeneous sources and used in the Data Warehouse
Factless Fact
A fact table without any metrics in it
Flat File
a collection of records containing no data aggregates, nested repeated
data items, or groups of data items.
Functional Decomposition
the division of operations into hierarchical functions that form the basis
for procedures.
Global Business Models
Provides access to information scattered throughout
an enterprise under the control of different divisions or departments with
different databases and data models. This type of data warehouse is difficult
to build because it requires users from different divisions to come together to
define a common data model for the warehouse.
Granularity
The level of detail of the data
stored in a data warehouse.
Heterogeneous Environment
Within an enterprise, a network of
different types of servers and databases.
Heuristic
the mode of analysis in which the next step is determined by the results of
the current step of analysis.
Hierarchy
The organization of data into a
logical tree structure.
Homogeneous Environment
Within an enterprise, a network
consisting of the same type of servers and databases.
Horizontal Distribution
the splitting of a table across different sites by rows. With
horizontal distribution, rows of a single table reside at different sites in a
distributed data base network.
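This minimal Python sketch shows horizontal distribution. A customer table is split by rows into one fragment per site, and every fragment keeps all of the table's columns. Using the region column to choose the site is a hypothetical placement rule. Range-based or hash-based rules are also common.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (customer_id, region, balance)

customers: List[Row] = [
    (1, "EU", 250.0),
    (2, "US", 90.0),
    (3, "EU", 30.0),
    (4, "APAC", 500.0),
]


def distribute_by_rows(rows: List[Row]) -> Dict[str, List[Row]]:
    """Split one table into row subsets, one per site; each subset keeps all columns."""
    sites: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        site = row[1]  # hypothetical placement rule: the region decides the site
        sites[site].append(row)
    return dict(sites)


fragments = distribute_by_rows(customers)
```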
Hub and Spoke Configuration
A configuration of interconnected systems
where a single system (the hub) acts as the central point for exchanging data
with and between the other systems (spokes).
Huffman Code
a code for data compaction in which frequently used characters are
encoded with fewer bits than infrequently used characters.
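Below is a compact Python sketch that builds a Huffman code with `heapq`. It repeatedly merges the two least frequent subtrees, so frequently used characters end up near the root with short codes. The integer tie-breaker only keeps heap comparisons well defined. Other tie-breaking rules yield different but equally optimal codes.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code(text: str) -> Dict[str, str]:
    """Build a Huffman code: frequent characters get shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {char: code_so_far})
    heap = [(count, i, {ch: ""}) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent subtree
        w2, _, right = heapq.heappop(heap)  # next least frequent subtree
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_code("aaaaabbbc")
encoded = "".join(codes[ch] for ch in "aaaaabbbc")
```

For "aaaaabbbc" this sketch gives `a` a one-bit code and `b` and `c` two-bit codes. The text therefore encodes in 13 bits instead of the 18 that a fixed two-bit code would need.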
HyperCube
See CUBE.
I
fastest searching records
Information
Data that has been processed in
such a way that it can increase the knowledge of the person who receives it.
Information Systems Architecture
The authoritative definition of the
business rules, systems structure, technical framework, and product backbone
for business information systems.
Instance
a set of values representing a specific entity belonging to a particular
entity type.
Integrity
the property of a data base that
ensures that the data contained in the data base is as accurate and consistent as
possible.
Intelligent Data Base
a data base that contains shared logic as well as shared data and
automatically invokes that logic when the data base is accessed. Logic,
constraints, and controls relating to the use of data are represented in an
intelligent data model.
Interleaved Data
data from different tables mixed into a single table space where there is commonality
of physical colocation based on a common key value.
Iterative Analysis
the mode of processing in which the next step of processing depends on the
results obtained by the existing step in execution.
Join
an operation that takes two relations as operands and produces a new
relation by concatenating the tuples and matching the
corresponding columns when a stated condition holds between the two.
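A minimal Python sketch of this join follows, with each relation represented as a list of dicts. Column names are prefixed with the relation name so the concatenated tuple keeps both copies of the matched column. The `employees` and `departments` data are hypothetical. This nested-loop form is the simplest way to evaluate a join, not the fastest.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def join(r: List[Row], r_name: str, s: List[Row], s_name: str,
         condition: Callable[[Row, Row], bool]) -> List[Row]:
    """Concatenate each pair of tuples (t from r, u from s) for which condition holds."""
    result: List[Row] = []
    for t in r:
        for u in s:
            if condition(t, u):
                joined = {f"{r_name}.{k}": v for k, v in t.items()}
                joined.update({f"{s_name}.{k}": v for k, v in u.items()})
                result.append(joined)
    return result


employees = [
    {"id": 1, "name": "Ann", "dept": 10},
    {"id": 2, "name": "Raj", "dept": 20},
    {"id": 3, "name": "Lee", "dept": 30},  # no matching department
]
departments = [
    {"dept": 10, "title": "Sales"},
    {"dept": 20, "title": "Finance"},
]

staff = join(employees, "emp", departments, "dept",
             lambda e, d: e["dept"] == d["dept"])
```

Removing the redundant `dept.dept` column from each result tuple would make this a natural join on the department key.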
Key Compression
a technique for reducing the number of bits in keys; used in making
indexes occupy less space.
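The entry does not name a specific method. Front (prefix) compression is one common technique for index keys, and it is sketched here in Python. Each key in a sorted list is stored as two parts: the length of the prefix it shares with the previous key, and the remaining suffix. The function names are hypothetical.

```python
from typing import List, Tuple


def front_compress(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Store each key as (length of prefix shared with previous key, remaining suffix)."""
    compressed: List[Tuple[int, str]] = []
    previous = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(key), len(previous))
        while shared < limit and key[shared] == previous[shared]:
            shared += 1
        compressed.append((shared, key[shared:]))
        previous = key
    return compressed


def front_decompress(compressed: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original keys by reusing the shared prefix of each previous key."""
    keys: List[str] = []
    previous = ""
    for shared, suffix in compressed:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = ["SMITH", "SMITHERS", "SMITHSON", "SMYTHE"]
packed = front_compress(keys)
```

Here `packed` is `[(0, "SMITH"), (5, "ERS"), (5, "SON"), (2, "YTHE")]`, which stores 15 key characters instead of 27.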
Latency
Often used to mean any delay or waiting that increases real or
perceived response time beyond the response time desired.
Lockup
the event that occurs when update is done against a data base record and
the transaction has not yet reached a commit point.
Logging
the automatic recording of data with regard to the access of the data, the
updates to the data, etc.
Logical Representation
a data view or description that does not depend on a physical storage
device or a computer program.
Main Storage Database (MSDB)
a data base that resides entirely in main storage. Such data bases are
very fast to access, but require special handling at the time of update. MSDBs can manage only small amounts of data.
Maximum Transaction Arrival Rate (MTAR)
the rate of arrival of transactions at the moment of peak period
processing.
MDDB
Multidimensional Database
Metadata or
Metadata is data about data. Examples of
metadata include data element descriptions, data type descriptions,
attribute/property descriptions, range/domain descriptions, and process/method
descriptions. The repository environment encompasses all corporate metadata
resources: database catalogs, data dictionaries, and navigation services.
Metadata includes things like the name, length, valid values, and description
of a data element. Metadata is stored in a data dictionary and repository. It
insulates the data warehouse from changes in the schema of operational systems.
Metadata Synchronization
The process of consolidating,
relating and synchronizing data elements with the same or similar meaning from
different systems. Metadata synchronization joins these
differing elements together in the data warehouse to allow for easier access.
Metalanguage
a language used to specify other languages.
Methodology
A system of principles, practices, and
procedures applied to a specific branch of knowledge.
Mid-Tier Data Warehouses
To be scalable, any particular
implementation of the data access environment may incorporate several intermediate
distribution tiers in the data warehouse network. These intermediate tiers act
as source data warehouses for geographically isolated sharable data that is
needed across several business functions.
Middleware
A communications layer that
allows applications to interact across hardware and network environments.
Migration
the process by which frequently used items of data are moved to more
readily accessible areas of storage and infrequently used items of data are
moved to less readily accessible areas of storage.
Multilist Organization
a chained file organization in which the chains are divided into
fragments and each fragment is indexed. This organization of data permits
faster access to the data.
Natural Join
a join in which the redundant logic components generated by the join are
removed.
Network Model
a data model that provides data relationships on the basis of records or
groups of records (i.e., sets) in which one record is
designated as the set owner, and a single member record can belong to one or
more sets.
Nonprocedural Language
syntax that directs the computer as to what to do, not how to do it. Typical
nonprocedural languages include RAMIS, FOCUS, NOMAD,
and SQL.
Normalization
Normalization is a step-by-step process of
removing redundancies and dependencies of attributes in a data structure. The
condition of the data at completion of each step is described as a "normal
form." Thus, when normalizing we talk about data as being in the first
normal form, the second normal form, etc. Normalization theory identifies
normal forms up to at least the fifth normal form, plus an adjunct form known
as Boyce-Codd Normal Form (BCNF). The first three
forms are sufficient to meet the needs of warehousing data models.
OLAP (On-Line Analytical Processing)
Describes the systems used not for
application delivery, but for analyzing the business, e.g., sales forecasting,
market trends analysis, etc. These systems are also more
conducive to heuristic reporting and often involve multidimensional
data analysis capabilities.
OLTP (Online Transaction Processing)
Describes the activities and
systems associated with a company's day-to-day operational processing and data
(order entry, invoicing, general ledger, etc.).
Operational Data Store (ODS)
the form that data warehouse takes in the operational environment.
Operational data stores can be updated, provide rapid and consistent response time,
and contain only a limited amount of historical data.
Overflow
the condition in which a record or a segment cannot be stored in its home
because the address is already occupied.
Parallel Data Organization
an arrangement of data in which the data is spread over independent
storage devices and is managed independently.
Parallel Search Storage
a storage device in which one or more parts of all storage locations are
queried simultaneously for a certain condition or under certain parameters.
Parsing
the algorithm that translates syntax into meaningful machine instructions. Parsing
determines the meaning of statements issued in the data manipulation language.
Partition
a segmentation technique in which data is divided into physically
different units. Partitioning can be done at the
application or the system level.
Performance
the length of time from the moment a request is issued until the first of
the results of the request are received.
Periodic Discrete Data
a measurement or description of data taken at a regular time interval.
Prefix Data
data in a segment or a record used exclusively for system control, usually
unavailable to the user.
Primitive Data
data whose existence depends on only a single occurrence of a major subject
area of the enterprise.
Privilege Descriptor
a persistent object used by a DBMS to enforce constraints on operations.
Projection
an operation that takes one relation as an operand and returns a second
relation that consists of only the selected attributes or columns, with
duplicate rows eliminated.
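Here is a minimal Python sketch of projection over a relation stored as a list of dicts. The `orders` data is hypothetical. Duplicate elimination keeps the first occurrence of each projected row. SQL's `SELECT` keeps duplicates unless `DISTINCT` is specified, whereas the relational projection defined here removes them.

```python
from typing import Any, Dict, List, Sequence, Tuple


def project(relation: List[Dict[str, Any]], columns: Sequence[str]) -> List[Tuple]:
    """Keep only the selected columns and eliminate duplicate rows (first occurrence kept)."""
    seen = set()
    result: List[Tuple] = []
    for row in relation:
        projected = tuple(row[c] for c in columns)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


orders = [
    {"order_id": 1, "customer": "Ann", "city": "Oslo"},
    {"order_id": 2, "customer": "Ann", "city": "Oslo"},
    {"order_id": 3, "customer": "Raj", "city": "Pune"},
]
customers = project(orders, ["customer", "city"])
```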
Proposition
a statement about entities that asserts or denies that some condition
holds for those entities.
Query Language
a language that enables an end user to interact directly with a DBMS to
retrieve and possibly modify data managed under the DBMS.
an aggregation of values of d