Database Transformation, Synthesis and Playback Technology
by David M. Schwartz, CEO, ImaginOn, Inc.
24 October 1996
Synopsis
The current state of the art in database software
is the hierarchical relational database and the search
engines that access it. This technology works well
for most of the local data storage, retrieval and
presentation tasks undertaken by businesses and institutions.
Nonetheless, even in these "normal" areas problems
have always occurred at the boundaries between databases
and among databases of different formats. For tasks
outside the normal course of business management,
traditional database structures, methodologies and
tools rarely perform adequately. These areas of inadequate
performance include large chaotic networks such as
the Internet, real-time accessed databases such as
film, video and audio clips, and extremely complex,
re-entrant dynamic data structures such as large computer
programs.
ImaginOn
technology solves the access, retrieval and presentation
problems associated with problematic data structures
and systems such as those mentioned above by applying
the principles of database transformation, network
synthesis and automated adaptive playback.
Database transforms are the processes by which a source
database structure and its contents are converted
into an output database that has a specified relationship
to the source. Network synthesis is the process by
which a multidimensional network is created according
to a set of rules. Automated adaptive playback is
the process by which the data within any network structure
can be presented sequentially on a non-linear path
jointly determined by the user's preferences and the
rules of the process. To date, three distinct commercial
applications of ImaginOn technology have been identified:
interactive film authoring and playback, World Wide
Web data processing, and automated testing of computer
programs.
Background
The history of mainstream business computer database
technology is the story of the transition from flat-file
databases to relational databases. In the engineering
community, hierarchical databases of both flat and
relational types have been utilized in specialty applications
like computer aided design and project management.
Flat-file databases consist simply of data entries
in the form of records containing a set of fields
delimited from one another by commas (or another character).
Records are separated either by an end-of-record character
or by the implied boundary of a fixed record length.
Flat-file databases are accessed by reference to the
record number itself or by searching for a user-defined
string (set of characters) within the database, which
will then return the ID numbers of all records that
contain the string.
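The two access methods just described can be sketched in a few lines of Python. The sample records, the newline used as the end-of-record character and the function names are all assumptions made for illustration.

```python
def load_records(text, record_sep="\n"):
    """Split a flat file into records; record numbers start at 1."""
    return [r for r in text.split(record_sep) if r]


def fields_of(record, field_sep=","):
    """Split one record into its fields."""
    return record.split(field_sep)


def find_string(records, needle):
    """Return the record numbers of all records that contain the string."""
    return [number for number, record in enumerate(records, start=1)
            if needle in record]


# Hypothetical sample data: surname, make, body style, model year.
FLAT_FILE = ("Jones,Ford,sedan,1972\n"
             "Smith,Chevrolet,coupe,1968\n"
             "Jones,Dodge,truck,1975")

records = load_records(FLAT_FILE)
second_record = fields_of(records[2 - 1])   # access by record number 2
jones_ids = find_string(records, "Jones")   # access by string search
```

The search sees only raw text: it cannot tell which field a match came from.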
Relational
databases represent an improvement over flat-file
databases in that the database is addressable down
to the field level. This provides powerful sorting
capabilities to the user. A cross-section of a relational
database can be retrieved based on attributes within
a specific field or by combinatorial logical (Boolean)
operations on all fields. For example, in the California
DMV database, finding all the families with the surname
"Jones" who own 1972 Ford sedans is a simple matter
of querying the database for "Jones AND Ford AND sedan
AND 1972".
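As a rough illustration of field-level addressing, the query above can be written as a Boolean AND over named fields. The field names and the sample rows are hypothetical; no real DMV schema is implied.

```python
from typing import NamedTuple


class Registration(NamedTuple):
    surname: str
    make: str
    body: str
    year: int


# Hypothetical rows standing in for a registration table.
ROWS = [
    Registration("Jones", "Ford", "sedan", 1972),
    Registration("Jones", "Ford", "truck", 1972),
    Registration("Smith", "Ford", "sedan", 1972),
    Registration("Jones", "Dodge", "sedan", 1975),
]


def select(rows, **criteria):
    """Return the rows whose named fields equal ALL of the given values."""
    return [row for row in rows
            if all(getattr(row, name) == value
                   for name, value in criteria.items())]


matches = select(ROWS, surname="Jones", make="Ford", body="sedan", year=1972)
```

Because each criterion names its field, "Jones" can only match a surname, never a stray string elsewhere in the record.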
Hierarchical
relational databases have the additional feature of
"nesting". Nesting refers to a multi-level aspect
of the database. In computer aided design, the top
level of a design is frequently a simple block diagram
showing the main sections of the design, such as CPU,
Memory, I/O, Peripherals, and Power Supply. The connections
to and from each block are indicated. The next level
in the hierarchy is the design of each section itself.
Then, within each section there may be custom integrated
circuits that are themselves the subject of another
level down in the hierarchy. With a hierarchy, the
user gains the ability to access data specific to
the level of detail under investigation, without the
additional detail contained in the next level down.
The limits of a database search can then be defined
in terms of depth as well as relationships within
a level.
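A depth-limited search of such a hierarchy might look like the following sketch. The nested-dictionary layout, the "Custom IC" block and the function name are assumptions; only the top-level block names come from the example above.

```python
def block(name, *children):
    """Build one block of the design with its sub-blocks."""
    return {"name": name, "children": list(children)}


# Top-level block diagram, with one hypothetical custom IC inside the CPU.
DESIGN = block(
    "System",
    block("CPU", block("Custom IC")),
    block("Memory"),
    block("I/O"),
    block("Peripherals"),
    block("Power Supply"),
)


def blocks_to_depth(node, max_depth, depth=0):
    """Yield (depth, name) for every block no deeper than max_depth."""
    yield depth, node["name"]
    if depth < max_depth:
        for child in node["children"]:
            yield from blocks_to_depth(child, max_depth, depth + 1)


top_level = [name for _, name in blocks_to_depth(DESIGN, 1)]
full_detail = list(blocks_to_depth(DESIGN, 2))
```

With `max_depth=1` the custom IC never appears, so the search sees the block diagram without the detail one level down.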
Independent
of database type and structure, the problem of information
access and retrieval has typically been approached
with a single primary specification: find the data
of interest and deliver it. For extremely large, complex
and far-flung databases this approach is analogous
to the command "visit every apple orchard in the world
and bring back only the sweetest apples". The search
engine dutifully visits every apple orchard, finds
the closest thing to "sweetest" and harvests those.
The result is one big pile of apples. Too big, usually.
The user is then faced with sampling every apple in
the pile. Even then, some useful information remains
hidden. For example, if the very sweetest apples all
came from one region, that could only be determined
by another search or post-processing of the tags on
all the apples. Thus, sometimes it is the connection
among data items, as opposed to the items themselves,
that is the most valuable information.
Anyone
who has attempted to use the Internet for research
or entertainment has encountered the inadequacy of
the current generation of software to some extent.
The symptoms are slow access times, erroneous search
results, recirculating linkages, pointers to defunct
data sets or locations, dead-end links and screen
after screen of useless web pages. One approach to
improving the situation has been the application of
"Intelligent Agents". The idea is that these software
robots will assist the user by handling some of the
grunt-work associated with web data access, retrieval
and presentation. At most, these tools will automatically
log onto the net, go to a list of web sites and then
download the data from them to the user's hard disk
drive. In every case, the robot has to be trained:
told which pages to visit and what kind of data to
download. Such robots are not much help, but they are
better than nothing.
When
the database consists of digital videotape clips or other real-time
data such as digital audio, the nature of the problem
is not so much the scope or size of the database,
but the requirement that the data be retrieved in
real time, at the frame rate or sample rate of the
content. Slow-motion or interrupted playback of visual
and audio material is generally not of much interest.
For databases of video clips on local disk drives
or databases spread across networks, seamless access
from one data item to the next is difficult and in
many cases impossible. The present generation of database
technology is not designed for real-time data processing.
For
a database composed of the executable (assembly language)
code of a computer program, a set of input data, control
vectors (user input), and output space (display memory
or other target), the present generation of database
technology is virtually useless. At most, the frequency
of occurrence of specific words or strings can be
discovered, or some parts of the structure can be
deduced. Little knowledge about the correctness of
the code can be gained.
Specialized
analysis software does exist that can "snoop" while
a program executes, creating a database containing
information about the history of program execution
and dataflow through the program. But these tools
are intrusive, provide limited functionality and consequently
are not widely used.
ImaginOn System
ImaginOn's technology is contained in three functional
areas: transformation, synthesis and playback. Although
it is convenient to separate the software operations
of the system into these parts, in actual use the
distinction is not always clear. This fuzziness arises
when some functions are called by more than one part
of the system and when some operations are performed
in parallel.
Transformation
Database transformation is based loosely on the
digital signal processing (DSP) model. Whereas the
data in DSP is usually derived by sampling a real-time
signal of some sort, the input data in the ImaginOn
system is some pre-existing database. Any database
will do: a text file, a directory of filenames,
images, video clips, a network of data servers, or anything else.
The input to the transform function is the source
database (or source databases) and the output is a
subset of the source database. Between input and output
is a "black box" containing the transform engine and
its filters. The transform procedure is similar in
theory to the creation and operation of finite impulse
response (FIR) filters on digitally sampled signals
such as audio or video waveforms. Like digital signal
processing with transform functions, database
transformations can be tuned, that is to say filters
can be designed or transform functions specified appropriate
to the input data and optimized to create output suited
to the user's particular needs. Filtering
functions can be performed on the database in the
temporal, spatial and frequency domains.
An
example of a temporal filter is updating an output
database by sampling a source database at specific
time intervals. A spatial filter can delete content
from a database to meet storage requirements. A frequency-based
filter can create an output database by including
items only if they are pointed to (addressed) more
than a specified number of times by other
nodes in the source database. Of course, filtering
can be based on more than one dimension at a time
and to varying degrees or extent in each dimension.
Virtually all of the filtering techniques developed
for DSP can be applied in database transformations.
To name a few: decimating, bandpass, high-pass, low-pass,
comb, shelving, and notch.
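Two of these filters are simple enough to sketch. The code below is an illustration under stated assumptions, not ImaginOn's implementation: the source database is modeled as a dictionary of outbound links, each referring node is counted once, and snapshots are (timestamp, content) pairs in time order.

```python
from collections import Counter


def frequency_filter(links, threshold):
    """Keep nodes pointed to by more than `threshold` other nodes."""
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each referring node once
            if target != source:         # ignore self-references
                inbound[target] += 1
    return {node for node, count in inbound.items() if count > threshold}


def temporal_filter(snapshots, interval):
    """Sample the source, keeping snapshots at least `interval` apart."""
    kept = []
    for timestamp, content in snapshots:
        if not kept or timestamp - kept[-1][0] >= interval:
            kept.append((timestamp, content))
    return kept


# Hypothetical source network: node -> nodes it points to.
SOURCE = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c", "b"]}
popular = frequency_filter(SOURCE, 1)

# Hypothetical snapshots of one source item over time.
HISTORY = [(0, "v1"), (5, "v2"), (10, "v3"), (12, "v4"), (20, "v5")]
sampled = temporal_filter(HISTORY, 10)
```

Raising the threshold narrows the output much as narrowing a passband would: with a threshold of 2 only node "c" survives.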
In
the web processing application of ImaginOn technology,
"WebZinger", the transform function produces a condensed
format suitable for high-speed access during playback.
The DSP analogy to this is lossy data compression.
Relative to the input database, the output database
is compressed in four ways:
1. Reduction in scope of the database
The number of nodes made available for processing
is less than the number of nodes in the source network.
The extent of the scope is determined by storage availability
and time limitations.
2. Decimation of the number of nodes within the scope of the process
Many of the nodes processed are deleted, never appearing
in the output database. The extent to which a selected
node matches the node selection criteria is saved
as a data array in a format like that of the selection
criteria data array itself. Nodes may also be
selected by other filtering functions such as time
or frequency of reference by other nodes.
3. Bandpass filtering of the data within each node
Only a fraction of the total data within a node is
reformatted and saved. The parameters of the filter
are set by the user or the default parameters are
used. Filtering may be of any type, including by data
class (audio, video, still image, text, etc.), by
time (for example, only take the data pertaining to
specific dates), by frequency of update (for example,
only take data that is updated hourly), or spatially
(for example, take data from twenty geographically
diverse sites), etc.
4. Link information is reduced or deleted
Since only qualified nodes
are included in the output database, all of the links
representing unused nodes along the path to a selected
node can be replaced with a single vector: a
synthetic link that never existed in the source
database. Likewise, all unused links pointing out
of a selected node are deleted, except those at terminal
nodes. At terminal nodes (those at the end of a series
of linked nodes), the unused outward pointing links
are saved for future use in a transformation that
is started from that terminal node.
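The link reduction in item 4 can be sketched as a breadth-first walk that remembers the nearest selected ancestor of each node. The graph layout, the node names and the function name are assumptions for illustration; how nodes come to be selected is left to the filters described earlier.

```python
from collections import deque


def reduce_links(graph, start, selected):
    """Return (synthetic_links, saved_links) for a walk from `start`.

    graph    -- dict mapping each source node to the nodes it links to
    start    -- node where the transform begins (assumed to be selected)
    selected -- set of nodes that survive into the output database
    """
    synthetic = set()
    kept = {start}
    seen = {start}
    queue = deque([(start, start)])      # (node, nearest selected ancestor)
    while queue:
        node, anchor = queue.popleft()
        for target in graph.get(node, []):
            if target in seen:
                continue
            seen.add(target)
            if target in selected:
                # One link replaces the run of unselected nodes (possibly
                # empty) that lay between the anchor and the target.
                synthetic.add((anchor, target))
                kept.add(target)
                queue.append((target, target))
            else:
                queue.append((target, anchor))
    # Terminal nodes have no selected node beyond them; their outward
    # links are saved so that a later transform can start there.
    anchors = {a for a, _ in synthetic}
    saved = {n: list(graph.get(n, [])) for n in kept - anchors}
    return synthetic, saved


# Hypothetical source network; "x", "y" and "z" are unselected nodes.
SOURCE = {"home": ["x", "news"], "x": ["y"], "y": ["sports"],
          "news": [], "sports": ["z"]}
links, frontier = reduce_links(SOURCE, "home", {"home", "news", "sports"})
```

Here the link from "home" to "sports" is synthetic: the source reached "sports" only through "x" and "y", neither of which appears in the output.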
Synthesis
A synthetic network, in the context of this discussion,
is one that does not exist either in nature or as
a pre-existing manmade structure. The process of network
synthesis is, considered by itself, nothing new. Multidimensional
tree-like data structures, their creation and implementation,
have been a part of computing since day one. It is
the close coupling of ImaginOn network synthesis with
ImaginOn database transformation that yields tree-like
networks that are novel and useful. In synthesis,
the output database created by the transformation
process is structured as a network in "n" dimensions
according to a set of rules using parameters
that are set by the user, or the default settings.
The purpose of the structure is threefold: the structure
serves as "rails" that the playback engine can
traverse, the non-linear sequential organization preserves
much of the original connectivity information that
was present in the source database before transformation,
and a time axis is implied. It is the combination
of the "rails" and the implied time axis that provides
the user with the illusion of virtual motion in "dataspace"
during playback of the transformed database.
Networks
synthesized by ImaginOn are variable in size, shape
and complexity. Each of these variables can be set
by the user. Size refers to the total quantity of
data that is contained in the synthetic network. Shape
refers to the relationship between the depth (or height)
of the tree and its width. Complexity refers to how
many dimensions, or branches, radiate from each node
in the tree. While there are no theoretical limits
to any of these variables, there are obvious practical
considerations that pertain to storage limitations,
display limitations and computational time. Similarly,
there is no requirement that structures be symmetrical,
though for ease of conceptualization, symmetrical
trees are preferred.
ImaginOn
synthetic networks are stored in computer memory as
a database of items commonly referred to as "objects".
Each object of a given class in the structure (for
example, trunk node, branch node or data page) contains
pointers to the actual physical location
of the data it references as well as pointers to any
other object it is linked to in the tree structure.
When such a synthetic network is embodied as a schematic
drawing, the objects are graphic library elements
and the links are defined in a text file known as
a "netlist". Since the graphical elements can also
be represented as sets of x,y coordinates, the entire
synthetic network can be stored as ASCII text in human-readable
files, if so desired.
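A minimal sketch of such objects, and of a netlist-style text form, follows. The class layout, the one-line-per-object text format and the example addresses are assumptions, chosen only to show that the structure survives a round trip through human-readable ASCII.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                  # "trunk", "branch" or "page"
    source_address: str        # where the referenced data actually lives
    links: list = field(default_factory=list)   # names of linked objects


def to_netlist(nodes):
    """Write one line per object: name, kind, address, then its links."""
    return "\n".join(
        " ".join([n.name, n.kind, n.source_address, *n.links]) for n in nodes)


def from_netlist(text):
    """Rebuild the objects from the text form (fields hold no spaces)."""
    nodes = []
    for line in text.splitlines():
        name, kind, address, *links = line.split()
        nodes.append(Node(name, kind, address, links))
    return nodes


# Hypothetical network: one trunk node, one branch node, one data page.
NETWORK = [
    Node("T1", "trunk", "file:///data/intro.txt", ["B1", "P1"]),
    Node("B1", "branch", "file:///data/detail.txt", ["P1"]),
    Node("P1", "page", "file:///data/page1.txt"),
]

netlist = to_netlist(NETWORK)
restored = from_netlist(netlist)
```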
By
definition, the synthetic network is hierarchical.
The top level of the hierarchy is represented by the
nodes along the trunk of the tree. The next level
down contains the first layer of branches and so on
down through layers of branches to the lowest level
which contains the leaves. Since the actual physical
address of the source data from which the transformed
data was obtained is stored along with the data for
each node in the tree, new synthetic networks can
be created which are continuations of any previous
network by performing a new transform on the source
database beginning at that point.
Playback
Since the present generation of database technology
was developed primarily for business and scientific
uses, the underlying concept behind data presentation
in database products is information transfer from
computer to user. Static screen displays of complex
data and comprehensive printouts of search results
are the norm. Rapid access to large quantities of
search results is typically supported by graphical
lists of contents, or indexes, from which the user
can select the next page display. When the results
database is extremely large, even indexed displays
may require large numbers of keystrokes and many hours
of viewing time to zero-in on a specific item of interest.
Conventional
data presentation methods such as graphical browsers
and relational database formatted-field displays require
the user to indicate what the next display
screen will be and when it will be displayed.
Typically, the user accomplishes this by clicking
on a "next" button or a line of displayed hypertext
which is a "hot-link". There is no way to avoid this
one-page-at-a-time procedure. ImaginOn offers a completely
different model of data presentation, similar in some
respects to the way consumer entertainment products
work. The concept of "playback" is utilized in the
same sense that VCRs play back videotape, or CD players
play back music.
ImaginOn's
playback technique is based on an adaptive algorithm
that determines both what will be displayed next and
when it will be displayed. This feature allows the
user to be completely passive, or as active as they
want to be during the data presentation process. When
the user decides to be a completely passive viewer,
ImaginOn playback is automated according to its internal
rules. This capability is fundamental to ImaginOn's
computer software program testing application.
The
analogy between a VCR's playback of videotape and
ImaginOn's playback of an output database appropriately
describes the seamless nonstop continuous nature of
the presentation, but does not encompass the new features
provided by the system. These unique capabilities
include dynamic path selection through the database,
adaptive automation, variable playback resolution,
and hierarchical playback.
Dynamic path selection
During playback, the user traverses a path through
ImaginOn's synthetic network organization of the output
database. The path may start anywhere in the network,
though the default is the first Trunk Node. The actual
vector the user follows through the network is dynamically
determined, assuming the user provides some direction.
The user's selection of the path vector is made
with one key, or button, by responding to the alternating
branch choices presented visually (or audibly,
in the case of an audio database) at
Trunk Nodes and Branch Nodes. The next data item is
then immediately presented, without an interruption
in the display or audio output. Should the user fail
to make a selection, playback is continued along a
default path, without interruption, after a predetermined
number of alternating choice presentations.
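The one-key selection scheme can be modeled as follows. This is a sketch under assumptions: each node lists its branch choices and a default successor, a key press is recorded as the 0-based count of the choice presentation during which it happened, and the node names are hypothetical.

```python
def next_node(entry, pressed_at, max_presentations):
    """Pick the successor of one node.

    entry      -- {"branches": [...], "default": name}
    pressed_at -- presentation count at which the single key was pressed,
                  or None if the user did nothing
    """
    branches = entry["branches"]
    if branches and pressed_at is not None and pressed_at < max_presentations:
        # Choices alternate, so the presentation count wraps around.
        return branches[pressed_at % len(branches)]
    return entry["default"]


def play(network, start, presses, max_presentations=4):
    """Follow the network from `start`, one key event per node visited."""
    path = [start]
    for pressed_at in presses:
        entry = network.get(path[-1])
        if entry is None:          # no onward links: end of playback
            break
        path.append(next_node(entry, pressed_at, max_presentations))
    return path


# Hypothetical network of Trunk Nodes (T*) and Branch Nodes (B*).
NETWORK = {
    "T1": {"branches": ["B1a", "B1b"], "default": "T2"},
    "B1b": {"branches": [], "default": "T2"},
    "T2": {"branches": ["B2a"], "default": "T3"},
}

guided = play(NETWORK, "T1", [1, None, None])   # one choice, then passive
passive = play(NETWORK, "T1", [None, None])     # never presses the key
```

A user who never presses the key still gets uninterrupted playback along the default path.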
Adaptive automation, autoplay and adaptive transforms
Every time the user makes a selection, thus determining
the current path vector, the user's profile is updated
to reflect the state of the match between the present
node's selection criteria and the user's profile.
In this way, the playback algorithm "learns"
about the user's preferences. If, after guiding
playback for a while, the user ceases to make selections,
playback may be continued automatically utilizing
those stored preferences. In effect, the playback
algorithm acts on behalf of the user, as adapted from
the history of the user's path through the synthetic
network. Either the adapted user profile, or the initial
user profile may be selected to play back a database
path automatically, without user intervention. This
mode of operation is called "autoplay".
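One way such learning could work is sketched below. The update rule is not specified in this paper; the sketch substitutes a simple exponential moving average, with profiles and selection criteria both represented as hypothetical dictionaries of topic weights.

```python
def update_profile(profile, node_criteria, rate=0.5):
    """Move the profile toward the criteria of the node just selected.

    Assumption: an exponential moving average stands in for the
    unspecified update rule.
    """
    keys = set(profile) | set(node_criteria)
    return {k: (1 - rate) * profile.get(k, 0.0)
               + rate * node_criteria.get(k, 0.0)
            for k in keys}


def match(profile, node_criteria):
    """Score how well a node's selection criteria fit the profile."""
    return sum(profile.get(k, 0.0) * v for k, v in node_criteria.items())


def autoplay_choice(profile, branches):
    """Choose the branch whose criteria best match the profile."""
    return max(branches, key=lambda name: match(profile, branches[name]))


# Hypothetical initial profile and branch choices at one node.
initial = {"sports": 0.0, "news": 1.0}
BRANCHES = {"A": {"sports": 1.0}, "B": {"news": 1.0}}

# The user picks the sports branch twice; the profile adapts.
learned = update_profile(initial, BRANCHES["A"])
learned = update_profile(learned, BRANCHES["A"])

before = autoplay_choice(initial, BRANCHES)
after = autoplay_choice(learned, BRANCHES)
```

Autoplay driven by the initial profile takes branch "B"; driven by the adapted profile it takes branch "A", which is the choice between the two profiles described above.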
The
ImaginOn playback software maintains a history of
all user preferences and selections since the time
of program initialization. This user history may be
used by the database transform process to set the
selection criteria and filters for creation of a new
output database to be made from any source database.
The new output database can be grafted onto the existing
synthetic network, or a new network can be created.
If "learned" criteria are used to drive the database
transformation, the transformation itself can then
be considered adaptive with respect to the user's
history of preferences.
Variable playback resolution
By setting the playback parameters for the time data
pages are held on screen and the rate at which the alternating
branch choices "flip", the user can control the overall
speed of path playback. When these hold and
flip times are set very low, a fast-forward effect
results. This provides the user with a kind of
skimming, low-resolution view of the paths.
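The effect of the two settings on total playback time can be estimated with simple arithmetic. The formula is an assumption: it supposes that every data page is held for the same time and that every choice point runs through the same number of flips.

```python
def path_duration(pages, choice_points, hold_s, flip_s, flips_per_choice):
    """Estimate seconds needed to play a path.

    pages            -- data pages shown along the path
    choice_points    -- nodes at which branch choices are presented
    hold_s           -- seconds each data page is held on screen
    flip_s           -- seconds each alternating choice is shown
    flips_per_choice -- choice presentations before playback moves on
    """
    return pages * hold_s + choice_points * flips_per_choice * flip_s


# Hypothetical settings for the same 10-page path with 4 choice points.
normal = path_duration(pages=10, choice_points=4,
                       hold_s=5.0, flip_s=1.0, flips_per_choice=4)
skim = path_duration(pages=10, choice_points=4,
                     hold_s=0.5, flip_s=0.25, flips_per_choice=4)
```

With these numbers the same path takes 66 seconds at the normal settings and 9 seconds when skimming.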
Hierarchical playback
If the user does not select autoplay mode and
never makes a branch choice, playback proceeds by
default from Trunk Node to Trunk Node. Since Trunk
Nodes represent the main "stems" of the n-dimensional
tree, the user will only see data pages along those
main, top-level paths. If the user makes only one
branch choice per Trunk Node, only the first Branch
Node off of each Trunk Node is displayed, thus providing
a view of the data along only the first layer of branches
in the tree. This manner of playback proceeds successively
down through the layers of the tree. Thus, hierarchical
playback enables only those paths that reach down to a
user-selectable level of interest in the synthetic network.
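Hierarchical playback can be sketched as a traversal that follows the first branch at each node, but only down to a chosen level. The nested-dictionary tree and the node names are hypothetical, and following only the first branch is a simplifying assumption.

```python
def node(name, *branches):
    """Build one node of the tree with its outgoing branches."""
    return {"name": name, "branches": list(branches)}


def hierarchical_path(trunk, max_level):
    """List the nodes played when descending at most `max_level` layers."""
    path = []
    for current in trunk:
        level = 0
        while True:
            path.append(current["name"])
            if level == max_level or not current["branches"]:
                break
            current = current["branches"][0]   # first branch only
            level += 1
    return path


# Hypothetical tree: two Trunk Nodes, the first with a branch and a leaf.
TRUNK = [
    node("T1", node("B1", node("L1"))),
    node("T2"),
]

trunk_only = hierarchical_path(TRUNK, 0)
first_layer = hierarchical_path(TRUNK, 1)
all_layers = hierarchical_path(TRUNK, 2)
```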