Database Transformation, Synthesis and Playback Technology
by David M. Schwartz, CEO, ImaginOn, Inc.
24 October 1996
Synopsis
The current state of the art in database software
is the hierarchical relational database and the search engines
that access it. This technology works well for most of
the local data storage, retrieval and presentation tasks
undertaken by businesses and institutions. Nonetheless,
even in these "normal" areas problems have always occurred
at the boundaries between databases and among databases
of different formats. For tasks outside the normal course
of business management, traditional database structures,
methodologies and tools rarely perform adequately. These
areas of inadequate performance include large chaotic networks
such as the Internet, real-time accessed databases such
as film, video and audio clips, and extremely complex, re-entrant
dynamic data structures such as large computer programs.
ImaginOn
technology solves the access, retrieval and presentation
problems of difficult data structures and systems such as
those mentioned above by applying the principles
of database transformation, network synthesis and
automated adaptive playback. Database transforms
are the processes by which a source database structure and
its contents are converted into an output database that
has a specified relationship to the source. Network synthesis
is the process by which a multidimensional network is created
according to a set of rules. Automated adaptive playback
is the process by which the data within any network structure
can be presented sequentially on a non-linear path jointly
determined by the user's preferences and the rules of the
process. To date, three distinct commercial applications
of ImaginOn technology have been identified: interactive
film authoring and playback, world wide web data processing,
and automated testing of computer programs.
Background
The history of mainstream business computer database
technology is the story of the transition from flat-file
databases to relational databases. In the engineering community,
hierarchical databases of both flat and relational types
have been used in specialty applications such as computer-aided
design and project management. Flat-file databases
consist simply of data entries in the form of records, each containing
a set of fields delimited from one another by commas (or
another delimiter character). Records are separated either by an end-of-record
character, or by the implied boundary of a fixed record
length. Flat-file databases are accessed by reference to
the record number itself or by searching for a user-defined
string (set of characters) within the database, which will
then return the ID numbers of all records that contain the
string.
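The two flat-file access methods described above can be sketched in a few lines of Python. This is a minimal illustration, not ImaginOn code; it assumes comma-delimited fields, a newline as the end-of-record character and record IDs numbered from 0, and the sample data and function names are hypothetical.

```python
# Hypothetical flat-file database: one record per line (newline acts as the
# end-of-record character) and fields delimited by commas.
FLAT_FILE = """Jones,Ford,sedan,1972
Smith,Chevrolet,coupe,1968
Fordham,Dodge,truck,1975"""


def parse_records(text: str) -> list[list[str]]:
    """Split the raw text into records, then each record into its fields."""
    return [line.split(",") for line in text.splitlines()]


def get_record(records: list[list[str]], record_id: int) -> list[str]:
    """Access by record number (IDs are assumed to start at 0)."""
    return records[record_id]


def search(text: str, pattern: str) -> list[int]:
    """Return the ID of every record whose raw text contains the string."""
    return [i for i, line in enumerate(text.splitlines()) if pattern in line]


records = parse_records(FLAT_FILE)
print(get_record(records, 1))     # ['Smith', 'Chevrolet', 'coupe', '1968']
print(search(FLAT_FILE, "Ford"))  # [0, 2]; also matches the surname Fordham
```

Because the search runs over raw record text, it cannot tell which field matched: the query for "Ford" also returns the record for the surname "Fordham".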
Relational
databases represent an improvement over flat-file databases
in that the database is addressable down to the field level.
This gives the user powerful sorting and selection capabilities.
A cross-section of a relational database can be retrieved
based on attributes within a specific field or by combinatorial
logical (Boolean) operations across fields. For example,
in the California DMV database, finding all the families
with the surname "Jones" who own 1972 Ford sedans is a simple
matter of querying the database for "Jones and Ford and sedan
and 1972", with each term matched against its own field.
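Field-level addressing is what lets the DMV example avoid false matches of the "Fordham" kind. The sketch below assumes a hypothetical four-field registration record and illustrative rows; each keyword is compared only against its own field, and the conditions are combined with a Boolean AND.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    surname: str
    make: str
    body: str
    year: int


# Illustrative rows only; not real DMV data.
TABLE = [
    Registration("Jones", "Ford", "sedan", 1972),
    Registration("Jones", "Ford", "coupe", 1972),
    Registration("Fordham", "Dodge", "sedan", 1972),
    Registration("Smith", "Ford", "sedan", 1972),
]


def query(table: list[Registration], **criteria) -> list[Registration]:
    """Return the rows in which every named field equals its value (Boolean AND)."""
    return [
        row for row in table
        if all(getattr(row, name) == value for name, value in criteria.items())
    ]


hits = query(TABLE, surname="Jones", make="Ford", body="sedan", year=1972)
print(hits)  # [Registration(surname='Jones', make='Ford', body='sedan', year=1972)]
```

In SQL the same request would be a WHERE clause joining the four equality tests with AND; the table and column names would again be hypothetical.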
Hierarchical
relational databases have the additional feature of "nesting".
Nesting refers to a multi-level aspect of the database.
In computer-aided design, the top level of a design is frequently
a simple block diagram showing the main sections of the
design, such as CPU, Memory, I/O, Peripherals, and Power
Supply. The connections to and from each block are indicated.
The next level in the hierarchy is the design of each section
itself. Then, within each section there may be custom integrated
circuits that are themselves the subject of another level
down in the hierarchy. With a hierarchy, the user gains
the ability to access data specific to the level of detail
under investigation, without the additional detail contained
in the next level down. The limits of a database search
can then be defined in terms of depth as well as relationships
within a level.
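A search bounded by depth as well as by content can be sketched as a depth-limited tree walk. The block names below follow the computer-aided design example in the text, but the tree, the Block class and the find function are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    children: list["Block"] = field(default_factory=list)


# Illustrative CAD hierarchy: block diagram, sections, custom ICs.
design = Block("System", [
    Block("CPU", [Block("ALU"), Block("Cache controller ASIC")]),
    Block("Memory", [Block("DRAM array")]),
    Block("I/O"),
    Block("Peripherals"),
    Block("Power Supply"),
])


def find(block, predicate, max_depth, depth=0):
    """Depth-limited search: report matching blocks, never descending below
    max_depth (the top-level block diagram is depth 0)."""
    hits = [block.name] if predicate(block) else []
    if depth < max_depth:
        for child in block.children:
            hits += find(child, predicate, max_depth, depth + 1)
    return hits


def is_custom_ic(block):
    return "ASIC" in block.name


print(find(design, is_custom_ic, max_depth=1))  # [] (the ASIC is one level deeper)
print(find(design, is_custom_ic, max_depth=2))  # ['Cache controller ASIC']
```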
Independent
of database type and structure, the problem of information
access and retrieval has typically been approached with
a single primary specification: find the data of interest
and deliver it. For extremely large, complex and far-flung
databases this approach is analogous to the command "visit
every apple orchard in the world and bring back only the
sweetest apples". The search engine dutifully visits every
apple orchard, finds the closest thing to "sweetest" and
harvests those. The result is one big pile of apples. Too
big, usually. The user is then faced with sampling every
apple in the pile. Even then, some useful information remains
hidden. For example, if the very sweetest apples all came
from one region, that could only be determined by another
search or post-processing of the tags on all the apples.
Thus, sometimes it is the connections among
data items, rather than the items themselves, that are the
most valuable information.
Anyone
who has attempted to use the Internet for research or entertainment
has encountered the inadequacy of the current generation
of software to some extent. The symptoms are slow access
times, erroneous search results, recirculating linkages,
pointers to defunct data sets or locations, dead-end links
and screen after screen of useless web pages. One approach
to improving the situation has been the application of "Intelligent
Agents". The idea is that these software robots will assist
the user by handling some of the grunt-work associated with
web data access, retrieval and presentation. At most, these
tools automatically log onto the net, go to a list
of web sites and then download the data from them to the
user's hard disk drive. In every case, the robot has to
be trained: told what pages to go to and what kind of data
to download. Such agents are not much help, but they are better than nothing.
When the
database consists of digital video clips or other real-time data
such as digital audio, the nature of the problem is not so
much the scope or size of the database, but the requirement
that the data be retrieved in real time, at the frame rate
or sample rate of the content. Slow-motion or interrupted
playback of visual and audio material is generally not of
much interest. For databases of video clips on local disk
drives or databases spread across networks, seamless access
from one data item to the next is difficult and in many
cases impossible. The present generation of database technology
is not designed for real-time data processing.
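The real-time requirement is easy to quantify: the time available per item is the reciprocal of the frame or sample rate. The sketch below computes that budget and, for a hypothetical 120 ms seek-and-read delay when jumping to another clip (an assumed figure, not a measurement), how many frames would need to be buffered ahead to keep playback seamless.

```python
import math


def frame_period_ms(rate_hz: float) -> float:
    """Deadline between consecutive frames (or samples), in milliseconds."""
    return 1000.0 / rate_hz


def frames_to_buffer(rate_hz: float, fetch_ms: float) -> int:
    """Frames that must already be buffered so that a fetch taking fetch_ms
    finishes before the display runs dry."""
    return math.ceil(fetch_ms * rate_hz / 1000.0)


print(round(frame_period_ms(30), 1))             # 33.3 ms per video frame
print(round(frame_period_ms(44_100) * 1000, 1))  # 22.7 microseconds per audio sample
print(frames_to_buffer(30, fetch_ms=120))        # 4 frames to hide a 120 ms seek
```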
For a
database composed of the executable (machine or assembly language)
code of a computer program, a set of input data, control
vectors (user input), and output space (display memory or
other target), the present generation of database technology
is virtually useless. At most, the frequency of occurrence
of specific words or strings can be discovered, or some
parts of the structure can be deduced. Little knowledge
about the correctness of the code can be gained.
Specialized
analysis software does exist that can "snoop" while a program
executes, creating a database containing information about
the history of program execution and dataflow through the
program. But these tools are intrusive and provide limited
functionality, and consequently they are not widely used.
ImaginOn System
ImaginOn's technology is contained in three functional
areas: transformation, synthesis and playback. Although
it is convenient to separate the software operations of
the system into these parts, in actual use the distinction
is not always clear. This fuzziness arises when some functions
are called by more than one part of the system and when
some operations are performed in parallel.
Transformation
Database transformation is based loosely on the digital
signal processing (DSP) model. Whereas the data in DSP is
usually derived by sampling a real-time signal of some sort,
the input data in the ImaginOn system is some pre-existing
database. Any database will do: a text file, a directory
of filenames, images, video clips, a network of data servers,
or anything else. The input to the transform function is the source
database (or source databases) and the output is a subset
of the source database. Between input and output is a "black
box" containing the transform engine and its filters. The
transform procedure is similar in theory to the creation
and operation of finite impulse response (FIR) filters on
digitally sampled signals such as audio or video waveforms.
As in digital signal processing with transform functions,
database transformations can be tuned; that is to
say, filters can be designed or transform functions specified
to suit the input data and optimized to create output
suited to the user's particular needs. Filtering
functions can be performed on the database in the temporal,
spatial and frequency domains.
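The black-box model can be sketched as a chain of filter stages applied in turn to the source records, shown here next to a textbook FIR filter for comparison. This illustrates the analogy under assumed record fields ("kind", "bytes"); it is not the ImaginOn transform engine. Tuning the transform corresponds to choosing which stages to apply and with which predicates, much as tuning an FIR filter means choosing its taps.

```python
from typing import Callable

Record = dict
Filter = Callable[[list[Record]], list[Record]]


def fir(x: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR filter, y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    return [
        sum(b * x[n - k] for k, b in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]


def keep_if(predicate: Callable[[Record], bool]) -> Filter:
    """Build a filter stage that passes only records satisfying `predicate`."""
    return lambda records: [r for r in records if predicate(r)]


def transform(source: list[Record], stages: list[Filter]) -> list[Record]:
    """The 'black box': run the source through each filter stage in turn.
    Because every keep_if stage only removes records, the output is a
    subset of the source."""
    out = list(source)
    for stage in stages:
        out = stage(out)
    return out


source = [
    {"id": 1, "kind": "video", "bytes": 9_000},
    {"id": 2, "kind": "text", "bytes": 300},
    {"id": 3, "kind": "video", "bytes": 4_000},
]
tuned = [keep_if(lambda r: r["kind"] == "video"), keep_if(lambda r: r["bytes"] < 5_000)]
print(fir([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))        # [0.5, 1.5, 2.5, 3.5]
print([r["id"] for r in transform(source, tuned)])  # [3]
```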
An example
of a temporal filter is updating an output database by sampling
a source database at specific time intervals. A spatial
filter can delete content from a database to meet storage
requirements. A frequency-based filter can create an output
database by including items only if they are pointed to
(addressed) more than a specified number of times
by other nodes in the source database. Of course, filtering
can be based on more than one dimension at a time and to
varying degrees or extent in each dimension. Virtually
all of the filtering techniques developed for DSP can be
applied in database transformations. To name a few:
decimating, bandpass, high-pass, low-pass, comb, shelving,
and notch.
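The three example filters can each be written as a small function. In this hypothetical sketch, items carry an assumed timestamp field "t" and size field "bytes", the spatial filter greedily keeps items in their original order while they fit the budget, and the frequency filter counts references from other nodes in a list of (from, to) link pairs.

```python
from collections import Counter


def temporal_filter(items, interval):
    """Sample the source at fixed intervals: keep the first item in each time window."""
    seen, out = set(), []
    for item in sorted(items, key=lambda i: i["t"]):
        window = item["t"] // interval
        if window not in seen:
            seen.add(window)
            out.append(item)
    return out


def spatial_filter(items, budget_bytes):
    """Delete content that would push the output past its storage budget."""
    out, used = [], 0
    for item in items:
        if used + item["bytes"] <= budget_bytes:
            out.append(item)
            used += item["bytes"]
    return out


def frequency_filter(links, min_refs):
    """Keep nodes referenced more than `min_refs` times by other nodes.
    `links` is a list of (from_node, to_node) pairs; self-links are ignored."""
    counts = Counter(dst for src, dst in links if src != dst)
    return sorted(node for node, n in counts.items() if n > min_refs)


items = [
    {"id": "p0", "t": 0, "bytes": 400},
    {"id": "p1", "t": 5, "bytes": 700},
    {"id": "p2", "t": 12, "bytes": 300},
    {"id": "p3", "t": 31, "bytes": 500},
]
links = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b"), ("c", "b"), ("b", "b")]

print([i["id"] for i in temporal_filter(items, interval=10)])       # ['p0', 'p2', 'p3']
print([i["id"] for i in spatial_filter(items, budget_bytes=1000)])  # ['p0', 'p2']
print(frequency_filter(links, min_refs=2))                          # ['c']
# Filtering in two dimensions at once: sample in time, then fit the budget.
print([i["id"] for i in spatial_filter(temporal_filter(items, 10), 1000)])  # ['p0', 'p2']
```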
In the
web processing application of ImaginOn technology, "WebZinger",
the transform function produces a condensed format suitable
for high-speed access during playback. The DSP analogy to
this is lossy data compression. Comparing the output database
with the input shows that ImaginOn performs compression in
four ways:
1. Reduction in scope of the database
The number of nodes made available for processing is
less than the number of nodes in the source network.
The extent of the scope is determined by storage availability
and time limitations.
2. Decimation of the number of nodes within the scope of the process
Many of the nodes processed are deleted, never appearing
in the output database. The extent to which a selected node
matches the node selection criteria is saved as a data array
in a format like that of the selection criteria data array
itself. Nodes may also be selected by other filtering
functions such as time or frequency of reference by other
nodes.
3. Bandpass filtering of the data within each node
Only a fraction of the total data within a node is reformatted
and saved. The parameters of the filter are set by the user
or the default parameters are used. Filtering may be of
any type, including by data class (audio, video, still image,
text, etc.), by time (for example, only take the data pertaining
to specific dates), by frequency of update (for example,
only take data that is updated hourly), or spatially (for
example, take data from twenty geographically diverse sites).
4. Link information is reduced or deleted
Since only qualified nodes are included
in the output database, all of the links representing unused
nodes along the path to a selected node can be replaced
with a single vector which is a synthetic link that never
existed in the source database. Likewise, all unused links
pointing out of a selected node are deleted, except those
at terminal nodes. At terminal nodes (those at the end of
a series of linked nodes), the unused outward pointing links
are saved for future use in a transformation that is started
from that terminal node.
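The four compression steps can be sketched end to end on a toy network. Everything here is an assumption made for illustration: the graph, the two-element feature arrays, the per-element match score 1 - |feature - criterion|, and the 0.5 threshold. The paper does not describe WebZinger's actual selection rules.

```python
from collections import deque

# Hypothetical source network: node -> nodes it links to.
GRAPH = {
    "home": ["a", "b"], "a": ["c"], "b": ["d"],
    "c": ["e"], "d": [], "e": ["f"], "f": [],
}
# Per-node feature arrays in [0, 1], same layout as the selection criteria.
FEATURES = {
    "home": [1.0, 0.0], "a": [0.0, 1.0], "b": [0.2, 0.8],
    "c": [0.9, 0.1], "d": [0.1, 0.9], "e": [0.8, 0.0],
}
CRITERIA = [1.0, 0.0]


def reduce_scope(graph, start, max_nodes):
    """1. Breadth-first walk from `start`, stopping after max_nodes nodes."""
    scope, queue, seen = [], deque([start]), {start}
    while queue and len(scope) < max_nodes:
        node = queue.popleft()
        scope.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return scope


def decimate(scope, features, criteria, threshold):
    """2. Keep nodes whose mean match reaches `threshold`, saving each kept
    node's per-feature match array (same shape as `criteria`)."""
    kept = {}
    for node in scope:
        match = [1 - abs(f - c) for f, c in zip(features[node], criteria)]
        if sum(match) / len(match) >= threshold:
            kept[node] = match
    return kept


def bandpass(node_data, allowed_classes):
    """3. Keep only the data classes (text, video, ...) that pass the filter."""
    return {cls: v for cls, v in node_data.items() if cls in allowed_classes}


def reduce_links(graph, scope, selected):
    """4. Collapse chains of unused nodes into single synthetic links.
    Terminal nodes keep their unused outward links for a later transform."""
    in_scope, synthetic = set(scope), {}
    for node in sorted(selected):
        targets, stack, seen = [], list(graph.get(node, [])), set()
        while stack:
            nxt = stack.pop()
            if nxt in seen or nxt not in in_scope or nxt == node:
                continue
            seen.add(nxt)
            if nxt in selected:
                targets.append(nxt)               # first selected node on this path
            else:
                stack.extend(graph.get(nxt, []))  # step over an unused node
        synthetic[node] = sorted(targets)
    saved = {n: [m for m in graph.get(n, []) if m not in selected]
             for n, t in synthetic.items() if not t}
    return synthetic, saved


scope = reduce_scope(GRAPH, "home", max_nodes=6)           # 'f' falls outside the scope
kept = decimate(scope, FEATURES, CRITERIA, threshold=0.5)  # keeps home, c, e
synthetic, saved = reduce_links(GRAPH, scope, set(kept))
print(synthetic)  # {'c': ['e'], 'e': [], 'home': ['c']}
print(saved)      # {'e': ['f']}
print(bandpass({"text": "Headline", "video": "clip.mpg"}, {"text"}))  # {'text': 'Headline'}
```

In the output, the path through home, a and c collapses into the single synthetic link home to c, the dead-end branch through b and d disappears, and the terminal node e keeps its unused link to f so that a later transformation can start there.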
Synthesis
A synthetic network, in the context of this discussion,
is one that does not exist either in nature or as a pre-existing
manmade structure. The process of network synthesis is,
considered by itself, nothing new. Multidimensional tree-like
data structures, their creation and implementation, have
been a part of computing since day one. It is the close
coupling of ImaginOn network synthesis with ImaginOn database
transformation that yields tree-like networks that are novel
and useful. In synthesis, the output database created
by the transformation process is structured as a network
in "n" dimensions according to a set of rules using
parameters that are set by the user, or the default settings.
The purpose of the structure is threefold: the structure
serves as "rails" that the playback engine can traverse,
the non-linear sequential organization preserves much of
the original connectivity information that was present in
the source database before transformation, and a time axis
is implied. It is the combination of the "rails" and the
implied time axis that provides the user with the illusion
of virtual motion in "dataspace" during playback of the
transformed database.
Networks
synthesized by ImaginOn are variable in size, shape and
complexity. Each of these variables can be set by the user.
Size refers to the total quantity of data that is contained
in the synthetic network. Shape refers to the relationship
between the depth (or height) of the tree and its width.
Complexity refers to how many dimensions, or branches, radiate
from each node in the tree. While there are no theoretical
limits to any of these variables, there are obvious practical
considerations that pertain to storage limitations, display
limitations and computational time. Similarly, there is
no requirement that structures be symmetrical, though for
ease of conceptualization, symmetrical trees are preferred.
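Size, shape and complexity map naturally onto three measurements of a tree: the number of data pages it holds, its height against its leaf count, and its branching factor. The sketch below builds a symmetric tree from those parameters; the Node class and the rule of splitting pages into contiguous runs are assumptions, since the paper does not specify ImaginOn's synthesis rules. It omits the distinction between trunk and branch nodes and requires a branching factor of at least 1.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    pages: list = field(default_factory=list)             # data pages (leaves only)
    children: list["Node"] = field(default_factory=list)  # branches radiating out


def synthesize(pages, branching, depth):
    """Build a symmetric tree `depth` levels deep in which every internal node
    has `branching` children; pages are split into contiguous runs at each level."""
    if depth == 0:
        return Node(pages=list(pages))
    step = math.ceil(len(pages) / branching)
    return Node(children=[
        synthesize(pages[i * step:(i + 1) * step], branching, depth - 1)
        for i in range(branching)
    ])


def size(n):    # total quantity of data held
    return len(n.pages) + sum(size(c) for c in n.children)


def height(n):  # depth of the tree
    return 1 + max(height(c) for c in n.children) if n.children else 0


def width(n):   # number of leaves
    return sum(width(c) for c in n.children) if n.children else 1


tree = synthesize([f"page{i}" for i in range(12)], branching=3, depth=2)
print(size(tree), height(tree), width(tree))  # 12 2 9
```

Calling synthesize with branching=2 and depth=3 on the same twelve pages would instead produce a taller, narrower tree with eight leaves.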
ImaginOn
synthetic networks are stored in computer memory as a database
of items commonly referred to as "objects". Each object
in the structure, whatever its class (for example, trunk node,
branch node or data page), contains pointers to
the actual physical location of the data it references as
well as pointers to any other object it is linked to in
the tree structure. When such a synthetic network is embodied
as a schematic drawing, the objects are graphic library
elements and the links are defined in a text file known
as a "netlist". Since the graphical elements can also be
represented as sets of x,y coordinates, the entire synthetic
network can be stored as ASCII text in human-readable files,
if so desired.
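Storing the network as human-readable ASCII text can be sketched with a simple netlist-style format. The INST/NET line syntax, the NetObject fields and the example.com addresses are all hypothetical, chosen only to show that object pointers, schematic coordinates and links survive a round trip through plain text.

```python
from dataclasses import dataclass, field


@dataclass
class NetObject:
    name: str
    kind: str                                       # "trunk", "branch" or "page"
    source_address: str                             # where the referenced data lives
    links: list[str] = field(default_factory=list)  # names of linked objects
    xy: tuple[int, int] = (0, 0)                    # position as a schematic symbol


def to_netlist(objects):
    """Serialize as ASCII text: one INST line per object, then one NET line per link."""
    lines = [f"INST {o.name} {o.kind} {o.xy[0]} {o.xy[1]} {o.source_address}"
             for o in objects]
    lines += [f"NET {o.name} {target}" for o in objects for target in o.links]
    return "\n".join(lines)


def from_netlist(text):
    """Rebuild the objects; assumes names and addresses contain no whitespace."""
    objs = {}
    for line in text.splitlines():
        tag, *rest = line.split()
        if tag == "INST":
            name, kind, x, y, addr = rest
            objs[name] = NetObject(name, kind, addr, xy=(int(x), int(y)))
        elif tag == "NET":
            objs[rest[0]].links.append(rest[1])
    return list(objs.values())


net = [
    NetObject("T1", "trunk", "http://example.com/", ["B1", "T2"], (0, 0)),
    NetObject("B1", "branch", "http://example.com/news", ["P1"], (10, 5)),
    NetObject("T2", "trunk", "http://example.com/about", [], (20, 0)),
    NetObject("P1", "page", "http://example.com/news/today", [], (20, 5)),
]
text = to_netlist(net)
print(text)
assert from_netlist(text) == net  # the human-readable text round-trips
```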
By definition,
the synthetic network is hierarchical. The top level of
the hierarchy is represented by the nodes along the trunk
of the tree. The next level down contains the first layer
of branches and so on down through layers of branches to
the lowest level which contains the leaves. Since the actual
physical address of the source data from which the transformed
data was obtained is stored along with the data for each
node in the tree, new synthetic networks can be created
which are continuations of any previous network by performing
a new transform on the source database beginning at that
point.
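Because each node keeps the physical address of its source data, continuing a network amounts to running a fresh transform from that address and grafting the result onto the node. In this sketch the SOURCE map and the transform stand-in are hypothetical; a real transform would apply the filtering and compression steps described in the Transformation section rather than copying every link.

```python
# Hypothetical source database: each physical address lists the addresses it links to.
SOURCE = {
    "site/": ["site/news", "site/sports"],
    "site/news": ["site/news/local", "site/news/world"],
    "site/sports": [],
}


def transform(start, source=SOURCE):
    """Stand-in transform: turn the links found at `start` into new nodes,
    each recording the physical address of the data it was made from."""
    return [{"source_address": addr, "children": []} for addr in source.get(start, [])]


def continue_from(node):
    """Graft a new transform, started at the node's stored address, onto the node."""
    node["children"].extend(transform(node["source_address"]))
    return node


trunk = {"source_address": "site/", "children": transform("site/")}
news = trunk["children"][0]  # node made from "site/news"
continue_from(news)
print([c["source_address"] for c in news["children"]])
# ['site/news/local', 'site/news/world']
```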
Playback
Since the present generation of database technology
was developed primarily for business and scientific uses,
the underlying concept behind data presentation in database
products is information transfer from computer to user.
Static screen displays of complex data and comprehensive
printouts of search results are the norm. Rapid access to
large quantities of search results is typically supported
by graphical lists of contents, or indexes, from which the
user can select the next page display. When the results
database is extremely large, even indexed displays may require
large numbers of keystrokes and many hours of viewing time
to zero-in on a specific item of interest.
Conventional
data presentation methods such as graphical browsers and
relational database formatted-field displays require the
user to indicate what the next display screen
will be and when it will be displayed. Typically,
the user accomplishes this by clicking on a "next" button
or a line of displayed hypertext which is a "hot-link".
With these methods there is no way to avoid this one-page-at-a-time procedure.
ImaginOn offers a completely different model of data presentation,
similar in some respects to the way consumer entertainment
products work. The concept of "playback" is utilized in
the same sense that VCRs play back videotape, or CD players
play back music.
ImaginOn's
playback technique is based on an adaptive algorithm that
determines both what will be displayed next and when it
will be displayed. This feature allows the user to be completely
passive, or as active as they want to be during the data
presentation process. When the user decides to be a completely
passive viewer, ImaginOn playback is automated according
to its internal rules. This capability is fundamental to
ImaginOn's computer software program testing application.
The analogy
between a VCR's playback of videotape and ImaginOn's playback
of an output database appropriately describes the seamless
nonstop continuous nature of the presentation, but does
not encompass the new features provided by the system. These
unique capabilities include dynamic path selection through
the database, adaptive automation, variable playback resolution,
and hierarchical playback.
Dynamic path selection
During playback, the user traverses a path through ImaginOn's
synthetic network organization of the output database. The
path may start anywhere in the network, though the default
is the first Trunk Node. The actual vector the user follows
through the network is dynamically determined, assuming
the user provides some direction. The user's selection
of the path vector is made with one key, or button, by responding
to the alternating branch choices visually presented
(or presented aurally, in the case of an audio database)
at Trunk Nodes and Branch Nodes. The next data item is then
immediately presented, without an interruption in the display
or audio output. Should the user fail to make a selection,
playback is continued along a default path, without interruption,
after a predetermined number of alternating choice presentations.
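One-button path selection with a default fallback can be sketched as follows. The presentation model is an assumption: choices are shown one per flip, cycling in order, a press is identified by the index of the flip during which it happened, and the first entry in each node's list is taken to be the default path.

```python
def choose_branch(branches, press_at_flip, max_flips, default=0):
    """One-button selection. Choices are presented one per flip, cycling through
    `branches`; a press during flip k selects the choice then showing. With no
    press within max_flips presentations (press_at_flip=None means no press),
    playback continues along the default path."""
    if press_at_flip is not None and 0 <= press_at_flip < max_flips:
        return branches[press_at_flip % len(branches)]
    return branches[default]


def play(network, start, presses, max_flips=6):
    """Follow a path through `network` (node -> [default next, alternatives...]),
    using one press (or None) per choice point."""
    path, node = [start], start
    for press in presses:
        nexts = network.get(node, [])
        if not nexts:
            break
        node = choose_branch(nexts, press, max_flips)
        path.append(node)
    return path


NETWORK = {"T1": ["T2", "B1a"], "T2": ["T3", "B2a"], "B1a": ["B1b"]}
print(play(NETWORK, "T1", [None, None]))  # ['T1', 'T2', 'T3'] (default trunk path)
print(play(NETWORK, "T1", [3, None]))     # ['T1', 'B1a', 'B1b'] (press while B1a shown)
print(play(NETWORK, "T1", [7]))           # ['T1', 'T2'] (pressed too late, default taken)
```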
Adaptive automation, autoplay and adaptive transforms
Every time the user makes a selection, thus determining
the current path vector, the user's profile is updated to
reflect the state of the match between the present node's
selection criteria and the user's profile. In this
way, the playback algorithm "learns" about the user's preferences.
If, after guiding playback for a while, the user ceases
to make selections, playback may be continued automatically
utilizing those stored preferences. In effect, the playback
algorithm acts on behalf of the user, as adapted from the
history of the user's path through the synthetic network.
Either the adapted user profile, or the initial user profile
may be selected to play back a database path automatically,
without user intervention. This mode of operation is called
"autoplay".
The ImaginOn
playback software maintains a history of all user preferences
and selections since the time of program initialization.
This user history may be used by the database transform
process to set the selection criteria and filters for creation
of a new output database to be made from any source database.
The new output database can be grafted onto the existing
synthetic network, or a new network can be created. If "learned"
criteria are used to drive the database transformation,
the transformation itself can then be considered adaptive
with respect to the user's history of preferences.
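The adaptive behavior in this subsection can be sketched with a preference vector. All of the specifics here are assumptions, not ImaginOn's algorithm: the profile is an array in the same layout as the nodes' selection criteria, each selection moves it 25% of the way toward the chosen node's criteria, autoplay takes the branch whose criteria best match the profile, and learned criteria for an adaptive transform are simply the mean of the selected nodes' criteria.

```python
def update_profile(profile, chosen, rate=0.25):
    """After each selection, move the profile a fraction `rate` toward the
    selection criteria of the node the user chose."""
    return [p + rate * (c - p) for p, c in zip(profile, chosen)]


def match(a, b):
    """1.0 for identical arrays, 0.0 for opposite ones (values assumed in [0, 1])."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def autoplay_choice(profile, branches):
    """Autoplay: act on the user's behalf by taking the best-matching branch."""
    return max(branches, key=lambda name: match(profile, branches[name]))


def learned_criteria(history):
    """Adaptive transform: derive new selection criteria from the history of
    chosen nodes, here simply their element-wise mean."""
    return [sum(col) / len(history) for col in zip(*history)]


# Features: [sports, politics]. The user starts neutral, then picks sports three times.
history = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
profile = [0.5, 0.5]
for chosen in history:
    profile = update_profile(profile, chosen)
print(profile)  # [0.7890625, 0.2109375]

branches = {"scores": [0.9, 0.1], "elections": [0.1, 0.9]}
print(autoplay_choice(profile, branches))  # scores

new_source = {"match_report": [0.8, 0.2], "editorial": [0.2, 0.8]}
criteria = learned_criteria(history)
print([n for n, f in new_source.items() if match(criteria, f) >= 0.7])  # ['match_report']
```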
Variable playback resolution
By setting the playback parameters for hold time (how long
each data page stays on screen) and flip rate (how fast the alternating branch
choices "flip"), the overall speed of path playback can be
controlled by the user. When these hold and flip times are
set very low, a fast-forward effect results. This provides
the user with a kind of skimming,
low-resolution view of the paths.
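The fast-forward effect follows from simple arithmetic on the hold and flip times. The sketch assumes, for illustration, that every unanswered choice point cycles through a fixed number of flips before playback defaults; the page counts and timings are made-up numbers.

```python
def path_duration(pages, choice_points, hold_s, flip_s, flips_per_choice):
    """Seconds needed to play a path: every data page is held for hold_s, and
    every unanswered choice point shows flips_per_choice flips of flip_s each
    before playback continues along the default path."""
    return pages * hold_s + choice_points * flips_per_choice * flip_s


normal = path_duration(pages=40, choice_points=10, hold_s=5, flip_s=1, flips_per_choice=4)
skim = path_duration(pages=40, choice_points=10, hold_s=0.5, flip_s=0.25, flips_per_choice=4)
print(normal, skim, normal / skim)  # 240 30.0 8.0
```

Cutting the hold time by a factor of 10 and the flip time by a factor of 4 shortens this hypothetical path eightfold.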
Hierarchical playback
If the user does not select autoplay mode and never
makes a branch choice, playback proceeds by default from
Trunk Node to Trunk Node. Since Trunk Nodes represent the
main "stems" of the n-dimensional tree, the user will only
see data pages along those main, top-level paths. If the
user makes only one branch choice per Trunk Node, only the
first Branch Node off each Trunk Node is displayed, thus
providing a view of the data along only the first layer
of branches in the tree. This manner of playback proceeds
successively down through the layers of the tree. Thus,
hierarchical playback is achieved, in which only paths down to a user-selectable
level of interest in the synthetic network are enabled.
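Level-limited playback can be sketched as a walk along the trunk that descends a bounded number of branch layers below each Trunk Node. For simplicity the sketch always follows the first branch at each level, whereas in the system described above the branch would be whichever one the user selects. The node helper and the example tree are hypothetical.

```python
def node(name, *branches):
    """Helper: a node with its branch nodes in presentation order."""
    return {"name": name, "branches": list(branches)}


def hierarchical_playback(trunk, level):
    """Play every Trunk Node in order; below each, follow one branch (here the
    first) down at most `level` layers before returning to the trunk."""
    shown = []
    for trunk_node in trunk:
        shown.append(trunk_node["name"])
        current, depth = trunk_node, 0
        while depth < level and current["branches"]:
            current = current["branches"][0]
            shown.append(current["name"])
            depth += 1
    return shown


TRUNK = [node("T1", node("B1", node("B1.1"))), node("T2", node("B2")), node("T3")]
print(hierarchical_playback(TRUNK, level=0))  # ['T1', 'T2', 'T3']
print(hierarchical_playback(TRUNK, level=1))  # ['T1', 'B1', 'T2', 'B2', 'T3']
print(hierarchical_playback(TRUNK, level=2))  # ['T1', 'B1', 'B1.1', 'T2', 'B2', 'T3']
```

Level 0 corresponds to a user who never makes a branch choice; each additional level admits one more layer of branches.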