Intelligent Software versus Stupid Data

Computers, specifically CPU circuits, perform data processing. Computer systems, which take into account all of the peripheral devices that connect computers to the outside world, and networks of computers, perform information process management. Data processing includes the creation, storage, retrieval, update and basic statistical analysis of digital information. Information process management extends this to include communication, using advanced input and output devices (beyond the simple teletype) and networks; entire series of applications used in digital workflows; extensible structured storage such as modern databases and directory services; data mining; and advanced storage techniques such as version-aware backup.

The weakest link in the chain of information processing is still data acquisition from non-digital sources. Most of the work of getting data into a computer system is still manual. The devices used to input data have not changed dramatically, and the work is still mostly done by human beings, who usually create the data from scratch while entering it into the computer. The vast majority of data is probably still textual, if you count data not by storage size but by discrete data elements that a person would recognize, such as facts. Counted that way, even a 100GB image taken from a super-LCD camera is just a single datum. The reason for this distinction is that a datum is whatever software recognizes as such. In the case of image processing, it is the sequence of steps used to transform an original input image into a final output image that matters most. While the various filters applied to an image are “aware” of the pixel values, those values are repeatedly destroyed and forgotten as part of the process, and are never truly considered as distinct in their own right.

Artificial intelligence has been described in many ways, but generally it seems to come down to the ability of a computer system to attach significance (if not actual meaning) to elements of data. Image recognition is one of the most significant areas of research that attempts to see more data elements within the raw content of an otherwise monolithic data element. Language processing is the other. In the former, the datum is (generally) a grid of encoded colour values, while in the latter it is a sequence of codes representing character values. The main structural differences between images and strings of text are that images are two-dimensional, and that the encodings for colour values are typically larger than those for character values. The structure implicit in digital image and string data is completely imposed from the outside, as a result of the way the human brain interprets raw images and language (in either visual or auditory form).
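To make that structural comparison concrete, here is a minimal sketch in Python. The 2x2 image, the `shape` helper and the one-byte-per-channel encoding are assumptions made up for illustration, not a description of any particular image format.

```python
# A string is a one-dimensional sequence of character codes.
text = "cat"
codes = [ord(ch) for ch in text]

# An image is a two-dimensional grid of colour values. This made-up
# 2x2 picture holds one (red, green, blue) triple per pixel.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def shape(grid):
    """Return (rows, columns) for a rectangular grid of pixels."""
    return len(grid), len(grid[0])


# Storage per element, assuming UTF-8 text and one byte per colour channel.
text_bytes = len(text.encode("utf-8"))
image_bytes = sum(len(pixel) for row in image for pixel in row)
```

Neither structure says anything about cats or colours on its own. The rows, columns and code points only become meaningful once something outside the data interprets them.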

The work of artificial intelligence, then, must involve the encoding of techniques of structural analysis which can be applied to raw data, first and foremost being the process of recognizing data based on its inherent structure. Human beings, however, do not need to distinguish data based on its structure. We distinguish it based on its input source: eyes, ears, skin, etc. The equivalent of input source tagging in computers is metadata, which includes things like file suffixes and header data. Headers usually start with a magic number of some kind, in case the file suffix has been lost or changed. Without the metadata, the data is not always recognizable from its structure alone, at least not with the software we currently have.
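As a rough illustration of how a magic number backs up a file suffix, here is a minimal sketch in Python. The `MAGIC_NUMBERS` table and the `identify` function are hypothetical names invented for this example, and the table lists only a handful of well-known signatures.

```python
# Hypothetical lookup table of a few well-known file signatures.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
}


def identify(data: bytes, suffix: str = "") -> str:
    """Guess a format from the leading bytes, falling back to the suffix."""
    for signature, name in MAGIC_NUMBERS.items():
        if data.startswith(signature):
            return name
    # No signature matched, so the suffix is the only tag left.
    return suffix.lower().lstrip(".") or "unknown"
```

Called as `identify(header_bytes, ".txt")`, the sketch trusts the header over the suffix. When neither is present it can only answer "unknown", which is the situation described above.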

To be meaningful, data must first be differentiable. I think that artificial intelligence will see more progress if it concentrates on the metadata structures which are used for this differentiation. Markup languages are supposed to be a step towards this, but in fact they seem merely to push the problem back by one more remove. At some point, the metadata needs to be fixed, much as the identification of visual data as that which originates from the optic nerve is fixed in the animal brain.

The tagging of metadata must be done in hardware. It is already done so in most cases, but the approach has been somewhat haphazard, and I don’t think that the efforts of data generation have been especially well coordinated with those of structural data analysis. Search technologies like Apple’s Spotlight represent a commercial effort in this direction. Search, however, is a relatively stupid function, and even so it works poorly, mostly because the language of search is both obscure and too generic at the same time, requiring the person making the search to know too much about the underlying metadata structures.

One thing that still seems to need doing in the area of image acquisition is the encoding of context metadata. There seems to be some understanding (I can’t remember where I read about it) that it is important to note where on Earth an image originated, using GPS. However, the metadata should also include the camera’s orientation relative to the Earth, reliable time data, and as many of the camera’s other relevant settings as possible. In this way, images could be processed together to re-generate a model of the data in the photograph, allowing analysis of three-dimensional content and lighting, and could be combined with such things as weather and other data to build up a detailed concept of the scene. Finding meaning in data requires not only structure, but interrelationships with other related data. These relationships are orthogonal, and create a virtual semantic space into which the data can be positioned, creating a higher-level structure which is much less prone to blurring.
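Here is a minimal sketch, in Python, of what such a context record might look like and how it could be used to decide which images are worth processing together. Everything in it is an assumption made for illustration: the `CaptureContext` fields, the `may_share_scene` helper and the 50 metre and 10 minute thresholds do not come from any camera standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass
class CaptureContext:
    """Hypothetical context record stored alongside an image."""
    latitude: float     # degrees, from GPS
    longitude: float    # degrees, from GPS
    heading: float      # compass bearing of the lens, degrees
    pitch: float        # tilt above the horizon, degrees
    taken_at: datetime  # capture time from a reliable clock
    settings: dict = field(default_factory=dict)  # e.g. focal length, exposure


def distance_m(a: CaptureContext, b: CaptureContext) -> float:
    """Great-circle distance between two captures (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(
        radians, (a.latitude, a.longitude, b.latitude, b.longitude)
    )
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def may_share_scene(a, b, max_metres=50.0, max_gap=timedelta(minutes=10)):
    """Crude test: were two images taken close together in place and time?"""
    close_in_space = distance_m(a, b) <= max_metres
    close_in_time = abs(a.taken_at - b.taken_at) <= max_gap
    return close_in_space and close_in_time
```

A real system would also compare heading and pitch to check that the two lenses overlap, but even this crude filter shows how position and time place images in a shared space before any pixel is examined.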

In addition, we need new devices which can collect richer information and apply structure at the time of capture, instead of only after the fact. Adding multiple perspectives in time and place, using multiple lenses (polyscopic images), multiple frames, and wider spectral sensitivity are some examples. Even more sophisticated hardware might be able to quickly analyse the motion in a scene to differentiate between objects, taking multiple exposures at different settings in order to pre-distinguish them. Special cameras could be invented to recognize character data in a scene, and microphones to recognize the difference between spoken language, animal noises and other sound types while recording, each inserting tags into the data stream.

Once raw captured data can be replaced with intelligently marked data, the work of writing software to make meaningful sense of that data will be vastly easier and more successful.
