Text Retrieval series part 1

By Jason Romney

Suggested precede: Finding information just got a whole lot easier. Jason Romney reports.

Word processing is one of the most common uses of computers. It's not long before you find that all those essays, letters and memos - not to mention a torrent of incoming electronic mail - have become unmanageable.

What is the solution? Melbourne barrister and computer programmer, Mr Chris Priestley, has developed an exciting new program called Total Research which will be of enormous assistance to people who need to find textual needles in data haystacks.

There are one or two basic concepts that should be understood about text retrieval. We will soon review a new version of Isys for Windows, a text retrieval program which analyses all the text you wish to navigate and creates indexes of the words in those files.

This method means that searches are fast. However, creating the indexes can take a long time for large amounts of text - particularly on 386-based computers.
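A minimal Python sketch of a full word index may make this trade-off clearer. It is not how Isys works internally (the article does not say). It assumes each file is already loaded as a string, and the function names `build_index` and `lookup` are hypothetical. The index records every word, the files that contain it and the word positions within each file.

```python
import re
from collections import defaultdict


def words(text):
    """Split text into lower-case words (a deliberately simple tokeniser)."""
    return re.findall(r"\w+", text.lower())


def build_index(files):
    """Build a full index: word -> {filename: [word positions]}.

    This is the slow, up-front step: every word of every file is read once.
    """
    index = defaultdict(lambda: defaultdict(list))
    for name, text in files.items():
        for position, word in enumerate(words(text)):
            index[word][name].append(position)
    return index


def lookup(index, word):
    """A search is then a dictionary lookup; no file is read again."""
    found = index.get(word.lower(), {})
    return {name: list(positions) for name, positions in found.items()}


files = {
    "a.txt": "Jason wrote to the court. The court replied to Jason.",
    "b.txt": "Minutes of the meeting.",
}
index = build_index(files)
print(lookup(index, "jason"))  # {'a.txt': [0, 9]}
```

All the slow work happens in `build_index`, which must read every word of every file before the first search; each later search is a single lookup. Storing every position is also why a full index can grow to many megabytes.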

Other programs, such as Gopher and Retrieve It, use a free-form text retrieval engine; that is, they do not build an index but search through the files themselves each time.
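A free-form search, by contrast, can be sketched as a plain scan. Again this is an illustration, not the code of Gopher or Retrieve It: `scan` is a hypothetical name, and matching is assumed to be whole-word and case-insensitive.

```python
import re


def scan(files, word):
    """Free-form search: no index, so every file is read on every search.

    Returns (filename, character offset) for each whole-word,
    case-insensitive match.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in files.items():
        for match in pattern.finditer(text):
            hits.append((name, match.start()))
    return hits


files = {
    "a.txt": "Jason wrote to the court. The court replied to Jason.",
    "b.txt": "Minutes of the meeting.",
}
print(scan(files, "court"))  # [('a.txt', 19), ('a.txt', 30)]
```

Nothing is prepared in advance, so there is no index to build or store, but every search reads every file again.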

A popular program, On Location, uses a hybrid of the two systems. It creates an index of your hard disk - but does not record where each word occurs within every file.

For example, its index records that "Jason" occurs in files a, b and c, but not where in those files the word occurs. Thus, On Location finds the relevant files quickly. It then does a normal free-form text retrieval through those files.
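The hybrid approach can be sketched as a combination of the two. In this hypothetical example (not On Location's actual design), the index stores only which files contain each word, with no positions, and the search then scans just those candidate files. The names `build_file_index` and `hybrid_search` are invented for illustration.

```python
import re
from collections import defaultdict


def build_file_index(files):
    """Hybrid index: word -> set of filenames. No positions are stored."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].add(name)
    return index


def hybrid_search(files, index, word):
    """Use the small index to pick candidate files, then scan only those files."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for name in sorted(index.get(word.lower(), set())):
        hits.extend((name, m.start()) for m in pattern.finditer(files[name]))
    return hits


files = {
    "a.txt": "Jason wrote to the court. The court replied to Jason.",
    "b.txt": "Minutes of the meeting.",
    "c.txt": "Jason attended.",
}
index = build_file_index(files)
print(hybrid_search(files, index, "jason"))  # [('a.txt', 0), ('a.txt', 47), ('c.txt', 0)]
```

Because each entry holds only file names, this index stays much smaller than a full one; the price is a scan of the candidate files at search time.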

One of On Location's advantages is that the index it creates is relatively small. By contrast, an Isys index can run to many megabytes if a large number of files are to be indexed.

To this writer's knowledge, there is at this time no fully indexed text retrieval system for the Macintosh which will function like Isys. The closest is On Location - which, initially at least, only gives a pointer TO a file, not the location of text IN a file.

With Mr Priestley's Total Research, no indexes are created. This means searches take slightly longer; however, there are numerous other advantages.

These advantages relate to the program's interface with the user. Importantly, Total Research does a search and presents you with the matches it finds, in context.

A matching word by itself is useless. Nor do you really want to know merely how many matches there are. Context, for most ordinary text retrieval needs, is all-important.

In Total Research, each occurrence of the search word you are seeking appears on its own line, vertically aligned in the centre of the screen - or on a marker which can be moved horizontally across the screen to show more context both before and after the search target.
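This aligned display is what is usually called a keyword-in-context (KWIC) listing. The sketch below is an assumption about how such lines might be produced, not Total Research's code. Each match becomes one line, padded so the keyword always starts in the same column, and the hypothetical `before` and `after` parameters stand in for moving the marker to show more or less context on either side.

```python
import re


def kwic(text, word, before=20, after=20):
    """Keyword-in-context lines with the keyword aligned in one column.

    `before` and `after` set how many characters of context appear on each
    side; changing them is a rough analogue of moving the on-screen marker.
    """
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - before):m.start()]
        right = flat[m.end():m.end() + after]
        # Pad the left context so every keyword starts at column `before`.
        lines.append(left.rjust(before) + m.group() + right)
    return lines


text = "Jason wrote to the court. The court replied to Jason."
for line in kwic(text, "court", before=12, after=12):
    print(line)
```

Because every keyword sits in the same column, the eye can run straight down the list and judge each line by its surrounding words, which is the kind of scanning Mr Priestley describes.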

Isys does much the same thing. However, Total Research requires the user to type in less complicated search strings, and when a context list appears, the maximum information is presented in the most usable manner.

Says Mr Priestley: "I think the most powerful text retrieval combination is not the computer alone but the pattern matching or relevance matching of your BRAIN.

"It is that combination that does the best text retrieval. So when Total Research presents you with a list, you can scan the lines and see which ones are relevant to you."

With Total Research, you read the context lines, select the one that seems relevant and double-click on it. That calls up the full text in a separate window, centred on that particular match. That is, you can jump straight to any match in the list rather than working through the text in a linear fashion.

The next key step is what to do with the text once you have found it. Total Research is not just text retrieval but a complete information management system.

You select a portion of the found text by dragging the mouse over it, then click on an option at the bottom of the text screen called "append". Your chosen text can be sent either to a text file or to a database, where it is automatically catalogued as a record.

With the file option, the selected text is written to the file, followed by the full path of the file the text came from. Particularly useful for lawyers is the fact that the page number of any transcript can also be stored.
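The file option can be sketched as follows. The layout of the notes file (the excerpt, then a "Source:" line with the path and an optional page number) is an assumption for illustration; the article does not describe Total Research's actual format, and `format_excerpt` and `append_excerpt` are hypothetical names.

```python
def format_excerpt(excerpt, source_path, page=None):
    """Return the selected text followed by the path of the file it came from.

    The page number is optional, for sources such as court transcripts.
    """
    source = f"Source: {source_path}"
    if page is not None:
        source += f", page {page}"
    return f"{excerpt}\n{source}\n\n"


def append_excerpt(notes_file, excerpt, source_path, page=None):
    """Append one formatted excerpt to the end of a plain-text notes file."""
    with open(notes_file, "a", encoding="utf-8") as f:
        f.write(format_excerpt(excerpt, source_path, page))
```

For example, with an invented path, `append_excerpt("notes.txt", "The witness denied it.", "/cases/smith/day3.txt", page=41)` would add the excerpt followed by the line "Source: /cases/smith/day3.txt, page 41". Keeping the formatting in a separate function leaves the file-writing step trivial.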

With the database option, FileMaker Pro on the Macintosh (which allows up to 64 pages of text in one data field) can record the full path back to the file, the page number of the transcript, the text you selected, an extra field for comments and a topic field, which is a pop-up list that the user can define.

Such a list for lawyers might include exhibits, rulings, discussions, witness names and issues in the case. Obviously this can be tailored readily to your own needs.

Using the power of the database, you can collect together notes relating to the same topic. For example, lawyers could order the computer to give them every note ever made on case exhibits, sort them by date and put them together in a report.
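The topic report can be pictured with a small in-memory stand-in for the database. This is a hypothetical sketch, not FileMaker Pro's interface: the `Note` fields mirror those the database option records (path, page number, selected text, comment, topic), plus a date field, which is an assumption needed for sorting by date. The sample notes are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Note:
    topic: str             # chosen from the user-defined pop-up list
    made_on: date          # assumed field, needed to sort by date
    text: str              # the selected text
    source_path: str       # link back to the original file
    page: Optional[int] = None
    comment: str = ""


def topic_report(notes, topic):
    """Collect every note on one topic, sort by date and format a short report."""
    selected = sorted((n for n in notes if n.topic == topic), key=lambda n: n.made_on)
    lines = [f"Report: {topic}"]
    for n in selected:
        where = n.source_path if n.page is None else f"{n.source_path}, page {n.page}"
        lines.append(f"{n.made_on.isoformat()}  {n.text}  [{where}]")
    return "\n".join(lines)


# Invented sample notes, for illustration only.
notes = [
    Note("exhibits", date(1993, 3, 2), "Exhibit 7 tendered.", "/cases/smith/day4.txt", 12),
    Note("rulings", date(1993, 3, 1), "Objection overruled.", "/cases/smith/day3.txt", 40),
    Note("exhibits", date(1993, 3, 1), "Exhibit 6 marked.", "/cases/smith/day3.txt"),
]
print(topic_report(notes, "exhibits"))
```

Each report line keeps its path and page, so every note can still be traced back to its source.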

Making a record in a database is like making a bookmark back into your original source material. All records in the database can be traced back to the original source file via hypertext links.

The database structure created in this way adds value to your work by making links between database records and the original source material AUTOMATICALLY - without any further typing on your part.

You concentrate on research. The program does the work. Sounds good, eh?

For more information about Total Research ($600), call 642 4022.