School of Computing - Research - Working Papers - 1997


Working Papers for 1997



CA-0197:
Encrypted Speech with a Familiar User-Interface,
Mike Scott and Charlie Daly.

This paper gives an overview of the development of a secure phone. This device takes its input from a normal phone, digitises and compresses the audio, encrypts it, and modulates it using a modem chip, from where it is sent out via the telephone system to (presumably) another similar device. Many of the technical problems were solved by using off-the-shelf solutions, and details of, for example, voice digitisation and compression will not be dealt with. One of the security aspects, the key exchange, requires a reasonably powerful processor. For this reason, and to ease the software development process, we used a 386EX microprocessor (PC compatible).

CA-0297:
An Overview of Information Filtering,
Alan Smeaton and Humphrey Sorensen.

Information filtering is a field of study which has undergone a major upsurge of interest in recent times as more and more information is made available online. This paper gives an overview of systems presented at a workshop on Practical Applications of Information Filtering, held in October 1996. The systems described vary in the approaches that they take, but all have one thing in common: they are operational and have practical feedback on usage.

CA-0397:
Virtual Lectures for Undergraduate Teaching: Delivery Using RealAudio and the WWW,
Alan Smeaton and F. Crimmins.

"CA309: Databases" is a module taught to about 110 full-time and 30 part-time students in computing and 15 students in the B.Sc. in Applied Mathematical Sciences. These students are in the third year of a 4-year degree program. Our educational format in this course had been typical of most University courses: a passive transfer of information via lectures, with reinforcement of material in the library or laboratory using reference materials or exercises. With increasing class sizes and the pressure to take on even more students, even to double our intake, this trend away from student-lecturer contact was becoming even more pronounced. Such educational delivery has clear weaknesses in the area of personal contacts: it distances the student and lecturer because of numbers, erodes the number and quality of one-on-one contacts, and makes most students reticent to approach lecturers. In this paper we describe our implementation of a proposal to tackle this problem and we indicate the ways in which our proposal is being evaluated.

CA-0497:
Lessons Learned from Developing and Delivering the BORGES Information Filtering Tool,
Alan Smeaton.

Although it may appear that it is only recently that we have discovered a need for automatic information filtering, the practice of automatically filtering a flow of information has been in use for over 30 years. The emphasis in systems since the earliest days has been on the speed of the filtering operation, ensuring that it is performed as quickly as possible. Because of the volume of information now being generated and the requirement to have this filtered, issues of quality, or the relevance of information filtered for a user, are now becoming increasingly important. What made BORGES different from other information filtering projects was that it was user-driven and developed within a library context as a service offered by a University library. The filtering service was offered to a population of users at two University library sites (Dublin City University and Universitat Autònoma de Barcelona), and feedback from users, as well as log analysis of system use, was used to refine the BORGES system.

CA-0597:
The TREC Experiments and Their Impact on Europe,
Alan Smeaton and Donna Harman.

Information retrieval research on text collections has concentrated on improving the effectiveness of the indexing and retrieval operations. For the most part, the evaluation of IR systems has been carried out on relatively small collections of documents, queries and relevance assessments. In 1992 the first of a series of evaluation exercises called TREC was launched in the US and these exercises have continued annually since then. What makes TREC notable in IR research is that the document collections used are huge compared to previous ones and the groups participating in this collaborative evaluation represent a who's who of IR research across the world. In this paper we present an overview of TREC, the way it operates and the specialist "tracks" it supports. We then concentrate on European involvement in TREC, examining the participants and the emergence of European TREC-like exercises.

CA-0697:
Using Data Fusion for Engineering Effective Text Retrieval Systems,
Alan Smeaton.

In information retrieval, data fusion is a technique for combining the outputs of more than one implementation of a document retrieval strategy which ranks objects for retrieval. One of the observations often made about data fusion in IR is that fusing document rankings can yield a level of effectiveness which is better than that of any of the individual retrieval strategies. In general, this holds true when the implementations are based on conceptually different approaches. In this paper we explore this hypothesis using a text retrieval application on over 200 Mbytes of Mexican Spanish newspaper texts with a fixed set of queries for which relevant documents are known. Using 9 different retrieval strategies from the TREC-4 benchmarking exercise we fuse rankings in different combinations in an attempt to see whether there is a correlation between the conceptual independence of a document ranking strategy and the observed improvement or otherwise in retrieval effectiveness from data fusion. Although the application we use for our experiments is text retrieval, the principles we explore hold true for engineering any kind of information system based on the ranked retrieval of objects.
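
As a rough illustration of what fusing rankings can look like, the Python sketch below sums min-max normalised scores across runs, a scheme commonly called CombSUM. The abstract does not say which fusion method the paper uses, so the choice of CombSUM, the function names and the scores are all assumptions made for illustration.

```python
from collections import defaultdict

def normalise(scores):
    """Min-max normalise one strategy's scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(runs):
    """Sum normalised scores across runs (CombSUM) and rank by the total."""
    totals = defaultdict(float)
    for scores in runs:
        for doc, s in normalise(scores).items():
            totals[doc] += s
    # Highest total first; ties broken by document id for determinism.
    return sorted(totals, key=lambda d: (-totals[d], d))

# Hypothetical scores from three retrieval strategies (not from the paper).
run_a = {"d1": 9.0, "d2": 7.0, "d3": 1.0}
run_b = {"d2": 0.8, "d3": 0.6, "d4": 0.2}
run_c = {"d1": 3.0, "d2": 2.0, "d4": 1.0}
fused = fuse([run_a, run_b, run_c])
print(fused)
```

Here d2 ends up ranked first because it scores well in all three runs, even though two of the runs individually prefer d1.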

CA-0797:
TREC-5 Experiments at Dublin City University: Query Space Reduction, Spanish Stemming & Character Shape Coding,
Fergus Kelledy and Alan Smeaton.

In this paper we describe work done as part of the TREC-5 benchmarking exercise by a team from Dublin City University. The 3 areas reported on are:
(1) Ad hoc conventional retrieval using a query space reduction technique to improve retrieval efficiency without loss of effectiveness;
(2) The application of a new stemming algorithm from Martin Porter to Spanish documents and queries; and
(3) The application of Spitz's character shape coding approach to document representation.

CA-0897:
Learning Theories and Computer Gaming,
Ken Maher and Micheal O'hEigeartaigh.

Learning has been described as "the process of acquiring relatively permanent change in understanding, attitude, knowledge, information, and skill through experience" (Wittrock, 1977, p. ix). Since learning can occur in many situations and contexts, this paper looks at the possible use of computer gaming to enhance the learning experience. A description of the psychological aspects of learning is given, as well as a taxonomy of computer/video games. The learning theories are applied to the types of games in order to highlight their educational potential.

CA-0997:
Implementation of Kernel Action Notation in CAML,
Sandra Ward and James Power.

A variety of specification formalisms exist for the precise description of programming language semantics; one such formalism is action semantics, which strongly emphasises modularity in the specification, coupled with a sophisticated set of "high-level" specification constructs. In this paper we describe the translation of action notation into the functional programming language CAML, a variant of ML. Because of its high level of abstraction, the notation itself demands a formal specification in terms of a more primitive notation, and it is this "kernel" notation that we implement.

CA-1097:
Developing a Proof Framework for Action Semantics in Coq,
Sandra Ward, James Power.

One of the advantages of formally specifying a programming language’s semantics is that it provides a formal foundation for proving properties of programs, and equivalences between program constructs. Action Semantics provides a framework for constructing modular specifications, as well as a foundation (using its basic action laws) for deriving these proofs. In this paper we discuss the use of the Coq proof assistant (based on higher-order constructive logic) to build such proofs.

CA-1197:
3D Perspective Texture Mapping of Polygons in Real Time,
Michael McMahon, Steven Collins.

This paper sets about optimizing the process whereby a 2D texture plane is mapped onto a 3D quadrilateral planar surface, with a view to achieving enhanced realism in real-time graphical applications such as virtual reality systems. Each of the possible texturing algorithms is developed and its performance determined by measuring the time taken to render five thousand randomly generated three-dimensional quadrilaterals. Included in these algorithms are a previously unpublished algorithm known as the "Hyperbolic Algorithm" and a newly developed algorithm, which I have called the "Adaptive Scanline Sub-division Algorithm".

CA-1297:
The Internet, an Irish Business Resource,
Stephanie Ryan, Gary Keogh.

This paper is an analysis of a survey carried out during July 1996. It examines a sample of Irish companies to ascertain the extent to which they are using the Internet as a business resource. The original proposal was to survey 50 to 100 companies; 95 replied. It looks at who provides the Internet connection for these companies and the level of satisfaction with the service that is provided. Communication via the Internet is also examined to see what tools Irish companies are using and to what extent. Other business uses of the Internet are also explored and the methods of virus protection and computer security measures are examined. Finally, a number of conclusions are drawn from the survey. Irish companies may have to become more adventurous in their use of the Internet if they want to use it to its full business potential.

CA-1397:
Architectural Issues For Integrating Legacy Systems Using CORBA 2 In The LIOM Project,
Pierce Hickey, Mark Roantree, Alan Crilly, John Murphy.

Using middleware to solve the problem of integrating existing legacy systems with new systems, while once a feasible solution, has led to a dearth of legacy middleware. In this paper we discuss solving the middleware problem with the OMG's CORBA and identify the architectural issues involved in such an endeavour. The impact of these issues for healthcare legacy systems, in particular those of the LIOM project, is demonstrated.

CA-1497:
Relational Vs. Object Oriented Database Systems,
Declan Brady, John Murphy.

It is perceived by many that there are various "problems" with the relational model which cannot be resolved except by abandoning it in favour of some (as yet undefined) "object model of data". This perception is ill-founded, and based on some poor understanding of the issues involved. The Relational Model is the only secure basis on which to create a long-term foundation for the future of data. Object oriented databases will continue to satisfy various domain-specific niches, but only relational databases will meet the broader needs of data management in all areas.

The Third Manifesto of Hugh Darwen and Chris Date defines such a foundation for data. To allow the whole problem to be discussed clearly, a consistent terminology is required. The terms item, sort, and construct are adopted for this purpose. With data in relational databases (based on the Third Manifesto) a mechanism is needed to share relational data with object oriented databases in a simple, consistent manner, and such a mechanism is suggested. Finally, some decision criteria are proposed for the question of the choice of representation of items in a Third Manifesto database.

CA-1597:
The Stochastic Heuristic for the Resource Levelling Problem,
Allen Mushi, Micheal O'hEigeartaigh.

The Resource Levelling Problem (RLP) is a combinatorial optimisation problem with applications in manufacturing. A critical path is determined for a set of jobs/items that need to be manufactured or assembled to form a final product. In capacity planning each of these jobs is associated with a predetermined load or "hassle" factor (resource level). When these jobs are scheduled together in a planning horizon, different profiles with respect to the resource levels are developed. The objective of the RLP is to minimise the maximum resource peaks by moving jobs within their slack/floating times. It is an NP-hard combinatorial optimisation problem.

In this paper we present a stochastic heuristic algorithm for the approximate solution. The algorithm operates by converting a deterministic heuristic (Tabu Search in our case) into a global stochastic algorithm. This is done by applying random perturbations to the output of the deterministic algorithm and selecting the best candidate solution. The paper is divided into two sections: the first describes the Tabu Search (TS) algorithm as applied to the RLP, and the second describes the stochastic case. We present a summary of the results and compare them with the optimal solutions (for small instances) developed in a previous technical report on Mixed Integer Programming (MIP) formulations.
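
The general idea of wrapping a deterministic heuristic in a stochastic loop can be sketched in Python as follows. This is only a sketch under stated assumptions: a simple first-improvement descent stands in for Tabu Search, the job data are invented, and the perturbation rule (re-drawing roughly half of the start times) is a guess rather than the authors' scheme.

```python
import random

# Hypothetical jobs: (duration, load, earliest_start, slack).
JOBS = [(2, 3, 0, 3), (2, 4, 0, 2), (1, 5, 1, 3), (3, 2, 0, 1)]

def peak(starts):
    """Maximum total load in any time slot for the given start times."""
    usage = {}
    for (dur, load, _, _), s in zip(JOBS, starts):
        for t in range(s, s + dur):
            usage[t] = usage.get(t, 0) + load
    return max(usage.values())

def descend(starts):
    """Deterministic stand-in for Tabu Search: first-improvement descent."""
    starts = list(starts)
    improved = True
    while improved:
        improved = False
        for i, (_, _, est, slack) in enumerate(JOBS):
            for s in range(est, est + slack + 1):
                trial = starts[:i] + [s] + starts[i + 1:]
                if peak(trial) < peak(starts):
                    starts, improved = trial, True
    return starts

def stochastic(rounds=50, seed=0):
    """Randomly perturb the deterministic output and keep the best result."""
    rng = random.Random(seed)
    best = descend([est for _, _, est, _ in JOBS])
    for _ in range(rounds):
        # Re-draw about half of the start times within each job's slack window.
        shaken = [
            rng.randint(est, est + slack) if rng.random() < 0.5 else s
            for (_, _, est, slack), s in zip(JOBS, best)
        ]
        candidate = descend(shaken)
        if peak(candidate) < peak(best):
            best = candidate
    return best

schedule = stochastic()
print(schedule, peak(schedule))
```

Because the loop only ever replaces the incumbent with a strictly better candidate, the result is never worse than the deterministic heuristic's own output.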

CA-1697:
How to Glue a Donkey to a f-Structure or Porting a Dynamic Meaning Representation Language into LFG's Linear Logic Based Glue-Language Semantics,
Josef van Genabith, Richard Crouch.

In this paper we replace the static meaning representation language in the LFG linear logic based glue language semantics approach with a dynamic meaning language. This move extends the original approach to discourse phenomena and can be combined with the approach to underspecification developed previously by the authors. On the other hand it provides linear logic based approaches to quantifier scope and underspecification for dynamic semantics. We briefly compare the results with some alternative approaches discussed in the literature and sketch a QLF and a UDRS style interpretation for a set of linear logic premises thus obtained.

CA-1797:
On Interpreting F-Structures as UDRSs,
Josef van Genabith, Richard Crouch.

We describe a method for interpreting abstract flat syntactic representations, LFG f-structures, as underspecified semantic representations, here Underspecified Discourse Representation Structures (UDRSs). The method establishes a one-to-one correspondence between subsets of the LFG and UDRS formalisms. It provides a model theoretic interpretation and an inferential component which operates directly on underspecified representations for f-structures through the translation images of f-structures as UDRSs.

CA-1897:
Design Methodology for Hybrid Systems,
Aboud N. Hussein, David Sinclair.

This paper proposes an integrated methodology for the development of hybrid systems which exploits the benefits of object-orientation and techniques for designing real-time or real-time embedded systems. The proposed methodology will assist the user in capturing the initial system requirements, identifying the component parts of the system, defining the interfaces between these components, and incrementally developing these into an object design model specification of the system.

CA-1997:
Computerised text presentation system for visually impaired people,
Les Allan, Micheal O'hEigeartaigh.

Up until now the problems of access to text via computers have been largely overcome by the visually impaired using various adaptive aids such as screen magnification software, screen readers (which send their output to speech synthesizers) and various forms of Braille output devices. This paper addresses the problems faced by visually impaired people for whom access to text is important either for work or leisure. The current technological solutions are investigated and some of the best screen magnifiers currently available are evaluated. Finally, a new approach to the problem of access to text using a modified display is presented using Windows-based software. The difficulties involved in designing such a system are dealt with, along with the issues of cost, special and general usability, and the potential benefit of the modification to the mass market.

CA-2097:
Using Character Shape Coding for Information Retrieval,
A.L. Spitz, Alan Smeaton.

In conventional information retrieval the task of finding users' search terms in a document is simple. When the document is not available in machine-readable format, optical character recognition (OCR) can usually be performed. We have developed a technique for performing information retrieval on document images in such a manner that the accuracy has great utility. The method makes generalisations about the images of characters, then performs classification of these and agglomerates the resulting character shape codes into word tokens based on character shape coding. These are sufficiently specific in their representation of the underlying words to allow reasonable performance of retrieval. Using a collection of over 250 Mbytes of document texts and queries with known relevance assessments, we present a series of experiments to determine how various parameters in the retrieval strategy affect retrieval performance, and we obtain surprisingly good results.
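
To give a feel for how character shape codes can be agglomerated into word tokens, here is a small Python sketch. The shape classes are a simplified assumption (ascenders, capitals and digits map to "A", descenders to "g", "i" and "j" to themselves, other lowercase letters to "x") and are not claimed to be the exact classes used in the paper. The sketch also works from plain text, whereas the technique described operates on character images.

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")

def shape_code(ch):
    """Map one character to a coarse shape class (assumed classes)."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"
    if ch in DESCENDERS:
        return "g"
    if ch in "ij":
        return ch
    if ch.islower():
        return "x"
    return ""  # punctuation and other symbols are ignored

def word_token(word):
    """Agglomerate character shape codes into a word token."""
    return "".join(shape_code(c) for c in word)

def search(query, text):
    """Return the words in text whose shape token matches the query's."""
    target = word_token(query)
    return [w for w in text.split() if word_token(w) == target]

print(word_token("retrieval"))            # xxAxixxxA
print(search("cat", "the cat will eat"))  # ['cat', 'eat']
```

The second call shows the trade-off: "cat" and "eat" share the token "xxA", so shape tokens are less specific than the words themselves, yet longer words collide far less often.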

CA-2197:
Information Retrieval: Still Butting Heads with Natural Language Processing?,
Alan Smeaton.

Information retrieval (IR) is about finding documents which may be of relevance to a user's query, from within a corpus or collection of texts. While apparently a simple task at first glance, IR is in fact a hard problem because of the subtleties introduced by the use of natural language in both documents and in queries. The automatic processing of natural language clearly represents significant potential for improving information retrieval tasks because of the dominance of the natural language medium on the whole IR task. Information extraction is also fundamentally about dealing with natural language albeit for a different function. It is thus of interest to the IE community to see how a related task, perhaps the most-related task, IR, has managed to use the same NLP base technology in its development so far. This is an especially valid comparison to make since IR has been the subject of research and development and has been delivering working solutions for many decades whereas IE is a more recent and emerging technology.

CA-2297:
Using NLP or NLP Resources for Information Retrieval Tasks,
Alan Smeaton.

The impact of NLP on information retrieval tasks has largely been one of promise rather than substance. While there are exceptions to this, as some of the chapters in the present volume demonstrate, for the most part NLP and information retrieval have only recently started to dovetail together. In this chapter we present a précis of our experiments in information retrieval using NLP, which have had mixed success over the last few years. We introduce the respective roles of NLP and IR and then we summarise our early experiments on using syntactic analysis to derive term dependencies and structured representations of term-term relationships. We then re-thought the role that NLP could have for IR tasks and decided to concentrate our efforts on using NLP resources rather than NLP tools in information retrieval, and our more recent experiments in this area, in which we use WordNet, are summarised. Finally we present our conclusions and the status of our work.

CA-2397:
Computer Based Learning: Issues and Perspectives,
Ken Maher, Micheal O'hEigeartaigh.

This paper reviews developments in the field of instructional design, multimedia, hypermedia and computer gaming. Ideas are drawn from each of these fields in an attempt to ascertain the effective strategies needed in the design of educational software for young learners.

CA-2497:
A Review of Synthesising Autonomous Controllers To Provide Physically Based Animation,
Alan Egan, Mike Scott.

This paper provides a review of current techniques involved in synthesising physically realistic animation through the use of a controller or control algorithm. All techniques have the common goal of decreasing user input in the animation process while augmenting computer activity in providing the final motion sequence. The general procedure common to all methods can be summarised as follows: a creature is constructed and its dynamics are formed. The control algorithm chosen by the author leads to a set of plausible solutions and through optimisation the best such solution is adopted.

CA-2597:
Evaluating the ODMG Object Model for Usage in a Multidatabase Environment,
Mark Roantree.

This paper describes a body of research on the viability of using the ODMG Data Model as a Canonical Data Model in a multidatabase environment. The ODMG Object Model offers a standard for object-oriented database users while attempting to address some issues of interoperability. The purpose of this work is to examine the merits and weaknesses of the ODMG object model in a database environment by using an evaluation framework and then comparing the ODMG object model against other object models in this respect.

CA-2697:
RoboProf: an automated learning environment,
Charlie Daly.

This paper describes RoboProf, a program that attempts to encourage and monitor a student's progress throughout a course. Progress is controlled by dynamically generated on-line tests. Its use is described in a first year Algorithms and Data Structures module and results are analysed.

CA-2797:
Comparison of Three Object-Oriented Methods: An Empirical Study,
Aonghus Ó Cléirigh, John Murphy.

The purpose of this dissertation is to establish how the analysis phase of the OMT and Fusion methods and the essential modelling stage of the Syntropy method differ. This is achieved by applying a comparison framework to each method as described in its supporting text, and by testing each method directly using a real situation. The significance of the differences among the methods is evaluated with regard to how each method fulfils general criteria for practical use. The methods are compared with respect to the following concepts: object structure, behaviour, rules, grouping and viewing, notation, deliverables, techniques for gathering information, analysis modelling activities, quality, reuse and CASE tools.

CA-2897:
Assisting the Hypertext Authoring Process with Topology Metrics and Information Retrieval,
Gavin Gollogley, Alan Smeaton.

As more and more documents become available in electronic format, the use of hypertext systems is becoming more common as a way to organise information. However, as the size of a hypertext database grows, the "lost in hyperspace" problem may limit efficient and meaningful usage of hypertext systems. In order to increase local coherence, authors should limit 'the fragmentation characteristics of hypertext'. These characteristics seem to be endemic to hyper-documents and result from the segmentation of information into disjointed nodes. Fragmentation may result in a lack of interpretative context and thus lead to the impression that the hyper-document is an aggregation of loosely linked pieces of information rather than a coherent whole. Much of the present WWW is indeed an aggregation of loosely linked or even unconnected nodes, though local "areas" can be coherent wholes. In an attempt to understand a new node, readers try to extract information and relate it in context to other nodes that they have viewed. In this thesis we describe an application which incorporates an apprentice link editor to suggest candidate information/hypertext links for the hypertext author to validate. These suggestions use node-to-node comparison and metrics to present the author with the most appropriate choices in adding a new node to the hypertext they are authoring. The theory is demonstrated via an implementation which is evaluated in a hypertext authoring task.

CA-2997:
PROMPTER - A Decision Support Tool for Software Project Management,
Rory O'Connor, Tony Moynihan, Tristan Renault, Annie Combelles.

This paper describes work undertaken within the context of the P3 (Project and Process Prompter) Project which aims to develop the Prompter tool, a "decision-support tool to assist in the planning and managing of a software project". Prompter will have the ability to help software project managers to assimilate best practice and 'know how' in the field of software project management and incorporate expert critiquing which will assist project managers in solving the complex problems associated with software project management. This 'know how' and best practice are encapsulated in distributed intelligent agents that act as advisors helping users at every stage of the software process. Each agent is an expert in a particular area of software project management and continually analyses project parameters making the user aware of any possible weakness, and helping to mitigate the associated risks. The agents will use their expertise to prompt the user to choose between suggested alternatives, as well as providing justification for their advice. The agents are contained in an agent library, which can be used in two main contexts: in a client server mode across a LAN; and in distributed mode across the Internet.

CA-3097:
Culture's Influence in Information Systems Development (ISD) Process: Tanzanian case,
Leonard Mselle, Tony Moynihan.

The purpose of this paper is to highlight the role of national culture in the Information Systems Development (ISD) process. We are particularly interested in gaining an understanding of the influence of national culture on the ISD process in Tanzania. In this paper, the relationship between organizations and national culture is discussed. The links between the ISD process and national culture are exposed. The present situation of studies on the ISD process in Tanzania is discussed. The questions to be answered and our research methodology are presented.

CA-3197:
Study of Windows Help Systems for the creation of a Generic Help QA Tool,
Andy Way, Gary Hearne, Mark Roantree.

In constructing applications to manipulate text there is often a recurring need for parsing. Whether it is updating configuration files or processing macros, some form of parsing engine is implemented to ensure that the content is properly understood. Normally parsers are specifically designed to suit the document being parsed, leaving little or no room for deviation, meaning that each new project is created from scratch. Our work involves the need to parse Help files for platforms such as DOS, Windows, Unix or OS/2. A single tool that could handle all formats is more useful than a suite of tools that each parse one format. Taking this approach, the 'generality' of the tool ensures that it can accommodate any new file format, without the need to develop a new tool.

CA-3297:
Comparison of Lahiri sampling with unrestricted random and systematic sampling,
Jane Horgan.

A method of monetary-unit selection is suggested which overcomes the main practical implementation problems associated with currently-used strategies. Empirical evidence indicates that it compares favourably with unrestricted random and systematic sampling of monetary units with respect to the coverage and magnitude of the Stringer, cell and moment bound estimates of the total. Its variability is similar to that of unrestricted random sampling but somewhat greater than that of systematic sampling when the sample size is large. The evidence with the bounds is supplemented with an analytical study of an unbiased point estimator. Advantages of the proposed scheme are discussed.

CA-3397:
A Process for Generalising the Comparison of Electronic Documents,
Michelle Timmons, Mark Roantree, Andy Way.

In this report, we examine the feasibility of designing a generic system to compare the format and structure of two documents, where the format is the physical appearance of the document (e.g. underlined text, margins) and the structure is its composition (e.g. paragraphs, chapters, headings). The process of electronic publishing is studied, and in particular the markup used. The vast differences between the types of markup that will need to be considered are described. This provides a basis for the discussion on automating the comparison of these markup schemes, and how a generic document structure can be used in this process.

CA-3497:
Assistive Technology Skills to the Supporters of Physically Disabled People,
Bob Allen, Micheal O'hEigeartaigh.

Contents: Overview of Assistive Technology (A.T.) and Computer Applications for Disabled People, Disability Problems to be Addressed, Solutions, Integrating the Technology - Why?, Training the Trainers/Supporters, Skills Required, Methods of delivery, Examples of Existing A.T. Courses, Putting it All Together - Can the Tool be the Teacher?, Linking Tools to the Internet, Multimedia training Programs, Suggestions for Further Work

CA-3597:
An Evaluation of Service Delivery Techniques used to support disabled people and their local helpers in their use of assistive technologies,
Bob Allen, Micheal O'hEigeartaigh.

Contents: Assessment and Equipment Recommendation, Integrated Technologies, I.T. - A Tool for Support, Technical Support Mechanisms - Where do they Fit?, The HORIZON - TEST Project, Appropriate Levels of Support Technologies

CA-3697:
Spatial and Temporal Data Models for Geographical Information Systems,
Patrick Browne, Micheal O'hEigeartaigh.

The main purpose of this paper is to investigate the nature of spatial and temporal data models suitable for a Temporal Geographical Information System (TGIS). Current GIS have poorly conceived spatial models which have evolved over time in an ad hoc fashion. The temporal aspect of geographical information is also poorly developed. We consider a TGIS as a permanent repository for a national mapping agency's data and a vehicle for many of its operational procedures. In the near future a national geographical information system will form part of the information infrastructure of the state. It is this potential application area which sets the context of this paper. We review the various formalisms; in particular, we concentrate on the problems of database update. We try to establish the basic form of a spatio-temporal data model suitable for the representation of map data which in turn represents geographical space.

CA-3797:
Intelligent Assistance for Software Project Management,
Rory O'Connor.

This paper examines the issues of assisting software project managers in the decision making processes involved in the planning, managing and executing of a software development project. The role of software project management tools is examined and a proposal for an intelligent assistant system for software project management is set forth. The rationale behind this proposal is outlined and the summary results of a validation exercise conducted with project management tool users are discussed.