### School of Computing - Research - Working Papers - 1996




# Working Papers for 1996




CA-0196:
Experiments on Using Semantic Distances Between Words in Image Caption Retrieval,
A.F. Smeaton and I. Quigley.

Traditional approaches to information retrieval are based upon representing a user's query as a bag of query terms and a document also as a bag of index terms and computing a degree of similarity between the two based on the overlap or number of query terms in common between them. Our long-term approach to IR applications is based upon pre-computing semantically-based word-word similarities, work which is described elsewhere, and using these are part of the document-query similarity measure. A basic premise of our word-to-word similarity measure is that the input to this computation is the correct or intended word sense but in information retrieval applications, automatic and accurate word sense disambiguation remains an unsolved problem. In this paper we describe our first successful application of these ideas to an information retrieval application, specifically the indexing and retrieval of captions describing the content of images. We have hand-captioned 2714 images and to circumvent, for the time being, the problems raised by word sense disambiguation, we manually disambiguated polysemous words in captions. We also built a collection of 60 queries and for each, determined relevance assessments. Using this environment we were able to run experiments in which we varied how the query-caption similarity measure used our pre-computed word-word semantic distances. Our experiments, reported in the paper, show significant improvement for this environment over the more traditional approaches to information retrieval.

CA-0296:
An Information Retrieval Approach to Locating Information in Large Scale Federated Database Systems,
R. Richardson and A.F. Smeaton.

In large scale federated database systems (FDBS) there is a need for users to be able to locate information of interest from among the export schemas of database systems in the federation. This must be done by having those who provide export schemas also provide descriptions of the contents of those schemas but the problem then is matching information requests against schema descriptions with a reasonably high degree of precision. In this paper we describe our approach to this problem which involves using a vocabulary from a knowledge base, sense-disambiguating polysemous words and using the knowledge base as a structure from which we estimate the semantic similarity between words. These semantic similarity values are used in computing degrees of overlap between a user's query and the schema descriptions. In the paper we describe the design for a FDBS system, FEDDICT, which achieves the goal of allowing a more flexible match between query and schema description. As with other researchers who have addressed the same problem, we have found it impossible to evaluate our approach empirically on either real or synthetic data environments but we have evaluated the effectiveness of our matching algorithm in an image caption retrieval application. The results of these evaluations show that our approach yields improvements in the caption retrieval environment whch has exactly the same characteristics to that to be expected in a large scale federated database system.

CA-0396:
Signature Matching of Object-Z Specifications,
Renaat Verbruggen.

Zaremski and Wing in their article have presented the concept of signature matching in an effort to combine the various approaches taken to comparing queries for re-using function libraries. They take the view that retrieval of functions from a library for re-use can be based on the type involved in the calling of the function. This type consists of the list of types of the function's input and output parameters. They use the functional language ML to illustrate and implement their process and they extend their approach from simple functions to matching of multiple functions contained within modules.

We briefly sketch their contribution and then present extensions in two particular directions. Firstly we introduce the extra level of complexity that would be required if the specification of the function was included in the match and secondly we identify the issues associated with matching when the modules are actually classes in an object-oriented sense.

CA-0496:
A Simulated Annealing Procedure for the Resource Levelling Problem,
Allen Mushi and Micheal O'hEigeartaigh.

This paper explores the use of the Simulated Annealing as an optimisation technique for the Resource Levelling Problem (RLP). Resource levelling is one of the resource constrained project scheduling problems. Its objective is to adjust the start dates of individual activities within a project in order to minimise the maximum resource level (say Manpower) over the duration of the project while maintaining all given precedence relations. It is an NP-Hard combinatorial optimisation problem with major applications in Industry. In this paper we describe the application of RLP to operations in manufacturing industries. The Simulated Annealing procedure is then described and implemented. We conclude that, SA is a reasonably good heuristic for the RLP provided that careful decisions are made on the value of the initial temperature and the cooling schedule.

CA-0596:
Epistemological Pitfalls in Metaphor Comprehension:A Comparison of Three Models and a New Theory of Metaphor,
Tony Veale and Mark Keane.

If metaphor is to be viewed as a fundamental cognitive agency, as recent work suggests, what ramifications does this view have for a model of semantic memory? This paper presents a computational treatment of metaphor comprehension, named Sapper, which is built upon a parallel, adaptive, and learning network model of semantic memory. Sapper is a hybrid symbolic/connectionist model which views the interpretation of novel metaphors as a process of connectionist bridge-building, a process which subsequently alters the activation dynamics between different conceptual schemata in semantic memory, thereby causing these schemata to interact in a representationally dynamic fashion.

CA-0696:
Integrating the Formal Description Technique Estelle with the ROOM real-time object-oriented modelling language,
Gary Clynch and David Sinclair.

The development of real-time systems is problematic due to the inherent complexity of such systems. This paper proposes an integrated methodology for the development of real-time systems which exploits the benefits of Object-Orientation and Formal Description Techniques in a unique and integrated manner. The methodology uses the ROOM real-time object-oriented methodology for systems analysis. ROOM is a modelling language and development methodology specifically tailored to real-time systems. A translator has been implemented which translates ROOM system designs into Estelle. (The Estelle Formal Description Technique is used for detailed design.) The translator operates according to a transformational semantics which is a set of translation rules which specify how the concepts of the ROOM language are translated into Estelle. The ROOM/Estelle methodology is illustrated in this paper with a simple case study, the Cruise Control system of a car.

CA-0796:
The Evolution of 2D Froth with a Single/Multiple Defects,
Yu Feng and Heather J. Ruskin.

Simulations of 2D dry froths provide considerable insight to the behaviour of these disordered systems. Using the direct simulation method, we consider the problem raised by Levitan (1994) of a 2-D froth with a single defect, and also of a froth with multiple defects. The scaling properties obtained for various system sizes and initial structures have been discussed. For a single defect in an ideal hexagonal network, we have found that the second moment, m2, of the distribution of the number of cell sides for the region of the defect, does not tend to a constant as claimed by Levitan. For multiple defects in a froth with ordered and disordered initial conditions respectively, we have observed that the scaling properties (relating e.g. to the topological distribution f(n) and m2), depend on the amount of ordering in the ordered froth, but are universal for the disordered froth. In particular, our findings support the view that a quasi-scaling state rather than a transient state exists during the evolution of a highly ordered froth, and this conclusion gives rise to an alternative explanation of the early results of Aboav (1980).

CA-0896:
The Localisation Industry in Ireland and the Issues it Encounters,
Michelle Timmons, Gary Hearne, Andy Way and Mark Roantree.

It is now widely accepted within the computer industry that Ireland is a world centre of excellence in software localisation, with most major software firms having a significant presence in the field in this country. Most of the larger companies have constructed their own framework manuals which provide a model for their internationalisation and testing processes, but there is no formalised process.

In this paper we provide an overview of the localisation industry in Ireland, along with documenting the processes involved. We discuss the concepts and issues that arise when globalising software products and documentation for these products. We also provide a platform for specification of localisation tools which forms the basis of our research, which is aimed at the possibility of developing generic tools in order to verify the integrity and consistency of the structure of Document and Help files once translated.

CA-0996:
Filtering NEWS and WWW Pages with the BORGES Information Filtering Tool,
Alan F. Smeaton.

The BORGES project was a research and development project part-funded by the European Commission under the LIBRARIES program which developed, implemented and evaluated an information filtering service for WWW pages and USENET news. What made BORGES different from other IF applications was that it was user-driven and developed as a service provided by a University library. Our role within the project was to enhance the basic information filtering functionality with some information retrieval techniques and in this paper we describe both our outline of what contribution we would like to have made to BORGES, and our final impact on the project. This included user-controlled search term disambiguation and user profile expansion via the addition of related search terms.

CA-1096:
A WWW Survey of Postgraduate Courses in Information Engineering,
Alan F. Smeaton and Jerzy R. Nawrocki.

This document presents the results of a WWW search for information on courses in Information Engineering at Universities, especially in Europe. It is part of a combined effort to assimilate a review of the state of information engineering tecahing in Europe; other efforts include one-on-one interviews and questionnaires. For the purposes of the search, Information Engineering is defined as covering all aspects of the document lifecycle, from creation, indexing, storage, manipulation, filtering, categorisation and retrieval, while the definition of the document encompasses all kinds of multimedia information.

CA-1196:
An Investigation into the Effects of Convergence in Genetic Algorithms,
Brian Costello and David Sinclair.

Genetic Algorithms (GA) have been widely used in Academia for over twenty years,(and for the last ten years have found their way into many Industrial applications. The beauty of GA's is that the concept is intuitively very simple to understand. However, clustering may be caused by a number of different reasons, unexpected ways, which may effect the eventual outcome of the search. It is our goal to develop a highly robust, generalised algorithm to overcome the effects of such clustering, which is applicable to a wide variety of objective functions. To this end, four different objective functions were used to test our hypothesis.

CA-1296:
Real-time Collision Detection for Computer Graphics ,
Carol O'Sullivan and David Sinclair.

The range and variety of applications which need Collision Detection is enormous. In this paper, we are interested in applications where animations must be produced in real-time, where it is not known in advance how entities in the animation will behave. Most sequential algorithms are actually Hybrid algorithms, involving two or more phases of detection at varying degrees of accuracy, usually referred to as the Broad Phase, and the Narrow Phase. We identify the property of Dimension Reduction as being intuitively appealing, and implement a Hybrid algorithm with both phases based on this concept. We based our Broad Phase on bounding boxes to identify pairs of objects close to collision, and present a new Narrow Phase algorithm which uses dimension-reduction and 2D intersection tests of convex hulls to provide almost accurate collision detection between any convex polytopes. In a sample application, the accuracy of the collision detection was proven to be high, as no gaps were evident between objects when the narrow phase was activated. Through empirical studies and observation, we identified the fact that using Dynamically-sized bounding boxes for the Broad phase gave rise to more constant frame rates than using Fixed-sized bounding cubes.

CA-1396:
Computer Generated Celtic Knotwork using L-system Graphics,
Dave Carroll and Mike Scott.

In any domain where objects are sufficiently regular there may be an opportunity to develop a grammar based model of their geometric structure. One such domain is Celtic knot work particularly knot work of the type found in Celtic manuscripts such as the Book of Kells. Modelling such knot work using a grammar based approach based on L-systems is the subject of this paper. The paper describes the addition of a curve drawing construct to L-systems to allow the encoding of curved figures. A construct is developed which enables Bezier curves to be drawn using L-systems. This construct is used to model Celtic knot patterns. The patterns are encoded in an L-system and used as meta-patterns to construct simple carpet-pages made up of multiple knot patterns. A prototype production rule system and graphics interpreter for L-system strings using turtle graphics is also described. The problem of under-crossing and over-crossing of knot cords is addressed using Z-axis perturbation. This enables the generation of self-avoiding curves in three-space to give realistic looking knot patterns in 3D.

CA-1496:
Electronic Cash Protocols,
David Gibbons and Mike Scott.

In a presentation to the Institute for International Research, Graeme Ward observed that retail banks as we know them today will largely have ceased to exist twenty years from now. Those that do survive will be doing business in a vastly different way than they are today. The key to their survival will be the ease with which their customers can access and manage their money. Retail banks are preparing themselves for this challenge. At one end of the scale, phone-banking services are being offered; at the other end, on-line 'virtual' banking services over the Internet are being explored (see Security First Network Bank's home page for example). Another approach is the provision of an electronic cash system. This paper presents the essential features of an electronic cash system and examines the protocols which can support these features.

CA-1596:
Distributed Objects: An examination of analysis, design and run-time models,
Dave Halpin and John Murphy.

In this paper, the objects models needed to design and support a distributed object architecture are examined. The two fundamental points which underpin this work are (1) object-oriented modelling provides the most natural and intuitive way of modelling the real world, but (2) coming up with a representation of a real world domain in terms of a set of objects which are flexible and extensible with regard to the variability of that domain is a very difficult task.

CA-1696:
Approaches to Semantic Heterogeneity in Federated/Interoperable Systems,
E. Cass and J. Murphy.

Federated and interoperable systems develop from component systems which may be quite disparate. This paper explores the nature of semantic heterogeneity in such systems and reviews efforts to deal with it. A context based classification helps to disentangle schematic and purely semantic differences. Integration and maintenance of federated schemas needs painstaking efforts of understanding, enriching and re-expressing source semantics though it may be partly automated. With large scale systems the rise in the number of possible conflicts however makes schema integration imposible. Contexts may help to achieve expression of database schemas at a higher (more concentrated) semantic level through which mediators can process queries.

CA-1796:
The Application of Code Based Metrics to a Proprietary Language,
John Evans and Renaat Verbruggen.

The use of code based metrics can be desirable due to their static nature. Such measures are chararacterised by a simplicity in their calculation, ease in their collection by virtue of their application to the source code alone, and tool support which may be developed without difficulty. However, the suitability of such metrics varies between programming language.
This paper presents the results obtained from the application of a number of metrics to software written in PLEX-C, a proprietary language used by Ericsson Telecom. A description of each of the metrics is offered together with the statistical technique on which the results are based.
The results found indicate that for this language the commonly used Lines of Code measure is not reliable. Further a metrics exists which appears to be more reliable and consistent than the Lines of Code measure.

CA-1896:
A Simulated Annealing and a Genetic Algorithm for Routing Traffic in Telecommunications Networks,
Paula Carroll and David Sinclair.

This paper examines the use of heuristic techniques, namely Genetic Algorithms and Simulated Annealing, to solve the routing problem in both hierarchal and non-hierarchal telecommunications networks. The routing problem is described. The design of a good telecommunications network and the routing of traffic are often treated as sub-parts of the same problem. In this paper we do not investigate design or capacity expansion issues. The heuristics search for optimal routes on a circuit switched network with a fixed topology. The results could be used to optimise use of the network until such time as the network is to be expanded or upgraded to meet changing demands. The algorithm implementations are explained and results are presented.

CA-1996:
Plausible approaches for legacy system management using reverse engineering techniques,
R.Scannell and J.P. Murphy.

A substantial proportion of the legacy systems in use today will continue to be in operation into the next century. Arguments in favour of investment where the objective is only to improve the maintainability of such existing systems can be made. The ability to reverse engineer requirements and design documentation from existing systems provides the potential to apply more powerful process oriented maintenance models to an organisations legacy systems than the traditional task oriented model. Further, there is a wealth of knowledge embedded in existing systems. Extracting this knowledge using similar reverse engineering techniques will assist in any migration from the existing systems should that be the objective. Current activity in developing and implementing such techniques are described.

CA-2096:
Direct and Underspecified Interpretations of LFG f-Structures,
Josef van Genabith and Richard Crouch.

We describe an approach to interpreting LFG f-structures [Kaplan and Bresnan:82] truth-conditionally as underspecified quasi-logical forms. F-structures are either interpreted {\em indirectly} in terms of a homomorphic embedding into Quasi Logical Form (QLF) [Alshawi and Crouch:92] representations or {\em directly} in terms of adapting QLF interpretation clauses to f-structure representations. We provide a reverse mapping from QLFs to f-structures and establish isomorphic subsets of the QLF and LFG formalism. A simple mapping which switches off QLF contextual resolution can be shown to be truth preserving with respect to an independently given semantics [Dalrymple et al:95]. We compare our proposal with approaches discussed in the literature.

CA-2196:
Josef van Genabith and Richard Crouch.

Lexical Functional Grammar (LFG) f-structures are "high-level" syntactic representations; Quasi-Logical Forms (QLFs) and Underspecified Discourse Representation Structures (UDRSs) are recent underspecified and truth-conditionally interpreted semantic representations. It turns out that there are a number of striking structural similarities between f-structures, QLFs and UDRSs, so much so that f-structures can be "read" as QLFs or UDRSs and hence be assigned either a direct or indirect underspecified truth-conditional interpretation. Indirect interpretations are defined in terms of homomorphic embeddings of f-structures into the QLF or UDRT formalism. Based on these we give a direct QLF-style semantics for f-structures and show how the UDRT inference system can be exploited by the translation images of f-structures. The translation functions can be shown to be truth-preserving with respect to an independent semantics (and hence with respect to each other...) in the sense that the set of disambiguations obtained from a translation image of an f-structure coincides with the set of disambiguations obtained from the independent semantics for that f-structure. A number of grammatical interpretation components for LFG grammars have been proposed in the literature. Our present proposal differs in several respects: unlike earlier proposals it associates f-structures with an explicit underspecified truth conditional semantics; unlike earlier proposals interpretation does not require levels of representation distinct from f-structure representations. Finally, the ease with which "syntactic" f-structure representations translate into the independently motivated "semantic" QLF and UDRT representations provides independent support for a level of representation such as LFG f-structure.

CA-2296:
The Resource Levelling Problem (RLP) - Mixed-Integer Programming Formulations,
Allen Mushi and Micheal O'hEigeartaigh.

Resource Levelling is a variation of the resource-constrained scheduling problem. Initially the critical path through a network of tasks (each with a given resource requirement) is determined with all activities tentatively scheduled at their latest start times. For this schedule, a profile of the total resource requirement over time is obtained (i.e. the sum of the resource requirements of the tasks overlapping at each particular time are calculated). The problem is then to level the resource peaks as much as possible, subject to the time constraints of the activities. It is an NP-Hard combinatorial optimisation problem with major applications in manufacturing Industry. In this paper, we present four different Mixed Iinteger Programming formulations of the RLP. We conclude that the time indexed formulations give a better lower bound on the linear programming relaxation and hence a better starting point for the Integer Progarmming algorithms.

CA-2396:
E/VPL - A System for Modelling and Enacting Software Processes,
Rory O'Connor and Tony Moynihan.

This research addresses the technical issues involved in specifying and mechanically supporting the software development process and proposes a system called Enhanced Visual Process Language (E/VPL), which is a graphically-oriented process modelling system. A prototype system has been constructed to implement E/VPL and is evaluated to assess its potential as a process modelling system.

CA-2496:
PCLinda: A tool for Parallel Programming,
Oscar Manso and David Sinclair.

This paper presents the implementation characteristics of PCLinda, a programming environment that permits the use of a Network of Personal Computers as a Parallel Computer. The system is based on the Linda coordination model and it is implemented over Windows using Windows Sockets as its protocol for communications. PCLinda provides the user with a complete framework for the development and debugging of parallel and distributed applications under this environment. This paper also presents the application of the system in the parallel implementation of a Simulated Annealing algorithm.

CA-2596:
Sampling with unequal probabilities without replacement: the Poisson strategy,
Jane Horgan.

Given a population containing N units, it is required to draw a random sample of n distinct units in such a way that the probability of the ith unit to be in the sample is proportional to its "size". While numerous procedures exist, most are too complex and impracticable when n is large. An exception, Poisson sampling, is considered; we show that a ratio estimator with Poisson sampling is more efficient than the customary estimators used in conjunction with unequal probability with-replacement and fixed-sample-size without replacement selection procedures. We also derive conditions for which it is more efficient than the Horvitz-Thompson estimator used with Poisson sampling.

CA-2696:
OMCS - A distance learning application using the Java programming language,
Dan Clancy and Tony Moynihan.

The author is a software engineer at Lucent Technologies [formerly AT&T] at the research and development labs in Columbus, Ohio, USA. The specifications for this project are partly inspired by our current product, which is the software application for the control of GSM networks, known as the OMC, or Operation and Maintenance Centre. The aim of the Simulator is to create a similar 'look and feel' to the user interface to a real OMC, for the purpose of training an operator of the OMC system. Evaluation of the Java language itself, for use in distance learning applications, is the other goal of the dissertation.

CA-2796:
Automation of Task Analysis for Human Computer Interaction,
Edmund Cheung and Renaat Verbruggen.

Numerous research efforts have been put into the area of how to design and evaluate the computer user interface but not much about keeping the consistency of the user interface. For instance, buttons overlap each other in the dialog box. From the internationalisation point of view, the main task is to get the user interface translated into different languages. The software engineers and translators spend a lot of time translating the product a nd then manually fixing the user interface so that the localisation process at the moment is very inefficient and takes too long. Moreover, the translators need to decide which user interface is the best one. The goal of this research is to design a set of rules and a list of tasks, which incorporates simple object descriptions, that can maintain the user interface integrity. The user interface based on these guidelines would automatically fix up or translate itself. The future goal would be that we will develop the Automatic User Interface Checking Environment (AUICE) to identify knowledge, logical relations, and rules used by interface designers, and to embody them in AUICE as a way of further automating the design process.

CA-2896:
A Cellular Automata Model to Simulate Traffic Flow in Urban Networks,
Abdulhakeem Hammad, Heather J. Ruskin and Micheal O'hÉigeartaigh.

In this paper we present a three state deterministic Cellular Automata model to simulate traffic in urban roads. The time-evolution of the automata follows simple rules. The state of each site at the next time step is determined from the state of the site itself and those of the two nearest neighbour sites, where space and time are discrete in this model. We perform our simulation, applying a parallel update strategy, on a number of small size lattices with open and closed boundary conditions. The model shows a change from laminar traffic in light traffic to stop-start waves in dense traffic.