Note: Whereas some full texts provided through the links on this webpage are identical to the published versions, others are prepublication versions before final typesetting, and others have postpublication corrections. Even for online versions on this site having identical text to the printed version, the numbering of pages, figures and examples may differ between versions, so please do not refer to numbering in the online versions on this webpage and do not literally quote from these texts.
Vasiļjevs, Andrejs, Pedersen, Bolette Sandford, De Smedt, Koenraad, Borin, Lars and Skadiņa, Inguna (2011). META-NORD: Baltic and Nordic Branch of the European Open Linguistic Infrastructure. In: Sjur Nørstebø Moshagen and Per Langgård (eds.), Proceedings of the NODALIDA 2011 Workshop Visibility and Availability of LT Resources, volume 13 of NEALT Proceedings Series, pages 18–22, Northern European Association for Language Technology (NEALT).
This position paper presents the META-NORD project, which develops the Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, its overall approach and its specific focus lines on wordnets, terminology resources and treebanks are described.
De Smedt, Koenraad and Rögnvaldsson, Eiríkur (2011). The META-NORD Language Reports. In: Sjur Nørstebø Moshagen and Per Langgård (eds.), Proceedings of the NODALIDA 2011 Workshop Visibility and Availability of LT Resources, volume 13 of NEALT Proceedings Series, pages 23–27, Northern European Association for Language Technology (NEALT).
As part of the META-NORD project, the state of affairs in language technology in the Nordic and Baltic countries is being described in a set of eight reports. Each language report describes the situation of a language community and the position of the language service and language technology industry for that language. This position paper presents our methodology and preliminary findings. The final reports will be published in the META-NET series of white papers for all main languages of Europe.
Rosén, Victoria and De Smedt, Koenraad (2010). Syntactic annotation of learner corpora. In: Hilde Johansen, Anne Golden, Jon Erik Hagen, and Ann-Kristin Helland (eds.) Systematisk, variert, men ikke tilfeldig. Novus forlag, pp. 120–132.
Syntactic annotation of learner corpora is useful for investigating the grammatical properties of learner language. We discuss two approaches to syntactic annotation based on different methodological choices. One approach, recently proposed in the literature, is the manual annotation of learner language with dependency relations. Another approach, which we present as an alternative, is based on automatic parsing of a ‘correct’ version with an L2 grammar.
Gillis, Steven; Daelemans, Walter; De Smedt, Koenraad (2009). Artificial Intelligence. In: Sandra, Dominiek; Östman, Jan-Ola; Verschueren, Jef (Eds.) Cognition and Pragmatics (Handbook of Pragmatics Highlights, Vol. 3, pp. 16-40). Amsterdam: John Benjamins.
Artificial intelligence (AI) is a branch of computer science in which methods and techniques are developed that permit intelligent computer systems to be built. We briefly describe the foundations of AI and review the history of language processing research in AI. Then we explain general aspects of knowledge representation and the most influential knowledge-based paradigms. Finally, we show how linguistic symbol manipulation is applied in semantics and pragmatics.
Dyvik, Helge; Meurer, Paul; Rosén, Victoria; De Smedt, Koenraad (2009). Linguistically Motivated Parallel Parsebanks. In: Passarotti, Marco; Przepiórkowski, Adam; Raynaud, Sabine; Van Eynde, Frank (Eds.)
Parallel grammars and parallel treebanks can be a useful method for studying linguistic diversity and commonality. We use this approach to study how arguments to similar predicates are realized across languages. To that end, we formulate formal principles for aligning at phrase and word levels based on translational correspondences at predicate-argument level. A first version of a new tool for creating, storing, visualizing and searching treebank alignment at different levels has been constructed.
De Smedt, Koenraad (2009). NLP for writing: What has changed? In: Domeij, Rickard; Johansson Kokkinakis, Sofie; Knutsson, Ola; Sofkova Hashemi, Sylvana (Eds.)
It might appear that few advances have been made in proofreading technology since the 1980s. On the one hand, spelling and grammar checking have become standard features in many kinds of applications that involve writing. On the other hand, a number of advanced research ideas and results from the 1980s do not seem to have been applied or further pursued in newer research. The present moment is therefore an opportunity to look back and reflect on what has been done so far and what has changed.
Rosén, Victoria; Meurer, Paul; De Smedt, Koenraad (2009). LFG Parsebanker: A Toolkit for Building and Searching a Treebank as a Parsed Corpus. In: Van Eynde, Frank; Frank, Anette; De Smedt, Koenraad; Van Noord, Gertjan (Eds.)
We present the LFG Parsebanker, a comprehensive toolkit for interactive incremental construction of a treebank as a parsed corpus. This web-based toolkit offers an environment for batch and interactive parsing, versioning, inspection of structures, discriminant-based disambiguation, and statistics. It has recently been extended with a structural search facility.
Heeringa, Wilbert; Gooskens, Charlotte; De Smedt, Koenraad (2008). What Role does Dialect Knowledge Play in the Perception of Linguistic Distances?
The present paper investigates to what extent subjects base their judgments of linguistic distances on actual dialect data presented in a listening experiment and to what extent they make use of previous knowledge of the dialects when making their judgments. The point of departure for our investigation was the set of distances between 15 Norwegian dialects as perceived by Norwegian listeners. We correlated these perceptual distances with objective phonetic distances measured on the basis of the transcriptions of the recordings used in the perception experiment. In addition, we correlated the perceptual distances with objective distances based on other datasets. On the basis of the correlation results and multiple regression analyses we conclude that the listeners did not base their judgments solely on information that they heard during the experiments but also on their general knowledge of the dialects. This conclusion is confirmed by the fact that the effect is stronger for the group of listeners who recognised the dialects than for listeners who did not recognise the dialects on the tape.
De Smedt, Koenraad (2008) Processing and distributing language data.
Rosén, Victoria; Meurer, Paul; De Smedt, Koenraad (2007). Designing and Implementing Discriminants for LFG Grammars. In: King, Tracy Holloway and Butt, Miriam (Eds.)
We extend discriminant-based disambiguation techniques to LFG grammars. We present the design and implementation of lexical, morphological, c-structure and f-structure discriminants for an LFG-based parser. Chief considerations in the computation of discriminants are capturing all distinctions between analyses and relating linguistic properties to words in the string. Our work is mostly tested on Norwegian, but our approach is independent of the language and grammar.
Rosén, Victoria; De Smedt, Koenraad (2007). Theoretically motivated Treebank coverage. In: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit (Eds.)
The question of grammar coverage in a treebank is addressed from the perspective of language description, not corpus description. We argue that a treebanking methodology based on parsing a corpus does not necessarily imply worse coverage than grammar induction based on a manually annotated corpus.
Lech, Till Christopher; De Smedt, Koenraad (2007). Ontology extraction for coreference chaining. In: Johansson, Christer (Ed.)
The KunDoc project investigates coreference chaining with ontology-based methods. In this paper, we discuss knowledge-based methods for coreference chaining and in particular the use of ontologies and their acquisition from a corpus. We present the KunDoc methodology and its implementation. We use concepts and their interrelations extracted from a corpus of Norwegian newspaper articles to build up domain-specific ontologies which contribute selectional restrictions on possible co-referents. We expect to see an improvement over methods that do not employ any semantic knowledge.
Rosén, Victoria; De Smedt, Koenraad; Meurer, Paul (2006). Towards a Toolkit Linking Treebanking to Grammar Development. In: Hajič, J.; Nivre, J. (Eds.)
We present advances in the construction of a treebanking toolkit that implements discriminants at several levels and we present improvements in its web-based interface. We will first outline our use of discriminants in the context of LFG-based parsing. Then, we will highlight some new features in our treebanking interface, including statistics. Finally, we will discuss the linking of treebanking and grammar development.
Lech, Till Christopher; De Smedt, Koenraad (2006). Dreistadt: A language enabled MOO for language learning. In:
Dreistadt is an educational MOO (Multi User Domain, Object Oriented) for language learning. It presents a virtual world in which learners of German communicate with their fellow learners, teachers and native language users in other locations via the Internet. While the original Dreistadt had an artificial command language for interaction with the system, we have provided it with natural language processing capabilities, in order to allow a more seamless linguistic interaction. For this purpose, an NLP interface for controlled German has been added. The student's natural language commands are translated to system internal instructions by a set of syntactic, semantic and pragmatic analysis tools. The system is capable of handling pronouns and other referring expressions by applying domain knowledge and includes an inferencing component based on predicate logic.
Rosén, Victoria; De Smedt, Koenraad; Dyvik, Helge; Meurer, Paul (2005). TREPIL: Developing Methods and Tools for Multilevel Treebank Construction. In: Civit, Montserrat; Kübler, Sandra; Martí, Ma. Antònia (Eds.)
Current trends in language technology require treebanks that do not stop at the level of constituent structure, but include deeper and richer levels of analysis, including appropriate meaning structures. Capturing sufficient detail at different levels of linguistic description is too complex a task to be practically achievable by manual annotation or shallow parsing; rather it requires sophisticated tools that help secure the consistency of parallel but different structures. We are constructing a multilevel treebanking tool that incorporates a deep parser and grammar for Norwegian. Thus, we are tightly linking our treebank to grammar development so as to achieve a sound embedding in grammatical theory and yield more useful results for applications.
Rosén, Victoria; Meurer, Paul; De Smedt, Koenraad (2005). Constructing a parsed corpus with a large LFG grammar. In: Butt, Miriam; King, Tracy (Eds.)
The TREPIL project (Norwegian treebank pilot project 2004-2008) is aimed at developing and testing methods for the construction of a Norwegian parsed corpus. Annotation of c-structures, f-structures and mrs-structures is based on automatic parsing with human validation and disambiguation. Parsing is done with a large LFG grammar and the XLE parser. We propose a method for efficient disambiguation based on discriminants and we have implemented a set of computational tools for this purpose.
Nivre, Joakim; De Smedt, Koenraad; Volk, Martin (2005). Treebanking in Northern Europe: A White Paper. In: Holmboe, Henrik (Ed.)
We present the case for an extensive scientific effort to build up large treebanks for the Nordic and Baltic languages, as a step towards developing advanced multilingual communication technologies for these languages in the future.
Lech, Till; De Smedt, Koenraad (2005). Enhancing Semantic Annotation through Coreference Chaining: An Ontology-based Approach. In: Handschuh, Siegfried; Declerck, Thierry; Koivunen, Marja-Riitta (Eds.)
Semantic annotation of natural language text requires a certain degree of understanding of the document in question. The resolution of unclear reference, in particular, is a major challenge when detecting relevant information units in a document. The ongoing KunDoc project examines how domain-specific ontologies can support the task of coreference chaining in order to enhance applications such as automatic annotation, information extraction or automatic summarization. In this paper, we present a robust methodology for the acquisition of semantic contexts that does not depend on thorough syntactic parsing, since the necessary tools are often unavailable for “smaller” languages. Based on a shallow corpus analysis, verb-subject relations constitute the framework for the extraction of semantic contexts. Our approach either adds the semantic contexts to concepts and instances in an existing ontology or builds up the domain knowledge necessary for coreference chaining from scratch.
De Smedt, Koenraad; Liseth, Anja; Hassel, Martin; Dalianis, Hercules (2005). How short is good? An evaluation of automatic summarization. In: Holmboe, Henrik (Ed.)
The evaluation of automatic summarization is important and challenging, since in general it is difficult to agree on an ideal summary of a text. We report on research advances in summarization evaluation obtained in the context of ScandSum, a researcher network targeted at automatic summarization for the Scandinavian languages, supported by the Nordic Council of Ministers under its Language Technology programme (2000-2004).
De Smedt, Koenraad; Andersen, Gisle (2005). Linking a Norwegian web portal for Language Technology to its Nordic partner sites. In: Holmboe, Henrik (Ed.)
In order to achieve a truly Nordic perspective, the Norwegian Documentation Centre for Language Technology has cooperated intensively with the other Nordic documentation centres in Denmark, Finland, Iceland and Sweden during its four years of activity (2001-2004). This cooperation has eventually crystallized into a system for linking together the various national web portals. Located at http://www.norskdok.uib.no/, the Norwegian portal is aimed at providing a news service as well as a comprehensive and up-to-date survey of activities in the field, language resources, networks and contact information of the participants. This information has been made searchable across all the Nordic documentation centres.
Fersøe, Hanne; Rögnvaldsson, Eiríkur; De Smedt, Koenraad (2005). NorDokNet: Network of Nordic Documentation Centres - Contacts to Future Baltic Partners. In: Holmboe, Henrik (Ed.) Nordisk
The Nordic network of national documentation centres for language technology, NorDokNet, has been operational since September 1st of 2001. The results of the network collaboration are visible on the web portals of the national documentation centres, where the organization and classification of the information follow a common framework for the data as decided in the network. Access to the portals is easy through http://nordoknet.org. The paper focuses on the visit of a delegation to the Baltic countries carried out from the 24th to the 29th of October, 2005.
Dalianis, Hercules; Hassel, Martin; De Smedt, Koenraad; Liseth, Anja; Lech, Till Christopher; Wedekind, Jürgen (2004). Porting and evaluation of automatic summarization. In: Holmboe, Henrik (Ed.)
In the context of the Nordic research network on summarization SCANDSUM, a Swedish system for automatic summarization has been ported to Danish and Norwegian, in addition to a number of other languages including even Farsi. The principles and techniques of this research are described. Furthermore, the system is being extensively evaluated. It is argued that evaluation is an integral and important part of the research effort.
De Smedt, Koenraad; Black, William J. (2002). Humanities. In: Adelsberger, H. H.; Collis, B.; Pawlowski, J. M. (Eds.)
Innovation in humanities education and research is stimulated by new technologies for the processing of language, speech, music, visual arts, and other expressions of the human mind. Three main topics will be discussed: the role of large scale resources such as text corpora and digital archives; advanced methods and tools for processing and simulation; and the use of courseware, multimedia and hypermedia in humanities teaching and learning.
De Smedt, Koenraad (2002). Some reflections on studies in humanities computing.
In any academic field, research advances tend to percolate naturally to higher education in that field. In recent years, there has been a slow but steady increase in the number of courses and degree programs in humanities computing. This article presents some reflections on the status of humanities computing in higher education, in terms of curricula, degrees, and international student and staff mobility. The most important issue is the question of what a humanities computing degree should offer, in view of the wide interdisciplinarity of the field. Different institutions have coped with this question in quite different ways. With potentially far-reaching consequences for methodology in the various relevant disciplines, humanities computing is bound to change both what and how humanities students learn. Curriculum innovation that aims to integrate computing in the humanities is a difficult process that requires reflection, cooperation, teacher training and other supporting actions.
Birkeland, Ingebjørg; De Smedt, Koenraad (2002).
This report presents a preliminary assessment of the SOCRATES/ERASMUS action "Thematic Network Projects" with special reference to Norwegian participation. Part I is a brief presentation of background information. Part II analyses some results, particularly from 16 projects in the period 1996-2000. In Part III, the impact of the action and its results is analysed and further discussed.
De Smedt, Koenraad; Rosén, Victoria (2000). Automatic proofreading for Norwegian: The challenges of lexical and grammatical variation. In: Nordgård, Torbjørn (Ed.)
In this paper we present some techniques, experiences and results from the SCARRIE project, which has aimed at developing improved proofreading tools for the Scandinavian languages. The focus is on methods which were used for spelling and grammar checking and particularly some novel analyses and treatments dealing with the extensive lexical and grammatical variation in Norwegian Bokmål. The major findings are that (1) since lexical variants in Bokmål may differ with respect to grammatical features, stylistic replacement at the word level causes a need for grammar checking, and (2) the different systems for gender agreement in Bokmål can be handled in an economical way by a single grammar and lexicon if the features in the lexicon are interpreted dynamically depending on the subnorm or style preferred by the author.
Rosén, Victoria; De Smedt, Koenraad (2000). *Er korrekturlesningsevnen di god? Resultater fra SCARRIE. In: Jansen Westvik, O.; Swan, T.; Mørck, E.; Lorentz, O. (Eds.)
SCARRIE is an EU-funded research project on automatic proofreading for the Scandinavian languages. The HIT Centre was responsible for the Norwegian part of the project, which only had funding to deal with Bokmål. In this article we describe in more detail some of the methods that were used and the results that were achieved. We conclude that the quality of the underlying lexical material is decisive for the performance of nearly all components of the system. We have invested heavily in the SCARRIE word list and believe that it is currently among the best lexical sources for Bokmål. Style value is a valuable additional resource for stylistically appropriate correction of Bokmål. Grammatical correction has been developed so that it works reasonably well in vitro, but it is very difficult to carry out reliably in authentic texts. Advanced compound analysis, on the other hand, can contribute to a dramatic improvement in word recognition and thereby to more satisfactory automatic proofreading.
De Smedt, Koenraad; Rosén, Victoria (1999). Datamaskinell skrivestøtte. Lindgren, Birgitta (Ed.)
For smaller languages, including the Scandinavian languages, a smaller selection of tools is available at present. The newspaper and publishing industry nevertheless has a clear need for various forms of automated writing support. Proofreading is always normative, whether the process is carried out manually or with the help of computers. Many types of knowledge must be entered and combined to handle the different tasks within proofreading. Within the Scarrie project, extensive work has been carried out on lexical and other variation related to subnorms in Bokmål.
De Smedt, Koenraad; Gardiner, Helen; Ore, Espen; Orlandi, Tito; Short, Harold; Souillot, Jacques; Vaughan, William (Eds.) (1999).
The book presents an analysis of the status quo with respect to computing in humanities education, points out related developments in society, presents current innovations and plans for the future, identifies best practice and makes proposals and recommendations. The different chapters in this book address these issues from different perspectives and in different disciplines: formal methods; textual scholarship; computational linguistics and language engineering; non-European languages; and history of art, architecture and design. A common conclusion is that computing is strongly affecting all the humanities disciplines and is a catalyst for educational innovation and cooperation, while at the same time presenting great challenges for higher education systems.
Orlandi, Tito; Burnard, Lou; Buzzetti, Dino; De Smedt, Koenraad; Kropac, Ingo; Souillot, Jacques; Thaller, Manfred (1999). European studies on formal methods in the humanities. In: De Smedt, Koenraad; Gardiner, Helen; Ore, Espen; Orlandi, Tito; Short, Harold; Souillot, Jacques; Vaughan, William (Eds.)
This chapter presents an investigation of some methodological questions related to teaching and learning at humanities faculties in Europe, in particular those arising from the use of digital technologies.
De Smedt, Koenraad; Black, William; Van den Bosch, Antal; Lavid López, Julia; Mc Kevitt, Paul; Way, Andy (1999). European studies on computational linguistics. In: De Smedt, Koenraad; Gardiner, Helen; Ore, Espen; Orlandi, Tito; Short, Harold; Souillot, Jacques; Vaughan, William (Eds.)
This chapter assesses the situation of CL courses and programmes in the European educational landscape, and the use of educational CL tools for teaching and learning.
De Smedt, Koenraad (1998). Advanced computing in the humanities: A network approach.
Humanities faculties are faced with a challenge to innovate both the learning content and the delivery of courses. The thematic network project on Advanced Computing in the Humanities (ACO*HUM) investigates the impact of new information and communication technologies on humanities education. ACO*HUM brings together one hundred institutions for higher education to develop a common strategy. Attention is paid to curriculum innovation and to new methods for teaching and learning. The current pilot areas are computational linguistics, historical informatics, computing in history of art, and computing for non-European languages. In our experience, the willingness to change humanities education is real but must be supported by substantial international infrastructural measures to secure cooperation and efficiency. Among the necessary measures we name the establishment of an international repository of computational resources, a brokerage for competence in teaching expertise, and technical and organizational support for transnational distributed ODL.
De Smedt, Koenraad (1998). Teaching and learning computational linguistics in an international setting. In:
Computational Linguistics has long been a forerunner in the use of humanities computing technology. However, there are many organisational problems to be addressed to maintain and improve the quality of teaching and learning of Computational Linguistics. Advanced Computing in the Humanities (ACO*HUM) is an international network which investigates the use of new technologies in Humanities teaching and learning. Among other things, it promotes international co-operation for teaching and learning Computational Linguistics.
De Smedt, Koenraad (1998). Beyond courseware as giftpaper: Computers as exploratory learning tools for the humanities. In:
It is a common misconception that the mere introduction of multimedia in the classroom, or putting lecture notes on the web, will significantly change education. Most multimedia courseware does not offer more than book-style materials enriched with some multimedia and hypermedia as giftpaper wrapped on the outside. Real pedagogical benefits are to be expected from environments which let the humanities student creatively explore and construct. Examples of advanced simulation environments which meet those needs are Tarski's World and the LFG Workbench. The introduction of creative exploratory tools in the humanities deserves a more important place in educational innovation strategies.
Rosén, Victoria; De Smedt, Koenraad (1998). SCARRIE: Automatisk korrekturlesning for skandinaviske språk. In: Faarlund, Jan Terje; Mæhlum, Brit; Nordgård, Torbjørn (Eds.)
In this article we present SCARRIE, an EU-supported research project aimed at how linguistic knowledge can improve automatic proofreading. We first show what kinds of writing errors are found in Norwegian texts. We then illustrate how linguistic knowledge can lead to substantial improvements in programs for automatic proofreading in three areas: phonological analysis, morphological analysis and syntactic analysis.
De Smedt, Koenraad; Horacek, Helmut; Zock, Michael (1996). Architectures for natural language generation: Problems and perspectives. In: Adorni, G.; Zock, M. (Eds.),
Current research in natural language generation is situated in a computational linguistics tradition that was founded several decades ago. We critically analyse some of the architectural assumptions underlying existing systems and point out some problems in the domains of text planning and lexicalization. Guided by the identification of major generation challenges viewed from the angles of knowledge-based systems and cognitive psychology, we sketch some new directions for future research.
Dijkstra, Anton; De Smedt, Koenraad (Eds.) (1996).
This book is an advanced textbook giving a multidisciplinary overview of current computational models in the domain of human language processing. The first part of the book introduces the two basic paradigms for computational modelling: the Artificial Intelligence paradigm, using symbol manipulation, and the connectionist approach, using neural networks. The second part presents chapters on various subdomains of language comprehension, ranging from speech recognition to discourse comprehension. Part three presents chapters on language production, ranging from discourse planning to articulation and handwriting. Each chapter explains and compares several representative computer models against the background of current experimental and theoretical work in psycholinguistics. The chapters can be read individually; this modular structure allows for a selection of chapters depending on course load or thematic restrictions. Computational psycholinguistics is, therefore, an invaluable course text for advanced students in computational linguistics, psycholinguistics and cognitive science.
Dijkstra, Anton; De Smedt, Koenraad (1996). Computer modelling. In: Dijkstra, Anton; De Smedt, Koenraad (Eds.) (1996).
This chapter provides an introduction to computer modeling and simulation in the field of language understanding and production.
Daelemans, Walter; De Smedt, Koenraad (1996). Computational modelling in Artificial Intelligence. In: Dijkstra, Anton; De Smedt, Koenraad (Eds.) (1996).
The aim of this chapter is to support the readers’ understanding of AI-based computational models of language by introducing the essentials of the methods that underlie those models.
Andriessen, Jerry; De Smedt, Koenraad; Zock, Michael (1996). Discourse planning: Empirical research and computer models. In: Dijkstra, Anton; De Smedt, Koenraad (Eds.) (1996).
This chapter outlines theoretical issues in the production of multisentential discourse, reviews experimental evidence and presents some computer models for this task.
De Smedt, Koenraad (1996). Computational models of incremental grammatical encoding. In: Dijkstra, Anton; De Smedt, Koenraad (Eds.) (1996).
This chapter presents a selection of linguistic and psycholinguistic phenomena, leading to the formulation of some important problems and assumptions in grammatical encoding. Then the chapter focuses on the incremental mode of sentence production. A number of computational models which account for this processing mode are discussed in detail. A comparison is made between models developed for different languages (Dutch, German or English) and different ordering phenomena.
De Smedt, Koenraad; Kempen, Gerard (1996). Discontinuous constituency in Segment Grammar. In: Bunt, Harry; Van Horck, Arthur (Eds.),
Segment Grammar (SG) is a grammar formalism which is especially suited to model the incremental generation of sentences. SG is characterized by a dual level of syntactic description: f-structures, which are unordered functional structures composed out of syntactic segments, and c-structures, which represent left-to-right order of constituents. True discontinuities in SG are viewed as differences between immediate dominance (ID) relations in c-structures and those in corresponding f-structures. Constructions which are treated in this way include clause union, right dislocation, and fronting. Separable parts of words such as verbs and compound prepositions are not viewed as true discontinuities but as lexical entries consisting of separate syntactic segments.
De Smedt, Koenraad (1995). GeKnIPT: Op kennis gebaseerde informatiepresentatie (een projectvoorstel). In: Noordman, L. G. M.; De Vroomen, W. A. M. (Eds.),
This contribution outlines the design of a new project proposal in the field of information presentation. The aim of the project is to develop a generic technology for the presentation of information, with particular attention to (1) tailoring to individual users and (2) multimodal presentation. Two practice-oriented prototypes in the field of medical informatics play an important role in the proposed project.
De Smedt, Koenraad; Hovy, Eduard; McDonald, David; Meteer, Marie (1995). The Seventh International Workshop on Natural Language Generation (workshop report).
The Seventh International Workshop on Natural Language Generation was held from 21 to 24 June 1994 in Kennebunkport, Maine. Sixty-seven people from 13 countries attended this 4-day meeting on the study of natural language generation in computational linguistics and AI. The goal of the workshop was to introduce new, cutting-edge work to the community and provide an atmosphere in which discussion and exchange would flourish.
Gillis, Steven; Daelemans, Walter; De Smedt, Koenraad (1995). Artificial Intelligence. In: Verschueren, J.; Östman, J.-O.; Blommaert, J. (Eds.),
Artificial intelligence (AI) is a branch of computer science in which methods and techniques are developed that permit intelligent computer systems to be built. The meaningful use of a natural language in order to communicate is considered to be a task requiring intelligence, even if the ability of people to speak and understand everyday language were not related to other cognitive abilities. We first briefly review the history of language processing research in AI and sketch the physical symbol system hypothesis which is the philosophical foundation for AI. Then we explain general aspects of knowledge representation and the most influential knowledge-based paradigms. Finally, we show how linguistic symbol manipulation is applied in semantics and pragmatics.
Daelemans, Walter; De Smedt, Koenraad (1994). Default
inheritance in an object-oriented representation of linguistic
categories.
We describe an object-oriented approach to the representation of linguistic knowledge. Rather than devising a dedicated grammar formalism, we explore the use of powerful but domain-independent object-oriented languages. We use default inheritance to organize regular and exceptional behavior of linguistic categories. Examples from our work in the areas of morphology, syntax and the lexicon are provided. Special attention is given to multiple inheritance, which is used for the composition of new categories out of existing ones, and to structured inheritance, which is used to predict, among other things, to which rule domain a word form belongs.
De Smedt, Koenraad (1994). Parallelism in
incremental sentence generation. In Adriaens, Geert; Hahn, Udo (Eds.),
IPF (Incremental Parallel Formulator) is a computer model in which the formulation stage in sentence generation is distributed among a number of parallel processes. Each conceptual fragment which is passed on to the Formulator gives rise to a new process, which attempts to formulate only that fragment and then exits. The task of each formulation process consists basically of instantiating one or more syntactic segments and attaching these to the present syntactic structure by means of a general unification operation. A shared memory provides the necessary (and only) interaction between parallel processes and allows the integration of segments created by different processes into one syntactic structure. The race between parallel processes in time may partly explain some variations in word order and lexical choice.
Claassen, Wim; Bos, Edwin; Huls, Carla; De Smedt,
Koenraad (1993). Commenting on action: A continuous linguistic feedback generator. In:
Gray, W. D.; Hefley, W. E.; Murray, D. (Eds.),
Action mode interfaces, in which users achieve their goals by manipulating representations, suffer from some fundamental disadvantages. In this paper, we present a working prototype of a system called Continuous Linguistic Feedback Generator (CLFG), a facility that addresses the major disadvantages. CLFG generates natural language descriptions of the actions the user is performing. These descriptions are presented in both the visual and audio channels. The knowledge sources and algorithm that enable CLFG to provide relevant and concise information are described in detail.
Zock, M., Carcagno, D., Kay, M., Namer, F., Nogier, J.
F., Nossin, M., & De Smedt, K. (1992). Automatic text generation: A tool
for the business world? In:
Text generation is a highly complex task, requiring different kinds of expertise. Despite this complexity, there are already a great number of systems in various languages, developed for various purposes. Natural language has a great industrial potential which, surprisingly, has been neglected for a long time. Even if computers are incapable of matching human performance, programs that generate natural language are invaluable tools for modeling human performance, and will eventually be applied in communicative tasks. Linguists, computer scientists and psychologists must work together to achieve advances in the field.
Daelemans, Walter; De Smedt, Koenraad; Gazdar, Gerald
(1992). Inheritance in
natural language processing.
In this introduction to the special issues, we begin by outlining a concrete example that indicates some of the motivations leading to the widespread use of inheritance networks in computational linguistics. This example allows us to illustrate some of the formal choices that have to be made by those who seek network solutions to natural language processing (NLP) problems. We provide some pointers into the extensive body of AI knowledge representation publications that have been concerned with the theory of inheritance over the last dozen years or so. We go on to identify the four rather separate traditions that have led to the current work in NLP. We then provide a fairly comprehensive literature survey of the use that computational linguists have made of inheritance networks over the last two decades, organized by reference to levels of linguistic description. In the course of this survey, we draw the reader's attention to each of the papers in these issues of Computational Linguistics and set them in the context of related work.
De Smedt, Koenraad; Kempen, Gerard (1991). Segment Grammar:
A formalism for incremental sentence generation. In: Paris, Cecile L.;
Swartout, William R.; Mann, William C. (Eds.),
Incremental sentence generation imposes special constraints on the representation of the grammar and the design of the formulator (the module which is responsible for constructing the syntactic and morphological structure). In the model of natural speech production presented here, a formalism called Segment Grammar is used for the representation of linguistic knowledge. We give a definition of this formalism and present a formulator design which relies on it. Next, we present an object-oriented implementation of Segment Grammar. Finally, we compare Segment Grammar with other formalisms.
De Smedt, Koenraad (1991). Revisions during
generation using non-destructive unification. In:
A realistic model for natural language generation must account for overt revisions of the syntactic structure (self-corrections) as well as covert revisions (backtracking on syntactic options). This paper presents the preliminaries of a hybrid architecture for grammatical encoding (the 'tactical' phase in sentence generation) which allows such revisions. This architecture combines the concept of activation with a non-destructive variant of the unification algorithm and views sentence generation as an optimization process.
Daelemans, Walter; De Smedt, Koenraad; De Graaf, Josje
(1991).
We describe an object-oriented approach to the representation of linguistic knowledge. Rather than devising a dedicated grammar formalism, we explore the use of powerful but domain-independent object-oriented languages. We use default inheritance to organize regular and exceptional behavior of linguistic categories. Examples from our work in the areas of morphology, syntax and the lexicon are provided. Special attention is given to multiple inheritance, which is used for the composition of new categories out of existing ones, and to structured inheritance, which is used to predict, among other things, to which rule domain a word form belongs.
De Smedt, Koenraad; Schotel, Henk (1991). Review of:
[Luger, George F.; Stubblefield, William F. (1989)
Review.
De Smedt, Koenraad (1990).
Spontaneous speech is characterized by the fact that the speaker often has not yet fully worked out the content of a sentence before the first words are uttered. A computer model called IPF is presented which accounts for this characteristic by operating in a parallel and incremental mode. Part One discusses psychological and linguistic aspects of IPF. The requirements for incremental generation are investigated. A grammar formalism called Segment Grammar is presented which fulfills these requirements. This unification-based grammar is organized so that for each utterance it constructs a functional structure as well as a (surface) constituent structure. Variations in word order and some discontinuities are accounted for from a perspective of incremental generation. Part Two discusses representational and computational aspects of IPF. Because the generation of natural language is a knowledge-intensive process, a representation of linguistic knowledge is presented which is hierarchically structured and captures generalizations while allowing exceptions. An object-oriented programming language, CommonORBIT, provides the necessary mechanisms for this purpose, as is illustrated with examples. Furthermore, the application of concurrent programming concepts in IPF is explained. Finally, the model is evaluated and some extensions and future research topics are proposed.
De Smedt, Koenraad (1990). Een objectgerichte
taal gebaseerd op LISP.
Objects are representations of separate entities in a computer model of reality. The knowledge about these objects may consist of data, but also of procedures that are applicable to these objects. In object-oriented languages, procedures and data are therefore encapsulated in objects. By means of inheritance, objects can share knowledge with others. In this way, new objects can be created as specializations or combinations of other objects. Inheritance also supports a style of programming by refinement and helps to avoid redundancy. The programming language CommonORBIT is an object-oriented extension of Common LISP. The features of this language are compared with those of other object-oriented languages in order to arrive at a nuanced overview of several architectures within the object-oriented paradigm.
De Smedt, Koenraad (1990). IPF: An incremental parallel
formulator. In: Dale, Robert; Mellish, Chris; Zock, Michael (Eds.),
A computer simulation model of the human speaker is presented which generates sentences in a piecemeal way. The module responsible for Grammatical Encoding (the tactical component) is discussed in detail. Generation is conceptually and lexically guided and may proceed from the bottom of the syntactic structure upwards as well as from the top downwards. The construction of syntactic structures is based on unification of so-called syntactic segments.
Kempen, Gerard; De Smedt, Koenraad (1990). Tree
Adjoining Grammar, Segment Grammar, and incremental sentence generation. In:
Segment Grammar and Tree Adjoining Grammar are similar in that they are lexically guided and fulfill some requirements of incremental generation. However, the distinction between a functional structure and a (surface) constituent structure in Segment Grammar allows more flexible processing in an incremental mode.
De Smedt, Koenraad; De Graaf, Josje (1990). Structured
inheritance in frame-based representation of linguistic categories. In:
Daelemans, Walter; Gazdar, Gerald (Eds.),
Structured inheritance is a powerful mechanism which models a slot filler after one higher in the hierarchy. Provided by several general-purpose frame-based and object-oriented representation languages, it is also very useful for linguistic representation. Examples from morphology and syntax are provided in the context of a natural language generation task.
Van der Linden, Erik; Brinkkemper, Sjaak; De Smedt,
Koenraad; Van Boven, P.; Van der Linden, M. (1990). The representation of
lexical objects. In: Magay, T.; Zigány, J. (Eds.),
Information analysis methods developed in computer science for the construction of database systems can also be applied to computational lexicography. These methods deliver an abstract and concise description of the objects involved in a lexical information system, and reveal the considerations that are used when establishing the identity of the lexical units. Two underlying principles are introduced: the abstraction principle, positing that objects that do not occur in reality may nevertheless have to be represented in the lexicon, and the generalization principle, stating that the inclusion of these objects necessitates linguistic generalizations tied to the lexicon.
De Smedt, Koenraad (1989).
Objects are representations of entities in a domain which is modeled in a computer. Each object encapsulates the knowledge relevant to one abstract concept or physical object in the real world. This knowledge may consist of data but also of procedures which are applicable to the object. Using the metaphor of a society of communicating entities, these procedures are activated by sending messages to objects. In an applicative view of object-oriented programming, procedures are called as generic functions. Object-oriented languages allow knowledge to be shared between several objects by a mechanism called inheritance. The knowledge representation language which is discussed in this report is CommonORBIT, an extension to Common LISP. It is an easy but powerful language, which offers a general-purpose object-oriented representation with similarities to semantic networks and frame-based systems.
De Smedt, Koenraad (1989). Distributed unification in
parallel incremental syntactic tree formation. In:
For incremental syntactic tree formation, a grammar formalism called Segment Grammar has been proposed. This paper shows how tree formation with such a formalism can be seen as distributed unification which allows parallelism in syntactic tree formation.
Van Berkel, Brigit; De Smedt, Koenraad (1988). Triphone analysis: A
combined method for the correction of orthographical and typographical
errors. In:
Most existing systems for the correction of word level errors are oriented toward either typographical or orthographical errors. Triphone analysis is a new correction strategy which combines phonemic transcription with trigram analysis. It corrects both kinds of errors (also in combination) and is a superior method for the correction of orthographical errors.
De Smedt, Koenraad (1988). Automatische correctie van
spelling.
This article gives a brief overview of the possibilities and limitations of automatic spelling correction in Dutch texts.
De Smedt, Koenraad (1988). Knowledge representation
techniques in artificial intelligence: An overview. In: Van der Veer, G. C.;
Mulder, G. (Eds.)
The complexity of knowledge involved in Artificial Intelligence systems justifies the distinction of a separate programming level for the representation of knowledge. An overview is given of four styles of symbolic knowledge representation currently used in AI: (1) logic, (2) production rules, (3) procedures, and (4) semantic networks and frames. Each style is characterized briefly and some advantages and disadvantages of each style are mentioned. A number of languages for knowledge representation are mentioned.
De Smedt, Koenraad; Huls, Carla; Pijls, Fieny (1988).
Taalkennis in
tekstverwerking.
The use of computers for natural language processing has in the past concentrated heavily on machine translation and question-answering systems. Yet there are many more application areas where natural language processing offers interesting possibilities. The application of linguistic knowledge in editorial tasks has so far been insufficiently appreciated and exploited. In this article we outline an author environment that offers linguistic support for writing and editing texts. We discuss, among other things, automatic correction of typing and spelling errors (including grammatical ones such as d/t errors), reliable hyphenation and lexicon lookup. We also present some derived systems, namely a school word processor and a generator of semi-standard texts.
De Smedt, Koenraad; Kempen, Gerard (1987). Incremental
sentence production, self-correction and coordination. In Kempen, Gerard (Ed.),
In the generation of spontaneous speech, several stages of processing of the conceptual, lexico-syntactic, morpho-phonological and articulatory modules are customarily distinguished. It is not necessary for these modules to operate on input structures corresponding to whole sentences. Rather, the modules can work on different parts of the final utterance simultaneously in an incremental fashion. Such a framework can accommodate several performance phenomena such as hesitations within the sentence, changes of mind, self-corrections, afterthoughts, and dead ends, i.e. the fact that people sometimes "talk themselves into a corner" and have to restart the utterance. We present a general framework for incremental generation and describe how various conceptual activities during production may affect the partial syntactic structure.
Kempen, Gerard; Anbeek, Gert; Desain, Peter; Konst,
Leo; De Smedt, Koenraad (1987). Author environments: Fifth generation text
processors. In Directorate General XIII of the Commission of the European
Communities (Ed.),
Artificial Intelligence techniques for Natural Language Processing enable the construction of knowledge-based editorial software systems which greatly facilitate the preparation, manipulation and translation of full-text documents. They can offer many new forms of support which are far beyond the reach of present-day word processors. We propose Author Environment or Author System as collective terms for such text editing tools. After a somewhat principled discussion of what we mean by representing a natural-language text in a computer, we describe the goals, design, implementation and functionality of the Author Environment which we are developing as part of the ESPRIT OS-82 project which aims at the construction of an intelligent office workstation. We focus on the linguistic modules and the user interface.
Kempen, Gerard; Anbeek, Gert; Desain, Peter; Konst,
Leo; De Smedt, Koenraad (1987). Auteursomgevingen: Tekstverwerkers van de vijfde
generatie.
Techniques developed within Artificial Intelligence (AI) for natural language processing make it possible to build 'intelligent' word processors that offer new forms of support for writing, editing and translating texts and documents. As a general name for such word processors we propose the terms author environment or author system. After a theoretical discussion of how text written in natural language is represented in a computer, we describe in turn the goals, design, implementation and functionality of the author environment that is being developed at the Psychological Laboratory of the KUN as part of an ESPRIT project aimed at building an intelligent office workstation. Attention is focused in particular on the linguistic modules and the user interface.
Van der Linden, Erik; De Smedt, Koenraad (1987).
Computerlexica voor een auteursysteem.
A computer lexicon for an author system must be able to play a supporting role in, among others, the following tasks: detecting and correcting spelling and typing errors, hyphenating words at the right margin, supplying other inflected or conjugated forms of a word, flagging incorrect sentence structure, and transforming sentences. Each of these tasks places different demands on the content and structure of the dictionary. A number of lexica designed for these purposes in the Language Technology project in Nijmegen are discussed, and a first step is made toward a design for a lexical database.
De Smedt, Koenraad (1987). Object-oriented programming
in Flavors and CommonORBIT. In R. Hawley (Ed.),
When writing a program, it is often useful to have a computational representation of the objects in the problem domain. Object-oriented programming environments strongly support such representations by organizing procedures as well as data in terms of objects. Two such programming environments are discussed: FLAVORS and CommonORBIT. A comparison is made with respect to underlying metaphors, the behavior of objects, specialization hierarchies, behavior sharing mechanisms, and integration in LISP. Attention is given to aspects of system design which affect programming style, computing efficiency, and modularity.
De Smedt, Koenraad; Geurts, Bart; Desain, Peter
(1987). Waiting for the gift of sound and vision. In:
It is generally acknowledged that both the linguistic and the graphical modes of interaction possess specific advantages and disadvantages for man-machine interaction. We do not argue that a linguistic mode or a graphical mode is better, but merely that the two are complementary. We envisage a multi-modal system which is primarily based on direct manipulation on a graphical screen but which is supplemented by natural language communication in specific situations.
De Smedt, Koenraad (1986).
In software for natural language processing, as currently being developed in the Language Technology project in Nijmegen, the lexicon plays an important role. This paper first lists the goals of the Language Technology project. Then the possibilities of a printed dictionary are contrasted with those of a computer dictionary. Next, an overview is given of how the lexica currently used within the project are structured. Finally, a view is offered on the work still to be done to construct computer lexica for language technology applications.
De Smedt, Koenraad (1985). Some aspects of Dutch
morphology and syntax in the context of the Language Technology project at the
University of Nijmegen.
An overview is given of the various modules of the Language Technology Project at the Psychology Laboratory of the University of Nijmegen, which aims at the development of dialogue systems as well as author systems. The article focuses on the role of the various knowledge sources which are involved in the tasks of generation and analysis on the word and sentence levels.
De Smedt, Koenraad (1984). Using object-oriented
knowledge-representation techniques in morphology and syntax programming. In
O'Shea, Tim (Ed.),
The class/subclass and rule/exception relations in natural language grammars can be captured elegantly by the inheritance mechanism in an object-oriented programming language. Some examples are given of how morphological and syntactic relations can be captured using role relationships between objects.
Martin, Willy; Tops, Guy A. J. (Eds.). (1984).
Lexicographic contributions were made to this comprehensive English-Dutch dictionary.
De Smedt, Koenraad (1984). Kennisrepresentatie. In
Kempen, Gerard; Sprangers, Chris (Eds.),
Five important symbolic formalisms for knowledge representation in Artificial Intelligence are discussed: (1) predicate logic, (2) procedural representations, (3) semantic networks, (4) production systems and (5) frames.
De Smedt, Koenraad (1984).
ORBIT is an applicative object-oriented programming language. It is implemented as an extension of FRANZ LISP and allows free intermixing of object-oriented and non-object-oriented code. ORBIT is a reasonably efficient tool for writing and running programs that operate on complex and dynamic data. Its features include multiple and selective inheritance, the ability to automatically store inherited or computed information, and reverse function application. The ORBIT programming environment provides special editing and debugging tools.
De Smedt, Koenraad (1984). Implementing an IPG
generator in an applicative object-oriented programming language. In:
Language production can be viewed as the creation of objects and the activation of object-oriented (generic) functions in an applicative object-oriented programming language. The resulting linguistic structures are networks of objects with relations between them. In this way, constituents can be viewed as objects and syntactic functions as object-oriented (generic) functions. The paper discusses how this approach is compatible with the basic tenets of Incremental Procedural Grammar (IPG).
Kempen, Gerard; Konst, Leo; De Smedt, Koenraad (1984).
Taaltechnologie voor het Nederlands: Vorderingen bij de bouw van een
nederlandstalig dialoog- en auteursysteem.
This article gives an overview of the Language Technology project that has been carried out at the Katholieke Universiteit Nijmegen since the end of 1982. After sketching the main goals, we describe the state of affairs in mid-1984. We set out the principles underlying the Dutch-language linguistic modules under development: word and sentence parsers, word and sentence generators, a linguistic database, and an object-oriented knowledge representation system. The intention is to integrate these modules into a dialogue system for natural-language communication with databases and expert systems, and into an author system for text processing based on built-in linguistic knowledge.
Steels, Luc; De Smedt, Koenraad (1983). Some examples of
frame-based syntactic processing. In Daems, Frans; Goossens, Louis (Eds.),
Language processing is viewed as a problem solving process with two components: a problem solving mechanism, and a grammar as the body of knowledge needed to drive it. The grammar consists of associations of descriptions grouped in frames. Frames are organized in tangled hierarchies based on generalization and refinement relationships. Linguistic structures are collections of descriptions which are generalized and/or refined until a particular goal is reached. A small but illustrative example of this approach is presented.
De Smedt, Koenraad (1983). Implementing an IPG
generator in an applicative object-oriented programming language. In:
Language production can be viewed as the creation of objects and the activation of object-oriented (generic) functions in an applicative object-oriented programming language. The resulting linguistic structures are networks of objects with relations between them. In this way, constituents can be viewed as objects and syntactic functions as object-oriented (generic) functions. The paper discusses how this approach is compatible with the basic tenets of Incremental Procedural Grammar (IPG).
De Smedt, Koenraad (1980).
Frame-based representation languages can benefit from the correspondence between slots in a frame and case relations in natural language. This view facilitates the incorporation of natural-language-like features such as verbs and adjectives in a frame description.