Automatic Analysing and Summarizing: 2007

DUBLIN CORE ELEMENTS

D.C Title : Human Language Technologies for Europe

D.C Creator : Accipio consulting

D.C Subject : European languages, cross-lingual, technologies

D.C Description : “For our multilingual Europe, cross-lingual communication and information exchange is of fundamental importance. Twenty official European languages, i.e. 190 language pairs or 380 translation directions, put cost and effort on every cross-lingual activity, in government, in business, and in our community. While this effort is comparatively small for some sorts of transaction and communication, it is large enough to prevent others from ever taking place. In the future this situation will be changed dramatically by the availability of translation capability provided by automatic systems – less perfect than professional human translators, but cheaper, faster, available on the spot, and good enough for many purposes. While efficiency gains in traditional human translation are also to be expected, the major uptake of these technologies will be in automated cross-lingual applications. Spoken language translation and machine translation will start in niche markets but rapidly expand in scope, largely independent of the currently existing translation services business. As enabling technologies, they will stimulate Europe’s commerce and economy. For Europe it is a strategic necessity to have human language technologies available that facilitate cross-lingual communication and information exchange to the greatest extent possible.
This report begins by illustrating the significance of human language technologies, in particular for Europe, and describing the present state of affairs. It examines the European perspective in a global context with specific reference to the United States, India and East Asia. Current state of the art in research and in business is explored and expectations for future market developments outlined. Interviews with decision makers and specialists from research and business broaden the perspective and provide insight into the topic.”

D.C Publisher : Technology and Corpora for Speech to Speech Translation (TC Star)

D.C Contributor

D.C Date : 2006/05

D.C Type : report

D.C Format : pdf

D.C Identifier : http://www.tc-star.org/pubblicazioni/D17_HLT_ENG.pdf

D.C Source : http://www.tc-star.org/

D.C Language : En

D.C Relation

D.C Coverage : Europe

D.C Rights

Cross lingual information extraction and automated text summarization

DUBLIN CORE ELEMENTS

D.C Title: Cross lingual information extraction and automated text summarization

D.C Creator: Carnegie Mellon, School of computer science

D.C Subject: Information Extraction, Text Summarization, extracting

D.C Description: Information Extraction (IE) and Text Summarization are two methods of extracting relevant portions of the input text. IE produces templates, whose slots are filled with the important information, while Summarization produces one of various types of summary. Over the past 15 years, IE systems have come a long way, with commercial applications being around the corner. Summarization, in contrast, is a much younger enterprise. At present, it borrows techniques from IR and IE, but still requires a considerable amount of research before its unique aspects will be clearly understood.

D.C Publisher: Eduard Hovy

D.C Contributor: Ralph Grishman et alii

D.C Date: 1999/04

D.C Type: report

D.C Format: html

D.C Identifier : http://www.cs.cmu.edu/~ref/mlim/chapter3.html

D.C Source : http://www.cs.cmu.edu/~ref/mlim/

D.C Language: en

D.C Relation:
Aone, C., M.E. Okurowski, J. Gorlinsky, B. Larsen. 1997. A Scalable Summarization System using Robust NLP. Proceedings of the Workshop on Intelligent Scalable Text Summarization, 66—73. ACL/EACL Conference, Madrid, Spain.

DeJong, G.J. 1979. FRUMP: Fast Reading and Understanding Program. Ph.D. dissertation, Yale University.

Firmin Hand, T. and B. Sundheim. 1998. TIPSTER-SUMMAC Summarization Evaluation. Proceedings of the TIPSTER Text Phase III Workshop. Washington.

Grishman, R. and B. Sundheim (eds). 1996. Message Understanding Conference 6 (MUC-6): A Brief History. Proceedings of the COLING-96 Conference. Copenhagen, Denmark (466—471).

Hovy, E.H. and C-Y. Lin. 1998. Automating Text Summarization in SUMMARIST. In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.

Hovy, E.H. and C-Y. Lin. 1999. Automated Multilingual Text Summarization and its Evaluation. Submitted.

Jing, H., R. Barzilay, K. McKeown, and M. Elhadad. 1998. Summarization Evaluation Methods: Experiments and Results. In E.H. Hovy and D. Radev (eds), Proceedings of the AAAI Spring Symposium on Intelligent Text Summarization (60—68).

Jones, K.S. and J.R.Galliers. 1996. Evaluating Natural Language Processing Systems: An Analysis and Review. New York: Springer.

Knight, K. and J. Graehl. 1997. Machine Transliteration. Proceedings of the 35th ACL-97 Conference. Madrid, Spain, (128—135).

Lin, C-Y. 1999. Training a Selection Function for Extraction in SUMMARIST. Submitted.

Marcu, D. 1997. The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts. Ph.D. dissertation, University of Toronto.

Marcu, D. 1999. The Automatic Construction of Large-scale Corpora for Summarization Research. Forthcoming.

Jacobs, P.S. and L.F. Rau. 1990. SCISOR: Extracting Information from On-Line News. Communications of the ACM 33(11): 88—97.

Kupiec, J., J. Pedersen, and F. Chen. 1995. A Trainable Document Summarizer. In Proceedings of the 18th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), 68—73. Seattle, WA.

Mani, I. et al. 1998. The TIPSTER Text Summarization Evaluation: Initial Report.

Miike, S., E. Itoh, K. Ono, and K. Sumita. 1994. A Full-Text Retrieval System with Dynamic Abstract Generation Function. Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR-94), 152—161.

Radev, D. 1998. Generating Natural Language Summaries from Multiple On-Line Sources: Language Reuse and Regeneration. Ph.D. dissertation, Columbia University.

Reimer, U. and U. Hahn. 1998. A Formal Model of Text summarization Based on Condensation Operators of a Terminological Logic. In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.

Sager, N. 1970. The Sublanguage Method in String Grammars. In R.W. Ewton, Jr. and J. Ornstein (eds.), Studies in Language and Linguistics (89—98).

Sparck Jones, K. 1998. Introduction to Text Summarisation. In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.

Strzalkowski, T. et al., 1998. ? In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.

D.C Coverage

D.C Rights:Carnegie Mellon, School of computer science

Lexicon

ODLIS: Online Dictionary for Library and Information Science http://lu.com/odlis/
A glossary of archival and records http://www.archivists.org/glossary/index.asp
The Information Professional's Glossary http://www.sir.arizona.edu/resources/glossary.html

Assignment indexing
(ODLIS) A method of indexing in which a human indexer selects one or more subject headings or descriptors from a list of controlled vocabulary to represent the subject(s) of a work. The indexing terms selected to represent the content need not appear in the title or text of the document indexed. Synonymous with assigned indexing.
=Indexation de nomination

(A glossary of archival and records) The process of creating an ordered list of headings, using terms that may not be found in the text, with pointers to relevant portions in the material. Assignment indexing usually draws headings from a controlled vocabulary. Assignment indexing is distinguished from extraction indexing, typically done by a computer, which relies on the terms found in the document. At one point, assignment indexing was a manual process, requiring human judgment to link the headings to the concepts in the text. The process of assigning terms from a controlled vocabulary has been automated, with mixed success, by building rules of analysis that can assign headings on the basis of other, related terms in the text. =Indexation de nomination

Automation
(The Information Professional's Glossary) Changing from manual, paperbased methods of recording, organizing, and retrieving information to computerized systems. Circulation control and cataloging are among the most widely automated library functions.
=Automatisation

Automatic indexing
(ODLIS) a method of indexing in which an algorithm is applied by a computer to the title and/or the text of a work to identify and extract words and phrases representing subjects, for use as heading under which entries are made in the index.
=Indexation automatique

(A glossary of archival and records) The use of computers to enable concepts to be located within the material indexed. Early automatic indexing was largely limited to extraction indexing, relying on specific terms in the text to represent concepts. With increasing sophistication, computers were able to perform assignment indexing, allowing users to search using concepts in a controlled vocabulary using words that may not appear in the text. Both forms of automatic indexing created ordered lists of concepts that could be browsed, with pointers to the place where those concepts would appear in the text. Full-text search engines, such as those that index the web, create links between terms or phrases and the documents but generally do not produce a browsable list of headings.
also computer-based indexing, machine indexing

Computer-aided retrieval
(A glossary of archival and records) The use of computers to aid access to information on physical media, especially microfilm. Computer-aided retrieval typically includes an index and sometimes a brief description of the material that can be searched. When relevant material is identified, the system may automatically load microfilm or offline-storage media containing the materials, then locate it within its container and display it.
=Recherche assistée par ordinateur

Full-text search :
(A glossary of archival and records) (Computing) • A technique to provide access to electronic materials by indexing every word in every item.
Full-text searching is distinguished from manual indexing, which searches headings assigned to each document. Early full-text searching was limited, especially in its ability to distinguish different senses of a word; a search for glass might find documents relating to Philip Glass, window glass, and glass ice. More sophisticated search engines could use several terms, with Boolean logic and adjacency rules, to help narrow false hits. Modern search engines seen on the web use a variety of similar techniques to rank documents in terms of likely relevancy.
= Recherche en texte intégral

Autres sources

Library Terminology: A Guide for International Students
http://www.libraries.rutgers.edu/rul/lib_servs/intl_students_terms.shtml
http://palimpsest.stanford.edu/lex/
Glossary of Library Terms
Glossary of Academic Information Technology Terms
Library Terminology: A Guide for International Students

Document analysis : a way to integrate existing paper information in architectural databases

DUBLIN CORE ELEMENTS

D.C Title: Document analysis : a way to integrate existing paper information in architectural databases

D.C Creator: Tombre K. ; Paul J.C.

D.C Subject: Automatic analysis ; Printed document ; Formatting ; Automatic recognition ; Industrial drawing ; Graphic document ; Case study ; General study ; Document processing ; Documentation ; Documentation data processing

D.C Description: In any domain, the use of information systems leads to the problem of converting the existing archives of paper documents into a format suitable for the computerized system. In this area, most of the attention has probably been given to strutured document analysis, i.e. the automated analysis of business document such as letters, forms, documentation, manuals etc., including the well-known area of character recognition. But document analysis is also a powerful tool in technical domains such as architecture, where large quantitities of drawings of various kinds are available on paper. In this paper we shortly present the state of the art in technical drawing analysis and we investigate the suitably of document analysis to the conversion from paper to architectural databases

D.C Publisher: CNRS

D.C Contributor

D.C Date: 1994

D.C Type: Report

D.C Format: book

D.C Identifier : INIST-CNRS, Cote INIST : RP 13239

D.C Source

D.C Language: en

D.C Relation

D.C Coverage

D.C Rights: Copyright 2006 INIST-CNRS

Notes on Automatic Indexing

By Seth A. Maislin

"Automated indexing software" is, according to the common definition, software that analyzes text and produces an index without human involvement. I'm a firm believer that the technology doesn't exist, and that a human being is required to write an index. Thus I don't use the software, and I also don't recommend it.

There are those who advocate it, arguing that it's "not as bad as an indexer would have you think." These people are often coming from the standpoint that automatic software is faster and cheaper, and they're right. Thus the issue surrounds quality.

I believe that good automatic indexes will exist once there's good artificial intelligence, something that presently doesn't exist. In very limited circumstances, however, it does; a machine can easily cull capitalized words from a textbook to create an approximation of an index of names -- although, again, the machine isn't going to differentiate between names like "David Kelley" and places like "San Francisco," since they are both of the same format and used the same way. It also won't know that "Bill Clinton" is also "William Jefferson Clinton." And certainly it can't tell when the name is being mentioned in an unuseful and trivial way, as are the names in this paragraph! So imagine the problems trying to get a machine to parse full sentences of ideas and recognizing the core ideas, the important terms, and the relationships between related concepts throughout the entire text.

FYI, those who advocate automatic software, however, would argue that the machine gets "close enough" so that a human being can edit the resulting product. However, expert evaluators unanimously agree that the software fails; those who disagree are likely those who are sufficiently ignorant of indexing in the first place such that they are unable to determine the quality differences.

Oh, I should mention that there are software programs that human indexers use to simplify and speed up the mechanics of the index process. For example, it would be silly to disallow a computer to alphabetize the entries, reformat the index, and manipulate page numbers. There are a few software packages that do this exclusively, which are considered top of the line; other applications that have indexing capabilities, such as Microsoft Word and Adobe FrameMaker, have some of these capabilities, with notable limitations.

For information on the various software available, see http://www.asindexing.org/site/software.shtml.

DUBLIN CORE ELEMENTS

D.C Title: Notes on Automatic Indexing

D.C Creator: Seth A. Maislin

D.C Subject: Automated indexing software, automatic index, index process

D.C Description: Limits of the automatic indexing

D.C Publisher: Seth A. Maislin

D.C Contributor

D.C Date: 2005-03-15

D.C Type: article

D.C Format: html

D.C Identifier : http://taxonomist.tripod.com/indexing/autoindex.html

D.C Source

D.C Language:en

D.C Relation

D.C Coverage

D.C Rights: Seth A. Maislin

Automatic Abstracting & Summarizing Tools

Vimal Kumar Varun
Scientist 'D'
Department of Scientific & Industrial Research
Technology Bhawan, New Mehrauli Road, New Delhi-110016. INDIA
Internet: vkv@alpha.nic.in URL: http://vkv.tripod.com

Information Today & Tomorrow, Vol. 21, No. 2, June 2002, p.12-p.16
http://itt.nissat.tripod.com/itt0202/ruoi0202.htm

Preamble

Information overload is becoming a problem for an increasingly large number of people, and a key step in reducing the problem is to use an abridgement tool. A summary tailored to your interest provides a convenient way to get a quick impression of the content of a document. This maybe to make a decision whether or not to read the full document, or even as an alternative to the original, thus saving once valuable time.

Summarization is the process of condensing a source text into a shorter version preserving its information content. There are two kinds of automatic summarization. The first summarizes whole documents, either by extracting important sentences or by rephrasing and shortening the original text. Most summarization tools currently under development extract key passages or topic sentences, rather than rephrasing the document. Rephrasing is a much more difficult task. The second process summarizes across multiple documents. Cross-document summarization is harder, but potentially more valuable. It will increase the value of alerting services by condensing retrieved information into smaller, more manageable reports. Cross-document summarization will allow us to deliver very brief overviews of new developments to busy clients. We can expect some tools to do this within the next 2-4 years.

Brevity Document Summarizer, concise outlines of your documents
http://www.lextek.com/brevity/

Brevity easily generates document summaries and it can be as long or as short as one wishes. It can also be used to highlight key sentences or words in the document. The key benefits of Brevity are: accurate generation of automated document summaries, quick determination of a document's contents, highlighting of significant words and sentences in a document and find the key parts of a document. The demo of the Sumamrizer is available at http://www.lextek.com/brevity/bravedemo.htm.

Contact: Lextek International at sales@Lextek.com for more information

Copernic Summarizer, free yourself from information overload
http://www.copernic.com/products/summarizer/index.html

This easy-to-use summarizing software dramatically increases the productivity and efficiency by creating concise summaries of any document or Web page without missing any important information. It can be invoked directly from the application like MS Word, MS Outlook, Eudora, Netscape and Adobe Acrobat.

Using sophisticated statistical and linguistic algorithms, it pinpoints the key concepts and extracts the most relevant sentences, resulting in a summary that is a shorter and condensed version of the original text. Complete feature list is available at http://www.copernic.com/products/summarizer/features.html. Free trial version of Copernic Summarizer fully functional for 30 days is available at http://www.copernic.com/products/summarizer/download.html.

Extractor, text summarization software for automatic indexing and abstracting
http://extractor.iit.nrc.ca/

Extractor is a software for automatically summarizing text, developed by the Interactive Information Group. Extractor takes a text file as input and generates a list of key words and sentences as output. On-line demo (also for German texts) is available at the site.

HyperGen Sumamrization Tool
http://crl.nmsu.edu/Research/Projects/minds/core_sumamrizer/

HyperGen exploits hypertext technology to automatically generate hypertext structure from a plain or hypertext document. Every part of the document is summarized. The different summaries are linked together in a hypertext structure where each hyperlink is labeled meaningfully. HyperGen has implemented preliminary ideas for generating meaningful labels by identifying key topics and rhetorical types.

A presentation on `Hypertext Summary Extraction for Fast Document Browsing' by Kavi Mahesh is available at http://crl.nmsu.edu/Research/Projects/minds/core_summarizer/talk/. This presentation includes slides and several examples of HyperGen's plain and hypertext summaries with corresponding summaries from Microsoft's summarization tool for comparison.

The key features of HyperGen incldues: Hypertext summarization of documents; Automatic generation of Hypertext summaries with multiple layers of detail from plain or hypertext documents; Generation of meaningful labels for hyperlinks; etc. It is multilingual; ideal for document filtering, browsing, or document content visualization; can be used in conjunction with any web browser; can be easily integrated with extraction, retrieval, and machine translation systems; and implemented entirely in Java.

Contact: Kavi Mahesh at mahesh@crl.nmsu.edu for more information.

Intelligent Miner for Text: summarization tool
http://www-3.ibm.com/software/data/iminer/fortext/summarize/summarize.html

The summarization tool automatically extracts the most relevant sentences from a document, creates a summary of the document from these sentences, and uses a set of ranking strategies on sentence and on word level to calculate the relevancy of a sentence to a document. The user can set the length of the summary.

IBM Intellidence Miner for Text available at http://www-3.ibm.com/software/data/iminer/ includes text analysis tools such as a Feature Extraction tool, Clustering tools, a Summarization tool, and a Categorization tool. Also incorporates the IBM Text Search Engine, NetQuestion Solution, an Internet/intranet text-search solution, and the IBM Web Crawler Package.

Inxight Summarizer, systems for the automatic production of text summaries
http://www.inxight.com/products/summarizer/

The Inxight Summarizer™ SDK (software development kit) allowing applications developers to incorporate into their products an intelligent solution to many problems inherent to online searches. By focusing on the relevant key sentences contained within a document, the Summarizer technology enables end-users to browse quickly though volumes of information and extract the documents most suitable to their search requirements. Summarizer utilizes consistent sentence-selection criteria that match the conceptual content of documents. End-users save precious time and effort since they do not have to download and read each retrieved document to determine its relevancy. They experience easier navigation through Web sites, faster access to pertinent information and increased productivity.

Summarizer can summarize a typical document in a fraction of a second and so enables users to use more of their time utilizing data, not just trying to find it. Also, to expedite search functions, the Summarizer can be "trained" to find key sentences based on the structure of specific document types. Information is accessible by the length of key sentences or the number of key phrases. The end-user can control the weight of phrases by query phrase or drop phrases.

MultiGen
http://www.cs.columbia.edu/~regina/demo4/

MultiGen is a multi-document summarization tool developed at Columbia University. Multiple document summarization could be useful, for example, in the context of large information retrieval systems to help determine which documents are relevant. Such summaries can cut down on the amount of reading by synthesizing information common among all retrieved documents and by explicitly highlighting distinctions.

It automatically generates a concise summary by identifying similarities and differences across a set of related documents. Input to the system is a set of related documents, such as those retrieved by a search engine in response to a particular query. The MultiGen examples are available at http://www.cs.columbia.edu/~regina/demo4/examples.html.

Contact: Principal investigators Prof. Kathleen R. McKeown and Dr. Judith L. Klavans at kathy@cs.columbia.edu and klavans@cs.columbia.edu respectively for more infomation.

Pertinence Automatic Summary or Abstract
http://www.pertinence.net/index_en.html

It is a a data-processing tool which transforms a source text into a new text in a shorter version keeping the relevant information intact. Several formats of texts including html, pdf, MS word are accepted. The original text can be in one of several languages like English, French, Spanish, Italian, Portuguese, German, Chinese, Japanese and Korean. Input can be a file on your hard disk, or a file from the Net or some text that you can cut and paste! The domain of the text supported are chemistry, finance, law and medicine. For other domains, contact Pertinence at contact@pertinence.net.

One can try Pertinence free at http://www.pertinence.net/index_en.html after registering at http://www.pertinence.net/register_en.html by entering name, organization and email address. On submission, the password of preferred length (2-8 characters) is sent by an email. Internet Explorer 5+ or Netscape 6+ is recommended.

Contact: Authors A Lehmam and P Bouvet at lehmam@pertinence.net and bouvet@pertinence.net respectively for more information.

Sinope Summarizer, automatic text summarizer
http://www.carp-technologies.nl/en/sinope/

The Sinope Summarizer integrates with Microsoft Internet Explorer and summarizes the text in the Web page. The percent summary level can be adjusted from 1 to 100%. The tool keeps pictures and other formatting details intact. The utility is available for English, German and Dutch text. A must have for everybody that surfs the internet. The shareware trial version of Sinope Summarizer Personal Edition for 30 days is availabe http://www.carp-technologies.nl/en/sinope/downloads.html

Sinope Summarizer Personal Edition

Generate summaries with Sinope Summarizer
The Sinope Summarizer is the summarizing tool for professionals. It automatically generates summaries of arbitrary texts fully while retaining images, formatting and page layout. The Sinope Summarizer uses advanced language technologies to determine what the text is about and which information elements are important.

Summarize web pages while browsing the Internet
The Sinope Summarizer Personal Edition integrates with Microsoft Internet Explorer and enables users to summarize Web pages while browsing the Internet. It understands English, German and Dutch texts (more languages will be supported in the near future). Furthermore, the tool is provided to summarize saved html and plain text files, and a Clipboard Summarizer to summarize the contents of the Windows clipboard.

Generating summaries is as easy as dragging a slider!
The Sinope Summarizer gives the user complete control over the summary length. Generating and viewing a summary is as easy as dragging a slider!

Summarist, The software produces excerpts from texts
http://www.isi.edu/natural-language/projects/SUMMARIST.html

SUMMARIST is an attempt to develop robust extraction technology as far as it can go and then continue research and development of techniques to perform abstraction. This work faces the depth vs. robustness tradeoff: either systems analyze/interpret the input deeply enough to producegood summaries (but are limited to small application domains), or they work robustly over more or less unrestricted text (but cannot analyze deeply enough to fuse the input into a true summary, and hence perform only topic extraction). In particular, symbolic techniques, using parsers, grammars, and semantic representations, do not scale up to real-world size, while Information Retrieval and other statistical techniques, being based on word counting and word clustering, cannot create true summaries because they operate at the word (surface) level instead of at the concept level.

To date, SUMMARIST produces extract summaries in five languages and has been linked to translation engines for these languages in the MuST system at http://www.isi.edu/~cyl/must/must_beta.htm. Work is underway both to extend the extract-based capabilities of SUMMARIST and to build up the large knowledge collection required for inference-based abstraction. The project members includes: Eduard Hovy, senior project leader at http://www.isi.edu/natural-language/people/hovy.html; Chin-Yew Lin, research scientist at http://www.isi.edu/~cyl; and Daniel Marcu, research scientist at http://www.isi.edu/~marcu/.

TextAnalyst, Text Mining system for automatic indexing and Abstracting
http://www.megaputer.com/products/ta/index.php3

TextAnalyst 2.0, first delivered in the beginning of 1999 by Megaputer Intellence Inc., is unique software for automated semantic analysis of natural language texts. The system helps the user quickly summarize, efficiently navigate, and cluster documents in a textbase, as well as perform semantic information retrieval. TextAnalyst, a unique software tool for semantic analysis, navigation, and search of unstructured texts, can successfully tackle these and many other tasks.

Download TextAnalyst presentation and brochure from http://www.megaputer.com/down/tm/Text_Mining.pps and http://www.megaputer.com/down/tm/ta/docs/textanalyst_brochure.pdf respectively. The TextAnalyst tutorial is available at http://www.megaputer.com/products/ta/tutorial/ta_tutorial.zip. Download free software evaluations at http://www.megaputer.com/php/eval.php3.

TexNet32
http://instruct.uwo.ca/gplis/677/texnet32/texnet32.htm

It is a Freeware Software for the semi automatic production of Abstracts by Professor Tim Craven. It assists in the writing of abstracts and other short summaries including word and phrase extraction and various other capabilities.

TexNet32 is a 32-bit version of the TexNetF text network management system. Like TexNetF, it provides users with special tools designed to assist in writing conventional abstracts. The model of a hybrid abstracting system in which some tasks are performed by human abstractors and others by software seems to deliver the best results at this stage of technology development. TexNet32 generally uses typical Windows 95 interface elements, supporting keyboard and mouse, menus, and some accelerator keys.

The TexNet32 main window contains a menu bar and other windows that belong to the program: Full text, Parameters, Paragraph weights, Ancillary lists, Words in full text, Extract, Notes, and Abstract. Some of these are initially minimized. None can be closed before the main window is closed; if you attempt to close any of them, it will just be minimized. Contents of the menu bar and its pull-down menus vary with the kind of window that is active. You cannot close either of the minimum two "Editing" windows except by ending the session.

The currently active window is identified by the colour of its caption bar. To activate a window, click on it or select from the "Window" menu. The sizes of windows can be adjusted by the usual Windows 95 operations. Note that all operations are performed for the currently active window! (This is expecially important to remember when opening a source text). TexNet32 Recent Updates is available at http://instruct.uwo.ca/gplis/677/texnet32/texnetup.htm. Download it from http://instruct.uwo.ca/gplis/677/texnet32/texnet33.exe.

Contact: Prof Tim Craven at craven@uwo.ca or visit http://publish.uwo.ca/~craven/index.htm for more information.

ViewSum
http://www.viewsum.com

ViewSum is a text summarization tool that can provide a personalized summary of any document. Depending on your needs it can summarize the document by any amount - even to a single sentence or set of keywords. Key advantages over many other summarizers are that ViewSum will take account of your specified interests and preferences when generating a summary, leading to results tailored to your personal needs, and summaries are made from complete sentences.

ViewSum supports drag and drop of over 200 different document formats, and can be integrated into leading applications, such as Microsoft Word, Outlook and Internet Explorer. A Quick Help guide giving an overview of ViewSum is available at http://193.113.58.107/ViewSum/overview.htm.

Zentext Summarizer
http://www.zentext.com/z_product_summarizer.html

Zentext Summarizer Lite allows you to summarize large amounts of text instantly and intelligently free of cost. One can try this online at http://www.zentext.com/z_product_summarizer.html by simply pasting the text to be summarized and speifying the number of sentences required in the summary output. The service also hosts a summarizer utility, very small in size, can be downloaded from http://www.zentext.com/summarizer/summarizer.exe.

To run this utility, the computer should have Java Virtual Machine installed. This utility reframes the sentence into some long sentences and provides the summary in the true sense.

DUBLINCORE ELEMENTS

D.C Title: Automatic Abstracting & Summarizing Tools

D.C Creator: Vimal Kumar Varun

D.C Subject: Information overload, Summarization, Abstracting, Cross-document summarization, Automatic abstracting tools, summarizing tools

D.C Description: Describes automatic abstracting & summarizing tools like Brevity Document Summarizer, Copernic Summarizer, Extractor, HyperGen Summarization Tool, Intelligent Miner for Text: Summarization Tool, Inxight Summarizer, MultiGen, Pertinence Automatic Summariser, Sinope Summarisers, Summarist, TextAnalyst, TextNet32, ViewSum. Zentext Summarizer

D.C Publisher: Information Today & Tomorrow

D.C Contributor

D.C Date: 2002-06

D.C Type: article

D.C Format: html

D.C Identifier : http://itt.nissat.tripod.com/itt0202/ruoi0202.htm

D.C Source

D.C Language: en

D.C Relation

D.C Coverage

D.C Rights: Information Today & Tomorrow

Selective Analysis for Automatic Abstracting: Evaluating

Horacio Saggion and Guy Lapalme. Selective Analysis for Automatic Abstracting: Evaluating
Indicativeness and Acceptability. University of Montréal. (On line). Accessibility: http://www.iro.umontreal.ca/~saggion/evaluation2.pdf

Abstract

They have developed a new methodology for automatic abstracting of scientific and technical
articles called Selective Analysis. This methodology allows the generation of indicative informative abstracts integrating different types of information extracted from the source text.
The indicative part of the abstract identifies the topics of the document while the informative one
elaborates some topics according to the reader’s interest. The first evaluation of the methodology
demonstrates that Selective Analysis performs well in the task of signaling the topic of the
document demonstrating the viability of such a technique. The sentences the system produces
from instantiated templates are considered to be as acceptable as human produced sentences.

DUBLIN CORE ELEMENTS

D.C Title : Selective Analysis for Automatic Abstracting: Evaluating
Indicativeness and Acceptability

D.C Creator : Horacio Saggion and Guy Lapalme

D.C Subject : automatic abstracting, scientific and technical article, selective analysis, indicative informative

D.C Description : a new methodology for automatic abstracting of scientific and technical
articles called Selective Analysis.

D.C Publisher : Université de Montréal

D.C Contributor :

D.C Date :

D.C Type : these

D.C Format : PDF

D.C Identifier : http://www.iro.umontreal.ca/~saggion/evaluation2.pdf

D.C Source :

D.C Language : en

D.C Relation :

D.C Coverage :

D.C Rights : Horacio Saggion and Guy Lapalme

AUTOMATIC INDEXING

Tulic, Martin. Automatic indexing. 04.03.05 (on line). Accessibility: http://www.anindexer.com/about/auto/autoindex.html

The popularity of Internet search engines has caused many people think of the process of entering queries to retrieve documents from the Web based as automatic indexing. It is not.
Automatic indexing is the process of assigning and arranging index terms for natural-language texts without human intervention. For several decades, there have been many attempts to create such processes, driven both by the intellectual challenge and by the desire to significantly reduce the time and cost of producing indexes. Dozens if not hundreds of computer programs have been written to identify the words in a text and their location, and to alphabetize the words. Typically, definite and indefinite articles, prepositions and other words on a so-called stop list are not included in the program's output. Even some word processors provide this capability. Nevertheless, computer-generated results are often more like concordances (lists of words in a document) than truly usable indexes. There are several reasons for this.
The primary reason computers cannot automatically generate usable indexes is that, in indexing, abstraction is more important than alphabetization. Abstractions result from intellectual processes based on judgments about what to include and what to exclude. Computers are good at algorithmic processes such as alphabetization, but not good at inexplicable processes such as abstraction. Another reason is that headings in an index do not depend solely on terms used in the document; they also depend on terminology employed by intended users of the index and on their familiarity with the document. For example: in medical indexing, separate entries may need to be provided for brand names of drugs, chemical names, popular names and names used in other countries, even when certain of the names are not mentioned in the text. A third reason is that indexes should not contain headings for topics for which there is no information in the document. A typical document includes many terms signifying topics about which it contains no information. Computer programs include those terms in their results because they lack the intelligence required to distinguish terms signifying topics about which information is presented from terms about which no information is presented. A fourth reason is that headings and subheadings should be tailored to the needs and viewpoints of anticipated users. Some are aimed at users who are very knowledgeable about topics addressed in the document; others at users with little knowledge. Some are reminders to those who read the document already; others are enticements to potential readers. To date, no one has found a way to provide computer programs with the judgment, expertise, intelligence or audience awareness that is needed to create usable indexes. Until they do, automatic indexing will remain a pipe dream.
Although automated indexing is a pipe dream, computers are nevertheless an essential tool used by (but not a replacement for) indexers.

DUBLIN CORE ELEMENTS

D.C Title : Automatic indexing

D.C Creator : Tulic Martin

D.C Subject : index, computer program, indexing, abstraction

D.C Description : it presents the reasons why computers cannot automatically generate usable indexes.

D.C Publisher : Tulic Martin

D.C Contributor :

D.C Date : 04-03-05

D.C Type : article

D.C Format : HTML

D.C Identifier : http://www.anindexer.com/about/auto/autoindex.html

D.C Source :

D.C Language : en

D.C Relation :

D.C Coverage :

D.C Rights : Martin Tulic

AUTOMATIC INDEXING

BROWNE, Glenda. Automatic indexing. ANZI (Australian and
New Zealand Society of Indexers), 1996. (on line).
Accessibility: http://www.aussi.org/conferences/papers/browneg.htm

Introduction

This paper will examine developments in automatic indexing and abstracting in which the computer creates the index and abstract, with little or no human intervention. The emphasis is on practical applications, rather than theoretical studies. This paper does not cover computer-aided indexing, in which computers enhance the work of human indexers, or indexing of the Internet.

Research into automatic indexing and abstracting has been progressing since the late 1950's. Early reports claimed success, but practical applications have been limited. Computer indexing and abstracting are now being used commercially, with prospects for further use in the future. The history of automatic indexing and abstracting is well covered by Lancaster (1991).

Database indexing

Extraction indexing

The simplest method for indexing articles for bibliographic databases is extraction indexing, in which terms are extracted from the text of the article for inclusion in the index. The frequency of words in the article is determined, and the words which are found most often are included in the index. Alternatively, the words which occur most often in the article compared to their occurrence in the rest of the database, or in normal language, are included. This method can also take into account word stems (so that run and running are recognised as referring to the same concept), and can recognise phrases as well as single words.

Computer extraction indexing is more consistent than human extraction indexing. However, most human indexing is not simple extraction indexing, but is assignment indexing, in which the terms used in the index are not necessarily those found in the text.

Assignment indexing

For assignment indexing, the computer has a thesaurus, or controlled vocabulary, which lists all the subject headings which may be used in the index. For each of these subject headings it also has a list of profile words. These are words which, when found in the text of the article, indicate that the thesaurus term should be allocated.

For example, for the thesaurus term childbirth, the profile might include the words: childbirth, birth, labor, labour, delivery, forceps, baby, and born. As well as the profile, the computer also has criteria for inclusion -- instructions as to how often, and in what combination, the profile words must be present for that thesaurus term to be allocated.

The criteria might say, for example, that if the word childbirth is found ten times in an article, then the thesaurus term childbirth will be allocated. However if the word delivery is found ten times in an article, this in itself is not enough to warrant allocation of the term childbirth, as delivery could be referring to other subjects such as mail delivery. The criteria in this case would specify that the term delivery must occur a certain number of times, along with one or more of the other terms in the profile.

Computer database indexing in practice

In practice in database indexing, there is a continuum of use of computers, from no computer at all to fully automatic indexing.

• No computer.
• Computer clerical support, e.g. for data entry.
• Computer quality control, e.g. checking that all index terms are valid thesaurus terms.
• Computer intellectual assistance, e.g. helping with term choice and weighting.
• Automatic indexing (Hodge 1994).

Most database producers use computers at a number of different steps along this continuum. At the moment, however, automatic indexing is only ever used for a part of a database, for example, for a specific subject, access point, or document type.

Automatic indexing is used by the Defense Technology Information Center (DTIC) for the management-related literature in its database; it is used by FIZ Karlsruhe for indexing chemical names; it was used until 1992 by the Russian International Centre for Scientific and Technical Information (ICSTI) for its Russian language materials; and it was used by INSPEC for the re-indexing of its backfiles to new standards (Hodge 1994).

BIOSIS (Biological Abstracts) uses computers at all steps on the continuum, and uses automatic indexing in a number of areas. Title keywords are mapped by computer to the Semantic Vocabulary of 15,000 words; the terms from the Semantic Vocabulary are then mapped to one of 600 Concept Headings (that is, subject headings which describe the broad subject area of a document; Lancaster 1991).

The version of BIOSIS Previews available on the database host STN International uses automatic indexing to allocate Chemical Abstracts Service Registry Numbers to articles to describe the chemicals, drugs, enzymes and biosequences discussed in the article. The codes are allocated without human review, but a human operator spends five hours per month maintaining authority files and rules (Hodge 1994).

Retrieval and ranking tools (top)

There are two sides to the information retrieval process: documents must be indexed (by humans or computers) to describe their subject content; and documents must be retrieved using retrieval software and appropriate search statements.

Retrieval and ranking tools include those used with bibliographic databases, the 'indexes' used on the Internet, and personal computer software packages such as Personal Librarian (Koll 1993). Some programs, such as ISYS, are specialised for the fast retrieval of search words.

In theory these are complementary approaches, and both are needed for optimal retrieval. In practice, however, especially with documents in full-text databases, indexing is often omitted, and the retrieval software is relied on instead.

For these documents, which will not be indexed, it is important to ensure the best possible access. To accomplish this, the authors of the documents must be aware of the searching methods which will be used to retrieve them. Authors must use appropriate keywords throughout the text, and ensure that keywords are included in the title and section headings, as these are often given priority by retrieval and ranking tools (Sunter 1995).

The process whereby the creators of documents structure them to enhance retrieval is known as bottom-up indexing. A role for professional indexers in bottom-up indexing is as guides and trainers to document authors (Locke 1993).

One reason that automatic indexing may be unsuited to book indexing is that book indexes are not usually available electronically, and cannot be used in conjunction with powerful search software (Mulvany and Milstead 1994).

Document abstracting

Computers abstract documents (that is, condense their text) by searching for high frequency words in the text, and then selecting sentences in which clusters of these high frequency words occur. These sentences are then used in the order in which they appear in the text to make up the abstract. Flow can be improved by adding extra sentences (for example, if a sentence begins with 'Hence' or 'However' the previous sentence can be included as well) but the abstract remains an awkward collection of grammatically unrelated sentences.

To try and show the subject content, weighting can be given to sentences from certain locations in the document (e.g. the introduction) and to sentences containing cue words (e.g. 'finally', which suggests that a conclusion is starting). In addition, an organisation can give a weighting to words which are important to them: a footwear producer, for example, could require that every sentence containing the words foot or shoe should be included in the abstract.

Computer abstracting works best for documents which are written formally and consistently. It has been used with some success for generating case summaries from the text of legal decisions (Lancaster 1991).

After recent developments in natural language processing by computers, it is now possible for a computer to generate a grammatically correct abstract, in which sentences are modified without loss of meaning.

For example, from the following sentence:
"The need to generate enormous additional amounts of electric power while at the same time protecting the environment is one of the major social and technological problems that our society must solve in the next (sic!) future"
the computer generated the condensed sentence:
"The society must solve in the future the problem of the need to generate power while protecting the environment" (Lancaster 1991). Text summarisation experiments by British Telecom have resulted in useful, readable, abstracts (Farkas 1995).

Book indexing

There are a number of different types of microcomputer based software packages which are used for indexing.

The simplest are concordance generators, in which a list of the words found in the document, with the pages they are on, is generated. It is also possible to specify a list of words such that the concordance program will only include words from that list. This method was used to index drafts of the ISO999 indexing standard to help the committee members keep track of rules while the work was in progress (Shuter 1993).

Computer-aided indexing packages, such as Macrex and Cindex, are used by many professional indexers to enhance their work. They enable the indexer to view the index in alphabetical or page number order, can automatically produce various index styles, and save much typing.

Embedded indexing software is available with computer packages such as word processors, PageMaker, and Framemaker. With embedded indexing the document to be indexed is on disk, and the indexer inserts tags into the document to indicate which index terms should be allocated for that page. It does not matter if the document is then changed, as the index tags will move with the part of the document to which they refer. (So if twenty pages are added at the beginning of the document, all of the other text, including the index tags, will move 20 pages further on).

Disadvantages of embedded indexing are that it is time-consuming to do and awkward to edit (Mulvany 1994). Indexers who use embedded indexing often also use a program such as Macrex or Cindex to overcome these problems.

Embedded indexing is commonly used for documents such as computer software manuals which are published in many versions, and which allow very little time for the index to be created after the text has been finalised. With embedded indexing, indexing can start before the final page proofs are ready.
Embedded indexing will probably be used more in the future: for indexing works which are published in a number of formats; for indexing textbooks which are printed on request using only portions of the original textbook or using a combination of sources; and for indexing electronically published works which are continually adapted. In some of these applications the same person may do the work of the editor and indexer.
The most recent development in microcomputer book indexing software is Indexicon (Version 2), an automatic indexing package.

DUBLIN CORE ELEMENTS

D.C Title: Automatic indexing

D.C Creator : Browne Glenda

D.C Subject : automatic indexing, automatic abstracting, automatic summarizing, retrieval tools, information retrieval, Database indexing, Document abstracting, Book indexing

D.C Description: This paper examines developments in automatic indexing and abstracting in which the computer creates the index and abstract, with little or no human intervention. The emphasis is on practical applications, rather than theoretical studies. This paper does not cover computer-aided indexing, in which computers enhance the work of human indexers, or indexing of the Internet.
Research into automatic indexing and abstracting has been progressing since the late 1950's. Early reports claimed success, but practical applications have been limited. Computer indexing and abstracting are now being used commercially, with prospects for further use in the future. The history of automatic indexing and abstracting is well covered by Lancaster (1991).

D.C Publisher: ANZI, Australian and New Zealand Society of Indexers

D.C Contributor

D.C Date : 1996

D.C Type : Journal article

D.C Format : HTML

D.C Identifier : http://www.aussi.org/conferences/papers/browneg.htm

D.C Source

D.C Language : En

D.C Relation

D.C Coverage

D.C Rights: Glenda Browne

Automatic Analysing and Summarizing

Cross lingual information extraction and automated text summarization

Lexicon

Document analysis : a way to integrate existing paper information in architectural databases

Notes on Automatic Indexing

Automatic Abstracting & Summarizing Tools

Selective Analysis for Automatic Abstracting: Evaluating

AUTOMATIC INDEXING

AUTOMATIC INDEXING

Content of the blog

Descriptors

Archives du blog