DUBLIN CORE ELEMENTS
D.C Title: Cross lingual information extraction and automated text summarization
D.C Creator: Carnegie Mellon, School of Computer Science
D.C Subject: Information Extraction, Text Summarization, extracting
D.C Description: Information Extraction (IE) and Text Summarization are two methods of extracting relevant portions of the input text. IE produces templates, whose slots are filled with the important information, while Summarization produces one of various types of summary. Over the past 15 years, IE systems have come a long way, with commercial applications being around the corner. Summarization, in contrast, is a much younger enterprise. At present, it borrows techniques from IR and IE, but still requires a considerable amount of research before its unique aspects will be clearly understood.
D.C Publisher: Eduard Hovy
D.C Contributor: Ralph Grishman et alii
D.C Date: 1999/04
D.C Type: report
D.C Format: html
D.C Identifier : http://www.cs.cmu.edu/~ref/mlim/chapter3.html
D.C Source : http://www.cs.cmu.edu/~ref/mlim/
D.C Language: en
D.C Relation:
Aone, C., M.E. Okurowski, J. Gorlinsky, and B. Larsen. 1997. A Scalable Summarization System using Robust NLP. Proceedings of the Workshop on Intelligent Scalable Text Summarization, 66-73. ACL/EACL Conference, Madrid, Spain.
DeJong, G.J. 1979. FRUMP: Fast Reading and Understanding Program. Ph.D. dissertation, Yale University.
Firmin Hand, T. and B. Sundheim. 1998. TIPSTER-SUMMAC Summarization Evaluation. Proceedings of the TIPSTER Text Phase III Workshop. Washington.
Grishman, R. and B. Sundheim (eds). 1996. Message Understanding Conference 6 (MUC-6): A Brief History. Proceedings of the COLING-96 Conference, 466-471. Copenhagen, Denmark.
Hovy, E.H. and C-Y. Lin. 1998. Automating Text Summarization in SUMMARIST. In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.
Hovy, E.H. and C-Y. Lin. 1999. Automated Multilingual Text Summarization and its Evaluation. Submitted.
Jacobs, P.S. and L.F. Rau. 1990. SCISOR: Extracting Information from On-Line News. Communications of the ACM 33(11): 88-97.
Jing, H., R. Barzilay, K. McKeown, and M. Elhadad. 1998. Summarization Evaluation Methods: Experiments and Results. In E.H. Hovy and D. Radev (eds), Proceedings of the AAAI Spring Symposium on Intelligent Text Summarization, 60-68.
Knight, K. and J. Graehl. 1997. Machine Transliteration. Proceedings of the 35th ACL-97 Conference, 128-135. Madrid, Spain.
Kupiec, J., J. Pedersen, and F. Chen. 1995. A Trainable Document Summarizer. Proceedings of the 18th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), 68-73. Seattle, WA.
Lin, C-Y. 1999. Training a Selection Function for Extraction in SUMMARIST. Submitted.
Mani, I. et al. 1998. The TIPSTER Text Summarization Evaluation: Initial Report.
Marcu, D. 1997. The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts. Ph.D. dissertation, University of Toronto.
Marcu, D. 1999. The Automatic Construction of Large-scale Corpora for Summarization Research. Forthcoming.
Miike, S., E. Itoh, K. Ono, and K. Sumita. 1994. A Full-Text Retrieval System with Dynamic Abstract Generation Function. Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR-94), 152-161.
Radev, D. 1998. Generating Natural Language Summaries from Multiple On-Line Sources: Language Reuse and Regeneration. Ph.D. dissertation, Columbia University.
Reimer, U. and U. Hahn. 1998. A Formal Model of Text Summarization Based on Condensation Operators of a Terminological Logic. In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.
Sager, N. 1970. The Sublanguage Method in String Grammars. In R.W. Ewton, Jr. and J. Ornstein (eds), Studies in Language and Linguistics, 89-98.
Sparck Jones, K. 1998. Introduction to Text Summarisation. In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.
Sparck Jones, K. and J.R. Galliers. 1996. Evaluating Natural Language Processing Systems: An Analysis and Review. New York: Springer.
Strzalkowski, T. et al. 1998. ? In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization. Cambridge: MIT Press.
D.C Coverage
D.C Rights: Carnegie Mellon, School of Computer Science
Lexicon
ODLIS: Online Dictionary for Library and Information Science http://lu.com/odlis/
A glossary of archival and records http://www.archivists.org/glossary/index.asp
The Information Professional's Glossary http://www.sir.arizona.edu/resources/glossary.html
A
Assignment indexing
(ODLIS) A method of indexing in which a human indexer selects one or more subject headings or descriptors from a list of controlled vocabulary to represent the subject(s) of a work. The indexing terms selected to represent the content need not appear in the title or text of the document indexed. Synonymous with assigned indexing.
=Indexation de nomination
(A glossary of archival and records) The process of creating an ordered list of headings, using terms that may not be found in the text, with pointers to the relevant portions of the material. Assignment indexing usually draws headings from a controlled vocabulary. It is distinguished from extraction indexing, typically done by a computer, which relies on the terms found in the document. Assignment indexing was once a manual process, requiring human judgment to link the headings to the concepts in the text. The process of assigning terms from a controlled vocabulary has since been automated, with mixed success, by building rules of analysis that assign headings on the basis of other, related terms in the text.
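The rule-based automation described in this entry can be sketched in a few lines. This is a hypothetical illustration, not any particular system: the controlled vocabulary and its trigger terms below are invented, and a heading is assigned as soon as any one of its trigger terms occurs in the text. Real rule sets are far richer.

```python
import re

# Hypothetical rules: controlled-vocabulary heading -> trigger terms.
RULES = {
    "Automatic abstracting": {"summarizer", "summarization", "abstracting"},
    "Information retrieval": {"search", "retrieval", "query"},
    "Optical character recognition": {"ocr", "scanning", "character"},
}

def assign_headings(text, rules=RULES):
    """Return the sorted headings whose trigger terms occur in the text.

    The headings themselves need not appear in the text, which is what
    separates assignment indexing from extraction indexing.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(h for h, triggers in rules.items() if words & triggers)
```

For example, "The summarizer answers each query in seconds." receives the headings "Automatic abstracting" and "Information retrieval", although neither phrase appears in the sentence.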
Automation
(The Information Professional's Glossary) Changing from manual, paper-based methods of recording, organizing, and retrieving information to computerized systems. Circulation control and cataloging are among the most widely automated library functions.
=Automatisation
Automatic indexing
(ODLIS) A method of indexing in which a computer applies an algorithm to the title and/or the text of a work to identify and extract words and phrases representing its subjects, for use as headings under which entries are made in the index.
=Indexation automatique
(A glossary of archival and records) The use of computers to enable concepts to be located within the material indexed. Early automatic indexing was largely limited to extraction indexing, relying on specific terms in the text to represent concepts. With increasing sophistication, computers became able to perform assignment indexing, allowing users to search by concepts from a controlled vocabulary using words that may not appear in the text. Both forms of automatic indexing created ordered lists of concepts that could be browsed, with pointers to where those concepts appear in the text. Full-text search engines, such as those that index the web, create links between terms or phrases and documents but generally do not produce a browsable list of headings.
Also called computer-based indexing or machine indexing.
C
Computer-aided retrieval
(A glossary of archival and records) The use of computers to aid access to information on physical media, especially microfilm. Computer-aided retrieval typically includes an index and sometimes a brief, searchable description of the material. When relevant material is identified, the system may automatically load the microfilm or offline storage media containing it, then locate the material within its container and display it.
=Recherche assistée par ordinateur
F
Full-text search
(A glossary of archival and records) (Computing) A technique that provides access to electronic materials by indexing every word in every item.
Full-text searching is distinguished from manual indexing, where searches run against the headings assigned to each document. Early full-text searching was limited, especially in its ability to distinguish different senses of a word; a search for glass might find documents about Philip Glass, window glass, and glass ice. More sophisticated search engines could combine several terms with Boolean logic and adjacency rules to reduce false hits. Modern web search engines use a variety of similar techniques to rank documents by likely relevance.
=Recherche en texte intégral
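A minimal sketch of full-text searching, assuming a positional inverted index built over every word of every item. The function names are hypothetical: search_all implements a Boolean AND, and search_phrase an adjacency rule that requires the words to be consecutive. It reproduces the glass problem from the entry above: a plain search for glass matches all three items, while the adjacent pair window glass narrows the result to one.

```python
import re
from collections import defaultdict

def build_positional_index(docs):
    """Map word -> {doc_id: [positions]} for every word in every item."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z]+", text.lower())):
            index[word][doc_id].append(pos)
    return index

def search_all(index, *words):
    """Boolean AND: sorted ids of documents containing every word."""
    sets = [set(index[w.lower()]) for w in words]
    return sorted(set.intersection(*sets)) if sets else []

def search_phrase(index, *words):
    """Adjacency rule: ids of documents where the words occur in sequence."""
    hits = []
    for doc_id in search_all(index, *words):
        starts = index[words[0].lower()][doc_id]
        if any(all(p + k in index[w.lower()][doc_id]
                   for k, w in enumerate(words))
               for p in starts):
            hits.append(doc_id)
    return hits

if __name__ == "__main__":
    docs = {1: "Philip Glass composed the score.",
            2: "Replace the window glass today.",
            3: "Glass ice coated the road."}
    idx = build_positional_index(docs)
    print(search_all(idx, "glass"))               # every item matches
    print(search_phrase(idx, "window", "glass"))  # only item 2
```

Ranking by relevance, as modern engines do, would be a further layer on top of these Boolean and adjacency filters.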
Other sources
Library Terminology: A Guide for International Students
http://www.libraries.rutgers.edu/rul/lib_servs/intl_students_terms.shtml
http://palimpsest.stanford.edu/lex/
Glossary of Library Terms
Glossary of Academic Information Technology Terms
Document analysis : a way to integrate existing paper information in architectural databases
DUBLIN CORE ELEMENTS
D.C Title: Document analysis : a way to integrate existing paper information in architectural databases
D.C Creator: Tombre K. ; Paul J.C.
D.C Subject: Automatic analysis ; Printed document ; Formatting ; Automatic recognition ; Industrial drawing ; Graphic document ; Case study ; General study ; Document processing ; Documentation ; Documentation data processing
D.C Description: In any domain, the use of information systems leads to the problem of converting existing archives of paper documents into a format suitable for the computerized system. In this area, most attention has probably been given to structured document analysis, i.e. the automated analysis of business documents such as letters, forms, documentation and manuals, including the well-known area of character recognition. But document analysis is also a powerful tool in technical domains such as architecture, where large quantities of drawings of various kinds are available on paper. In this paper we briefly present the state of the art in technical drawing analysis and investigate the suitability of document analysis for the conversion from paper to architectural databases.
D.C Publisher: CNRS
D.C Contributor
D.C Date: 1994
D.C Type: Report
D.C Format: book
D.C Identifier : INIST-CNRS, Cote INIST : RP 13239
D.C Source
D.C Language: en
D.C Relation
D.C Coverage
D.C Rights: Copyright 2006 INIST-CNRS
Notes on Automatic Indexing
By Seth A. Maislin
"Automated indexing software" is, according to the common definition, software that analyzes text and produces an index without human involvement. I'm a firm believer that the technology doesn't exist, and that a human being is required to write an index. Thus I don't use the software, and I also don't recommend it.
There are those who advocate it, arguing that it's "not as bad as an indexer would have you think." These people often start from the standpoint that automatic software is faster and cheaper, and they're right. So the issue is quality.
I believe that good automatic indexes will exist once there's good artificial intelligence, something that presently doesn't exist. In very limited circumstances, however, automatic indexing does work: a machine can easily cull capitalized words from a textbook to create an approximation of an index of names. Even so, the machine isn't going to differentiate between names like "David Kelley" and places like "San Francisco," since both have the same format and are used the same way. It also won't know that "Bill Clinton" is also "William Jefferson Clinton." And it certainly can't tell when a name is mentioned only in passing, in a trivial way, as are the names in this paragraph! So imagine the problems of trying to get a machine to parse full sentences of ideas and recognize the core ideas, the important terms, and the relationships between related concepts throughout the entire text.
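To make the limitation concrete, here is a minimal sketch of such a name culler, assuming that a "name" is any run of two or more capitalized words; the function name cull_capitalized_names is hypothetical. It shows exactly the weaknesses described above: people and places come out alike, and variant forms of one name are never merged.

```python
import re
from collections import defaultdict

# Assumption: a "name" is a run of two or more capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

def cull_capitalized_names(pages):
    """Map each capitalized run to the sorted pages it appears on.

    `pages` is a dict {page_number: text}. No attempt is made to tell
    people from places or to merge variant forms of the same name.
    """
    index = defaultdict(set)
    for page, text in pages.items():
        for match in NAME_PATTERN.findall(text):
            index[match].add(page)
    return {name: sorted(nums) for name, nums in sorted(index.items())}

if __name__ == "__main__":
    pages = {
        12: "David Kelley flew to San Francisco.",
        47: "Bill Clinton was born William Jefferson Clinton.",
    }
    for name, nums in cull_capitalized_names(pages).items():
        print(name, nums)
```

Run on these two sample pages, "San Francisco" sits beside "David Kelley" as if it were a person, and "Bill Clinton" and "William Jefferson Clinton" remain two separate entries.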
FYI, those who advocate automatic software would argue that the machine gets "close enough" for a human being to edit the resulting product. However, expert evaluators unanimously agree that the software fails; those who disagree are likely those who know too little about indexing in the first place to recognize the quality differences.
Oh, I should mention that there are software programs that human indexers use to simplify and speed up the mechanics of the indexing process. For example, it would be silly not to let a computer alphabetize the entries, reformat the index, and manipulate page numbers. A few dedicated software packages do this exclusively and are considered top of the line; other applications with indexing features, such as Microsoft Word and Adobe FrameMaker, offer some of these capabilities, with notable limitations.
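The mechanical chores mentioned here, alphabetizing entries and tidying page references, are easy to automate once a human has chosen the headings. A minimal sketch, assuming entries arrive as (heading, page) pairs; build_index and collapse_pages are hypothetical names, and the sort is a plain case-insensitive string sort rather than any formal filing standard.

```python
from collections import defaultdict

def collapse_pages(pages):
    """Turn page numbers such as [5, 3, 4, 9] into '3-5, 9'."""
    nums = sorted(set(pages))
    runs = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:          # still inside a consecutive run
            prev = n
        else:                      # gap: close the run, start a new one
            runs.append((start, prev))
            start = prev = n
    runs.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)

def build_index(pairs):
    """Group (heading, page) pairs into alphabetized index lines.

    Sorting is a plain case-insensitive string sort, not a formal
    letter-by-letter or word-by-word filing rule.
    """
    grouped = defaultdict(list)
    for heading, page in pairs:
        grouped[heading].append(page)
    return [f"{h}, {collapse_pages(grouped[h])}"
            for h in sorted(grouped, key=str.lower)]
```

For example, the pairs ("indexing", 5), ("Abstracts", 9), ("indexing", 3), ("indexing", 4), ("abstracting", 2) and ("indexing", 9) become the lines "abstracting, 2", "Abstracts, 9" and "indexing, 3-5, 9". Deciding what the headings should be remains the indexer's job.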
For information on the various software available, see http://www.asindexing.org/site/software.shtml.
DUBLIN CORE ELEMENTS
D.C Title: Notes on Automatic Indexing
D.C Creator: Seth A. Maislin
D.C Subject: Automated indexing software, automatic index, index process
D.C Description: Limits of automatic indexing
D.C Publisher: Seth A. Maislin
D.C Contributor
D.C Date: 2005-03-15
D.C Type: article
D.C Format: html
D.C Identifier : http://taxonomist.tripod.com/indexing/autoindex.html
D.C Source
D.C Language:en
D.C Relation
D.C Coverage
D.C Rights: Seth A. Maislin
"Automated indexing software" is, according to the common definition, software that analyzes text and produces an index without human involvement. I'm a firm believer that the technology doesn't exist, and that a human being is required to write an index. Thus I don't use the software, and I also don't recommend it.
There are those who advocate it, arguing that it's "not as bad as an indexer would have you think." These people are often coming from the standpoint that automatic software is faster and cheaper, and they're right. Thus the issue surrounds quality.
I believe that good automatic indexes will exist once there's good artificial intelligence, something that presently doesn't exist. In very limited circumstances, however, it does; a machine can easily cull capitalized words from a textbook to create an approximation of an index of names -- although, again, the machine isn't going to differentiate between names like "David Kelley" and places like "San Francisco," since they are both of the same format and used the same way. It also won't know that "Bill Clinton" is also "William Jefferson Clinton." And certainly it can't tell when the name is being mentioned in an unuseful and trivial way, as are the names in this paragraph! So imagine the problems trying to get a machine to parse full sentences of ideas and recognizing the core ideas, the important terms, and the relationships between related concepts throughout the entire text.
FYI, those who advocate automatic software, however, would argue that the machine gets "close enough" so that a human being can edit the resulting product. However, expert evaluators unanimously agree that the software fails; those who disagree are likely those who are sufficiently ignorant of indexing in the first place such that they are unable to determine the quality differences.
Oh, I should mention that there are software programs that human indexers use to simplify and speed up the mechanics of the index process. For example, it would be silly to disallow a computer to alphabetize the entries, reformat the index, and manipulate page numbers. There are a few software packages that do this exclusively, which are considered top of the line; other applications that have indexing capabilities, such as Microsoft Word and Adobe FrameMaker, have some of these capabilities, with notable limitations.
For information on the various software available, see http://www.asindexing.org/site/software.shtml.
DUBLIN CORE ELEMENTS
D.C Title: Notes on Automatic Indexing
D.C Creator: Seth A. Maislin
D.C Subject: Automated indexing software, automatic index, index process
D.C Description: Limits of the automatic indexing
D.C Publisher: Seth A. Maislin
D.C Contributor
D.C Date: 2005-03-15
D.C Type: article
D.C Format: html
D.C Identifier : http://taxonomist.tripod.com/indexing/autoindex.html
D.C Source
D.C Language:en
D.C Relation
D.C Coverage
D.C Rights: Seth A. Maislin
Automatic Abstracting & Summarizing Tools
Vimal Kumar Varun
Scientist 'D'
Department of Scientific & Industrial Research
Technology Bhawan, New Mehrauli Road, New Delhi-110016. INDIA
Internet: vkv@alpha.nic.in URL: http://vkv.tripod.com
Information Today & Tomorrow, Vol. 21, No. 2, June 2002, pp. 12-16
http://itt.nissat.tripod.com/itt0202/ruoi0202.htm
Preamble
Information overload is becoming a problem for an increasingly large number of people, and a key step in reducing it is to use an abridgement tool. A summary tailored to your interests provides a convenient way to get a quick impression of a document's content. This may help you decide whether to read the full document, or even serve as an alternative to the original, thus saving valuable time.
Summarization is the process of condensing a source text into a shorter version preserving its information content. There are two kinds of automatic summarization. The first summarizes whole documents, either by extracting important sentences or by rephrasing and shortening the original text. Most summarization tools currently under development extract key passages or topic sentences, rather than rephrasing the document. Rephrasing is a much more difficult task. The second process summarizes across multiple documents. Cross-document summarization is harder, but potentially more valuable. It will increase the value of alerting services by condensing retrieved information into smaller, more manageable reports. Cross-document summarization will allow us to deliver very brief overviews of new developments to busy clients. We can expect some tools to do this within the next 2-4 years.
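Most of the tools surveyed below extract sentences rather than rephrase them. A minimal sketch of single-document extraction in the spirit of classic word-frequency methods, not a reconstruction of any product listed here: each sentence is scored by the average document frequency of its content words, and the top n sentences are returned in their original order. The regex sentence splitter and the tiny stopword list are simplifying assumptions.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for",
             "on", "that", "this", "with", "as", "are", "be", "by", "or"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]

def summarize(text, n=2):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # How often each content word occurs in the whole document.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

Because such a method works only at the word (surface) level, it selects topic-bearing sentences but cannot fuse or rephrase them, which is the gap between extraction and true abstraction discussed later under SUMMARIST.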
Brevity Document Summarizer, concise outlines of your documents
http://www.lextek.com/brevity/
Brevity generates document summaries of any desired length, short or long. It can also highlight key sentences or words in the document. The key benefits of Brevity are: accurate generation of automated document summaries, quick determination of a document's contents, highlighting of significant words and sentences, and identification of the key parts of a document. A demo of the summarizer is available at http://www.lextek.com/brevity/bravedemo.htm.
Contact: Lextek International at sales@Lextek.com for more information.
Copernic Summarizer, free yourself from information overload
http://www.copernic.com/products/summarizer/index.html
This easy-to-use summarizing software increases productivity and efficiency by creating concise summaries of any document or Web page that retain its important information. It can be invoked directly from applications such as MS Word, MS Outlook, Eudora, Netscape and Adobe Acrobat.
Using sophisticated statistical and linguistic algorithms, it pinpoints the key concepts and extracts the most relevant sentences, producing a shorter, condensed version of the original text. A complete feature list is available at http://www.copernic.com/products/summarizer/features.html. A free, fully functional 30-day trial version of Copernic Summarizer is available at http://www.copernic.com/products/summarizer/download.html.
Extractor, text summarization software for automatic indexing and abstracting
http://extractor.iit.nrc.ca/
Extractor is software for automatically summarizing text, developed by the Interactive Information Group. It takes a text file as input and generates a list of key words and sentences as output. An on-line demo (also for German texts) is available at the site.
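The key-word half of such output can be illustrated with a deliberately simple heuristic. This is not Extractor's own method, which the description does not give: in this hypothetical sketch, candidate phrases are runs of consecutive non-stopwords inside a sentence, ranked by how often they occur. The stopword list and the function name key_phrases are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on",
             "with", "as", "by", "or", "from", "are", "it", "its", "this"}

def key_phrases(text, k=3):
    """Return up to k candidate phrases, most frequent first."""
    counts = Counter()
    # Punctuation ends a phrase, so phrases never cross sentence breaks.
    for chunk in re.split(r"[.,;:!?()]", text.lower()):
        phrase = []
        for word in re.findall(r"[a-z]+", chunk):
            if word in STOPWORDS:
                if phrase:
                    counts[" ".join(phrase)] += 1
                phrase = []
            else:
                phrase.append(word)
        if phrase:
            counts[" ".join(phrase)] += 1
    return [p for p, _ in counts.most_common(k)]
```

On "Automatic indexing of text. Automatic indexing and abstracting. Abstracting of text by automatic indexing." the top phrase is "automatic indexing" (three occurrences), followed by "text" and "abstracting".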
HyperGen Summarization Tool
http://crl.nmsu.edu/Research/Projects/minds/core_summarizer/
HyperGen exploits hypertext technology to automatically generate hypertext structure from a plain or hypertext document. Every part of the document is summarized. The different summaries are linked together in a hypertext structure where each hyperlink is labeled meaningfully. HyperGen has implemented preliminary ideas for generating meaningful labels by identifying key topics and rhetorical types.
A presentation on "Hypertext Summary Extraction for Fast Document Browsing" by Kavi Mahesh is available at http://crl.nmsu.edu/Research/Projects/minds/core_summarizer/talk/. This presentation includes slides and several examples of HyperGen's plain and hypertext summaries with corresponding summaries from Microsoft's summarization tool for comparison.
The key features of HyperGen include hypertext summarization of documents, automatic generation of hypertext summaries with multiple layers of detail from plain or hypertext documents, and generation of meaningful labels for hyperlinks. HyperGen is multilingual; it is ideal for document filtering, browsing and document content visualization; it can be used with any web browser; it can easily be integrated with extraction, retrieval and machine translation systems; and it is implemented entirely in Java.
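HyperGen itself is written in Java, and its internals are not described here. Purely to illustrate what "multiple layers of detail" linked by hyperlinks can look like, here is a hypothetical Python sketch: the top layer lists each section's title and one-sentence summary as a link, and the second layer holds the full sections with matching anchors. Using the first sentence as the summary is an assumption, a crude stand-in for HyperGen's topic and rhetorical-type labels.

```python
import html
import re

def first_sentence(text):
    """Crude one-line summary: text up to the first '.', '!' or '?'."""
    match = re.match(r"(.+?[.!?])(?:\s|$)", text.strip())
    return match.group(1) if match else text.strip()

def layered_summary(sections):
    """Build a two-layer HTML page from a list of (title, text) pairs.

    Layer 1: a list of links, each labelled with the section title and
    its one-sentence summary. Layer 2: the full sections, each with an
    id that the layer-1 links point to.
    """
    top, bodies = [], []
    for i, (title, text) in enumerate(sections, start=1):
        anchor = f"sec{i}"
        title_html = html.escape(title)
        top.append(f'<li><a href="#{anchor}">{title_html}</a>: '
                   f"{html.escape(first_sentence(text))}</li>")
        bodies.append(f'<section id="{anchor}"><h2>{title_html}</h2>'
                      f"<p>{html.escape(text)}</p></section>")
    return "<ul>\n" + "\n".join(top) + "\n</ul>\n" + "\n".join(bodies)
```

The output is ordinary HTML, so, like HyperGen's summaries, it can be opened in any web browser.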
Contact: Kavi Mahesh at mahesh@crl.nmsu.edu for more information.
Intelligent Miner for Text: summarization tool
http://www-3.ibm.com/software/data/iminer/fortext/summarize/summarize.html
The summarization tool automatically extracts the most relevant sentences from a document and creates a summary from them. It uses a set of ranking strategies at the sentence and word levels to calculate the relevance of each sentence to the document. The user can set the length of the summary.
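The combination of word-level and sentence-level ranking with a user-set length can be illustrated with a hypothetical sketch. It does not reproduce IBM's ranking strategies, which are not described here. It mixes a word-frequency score (word level) with a position score (sentence level), and the user sets the summary length as a fraction of the document's sentences. The linear mix, the default weight of 0.3 and the small stopword list are all assumptions.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "was", "is"}

def rank_sentences(sentences, position_weight=0.3):
    """Mix a word-level and a sentence-level score for each sentence.

    Word level: mean document frequency of the sentence's content words,
    scaled so the best sentence gets 1.0.
    Sentence level: position, with the first sentence scoring 1.0.
    """
    tokens = [[w for w in re.findall(r"[a-z]+", s.lower())
               if w not in STOPWORDS] for s in sentences]
    freq = Counter(w for ws in tokens for w in ws)
    word_level = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0
                  for ws in tokens]
    best = max(word_level, default=0.0) or 1.0
    n = len(sentences)
    return [(1 - position_weight) * word_level[i] / best
            + position_weight * (1.0 - i / n)
            for i in range(n)]

def summarize_ratio(sentences, ratio=0.3, position_weight=0.3):
    """Keep ceil(ratio * len(sentences)) sentences, in original order."""
    k = max(1, math.ceil(ratio * len(sentences)))
    scores = rank_sentences(sentences, position_weight)
    chosen = sorted(range(len(sentences)), key=lambda i: scores[i],
                    reverse=True)[:k]
    return [sentences[i] for i in sorted(chosen)]
```

Setting position_weight to 0 makes the ranking purely word-based; raising it favours leading sentences, which suits news-style texts.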
IBM Intelligent Miner for Text, available at http://www-3.ibm.com/software/data/iminer/, includes text analysis tools such as a Feature Extraction tool, Clustering tools, a Summarization tool and a Categorization tool. It also incorporates the IBM Text Search Engine, NetQuestion Solution (an Internet/intranet text-search solution), and the IBM Web Crawler Package.
Inxight Summarizer, systems for the automatic production of text summaries
http://www.inxight.com/products/summarizer/
The Inxight Summarizer™ SDK (software development kit) allows application developers to incorporate into their products an intelligent solution to many problems inherent in online searching. By focusing on the key sentences within a document, the Summarizer technology enables end-users to browse quickly through volumes of information and pick out the documents best suited to their search requirements. Summarizer applies consistent sentence-selection criteria that match the conceptual content of documents. End-users save time and effort because they do not have to download and read each retrieved document to determine its relevance. They experience easier navigation through Web sites, faster access to pertinent information and increased productivity.
Summarizer can summarize a typical document in a fraction of a second, so users can spend more of their time using data rather than trying to find it. To speed up searching further, the Summarizer can be "trained" to find key sentences based on the structure of specific document types. Output can be specified by the number of key sentences or key phrases, and the end-user can control phrase weighting through query phrases or drop phrases.
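The description does not say how Inxight's training works. To illustrate what training on "the structure of specific document types" could mean, here is a hypothetical sketch in the spirit of trainable summarizers such as Kupiec, Pedersen and Chen (1995), reduced to a single structural feature. From example documents whose key sentences are marked, it estimates how often a first, middle or last sentence is a key sentence, then extracts new key sentences by those estimates. The three position buckets and the add-one smoothing are assumptions.

```python
from collections import Counter

BUCKETS = ("first", "middle", "last")

def position_bucket(i, n):
    """Coarse structural feature: first, last or middle sentence."""
    if i == 0:
        return "first"
    if i == n - 1:
        return "last"
    return "middle"

def train(examples):
    """Estimate P(key sentence | position bucket) from labelled documents.

    `examples` is a list of (sentences, key_indices) pairs. Add-one
    (Laplace) smoothing keeps rare buckets away from 0 and 1.
    """
    key, total = Counter(), Counter()
    for sentences, key_indices in examples:
        n = len(sentences)
        for i in range(n):
            bucket = position_bucket(i, n)
            total[bucket] += 1
            if i in key_indices:
                key[bucket] += 1
    return {b: (key[b] + 1) / (total[b] + 2) for b in BUCKETS}

def extract(sentences, model, k=1):
    """Return the k sentences whose position bucket scores highest."""
    n = len(sentences)
    ranked = sorted(range(n), key=lambda i: model[position_bucket(i, n)],
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Trained on report-like documents whose conclusions come last, the model learns to prefer final sentences; trained on news stories, it would instead learn to prefer leads.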
MultiGen
http://www.cs.columbia.edu/~regina/demo4/
MultiGen is a multi-document summarization tool developed at Columbia University. Multiple document summarization could be useful, for example, in the context of large information retrieval systems to help determine which documents are relevant. Such summaries can cut down on the amount of reading by synthesizing information common among all retrieved documents and by explicitly highlighting distinctions.
It automatically generates a concise summary by identifying similarities and differences across a set of related documents. The input to the system is a set of related documents, such as those retrieved by a search engine in response to a particular query. Examples of MultiGen summaries are available at http://www.cs.columbia.edu/~regina/demo4/examples.html.
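MultiGen's actual pipeline is not described here. As a much cruder illustration of finding "similarities and differences across a set of related documents", this hypothetical sketch treats content words found in every document as the common core, and words found in only one document as that document's distinctive contribution. Working with word sets rather than sentences or paragraphs, and the stopword list, are simplifying assumptions.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "was", "were",
             "is", "by", "at", "for", "after"}

def content_set(text):
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}

def compare_documents(docs):
    """Return (common, distinct) for a dict {doc_id: text}.

    common: content words present in every document.
    distinct: {doc_id: words that appear in no other document}.
    """
    sets = {doc_id: content_set(text) for doc_id, text in docs.items()}
    common = set.intersection(*sets.values())
    distinct = {}
    for doc_id, words in sets.items():
        others = set().union(*(s for d, s in sets.items() if d != doc_id))
        distinct[doc_id] = words - others
    return common, distinct
```

For three wire reports of the same earthquake, the common core would contain words such as earthquake, struck and city, while a report mentioning collapsed bridges would contribute bridges and collapsed as its distinctive information.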
Contact: Principal investigators Prof. Kathleen R. McKeown and Dr. Judith L. Klavans at kathy@cs.columbia.edu and klavans@cs.columbia.edu respectively for more information.
Pertinence Automatic Summary or Abstract
http://www.pertinence.net/index_en.html
Pertinence is a data-processing tool that transforms a source text into a shorter version while keeping the relevant information intact. Several text formats are accepted, including HTML, PDF and MS Word. The original text can be in one of several languages, such as English, French, Spanish, Italian, Portuguese, German, Chinese, Japanese and Korean. Input can be a file on your hard disk, a file from the Net, or text that you cut and paste. Supported subject domains are chemistry, finance, law and medicine; for other domains, contact Pertinence at contact@pertinence.net.
Pertinence can be tried free of charge at http://www.pertinence.net/index_en.html after registering at http://www.pertinence.net/register_en.html with a name, organization and email address. On submission, a password of the preferred length (2-8 characters) is sent by email. Internet Explorer 5+ or Netscape 6+ is recommended.
Contact: Authors A Lehmam and P Bouvet at lehmam@pertinence.net and bouvet@pertinence.net respectively for more information.
Sinope Summarizer, automatic text summarizer
http://www.carp-technologies.nl/en/sinope/
The Sinope Summarizer integrates with Microsoft Internet Explorer and summarizes the text of the Web page being viewed. The summary level can be adjusted from 1 to 100 percent. The tool keeps pictures and other formatting details intact. It is available for English, German and Dutch text. A must-have for anyone who surfs the Internet. A 30-day shareware trial version of Sinope Summarizer Personal Edition is available at http://www.carp-technologies.nl/en/sinope/downloads.html.
Sinope Summarizer Personal Edition
Generate summaries with Sinope Summarizer
The Sinope Summarizer is a summarizing tool for professionals. It automatically generates summaries of arbitrary texts while fully retaining images, formatting and page layout. The Sinope Summarizer uses advanced language technologies to determine what the text is about and which information elements are important.
Summarize web pages while browsing the Internet
The Sinope Summarizer Personal Edition integrates with Microsoft Internet Explorer and enables users to summarize Web pages while browsing the Internet. It understands English, German and Dutch texts (more languages will be supported in the near future). It also provides a tool for summarizing saved HTML and plain-text files, and a Clipboard Summarizer for summarizing the contents of the Windows clipboard.
Generating summaries is as easy as dragging a slider!
The Sinope Summarizer gives the user complete control over the summary length. Generating and viewing a summary is as easy as dragging a slider!
SUMMARIST, software that produces excerpts from texts
http://www.isi.edu/natural-language/projects/SUMMARIST.html
SUMMARIST is an attempt to develop robust extraction technology as far as it can go and then continue research and development of techniques to perform abstraction. This work faces the depth vs. robustness tradeoff: either systems analyze/interpret the input deeply enough to produce good summaries (but are limited to small application domains), or they work robustly over more or less unrestricted text (but cannot analyze deeply enough to fuse the input into a true summary, and hence perform only topic extraction). In particular, symbolic techniques, using parsers, grammars, and semantic representations, do not scale up to real-world size, while Information Retrieval and other statistical techniques, being based on word counting and word clustering, cannot create true summaries because they operate at the word (surface) level instead of at the concept level.
To date, SUMMARIST produces extract summaries in five languages and has been linked to translation engines for these languages in the MuST system at http://www.isi.edu/~cyl/must/must_beta.htm. Work is underway both to extend the extract-based capabilities of SUMMARIST and to build up the large knowledge collection required for inference-based abstraction. The project members include: Eduard Hovy, senior project leader at http://www.isi.edu/natural-language/people/hovy.html; Chin-Yew Lin, research scientist at http://www.isi.edu/~cyl; and Daniel Marcu, research scientist at http://www.isi.edu/~marcu/.
TextAnalyst, Text Mining system for automatic indexing and abstracting
http://www.megaputer.com/products/ta/index.php3
TextAnalyst 2.0, first delivered at the beginning of 1999 by Megaputer Intelligence Inc., is software for automated semantic analysis of natural-language texts. The system helps the user quickly summarize, efficiently navigate and cluster documents in a textbase, as well as perform semantic information retrieval over unstructured texts.
Download the TextAnalyst presentation and brochure from http://www.megaputer.com/down/tm/Text_Mining.pps and http://www.megaputer.com/down/tm/ta/docs/textanalyst_brochure.pdf respectively. The TextAnalyst tutorial is available at http://www.megaputer.com/products/ta/tutorial/ta_tutorial.zip. Free evaluation copies can be downloaded from http://www.megaputer.com/php/eval.php3.
TexNet32
http://instruct.uwo.ca/gplis/677/texnet32/texnet32.htm
TexNet32 is freeware by Professor Tim Craven for the semi-automatic production of abstracts. It assists in writing abstracts and other short summaries, offering word and phrase extraction among various other capabilities.
TexNet32 is a 32-bit version of the TexNetF text network management system. Like TexNetF, it provides users with special tools designed to assist in writing conventional abstracts. The model of a hybrid abstracting system in which some tasks are performed by human abstractors and others by software seems to deliver the best results at this stage of technology development. TexNet32 generally uses typical Windows 95 interface elements, supporting keyboard and mouse, menus, and some accelerator keys.
The TexNet32 main window contains a menu bar and other windows that belong to the program: Full text, Parameters, Paragraph weights, Ancillary lists, Words in full text, Extract, Notes, and Abstract. Some of these are initially minimized. None can be closed before the main window is closed; if you attempt to close any of them, it will just be minimized. Contents of the menu bar and its pull-down menus vary with the kind of window that is active. You cannot close either of the minimum two "Editing" windows except by ending the session.
The currently active window is identified by the colour of its caption bar. To activate a window, click on it or select it from the "Window" menu. Windows can be resized with the usual Windows 95 operations. Note that all operations apply to the currently active window; this is especially important to remember when opening a source text. Recent updates to TexNet32 are described at http://instruct.uwo.ca/gplis/677/texnet32/texnetup.htm, and the update can be downloaded from http://instruct.uwo.ca/gplis/677/texnet32/texnet33.exe.
Contact: Prof Tim Craven at craven@uwo.ca or visit http://publish.uwo.ca/~craven/index.htm for more information.
ViewSum
http://www.viewsum.com
ViewSum is a text summarization tool that can provide a personalized summary of any document. Depending on your needs, it can condense a document by any amount, even to a single sentence or a set of keywords. Its key advantages over many other summarizers are that it takes your specified interests and preferences into account when generating a summary, so results are tailored to your needs, and that summaries are made from complete sentences.
ViewSum supports drag and drop of over 200 different document formats, and can be integrated into leading applications, such as Microsoft Word, Outlook and Internet Explorer. A Quick Help guide giving an overview of ViewSum is available at http://193.113.58.107/ViewSum/overview.htm.
Zentext Summarizer
http://www.zentext.com/z_product_summarizer.html
Zentext Summarizer Lite lets you summarize large amounts of text instantly and intelligently, free of charge. It can be tried online at http://www.zentext.com/z_product_summarizer.html by pasting in the text to be summarized and specifying the number of sentences required in the summary. The site also offers a very small summarizer utility that can be downloaded from http://www.zentext.com/summarizer/summarizer.exe.
To run this utility, the computer must have a Java Virtual Machine installed. The utility recasts sentences into a few longer sentences, producing a summary in the true sense rather than a mere extract.
DUBLIN CORE ELEMENTS
D.C Title: Automatic Abstracting & Summarizing Tools
D.C Creator: Vimal Kumar Varun
D.C Subject: Information overload, Summarization, Abstracting, Cross-document summarization, Automatic abstracting tools, summarizing tools
D.C Description: Describes automatic abstracting & summarizing tools like Brevity Document Summarizer, Copernic Summarizer, Extractor, HyperGen Summarization Tool, Intelligent Miner for Text: Summarization Tool, Inxight Summarizer, MultiGen, Pertinence Automatic Summariser, Sinope Summarisers, Summarist, TextAnalyst, TexNet32, ViewSum, Zentext Summarizer
D.C Publisher: Information Today & Tomorrow
D.C Contributor
D.C Date: 2002-06
D.C Type: article
D.C Format: html
D.C Identifier : http://itt.nissat.tripod.com/itt0202/ruoi0202.htm
D.C Source
D.C Language: en
D.C Relation
D.C Coverage
D.C Rights: Information Today & Tomorrow
Scientist 'D'
Department of Scientific & Industrial Research
Technology Bhawan, New Mehrauli Road, New Delhi-110016. INDIA
Internet: vkv@alpha.nic.in URL: http://vkv.tripod.com
Information Today & Tomorrow, Vol. 21, No. 2, June 2002, p.12-p.16
http://itt.nissat.tripod.com/itt0202/ruoi0202.htm
Preamble
Information overload is becoming a problem for an increasingly large number of people, and a key step in reducing it is to use an abridgement tool. A summary tailored to your interests provides a convenient way to get a quick impression of a document's content. It may help you decide whether to read the full document, or even serve as an alternative to the original, saving valuable time.
Summarization is the process of condensing a source text into a shorter version preserving its information content. There are two kinds of automatic summarization. The first summarizes whole documents, either by extracting important sentences or by rephrasing and shortening the original text. Most summarization tools currently under development extract key passages or topic sentences, rather than rephrasing the document. Rephrasing is a much more difficult task. The second process summarizes across multiple documents. Cross-document summarization is harder, but potentially more valuable. It will increase the value of alerting services by condensing retrieved information into smaller, more manageable reports. Cross-document summarization will allow us to deliver very brief overviews of new developments to busy clients. We can expect some tools to do this within the next 2-4 years.
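Sentence extraction, the approach most of the tools described here take, can be illustrated with a minimal Python sketch in the spirit of classic word-frequency methods: each sentence is scored by the average frequency of its content words, and the top-scoring sentences are returned in their original order. The stopword list, the scoring rule and the function names are assumptions for illustration; none of the products reviewed here is claimed to work exactly this way.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "with", "as", "by", "this", "that", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence):
    """Lower-case word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average, not sum, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


doc = ("Summarization condenses text. Summarization tools extract key sentences. "
       "The weather was pleasant. Key sentences carry the main summarization content.")
print(extract_summary(doc, 1))  # ['Summarization tools extract key sentences.']
```

Rephrasing-based summarization, by contrast, would require generating new text and is not attempted in this sketch.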
Brevity Document Summarizer, concise outlines of your documents
http://www.lextek.com/brevity/
Brevity easily generates document summaries, which can be as long or as short as one wishes. It can also highlight key sentences or words in a document. The key benefits of Brevity are accurate automated document summaries, quick determination of a document's contents, highlighting of significant words and sentences, and identification of the key parts of a document. A demo of the summarizer is available at http://www.lextek.com/brevity/bravedemo.htm.
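Highlighting significant words, one of the Brevity features listed above, can be sketched by treating the most frequent non-stopwords as the significant ones and marking every occurrence in the text. This frequency-only criterion is an assumption, not Lextek's algorithm; the `**` marker and the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def top_words(text, k=3):
    """The k most frequent non-stopwords; ties keep first-occurrence order."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def highlight(text, k=3, mark="**"):
    """Wrap every occurrence (any letter case) of the k most frequent words in `mark`."""
    keys = set(top_words(text, k))

    def wrap(match):
        word = match.group(0)
        return f"{mark}{word}{mark}" if word.lower() in keys else word

    return re.sub(r"[A-Za-z]+", wrap, text)


sample = "Brevity finds key sentences. Key sentences and key words matter."
print(highlight(sample, k=2))
# Brevity finds **key** **sentences**. **Key** **sentences** and **key** words matter.
```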
Contact: Lextek International at sales@Lextek.com for more information
Copernic Summarizer, free yourself from information overload
http://www.copernic.com/products/summarizer/index.html
This easy-to-use summarizing software increases productivity and efficiency by creating concise summaries of any document or Web page that retain its important information. It can be invoked directly from applications such as MS Word, MS Outlook, Eudora, Netscape and Adobe Acrobat.
Using statistical and linguistic algorithms, it pinpoints the key concepts and extracts the most relevant sentences, producing a shorter, condensed version of the original text. A complete feature list is available at http://www.copernic.com/products/summarizer/features.html. A free trial version of Copernic Summarizer, fully functional for 30 days, is available at http://www.copernic.com/products/summarizer/download.html.
Extractor, text summarization software for automatic indexing and abstracting
http://extractor.iit.nrc.ca/
Extractor is software for automatically summarizing text, developed by the Interactive Information Group. It takes a text file as input and generates a list of key words and sentences as output. An online demo, which also handles German texts, is available at the site.
HyperGen Summarization Tool
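A list of key words and phrases like the one Extractor outputs can be approximated with a RAKE-style (Rapid Automatic Keyword Extraction) sketch: candidate phrases are runs of words between stopwords or punctuation, each word is scored by its degree divided by its frequency, and a phrase scores the sum of its word scores. This is a swapped-in, well-known technique, not Extractor's own method; the stopword list and the names are assumptions.

```python
import re
from collections import defaultdict

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "with", "as", "by", "from", "into", "that"}


def candidate_phrases(text):
    """Runs of non-stopwords; punctuation and stopwords both end a phrase."""
    phrases = []
    for chunk in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z]+", chunk):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keyphrases(text, k=3):
    """Rank phrases by the sum of their words' degree/frequency scores."""
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences within the phrase, incl. itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {p: sum(word_score[w] for w in p) for p in phrases}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [" ".join(p) for p in ranked[:k]]


text = "Automatic indexing and abstracting of text. Extractor produces automatic indexing output."
print(rake_keyphrases(text, 2))
# ['extractor produces automatic indexing output', 'automatic indexing']
```

RAKE tends to favour longer phrases, so practical systems usually cap phrase length or require a minimum frequency.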
http://crl.nmsu.edu/Research/Projects/minds/core_sumamrizer/
HyperGen exploits hypertext technology to automatically generate hypertext structure from a plain or hypertext document. Every part of the document is summarized. The different summaries are linked together in a hypertext structure where each hyperlink is labeled meaningfully. HyperGen has implemented preliminary ideas for generating meaningful labels by identifying key topics and rhetorical types.
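Summaries at several layers of detail, linked by hyperlinks, can be sketched as nested extracts built from one fixed sentence ranking, so each deeper layer contains the previous one, rendered as HTML sections that each link to the next layer. The sentence scores are taken as input, the generic "more detail" label stands in for HyperGen's meaningful labels, and all names are hypothetical.

```python
from html import escape


def layered_summaries(sentences, scores, sizes=(1, 3)):
    """Layer i keeps the top sizes[i] sentences by score, in original order.

    Every layer is a prefix of one fixed ranking, so each layer contains all
    sentences of the shallower layers before it.
    """
    ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [[sentences[i] for i in sorted(ranking[:n])] for n in sizes]


def to_hypertext(layers):
    """Render layers as HTML sections, each linking to the next, more detailed one."""
    parts = []
    for depth, layer in enumerate(layers):
        body = " ".join(escape(s) for s in layer)
        link = (f' <a href="#level{depth + 1}">more detail</a>'
                if depth + 1 < len(layers) else "")
        parts.append(f'<section id="level{depth}"><p>{body}{link}</p></section>')
    return "\n".join(parts)


layers = layered_summaries(["A.", "B.", "C.", "D."], [0.2, 0.9, 0.5, 0.1])
print(layers)  # [['B.'], ['A.', 'B.', 'C.']]
print(to_hypertext(layers))
```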
A presentation on "Hypertext Summary Extraction for Fast Document Browsing" by Kavi Mahesh is available at http://crl.nmsu.edu/Research/Projects/minds/core_summarizer/talk/. It includes slides and several examples of HyperGen's plain and hypertext summaries, with corresponding summaries from Microsoft's summarization tool for comparison.
The key features of HyperGen include hypertext summarization of documents, automatic generation of hypertext summaries with multiple layers of detail from plain or hypertext documents, and generation of meaningful labels for hyperlinks. HyperGen is multilingual; it is ideal for document filtering, browsing, or document content visualization; it can be used in conjunction with any web browser; it can be easily integrated with extraction, retrieval, and machine translation systems; and it is implemented entirely in Java.
Contact: Kavi Mahesh at mahesh@crl.nmsu.edu for more information.
Intelligent Miner for Text: summarization tool
http://www-3.ibm.com/software/data/iminer/fortext/summarize/summarize.html
The summarization tool automatically extracts the most relevant sentences from a document and creates a summary from them, using a set of ranking strategies at the sentence and word levels to calculate the relevance of each sentence to the document. The user can set the length of the summary.
IBM Intelligent Miner for Text, available at http://www-3.ibm.com/software/data/iminer/, includes text analysis tools such as a Feature Extraction tool, Clustering tools, a Summarization tool, and a Categorization tool. It also incorporates the IBM Text Search Engine, NetQuestion Solution (an Internet/intranet text-search solution), and the IBM Web Crawler Package.
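Combining word-level and sentence-level evidence with a user-set summary length might look like the following sketch, which mixes a normalised word-frequency score with a position score (earlier sentences rank higher) and keeps a chosen fraction of the sentences. The weighting parameter `alpha`, the position heuristic and the stopword list are assumptions, not IBM's actual ranking strategies.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "at", "is", "was"}


def tokens(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def rank_scores(sentences, alpha=0.7):
    """Mix a word-level and a sentence-level score (weights are assumptions).

    word level:     mean document frequency of the sentence's words, scaled to [0, 1]
    sentence level: position, 1.0 for the first sentence falling linearly to 0.0
    """
    toks = [tokens(s) for s in sentences]
    freq = Counter(w for t in toks for w in t)
    word_level = [sum(freq[w] for w in t) / len(t) if t else 0.0 for t in toks]
    top = max(word_level, default=0.0) or 1.0
    n = len(sentences)
    scores = []
    for i, wl in enumerate(word_level):
        position = 1.0 - i / (n - 1) if n > 1 else 1.0
        scores.append(alpha * wl / top + (1 - alpha) * position)
    return scores


def summarize(sentences, ratio=0.5, alpha=0.7):
    """Keep ceil(ratio * n) sentences (at least one), in original order."""
    if not sentences:
        return []
    scores = rank_scores(sentences, alpha)
    k = max(1, math.ceil(ratio * len(sentences)))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


sents = ["Text mining finds patterns in text.",
         "Mining text at scale needs text tools.",
         "Lunch was late."]
print(summarize(sents, ratio=0.5))  # keeps the first two sentences
```

Here `ratio` plays the role of the user-set summary length; setting `alpha` to 1.0 ignores position entirely.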
Inxight Summarizer, systems for the automatic production of text summaries
http://www.inxight.com/products/summarizer/
The Inxight Summarizer™ SDK (software development kit) allows application developers to incorporate into their products an intelligent solution to many problems inherent in online searching. By focusing on the relevant key sentences contained within a document, the Summarizer technology enables end users to browse quickly through volumes of information and extract the documents most suitable to their search requirements. Summarizer uses consistent sentence-selection criteria that match the conceptual content of documents. End users save time and effort because they do not have to download and read each retrieved document to determine its relevance. They experience easier navigation through Web sites, faster access to pertinent information and increased productivity.
Summarizer can summarize a typical document in a fraction of a second, so users can spend more of their time using data rather than just trying to find it. To expedite search functions, the Summarizer can also be "trained" to find key sentences based on the structure of specific document types. Information is accessible by the length of key sentences or the number of key phrases. The end user can control phrase weighting by supplying query phrases or drop phrases.
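Query phrases and drop phrases, as described for the Inxight Summarizer, can be illustrated by a sketch in which each query phrase found in a sentence adds a fixed boost to a simple lead-position score, while any sentence containing a drop phrase is excluded. The scoring scheme, the plain substring matching and the parameter names are assumptions, not the Inxight SDK interface.

```python
def query_biased_summary(sentences, query_phrases=(), drop_phrases=(), k=2, boost=1.0):
    """Pick k sentences, favouring query phrases and excluding drop phrases.

    Base score is a lead bias (earlier sentences score higher); each query
    phrase found in a sentence adds `boost`. Matching is case-insensitive
    substring matching, so "price" also matches "prices".
    """
    n = len(sentences)
    candidates = []
    for i, sentence in enumerate(sentences):
        text = sentence.lower()
        if any(p.lower() in text for p in drop_phrases):
            continue  # a drop phrase removes the sentence outright
        base = (n - i) / n
        hits = sum(p.lower() in text for p in query_phrases)
        candidates.append((base + boost * hits, i))
    best = sorted(candidates, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda c: c[1])]


news = ["Company profile and history.",
        "Quarterly revenue rose sharply.",
        "Stock price forecasts are speculative.",
        "Revenue guidance for next year."]
print(query_biased_summary(news, ["revenue"], ["speculative"], k=2))
# ['Quarterly revenue rose sharply.', 'Revenue guidance for next year.']
```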
MultiGen
http://www.cs.columbia.edu/~regina/demo4/
MultiGen is a multi-document summarization tool developed at Columbia University. Multiple document summarization could be useful, for example, in the context of large information retrieval systems to help determine which documents are relevant. Such summaries can cut down on the amount of reading by synthesizing information common among all retrieved documents and by explicitly highlighting distinctions.
It automatically generates a concise summary by identifying similarities and differences across a set of related documents. Input to the system is a set of related documents, such as those retrieved by a search engine in response to a particular query. The MultiGen examples are available at http://www.cs.columbia.edu/~regina/demo4/examples.html.
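One simple way to separate information shared across documents from information unique to one of them is to compare sentences across documents by word overlap. This sketch uses the Jaccard similarity of content-word sets with an assumed threshold of 0.5; it is a substitute illustration, not MultiGen's own method, and all names are hypothetical.

```python
import re

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "is", "was", "by", "at"}


def word_set(sentence):
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def common_and_distinct(documents, threshold=0.5):
    """Split sentences into those echoed in another document and those unique to one.

    documents: list of documents, each a list of sentences.
    Returns (common, distinct) as lists of (document index, sentence); a
    sentence is common if some sentence in a *different* document overlaps
    with it by at least `threshold`.
    """
    common, distinct = [], []
    for d, doc in enumerate(documents):
        for sentence in doc:
            ws = word_set(sentence)
            matched = any(
                jaccard(ws, word_set(other)) >= threshold
                for e, other_doc in enumerate(documents) if e != d
                for other in other_doc
            )
            (common if matched else distinct).append((d, sentence))
    return common, distinct


docs = [["The storm hit the coast on Monday.", "Officials praised rescue teams."],
        ["A storm hit the coast Monday.", "Power was lost in three towns."]]
shared, unique = common_and_distinct(docs)
print(shared)  # the two storm sentences, one from each document
print(unique)  # the rescue-teams and power-loss sentences
```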
Contact: Principal investigators Prof. Kathleen R. McKeown and Dr. Judith L. Klavans at kathy@cs.columbia.edu and klavans@cs.columbia.edu respectively for more information.
Pertinence Automatic Summary or Abstract
http://www.pertinence.net/index_en.html
Pertinence is a data-processing tool that transforms a source text into a shorter version while keeping the relevant information intact. Several text formats, including HTML, PDF and MS Word, are accepted. The original text can be in one of several languages: English, French, Spanish, Italian, Portuguese, German, Chinese, Japanese or Korean. Input can be a file on your hard disk, a file from the Net, or text that you cut and paste. The supported subject domains are chemistry, finance, law and medicine; for other domains, contact Pertinence at contact@pertinence.net.
Pertinence can be tried free at http://www.pertinence.net/index_en.html after registering at http://www.pertinence.net/register_en.html with a name, organization and email address. On submission, a password of the preferred length (2-8 characters) is sent by email. Internet Explorer 5+ or Netscape 6+ is recommended.
Contact: Authors A Lehmam and P Bouvet at lehmam@pertinence.net and bouvet@pertinence.net respectively for more information.
Sinope Summarizer, automatic text summarizer
http://www.carp-technologies.nl/en/sinope/
The Sinope Summarizer integrates with Microsoft Internet Explorer and summarizes the text in a Web page. The summary level can be adjusted from 1% to 100%, and the tool keeps pictures and other formatting intact. It is available for English, German and Dutch text. A must-have for anyone who surfs the Internet. A 30-day shareware trial version of Sinope Summarizer Personal Edition is available at http://www.carp-technologies.nl/en/sinope/downloads.html.
Sinope Summarizer Personal Edition
Generate summaries with Sinope Summarizer
The Sinope Summarizer is a summarizing tool for professionals. It automatically generates summaries of arbitrary texts while fully retaining images, formatting and page layout. It uses advanced language technologies to determine what a text is about and which information elements are important.
Summarize web pages while browsing the Internet
The Sinope Summarizer Personal Edition integrates with Microsoft Internet Explorer and enables users to summarize Web pages while browsing the Internet. It understands English, German and Dutch texts (with more languages planned). It also includes a tool for summarizing saved HTML and plain-text files, and a Clipboard Summarizer for summarizing the contents of the Windows clipboard.
Generating summaries is as easy as dragging a slider!
The Sinope Summarizer gives the user complete control over the summary length. Generating and viewing a summary is as easy as dragging a slider!
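A percentage slider that shortens the text while leaving pictures and layout in place, as described for the Sinope Summarizer, can be sketched over a document modelled as a list of blocks: image blocks are always kept, and within text blocks only sentences scoring at or above the cutoff implied by the chosen percentage survive. The block representation, the precomputed scores and the function name are assumptions, not Sinope's implementation.

```python
import math


def slider_summary(blocks, percent):
    """Keep roughly `percent`% of sentences while leaving non-text blocks in place.

    blocks: list of ("img", src) or ("p", [(sentence, score), ...]); the scores
    are assumed to come from some upstream ranking step.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    all_scores = sorted((score for kind, body in blocks if kind == "p"
                         for _, score in body), reverse=True)
    if not all_scores:
        return list(blocks)
    k = max(1, math.ceil(len(all_scores) * percent / 100))
    cutoff = all_scores[k - 1]  # score of the k-th best sentence
    result = []
    for kind, body in blocks:
        if kind != "p":
            result.append((kind, body))  # images and other blocks are kept as-is
            continue
        kept = [(s, sc) for s, sc in body if sc >= cutoff]
        if kept:  # paragraphs left empty are dropped
            result.append((kind, kept))
    return result


page = [("p", [("Intro.", 0.2), ("Main claim.", 0.9)]),
        ("img", "chart.png"),
        ("p", [("Detail.", 0.4), ("Aside.", 0.1)])]
print(slider_summary(page, 50))
```

Sentences tied with the cutoff score are all kept, so the result can slightly exceed the requested percentage.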
Summarist, software that produces excerpts from texts
http://www.isi.edu/natural-language/projects/SUMMARIST.html
SUMMARIST is an attempt to develop robust extraction technology as far as it can go and then continue research and development of techniques to perform abstraction. This work faces the depth vs. robustness tradeoff: either systems analyze/interpret the input deeply enough to produce good summaries (but are limited to small application domains), or they work robustly over more or less unrestricted text (but cannot analyze deeply enough to fuse the input into a true summary, and hence perform only topic extraction). In particular, symbolic techniques, using parsers, grammars, and semantic representations, do not scale up to real-world size, while Information Retrieval and other statistical techniques, being based on word counting and word clustering, cannot create true summaries because they operate at the word (surface) level instead of at the concept level.
To date, SUMMARIST produces extract summaries in five languages and has been linked to translation engines for these languages in the MuST system at http://www.isi.edu/~cyl/must/must_beta.htm. Work is underway both to extend the extract-based capabilities of SUMMARIST and to build up the large knowledge collection required for inference-based abstraction. Project members include Eduard Hovy, senior project leader (http://www.isi.edu/natural-language/people/hovy.html); Chin-Yew Lin, research scientist (http://www.isi.edu/~cyl); and Daniel Marcu, research scientist (http://www.isi.edu/~marcu/).
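The limitation of word counting can be made concrete with a toy example: counting surface words picks the single most repeated word as the topic, whereas first mapping words to concepts lets several different words support one topic. The tiny concept lexicon is hypothetical (a real system would draw on a thesaurus or ontology), and this illustrates the argument only; it is not SUMMARIST's code.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and"}

# Hypothetical concept lexicon mapping words to a more general concept.
CONCEPTS = {"car": "vehicle", "truck": "vehicle", "bus": "vehicle",
            "apple": "fruit", "banana": "fruit"}


def words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def surface_topic(text):
    """Most frequent surface word (the word-counting view)."""
    return Counter(words(text)).most_common(1)[0][0]


def concept_topic(text):
    """Most frequent concept after mapping words through the lexicon."""
    return Counter(CONCEPTS.get(w, w) for w in words(text)).most_common(1)[0][0]


story = "The car passed a truck and a bus. An apple fell. An apple rolled."
print(surface_topic(story))  # apple
print(concept_topic(story))  # vehicle
```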