Exploiting E-mail structure to improve summarization
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.
Main Author: | Lam, Derek Scott, 1979- |
---|---|
Other Authors: | Steven L. Rohall and Chris Schmandt. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2005 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/16846 |
_version_ | 1826198876827156480 |
---|---|
author | Lam, Derek Scott, 1979- |
author2 | Steven L. Rohall and Chris Schmandt. |
author_facet | Steven L. Rohall and Chris Schmandt. Lam, Derek Scott, 1979- |
author_sort | Lam, Derek Scott, 1979- |
collection | MIT |
description | Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. |
first_indexed | 2024-09-23T11:11:23Z |
format | Thesis |
id | mit-1721.1/16846 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T11:11:23Z |
publishDate | 2005 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/16846 2019-04-12T17:09:41Z Exploiting E-mail structure to improve summarization Lam, Derek Scott, 1979- Steven L. Rohall and Chris Schmandt. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 77-81). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. For this thesis, I designed and implemented a system to summarize e-mail messages. The system exploits two aspects of e-mail, thread reply chains and commonly found features, to generate summaries. The system uses existing software designed to summarize single text documents. Such software typically performs best on well-authored, formal documents. E-mail messages, however, are typically neither well-authored nor formal. As a result, existing summarization software typically gives a poor summary of e-mail messages. To remedy this poor performance, the system preprocesses e-mail messages to synthesize new input to this software, so that it will output more useful summaries of e-mail. This preprocessing involves a lightweight, heuristics-based approach to filtering e-mail to remove e-mail signatures, header fields, and quoted parent messages. I also present a heuristics-based approach to identifying and reporting names, dates, and companies found in e-mail messages. Lastly, I discuss conclusions from a pilot user study of my summarization system, and conclude with areas for further investigation. by Derek Scott Lam. M.Eng. 2005-05-19T15:00:27Z 2005-05-19T15:00:27Z 2002 2002 Thesis http://hdl.handle.net/1721.1/16846 51479527 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 81 p. 310153 bytes 309910 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Lam, Derek Scott, 1979- Exploiting E-mail structure to improve summarization |
title | Exploiting E-mail structure to improve summarization |
title_sort | exploiting e mail structure to improve summarization |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/16846 |
work_keys_str_mv | AT lamderekscott1979 exploitingemailstructuretoimprovesummarization |
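
The abstract in this record describes a lightweight, heuristics-based preprocessing step that removes e-mail signatures, header fields, and quoted parent messages before the text reaches a summarizer. The record does not give the thesis's actual rules, so the following is only a minimal sketch of that general idea. The function name `strip_email_noise`, the list of header names, the `>` quote prefix, the "On ... wrote:" attribution pattern, and the `-- ` signature delimiter are all assumptions made for illustration, not details taken from the thesis.

```python
import re

# Hypothetical patterns chosen for illustration; the thesis's own
# heuristics are not given in this catalog record.
HEADER_RE = re.compile(
    r"^(From|To|Cc|Bcc|Subject|Date|Reply-To|Message-ID):", re.IGNORECASE
)
QUOTE_RE = re.compile(r"^\s*>")                    # quoted parent-message lines
ATTRIBUTION_RE = re.compile(r"^On .+ wrote:\s*$")  # line introducing a quote
SIGNATURE_DELIMITER = "--"                         # "-- " once trailing space is stripped


def strip_email_noise(raw: str) -> str:
    """Return the message text without headers, quoted text, or signature."""
    kept = []
    for line in raw.splitlines():
        # Assume everything after the signature delimiter is the signature.
        if line.rstrip() == SIGNATURE_DELIMITER:
            break
        # Drop header fields, quoted lines, and quote attributions.
        if HEADER_RE.match(line) or QUOTE_RE.match(line) or ATTRIBUTION_RE.match(line):
            continue
        kept.append(line)
    # Collapse runs of blank lines left behind by the removed lines.
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(kept))
    return text.strip()


if __name__ == "__main__":
    message = (
        "From: alice@example.com\n"
        "Subject: Lunch\n"
        "\n"
        "On Monday, Bob wrote:\n"
        "> Are you free at noon?\n"
        "\n"
        "Yes, noon works for me.\n"
        "-- \n"
        "Alice\n"
        "Example Corp\n"
    )
    print(strip_email_noise(message))
```

A line-by-line filter like this is cheap and easy to tune, but it is deliberately crude: a body line that happens to begin with `Date:` would be dropped, and signatures without a delimiter line would pass through. The output would then be handed to whatever single-document summarizer is in use.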