Document Analysis with AVL Trees - Demo Applet

The AVL tree is a binary search tree with a balance condition, which requires, for every node, the height of its left and right child can differ no more than 1. Therefore, the balance condition ensures that the AVL tree is maintained as a complete tree after every operation and guarantees O(logN) complexity of running time for searching.

This applet simply demonstrates how two documents can be analyzed in terms of similarity or difference, by using the AVL tree data structure. Indeed, many stand-alone applications or embedded analysis engines have been developed to compare multiple documents and identify their similarity or difference. In this applet, the document analysis is done by using the characteristic sets, such as A B,  A U B, A - B, B - A where A and B are the sets of the words in the documents A and B)

Notes on source codes: The implementation is not dedicated to AVL Tree data structure itself. Instead, the AVL tree class is embodied in the java applet class with some other optional methods for the application. Also, note that the source codes is not efficiently written in terms of complexity of running time, and a DELETE method is omitted since this particular application doesn't need one.   view source code (to download, just right-click on the link and save it as .java file)

Command Syntax :

RESETDELAY number,  INSERT word number,  FIND word BEGIN-DOCUMENT-A document, END-DOCUMENT-A,

BEGIN-DOCUMENT-B
document, END-DOCUMENT, SIMPLE-COMPARE

NOTE : All commands should be capitalized, and each command and parameter should be separated by a single space.

Execute the following commands to test INSERT, FIND, and AVL tree rotations.
(COPY and PASTE the whole series of commands into the Text Area. Then, click on the Execute button)

Execute the following commands to test AVL trees of two documents, the elimination of HTML tag and nonalphabetic characters, and comparison of two documents.


by Hyung-Joon Kim, CSE 373, University of Washington