Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees

A binary search tree can be used to store data in a computer system for retrieval by name. Different elements in the tree may be referenced with different probabilities. If we define the cost of the tree as the average number of elements which must be examined in searching for an element, then diffe...

Full description

Bibliographic Details
Main Author: Bayer, Paul J.
Other Authors: Rivest, Ronald L.
Published: 2023
Online Access:https://hdl.handle.net/1721.1/148897
_version_ 1811097344450494464
author Bayer, Paul J.
author2 Rivest, Ronald L.
author_facet Rivest, Ronald L.
Bayer, Paul J.
author_sort Bayer, Paul J.
collection MIT
description A binary search tree can be used to store data in a computer system for retrieval by name. Different elements in the tree may be referenced with different probabilities. If we define the cost of the tree as the average number of elements which must be examined in searching for an element, then different trees have different costs. We show that two particular types of trees, weight balanced trees and min-max trees, which are easily constructed from the probability distribution on the elements, are close to optimal. Specifically, we show that for any probability distribution with entropy H, H-log2H-(log2e-1)<=Copt<= Cwb ,+ H+2/Cmm,+H+2 where Copt, Cwb, and Cmm are the optimal, weigh balances, and min-max costs. We gain some added insight by deriving an expression for the expected value of the entropy of a random probability distribution.
first_indexed 2024-09-23T16:58:06Z
id mit-1721.1/148897
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T16:58:06Z
publishDate 2023
record_format dspace
spelling mit-1721.1/1488972023-03-30T03:02:03Z Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees Bayer, Paul J. Rivest, Ronald L. A binary search tree can be used to store data in a computer system for retrieval by name. Different elements in the tree may be referenced with different probabilities. If we define the cost of the tree as the average number of elements which must be examined in searching for an element, then different trees have different costs. We show that two particular types of trees, weight balanced trees and min-max trees, which are easily constructed from the probability distribution on the elements, are close to optimal. Specifically, we show that for any probability distribution with entropy H, H-log2H-(log2e-1)<=Copt<= Cwb ,+ H+2/Cmm,+H+2 where Copt, Cwb, and Cmm are the optimal, weigh balances, and min-max costs. We gain some added insight by deriving an expression for the expected value of the entropy of a random probability distribution. 2023-03-29T14:06:24Z 2023-03-29T14:06:24Z 1975-11 https://hdl.handle.net/1721.1/148897 02191962 MIT-LCS-TM-069 MAC-TM-69 application/pdf
spellingShingle Bayer, Paul J.
Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees
title Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees
title_full Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees
title_fullStr Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees
title_full_unstemmed Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees
title_short Improved Bounds on the Costs of Optimal and Balanced Binary Search Trees
title_sort improved bounds on the costs of optimal and balanced binary search trees
url https://hdl.handle.net/1721.1/148897
work_keys_str_mv AT bayerpaulj improvedboundsonthecostsofoptimalandbalancedbinarysearchtrees