This page is where I keep track of ideas I have for theses and honours projects. If you think you would like to work on any of these ideas then please contact me. For honours projects I require that you contact me (at the latest) at the beginning of the term before you intend to register for your project. For example, if you want to enroll in your honours project in January you should come to see me in September. A good honours project takes a fair bit of background research as well as implementation and writing. This is more than most students can do in 3.5 months.
In a recent paper, my coauthor and I argue that there are faster alternatives to binary search, but the fastest method requires that the data in the array be stored in a special order (not sorted order) called the Eytzinger order. I would like to build a small C++ library around this idea that would have the following functionality:
- convert a sorted array into Eytzinger order;
- convert an Eytzinger-ordered array back into sorted order;
- search an Eytzinger-ordered array.
Of course, some thought needs to go into the design of the API and some work is needed to implement in-place algorithms for the first two functions. (These can be done with repeated calls to in-place in-shuffle and out-shuffle algorithms, but these are non-trivial algorithms.)
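To make the idea concrete, here is a minimal sketch (in C++, but not a proposal for the library's actual API) of the two core pieces: building an Eytzinger-ordered copy of a sorted array and searching it with a branch-free loop. The function names are placeholders, and the build shown here is out-of-place; the in-place version via in-shuffles and out-shuffles is exactly the non-trivial part mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Recursively copy the sorted array 'a' into 'b' in Eytzinger (BFS) order.
// 'k' is the 1-based index of the current node in the implicit tree.
// Returns the next position of 'a' to consume.
static std::size_t build_rec(const std::vector<int>& a, std::vector<int>& b,
                             std::size_t i, std::size_t k) {
    if (k <= a.size()) {
        i = build_rec(a, b, i, 2 * k);      // left subtree
        b[k] = a[i++];                      // current node
        i = build_rec(a, b, i, 2 * k + 1);  // right subtree
    }
    return i;
}

// Build an Eytzinger-ordered copy of the sorted array 'a'.
// Index 0 of the result is unused so that the children of node k are 2k and 2k+1.
std::vector<int> eytzinger_build(const std::vector<int>& a) {
    std::vector<int> b(a.size() + 1);
    build_rec(a, b, 0, 1);
    return b;
}

// Search the Eytzinger-ordered array 'b' (1-based, as built above) for 'x'.
// Returns the index of the smallest element >= x, or 0 if every element is < x.
std::size_t eytzinger_search(const std::vector<int>& b, int x) {
    std::size_t n = b.size() - 1;
    std::size_t k = 1;
    while (k <= n)
        k = 2 * k + (b[k] < x);   // descend left or right without a branch on the comparison
    // Undo the final descent: the answer is the last ancestor where we went left.
    k >>= __builtin_ctzll(~k) + 1;  // GCC/Clang built-in
    return k;
}
```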
Data structures can be hard to teach and to learn. By their nature they are dynamic, so they are best taught by watching operations take place on them. I'm interested in the creation of short, high-quality videos, making use of ray-traced animation, that explain some advanced data structures. As a concrete example, a good computer animated video with voice over could explain most of x-fast tries and y-fast tries in 5 minutes.
I'm interested in compelling and well-made games that can be used to help teach basic mathematics to high-school and even university level students. Some such games (for algebra) are discussed in these posts on gamifying algebra (I and II) and illustrated in this simple game.
As a first step, I would be interested in an improved version of the above game with a more graphical and usable interface. I would like to see things like dragging a variable/value from the left side to the right, using a mouse wheel or scrollbar to scale [multiply and divide equations by a constant], using drag-and-drop to combine pairs of equations, and so on. All the while, the game would do the tedious work of keeping track of what manipulations have been done, so that this transcript acts as a proof.
As a second step, I would like to extend this kind of game to working with inequalities, where the goal is to prove statements like: show that x² + 2 ≥ x + 1 for all x, or show that x + 1 ≤ 2x for all x ≥ 1.
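As an illustration of the kind of transcript such a game could produce, here is one possible sequence of manipulations for the second statement, written out in LaTeX; the particular steps are only an example of what a machine-checked transcript might record.

```latex
\begin{align*}
  x &\ge 1           && \text{(given)}\\
  x + x &\ge x + 1   && \text{(add $x$ to both sides)}\\
  2x &\ge x + 1      && \text{(combine like terms)}\\
  x + 1 &\le 2x      && \text{(rewrite with the sides exchanged)}
\end{align*}
```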
High-dynamic range (HDR) imaging is a way of dealing with the fact that digital cameras can record image intensity data much more accurately than computer monitors and printers can reproduce it. In fact, non-photorealistic HDR even makes up for the fact that digital cameras can record intensity information more reliably than we can perceive it.
At the heart of HDR is the tonemapping process, which compresses a huge (16 bits or more) intensity space into a small (8 bit) intensity space. Several tonemapping algorithms have been proposed, and they give different kinds of results. However, most existing implementations of these algorithms are not real-time: you select some parameters, hit the tonemap button, and wait seconds, minutes, or hours for the result, after which you may need to adjust the parameters and start again.
The goal of this project is to take one or more tonemapping algorithms and implement them as a library (with prototype application) that is suitable for real-time previewing and very fast output. The techniques used for this will probably include multiresolution image processing.
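To make the computational task concrete, here is a minimal C++ sketch of one very simple global operator in the spirit of Reinhard et al.'s photographic tonemapper: it compresses linear floating-point luminances into 8-bit display values. The function name, the fixed key value of 0.18, and the decision to work on luminance alone (ignoring colour) are simplifying assumptions; real-time previewing of the local, spatially varying operators is the harder part of the project.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Global tonemapping in the style of Reinhard et al.'s photographic operator.
// 'luminance' holds one linear luminance value per pixel; the result holds
// 8-bit display values. The "key" parameter controls overall brightness.
std::vector<std::uint8_t> tonemap_global(const std::vector<float>& luminance,
                                         float key = 0.18f) {
    if (luminance.empty()) return {};

    // Log-average ("geometric mean") luminance of the scene.
    const double delta = 1e-6;              // avoids log(0) for black pixels
    double log_sum = 0.0;
    for (float L : luminance)
        log_sum += std::log(delta + L);
    double log_avg = std::exp(log_sum / luminance.size());

    std::vector<std::uint8_t> out(luminance.size());
    for (std::size_t i = 0; i < luminance.size(); i++) {
        double Lm = key * luminance[i] / log_avg;  // scale to the chosen key
        double Ld = Lm / (1.0 + Lm);               // compress into [0, 1)
        out[i] = static_cast<std::uint8_t>(std::lround(255.0 * std::min(1.0, Ld)));
    }
    return out;
}
```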
When recreational and commercial divers are at depth, they are exposed to compressed breathing gas mixtures. During this exposure, the inert gases in these mixtures (usually nitrogen and/or helium) are absorbed into the divers' tissues. Divers therefore make careful and slow ascents that include decompression stops, so that these dissolved gases are slowly released from their tissues and expelled through the lungs. Ascending too quickly can lead to a condition dubbed the bends, in which bubbles form in the divers' tissues and/or bloodstream; these bubbles can cause fatigue, pain, damage to the central nervous system, and even death.
There are many pieces of desktop software used to plan dives and provide decompression schedules to avoid the bends. Many divers also carry a dive computer with an electronic depth gauge that models, in real time, the amount of dissolved gases in their tissues and gives a real-time decompression schedule. Unfortunately, all these computers either use proprietary decompression models or use a closed source implementation of a published decompression model. Often, comparing different computers that claim to implement the same decompression model will expose large variations in the resulting decompression schedules.
Given the possible consequences (including pain, permanent injury, paralysis, and even death) of bugs in the implementations of decompression algorithms, the current situation is unacceptable. The goal of this project is to provide Open Source implementations of some of the more popular decompression models that are suitable for use on a dive computer. This means the implementations must handle real-time data, work with limited processing power and, above all, be correct and bug-free.
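For a flavour of what such an implementation computes, here is a minimal C++ sketch of the core of a Haldane/Bühlmann-style model: a set of tissue compartments whose inert-gas loading is advanced from depth samples using the standard exponential uptake equation. The class names and parameters are invented for illustration, there is no ascent-ceiling or gradient-factor logic, and this sketch is emphatically not something to plan or conduct dives with.

```cpp
#include <cmath>
#include <vector>

// One tissue compartment in a Haldane/Buhlmann-style decompression model.
// Gas loading follows the exponential (instantaneous-mixing) uptake equation:
//   P(t + dt) = Palv + (P(t) - Palv) * exp(-dt * ln(2) / half_time)
struct Compartment {
    double half_time_min;   // inert-gas half-time of this compartment (minutes)
    double pressure_bar;    // current inert-gas tension in the tissue (bar)
};

class TissueModel {
public:
    // 'half_times' lists the compartment half-times; 'initial' is the inert-gas
    // tension at the start of the dive (surface equilibrium, e.g. 0.79 * 1.0 bar).
    TissueModel(const std::vector<double>& half_times, double initial) {
        for (double h : half_times)
            compartments_.push_back({h, initial});
    }

    // Advance the model by 'dt_min' minutes at a constant ambient pressure
    // 'ambient_bar', breathing a gas whose inert-gas fraction is 'inert_fraction'.
    void step(double ambient_bar, double inert_fraction, double dt_min) {
        // Alveolar inert-gas pressure (toy model: no water-vapour correction).
        double palv = inert_fraction * ambient_bar;
        for (Compartment& c : compartments_) {
            double k = std::log(2.0) / c.half_time_min;
            c.pressure_bar = palv + (c.pressure_bar - palv) * std::exp(-k * dt_min);
        }
    }

    const std::vector<Compartment>& compartments() const { return compartments_; }

private:
    std::vector<Compartment> compartments_;
};

// Usage sketch: ambient pressure at d metres of sea water is roughly
// 1.0 + d / 10.0 bar, so a dive-computer loop would call something like
//   model.step(1.0 + depth_m / 10.0, 0.79, sample_interval_min);
// once per depth sample.
```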
The HTML DOM describes how an (X)HTML/XML document can be represented as a tree and gives a mapping of the parts of this tree to objects. The implementation of the HTML DOM in JavaScript is what makes dynamic web pages possible. One of the most common operations in the HTML DOM is to find all elements in the document with a given ID, Name, or Tag Name (document.getElementById(), document.getElementsByName(), document.getElementsByTagName()).
Spend a bit of time experimenting with this script and you will soon find that these methods work by traversing the entire tree to find the node(s) being searched for. There are much more efficient ways of doing this (hashtables come to mind), and JavaScript is such a flexible language that you can easily modify the DOM implementation to use these instead.
This project would involve designing and implementing an efficient data structure that is integrated with the JavaScript DOM implementation that allows these operations to be executed efficiently. This project requires, at the very least, an advanced knowledge of data structure design techniques and strong programming ability.
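To keep the code samples on this page in one language, here is a sketch in C++ (rather than JavaScript) of the kind of auxiliary structure the project would maintain: hash tables from IDs and tag names to nodes, updated whenever a subtree is attached to or detached from the document, so that getElementById-style lookups take expected constant time instead of a full tree traversal. The class and method names are invented for illustration and do not correspond to any existing DOM implementation.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// A toy DOM-like node; real DOM nodes carry far more state.
struct Node {
    std::string id;
    std::string tag_name;
    std::vector<Node*> children;
};

// Auxiliary index maintained alongside the tree so that id and tag-name
// lookups no longer need to traverse the whole document.
class DomIndex {
public:
    // Called whenever a subtree is attached to the document tree.
    void on_attach(Node* n) {
        if (!n->id.empty())
            by_id_[n->id] = n;            // duplicate ids: last one wins in this toy
        by_tag_.emplace(n->tag_name, n);
        for (Node* c : n->children)
            on_attach(c);
    }

    // Called whenever a subtree is detached from the document tree.
    void on_detach(Node* n) {
        if (!n->id.empty())
            by_id_.erase(n->id);
        auto range = by_tag_.equal_range(n->tag_name);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == n) { by_tag_.erase(it); break; }
        }
        for (Node* c : n->children)
            on_detach(c);
    }

    // Expected O(1): the analogue of document.getElementById().
    Node* get_element_by_id(const std::string& id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : it->second;
    }

    // Expected O(1 + output size): the analogue of document.getElementsByTagName().
    std::vector<Node*> get_elements_by_tag_name(const std::string& tag) const {
        std::vector<Node*> out;
        auto range = by_tag_.equal_range(tag);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }

private:
    std::unordered_map<std::string, Node*> by_id_;
    std::unordered_multimap<std::string, Node*> by_tag_;
};
```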
A project in this category requires the robust implementation of one or more algorithms (subject to my approval) and a thorough battery of performance and correctness tests. If you are interested in implementing an algorithm or two, have a look at my list of publications and see if there's something there that interests you.
Cache-oblivious algorithms are algorithms that work in external memory (on disk, say) without knowing any of the parameters of the external memory (block size, say). Many algorithms and data structures for the cache-oblivious model have been proposed, but very little effort has been made to experimentally compare these algorithms with each other, or with cache-aware algorithms.
The goal of this project is to design a little C++ library for the implementation of cache-oblivious algorithms. The main part of this library would be an array implementation that transparently reads and writes parts of itself to disk when necessary, so that it only ever occupies a small part of internal memory. The library would be tested and tweaked by implementing several cache-oblivious algorithms and data structures using this array abstraction.
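Here is a minimal C++ sketch of what the central array abstraction might look like: elements live in a backing file and are faulted into a small, fixed set of in-memory blocks on access. The block size, the number of resident blocks, and the round-robin eviction policy are arbitrary choices made for brevity, not part of any proposed design; a cache-oblivious algorithm built on top of such an array would, of course, not be told the block size.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// An array of ints stored in a backing file, of which only a handful of
// fixed-size blocks are resident in memory at any time. (Error handling
// is omitted throughout.)
class PagedArray {
public:
    PagedArray(const char* path, std::size_t n)
        : n_(n), file_(std::fopen(path, "w+b")) {
        // Extend the backing file to hold (at least) n zero-initialized ints.
        std::vector<int> zeros(kBlockInts, 0);
        for (std::size_t written = 0; written < n_; written += kBlockInts)
            std::fwrite(zeros.data(), sizeof(int), kBlockInts, file_);
    }

    ~PagedArray() {
        for (Frame& f : frames_) flush(f);
        std::fclose(file_);
    }

    int get(std::size_t i) { return frame_for(i).data[i % kBlockInts]; }

    void set(std::size_t i, int value) {
        Frame& f = frame_for(i);
        f.data[i % kBlockInts] = value;
        f.dirty = true;
    }

    std::size_t size() const { return n_; }

private:
    static constexpr std::size_t kBlockInts = 1024;   // ints per block
    static constexpr std::size_t kNumFrames = 4;      // resident blocks
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);

    struct Frame {
        std::size_t block = kNone;
        bool dirty = false;
        int data[kBlockInts];
    };

    // Return the in-memory frame holding block i/kBlockInts, reading it from
    // the file (and evicting another block, round-robin) if necessary.
    Frame& frame_for(std::size_t i) {
        std::size_t block = i / kBlockInts;
        for (Frame& f : frames_)
            if (f.block == block) return f;
        Frame& victim = frames_[next_victim_];
        next_victim_ = (next_victim_ + 1) % kNumFrames;
        flush(victim);
        std::fseek(file_, static_cast<long>(block * kBlockInts * sizeof(int)), SEEK_SET);
        std::fread(victim.data, sizeof(int), kBlockInts, file_);
        victim.block = block;
        victim.dirty = false;
        return victim;
    }

    // Write a frame back to the file if it has been modified.
    void flush(Frame& f) {
        if (f.block == kNone || !f.dirty) return;
        std::fseek(file_, static_cast<long>(f.block * kBlockInts * sizeof(int)), SEEK_SET);
        std::fwrite(f.data, sizeof(int), kBlockInts, file_);
        f.dirty = false;
    }

    std::size_t n_;
    std::FILE* file_;
    Frame frames_[kNumFrames];
    std::size_t next_victim_ = 0;
};
```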
A profiler is a system that monitors a program to see how much time that program spends in each of its subroutines. Profilers are used for finding frequently executed sections of code (hot spots) so that they can be optimized. Profilers work either by stopping the program at random times and recording the current instruction pointer (subroutine) or by inserting code into the program so that each invocation of a subroutine is recorded. In both cases, all this data is stored somewhere and post-processed to come up with the program's "profile."
The problem with this approach is that profilers require some place to store all this data. This means that the machine the program is running on must either have a lot of memory or a hard disk. This makes profiling embedded software very difficult. It also means that profiling a program may change its behaviour since the program being profiled has less available memory or is frequently halted while the profiler writes data to disk.
For this project I would like to treat a profiler as a program that processes a data stream consisting of the subroutines (or even individual lines of code) that are executed. Recently, many researchers have proposed algorithms for finding frequently occurring items in data streams, and these algorithms require only a small amount of memory that is independent of the length of the data stream. As an example, it is possible to find all items in a data stream that occur more than 10% of the time by using only 10 counters and labels. Using these algorithms we could develop a profiler that does not require much (internal or external) memory or time.
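The guarantee quoted above is exactly the one provided by the Misra-Gries ("frequent items") algorithm; here is a minimal C++ sketch of it, with subroutine identifiers standing in for stream items. With k counters, every item occurring in more than a 1/(k+1) fraction of the stream is guaranteed to survive as a candidate.

```cpp
#include <cstdint>
#include <unordered_map>

// Misra-Gries "frequent items" summary: with at most k counters, any item
// whose frequency exceeds n/(k+1) over an n-item stream is guaranteed to be
// among the surviving candidates (the stored counts are lower bounds).
class FrequentItems {
public:
    explicit FrequentItems(std::size_t k) : k_(k) {}

    // Process one stream item, e.g. the address of the subroutine (or line of
    // code) that just executed.
    void observe(std::uint64_t item) {
        auto it = counters_.find(item);
        if (it != counters_.end()) {
            ++it->second;                      // already a candidate: count it
        } else if (counters_.size() < k_) {
            counters_.emplace(item, 1);        // spare counter: start tracking it
        } else {
            // No spare counter: decrement everyone and drop counters that hit zero.
            for (auto i = counters_.begin(); i != counters_.end(); ) {
                if (--i->second == 0) i = counters_.erase(i);
                else ++i;
            }
        }
    }

    // Candidate hot spots and (under-)estimates of their counts.
    const std::unordered_map<std::uint64_t, std::uint64_t>& candidates() const {
        return counters_;
    }

private:
    std::size_t k_;
    std::unordered_map<std::uint64_t, std::uint64_t> counters_;
};
```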
3-dimensional scanners (laser range finders and coordinate measuring machines) can measure an object by taking a sample of points on its surface. These points are usually then fed into reconstruction software that attempts to reconstruct the surface of the object from this point data.
For the most part, this software works well, particularly when the object is smooth (like a ball). However, these algorithms tend to break down when it comes to preserving sharp corners. For example, the two images below show a perfect cube and a reconstruction of the same cube from a sample of 6000 points on its surface. Notice that in the second image the corners are not quite as sharp.
The goal of this project is to develop efficient and accurate algorithms to detect corners in 3d-point data and perform reconstructions that preserve these corners. Efficiency is as important as accuracy here because the resolution of 3d scanners is becoming such that the number of sample points is huge.
Data depth is a field of statistics in which one tries to assign, to each point in R^d, a numerical depth that measures how central that point is with respect to a given data set. Once a depth function is defined, a number of computational questions arise:
- Given a data set and a point p, how quickly can the depth of p be computed?
- How quickly can a deepest point (a point of maximum depth) be found?
- Can the data set be preprocessed so that the depth of an arbitrary query point can be reported quickly?
There are many definitions of depth for which the answers to these three questions are not yet known. In other cases, efficient algorithms have been proposed, but there are still no implementations of these algorithms.
The goal of this project is to develop efficient algorithms for solving depth problems in 2 dimensions and to implement these algorithms in a data depth library.
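As one concrete example, here is a brute-force C++ sketch for a popular depth notion, Tukey (halfspace) depth in two dimensions: the depth of a query point q is the minimum, over all closed halfplanes whose boundary passes through q, of the number of data points in the halfplane. This quadratic-time version only pins down the definition; a real library would use the known O(n log n) algorithm and would pay attention to the floating-point robustness issues ignored here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Tukey (halfspace) depth of q with respect to 'pts': the minimum, over all
// closed halfplanes whose boundary passes through q, of the number of input
// points contained in the halfplane. The count is constant on each open
// interval between consecutive "critical" directions (directions at which
// some input point lies exactly on the boundary) and can only be larger at
// the critical directions themselves, so evaluating it at the midpoint of
// every interval suffices. O(n^2) time.
std::size_t tukey_depth(Point q, const std::vector<Point>& pts) {
    const double pi = std::acos(-1.0);
    std::vector<double> critical;
    for (const Point& p : pts) {
        double dx = p.x - q.x, dy = p.y - q.y;
        if (dx == 0 && dy == 0) continue;            // a point equal to q is in every halfplane
        double a = std::atan2(dy, dx);
        for (double c : {a + pi / 2, a - pi / 2}) {  // directions putting p on the boundary
            while (c < 0) c += 2 * pi;
            while (c >= 2 * pi) c -= 2 * pi;
            critical.push_back(c);
        }
    }
    if (critical.empty()) return pts.size();         // all points coincide with q
    std::sort(critical.begin(), critical.end());

    std::size_t best = pts.size();
    for (std::size_t i = 0; i < critical.size(); i++) {
        double next = (i + 1 < critical.size()) ? critical[i + 1] : critical[0] + 2 * pi;
        double theta = (critical[i] + next) / 2;     // a direction inside the i-th open interval
        double ux = std::cos(theta), uy = std::sin(theta);
        std::size_t count = 0;
        for (const Point& p : pts)
            if ((p.x - q.x) * ux + (p.y - q.y) * uy >= 0) count++;
        best = std::min(best, count);
    }
    return best;
}
```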
A prefix tree is a binary search tree in which each node stores the size of the subtree rooted at that node. Using any of the usual balancing schemes, prefix trees support the usual binary search tree operations Insert, Delete, and Search in O(log n) time each. In addition, they also support the operation At(i), which finds the element of rank i, in the same time bound.
Many search tree implementations, such as red-black trees and 2-4 trees, have the nice property that updates only perform a constant number of changes to pointers in the data structure. Unfortunately, prefix trees don't have this nice property, since each insertion requires incrementing the sizes of the subtrees on a root-to-leaf path. Is it possible to support the prefix tree operations Insert, Delete, Search, and At with a data structure that only performs a constant (amortized) number of pointer changes per update operation?
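For reference, here is a minimal C++ sketch of the At(i) operation on a subtree-size-augmented binary search tree (balancing, and the updates to the size fields during Insert and Delete, are omitted); it is the maintenance of these size fields along a root-to-leaf path that prevents prefix trees from having the constant-update-change property discussed above.

```cpp
#include <cstddef>

// A node of a prefix tree: a binary search tree node augmented with the
// size of the subtree rooted at it.
struct Node {
    int key;
    std::size_t size = 1;      // number of nodes in this subtree
    Node* left = nullptr;
    Node* right = nullptr;
};

// Size of a possibly-null subtree.
static std::size_t size_of(const Node* u) { return u ? u->size : 0; }

// At(i): return the node of rank i (0-based, in sorted order) in the subtree
// rooted at 'u', or nullptr if i is out of range. Runs in time proportional
// to the height of the tree, so O(log n) with any of the usual balancing schemes.
const Node* at(const Node* u, std::size_t i) {
    while (u != nullptr) {
        std::size_t left_size = size_of(u->left);
        if (i < left_size) {
            u = u->left;                    // the rank-i element is in the left subtree
        } else if (i == left_size) {
            return u;                       // u itself is the rank-i element
        } else {
            i -= left_size + 1;             // skip the left subtree and u
            u = u->right;
        }
    }
    return nullptr;
}
```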
Finding such a data structure could shave a logarithmic factor from the space complexity of the range-median data structure described in the following paper:
D. Krizanc, P. Morin, and M. Smid.