Maria-Florina Balcan, Nicholas J. A. Harvey.
43rd Annual ACM Symposium on Theory of Computing (STOC '11),
San Jose, CA, June 2011.
arXiv version, August 2010.
There has recently been significant interest in the machine learning community in understanding
and using submodular functions. Despite this interest, little is known about submodular functions
from a learning theory perspective. Motivated by applications such as pricing goods in economics,
this paper considers PAC-style learning of submodular functions in a distributional setting.
A problem instance consists of a distribution on {0,1}^n and a real-valued
function on {0,1}^n that is non-negative, monotone, and submodular. We are
given poly(n) samples from this distribution, along
with the values of the function at those sample points. The task is to
approximate the value of the function to within a multiplicative factor at
subsequent sample points drawn from the same distribution, with sufficiently high
probability. We prove several results for this problem.
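
For concreteness, here is a minimal sketch of the setup just described. The target f, the hypothesis g, the dimension, and the factor α are illustrative placeholders (not from the paper), and the success criterion g(x) ≤ f(x) ≤ α·g(x) on a 1-ε fraction of fresh samples is one standard way to formalize multiplicative approximation:

```python
import random

n = 10

def f(x):
    # Placeholder target: min(|x|, 4), the rank function of a uniform
    # matroid, which is non-negative, monotone, and submodular.
    return min(sum(x), 4)

def g(x):
    # Hypothetical learned hypothesis; any real-valued predictor fits here.
    return max(1, sum(x) / 2)

def sample():
    # Product distribution: each coordinate is 1 independently w.p. 1/2.
    return [random.randint(0, 1) for _ in range(n)]

def empirical_success(alpha, trials=10000):
    # Fraction of fresh samples on which g approximates f to within a
    # multiplicative factor alpha, i.e. g(x) <= f(x) <= alpha * g(x).
    hits = 0
    for _ in range(trials):
        x = sample()
        if g(x) <= f(x) <= alpha * g(x):
            hits += 1
    return hits / trials

print(empirical_success(alpha=4.0))
```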
- If the function is Lipschitz and the distribution is a product distribution, such as the
  uniform distribution, then a good approximation is possible: there is an algorithm that
  approximates the function to within a factor of O(log(1/ε)) on a set of measure 1-ε,
  for any ε>0.
- If we do not assume that the distribution is a product distribution, then the approximation
  factor must be much worse: no algorithm can approximate the function to within a factor of
  O(n^{1/3}/log n) on a set of measure 1/2+ε, for any constant ε>0. This holds even if the
  function is Lipschitz.
- On the other hand, this negative result is nearly tight: for an arbitrary distribution,
  there is an algorithm that approximates the function to within a factor of n^{1/2} on a
  set of measure 1-ε.
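
All three results concern non-negative, monotone, submodular functions. For readers who want the definition in executable form, the following checks the diminishing-returns characterization of submodularity exhaustively on a small coverage function; this is a standard textbook example, not one of the paper's constructions:

```python
import itertools

universe_sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}}

def coverage(S):
    # f(S) = number of universe elements covered by the sets indexed by S.
    covered = set()
    for i in S:
        covered |= universe_sets[i]
    return len(covered)

def subsets(s):
    # All subsets of s, as sets.
    items = sorted(s)
    return [set(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

def is_submodular(f, ground):
    # Diminishing returns: for all S ⊆ T and i ∉ T,
    # f(S ∪ {i}) - f(S) >= f(T ∪ {i}) - f(T).
    for T in subsets(ground):
        for S in subsets(T):
            for i in ground - T:
                if f(S | {i}) - f(S) < f(T | {i}) - f(T):
                    return False
    return True

print(is_submodular(coverage, set(universe_sets)))  # -> True
```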
Our work combines central issues in optimization (submodular functions and matroids) with central
topics in learning (distributional learning and PAC-style analyses) and with central concepts in pseudorandomness
(lossless expander graphs). Our analysis involves a twist on the usual learning theory models
and uncovers some interesting structural and extremal properties of submodular functions, which we
suspect will be useful in other contexts. In particular, to prove our general lower bound, we use
lossless expanders to construct a new family of matroids which can take wildly varying rank values on
superpolynomially many sets; no such construction was previously known.
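
As background, matroid rank functions are the canonical examples of non-negative, monotone, submodular functions. A partition matroid gives a simple instance (the paper's expander-based family is far more intricate, which is what lets its ranks vary so wildly):

```python
# A partition matroid: the ground set is split into blocks, and a set S is
# independent if it takes at most capacities[j] elements from block j.
# Its rank function, below, is monotone and submodular.
blocks = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
capacities = [1, 2, 1]

def partition_rank(S):
    return sum(min(len(S & B), c) for B, c in zip(blocks, capacities))

print(partition_rank({0, 1, 3}))  # 1 from the first block + 1 from the second = 2
```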