Start line:  
End line:  

Snippet Preview

Snippet HTML Code

Stack Overflow Questions
  package it.unimi.dsi.sux4j.mph;
  
  /*		 
   * Sux4J: Succinct data structures for Java
   *
   * Copyright (C) 2008-2014 Sebastiano Vigna 
   *
   *  This library is free software; you can redistribute it and/or modify it
   *  under the terms of the GNU Lesser General Public License as published by the Free
  *  Software Foundation; either version 3 of the License, or (at your option)
  *  any later version.
  *
  *  This library is distributed in the hope that it will be useful, but
  *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
  *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
  *  for more details.
  *
  *  You should have received a copy of the GNU Lesser General Public License
  *  along with this program; if not, see <http://www.gnu.org/licenses/>.
  *
  */
 
 
 
A class implementing the 3-hypergraph edge sorting procedure that is necessary for the Majewski-Wormald-Havas-Czech technique.

Bohdan S. Majewski, Nicholas C. Wormald, George Havas, and Zbigniew J. Czech have described in “A family of perfect hashing methods”, Comput. J., 39(6):547&minus;554, 1996, a 3-hypergraph based technique to store functions (actually, the paper uses the technique just to store a permutation of the key set, but it is clear it can be used to store any function). More generally, the procedure first generates a random 3-partite 3-hypergraph whose edges correspond to elements of the function domain. Then, it sorts the edges of the random 3-hypergraph so that for each edge at least one vertex, the hinge, never appeared before in the sorted edge list (this happens with high probability if the number of vertices is at least &gamma; times the number of edges).

Instances of this class contain the data necessary to generate the random hypergraph and apply the sorting procedure. At construction time, you provide just the desired number of edges; then, each call to generateAndSort() will generate a new 3-hypergraph using a 64-bit seed, an iterator returning the key set, and a corresponding it.unimi.dsi.bits.TransformationStrategy. If the method returns true, the sorting was successful and in the public field stack you can retrieve hinges in the opposite of the desired order (so enumerating hinges starting from the last in stack you are guaranteed to find each time a vertex that never appeared in some 3-hyperedge associated with previous hinges). For each hinge, the corresponding values in vertex1 and vertex2 complete the 3-hyperedge associated with the hinge, and the corresponding value in edge contains the index of the 3-hyperedge (i.e., the index of the key that generated the hyperedge). The computation of the last value can be disabled if you do not need it.

The public fields numEdges and numVertices expose information about the generated 3-hypergraph. For m edges, the number of vertices will be &lceil &gamma;m &rceil; + 1, rounded up to the nearest multiple of 3, unless m is zero, in which case the number of vertices will be zero, too. Note that index of the hash function that generated a particular vertex of a 3-hyperedge can be recovered dividing by partSize, which is exactly numVertices/3.

To guarantee consistent results when reading a Majewski-Wormald-Havas-Czech-like structure, the method bitVectorToEdge() can be used to retrieve, starting from a bit vector, the corresponding edge. While having a function returning the edge starting from a key would be more object-oriented and avoid hidden dependencies, it would also require storing the transformation provided at construction time, which would make this class non-thread-safe. Just be careful to transform the keys into bit vectors using the same it.unimi.dsi.bits.TransformationStrategy used to generate the random 3-hypergraph.

Support for preprocessed keys

This class provides two special access points for classes that have pre-digested their keys. The methods generateAndSort(java.util.Iterator,long) and tripleToEdge(long[],long,int,int,int[]) use fixed-length 192-bit keys under the form of triples of longs. The intended usage is that of turning the keys into such a triple using Jenkins's hash and then operating directly on the hash codes. This is particularly useful in chunked constructions, where the keys are replaced by their 192-bit hashes in the first place. Note that the hashes are actually rehashed using Hashes.jenkins(long[],long,long[])—this is necessary to vary the associated edges whenever the generated 3-hypergraph is not acyclic.

Warning: you cannot mix the bitvector-based and the triple-based constructors and static methods. It is your responsibility to pair them correctly.

Implementation details

We use Jenkin's hash in its 64-bit incarnation: beside providing an excellent hash function, it actually computes three 64-bit hash values, which is exactly what we need.

The XOR trick

Since the list of edges incident to a node is accessed during the peeling process only when the node has degree one, we can actually store in a single integer the XOR of the indices of all edges incident to the node. This approach significantly simplifies the code and reduces memory usage.

We push further this idea by observing that since one of the vertices of an edge incident to x is exactly x, we can even avoid storing the edges at all and just store for each node two additional values that contain a XOR of the other two nodes of each edge incident on the node. This approach simplifies considerably the code as every 3-hyperedge is presented to us as a distinguished vertex (the hinge) plus two additional vertices.

Rounds and Logging

Building and sorting a large 3-hypergraph may take hours. As it happens with all probabilistic algorithms, one can just give estimates of the expected time.

There are two probabilistic sources of problems: duplicate edges and non-acyclic hypergraphs. However, the probability of duplicate edges is vanishing when n approaches infinity, and once the hypergraph has been generated, the stripping procedure succeeds in an expected number of trials that tends to 1 as n approaches infinity.

To help diagnosing problem with the generation process class, this class will log at debug level what's happening.

Note that if the the peeling process fails more than twice, you should suspect that there are duplicates in the string list (unless the number of vertices is very small).

Author(s):
Sebastiano Vigna
public class HypergraphSorter<T> {
The initial size of the queue used to peel the 3-hypergraph.
	private static final int INITIAL_QUEUE_SIZE = 1024;
	private final static Logger LOGGER = LoggerFactory.getLoggerHypergraphSorter.class );

The mythical threshold (or better, a very closed upper bound of): random 3-hypergraphs are acyclic with high probability if the ratio vertices/edges exceeds this constant.
	public static final double GAMMA = 1.23;
The number of vertices in the hypergraph (&lceil; GAMMA * numEdges &rceil; + 1, rounded up to the nearest multiple of 3).
	public final int numVertices;
	public final int partSize;
The number of edges in the hypergraph.
	public final int numEdges;
For each vertex, the XOR of the values of the smallest other vertex in each incident 3-hyperedge.
	public final int[] vertex1;
For each vertex, the XOR of the values of the largest other vertex in each incident 3-hyperedge.
	public final int[] vertex2;
For each vertex, the XOR of the indices of incident 3-hyperedges.
	public final int[] edge;
The edge stack. At the end of a successful sorting phase, it contains the hinges in reverse order.
	public final int[] stack;
The degree of each vertex of the intermediate 3-hypergraph.
	private final int[] d;
If true, we do not allocate edge and to not compute edge indices.
	private final boolean computeEdges;
	private boolean neverUsed;
Initial top of the edge stack.
	private int top;
The stack used for peeling the graph.
	private final IntArrayList visitStack;

Creates a hypergraph sorter for a given number of edges.

Parameters:
numEdges the number of edges of this hypergraph sorter.
computeEdges if false, the index of the edge associated with each hinge will not be computed.
	public HypergraphSorterfinal int numEdgesfinal boolean computeEdges ) {
		this. = numEdges;
		this. = computeEdges;
		// The theoretically sufficient number of vertices
		final int m = numEdges == 0 ? 0 : (int)Math.ceil * numEdges ) + 1;
		// This guarantees that the number of vertices is a multiple of 3
		 = m + ( 3 - m % 3 ) % 3;
		 = new int ];
		 = new int ];
		 = computeEdges ? new int ] : null;
		 = new intnumEdges ];
		 = new int ];
		 = true;
	}

Creates a hypergraph sorter for a given number of edges.

Parameters:
numEdges the number of edges of this hypergraph sorter.
	public HypergraphSorterfinal int numEdges ) {
		thisnumEdgestrue );
	}

Turns a bit vector into a 3-hyperedge.

The returned edge satisfies the property that the i-th vertex is in the interval [i·partSize..i+1·partSize). However, if there are no edges the vector e will be filled with -1.

Parameters:
bv a bit vector.
seed the seed for the hash function.
numVertices the number of vertices in the underlying hypergraph.
partSize numVertices/3 (to avoid a division).
e an array to store the resulting edge.
	public static void bitVectorToEdgefinal BitVector bvfinal long seedfinal int numVerticesfinal int partSizefinal int e[] ) {
		if ( numVertices == 0 ) {
			e[ 0 ] = e[ 1 ] = e[ 2 ] = -1;
			return;
		}
		final long[] hash = new long[ 3 ];
		Hashes.jenkinsbvseedhash );
		e[ 0 ] = (int)( ( hash[ 0 ] & 0x7FFFFFFFFFFFFFFFL ) % partSize );
		e[ 1 ] = (int)( partSize + ( hash[ 1 ] & 0x7FFFFFFFFFFFFFFFL ) % partSize );
		e[ 2 ] = (int)( ( partSize << 1 ) + ( hash[ 2 ] & 0x7FFFFFFFFFFFFFFFL ) % partSize );
	}

Turns a bit vector into a 3-hyperedge.

Parameters:
bv a bit vector.
seed the seed for the hash function.
numVertices the number of vertices in the underlying hypergraph.
e an array to store the resulting edge.
See also:
bitVectorToEdge(it.unimi.dsi.bits.BitVector,long,int,int,int[])
	public static void bitVectorToEdgefinal BitVector bvfinal long seedfinal int numVerticesfinal int e[] ) {
		bitVectorToEdgebvseednumVertices, (int)( numVertices * 0xAAAAAAABL >>> 33 ), e ); // Fast division by 3
	}

Turns a triple of longs into a 3-hyperedge.

Parameters:
triple a triple of intermediate hashes.
seed the seed for the hash function.
numVertices the number of vertices in the underlying hypergraph.
partSize numVertices/3 (to avoid a division).
e an array to store the resulting edge.
See also:
bitVectorToEdge(it.unimi.dsi.bits.BitVector,long,int,int,int[])
	public static void tripleToEdgefinal long[] triplefinal long seedfinal int numVerticesfinal int partSizefinal int e[] ) {
		if ( numVertices == 0 ) {
			e[ 0 ] = e[ 1 ] = e[ 2 ] = -1;
			return;
		}
		final long[] hash = new long[ 3 ];
		Hashes.jenkinstripleseedhash );
		e[ 0 ] = (int)( ( hash[ 0 ] & 0x7FFFFFFFFFFFFFFFL ) % partSize );
		e[ 1 ] = (int)( partSize + ( hash[ 1 ] & 0x7FFFFFFFFFFFFFFFL ) % partSize );
		e[ 2 ] = (int)( ( partSize << 1 ) + ( hash[ 2 ] & 0x7FFFFFFFFFFFFFFFL ) % partSize );
	}


Turns a triple of longs into a 3-hyperedge.

Parameters:
triple a triple of intermediate hashes.
seed the seed for the hash function.
numVertices the number of vertices in the underlying hypergraph.
e an array to store the resulting edge.
See also:
bitVectorToEdge(it.unimi.dsi.bits.BitVector,long,int,int,int[])
	public static void tripleToEdgefinal long[] triplefinal long seedfinal int numVerticesfinal int e[] ) {
		tripleToEdgetripleseednumVertices, (int)( numVertices * 0xAAAAAAABL >>> 33 ), e );  // Fast division by 3
	}
	private final void cleanUpIfNecessary() {
		if ( !  ) { 
			IntArrays.fill, 0 );
			IntArrays.fill, 0 );
			IntArrays.fill, 0 );
			if (  ) IntArrays.fill, 0 );
		}
		 = false;
	}
	private final void xorEdgefinal int kfinal int xfinal int yfinal int zboolean partial ) {
		//if ( partial ) System.err.println( "Stripping <" + x + ", " + y + ", " + z + ">: " + Arrays.toString( edge ) + " " + Arrays.toString( hinge ) );
		if (  ) {
			if ( ! partial ) x ] ^= k;
			y ] ^= k;
			z ] ^= k;
		}
		if ( ! partial ) {
			if ( y < z ) {
				x ] ^= y;
				x ] ^= z;
			}
			else {
				x ] ^= z;
				x ] ^= y;
			}
		}
		if ( x < z ) {
			y ] ^= x;
			y ] ^= z;
		}
		else {
			y ] ^= z;
			y ] ^= x;
		}
		if ( x < y ) {
			z ] ^= x;
			z ] ^= y;
		}
		else {
			z ] ^= y;
			z ] ^= x;
		}
		//if ( ! partial ) System.err.println( "After adding <" + x + ", " + y + ", " + z + ">" );
	}

Generates a random 3-hypergraph and tries to sort its edges.

Parameters:
iterator an iterator returning numEdges keys.
transform a transformation from keys to bit vectors.
seed a 64-bit random seed.
Returns:
true if the sorting procedure succeeded.
	public boolean generateAndSortfinal Iterator<? extends T> iteratorfinal TransformationStrategy<? super T> transformfinal long seed ) {
		// We cache all variables for faster access
		final int[] d = this.;
		final int[] e = new int[ 3 ];
		/* We build the XOR'd edge list and compute the degree of each vertex. */
		forint k = 0; k < k++ ) {
			bitVectorToEdgetransform.toBitVectoriterator.next() ), seede );
			xorEdgeke[ 0 ], e[ 1 ], e[ 2 ], false );
			de[ 0 ] ]++;
			de[ 1 ] ]++;
			de[ 2 ] ]++;
		}
		if ( iterator.hasNext() ) throw new IllegalStateException"This " + HypergraphSorter.class.getSimpleName() + " has " +  + " edges, but the provided iterator returns more" );
		return sort();
	}


Generates a random 3-hypergraph and tries to sort its edges.

Parameters:
iterator an iterator returning numEdges triples of longs.
seed a 64-bit random seed.
Returns:
true if the sorting procedure succeeded.
	public boolean generateAndSortfinal Iterator<long[]> iteratorfinal long seed ) {
		// We cache all variables for faster access
		final int[] d = this.;
		final int[] e = new int[ 3 ];
		/* We build the XOR'd edge list and compute the degree of each vertex. */
		forint k = 0; k < k++ ) {
			tripleToEdgeiterator.next(), seede );
			xorEdgeke[ 0 ], e[ 1 ], e[ 2 ], false );
			de[ 0 ] ]++;
			de[ 1 ] ]++;
			de[ 2 ] ]++;
		}
		if ( iterator.hasNext() ) throw new IllegalStateException"This " + HypergraphSorter.class.getSimpleName() + " has " +  + " edges, but the provided iterator returns more" );
		return sort();
	}

Sorts the edges of a random 3-hypergraph in “leaf peeling” order.

Returns:
true if the sorting procedure succeeded.
	private boolean sort() {
		// We cache all variables for faster access
		final int[] d = this.;
		//System.err.println("Visiting...");
		.debug"Peeling hypergraph..." );
		 = 0;
		forint i = 0; i < i++ ) if ( di ] == 1 ) peeli );
		if (  ==  ) .debug"Peeling completed." );
		else {
			.debug"Visit failed: peeled " +  + " edges out of " +  + "." );
			return false;
		}
		return true;
	}
	private void peelfinal int x ) {
		// System.err.println( "Visiting " + x + "..." );
		final int[] vertex1 = this.;
		final int[] vertex2 = this.;
		final int[] edge = this.;
		final int[] stack = this.;
		final int[] d = this.;
		final IntArrayList visitStack = this.;
		// Queue initialization
		int v;
		visitStack.clear();
		visitStack.pushx );
		while ( ! visitStack.isEmpty() ) {
			v = visitStack.popInt();
			if ( dv ] == 1 ) {
				stack++ ] = v;
				--dv ];
				// System.err.println( "Stripping <" + v + ", " + vertex1[ v ] + ", " + vertex2[ v ] + ">" );
				xorEdge ? edgev ] : -1, vvertex1v ], vertex2v ], true );
				if ( --dvertex1v ] ] == 1 ) visitStack.addvertex1v ] );
				if ( --dvertex2v ] ] == 1 ) visitStack.addvertex2v ] );				
			}
		}
	}
New to GrepCode? Check out our FAQ X