   * Licensed to the Apache Software Foundation (ASF) under one
   * or more contributor license agreements.  See the NOTICE file
   * distributed with this work for additional information
   * regarding copyright ownership.  The ASF licenses this file
   * to you under the Apache License, Version 2.0 (the
   * "License"); you may not use this file except in compliance
   * with the License.  You may obtain a copy of the License at
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.

Restrict the domain of a data attribute, often to fulfill business rules/requirements.

Table of Contents

  • Overview
  • Concurrency and Atomicity
  • Caveats
  • Example Usage

Overview

Constraints are used to enforce business rules in a database. By checking all Puts on a given table, you can enforce very specific data policies. For instance, you can ensure that a certain column family/column qualifier pair always has a value between 1 and 10. Any org.apache.hadoop.hbase.client.Put that violates the policy is rejected, so data integrity is maintained.
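To make the check-every-Put idea concrete, here is a plain-Java model of the 1-to-10 rule. It does not touch HBase: ConstraintViolation, PutCheck, OneToTenCheck and CheckedTable are hypothetical stand-ins, a put is modeled as a map from "family:qualifier" to raw value bytes, and cf:q is a made-up column.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a rejected write (not HBase's ConstraintException).
class ConstraintViolation extends Exception {
  ConstraintViolation(String msg) {
    super(msg);
  }
}

// Hypothetical check that is run against every incoming put.
interface PutCheck {
  void check(Map<String, byte[]> put) throws ConstraintViolation;
}

// Rejects a put whose cf:q cell is not a String-encoded int in [1, 10].
class OneToTenCheck implements PutCheck {
  @Override
  public void check(Map<String, byte[]> put) throws ConstraintViolation {
    byte[] raw = put.get("cf:q");
    if (raw == null) {
      return; // this put does not touch cf:q
    }
    int value;
    try {
      value = Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
    } catch (NumberFormatException e) {
      throw new ConstraintViolation("cf:q is not an integer");
    }
    if (value < 1 || value > 10) {
      throw new ConstraintViolation("cf:q out of range: " + value);
    }
  }
}

// Toy table: a put is stored only if every check accepts it.
class CheckedTable {
  private final List<PutCheck> checks = new ArrayList<>();
  private final List<Map<String, byte[]>> rows = new ArrayList<>();

  void addCheck(PutCheck check) {
    checks.add(check);
  }

  boolean put(Map<String, byte[]> put) {
    try {
      for (PutCheck check : checks) {
        check.check(put);
      }
    } catch (ConstraintViolation e) {
      return false; // rejected: nothing is written
    }
    rows.add(put);
    return true;
  }

  int size() {
    return rows.size();
  }
}
```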

Constraints are designed to be configurable, so the same constraint can be used across different tables but behave differently depending on the specific configuration given to it.
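A rough sketch of the configurability point, again in plain Java rather than against the HBase API: one hypothetical class, ConfiguredRangeCheck, reads its column and bounds from a string map (the keys column, min and max are invented for this example), so two instances of the same class enforce different rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical configurable check: the column and bounds come from a config
// map, loosely mirroring how a constraint reads its Configuration.
class ConfiguredRangeCheck {
  private final String column;
  private final int min;
  private final int max;

  ConfiguredRangeCheck(Map<String, String> conf) {
    // "column", "min" and "max" are made-up key names for this sketch
    this.column = conf.get("column");
    this.min = Integer.parseInt(conf.getOrDefault("min", String.valueOf(Integer.MIN_VALUE)));
    this.max = Integer.parseInt(conf.getOrDefault("max", String.valueOf(Integer.MAX_VALUE)));
  }

  // Returns true if the put is acceptable under this configuration.
  boolean accepts(Map<String, byte[]> put) {
    byte[] raw = put.get(column);
    if (raw == null) {
      return true; // the configured column is not written by this put
    }
    try {
      int value = Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
      return value >= min && value <= max;
    } catch (NumberFormatException e) {
      return false;
    }
  }
}
```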

Adding a constraint to a table (see Example Usage) automatically enables constraints on that table. You then have the option to disable (just 'turn off') or remove (delete all associated information) all constraints on a table. If you remove all constraints (see Constraints.remove(org.apache.hadoop.hbase.HTableDescriptor)), you must re-add any Constraint you want on that table. However, if they are just disabled (see Constraints.disable(org.apache.hadoop.hbase.HTableDescriptor)), all you need to do is enable constraints again, and everything will be turned back on as it was configured. Individual constraints can also be enabled, disabled or removed without affecting other constraints.

By default, constraints are disabled on a table. This means you will not see any slowdown on a table if constraints are not enabled.
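The difference between disabling and removing can be modeled with a small bookkeeping class. ConstraintRegistry and its methods are hypothetical and only mimic the behavior described above (processing is off until something is added; disable keeps position and configuration; remove discards them). The real calls are the static methods on Constraints.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-table constraint bookkeeping; NOT the HBase API.
class ConstraintRegistry {
  static final class Entry {
    boolean enabled = true;
    final int priority;
    final Map<String, String> conf;

    Entry(int priority, Map<String, String> conf) {
      this.priority = priority;
      this.conf = conf;
    }
  }

  private final Map<String, Entry> entries = new LinkedHashMap<>();
  private boolean tableEnabled = false; // constraint processing is off by default
  private int nextPriority = 1;

  void add(String name, Map<String, String> conf) {
    entries.put(name, new Entry(nextPriority++, new LinkedHashMap<>(conf)));
    tableEnabled = true; // adding a constraint turns processing on
  }

  // Per-constraint operations: disable/enable keep priority and conf.
  void disable(String name) { entries.get(name).enabled = false; }
  void enable(String name) { entries.get(name).enabled = true; }
  void remove(String name) { entries.remove(name); }

  // Table-wide operations: disableAll keeps every entry, removeAll drops them.
  void disableAll() { tableEnabled = false; }
  void enableAll() { tableEnabled = true; }
  void removeAll() {
    entries.clear();
    tableEnabled = false;
  }

  // A constraint runs only if the table and the constraint are both enabled.
  boolean runs(String name) {
    Entry e = entries.get(name);
    return tableEnabled && e != null && e.enabled;
  }

  Integer priorityOf(String name) {
    Entry e = entries.get(name);
    return e == null ? null : e.priority;
  }
}
```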

Concurrency and Atomicity

Currently, no attempt is made to enforce correctness in a multi-threaded scenario when modifying a constraint on an org.apache.hadoop.hbase.HTableDescriptor via Constraints. This is particularly important when adding one or more constraints to the org.apache.hadoop.hbase.HTableDescriptor: the add first retrieves the next priority from a custom value set in the descriptor, then adds each constraint (with increasing priority) to the descriptor, and finally stores the next available priority back in the org.apache.hadoop.hbase.HTableDescriptor. Two concurrent adds can therefore read the same starting priority.

Locking is recommended around each of the Constraints add methods: Constraints.add(org.apache.hadoop.hbase.HTableDescriptor,java.lang.Class[]), Constraints.add(org.apache.hadoop.hbase.HTableDescriptor,org.apache.hadoop.hbase.util.Pair[]), and Constraints.add(org.apache.hadoop.hbase.HTableDescriptor,java.lang.Class,org.apache.hadoop.conf.Configuration). Any changes to a single HTableDescriptor should be serialized, either within a single thread or via an external mechanism.
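The race comes from the read-modify-write of the stored next priority. A minimal sketch of serializing it, assuming a hypothetical FakeDescriptor backed by a plain string map and made-up key names (next.priority and name.priority), is to hold one lock for the entire add:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical descriptor: a bag of string values, not thread-safe by itself.
class FakeDescriptor {
  final Map<String, String> values = new HashMap<>();
}

class SerializedAdds {
  // Made-up key name for this sketch.
  static final String NEXT_PRIORITY = "next.priority";

  // Read the next priority, assign one per constraint, write the counter back.
  // Holding the descriptor's monitor for the whole sequence keeps two adds
  // from reading the same starting value.
  static void add(FakeDescriptor desc, String... constraintNames) {
    synchronized (desc) {
      int next = Integer.parseInt(desc.values.getOrDefault(NEXT_PRIORITY, "1"));
      for (String name : constraintNames) {
        desc.values.put(name + ".priority", String.valueOf(next++));
      }
      desc.values.put(NEXT_PRIORITY, String.valueOf(next));
    }
  }
}
```

Locking on the descriptor object is only one choice; any lock shared by every writer of that descriptor, or funneling all changes through a single thread, works equally well.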

Note that a higher priority value means that a constraint will run later; e.g. a constraint with priority 1 will run before a constraint with priority 2.
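In other words, run order is ascending priority value. A tiny sketch with a hypothetical Prioritized holder:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical (name, priority) holder; a lower value runs earlier.
class Prioritized {
  final String name;
  final long priority;

  Prioritized(String name, long priority) {
    this.name = name;
    this.priority = priority;
  }
}

class RunOrder {
  // Returns constraint names in the order they would run.
  static List<String> order(List<Prioritized> constraints) {
    List<Prioritized> sorted = new ArrayList<>(constraints);
    sorted.sort(Comparator.comparingLong((Prioritized p) -> p.priority));
    List<String> names = new ArrayList<>();
    for (Prioritized p : sorted) {
      names.add(p.name);
    }
    return names;
  }
}
```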

Since constraints are currently designed to implement only simple checks (e.g. is the value in the right range?), there will be no atomicity conflicts. Even if one of the puts finishes its constraint checks first, the single row will not be corrupted and the 'fastest' write will win; the underlying region takes care of breaking the tie and ensuring that writes are serialized to the table. Note, however, that this does not guarantee any specific ordering, or even a fully consistent view of the underlying data.

Each constraint should use only local/instance variables, unless you are doing something more advanced. Static variables can cause difficulties when checking concurrent writes to the same region, leading either to heavily locked situations (decreasing throughput) or to a higher probability of errors. However, as long as each constraint uses only local variables, each thread interacting with the constraint will execute correctly and efficiently.
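As an illustration of the local-variables advice, the hypothetical LocalStateCheck below keeps all working state in locals, so one shared instance can be called from many threads with no locking. ConcurrentRunner is only a test harness for this sketch, not part of any HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Safe pattern: every piece of working state is a local variable, so
// concurrent calls on one shared instance cannot interfere.
class LocalStateCheck {
  // Avoid fields such as `static int lastParsed;`: every concurrent check
  // would race on it or have to lock around it.

  boolean isOneToTen(byte[] raw) {
    String text = new String(raw, StandardCharsets.UTF_8);
    try {
      int value = Integer.parseInt(text);
      return value >= 1 && value <= 10;
    } catch (NumberFormatException e) {
      return false;
    }
  }
}

// Test harness for this sketch: runs one shared instance from a thread pool.
class ConcurrentRunner {
  static int countAccepted(LocalStateCheck check, List<byte[]> values, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (byte[] value : values) {
        Callable<Boolean> task = () -> check.isOneToTen(value);
        results.add(pool.submit(task));
      }
      int accepted = 0;
      for (Future<Boolean> result : results) {
        if (result.get()) {
          accepted++;
        }
      }
      return accepted;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```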

Caveats

In traditional (SQL) databases, constraints are often used to enforce referential integrity. However, in HBase, this will likely cause significant overhead and dramatically decrease the number of Puts/second possible on a table. This is because to check referential integrity when making an org.apache.hadoop.hbase.client.Put, one must block on a scan of the 'remote' table, checking for the valid reference. For millions of Puts a second, this will break down very quickly. There are several options around the blocking behavior, including, but not limited to:
  • Creating a 'pre-join' table where the keys are already denormalized
  • Designing for 'incorrect' references
  • Using an external enforcement mechanism
There are also several general considerations that must be taken into account when using Constraints:
  1. All changes made via Constraints modify the org.apache.hadoop.hbase.HTableDescriptor for a given table. As such, the usual re-enabling of tables should be used to propagate changes to the table. Whenever possible, Constraints should be added to the table before the table is created.
  2. Constraints are run in the order that they are added to a table, so add them in the order you want them to run.
  3. Whenever new Constraint jars are added to a region server, those region servers need to go through a rolling restart to make sure that they pick up the new jars and can enable the new constraints.
  4. There are certain keys that are reserved for the Configuration namespace:
    • _ENABLED - used server-side to determine if a constraint should be run
    • _PRIORITY - used server-side to determine what order a constraint should be run
    If these items are set, they will be respected in the constraint configuration, but they are taken care of by default when adding constraints to an org.apache.hadoop.hbase.HTableDescriptor via the usual method.
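The _ENABLED and _PRIORITY key names come from the list above; how they get filled in is sketched here under an assumption. A hypothetical prepare helper copies the caller's configuration (modeled as a string map rather than a Hadoop Configuration) and sets each reserved key only if the caller has not already set it, matching the "respected if set" behavior described in item 4.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; only the two key names are taken from the text.
class ReservedKeys {
  static final String ENABLED = "_ENABLED";
  static final String PRIORITY = "_PRIORITY";

  static Map<String, String> prepare(Map<String, String> userConf, long nextPriority) {
    Map<String, String> conf = new HashMap<>(userConf); // never mutate the caller's map
    conf.putIfAbsent(ENABLED, "true");                    // default: enabled
    conf.putIfAbsent(PRIORITY, Long.toString(nextPriority));
    return conf;
  }
}
```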

Under the hood, constraints are implemented as a Coprocessor (see ConstraintProcessor if you are interested).

Example Usage

First, you must define a Constraint. The best way to do this is to extend BaseConstraint, which takes care of some of the more mundane details of using a Constraint.

Let's look at one possible implementation of a constraint: an IntegerConstraint (there are also several simple examples in the tests). The IntegerConstraint checks that the value is a String-encoded int. This kind of constraint is really simple to implement; the only method that needs to be implemented is Constraint.check(org.apache.hadoop.hbase.client.Put):

import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.constraint.BaseConstraint;
import org.apache.hadoop.hbase.constraint.ConstraintException;
import org.apache.hadoop.hbase.util.Bytes;

public class IntegerConstraint extends BaseConstraint {
  @Override
  public void check(Put p) throws ConstraintException {
    Map<byte[], List<KeyValue>> familyMap = p.getFamilyMap();

    for (List<KeyValue> kvs : familyMap.values()) {
      for (KeyValue kv : kvs) {
        // just make sure that we can actually pull out an int;
        // parseInt throws a NumberFormatException if we try to
        // store something that isn't an integer
        try {
          Integer.parseInt(Bytes.toString(kv.getValue()));
        } catch (NumberFormatException e) {
          throw new ConstraintException("Value in Put (" + p
              + ") was not a String-encoded integer", e);
        }
      }
    }
  }
}

Note that all exceptions that you expect to be thrown must be caught and then rethrown as a ConstraintException. This way, you can be sure that an org.apache.hadoop.hbase.client.Put fails for an expected reason, rather than for any reason. For example, a java.lang.OutOfMemoryError is probably indicative of an inherent problem in the Constraint, rather than a failed org.apache.hadoop.hbase.client.Put.

If an unexpected exception is thrown (for example, any kind of uncaught java.lang.RuntimeException), constraint-checking will be 'unloaded' from the regionserver where that error occurred. This means no further Constraints will be checked on that server until it is reloaded. This is done to ensure the system remains as available as possible. Therefore, be careful when writing your own Constraint.
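The split between expected and unexpected failures can be modeled as follows. Rejected, Check and ToyProcessor are hypothetical stand-ins for ConstraintException, Constraint and ConstraintProcessor; the sketch also fails the write that triggered the unexpected error, which is an assumption not stated above.

```java
import java.util.List;

// Hypothetical stand-in for ConstraintException: an expected rejection.
class Rejected extends Exception {
  Rejected(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical stand-in for a Constraint.
interface Check {
  void check(String value) throws Rejected;
}

// Hypothetical stand-in for the processor on one regionserver.
class ToyProcessor {
  private final List<Check> checks;
  private boolean loaded = true;

  ToyProcessor(List<Check> checks) {
    this.checks = checks;
  }

  // Returns true if the write is accepted.
  boolean accept(String value) {
    if (!loaded) {
      return true; // unloaded: no constraints are checked at all
    }
    try {
      for (Check c : checks) {
        c.check(value);
      }
      return true;
    } catch (Rejected expected) {
      return false; // expected failure: only this write is rejected
    } catch (RuntimeException unexpected) {
      loaded = false; // bug in a check: stop checking on this "server"
      return false;   // assumption: the triggering write also fails
    }
  }

  boolean isLoaded() {
    return loaded;
  }
}
```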

So now that we have a Constraint, we want to add it to a table. It's as easy as:

HTableDescriptor desc = new HTableDescriptor(TABLE_NAME);
Constraints.add(desc, IntegerConstraint.class);

Once we have added the IntegerConstraint, constraints will be enabled on the table (once it is created) and we will always check that the value is a String-encoded integer.

However, suppose we also write our own constraint, MyConstraint. First, you need to make sure its class files are on the classpath (in a jar) on each regionserver where that constraint will be run (this could require a rolling restart of the region servers; see Caveats above).

Suppose that MyConstraint also uses a Configuration (see org.apache.hadoop.conf.Configurable.getConf()). Then adding MyConstraint looks like this:

// desc is the same descriptor that IntegerConstraint was added to above
Configuration conf = new Configuration(false);
// add values to the conf
// make any other modifications to the table descriptor
Constraints.add(desc, new Pair<Class<? extends Constraint>, Configuration>(MyConstraint.class, conf));

At this point we have added both the IntegerConstraint and MyConstraint to the table; the IntegerConstraint will be run first, followed by MyConstraint.

Suppose we then realize that the org.apache.hadoop.conf.Configuration we gave MyConstraint was wrong. Note that when a constraint is added to the table, its configuration is not stored by reference but is copied into the org.apache.hadoop.hbase.HTableDescriptor. Thus, to change the org.apache.hadoop.conf.Configuration used for MyConstraint, we need to do this:

// add/modify values in the conf
Constraints.setConfiguration(desc, MyConstraint.class, conf);

This will overwrite the previous configuration for MyConstraint, but will not change the order of the constraint nor whether it is enabled or disabled.
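Copy-on-add and the narrow effect of setConfiguration can be modeled with a hypothetical StoredConstraint that keeps its own copy of a string-map configuration; it is an illustration of the semantics above, not HBase code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one constraint's entry in a descriptor.
class StoredConstraint {
  private Map<String, String> conf;
  private final long priority;
  private boolean enabled = true;

  StoredConstraint(Map<String, String> conf, long priority) {
    this.conf = new HashMap<>(conf); // copied, not referenced
    this.priority = priority;
  }

  // Replaces only the configuration; priority and enabled flag are untouched.
  void setConfiguration(Map<String, String> newConf) {
    this.conf = new HashMap<>(newConf);
  }

  void disable() { enabled = false; }
  String get(String key) { return conf.get(key); }
  long priority() { return priority; }
  boolean enabled() { return enabled; }
}
```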

Note that the same constraint class can be added multiple times to a table without repercussion. A use case for this is the same constraint working differently based on its configuration.

Suppose we then want to disable just MyConstraint. It's as easy as:

Constraints.disable(desc, MyConstraint.class);

This just turns off MyConstraint, but retains the position and the configuration associated with MyConstraint. Now, if we want to re-enable the constraint, it's just another one-liner:

Constraints.enable(desc, MyConstraint.class);

Similarly, constraints on the entire table are disabled via:

Constraints.disable(desc);

Or enabled via:

Constraints.enable(desc);

Lastly, suppose you want to remove MyConstraint from the table, including the position it should be run at and its configuration. This is similarly simple:

Constraints.remove(desc, MyConstraint.class);

Also, removing all constraints from a table is similarly simple:

Constraints.remove(desc);

This will remove all constraints (and associated information) from the table and turn off the constraint processing.


It is important to note the use above of

Configuration conf = new Configuration(false);

If you just use new Configuration(), then the Configuration will be loaded with the default properties. In the simple case this is not going to be an issue, but it will cause pain down the road. First, these extra properties cause serious bloat in your org.apache.hadoop.hbase.HTableDescriptor, meaning you are keeping around a ton of redundant information. Second, they make examining your table in the shell, via describe 'table', a huge pain, as you will have to dig through a ton of irrelevant config values to find the ones you set. In short, create the Configuration with new Configuration(false).
package org.apache.hadoop.hbase.constraint;