Copyright 2010 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 package de.tudarmstadt.ukp.dkpro.core.snowball;
 import static org.apache.commons.lang.StringUtils.isBlank;
 import java.util.Map;
 import java.util.Set;

UIMA wrapper for the Snowball stemmer. The annotation types to be stemmed can be configured by a FeaturePath.

If you use this component in a pipeline that performs stop word removal, make sure it runs after the stop word removal step, so that only words that are not stop words are stemmed.
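The ordering advice can be illustrated with a toy example. This sketch is hypothetical throughout: toyStem is a naive suffix stripper standing in for Snowball, and STOP_WORDS is an invented list. It shows one reason the order matters: once a word has been stemmed, it may no longer match the stop word list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy illustration of running stop word removal before stemming (not DKPro code). */
public class PipelineOrderSketch {

    // Hypothetical stop word list for the example.
    static final Set<String> STOP_WORDS = Set.of("this", "is", "the");

    // Hypothetical toy stemmer, NOT Snowball: strips a trailing "s" from words longer than 3 chars.
    static String toyStem(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    static List<String> stemAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(toyStem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("this", "is", "the", "cats");
        // Recommended order: remove stop words first, then stem.
        System.out.println(stemAll(removeStopWords(tokens))); // [cat]
        // Reversed order: "this" becomes "thi" and slips past the stop word filter.
        System.out.println(removeStopWords(stemAll(tokens))); // [thi, cat]
    }
}
```

With the recommended order only "cat" survives; reversing the steps lets the stemmed form "thi" through.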

Benjamin Herbert
Richard Eckart de Castilho
See also:
Snowball stemmer homepage
 public class SnowballStemmer
 	private static final String MESSAGE_DIGEST = SnowballStemmer.class.getName()+"_Messages";
 	private static final String SNOWBALL_PACKAGE = "org.tartarus.snowball.ext.";

Use this language instead of the document language to resolve the model.
 	@ConfigurationParameter(name = , mandatory = false)
 	protected String language;
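The language parameter takes precedence over the document language. A minimal sketch of that resolution order, modelled on the resolution code further down (which falls back to aCas.getDocumentLanguage()): the parameter value is tried first, then the document language, and an error is raised if both are blank. The class and method names are hypothetical, isBlank only approximates commons-lang StringUtils.isBlank, and lower-casing with Locale.ROOT is an assumption because the locale argument is elided in the source.

```java
import java.util.Locale;

/** Hypothetical sketch of the language resolution order; not the DKPro API. */
public class LanguageResolutionSketch {

    // Approximates commons-lang StringUtils.isBlank: null, empty, or whitespace only.
    static boolean isBlank(String s) {
        return s == null || s.chars().allMatch(Character::isWhitespace);
    }

    /** Parameter language first, then document language; fails if both are blank. */
    static String resolveLanguage(String paramLanguage, String documentLanguage) {
        String lang = paramLanguage;
        if (isBlank(lang)) {
            lang = documentLanguage;
        }
        if (isBlank(lang)) {
            throw new IllegalStateException("No language set on the component or the document");
        }
        // Locale.ROOT is an assumed choice for locale-independent lower-casing.
        return lang.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(resolveLanguage("de", "en")); // de
        System.out.println(resolveLanguage(null, "EN")); // en
        try {
            resolveLanguage("  ", null);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```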

By default, the stemmer runs in case-sensitive mode. If this parameter is enabled, tokens are lower-cased before being passed to the stemmer.
Values: false (default), true.
 	public static final String PARAM_LOWER_CASE = "lowerCase";
 	@ConfigurationParameter(name = , mandatory = false, defaultValue="false")
 	protected boolean lowerCase;
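To see what the lowerCase switch changes, here is a hedged sketch with a hypothetical, case-sensitive lookup table (TOY_STEMS) in place of Snowball: a capitalized token misses the table unless it is lower-cased first. It also shows why the locale passed to toLowerCase matters, which is the concern behind the "Fixme" note in createStemAnnotation; Locale.ROOT is an assumed choice, since the actual argument is elided in the source.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of the lowerCase parameter; not DKPro code. */
public class LowerCaseSketch {

    // Hypothetical case-sensitive toy "stemmer": a lookup table keyed by lower-case forms.
    static final Map<String, String> TOY_STEMS = Map.of("running", "run", "cats", "cat");

    static String stem(String token, boolean lowerCase) {
        String value = lowerCase ? token.toLowerCase(Locale.ROOT) : token;
        // Unknown forms are returned unchanged.
        return TOY_STEMS.getOrDefault(value, value);
    }

    public static void main(String[] args) {
        System.out.println(stem("Running", false)); // Running (no entry for the capitalized form)
        System.out.println(stem("Running", true));  // run
        // Locale-sensitive lower-casing: Turkish maps 'I' to dotless 'ı' (U+0131).
        System.out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr"))); // tıtle
        System.out.println("TITLE".toLowerCase(Locale.ROOT));                 // title
    }
}
```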
 	public static final Map<String, String> languages = new HashMap<String, String>();
 	static {
	protected Set<String> getDefaultPaths()
		return Collections.singleton(Token.class.getName());
	protected void generateAnnotations(JCas jcas)
		// CAS is necessary to retrieve values
		CAS currCAS = jcas.getCas();
		for (String path : ) {
			// Separate Typename and featurepath
			String[] segments = path.split("/", 2);
			String typeName = segments[0];
			// Try to get the type from the typesystem of the CAS
			Type t = currCAS.getTypeSystem().getType(typeName);
			if (t == null) {
				throw new IllegalStateException("Type [" + typeName + "] not found in type system");
			// get an fpi object and initialize it
			// initialize the FeaturePathInfo with the corresponding part
			// get the annotations
			AnnotationIndex<?> idx = currCAS.getAnnotationIndex(t);
			FSIterator<?> iterator = idx.iterator();
			while (iterator.hasNext()) {
				AnnotationFS fs = (;
				try {
					if (this. != null) {
						// check annotation filter condition
					else { // no annotation filter specified
					// TODO Auto-generated catch block
							"error occurred while creating a stem annotation", e);
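Each configured path is split once, at the first "/", into a type name and the remaining feature path (path.split("/", 2) in generateAnnotations). A small sketch of that split follows; the path strings are hypothetical examples, and returning "" when no feature path is given is a choice made for this sketch only.

```java
/** Sketch of splitting a configured path into type name and feature path. */
public class FeaturePathSplitSketch {

    /** Returns {typeName, featurePath}; featurePath is "" when the path names only a type. */
    static String[] splitPath(String path) {
        // Limit 2: split only at the first '/', keeping the rest of the feature path intact.
        String[] segments = path.split("/", 2);
        String typeName = segments[0];
        String featurePath = segments.length > 1 ? segments[1] : "";
        return new String[] { typeName, featurePath };
    }

    public static void main(String[] args) {
        // Hypothetical paths for illustration.
        String[] withFeature = splitPath("org.example.Token/lemma/value");
        System.out.println(withFeature[0] + " | " + withFeature[1]); // org.example.Token | lemma/value
        String[] typeOnly = splitPath("org.example.Token");
        System.out.println(typeOnly[0] + " | '" + typeOnly[1] + "'"); // org.example.Token | ''
    }
}
```

The limit of 2 matters: without it, "lemma/value" would be broken into separate segments instead of being kept as one feature path.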
		// Try language set on analysis engine
		String lang = ;
		// Try language set in CAS.
		if (isBlank(lang)) {
			lang = aCas.getDocumentLanguage();
		if (isBlank(lang)) {
			throw new AnalysisEngineProcessException("no_language_error", null);
		lang = lang.toLowerCase(.);
			try {
				String langPart = .get(lang);
				if (langPart == null) {
							"unsupported_language_error", new Object[] { lang });
				String snowballStemmerClass =  + .get(lang) + "Stemmer";
				Class<SnowballProgram> stemClass = (Class<SnowballProgram>) Class
			catch (Exception e) {
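The stemmer class is located by mapping the language code to a Snowball language name and building SNOWBALL_PACKAGE + name + "Stemmer", which is then loaded reflectively. A hedged sketch of that lookup: the two map entries are assumptions (the real contents of the languages map are not shown here), and the class and method names are hypothetical. Loading only succeeds if a Snowball implementation is on the classpath; otherwise Class.forName throws ClassNotFoundException.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of resolving a Snowball stemmer class by language code. */
public class StemmerClassNameSketch {

    static final String SNOWBALL_PACKAGE = "org.tartarus.snowball.ext.";

    // Assumed entries for illustration only.
    static final Map<String, String> LANGUAGES = new HashMap<>();
    static {
        LANGUAGES.put("en", "English");
        LANGUAGES.put("de", "German");
    }

    static String stemmerClassName(String lang) {
        String langPart = LANGUAGES.get(lang.toLowerCase(Locale.ROOT));
        if (langPart == null) {
            throw new IllegalArgumentException("unsupported language: " + lang);
        }
        // Reuse the looked-up value instead of querying the map a second time.
        return SNOWBALL_PACKAGE + langPart + "Stemmer";
    }

    /** Loads and instantiates the class reflectively via its no-arg constructor. */
    static Object newStemmer(String lang) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(stemmerClassName(lang));
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        System.out.println(stemmerClassName("en")); // org.tartarus.snowball.ext.EnglishStemmer
        try {
            newStemmer("de");
            System.out.println("Snowball found on the classpath");
        } catch (ReflectiveOperationException e) {
            System.out.println("Snowball not on the classpath: " + e);
        }
    }
}
```

Using Class<?> here sidesteps the unchecked cast to Class<SnowballProgram> in the original; a caller that needs the concrete type can cast the instance instead.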

Creates a Stem annotation with the same begin and end offsets as the AnnotationFS fs; its value is the stemmed value derived by applying the feature path.

fs: the AnnotationFS over which the Stem annotation is created
	private void createStemAnnotation(JCas jcas, AnnotationFS fs)
		// Skip blank text: adding a stem makes no sense there (and it used to raise an exception)
		String value = .getValue(fs);
		if (!StringUtils.isBlank(value)) {
			if () {
				// Fixme - should use locale/language defined in CAS.
				value = value.toLowerCase(.);
			Stem stemAnnot = new Stem(jcas, fs.getBegin(), fs.getEnd());
			try {
				// The patched snowball from Lucene has this as a method on SnowballProgram
				// but if we have some other snowball also in the classpath, Java might
				// choose to use the other. So to be safe, we use reflection here.
				// -- REC, 2011-04-17
			catch (Exception e) {
			// Try setting the "stem" feature on Tokens.
			Feature feat = fs.getType().getFeatureByBaseName("stem");
			if (feat != null && feat.getRange() != null
					&& jcas.getTypeSystem().subsumes(feat.getRange(), stemAnnot.getType())) {
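The steps of createStemAnnotation (skip blank text, optionally lower-case, invoke stem() reflectively, keep the source offsets) can be sketched without UIMA or Snowball on the classpath. Everything here is a hypothetical stand-in: ToyStemmer only mimics the setCurrent/stem/getCurrent method names of a Snowball stemmer and uses an invented suffix rule, Stem is a plain holder rather than the UIMA type, and the value is passed in directly instead of being read through a feature path. Looking up stem() by name on the object's runtime class does not depend on which SnowballProgram class the code was compiled against, which is presumably why the source comment prefers reflection.

```java
import java.lang.reflect.Method;
import java.util.Locale;

/** Hypothetical sketch of reflective stemming; not the DKPro implementation. */
public class ReflectiveStemSketch {

    /** Hypothetical stand-in exposing the same method names as a Snowball stemmer. */
    public static class ToyStemmer {
        private String current;

        public void setCurrent(String value) { current = value; }

        public String getCurrent() { return current; }

        // Toy rule, NOT Snowball: strip a trailing "ing" from words longer than 5 characters.
        public boolean stem() {
            if (current.length() > 5 && current.endsWith("ing")) {
                current = current.substring(0, current.length() - 3);
                return true;
            }
            return false;
        }
    }

    /** Minimal stand-in for a Stem annotation: the source offsets plus the stemmed value. */
    static final class Stem {
        final int begin;
        final int end;
        final String value;

        Stem(int begin, int end, String value) {
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    /** Returns null for blank text; otherwise stems the value via reflection. */
    static Stem createStem(Object stemmer, int begin, int end, String value, boolean lowerCase) {
        if (value == null || value.chars().allMatch(Character::isWhitespace)) {
            return null; // no stem for blank text
        }
        if (lowerCase) {
            value = value.toLowerCase(Locale.ROOT);
        }
        try {
            Class<?> cls = stemmer.getClass();
            cls.getMethod("setCurrent", String.class).invoke(stemmer, value);
            // Resolve stem() on the runtime class of the stemmer object.
            Method stemMethod = cls.getMethod("stem");
            stemMethod.invoke(stemmer);
            String stemmed = (String) cls.getMethod("getCurrent").invoke(stemmer);
            return new Stem(begin, end, stemmed);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("error occurred while creating a stem annotation", e);
        }
    }

    public static void main(String[] args) {
        Stem s = createStem(new ToyStemmer(), 0, 7, "Running", true);
        System.out.println(s.begin + "-" + s.end + " " + s.value); // 0-7 runn
        System.out.println(createStem(new ToyStemmer(), 8, 9, " ", true)); // null
    }
}
```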