/*
 * wlfxb - a library for creating and processing of TCF data streams.
 *
 * Copyright (C) Yana Panchenko.
 *
 * This file is part of wlfxb.
 *
 * This program is free software: you can redistribute it and/or modify it
 * under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or (at your
 * option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
 * for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 */
 
 
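package eu.clarin.weblicht.wlfxb.io;

// The wlfxb classes used below (MetaDataItem, TextCorpusLayer,
// TextCorpusLayerStoredAbstract, TextCorpusLayerTag, TextCorpusStored) also
// need imports; their packages are not shown here.
import java.io.Closeable;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.EnumSet;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
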
/**
 * Class {@code TextCorpusStreamed} is used for accessing specified annotation
 * layers of a {@code TextCorpus} and (optionally) adding new annotation layers
 * to it. Only the annotation layers specified in the constructor are loaded
 * into memory. If all annotation layers should be loaded into memory, use the
 * {@code eu.clarin.weblicht.wlfxb.xb.WLData} class instead.
 *
 * @author Yana Panchenko
 */
 
public class TextCorpusStreamed extends TextCorpusStored implements Closeable {

    // layers requested by the caller, plus the layers they depend on
    private EnumSet<TextCorpusLayerTag> layersToRead;
    private EnumSet<TextCorpusLayerTag> layersFound = EnumSet.noneOf(TextCorpusLayerTag.class);
    private EnumSet<TextCorpusLayerTag> readSucceeded = EnumSet.noneOf(TextCorpusLayerTag.class);
    private XMLEventReader xmlEventReader;
    private XMLEventWriter xmlEventWriter;
    private XmlReaderWriter xmlReaderWriter;
    private static final int LAYER_INDENT_RELATIVE = 1;
    private boolean closed = false;

    
    /**
     * Creates a {@code TextCorpusStreamed} from the given TCF input stream and
     * specified annotation layers.
     *
     * @param inputStream the underlying input stream with linguistic
     * annotations in TCF format.
     * @param layersToRead the annotation layers of {@code TextCorpus} that
     * should be read into this {@code TextCorpusStreamed}.
     * @throws WLFormatException if an error in input format or an I/O error
     * occurs.
     */
    public TextCorpusStreamed(InputStream inputStream,
            EnumSet<TextCorpusLayerTag> layersToRead)
            throws WLFormatException {
        super("unknown");
        getLayersToReadWithDependencies(layersToRead);
        try {
            initializeReaderAndWriter(inputStream, null, false);
            process();
        } catch (WLFormatException e) {
            xmlReaderWriter.close();
            throw e;
        }
    }

    
    /**
     * Creates a {@code TextCorpusStreamed} from the given TCF input stream,
     * specified annotation layers and the output stream.
     *
     * @param inputStream the underlying input stream with linguistic
     * annotations in TCF format.
     * @param layersToRead the annotation layers of {@code TextCorpus} that
     * should be read into this {@code TextCorpusStreamed}.
     * @param outputStream the underlying output stream into which the
     * annotations from the input stream and any newly created annotations will
     * be written (in TCF format).
     * @throws WLFormatException if an error in input format or an I/O error
     * occurs.
     */
    public TextCorpusStreamed(InputStream inputStream,
            EnumSet<TextCorpusLayerTag> layersToRead, OutputStream outputStream)
            throws WLFormatException {
        super("unknown");
        getLayersToReadWithDependencies(layersToRead);
        try {
            initializeReaderAndWriter(inputStream, outputStream, false);
            process();
        } catch (WLFormatException e) {
            xmlReaderWriter.close();
            throw e;
        }
    }

    
    /**
     * Creates a {@code TextCorpusStreamed} from the given TCF input stream,
     * specified annotation layers and the output stream, optionally writing
     * the output as an XML fragment.
     *
     * @param inputStream the underlying input stream with linguistic
     * annotations in TCF format.
     * @param layersToRead the annotation layers of {@code TextCorpus} that
     * should be read into this {@code TextCorpusStreamed}.
     * @param outputStream the underlying output stream into which the
     * annotations from the input stream and any newly created annotations will
     * be written (in TCF format).
     * @param outputAsXmlFragment true if the output should not contain XML
     * headers, false otherwise.
     * @throws WLFormatException if an error in input format or an I/O error
     * occurs.
     */
    public TextCorpusStreamed(InputStream inputStream,
            EnumSet<TextCorpusLayerTag> layersToRead, OutputStream outputStream,
            boolean outputAsXmlFragment)
            throws WLFormatException {
        super("unknown");
        getLayersToReadWithDependencies(layersToRead);
        try {
            initializeReaderAndWriter(inputStream, outputStream, outputAsXmlFragment);
            process();
        } catch (WLFormatException e) {
            xmlReaderWriter.close();
            throw e;
        }
    }

    
    /**
     * Creates a {@code TextCorpusStreamed} from the given TCF input stream,
     * specified annotation layers, output stream and meta data.
     *
     * @param inputStream the underlying input stream with linguistic
     * annotations in TCF format.
     * @param layersToRead the annotation layers of {@code TextCorpus} that
     * should be read into this {@code TextCorpusStreamed}.
     * @param outputStream the underlying output stream into which the
     * annotations from the input stream and any newly created annotations will
     * be written (in TCF format).
     * @param metaDataToAdd meta data to be added to the output TCF.
     * @throws WLFormatException if an error in input format or an I/O error
     * occurs.
     */
    public TextCorpusStreamed(InputStream inputStream,
            EnumSet<TextCorpusLayerTag> layersToRead, OutputStream outputStream,
            List<MetaDataItem> metaDataToAdd)
            throws WLFormatException {
        super("unknown");
        getLayersToReadWithDependencies(layersToRead);
        try {
            initializeReaderAndWriter(inputStream, outputStream, false);
            addMetadata(metaDataToAdd);
            process();
        } catch (WLFormatException e) {
            xmlReaderWriter.close();
            throw e;
        }
    }

    private void initializeReaderAndWriter(InputStream inputStream,
            OutputStream outputStream, boolean outputAsXmlFragment)
            throws WLFormatException {
        if (inputStream != null) {
            try {
                XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
                xmlEventReader = xmlInputFactory.createXMLEventReader(inputStream, "UTF-8");
            } catch (XMLStreamException e) {
                throw new WLFormatException(e.getMessage(), e);
            }
        }
        if (outputStream != null) {
            try {
                XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
                xmlEventWriter = xmlOutputFactory.createXMLEventWriter(outputStream, "UTF-8");
            } catch (XMLStreamException e) {
                throw new WLFormatException(e.getMessage(), e);
            }
        }
        // NOTE: assumed constructor call; the XmlReaderWriter signature is not
        // shown in this file, but the field must be assigned before it is used.
        xmlReaderWriter = new XmlReaderWriter(xmlEventReader, xmlEventWriter);
        xmlReaderWriter.setOutputAsXmlFragment(outputAsXmlFragment);
    }

    private void addMetadata(List<MetaDataItem> metaDataToAdd) throws WLFormatException {
        try {
            marshall(metaDataToAdd);
            // rewrite metadata end element
            XMLEvent event = xmlEventReader.nextEvent();
            xmlReaderWriter.add(event);
        } catch (XMLStreamException e) {
            throw new WLFormatException(e.getMessage(), e);
        }
    }

    private void process() throws WLFormatException {
        try {
            // process TextCorpus start element
            XMLEvent event = xmlEventReader.nextEvent();
            super.lang = event.asStartElement().getAttributeByName(new QName("lang")).getValue();
            // add processed TextCorpus start back
            xmlReaderWriter.add(event);
            // create TextCorpus object
            // read layers requested stopping before TextCorpus end element
            processLayers();
            super.connectLayers();
            // if no writing requested finish reading the document
            if (xmlEventWriter == null) {
                xmlReaderWriter.readWriteToTheEnd();
            }
        } catch (XMLStreamException e) {
            throw new WLFormatException(e.getMessage(), e);
        }
        // every requested layer (including dependencies) must have been read
        if (layersToRead.size() != readSucceeded.size()) {
            layersToRead.removeAll(readSucceeded);
            throw new WLFormatException("Following layers could not be read: " + layersToRead.toString());
        }
    }

    private void processLayers() throws WLFormatException {
        boolean textCorpusEnd = false;
        XMLEvent peekedEvent;
        try {
            peekedEvent = xmlEventReader.peek();
            while (!textCorpusEnd && peekedEvent != null) {
                if (peekedEvent.getEventType() == XMLStreamConstants.END_ELEMENT
                        && peekedEvent.asEndElement().getName().getLocalPart().equals(TextCorpusStored.XML_NAME)) {
                    textCorpusEnd = true;
                } else if (peekedEvent.getEventType() == XMLStreamConstants.START_ELEMENT) {
                    processLayer();
                    peekedEvent = xmlEventReader.peek();
                } else {
                    XMLEvent readEvent = xmlReaderWriter.readEvent();
                    xmlReaderWriter.add(readEvent);
                    peekedEvent = xmlEventReader.peek();
                }
            }
        } catch (XMLStreamException e) {
            throw new WLFormatException(e.getMessage(), e);
        }
        if (!textCorpusEnd) {
            throw new WLFormatException(TextCorpusStored.XML_NAME + " end tag not found");
        }
    }
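
    /*
     * A minimal, self-contained sketch of the peek-then-copy pattern that
     * processLayers() relies on, using only the JDK's StAX API. The class
     * PeekLoopSketch and its methods are hypothetical and not part of wlfxb;
     * the sketch copies every event and does not unmarshal any layer.
     *
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PeekLoopSketch {

    // Copies events from reader to writer until the next event is the end tag
    // named stopAt. That end tag is only peeked at, never consumed.
    // Returns true if the end tag was found before the input ran out.
    static boolean copyUntilEndTag(XMLEventReader reader, XMLEventWriter writer, String stopAt)
            throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent peeked = reader.peek();
            if (peeked.isEndElement()
                    && peeked.asEndElement().getName().getLocalPart().equals(stopAt)) {
                return true;
            }
            writer.add(reader.nextEvent());
        }
        return false;
    }

    // Runs the copy over an in-memory document and returns what was written,
    // or null if the end tag was never seen.
    static String demo(String xml, String stopAt) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
            boolean found = copyUntilEndTag(reader, writer, stopAt);
            writer.flush();
            return found ? new String(out.toByteArray(), StandardCharsets.UTF_8) : null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```
     *
     * Because the closing tag is peeked at rather than read, it is still
     * pending in the reader when the loop stops, so further content can be
     * written before it.
     */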

    private void processLayer() throws WLFormatException {
        XMLEvent peekedEvent;
        try {
            peekedEvent = xmlEventReader.peek();
            // now we assume that this event is start of a TextCorpus layer
            String tagName = peekedEvent.asStartElement().getName().getLocalPart();
            TextCorpusLayerTag layerTag = TextCorpusLayerTag.getFromXmlName(tagName);
            if (layerTag == null) { // unknown layer, just add it to output
                xmlReaderWriter.readWriteElement(tagName);
            } else {
                if (this.layersToRead.contains(layerTag)) { // known layer, and is requested for reading
                    // add it to the output, but store its data
                    readLayerData(layerTag);
                } else { // known layer, and is not requested for reading
                    // just add it to the output
                    xmlReaderWriter.readWriteElement(tagName);
                }
                layersFound.add(layerTag);
            }
        } catch (XMLStreamException e) {
            throw new WLFormatException(e.getMessage(), e);
        }
    }

    private void readLayerData(TextCorpusLayerTag layerTag) throws WLFormatException {
        JAXBContext context;
        Unmarshaller unmarshaller;
        try {
            context = JAXBContext.newInstance(layerTag.getLayerClass());
            unmarshaller = context.createUnmarshaller();
            TextCorpusLayerStoredAbstract layer = (TextCorpusLayerStoredAbstract) unmarshaller.unmarshal(xmlEventReader);
            super.layersInOrder[layerTag.ordinal()] = layer;
            // write the layer that was just read back to the output
            marshall(super.layersInOrder[layerTag.ordinal()]);
        } catch (JAXBException e) {
            throw new WLFormatException(e.getMessage(), e);
        }
        readSucceeded.add(layerTag);
    }

    private void marshall(TextCorpusLayer layer) throws WLFormatException {
        if (xmlEventWriter == null) {
            return;
        }
        TextCorpusLayerTag layerTag = TextCorpusLayerTag.getFromClass(layer.getClass());
        if (layersFound.contains(layerTag)) {
            throw new WLFormatException(layerTag.getXmlName() + " cannot be marshalled: the document already contains this annotation layer.");
        }
        JAXBContext context;
        try {
            context = JAXBContext.newInstance(layer.getClass());
            Marshaller marshaller = context.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            marshaller.marshal(layer, xmlEventWriter);
        } catch (JAXBException e) {
            throw new WLFormatException(e.getMessage(), e);
        } catch (XMLStreamException e) {
            throw new WLFormatException(e.getMessage(), e);
        }
    }

    private void marshall(List<MetaDataItem> metaDataToAdd) throws WLFormatException {
        if (xmlEventWriter == null) {
            return;
        }
        JAXBContext context;
        try {
            context = JAXBContext.newInstance(MetaDataItem.class);
            Marshaller marshaller = context.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            for (MetaDataItem mdi : metaDataToAdd) {
                marshaller.marshal(mdi, xmlEventWriter);
            }
        } catch (JAXBException e) {
            throw new WLFormatException(e.getMessage(), e);
        } catch (XMLStreamException e) {
            throw new WLFormatException(e.getMessage(), e);
        }
    }

    
    /**
     * Closes the input and output streams associated with this object and
     * releases any associated system resources. Before the streams are closed,
     * all in-memory annotations of the {@code TextCorpusStreamed} and the
     * unprocessed part of the input stream are written to the output stream.
     * It is therefore important to call {@code close()}, so that all in-memory
     * annotations are saved to the output stream. Once the
     * {@code TextCorpusStreamed} has been closed, adding further annotations
     * has no effect on the output stream.
     *
     * @throws WLFormatException if an error in input format or an I/O error
     * occurs.
     */
    @Override
    public void close() throws WLFormatException {
        if (closed) {
            return;
        }
        closed = true;
        try {
            boolean[] layersRead = new boolean[super.layersInOrder.length];
            for (TextCorpusLayerTag layerRead : readSucceeded) {
                layersRead[layerRead.ordinal()] = true;
            }
            for (int i = 0; i < super.layersInOrder.length; i++) {
                // if it's a newly added layer
                if (super.layersInOrder[i] != null && !layersRead[i] //&& !super.layersInOrder[i].isEmpty()
                        ) {
                    marshall(super.layersInOrder[i]);
                }
            }
        } finally {
            xmlReaderWriter.readWriteToTheEnd();
        }
    }
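
    /*
     * The close() contract above (write pending annotations once, then ignore
     * later additions and repeated calls) can be illustrated with a small
     * stand-alone sketch. FlushOnCloseSketch, its StringBuilder sink and the
     * layer names are hypothetical stand-ins, not wlfxb API.
     *
```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

class FlushOnCloseSketch implements Closeable {

    private final List<String> pending = new ArrayList<>();
    private final StringBuilder sink;
    private boolean closed = false;

    FlushOnCloseSketch(StringBuilder sink) {
        this.sink = sink;
    }

    // Keeps the annotation in memory; nothing is written until close().
    void add(String annotation) {
        pending.add(annotation);
    }

    // Writes everything still in memory exactly once; later calls do nothing.
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        for (String annotation : pending) {
            sink.append(annotation).append('\n');
        }
    }

    // Walks through the lifecycle and returns what reached the sink.
    static String demo() {
        StringBuilder sink = new StringBuilder();
        FlushOnCloseSketch corpus = new FlushOnCloseSketch(sink);
        try (FlushOnCloseSketch c = corpus) {
            c.add("tokens");
            c.add("lemmas");
        }
        corpus.add("sentences"); // added after close: never written
        corpus.close();          // second close: no-op
        return sink.toString();
    }
}
```
     *
     * Using try-with-resources, as in demo(), is one way to make sure close()
     * is always called so that in-memory annotations are not lost.
     */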

    private void getLayersToReadWithDependencies(EnumSet<TextCorpusLayerTag> layersToRead) {
        this.layersToRead = EnumSet.copyOf(layersToRead);
        for (TextCorpusLayerTag tag : layersToRead) {
            this.layersToRead.addAll(tag.withDependentLayers());
        }
    }
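
    /*
     * A minimal sketch of the dependency expansion performed by
     * getLayersToReadWithDependencies(), assuming a hypothetical Layer enum.
     * The dependency table below is invented for illustration and is not the
     * one defined by TextCorpusLayerTag.
     *
```java
import java.util.EnumSet;

class LayerDependencySketch {

    enum Layer {
        TEXT, TOKENS, SENTENCES, LEMMAS;

        // Hypothetical rule: sentences and lemmas both refer to tokens.
        EnumSet<Layer> withDependentLayers() {
            switch (this) {
                case SENTENCES:
                case LEMMAS:
                    return EnumSet.of(this, TOKENS);
                default:
                    return EnumSet.of(this);
            }
        }
    }

    // Returns the requested layers plus every layer they depend on.
    // The caller's set is copied first, so it is never modified.
    static EnumSet<Layer> withDependencies(EnumSet<Layer> requested) {
        EnumSet<Layer> result = EnumSet.copyOf(requested);
        for (Layer layer : requested) {
            result.addAll(layer.withDependentLayers());
        }
        return result;
    }

    // Requests only LEMMAS and reports the expanded set as text.
    static String demo() {
        return withDependencies(EnumSet.of(Layer.LEMMAS)).toString();
    }
}
```
     *
     * Copying the requested set before expanding it keeps the caller's EnumSet
     * unchanged, which matters because the expanded set is later reduced with
     * removeAll() when reporting layers that could not be read.
     */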
}