package com.arjuna.ats.internal.arjuna.objectstore;
This is the transaction log implementation. It is optimised for the typical
mode of the coordinator: write-once and never read or update. Reads and
updates occur only in the case of failures, which should be rare; hence we
optimise for the non-failure case. This does mean that recovery may take
longer than with other log implementations.
There are several implementations of this approach, some of which perform better
on one operating system than another. We may put them into the source eventually
and make it clear for which OS combination each is best suited. However, this
implementation works well on all operating systems we have tested, so it is a good
default.
- Author(s):
- Mark Little (mark@arjuna.com)
- Version:
- $Id: LogStore.java,v 1.4 2004/11/11 12:22:21 nmcl Exp $
- Since:
- JTS 1.0.
Algorithm used: during normal execution of a transaction we only ever write
and then remove the log entry; we never read it. Therefore, we optimise for
that situation. The log continually grows until a maximum capacity is
reached, at which point we switch to another log. Meanwhile, the recovery
manager periodically runs through completed logs, removing those that are no
longer needed and truncating those that require recovery (which cannot
complete at this time). When writing the initial log entry, we write a
redzone marker, followed by the entry size and then the actual entry. Since a
log is never shared between VMs, we only need to synchronize between the
threads within a given VM: the recovery manager never works on a log that is
being used by another VM anyway. The end of a log is marked with a
termination record. If a crash occurs, no such record will have been written,
in which case the recovery manager determines via timeout heuristics that the
log is no longer required.
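The entry layout described above (redzone marker, then size, then the entry) can be sketched as below. The redzone value, field widths, and class name are illustrative assumptions for this sketch, not the actual LogStore constants:

```java
import java.nio.ByteBuffer;

// Sketch of the append-only entry layout: [redzone:int][length:int][payload].
// The marker value 0x29A is an assumed placeholder; the real LogStore defines
// its own internal redzone constant.
public class LogEntrySketch {
    static final int REDZONE = 0x29A;

    // Serialise one entry: redzone marker, entry size, then the actual entry.
    static byte[] encode(byte[] payload) {
        ByteBuffer buff = ByteBuffer.allocate(8 + payload.length);
        buff.putInt(REDZONE);          // sanity marker checked during recovery
        buff.putInt(payload.length);   // entry size written before the entry
        buff.put(payload);             // the transaction state itself
        return buff.array();
    }

    // Recovery-side decode: verify the redzone before trusting the length.
    static byte[] decode(byte[] raw) {
        ByteBuffer buff = ByteBuffer.wrap(raw);
        if (buff.getInt() != REDZONE)
            throw new IllegalStateException("corrupt log: redzone missing");
        byte[] payload = new byte[buff.getInt()];
        buff.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] round = decode(encode("tx-state".getBytes()));
        System.out.println(new String(round)); // prints tx-state
    }
}
```

Writing the size before the entry lets a recovery scan skip over each record without parsing it, and the redzone gives a cheap integrity check at each record boundary.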
The implementation normally writes removal records to the end of the log
when an entry is deleted. This can be disabled, in which case we end up in
the same situation as if a failure occurred while the removal record was being
written, or as if a crash happened before remove_committed could succeed on any
of the other file-based object store implementations: we may try to commit
transactions that have already terminated (either committed or rolled back).
In that case we either:
(i) call commit on a state that has already been committed, which fails. We will
eventually move the log record elsewhere so the administrator can deal with it; or
(ii) call commit on a state that has already been rolled back, which again fails.
We will eventually move the log record elsewhere, as above.
If we did not write removal records at all, we would end up trying to commit
every log instance multiple times. As such we always try to write the records,
but write them either synchronously or asynchronously (periodically). Of course
there is still a chance that a failure will cause problems in both the sync and
async cases, but we have reduced both the probability and the number of such
problem items. The periodicity is the same as for pruning the log, i.e., the
same thread does both jobs.
By default we synchronously add the removal marker to the log, i.e., when remove_committed
returns, the marker entry has been appended to the log.
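The sync/async removal-record choice described above can be sketched as follows; the class and marker format are illustrative assumptions, not the real LogStore API:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of synchronous vs. asynchronous removal records. A StringBuilder
// stands in for the on-disk log; "DEL:" markers are an assumed format.
public class RemovalLog {
    private final StringBuilder log = new StringBuilder();
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final boolean syncDeletes;

    RemovalLog(boolean syncDeletes) { this.syncDeletes = syncDeletes; }

    // remove_committed analogue: record a removal marker for this id.
    synchronized boolean removeCommitted(String id) {
        if (syncDeletes) {
            // marker is appended before we return to the caller
            log.append("DEL:").append(id).append('\n');
        } else {
            // flushed later by the same periodic thread that prunes the log
            pending.add(id);
        }
        return true;
    }

    // Periodic pass shared with pruning: flush any queued removal markers.
    synchronized void flush() {
        String id;
        while ((id = pending.poll()) != null)
            log.append("DEL:").append(id).append('\n');
    }

    String contents() { return log.toString(); }

    public static void main(String[] args) {
        RemovalLog syncLog = new RemovalLog(true);
        syncLog.removeCommitted("tx-1");
        System.out.println(syncLog.contents()); // prints DEL:tx-1

        RemovalLog asyncLog = new RemovalLog(false);
        asyncLog.removeCommitted("tx-2"); // nothing on "disk" yet
        asyncLog.flush();                 // marker appears after the periodic pass
    }
}
```

The async path narrows the window during which a crash loses removal markers to the interval between queueing and the next periodic flush, at the cost of some durability compared with the default synchronous append.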
NOTE: there is a race where we may terminate the log instance while transactions
are still using it. This happens with other object store implementations too.
However, in this case we could end up with a log that should be deleted because
all of its entries have gone. We try to fix this up through allObjUids. If
recovery works correctly then these states will eventually be deleted.
TODO
When truncating logs we write a shadow and then overwrite the original with the
shadow when finished. If there is a crash we could end up with both the shadow
and the original. Recovery could tidy this up for us: as long as we have the
original we can continue to recover; the shadow instance may be corrupted, so it
is best to ignore it and simply delete it. But we would need to ensure that we
do not delete a shadow that is actually still active.
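The truncation scheme above can be sketched as follows; the `.shadow` suffix and class name are assumptions for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of truncation via a shadow file: write the surviving entries to a
// shadow, then replace the original with it in one rename.
public class ShadowTruncate {
    static void truncate(Path original, byte[] survivingEntries) throws IOException {
        Path shadow = original.resolveSibling(original.getFileName() + ".shadow");
        Files.write(shadow, survivingEntries); // a crash here leaves the original intact
        // A crash between the write and the move leaves both files; per the
        // text above, recovery should keep the original and delete the shadow.
        Files.move(shadow, original,
                   StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("log", ".bin");
        Files.write(log, "old-entries".getBytes());
        truncate(log, "live-entries".getBytes());
        System.out.println(new String(Files.readAllBytes(log))); // prints live-entries
    }
}
```

The key property is that at every instant at least one complete copy of the recoverable entries exists on disk; only the final rename decides which file is authoritative.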
Also, we do not use a primary-and-backup log approach. Whenever we need a new log
instance we create one. This means that many logs could be in use at the same time,
which could be a problem for disk space (unlikely these days, but possible). If this
approach becomes an issue then we can limit the number of log instances created.
Represents a specific log instance.
public final void resize(long size)
private enum Status { ACTIVE, PASSIVE, TERMINATED }
Poke the thread into doing some work even if it normally
would not.
public static final long LOG_SIZE = 10 * 1024 * 1024;
Normally returns the current state of the log entry. However, this is
never called during normal (non-recovery) execution. Therefore, the
overhead of having to scan all of the logs (if it's not one we're using)
is minimal.
Commit a previous write_state operation which was made with the SHADOW
StateType argument. This is achieved by renaming the shadow and removing
the hidden version.
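The rename-based commit described above can be sketched as follows; the "!" prefix for the hidden shadow name and the class name are assumptions for illustration, not the store's actual naming convention:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of commit_state for a SHADOW write: the hidden shadow file is
// renamed to the committed name, which both reveals the new state and
// removes the hidden version in a single operation.
public class ShadowCommit {
    // Assumed convention: the shadow is the committed name with a "!" prefix.
    static Path shadowName(Path committed) {
        return committed.resolveSibling("!" + committed.getFileName());
    }

    // Promote the shadow written by write_state(..., SHADOW) to committed.
    static boolean commitState(Path committed) throws IOException {
        Path shadow = shadowName(committed);
        if (!Files.exists(shadow))
            return false; // no pending shadow write to commit
        Files.move(shadow, committed, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("store");
        Path committed = dir.resolve("uid-1234");
        Files.write(shadowName(committed), "new-state".getBytes());
        System.out.println(commitState(committed)); // prints true
        System.out.println(new String(Files.readAllBytes(committed))); // prints new-state
    }
}
```

Because the rename is the commit point, a reader never observes a half-written committed state: it sees either the old file or the fully written shadow promoted under the committed name.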
tsLogger.logger.trace("LogStore.reveal_state(" + u + ", " + tn + ")");
tsLogger.logger.trace("LogStore.read_uncommitted(" + u + ", " + tn + ")");
tsLogger.logger.trace("LogStore.remove_uncommitted(" + u + ", " + tn + ")");
tsLogger.logger.trace("LogStore.write_uncommitted(" + u + ", " + tn + ", " + s + ")");
This is a recovery-only method and should not be called during normal
execution. As such we need to load in all of the logs we can find that
aren't already loaded (or activated).
for (int i = 0; i < txs.size(); i++)
super(objectStoreEnvironmentBean);
Unlock and close the file. Note that if the unlock fails we set the
return value to false to indicate an error but rely on the close to
really do the unlock.
tsLogger.logger.trace("RandomAccessFile.unlockAndClose(" + fd + ", " + rf + ")");
boolean closedOk = unlock(fd);
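The unlock-then-close behaviour described above, where an unlock failure is reported but the close is still relied on to release the lock, can be sketched with `java.nio` file locks; the class and method names here are illustrative, not the store's actual helpers:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Sketch of unlockAndClose: report an unlock failure via the return value,
// but still close the file, since closing the channel releases any lock it
// holds anyway.
public class LockedLog {
    static boolean unlockAndClose(RandomAccessFile rf, FileLock lock) {
        boolean closedOk = true;
        try {
            lock.release();      // explicit unlock
        } catch (IOException e) {
            closedOk = false;    // flag the error, but keep going:
        }
        try {
            rf.close();          // close releases the lock as a side effect
        } catch (IOException e) {
            closedOk = false;
        }
        return closedOk;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("log", ".bin");
        f.deleteOnExit();
        RandomAccessFile rf = new RandomAccessFile(f, "rw");
        FileLock lock = rf.getChannel().lock();
        System.out.println(unlockAndClose(rf, lock)); // prints true
    }
}
```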
write_state saves the ObjectState in a file named by the type and Uid of
the ObjectState. If the second argument is SHADOW, then the file name is
different so that a subsequent commit_state invocation will rename the
file.
We need to make sure that each entry is written to the next empty location
in the log even if there's already an entry for this tx.
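The always-append rule above, where a new entry for a transaction goes to the next empty location even if that transaction already has one, can be sketched with a simple offset index; the class is an illustrative assumption, not the real bookkeeping:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: every write for a Uid appends at the current end of the log; an
// in-memory index remembers only the latest offset for each id, so earlier
// entries for the same transaction become dead space until truncation.
public class AppendOnlyIndex {
    private final Map<String, Long> latestOffset = new HashMap<>();
    private long end = 0; // next empty location in the log

    // Returns the offset at which this entry must be written.
    synchronized long append(String uid, int entrySize) {
        long offset = end;              // always the next empty slot,
        latestOffset.put(uid, offset);  // even if uid already has an entry
        end += entrySize;
        return offset;
    }

    synchronized Long find(String uid) { return latestOffset.get(uid); }

    public static void main(String[] args) {
        AppendOnlyIndex idx = new AppendOnlyIndex();
        long first = idx.append("uid-1", 100);
        long second = idx.append("uid-1", 50); // same tx still appends
        System.out.println(first + " " + second); // prints 0 100
    }
}
```

Appending unconditionally avoids any in-place update of the log file, which is what keeps the normal write path a pure sequential append.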
tsLogger.logger.trace("ShadowingStore.write_state(" + objUid + ", " + tName
int imageSize = (int) state.length();
boolean setLength = !fd.exists();
buff.putInt(uidString.length);
ofile.seek(theLogEntry.offset);
"ShadowingStore::write_state() - write failed to sync for "
"ShadowingStore::write_state() - write failed to locate file "
"ShadowingStore::write_state() - write failed for "
"ShadowStore::write_state - "
Shouldn't be called during normal execution, only during recovery.
if ((states == null) || (states.size() == 0))
for (int i = 0; i < states.size(); i++)
Does nothing except indicate that this thread is finished with the log on
behalf of this transaction.
protected boolean lock(File fd, int lmode, boolean create)
if ((objectStates != null) && (objectStates.size() > 0))
for (int i = 0; i < objectStates.size(); i++)
buff.putInt(uidString.length);