Doctor of Philosophy (PhD)


Computer Science




Long-running programs, e.g., in high-performance computing, need to
write periodic checkpoints of their execution state to disk so that
they can recover from node failures. Manually adding checkpointing code
to an application, however, is very tedious. The mechanisms needed for
writing the execution state of a program to disk and restoring it are
similar to those needed for migrating a running thread or a mobile
object. We have generalized a source-to-source translation scheme that
allows the migration of mobile Java objects with running threads so
that it can also be used for automated checkpointing. Our translation
scheme allows serializable threads to be written to disk or migrated
with a mobile agent to a remote machine. The translator generates code
that maintains a serializable run-time stack for each thread as a Java
data structure. While this results in significant run-time overhead, it
allows the checkpointing code to be generated automatically. We
improved both the translation scheme and the locking mechanism that is
needed to protect the run-time stack. Our experimental results
demonstrate a speedup of the generated code over the original
translator and show that the approach is feasible in practice.
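The abstract describes the generated code only at a high level. The Java sketch below is a hypothetical, hand-written illustration of the general idea, not the translator's actual output: a method's parameters, locals, and a program counter live in a serializable frame object, so a thread's explicit stack can be written out with standard Java serialization at a safe point and resumed later. The names `SumFrame`, `ThreadStack`, `run`, `checkpoint`, and `restore` are invented for this example, the checkpoint goes to a byte array rather than to a file on disk, and the sketch omits the locking that the real scheme uses to protect the run-time stack.

```java
import java.io.*;
import java.util.ArrayDeque;

/**
 * Hypothetical sketch (not the translator's real output): a thread's run-time
 * stack kept as an explicit, serializable Java data structure.
 */
public class SerializableStackSketch {

    // One activation record of the hand-translated method sumTo(n).
    static final class SumFrame implements Serializable {
        private static final long serialVersionUID = 1L;
        int pc;     // program counter: 0 = entry, 1 = inside loop, 2 = finished
        int n;      // parameter
        int i;      // local loop counter
        long sum;   // local accumulator
        SumFrame(int n) { this.n = n; }
    }

    // The explicit run-time stack of one thread (ArrayDeque is Serializable).
    static final class ThreadStack implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayDeque<SumFrame> frames = new ArrayDeque<>();
    }

    // Translated form of:
    //   long sumTo(int n) { long s = 0; for (int i = 1; i <= n; i++) s += i; return s; }
    // Executes at most maxSteps loop iterations; returns true once the method has finished.
    static boolean run(SumFrame f, int maxSteps) {
        int steps = 0;
        while (true) {
            switch (f.pc) {
                case 0:
                    f.sum = 0;
                    f.i = 1;
                    f.pc = 1;
                    break;
                case 1:
                    if (f.i > f.n) { f.pc = 2; break; }
                    // Safe point: all live state is in the frame, so we may stop here.
                    if (steps == maxSteps) return false;
                    f.sum += f.i;
                    f.i++;
                    steps++;
                    break;
                default:
                    return true;
            }
        }
    }

    // Serialize the whole stack; a real checkpoint would write to a file instead.
    static byte[] checkpoint(ThreadStack stack) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(stack);
        }
        return bytes.toByteArray();
    }

    // Rebuild the stack from a checkpoint image.
    static ThreadStack restore(byte[] image) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return (ThreadStack) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ThreadStack stack = new ThreadStack();
        stack.frames.push(new SumFrame(100));

        // Run part of the computation, then take a checkpoint at a safe point.
        boolean done = run(stack.frames.peek(), 40);
        byte[] image = checkpoint(stack);

        // Simulate a node failure: discard the live stack and recover from the image.
        ThreadStack recovered = restore(image);
        SumFrame top = recovered.frames.peek();
        while (!run(top, 25)) {
            // Further checkpoints could be taken at each of these safe points.
        }
        System.out.println("done before checkpoint: " + done + ", sum = " + top.sum);
    }
}
```

Running `main` should print `done before checkpoint: false, sum = 5050`. Because every local variable access in the translated method goes through a heap-allocated frame, the sketch also suggests one plausible source of the run-time overhead that the abstract mentions.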



Committee Chair

Baumgartner, Gerald