Email me: bhakta_f@yahoo.com
Translating Machine Instructions Across Platforms
The jewel of high-level languages is their portability. Once you've written a program, you can port it to another operating
system or CPU by using a different compiler (for the CPU) and a different set of libraries (for the OS API). But what if we
could translate x86 assembly code into Motorola 68k assembly, or even into assembly for a RISC CPU like ARM?
What would it take? The translator must pay attention to:
1) CPU mode: real mode/protected mode on the x86, User/Supervisor mode on the 68k
2) Memory addressing styles: data access and manipulation must stay consistent
3) Non-translatable instructions: the instruction block must be transposed
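To make criteria 2 and 3 a bit more concrete, here's a minimal Python sketch of the easy case: one x86 instruction in, one 68k instruction out. The register map (AX→D0, BX→D1, CX→D2, DX→D3) is a hypothetical choice, and only 16-bit register/immediate MOV, ADD and SUB are handled. Even this tiny case shows one consistency issue: Intel syntax writes the destination first, while Motorola syntax writes the source first. Anything the table can't handle comes back as None, which is where block-level translation would have to take over.

```python
import re

# Hypothetical register map: x86 16-bit registers -> 68k data registers.
REG_MAP = {"AX": "D0", "BX": "D1", "CX": "D2", "DX": "D3"}

# x86 mnemonic -> 68k mnemonic with the .W (16-bit word) size suffix.
OP_MAP = {"MOV": "MOVE.W", "ADD": "ADD.W", "SUB": "SUB.W"}


def translate_operand(op):
    """Map one x86 operand to 68k syntax, or return None if unsupported."""
    op = op.strip().upper()
    if op in REG_MAP:
        return REG_MAP[op]
    if re.fullmatch(r"\d+", op):
        return "#" + op  # 68k assemblers mark immediates with '#'
    return None  # memory operands etc. are not handled in this sketch


def translate(line):
    """Turn Intel-order 'OP dst, src' into Motorola-order 'OP.W src,dst'."""
    m = re.fullmatch(r"\s*(\w+)\s+([^,]+),([^,]+)", line)
    if m is None:
        return None  # not a two-operand instruction (e.g. MOVSB)
    mnemonic, dst, src = m.group(1).upper(), m.group(2), m.group(3)
    if mnemonic not in OP_MAP:
        return None
    d, s = translate_operand(dst), translate_operand(src)
    if d is None or s is None or d.startswith("#"):
        return None  # unsupported operand, or an immediate as destination
    return f"{OP_MAP[mnemonic]} {s},{d}"


for line in ["MOV AX, 5", "ADD AX, BX", "MOVSB"]:
    print(f"{line:12}-> {translate(line)}")
```

One catch a real translator would have to handle: 68k MOVE sets the N and Z flags and clears V and C, while x86 MOV leaves the flags alone, so a one-to-one mapping is only safe when no later instruction depends on those flags.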
It will also require the coder to code in a certain style. Why? The 3 criteria above present an immense task. It might just
be TOO HARD to translate a complex program written intimately for a certain CPU into the assembly of another. But if the
coder wrote code in a way that made translation easier, then it would be possible.
Perhaps the translator could do one procedure at a time; the remaining problem would then be coordinating those procedures.
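Here's a rough sketch of the first half of that idea, assuming MASM-style source where each procedure sits between "name PROC" and "name ENDP". It splits the file into procedures (each of which could then be translated on its own) and builds a call graph, so the coordinating step knows which procedures must agree on a calling convention. The function names are hypothetical, and nothing is actually translated here.

```python
import re


def split_procedures(source):
    """Split MASM-style source into {name: [body lines]} using PROC/ENDP."""
    procs, current = {}, None
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()  # drop comments and indentation
        if not line:
            continue
        start = re.fullmatch(r"(\w+)\s+PROC\b.*", line, re.IGNORECASE)
        if start:
            current = start.group(1)
            procs[current] = []
        elif re.fullmatch(r"\w+\s+ENDP", line, re.IGNORECASE):
            current = None
        elif current is not None:
            procs[current].append(line)
    return procs


def call_graph(procs):
    """Map each procedure name to the list of procedures it CALLs."""
    graph = {}
    for name, body in procs.items():
        graph[name] = [m.group(1) for line in body
                       if (m := re.fullmatch(r"CALL\s+(\w+)", line, re.IGNORECASE))]
    return graph


def unresolved_calls(graph):
    """Callees defined outside this file: their conventions must be agreed on."""
    return sorted({callee for callees in graph.values()
                   for callee in callees if callee not in graph})


SRC = """
main PROC
    CALL copy_buf
    CALL print_str      ; defined in another file
    RET
main ENDP
copy_buf PROC
    MOVSB
    RET
copy_buf ENDP
"""

procs = split_procedures(SRC)
graph = call_graph(procs)
print(graph)                    # {'main': ['copy_buf', 'print_str'], 'copy_buf': []}
print(unresolved_calls(graph))  # ['print_str']
```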
Criterion #1 is a hard one.
Email me with thoughts
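One concrete example of criterion #1: x86 CLI/STI clear and set the interrupt flag. In real mode they always work; in protected mode they fault if the current privilege level is above IOPL. The closest 68000 equivalent is writing the interrupt mask in bits 8-10 of the status register (SR), and any write to SR is a privileged instruction that only works in Supervisor mode. This sketch (the mapping and names are hypothetical) refuses to translate them for code that will run in User mode.

```python
class ModeError(Exception):
    """Raised when an instruction can't be translated for the target CPU mode."""


# Hypothetical mapping from x86 interrupt-flag instructions to 68000 code.
# The 68000 keeps its interrupt mask in bits 8-10 of SR, and writing SR is
# privileged: it only works in Supervisor mode.
SR_WRITES = {
    "CLI": "ORI.W #$0700,SR",   # mask = 7: block all maskable interrupts
    "STI": "ANDI.W #$F8FF,SR",  # mask = 0: allow all interrupt levels
}


def translate_interrupt_flag(mnemonic, target_mode):
    """Translate CLI/STI, refusing when the 68k code will run in User mode."""
    mnemonic = mnemonic.upper()
    if mnemonic not in SR_WRITES:
        raise KeyError(f"not an interrupt-flag instruction: {mnemonic}")
    if target_mode != "supervisor":
        # In User mode a write to SR causes a privilege-violation exception,
        # so the translator would need another strategy (e.g. an OS call).
        raise ModeError(f"{mnemonic} needs Supervisor mode on the 68k")
    return SR_WRITES[mnemonic]


print(translate_interrupt_flag("CLI", "supervisor"))  # ORI.W #$0700,SR
try:
    translate_interrupt_flag("STI", "user")
except ModeError as err:
    print("cannot translate:", err)
```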
Criterion #2 is quite something.
Email me with thoughts
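One thing criterion #2 has to cover is byte order: the x86 stores multi-byte values little-endian (low byte first) and the 68k stores them big-endian (high byte first). Aligned word-for-word accesses translate fine, but code that writes a word and reads back one of its bytes gets a different byte on each CPU. The sketch below uses Python's struct module to show this; translated_byte_offset is a hypothetical helper showing how a translator could remap such a byte access.

```python
import struct

value = 0x1234  # a 16-bit word, e.g. the contents of AX (x86) or D0.W (68k)

x86_bytes = struct.pack("<H", value)   # x86: little-endian, low byte first
m68k_bytes = struct.pack(">H", value)  # 68k: big-endian, high byte first

print(x86_bytes.hex())   # 3412
print(m68k_bytes.hex())  # 1234

# Reading back "the byte at the word's address" gives different answers,
# so translating such an access literally changes the program's behavior.
print(hex(x86_bytes[0]), hex(m68k_bytes[0]))  # 0x34 0x12


def translated_byte_offset(offset, size):
    """Hypothetical helper: where byte `offset` of a little-endian value of
    `size` bytes ends up once the same value is stored big-endian."""
    return size - 1 - offset


# Remapping the offset makes every byte access agree again.
for k in range(2):
    assert x86_bytes[k] == m68k_bytes[translated_byte_offset(k, 2)]
```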
Criterion #3 will be a cool one to tackle. Imagine CISC-to-RISC translation: one CISC instruction would become many
RISC instructions. Take MOVSB, which moves the byte at DS:SI to ES:DI and then increments SI and DI (or decrements them, if
the direction flag is set). First off, other CPUs don't even have an SI, DI, DS, or ES. So how do we choose which pointer
registers to use?
Perhaps this freedom to choose is a plus. Don't C compilers have to make these same choices?
Those compiler writers are badasses, and I think they're geared to make this thing. I know I'm down. I just need a few more
years of education.
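Here's a sketch of what the MOVSB expansion could look like for ARM. It assumes a flat memory model (DS and ES both have base 0, so SI and DI are plain addresses) and takes the register assignment as a parameter, which is exactly the freedom to choose mentioned above. The default map (SI→r0, DI→r1, r2 as scratch) is a hypothetical choice, and the scratch register must not hold anything live. With ARM's post-indexed addressing it comes out to two instructions; without it, the pointer updates are spelled out and it becomes four.

```python
# Hypothetical register assignment picked by the translator.
DEFAULT_MAP = {"SI": "r0", "DI": "r1", "scratch": "r2"}


def expand_movsb(regs=None, direction_flag=0, post_index=True):
    """Expand x86 MOVSB into ARM instructions.

    Assumes a flat memory model (DS and ES have base 0), so SI and DI are
    plain addresses. direction_flag mirrors x86 DF: 0 = count up, 1 = down.
    The scratch register must not hold a live value at this point.
    """
    regs = DEFAULT_MAP if regs is None else regs
    src, dst, tmp = regs["SI"], regs["DI"], regs["scratch"]
    step = -1 if direction_flag else 1
    if post_index:
        # ARM post-indexed addressing: access [Rn], then Rn = Rn + step.
        return [f"LDRB {tmp}, [{src}], #{step}",
                f"STRB {tmp}, [{dst}], #{step}"]
    # Without post-indexing, the pointer updates become separate instructions.
    op = "SUB" if direction_flag else "ADD"
    return [f"LDRB {tmp}, [{src}]",
            f"STRB {tmp}, [{dst}]",
            f"{op} {src}, {src}, #1",
            f"{op} {dst}, {dst}, #1"]


for insn in expand_movsb(post_index=False):
    print(insn)
# LDRB r2, [r0]
# STRB r2, [r1]
# ADD r0, r0, #1
# ADD r1, r1, #1
```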
Guess what I found on the net. BINARY TRANSLATORS!!
The ones I've found are dynamic translators, which read an executable and translate its code on the fly.
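The core trick of a dynamic translator is to translate a block of code the first time it is about to run and cache the result, so a loop gets translated once and then just executes. Here's a toy sketch of that in Python: the guest instruction set is hypothetical, and the "host code" is Python source compiled with exec(), standing in for real machine code. It is not how any particular translator works internally.

```python
# Toy guest instruction set (hypothetical), stored as tuples:
#   ("mov", reg, imm)   reg = imm
#   ("add", dst, src)   dst = dst + src
#   ("dec", reg)        reg = reg - 1
#   ("jnz", reg, addr)  jump to addr if reg != 0
#   ("halt",)           stop
PROGRAM = [
    ("mov", "a", 0),    # 0
    ("mov", "c", 5),    # 1
    ("add", "a", "c"),  # 2  loop: a += c
    ("dec", "c"),       # 3
    ("jnz", "c", 2),    # 4
    ("halt",),          # 5
]


def translate_block(program, start):
    """Translate the basic block at `start` into a host (Python) function.

    A block runs up to the first jump or halt. The generated function
    updates the register dict and returns the next guest address, or None.
    """
    lines = ["def block(r):"]
    pc = start
    while True:
        op, *args = program[pc]
        if op == "mov":
            lines.append(f"    r[{args[0]!r}] = {args[1]!r}")
        elif op == "add":
            lines.append(f"    r[{args[0]!r}] += r[{args[1]!r}]")
        elif op == "dec":
            lines.append(f"    r[{args[0]!r}] -= 1")
        elif op == "jnz":
            lines.append(f"    return {args[1]} if r[{args[0]!r}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown guest instruction {op!r} at {pc}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)  # "emit" the host code
    return namespace["block"]


def run(program):
    """Execute by translating each block on first use and caching it."""
    cache = {}  # guest start address -> translated block
    regs = {}
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
        pc = cache[pc](regs)
    return regs, cache


regs, cache = run(PROGRAM)
print(regs)           # {'a': 15, 'c': 0}
print(sorted(cache))  # [0, 2, 5]
```

The cache ends up holding only three blocks (starting at addresses 0, 2 and 5) even though the loop body runs five times.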
My aim is to have the Translator work with source code.
Then the coder can review the output and make optimizations (even by changing translator parameters).
Directives can be placed in the source code to specify how something should be translated, including which registers it should preferably use.
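Here's a sketch of how such register-preference directives could work. The ;@prefer syntax is made up for this example; it is written as an assembler comment so the original x86 source still assembles unchanged. The allocator is a simple two-pass greedy assignment (honor preferences, then hand out whatever is free), not a real register allocator: it does no liveness analysis and gives up instead of spilling.

```python
import re

# Hypothetical directive syntax. It is an assembler comment (';'), so the
# original x86 source still assembles unchanged:   ;@prefer SI=A2 DI=A3
DIRECTIVE = re.compile(r";\s*@prefer\s+(.*)", re.IGNORECASE)


def parse_preferences(source):
    """Collect {x86_reg: target_reg} from every ;@prefer directive."""
    prefs = {}
    for line in source.splitlines():
        m = DIRECTIVE.search(line)
        if m:
            for pair in m.group(1).split():
                x86_reg, target = pair.split("=")
                prefs[x86_reg.upper()] = target.upper()
    return prefs


def allocate(used_regs, prefs, pool):
    """Assign a target register to each x86 register, preferences first."""
    free = list(pool)
    mapping = {}
    # Pass 1: honor a preference if that register is still free.
    for reg in used_regs:
        want = prefs.get(reg)
        if want in free:
            mapping[reg] = want
            free.remove(want)
    # Pass 2: everything else takes the next free register from the pool.
    for reg in used_regs:
        if reg not in mapping:
            if not free:
                raise RuntimeError("out of registers (a real translator would spill)")
            mapping[reg] = free.pop(0)
    return mapping


SRC = """
    ;@prefer SI=A2 DI=A3
    MOV CX, 10
    REP MOVSB
"""
prefs = parse_preferences(SRC)
print(prefs)  # {'SI': 'A2', 'DI': 'A3'}
print(allocate(["CX", "SI", "DI"], prefs, ["D0", "D1", "A0", "A1", "A2", "A3"]))
# {'SI': 'A2', 'DI': 'A3', 'CX': 'D0'}
```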
http://www.eecs.harvard.edu/~nr/toolkit/ - The New Jersey Machine-Code Toolkit
Real cool. It provides code that reads descriptions of instruction sets.
It is used to extract the MEANING of what an instruction does, and to build assemblers, disassemblers, etc.
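The toolkit generates encoding and decoding code from a description of an instruction set. The sketch below imitates that idea in a small way with a hand-written Python table for a handful of 8086 opcodes; it is not SLED syntax and not the toolkit's output. Each entry says which opcode bits to match and how to pull the operands out, and one generic loop does the decoding.

```python
# Hand-written descriptor table for a few 8086 opcodes (not SLED syntax).
# Each entry: (mask, match, mnemonic, immediate_bytes, operand formatter).
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

DESCRIPTORS = [
    (0xFF, 0x90, "NOP", 0, lambda op, imm: ""),
    (0xFF, 0xA4, "MOVSB", 0, lambda op, imm: ""),
    (0xFF, 0xC3, "RET", 0, lambda op, imm: ""),
    (0xF8, 0x40, "INC", 0, lambda op, imm: REG16[op & 7]),                   # 40+r
    (0xF8, 0xB0, "MOV", 1, lambda op, imm: f"{REG8[op & 7]}, {imm:#04x}"),   # B0+r ib
    (0xF8, 0xB8, "MOV", 2, lambda op, imm: f"{REG16[op & 7]}, {imm:#06x}"),  # B8+r iw
]


def decode(code):
    """Decode a byte string into assembly text using the descriptor table."""
    out, pc = [], 0
    while pc < len(code):
        opcode = code[pc]
        for mask, match, mnemonic, nbytes, fmt in DESCRIPTORS:
            if (opcode & mask) == match:
                if pc + 1 + nbytes > len(code):
                    raise ValueError(f"truncated instruction at offset {pc}")
                # Immediates are stored little-endian on the x86.
                imm = int.from_bytes(code[pc + 1:pc + 1 + nbytes], "little")
                out.append(f"{mnemonic} {fmt(opcode, imm)}".strip())
                pc += 1 + nbytes
                break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {pc}")
    return out


print(decode(bytes.fromhex("B0 41 B9 0A 00 A4 41 C3")))
# ['MOV AL, 0x41', 'MOV CX, 0x000a', 'MOVSB', 'INC CX', 'RET']
```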
SLED uses the toolkit, and is a way to save programs in the toolkit's native form.
Its native form is: opcode = meaning, operand = data. At run time, the instructions are translated and their opcodes are laid out with them, all at the correct locations.
Normal program loaders (for EXE files, for example) usually have to do some relocation, which involves adjusting the address values inside machine instructions.
SLED does this to a higher degree and is platform independent (once the SLED loader knows about the platform).
This is what I gathered; I could be oversimplifying.
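The plain relocation step that normal loaders do is simple to sketch. Assume (hypothetically) that the assembler recorded the byte offset of every 32-bit little-endian absolute address in the code, computed for a link-time base. The loader then adds the difference between the actual load address and that base at each offset. This is close in spirit to PE base relocations; a DOS MZ EXE works on 16-bit segment values instead, adding the load segment to each listed word. Relative jumps and calls need no fixups at all.

```python
import struct


def relocate(image, fixups, link_base, load_base):
    """Patch absolute 32-bit little-endian addresses inside a code image.

    `fixups` lists the byte offsets where an address computed for
    `link_base` is stored; each one is shifted to match `load_base`.
    """
    patched = bytearray(image)
    delta = load_base - link_base
    for offset in fixups:
        (addr,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (addr + delta) & 0xFFFFFFFF)
    return bytes(patched)


# MOV EAX, [0x00001010] ; RET   (32-bit x86, linked to run at 0x1000)
# The absolute address is the 4 bytes starting at offset 1.
image = bytes.fromhex("A1 10 10 00 00 C3")
print(relocate(image, [1], 0x1000, 0x400000).hex())  # a110004000c3
```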