Coding for SEO 101: Understanding source code, compressed code and compiled code
Why can’t computers read human language? Why do some source files look like crazy character noise? Are computer programmers magicians?
You may know that source code is almost always just text files written in the ‘syntax’ of a computer language, and that the text amounts to a set of instructions for the computer.
The common language that both humans and computers understand is mathematics. If you don’t initially think of math as a language, remember that Morse code transmits human language using a syntax that could easily be described in terms of mathematics.
Computers understand mathematical systems.
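To make that concrete, here is a minimal Python sketch of how a short piece of text looks to the computer. The sample word and the variable names are our own picks for illustration; the point is only that every character is stored as a number, and the number can be turned back into the character.

```python
# Every character is stored as a number; the computer only works with the numbers.
text = "SEO"

# ord() gives the numeric code for a character.
code_points = [ord(ch) for ch in text]
print(code_points)  # [83, 69, 79]

# The same numbers written as 8-bit binary, which is closer to what is stored.
bits = [format(n, "08b") for n in code_points]
print(bits)  # ['01010011', '01000101', '01001111']

# chr() reverses the mapping, so nothing is lost along the way.
restored = "".join(chr(n) for n in code_points)
print(restored)  # SEO
```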
Why do some source files look like crazy character noise? Good programmers write source code that looks logically organized; it only turns into noise when it gets transformed through processing. If you open a file that you can’t immediately read, you may be looking at compressed data, binary code, or source code that has been reduced or ‘minified’ by removing unnecessary whitespace.
Minified Source Code
This last case is probably what you see most often when you use the ‘View Source’ feature of your web browser. Think about this article and its text. Think about how it would look if we removed all the spaces between all the words. You could probably read it, but there would be troublesome spots and it would take much longer. Spaces are pretty necessary. A minifying procedure wouldn’t remove necessary space.
What if the style guide for this sentence required two spaces between words? The second space is not an absolute necessity in article writing, but it can make the text easier for human readers. In a case like this, a minifying process could remove the one redundant space to reduce the total file size for efficient transmission across great distances.
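The double-space idea can be sketched in a few lines of Python. This is only a toy, not how a real minifier works: collapse_spaces is a hypothetical helper of ours, and a real minifier has to understand the language it is shrinking so that it doesn’t touch spaces that matter (inside quoted text, for example).

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


double_spaced = "Two  spaces  between  words  are  redundant."
minified = collapse_spaces(double_spaced)

print(minified)
# The words are untouched; only the redundant spaces are gone.
print(len(double_spaced), "characters before,", len(minified), "after")
```

The necessary single spaces survive, which is the whole trick: a human can still read the result and the file is a little smaller.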
Programmers Space Things Out
Double-spaced text is easier to read, and computer programmers use a lot of extra whitespace for precisely that reason. Computer source code is harder to read than plain text, so we use far more whitespace than even a double-spaced article would. In Python, for example, indentation is part of the syntax: it is how programmers mark which lines belong together.
Sometimes we use 2, 4, or 8 spaces in a row to simulate tab characters, and sometimes we use the tab characters themselves. We also use carriage-return ‘characters’ (the notion of a carriage return comes from our old typewriter days). The computer simulates carriage returns, which allows us to use the ‘return’ character (or newline) as whitespace to organize our code and make it easier to read.
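Whitespace characters are invisible on screen, so it can help to see them spelled out. In this small Python sketch (the snippet text and variable names are made up for illustration), repr() shows a tab as \t, a newline as \n, and a carriage return as \r.

```python
# A two-line snippet indented with a real tab character.
snippet = "if ready:\n\tlaunch()\n"
print(repr(snippet))  # 'if ready:\n\tlaunch()\n'

# The same snippet with the tab replaced by 4 spaces.
spaces_version = snippet.expandtabs(4)
print(repr(spaces_version))  # 'if ready:\n    launch()\n'

# Some systems end lines with a carriage return plus a newline (\r\n).
windows_style = "line one\r\nline two"
lines = windows_style.splitlines()
print(lines)  # ['line one', 'line two']
```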
How we organize our code with whitespace is usually dictated by some sort of personal, traditional, or company-required logic so that humans can read our instructions before they get compressed or get translated into machine code by a compiler.
Those processed forms of text are much harder, or even impossible, to read. When text is minified, you can usually figure out what simple code is doing, even though it’s more difficult to read once the extra whitespace has been removed. When you’re looking at a text file that has been compressed, however, it is completely unreadable until it has been decompressed.
File Compression
Compressed output looks so scrambled that it could almost pass for a crude (and not at all secure) sort of cryptography. Compression algorithms use mathematical formulas along with a table (or crosswalk/dictionary) to substitute shorter codes for the characters and sequences that repeat throughout an original text.
When you decompress a file, the computer uses that table in combination with the generated formulas in reverse to restore the original text.
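Here is a minimal round trip using Python’s built-in zlib module, as one example of a compression library rather than the specific algorithm any given file uses. The sample sentence is our own; it is repeated on purpose, because repetition is exactly what compression feeds on.

```python
import zlib

# Repetitive text compresses well, so repeat a sentence 20 times.
original = ("Spaces are pretty necessary. " * 20).encode("utf-8")

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")

# The compressed bytes are the 'crazy character noise' you see in an editor.
print(compressed[:16])

# Decompressing gives back exactly what we started with.
print(restored == original)  # True
```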
Compiled Source Code
Ultimately, when we’re writing computer programs, we’re writing programs that need to be processed by a CPU. When we write (client-side) JavaScript, our instructions need to get ‘interpreted’ by the browser and translated into machine code for the user’s CPU to process. That’s why JavaScript can crash your browser (and why Google measures the CPU load of the scripts you write).
Compiled source code starts as text files. The text is then transformed into machine code instructions by a corresponding compiler, which gives a performance boost over code that is otherwise interpreted at run time. When you open machine code binaries, you’re going to have a hard time understanding any of it. That’s because it’s streamlined code for computer processing and is not in a form that any of us are meant to read.
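You can get a feel for this without leaving Python, with one caveat: Python’s built-in compile() produces bytecode for Python’s own virtual machine, not machine code for the CPU. Treat the sketch below as an analogy only; the one-line program and the "<example>" label are made up for illustration.

```python
# A one-line program, written as ordinary readable text.
source = "total = 2 + 3\n"

# compile() turns the text into a code object holding Python bytecode.
# Note: this is bytecode for the Python interpreter, not CPU machine code.
code_object = compile(source, "<example>", "exec")

# The raw instruction bytes are not meant for human eyes.
print(code_object.co_code)

# The computer can still run it just fine.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])  # 5
```

The exact bytes printed differ between Python versions, which is a good reminder that compiled output is written for a particular machine (or virtual machine) and not for people.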
In summary, there are three ways you might see computer code noise that looks totally arcane:
- Minified source code.
- Compressed files (source code or other media).
- Compiled machine code (binaries or possibly assembly language).
Of all these, only assembly language is something a computer programmer might write by hand. If you’re writing code in assembly language, then you’re probably a magician. At some point in your journey you may end up writing something like assembly or Perl that, to the ordinary eye, still looks like a bunch of crazy noise.