Communication and Employability Skills – Assignment 3 – Excerpt

Software and Programming: A Report [P5] [P6]

When developing software, it's important to understand the different forms of programming languages, and the way in which instructions make their way down to the CPU level. There are many different programming languages, and several methods by which the human-readable source code that developers write is converted into basic instructions a computer can perform. These methods, along with the tools that teams of developers use to make creating software easier, are explored below.

Machine Code

At the lowest level of programming is machine code. This is held in binary files, rather than text files that can easily be read by a variety of programs, and is most commonly created using a piece of software called a compiler. Machine code is very impractical to write directly, and for this reason it very rarely is. Because machine code consists of instructions for the CPU, it can only be executed on CPUs of a particular architecture. For example, machine code written or compiled for an ARM chip – often found in smartphones and other hand-held devices – cannot run on a desktop x86-64 processor. The most common architectures in use today are x86 (desktop 32-bit), x86-64 (desktop 64-bit), ARM (mobile 32-bit) and, more recently, ARMv8-A (mobile 64-bit). Many believe that within around five years 32-bit chips will have become a rarity, due to the limitations imposed on developers who must support them. As an example, a 32-bit address space can address only 4 GiB (2^32 bytes) of memory, meaning no more than that can be used directly by the system. In contrast, current x86-64 (AMD64) processors typically implement 48-bit virtual addresses, covering 256 TiB (2^48 bytes), and a full 64-bit address space would cover 16 EiB (2^64 bytes, roughly 18.4 exabytes) in theory.
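The address-space figures above follow directly from powers of two. The C sketch below works them out; the function names are invented for this illustration, and the 48-bit figure assumes the virtual address width commonly implemented by x86-64 processors.

```c
#include <stdint.h>

/* Bytes addressable with `bits` address bits. Valid for 1..63 only,
   because 2^64 itself does not fit in a 64-bit integer. */
uint64_t addressable_bytes(unsigned bits) {
    return (uint64_t)1 << bits;
}

/* Whole gibibytes in a byte count (1 GiB = 2^30 bytes). */
uint64_t bytes_to_gib(uint64_t bytes) {
    return bytes >> 30;
}

/* Whole tebibytes in a byte count (1 TiB = 2^40 bytes). */
uint64_t bytes_to_tib(uint64_t bytes) {
    return bytes >> 40;
}

/* Exbibytes addressable with `bits` address bits (1 EiB = 2^60 bytes).
   Valid for 60..64; works in EiB directly to avoid overflowing. */
uint64_t addressable_eib(unsigned bits) {
    return (uint64_t)1 << (bits - 60);
}
```

Calling bytes_to_gib(addressable_bytes(32)) gives 4, bytes_to_tib(addressable_bytes(48)) gives 256, and addressable_eib(64) gives 16.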

Programs compiled into machine code are often referred to as binaries or executables. On Microsoft Windows, they most often have the file extension .exe; installers are also commonly distributed as .msi packages. In OS X, Linux and most other Unix-like operating systems, executables usually have no extension, which leads to less confusion when running programs as commands in a terminal, although .bin files are occasionally found. An executable is a file that the machine is able to run directly, provided the user has sufficient permissions to run it.

Assembly

Most commonly used to write low-level software for new architectures or embedded systems (e.g., those in home appliances), assembly is a form of programming language centred around basic instructions, in a similar way to machine code. Assembly language is converted to machine code for execution by an assembler, and this process is quicker than compiling a higher-level programming language. Assembly instructions map to machine code instructions roughly one-to-one, meaning that, like machine code, assembly is not portable between different CPU architectures. When a new architecture is introduced, the first programs, likely including a C compiler, will often be written in assembly. Once a compiler or interpreter for a more advanced, high-level language has been ported, it becomes much easier to port other popular pieces of software to the system.

Writing in assembly gives the programmer an unparalleled level of control over the system, and allows them to create incredibly efficient and well-optimised programs. That said, doing so requires skill and familiarity with the hardware of the system. For software to remain efficient, it must free the memory it has been assigned once that memory is no longer needed; failing to do so causes memory leaks. This memory management is undertaken automatically by many higher-level languages, making development easier, but must be explicitly programmed when working with assembly.
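Explicit memory management can be sketched in C, used here as a stand-in for assembly because C also leaves the release of heap memory to the programmer. The function names below are invented for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of `text`, or NULL if allocation fails.
   The caller owns the result and must release it with free(). */
char *copy_string(const char *text) {
    size_t size = strlen(text) + 1;      /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, size);
    return copy;
}

/* Copies `text`, measures the copy, then frees it so nothing leaks.
   Returns 0 if the copy could not be allocated. */
size_t measure_copy(const char *text) {
    char *copy = copy_string(text);
    if (copy == NULL) {
        return 0;
    }
    size_t length = strlen(copy);
    free(copy);                          /* without this call the memory leaks */
    return length;
}
```

Every successful malloc call is paired with exactly one free call. If the free call in measure_copy were removed, each invocation would leave its allocation unreachable but still reserved until the program exits.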

Compilers and Interpreters

There are three major ways in which software can be developed and distributed. Compiled, semi-compiled and interpreted languages are detailed below.

Compiled Languages

A diagram illustrating how a program compiled for a particular CPU architecture (x86 in this case) will not be compatible and will not run on another architecture (ARM in this case).
Image by myself; see license

Offering a balance of control and ease of use, high-level compiled programming languages are used to build some of the most complex pieces of software. C in particular, along with derivatives including C++ and Objective-C, is used to write core operating system components, mission-critical software and advanced desktop applications. As an example, the Linux kernel, Microsoft Windows and OS X are all written largely in C and C++ (the Linux kernel in C, with some assembly). Adobe Photoshop is also written in C++.

As all code must be machine code before it can be executed, compiled languages are fed through a conversion process beforehand. As machine code is unique to a CPU architecture, compiled code can only be run on the machine it was compiled for, or another with a compatible CPU and operating system. Because the source code of compiled languages is converted to machine code as part of the development process, the code can execute on the CPU quickly, and needn't be compiled or interpreted when it's called upon. The process of compilation also means that repeated blocks of code and other common programming patterns can be optimised so that they consume less disk space and run more efficiently.

The trade-off one must consider when using these languages, alongside the performance and control benefits, is that once code has been compiled, it cannot be run on another CPU architecture, as the binary will use the instruction set of the CPU it was compiled for. This means that it can be difficult to distribute software for both 32-bit and 64-bit computers, and is the reason that software installers and downloads are often available in both formats. Unlike with assembly code, however, source code written in C, for example, is largely portable to other platforms. A simple Hello, World program, shown below, could successfully be compiled on 64-bit Linux, 64-bit OS X, or even 32-bit Windows. This means that the majority of Photoshop's source code, for example, is the same for both Windows and OS X.

#include <stdio.h>              // imports standard input/output code from the C library

int main(void) {                // creates the main function, which is executed at runtime

    printf("Hello, World!\n");  // output the text "Hello, World!" to the shell
    return 0;                   // return a successful status code, indicating there were no errors

}

When distributing software written in compiled languages, it's important that users are able to easily install and run the software. Compiling to machine code before the software is distributed is known as ahead-of-time (AOT) compilation, and for true cross-platform support it means building the software separately for 32- and 64-bit Windows, OS X and Linux. If the developer wishes to support other platforms as well, such as Android, the software must be compiled for those too. This process is time-consuming, and the software may well need to be adapted in places, as parts of most compiled languages are not entirely portable. This work is not required – at least not to such an extent – with semi-compiled or interpreted languages.
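The same C source can give different results depending on the target it is compiled for, which is one reason each platform needs its own build. The sketch below is illustrative only: the function names are invented, and the architecture check relies on macros predefined by GCC, Clang and MSVC, so other compilers may report "unknown".

```c
#include <limits.h>

/* Width of a pointer, in bits, on the target this file was compiled for:
   typically 32 on a 32-bit target and 64 on a 64-bit target. */
unsigned pointer_width_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Name of the target architecture, chosen at compile time from
   compiler-predefined macros (GCC/Clang names first, MSVC names second). */
const char *target_architecture(void) {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "ARM64";
#elif defined(__arm__) || defined(_M_ARM)
    return "ARM";
#else
    return "unknown";
#endif
}
```

Neither function changes between platforms, yet the compiled binary bakes in one answer for each. Producing a different answer requires recompiling for a different target.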

Just-In-Time and Semi-Compiled Languages

A diagram illustrating how an unchanged Java executable (.jar file) can be run on multiple different architectures, granted a Java environment or JVM is available.
Image by myself; see license

Similar to compiled languages from the perspective of a developer, semi-compiled languages offer more portability across different platforms, while typically being slightly more limited in terms of performance. The most popular example of a semi-compiled language is Java, but C# is also semi-compiled. A program written in one of these languages is first compiled to bytecode, an intermediate format optimised for a virtual machine rather than for any particular CPU. Although it is not machine code, this bytecode is a binary file, and it can be executed provided the required environment is available. For Java, a JVM (Java Virtual Machine) is needed, which takes the bytecode produced by the Java compiler (usually invoked by the developer's IDE, or integrated development environment, an application for writing software) and converts it to machine code at run time. This is known as just-in-time, or JIT, compilation. Bytecode can generally be compiled and executed more quickly than software written in interpreted languages can be run, but often still doesn't match the performance of pre-compiled machine code.
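The idea of bytecode running on a virtual machine can be sketched with a toy stack machine in C. This is a simplification under stated assumptions: the instruction set is invented for this illustration and is not real JVM bytecode, and the loop interprets each instruction directly, whereas a JIT-compiling virtual machine would translate the bytecode to machine code.

```c
#include <stddef.h>

/* Hypothetical instruction set, invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_SIZE 16

/* Runs `code` on a small stack machine and returns the value on top of
   the stack when OP_HALT is reached. Returns 0 for malformed programs. */
int run_bytecode(const int *code, size_t length) {
    int stack[STACK_SIZE];
    size_t sp = 0;                       /* number of values on the stack */
    size_t pc = 0;                       /* index of the next instruction */

    while (pc < length) {
        switch (code[pc++]) {
        case OP_PUSH:                    /* operand follows the opcode */
            if (pc >= length || sp >= STACK_SIZE) return 0;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                     /* replace top two values with their sum */
            if (sp < 2) return 0;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                     /* replace top two values with their product */
            if (sp < 2) return 0;
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0;
        default:                         /* unknown opcode */
            return 0;
        }
    }
    return 0;                            /* ran off the end without OP_HALT */
}
```

The program {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT} computes (2 + 3) * 4 and returns 20. The array of integers is the portable part: it is the same on every platform, and only run_bytecode has to be compiled for each CPU.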

Portability is one of the main reasons developers choose to use semi-compiled languages. For Java in particular, a program written and compiled for the JVM will run on any computer that has a specification-compliant virtual machine installed. This means that a program can be written; compiled to a degree that obscures the source code; distributed as a single package; and then executed on 32-bit and 64-bit CPUs of many different architectures. Oracle, the developer of Java since its acquisition of Sun Microsystems, provides JVM downloads for Windows, OS X, Linux and Solaris operating systems, making Java a viable choice for server, desktop and mobile computing. Hardware chips have also been developed that run Java bytecode natively, as well as being able to execute compiled C/C++. These chips promised to improve the performance of Java by up to twenty times, but this has become less and less relevant as the Oracle JVM has improved.

A JVM is considered one of the important pieces of software to port when new operating systems and CPU architectures are developed, as a fully compliant JVM will enable the use of thousands of pieces of software which have been written in Java. Good software being able to run on a platform is key to its success.

Interpreted Languages

A diagram illustrating how a single script written in an interpreted language can be run on multiple architectures, given the correct interpreter is available and built for each architecture. The rounder interface between the script and interpreter (relative to that between the Java program and VM) symbolises the typically simpler features of interpreted languages.
Image by myself; see license

The highest-level programming languages often fit into the interpreted category. These languages are not compiled to any degree, and include a lot of scripting languages. In order to be executed, they depend on an interpreter being available, which reads the script, identifies tasks to be completed, and executes them. An interpreter could be written in a compiled language (which is preferable and most likely), a semi-compiled language or even another interpreted language.
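The read, identify and execute cycle of an interpreter can be sketched in C. The mini-language here (one "add N" or "sub N" command per line) and the function name are invented for this illustration; real interpreters are far more elaborate.

```c
#include <stdio.h>
#include <string.h>

/* Interprets a script in a hypothetical mini-language with one command
   per line: "add N" or "sub N". Returns the final accumulator value.
   Blank lines and unrecognised commands are skipped. */
long interpret(const char *script) {
    long accumulator = 0;
    const char *line = script;

    while (*line != '\0') {
        /* Read: copy the current line (truncated if very long) into a buffer. */
        size_t length = strcspn(line, "\n");
        char buffer[64];
        size_t copied = length < sizeof buffer - 1 ? length : sizeof buffer - 1;
        memcpy(buffer, line, copied);
        buffer[copied] = '\0';

        /* Identify: split the line into a command word and an operand. */
        char command[8];
        long operand;
        if (sscanf(buffer, "%7s %ld", command, &operand) == 2) {
            /* Execute: carry out the task the command names. */
            if (strcmp(command, "add") == 0) {
                accumulator += operand;
            } else if (strcmp(command, "sub") == 0) {
                accumulator -= operand;
            }
        }

        /* Move to the start of the next line, if there is one. */
        line += length;
        if (*line == '\n') {
            line++;
        }
    }
    return accumulator;
}
```

Calling interpret("add 5\nadd 10\nsub 3\n") returns 12. The script is plain text and never becomes machine code; only the interpreter itself is compiled.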

Common examples of interpreted languages include Ruby, Python, JavaScript, ActionScript and Perl. That said, JavaScript is just-in-time compiled by engines such as V8 (used in Google Chrome and by Node.js), and projects and tools exist or have been started that aim to compile Python and Ruby scripts to improve performance and remove the dependency on an interpreter being installed.

Markup Languages

There are a handful of languages often mistaken for programming languages, which are in fact markup, stylesheet or key-value data storage formats. Popular examples of these languages include HTML, XML, YAML, JSON and CSS. The distinguishing attribute of a markup language versus a programming language is that the former is not executed like the latter. No interpreter, VM or compiler reads the file from top to bottom, converting instructions it encounters to machine code. Instead, a parser picks pieces of information out of the file, and builds a data model from it. In the case of SGML-based languages – particularly HTML – this is in the form of a document tree, containing element nodes and text nodes. In the case of XML, YAML and JSON, the result is an object containing the data represented in the markup format. In the case of CSS, the result is styling applied to the elements of another document, most commonly an HTML or SVG document.

In key-value pair formats, such as JSON, a value can be stored as one of several data types. The common data types found in most programming languages are usually available, including integers, strings and arrays. The majority of formats also allow dictionaries, which are collections of values with string identifiers, and are essentially what a key-value pair data storage format is.
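How a parser turns a data file into a data model, rather than executing it, can be sketched in C. The "key=value" line format, the fixed sizes and the names below are assumptions made for this illustration; it is far simpler than JSON or YAML and supports string values only.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 8

/* One entry in the data model: a string key mapped to a string value. */
struct pair {
    char key[32];
    char value[32];
};

/* Parses lines of the form "key=value" from `text` into `pairs` and
   returns how many pairs were stored. Malformed lines are skipped. */
size_t parse_pairs(const char *text, struct pair pairs[MAX_PAIRS]) {
    size_t count = 0;

    while (*text != '\0' && count < MAX_PAIRS) {
        /* Copy the current line (truncated if very long) into a buffer. */
        size_t length = strcspn(text, "\n");
        char line[80];
        size_t copied = length < sizeof line - 1 ? length : sizeof line - 1;
        memcpy(line, text, copied);
        line[copied] = '\0';

        /* Pick the key (before '=') and the value (after '=') out of the line. */
        if (sscanf(line, "%31[^=]=%31[^\n]",
                   pairs[count].key, pairs[count].value) == 2) {
            count++;
        }

        text += length;
        if (*text == '\n') {
            text++;
        }
    }
    return count;
}

/* Returns the value stored under `key`, or NULL if the key is absent. */
const char *lookup(const struct pair *pairs, size_t count, const char *key) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(pairs[i].key, key) == 0) {
            return pairs[i].value;
        }
    }
    return NULL;
}
```

After parsing "name=Ada\nlanguage=C\n", lookup with the key "language" returns "C". Nothing in the file was run; the parser only extracted data, and the array of pairs behaves like a small dictionary.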

Building Applications

Understanding and becoming fluent in a programming language – or even several – is not all that it takes to become a productive developer. It simply isn't feasible for software to be written from the ground up with each new project, and for this reason developers use libraries, SDKs and frameworks built by others to save themselves time. Building oneself a customised and tailored development environment is also invaluable, and there are a number of ways in which professional developers achieve this. The following paragraphs will explain the tools and resources developers commonly use, with development for Apple's iOS mobile platform used as an example.

SDKs

Software Development Kits, or SDKs, are packages of resources, tools and templates released by developers of software and hardware platforms, enabling others to create applications for the platform. For example, the iOS SDK can be downloaded from Apple's Developer Centre, and makes the development of apps for iOS considerably easier. With each new release of the operating system, a corresponding SDK is made available, including support for the new features and improvements made as part of the update. Some of the most common features present in an SDK are detailed in the paragraphs below.

Compilers

Used to convert the source code written by the developer to executable machine code, the compiler is essential in any high-level application development workflow. As different platforms each have their own intricacies and caveats, the compiler packages are often adjusted and optimised for the system they are intended to compile for, meaning that the compilers already present on the developer's computer may not be sufficient. Interaction with the compiler is most often handled by the IDE, leaving the developer to simply press a button in order to see their application running on a device.

Debuggers

Allowing the developer to easily find issues in compiled code, a debugger will report detailed information about any errors that occur during execution. The developer can also write particular instructions into their software that will send messages to the debugger output, allowing them to better understand where an error may be occurring.
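Writing diagnostic messages into software can be sketched in C with a logging macro. This is an assumption-laden illustration: DEBUG_LOG and average are invented names, and the messages go to the standard error stream, which IDE debug consoles commonly display, rather than to any particular debugger's own API.

```c
#include <stdio.h>

/* DEBUG_LOG prints a formatted message to stderr when the program is
   compiled with DEBUG defined (e.g. -DDEBUG), and does nothing otherwise. */
#ifdef DEBUG
#define DEBUG_LOG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_LOG(...) ((void)0)
#endif

/* Integer average of `count` values; returns 0 when count is not positive. */
int average(const int *values, int count) {
    if (count <= 0) {
        DEBUG_LOG("average: called with count=%d\n", count);
        return 0;
    }

    long total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i];
        DEBUG_LOG("average: i=%d running total=%ld\n", i, total);
    }
    return (int)(total / count);
}
```

In a debug build the running totals show where a wrong result first appears. In a release build the messages compile away to nothing, so they cost nothing at run time.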

IDEs

A screenshot of CLion, an early-access C/C++ IDE, running on Linux.
Image by myself; see license.

Integrated development environments are software packages designed to take care of as much of development as possible, without having a negative impact on the quality of the resulting software. An IDE makes the process of developing large-scale applications a lot easier. By letting developers focus on particular elements of a software project, isolate modules, see hints on how to fix erroneous code and manage different versions of the software over time, IDEs aim to make development more productive in all areas. In the case of Xcode for OS X and particularly iOS, the IDE is well integrated with the rest of the SDK, meaning that development is more streamlined and developers needn't waste time integrating several pieces of software into their workspace.

Templates and examples

As new features are made available to developers, it's important that they are shown effectively how those features can be used. Examples and templates, as well as code to perform common tasks, are often included as part of an SDK.

Libraries

Libraries specific to the platform allow developers to focus on the parts of their software applications that are unique. The inclusion of libraries built and optimised for certain devices leads to easier and quicker development; standards held across great ranges of software; and a more consistent end-user experience.
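The time a library saves can be sketched with the C standard library, used here as a stand-in for a platform-specific library. Rather than writing a sorting algorithm, the developer supplies only the part unique to their data (the comparison) and calls qsort; sort_ints and compare_ints are names invented for this illustration.

```c
#include <stdlib.h>

/* Comparison callback for qsort: negative, zero or positive when the
   first int is less than, equal to or greater than the second. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);            /* avoids overflow from x - y */
}

/* Sorts `values` into ascending order using the library's qsort. */
void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Sorting the array {3, 1, 2} with sort_ints leaves it as {1, 2, 3}, with the sorting algorithm itself written, tested and optimised by the library's authors.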

Emulators

Emulators enable the developer to run apps built for other devices, mobile devices in particular, on their own computer, so they are not required to repeatedly turn to their phone or tablet during testing. This also means that a developer is still able to work without the device, and can potentially access the more in-depth debugging and execution analysis that comes with virtualised and emulated environments.

Publishing tools

Once an application is complete and ready to be submitted to the App Store, Play Store, Windows Store, or similar, an IDE will often provide a simple process for the developer to follow. Adding application icons, descriptions and screenshots will all be covered, allowing the developer to enter the application approval queue as quickly as possible, and place their software in customers' hands promptly.