Lexical Structure

The lexical structure of a programming language is the set of elementary rules that define its tokens, the basic atoms of a program. It is the lowest-level syntax of a language and specifies punctuation, reserved words, identifiers, constants and operators. Some of the basic rules for Java are:

  • Java is case sensitive.
  • Whitespace (spaces, tabs and newline characters) only separates tokens and is otherwise ignored, except when part of string constants. It can be added as needed for readability.
  • Single line comments begin with //
  • Multiline comments begin with /* and end with */
  • Documentation comments begin with /** and end with */
  • Statements terminate in semicolons! Make sure to always terminate statements with a semicolon.
  • Commas are used to separate words in a list
  • Round brackets are used for operator precedence and argument lists.
  • Square brackets are used for array declarations and for indexing array elements.
  • Curly or brace brackets are used for blocks.
  • Keywords are reserved words that have special meanings within the language syntax.
  • Identifiers are names for constants, variables, functions, properties, methods and objects. The first character must be a letter, underscore or dollar sign. Following characters can also include digits. Letters are A to Z, a to z, and Unicode characters above hex 00C0. By Java convention, object identifiers begin with a capital letter, constant ids are all uppercase, and property, method and variable ids begin with a lowercase letter.
    Note: an identifier must NOT be any word on the Java Reserved Word List.
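
As a small illustration of the rules above, the following sketch puts the three comment styles, a few legal identifiers and the case-sensitivity rule into one file. The class name LexicalDemo and every identifier in it are made up for this example.

```java
/**
 * Documentation comment: describes the class that follows.
 * All names here are hypothetical and exist only for illustration.
 */
class LexicalDemo {
    static final int MAX_SCORE = 100;   // constant id: all uppercase

    /* Multiline comment: the two fields below are different
       identifiers because Java is case sensitive. */
    static int total = 1;
    static int Total = 2;

    static int sum() {
        // Identifiers may start with an underscore or a dollar sign.
        int _count = 3, $extra = 4;
        // Newlines and extra spaces between tokens are ignored.
        return total
             + Total
             + _count + $extra;
    }

    public static void main(String[] args) {
        System.out.println(sum());      // prints 10
    }
}
```

Giving two fields names that differ only in case is done here to make the point; in real code it is best avoided.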

Literal Constants

Literal constants are values that do not change within a program. Numeric constants default to integer or double unless a suffix is appended. Note that a character can be represented by an ASCII equivalent. Some literal constant types are:

  • Boolean: true, false
  • Integer: 5, 0xFF (hexadecimal)
              073 [leading zero] (octal)
  • Long Integer: 5L, 0xFFL (hex)
  • Double Precision: 2.543, 8e12, -4.1E-6
  • Floating Point: 2.543f, 8e12f
  • Char (ie. character): 'c', '\f', 65
  • String: "Fred", "Fred and Ethel"

Warning: Do not use leading zeros to format integers because you may get an unintended octal meaning. Use spaces instead!
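
The octal warning is easiest to see by printing a few literals. This is a minimal sketch; the class and method names are invented for the example.

```java
class LiteralDemo {
    static int octal()    { return 073; }     // leading zero: octal, decimal 59
    static int hex()      { return 0xFF; }    // hexadecimal, decimal 255
    static long big()     { return 5L; }      // L suffix: long
    static float single() { return 2.543f; }  // f suffix: float
    static char letter()  { char c = 65; return c; }  // 65 is the code for 'A'

    public static void main(String[] args) {
        System.out.println(octal());   // prints 59, not 73
        System.out.println(hex());     // prints 255
        System.out.println(letter());  // prints A
    }
}
```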

Escape Characters

Escape (aka backslash) characters are used inside literal strings to allow print formatting as well as preventing certain characters from causing interpretation errors. Each escape character starts with a backslash. The available character sequences are:

Seq     Name                        Seq     Name
\b      backspace                   \f      formfeed
\t      horizontal tab              \"      double quote
\n      newline                     \'      single quote
\r      carriage return             \\      backslash
\###    Latin encoded character     \uHHHH  Unicode encoded character
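
A short sketch of several of these sequences inside string literals follows. The class name and the sample text are assumptions made for illustration.

```java
class EscapeDemo {
    static String quoted() {
        // Backslash-quote embeds a double quote; a doubled backslash gives one backslash.
        return "She said \"hi\" from C:\\temp";
    }

    static String letterA() {
        // The Unicode escape for hex 0041 and the octal escape 101 both denote the letter A.
        return "\u0041\101";
    }

    public static void main(String[] args) {
        System.out.println(quoted());           // She said "hi" from C:\temp
        System.out.println("one\ttwo\nthree");  // a tab, then a line break
        System.out.println(letterA());          // AA
    }
}
```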

Syntax Notation

Throughout this set of tutorials Java language constructs will be given with complete details of their syntax or makeup. This syntax will be shown in blue and follow these rules:

  • Keywords will be quoted like "this". The quotes are not used when you type the word.
  • Identifiers are written unquoted.
  • Square brackets indicate an optional entry. The square bracket is not typed as part of the line.
  • The vertical bar indicates alternatives. It is not typed.
  • An ellipsis (ie ... ) indicates more of the same. It is not typed.

For example, the specification of a class has the following syntax:

["public"] ["abstract"|"final"] "class" class_name ["extends" object_name]
"{"
// properties declarations
// behavior declarations
"}"

The meaning of the reserved words will be explained as you work through the tutorials. Essentially syntax defines the 'rules' which the compiler will use to check your programs for compilation. Whether they execute correctly is a whole different issue ;-[
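
To connect the notation to real source text, here is one way the class specification above could be filled in. Shape, Square and SyntaxDemo are hypothetical names; the optional "public" entry is left out so that all three classes can share one file.

```java
abstract class Shape {                // "abstract" "class" class_name
    // properties declarations
    String label = "shape";

    // behavior declarations
    abstract double area();
}

final class Square extends Shape {    // "final" "class" class_name "extends" object_name
    double side = 2.0;

    double area() {
        return side * side;
    }
}

class SyntaxDemo {                    // no optional entries used
    public static void main(String[] args) {
        System.out.println(new Square().area());   // prints 4.0
    }
}
```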

Variables

Variables are temporary data holders. Variable names are identifiers. Variables are declared with a datatype. Java is a strongly typed language as the variable can only take a value that matches its declared type. This enforces good programming practice and reduces errors considerably. When variables are declared they may or may not be assigned or take on a value. Examples of each of the primitive datatypes available in Java are as follows:

byte x,y,z;              /* 8 bits long, not assigned, multiple declaration */
short numberOfChildren;  /* 16 bits long */
int counter;             /* 32 bits long */
long WorldPopulation;    /* 64 bits long */

float pi;                /* 32 bit single precision */
double avogadroNumber;   /* 64 bit double precision */
boolean signal_flag;     /* true or false only */
char c;                  /* 16 bit single Unicode character */

Variables can be made constant or read only by prepending the modifier final to the declaration. Java convention uses all uppercase for final variable names.
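
The sketch below shows a final variable next to ordinary ones, and how a value moves from a smaller integer type to a larger one. The names are illustrative only.

```java
class VariableDemo {
    static final double SPEED_OF_LIGHT = 2.998e8;   // constant: read only

    static long widen() {
        byte small = 127;      // largest value a byte can hold
        int counter = small;   // a byte always fits in an int
        long total = counter;  // an int always fits in a long
        return total + 1;
    }

    public static void main(String[] args) {
        // SPEED_OF_LIGHT = 3.0e8;  // would not compile: a final variable cannot be reassigned
        System.out.println(widen());            // prints 128
        System.out.println(Byte.MAX_VALUE);     // prints 127
        System.out.println(Integer.MAX_VALUE);  // prints 2147483647
    }
}
```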

Arrays

Arrays allow you to store several related values in the same variable (eg. a set of marks). Note that in a declaration the brackets are left blank. Declaration of an array only forms a prototype or specification for the array. Multi-dimensional arrays are considered to be arrays of arrays (of arrays...).

int i[];        /* one dimension array */
char c[][]; /* two dimension array */
float [] f; /* geek speak way */
Bowl shelfA[]; /* array of objects */
String flintstones[] = {"Fred", "Wilma", "Pebbles"}; //init values as well

Array memory allocation is assigned explicitly with the new operator (discussed later) and requires the number of elements to be known at the time of creation. Once an array is created its size cannot change.

Operators

Operators are actions that manipulate, combine or compare variables. They fall into several categories as follows:

Assignment:           = += -= *= /= %=
Arithmetic: + - * / % (modulus) ++ (increment) -- (decrement)
String Concatenation: +
Comparison: == != > >= < <=
Boolean Comparison: ! & | ^ && || (&& and || are short circuit ops)
Bitwise Comparison: ~ & | ^ (xor) << >> >>>
Bitwise Assignment: &= |= ^= (xor) <<= >>= >>>=
Conditional: ? (eg (expr1) ? expr2 : expr3 )
Object Creation: new (eg int a[] = new int[10];)
Casting of Type: (var_type)
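
A few of the operators listed above behave in ways that surprise newcomers, so here is a minimal sketch of them. The helper touch() is hypothetical; it only counts how often it is called so that short-circuit evaluation becomes visible.

```java
class OperatorDemo {
    static int calls = 0;

    // Hypothetical helper: records that it was evaluated.
    static boolean touch() {
        calls++;
        return true;
    }

    static int countCalls() {
        calls = 0;
        boolean a = false && touch();  // short circuit: right side is skipped
        boolean b = false & touch();   // no short circuit: right side is evaluated
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(7 / 2);                 // 3 (integer division)
        System.out.println(7 % 2);                 // 1 (remainder)
        System.out.println("a" + 1 + 2);           // a12 (string concatenation)
        System.out.println(-8 >> 1);               // -4 (sign is kept)
        System.out.println(-8 >>> 28);             // 15 (zeros shifted in)
        System.out.println(5 > 3 ? "yes" : "no");  // yes
        System.out.println(countCalls());          // 1
    }
}
```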

Note: Primitive array objects are created and allocated memory based on their array size and type by using the new reserved word. The number of items in an array can then be determined by accessing its length property. For a two dimensional array named M, M.length gives the number of elements in its first dimension and M[0].length gives the number of elements in its second dimension. Some examples of using new for arrays are:

Array1 = new int[5]; //previously declared, now created!
int markArray[] = new int[9]; //declaration and allocation at same time
int grades[] = new int[maxMarks]; //maxMarks must be a non-negative integer
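
Putting allocation and the length property together, a minimal sketch might look like this. The method squares() and its parameter are invented for the example.

```java
class ArrayDemo {
    static int[] squares(int n) {
        int[] result = new int[n];   // size is supplied when the array is created
        for (int i = 0; i < result.length; i++) {
            result[i] = i * i;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] s = squares(4);
        System.out.println(s.length);     // 4
        System.out.println(s[3]);         // 9

        int[][] m = new int[2][5];        // two rows of five elements
        System.out.println(m.length);     // 2
        System.out.println(m[0].length);  // 5
        System.out.println(m[1][4]);      // 0 (elements start at the default value)
    }
}
```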

Note: Since Java is a strongly typed language, a change in data type that could lose information must be done explicitly with a cast operation. For example a = (int) b; (assumes a is of type int and b is of type double). The fractional part of b is discarded.
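
The following sketch shows a few casts and what they do to the value. The class and method names are assumptions for illustration.

```java
class CastDemo {
    static int truncate(double d) {
        return (int) d;       // fraction is discarded, not rounded
    }

    static char toChar(int code) {
        return (char) code;   // int to char needs a cast
    }

    public static void main(String[] args) {
        System.out.println(truncate(3.99));    // 3
        System.out.println(truncate(-3.99));   // -3
        System.out.println(toChar(66));        // B
        int widened = 'A';                     // char to int needs no cast
        System.out.println(widened);           // 65
        System.out.println((byte) 200);        // -56 (200 does not fit in a byte)
    }
}
```

The last line is a reminder that a cast silences the compiler but does not protect the value.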

Expressions, Conditions and Statements

Expressions are phrases used to combine values and/or operands using operators to create a new value. One example of an expression is 5 + 3.

Conditions are phrases that can be evaluated to a boolean value such as a comparison operator between two constants, variables or expressions used to test a dynamic situation. Examples are x <= 5 and bool_flag != true.

Statements are complete program instructions made from constants, variables, expressions and conditions. Statements always end with a semicolon. A program contains one or more statements.

Assignment statements use an assignment operator to store a value or the result of an expression in a variable. Memory allocation is done at the time of assignment. Primitive datatypes have static allocation with size determined by their type. Simple examples include first_name = "Fred"; and count += 1;

Variables may be assigned an initial value when declared. This is considered good programming practice. Examples are boolean fileOpenFlag = true;, int finalScore = 0; and final float PI = 3.14159f; (a primitive such as int cannot hold null, and a float constant needs the f suffix).

Array variables can use a shortcut method of initial value assignment. Examples are:

int v[] = {2,4,20}; //declaration/creation/assignment in one step!
int m[][] = {{2,3,4}, {4,5,6}, {1,1,1}}; // two dimensional array

Local variables must be assigned a value prior to use. There is no default assumption. Failure to initialize will cause a compiler error! Field variables (aka properties) have defaults but initialization is good programming practice.
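
The difference between local variables and field variables can be sketched as follows. DefaultsDemo and its members are hypothetical names.

```java
class DefaultsDemo {
    int count;       // field: defaults to 0
    boolean ready;   // field: defaults to false
    String name;     // field: defaults to null

    static int local() {
        int total;   // declared but not yet assigned
        // System.out.println(total);  // would not compile: total might not have been initialized
        total = 5;   // assigned before first use
        return total;
    }

    public static void main(String[] args) {
        DefaultsDemo d = new DefaultsDemo();
        System.out.println(d.count);   // 0
        System.out.println(d.ready);   // false
        System.out.println(d.name);    // null
        System.out.println(local());   // 5
    }
}
```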

Execution blocks are sets or lists of statements enclosed in curly brackets. Variables maintain their definition (or 'scope') until the end of the execution block that they are defined in. This is the reason why variable declaration and assignment can be a two step process.

Note: It is a good rule of thumb to declare variables in as nested a scope as possible to limit the chance of spurious assignment.

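Beware: A field variable's content can be hidden by declaring a local variable of the same name inside a method. Java does not allow a local variable to be redeclared within a nested execution block of the same method; the compiler reports an error. At times hiding a field is convenient but beginning programmers should avoid reuse of variable names.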
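
A minimal sketch of scope and hiding, using made-up names, might look like this:

```java
class ScopeDemo {
    static int value = 1;    // field variable

    static int hidden() {
        int value = 2;       // local variable hides the field inside this method
        return value;
    }

    static int nested() {
        int outer = 10;
        {
            int inner = 5;   // exists only until the closing brace of this block
            outer += inner;
        }
        // inner is out of scope here; using it would not compile
        return outer;
    }

    public static void main(String[] args) {
        System.out.println(hidden());         // 2
        System.out.println(ScopeDemo.value);  // 1 (the field is unchanged)
        System.out.println(nested());         // 15
    }
}
```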