Introduction

Processing time and memory are the two primary resources that computer programs consume, and it is important to understand their limits. Computers are extremely fast at calculations compared to humans, but humans have much more "memory" than computers currently do. For example, computers can add two 100-digit numbers together much more quickly than any human possibly can. Standard computers have around 8GB of RAM, and some higher end machines may have 16-32GB. Although hard disks can store terabytes of data, we use RAM when analyzing a program's memory because it is much faster than disk storage. As an analogy, reading from RAM is like grabbing an object from another room, whereas reading from disk is like driving 20 minutes away to get that object.

Limits

There are many different ways to implement the same task, but most of the time we care about the implementations that are the fastest and use the least amount of memory. Let's go through some basic benchmarks about computers that you should know. As of this writing, adding two numbers takes an average computer about a nanosecond (1 billionth of a second) to process. For practical purposes, we'll assume that the average computer program can use up to 1GB of RAM, or about 250 million 32-bit integers (each 32-bit integer is 4 bytes, and 1,000,000,000 bytes / 4 bytes = 250,000,000 integers).

Memory                      Operations (per second)
1 GB (250,000,000 ints)     1,000,000,000

Big O Notation

When we design algorithms and data structures, we are concerned with their runtime and memory complexities. The runtime complexity is the number of operations required for a program to run. We will consider reading a unit of data, writing a unit of data, and performing a simple arithmetic or logic operation to each count as a single operation. For example, if we had a program that read two numbers, added them together and stored the result, it would take 2 operations to read the two numbers, 1 operation to add them and 1 more operation to store the result. The memory complexity is the additional amount of memory required by an algorithm or data structure. For example, if an algorithm allocates an array of 10 integers, its memory complexity is 10 integers. The size of the inputs determines the space and runtime complexities of algorithms and data structures.
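
As a minimal sketch of the read/add/store example above (the function and variable names are purely illustrative):

int readAndAdd(int[] input) {
  int a = input[0];   // 1 operation to read the first number.
  int b = input[1];   // 1 operation to read the second number.
  int result = a + b; // 1 operation to add, plus 1 operation to store the result.
  return result;      // 4 operations in total, regardless of input size.
}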

When we compare how efficient algorithms or data structures are relative to one another, we want to describe them in a way that can be compared by a quantifiable amount. We use a notation called Big O notation to do this. We can use it to describe the runtime and memory complexities of algorithms and data structures. The complexities of an algorithm or data structure depend on the size of its inputs. For example, if we have a program that calculates the sum of N integers, the runtime is much longer when N is 1,000 than when N is 10. Thus we want to determine how long a program will take to run, or how much memory it will use, relative to the size of its inputs. Big O notation keeps only the largest term of an expression when comparing computation times and memory usage. When we take the largest term, we ignore smaller terms, coefficients and constants because they become insignificant at very large values. For example, O(3n^2 + 12n + 20) is simply O(n^2) because n^2 is the largest term.
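
To see why the smaller terms can be dropped, take N = 1,000 as an illustrative value: 3N^2 + 12N + 20 = 3,000,000 + 12,000 + 20 = 3,012,020, so the N^2 term alone accounts for over 99% of the total, and its share only grows as N increases.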

Here is a list of common Big O notations based on complexity:

Big O        Name                Limit of N for 1 second
O(1)         Constant Time       runtime independent of N
O(log N)     Sublinear Time      a very large number
O(N)         Linear Time         100,000,000
O(N log N)   Linearithmic Time   5,000,000
O(N^2)       Quadratic Time      10,000
O(N^3)       Cubic Time          450
O(2^N)       Exponential Time    27
O(N!)        Factorial Time      11

Keep in mind that in a couple of years, this chart will be outdated as technology improves.

Runtime and Memory Analysis

In a program we are usually concerned with two main factors: runtime and memory. Runtime complexity is the amount of time it takes for a program to run. Memory complexity usually refers to the additional memory a program requires beyond its input. Note that if you are able to stream your input (read it in chunks at a time), then the input's memory footprint is reduced to the size of your stream.
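
For example, here is a minimal sketch of streaming a sum (assuming, purely for illustration, that the integers arrive through a java.util.Scanner rather than as an in-memory array):

// A hypothetical streaming version of summing N integers: each value is
// read, added and discarded, so the additional memory stays O(1) no matter
// how many numbers are streamed.
long streamedSum(java.util.Scanner in) {
  long sum = 0;
  while (in.hasNextInt()) {
    sum += in.nextInt();
  }
  return sum;
}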


Example

Suppose we have a function that takes an array of size N as input and returns the maximum element.

// Assume that the array has a positive length.
int findMax(int[] array) {
  int maxVal = array[0]; // O(1) memory to store the max element, O(1) time for the assignment.
  for (int i = 1; i < array.length; i++) { // O(n) loop runtime.
    if (maxVal < array[i]) { // O(1) to compare.
      maxVal = array[i]; // O(1) to assign the new value.
    }
  }
  return maxVal;
}

Memory: The array takes O(N) memory, and storing the max value takes O(1) more. However, when analyzing programs we usually ignore the memory taken by the input itself and count only the additional memory required to produce the output. So the memory footprint of the function is O(1).

Time: The first assignment of maxVal takes O(1). The loop runs N-1 times, and in each iteration it checks whether the current array element is greater than the current max, which takes O(1). If it is greater, then we reassign maxVal, which is another O(1). When analyzing time complexity, we usually take the worst case (here, an array sorted in increasing order, where every comparison is followed by a reassignment), so we have O(1 + 2(N-1)), which simplifies to O(N) since N is the largest term.
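
As a quick usage sketch (the input values are hypothetical):

int[] data = {3, 7, 2, 9, 4}; // hypothetical input of size N = 5
int max = findMax(data);      // returns 9 after N - 1 = 4 comparisons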


Example

Suppose we have a function that sets all the values of a 2D NxN array to 0.

void zeroArrays(int[][] grid) {
  // O(n) to loop through the rows.
  for (int i = 0; i < grid.length; i++) {
    // O(n) to loop through the columns.
    for (int j = 0; j < grid[i].length; j++) {
      // O(1) to set an element in the 2D array.
      grid[i][j] = 0;
    }
  }
}

There are NxN cells to set to 0, so we need O(n^2) operations. We can also think of it another way: there are two nested loops, each loop is O(n), and we multiply them, so O(n) x O(n) = O(n^2). Generally, when we have nested loops, we multiply their complexities. Since we do not use any extra memory in this function, the space complexity is O(1).
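
The same multiplication rule extends to deeper nesting. As a hypothetical sketch (not part of the example above), zeroing an NxNxN cube with three nested loops would take O(n^3):

void zeroCube(int[][][] cube) {
  for (int i = 0; i < cube.length; i++) {           // O(n) over the first dimension.
    for (int j = 0; j < cube[i].length; j++) {      // O(n) over the second dimension.
      for (int k = 0; k < cube[i][j].length; k++) { // O(n) over the third dimension.
        cube[i][j][k] = 0;                          // O(1), executed n * n * n times.
      }
    }
  }
}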


Example

Suppose we have another function that prints the binary representation, written backwards, of each number from 1 to N. For example, the binary representation of 25 written backwards is 10011.

void binaryNumbers(int n) {
  // O(1) memory to store the number.
  int bin;
  // O(n) loop from 1 to n.
  for (int x = 1; x <= n; x++) {
    bin = x;
    // O(log n) to print the binary digits of x.
    while (bin > 0) {
      // O(1) runtime to print one binary digit.
      System.out.print(bin % 2);
      // O(1) runtime to perform integer division.
      bin = bin / 2;
    }
    // O(1) to print a new line.
    System.out.println();
  }
}

For each number x, it takes O(log₂ x) operations to print its binary digits because we keep dividing by 2. Since log₂ x = (ln x) / (ln 2) and the denominator is a constant, we can rewrite this as O(ln x), or equivalently O(log x), which is what most people use when referring to logarithmic complexities. We do this for each of the N numbers, so the total runtime complexity is O(n log n). We need O(1) additional memory to store the variables bin and x.
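
As a quick trace of the inner loop for x = 25 (which has 5 binary digits, since floor(log₂ 25) + 1 = 5):

bin = 25 -> prints 1, bin becomes 12
bin = 12 -> prints 0, bin becomes 6
bin =  6 -> prints 0, bin becomes 3
bin =  3 -> prints 1, bin becomes 1
bin =  1 -> prints 1, bin becomes 0 (loop ends)

So the function prints 10011 for x = 25, matching the example above.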


Example

Suppose we have a function that takes in an array containing N integers from 1 to M and returns one of the modes (a number with the most occurrences). We can do this in two different ways.

Our first approach is to loop through the array and, for each number, count how many times that number appears in the array.

public int mode1(int[] arr, int m) {
  int n = arr.length;
  int maxMode = 0;
  int maxOccurrences = 0;

  // O(n) to loop through each number in the array.
  for (int i = 0; i < n; i++) {
    int occurrences = 0;
    // O(n) to count the number of occurrences of that number in the array.
    for (int j = 0; j < n; j++) {
      if (arr[j] == arr[i]) {
        occurrences++;
      }
    }
    // O(1) to update the mode if the number of occurrences is greater.
    if (occurrences > maxOccurrences) {
      maxOccurrences = occurrences;
      maxMode = arr[i];
    }
  }
  return maxMode;
}

We first loop through each number in the array, which takes O(n). Inside that loop, we have another loop that counts the number of occurrences, which is O(n) again. Thus the total runtime of the method is O(n^2). We store only the max mode and its number of occurrences, so the memory complexity is O(1).

In our second approach, we will keep an array that stores the number of occurrences of all the numbers from 1 to M. We will go through the array and increment the occurrences of a number when we see it. Then we will find the number that has the maximum number of occurrences.

public int mode2(int[] arr, int m) {
  int n = arr.length;
  // O(m) memory to store occurrences of each number. Let occurrences[i] be 
  // number of times i appears in arr.
  int[] occurrences = new int[m+1];
  // O(m) runtime to set number of occurrences to 0 for each number.
  for (int i = 0; i <= m; i++) {
    occurrences[i] = 0;
  }
  // O(n) runtime to loop through array and increment number of occurrences.
  for (int j = 0; j < n; j++) {
    occurrences[arr[j]]++;
  }
  int mode = 0;
  // O(m) runtime to loop through occurrences.
  for (int i = 0; i <= m; i++) {
    // O(1) runtime to compare and update the mode if the current number has more occurrences.
    if (occurrences[i] > occurrences[mode]) {
      mode = i;
    }
  }
  return mode;
}

We first initialize the array of occurrences to 0 which takes O(m). We then loop through the array of numbers which takes O(n). Finally, we loop through the array of occurrences to find the mode which takes O(m). Thus the final runtime is O(n + m). We need to store the mode which takes O(1) and the array of occurrences which takes O(m). Thus the memory footprint is O(m).
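
As a small illustrative run with hypothetical values, suppose arr = {2, 1, 2, 3} and m = 3:

occurrences after the counting loop: [0, 1, 2, 1]  // index 0 is unused; 1 appears once, 2 twice, 3 once
the final pass picks index 2, so mode2 returns 2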

Note that although the two approaches do the same thing, they have different complexities for runtime and space. If m were large and the cost of storing m integers were too expensive, then we would use the first approach at the cost of an O(n^2) quadratic runtime. However, if m were small and we wanted a fast solution, then we would use the second approach at the cost of storing an extra O(m) integers. Problems can have solutions that optimize for speed and other solutions that optimize for space. Sometimes space can be more expensive than speed, and vice versa.
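
To summarize the trade-off between the two approaches:

Approach   Runtime     Additional Memory
mode1      O(n^2)      O(1)
mode2      O(n + m)    O(m)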

Exercises

  1. Determine the runtime and memory complexities of finding the average of an integer array.
  2. Determine the runtime and memory complexities of adding two NxN matrices together.
  3. Determine the runtime complexity of guessing an N-digit numeric password.
  4. Determine the runtime complexity of summing the digits of a number.
  5. Determine the memory complexity of counting the occurrences of letters in a book.
  6. Determine the memory complexity of counting the occurrences of words in a book.