class: center, title-slide
# CSCI-UA 480: APS
## Algorithmic Problem Solving

## Fundamentals

.author[
Instructor: Joanna Klukowska
]

.license[
Copyright 2020 Joanna Klukowska. Unless noted otherwise, all content is released under a
[Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).
Background image by Stewart Weiss
]

---
layout:true
template: default
name: section
class: inverse, middle, center

---
layout:true
template: default
name: challenge
class: challenge

---
layout:true
template: default
name: poll
class: inverse, full-height, center, middle

---
layout:true
template: default
name: breakout
class: breakout

---
layout:true
template:default
name:slide
class: slide

.bottom-left[© Joanna Klukowska. CC-BY-SA.]

---

## This Course

- Course website: https://cs.nyu.edu/~joannakl/aps_s21/

  This page contains the syllabus and daily summaries, as well as links to all other resources and services you will need for this class.

- Recitations (required):
  - Wednesdays 3:30 - 4:45pm
  - plan to bring your laptop (there are outlets available)

- Course message board / discussion: Ed
  - you can self-sign up at https://us.edstem.org/

- Online judge:
  - Gradescope (for homework assignments and exams)
  - [Vjudge](https://vjudge.net/)

- Grades posted on Brightspace

---

## Operations Count

- How fast your program runs depends on how many things it does

  _things_ == operations that the CPU performs on behalf of the program

- How many things the computer does depends on how you (the programmer) write the code, what algorithm you chose, etc.

--

So, how many operations does this program perform?

.left-column2[
```C++
#include <cstdio>   // for printf
using namespace std;

int main() {
    for (int i = 1; i <= 10; i++ ) {
        for (int j = 1; j <= 10; j++ ) {
            printf("%3d\t", i*j);
        }
        printf("\n");
    }
    return 0;
}
```
]

--

.right-column2[
.huge[.center[???]]
]

---

## Counting operations is not easy (possible)

- it is hard to figure out what to count: some operations in a high-level programming language may actually be multiple operations on the CPU
- some CPU operations are faster than others
- some operations require I/O and that slows them down

--

.big[we need a simplified way of deciding how fast the program runs]

---
template: section

# Asymptotic Analysis

---

## Asymptotic Analysis

__$f(n) = O(g(n))$__

There exist positive constants _$c$_ and _$n_0$_ such that
$0 \le f(n) \le c\,g(n)$ for all $n \ge n_0$

--

__$f(n) = \Omega(g(n))$__

There exist positive constants _$c$_ and _$n_0$_ such that
$0 \le c\,g(n) \le f(n)$ for all $n \ge n_0$

--

__$f(n) = \Theta(g(n))$__

There exist positive constants _$c_1$_, _$c_2$_ and _$n_0$_ such that
$0 \le c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$

---

## Asymptotic Analysis: Big O

.center[
]

Example of Big O notation: $f(x) \in O(g(x))$ says that there exist $c > 0$ (e.g., $c = 1$) and $x_0$ (e.g., $x_0 = 5$) such that $f(x) \le c\,g(x)$ whenever $x \ge x_0$.

.footnote[Image and descriptions in Public Domain. Retrieved from Wikipedia: https://commons.wikimedia.org/wiki/File:Big-O-notation.png ]

---
template: challenge

## Challenge

For each of the following functions state if it is $O(N^2)$:

- $3 N^2 + 25 N - 20000$
- $0.00001 N^3 + 5 N + 2$
- $3N!$
- $2^N$
- $\log N$
- $1$

---

## "Abuse" of Big O

People often say "The algorithm is $O(N^3)$ - this is slow".

An algorithm that runs in constant time is also $O(N^3)$, because Big O is only an upper bound, so the statement above does not make much sense.

But this is a common abuse of the terminology. We really mean that the algorithm is $\Theta(N^3)$, even though we use the Big O description.

---

## From Big O to Actual Execution Time

- a typical machine executes approximately $10^9$ operations per second (on average, since some are slower than others)

--

- to estimate how long the program might take on the largest possible input size (this is specified as part of the constraints for the problem)
  - evaluate the Big O term at the maximum $N$
  - divide the result by $10^9$ (this gives the approximate number of seconds your program should take to solve the problem; it is only an approximation, but it gives a sense of what might happen)

---

## From Big O to Actual Execution Time

__Example__: check all pairs of values in the input set

- Performance is $O(N^2)$

--

- when $N = 10$, $N^2 = 100$, time = $100 / 10^9$ => program finishes immediately

--

- when $N = 1000$, $N^2 = 1,000,000$, time = $10^6 / 10^9$ => program finishes in 0.001s (practically instantly)

--

- when $N = 10^5$, $N^2 = 10^{10}$, time = $10^{10} / 10^9$ => program finishes in 10s (too long for most of the kinds of problems that we will be looking at)

---

## What's the performance of ...
```C++
int ans = 0;
for (int i = 0; i < n; i++ ) {
    for (int j = 0; j < n; j++ ) {
        for (int k = 0; k < n; k++ ) {
            ans += i*j + j*k + k*i;
        }
    }
}
```

--

.big[$O(N^3)$]

---

## What's the performance of ...

```C++
int ans = 0;
for (int i = 0; i < n; i++ ) {
    for (int j = i+1; j < n; j++ ) {
        for (int k = j+1; k < n; k++ ) {
            ans += i*j + j*k + k*i;
        }
    }
}
```

--

.big[$O(N^3)$]

(the innermost statement runs about $N^3/6$ times, which is still cubic)

---
template: challenge

## Challenge

Write a function that checks if a string Y is a substring of another string X.
The function should return `true` or `false`.

Ex. X = 'abcdaaaabbbbcddd'

- Y = 'ab' => function returns `true`
- Y = 'abab' => function returns `false`

---
template: challenge

## What's the performance of ...

check if a string Y is a substring of another string X

```Java
public static boolean substringMatch ( String X, String Y ) {
    for (int i = 0; i < X.length(); i++ ) {
        boolean matched = true;
        for (int j = 0; j < Y.length(); j++) {
            if (i+j >= X.length()) {
                matched = false;
                break;
            }
            if ( X.charAt(i+j) != Y.charAt(j) )
                matched = false;
        }
        if (matched)
            return true;
    }
    return false;
}
```

--

.big[O(X.length * Y.length)]

(Note: there are more efficient solutions. We will discuss them later in the semester.)

---
template: challenge

## What's the performance of ...

check if a string Y is a substring of another string X

```Java
public static boolean substringMatch ( String X, String Y ) {
    for (int i = 0; i < X.length(); i++ ) {
        boolean matched = true;
        for (int j = 0; j < Y.length(); j++) {
            if (i+j >= X.length()) {
                matched = false;
                break;
            }
            if ( X.charAt(i+j) != Y.charAt(j) )
                matched = false;
        }
        if (matched)
            return true;
    }
    return false;
}
```

Can we reduce the time further? (constant factor optimization)

--
Yes, add

`if (!matched) break;`

as the last statement in the inner for loop

---

## Constant factor optimization

- good in practice
- may or may not make significant improvements for problems you encounter in this class

---
template: challenge

## Challenge

Given a sorted array of values, remove all duplicates.

ex.

given array: [1, 1, 4, 5, 5, 5, 6, 7, 7, 9, 9, 9, 9]

revised array: [1, 4, 5, 6, 7, 9]

---
template: challenge

## What's the performance of ...

Given a sorted array of values, remove all duplicates.

```Java
// assume data is in an ArrayList object called array
for (int i = 0; i < array.size(); i++ ) {
    int j = i+1;
    // equals() compares values; == on boxed objects compares references
    while (j < array.size() && array.get(j).equals(array.get(i)) ) {
        array.remove(j);   // removes index j and shifts the tail left: O(n)
    }
}
```

--

.big[$O(n^2)$] where $n$ is the size of the original array

---
template: challenge

## Challenge

Given a sorted array of values, create a new sorted array that contains only the unique elements.

ex.

given array: [1, 1, 4, 5, 5, 5, 6, 7, 7, 9, 9, 9, 9]

new array: [1, 4, 5, 6, 7, 9]

---
template: challenge

## What's the performance of ...

Given a sorted array of values, create a new sorted array that contains only the unique elements.

```C++
// assume oldArray holds n sorted values and newArray is an empty std::vector
for (int i = 0; i < n; i++ ) {
    int j = i;
    // advance j past the run of values equal to oldArray[i]
    while (j < n && oldArray[j] == oldArray[i] )
        j++;
    newArray.push_back(oldArray[i]);
    i = j - 1;   // the loop's i++ then moves i to the start of the next run
}
```

--

.big[$O(n)$] where $n$ is the size of the original array

(even though we have nested loops in the code)

---

## Using Library Functions

- good in practice, saves a lot of time
- be careful when analyzing performance: a single library call may hide a loop (for example, removing an element from the middle of an `ArrayList` takes linear time)

---

## Rule of Thumb:
## Complexity and Actual Time
| N | worst algorithm to pass on OJ |
|:---|:---|
| <= 10 | $O(n!)$, $O(n^6)$ |
| <= [15 .. 18] | $O(2^n * n^2)$ |
| <= [18 .. 22] | $O(2^n * n)$ |
| <= 100 | $O(n^4)$ !!! |
| <= 400 | $O(n^3)$ |
| <= 2K | $O(n^2 \log n)$ |
| < 10K | $O(n^2)$ !!! |
| <= 1M | $O(n \log n)$ |
| <= 100M | $O(n)$ !!! , $O(\log n)$, $O(1)$ |

!!! but getting close to $10^8$ operations may be dangerous

---
template: section

# Data Types and Their Representation

---

## Data Types

All data is represented in binary in hardware.

Primitive types:

|type| size|
|---|---|
|`bool`, `char` (in C/C++)| 1 byte |
|`short`, `char` (in Java) | 2 bytes |
|`int`, `float` | 4 bytes|
|`long` (in Java), `long long` (in C/C++), `double` | 8 bytes |

--

String

- not a primitive type
- stored as an array of characters
- WARNING: comparing two strings for equality is NOT constant time

---

## Binary Representation of Integers

- use of base-2
- binary to decimal
  - $(101)_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 5$
  - $(11010)_2 = 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 26$
- decimal to binary: mod by 2, record the remainder, divide by 2, repeat until the quotient is 0, and finally reverse the resulting string
  - `26 % 2 = 0` `26 / 2 = 13`
  - `13 % 2 = 1` `13 / 2 = 6`
  - ` 6 % 2 = 0` ` 6 / 2 = 3`
  - ` 3 % 2 = 1` ` 3 / 2 = 1`
  - ` 1 % 2 = 1` ` 1 / 2 = 0`
  - the remainders `0 1 0 1 1`, read in reverse, give $(11010)_2$

---

## Use of other bases

__Base 3__

- $(201)_3 = 2 \times 3^2 + 0 \times 3^1 + 1 \times 3^0 = 19$

__Base 9__

- $(480)_9 = 4 \times 9^2 + 8 \times 9^1 + 0 \times 9^0 = 396$

__Base 16__

- ...

---

## Largest/Smallest value with N bits

- A type using 4 bits can represent integers $(0000)_2$ to $(1111)_2$
- __Non-negative numbers__: a type using N bits can represent values in the range $[0, 2^N)$
  - upper bound is NOT included
  - this assumes that we represent only non-negative values
- __Signed numbers__: a type using N bits can represent values in the range $[-2^{N-1}, 2^{N-1})$
  - upper bound is NOT included
  - the leading bit is associated with a negative multiplier, $-2^{N-1}$

__For signed numbers the limit on the `int` is $2^{31}-1 = 2,147,483,647$ (or approximately $2 \times 10^9$).__

---

## ASCII/UTF-8 and `char` type

- characters are just numbers that use fewer bits (8 bits or 16 bits)
- each character has a corresponding numerical value
  - 'a' = 97
  - 'A' = 65
  - '2' = 50
  - '!' = 33
- this comes in handy when comparing strings or processing characters for other purposes
- but be careful about lexicographical ordering
  - "aaa" < "ab"
  - "AAA" < "aaa"
  - "ZZZ" < "aaa"
  - "10" < "100"
  - "2000" < "30"

---
template: challenge

## Challenge: Split a Number

We define the operation of splitting a binary number `n` into two numbers `a(n)` and `b(n)` as follows. Let $0 \leq i_1 < i_2 < ... < i_k$ be the indices of the bits (with the least significant bit having index 0) in `n` that are 1. Then the indices of the bits of `a(n)` that are 1 are $i_1$, $i_3$, $i_5$, ... and the indices of the bits of `b(n)` that are 1 are $i_2$, $i_4$, $i_6$, ...

For example, if `n` is `110110101` in binary then, again in binary, `a = 010010001` and `b = 100100100`.

.left-column2-large[
__Input__

The input consists of a single integer `n` between `1` and $2^{31} - 1$ written in standard decimal (base 10) format on a single line.

__Output__

The output consists of a single line, containing the integers `a(n)` and `b(n)` separated by a single space. Both `a(n)` and `b(n)` should be written in decimal format.
]

.right-column2-small[
__Example 1__

```
Input:
6
Output:
2 4
```

__Example 2__

```
Input:
7
Output:
5 2
```
]

---
template: challenge

## Challenge: Split a Number

In pairs,

- make sure that __you understand how the output values are obtained from the input__ (try both examples by hand)
- create two new test cases (input and output pairs) that are for some reason interesting,
  - restriction: they need to be values > 2000
  - try to come up with a test case that is likely to be different from those of other students around you
- work together to try to come up with a solution (an algorithm) for the problem

  (DO NOT WRITE ACTUAL CODE)
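
---

## Sketch: Decimal to Binary

When working test cases by hand, it can help to see the bits of a number. The mod-2 / divide-by-2 procedure from the binary representation slide might be sketched as follows. This is a minimal illustration and not part of the original slides; the helper name `toBinary` is hypothetical.

```C++
#include <algorithm>
#include <string>

// Hypothetical helper (an assumption, not from the slides): returns the
// binary digits of n, most significant bit first.
std::string toBinary(unsigned int n) {
    if (n == 0) return "0";
    std::string bits;
    while (n > 0) {
        bits.push_back(static_cast<char>('0' + n % 2));  // record the remainder
        n /= 2;                                          // integer-divide by 2
    }
    // remainders were produced least significant bit first, so reverse them
    std::reverse(bits.begin(), bits.end());
    return bits;
}
```

For example, `toBinary(26)` yields `"11010"`, matching the worked conversion of 26. The loop runs once per bit, so the cost is proportional to the number of bits in `n`.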