Commit 1b3a18d

committed
new execises
1 parent e48a7ec commit 1b3a18d

27 files changed

Lines changed: 1043 additions & 2 deletions
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
## Development - Advanced, exercise 34

### Text
**Letter frequency** is the number of times each letter of the alphabet appears, on average, in written language. The frequency sequence of a language summarises these trends by listing the letters from the most frequent to the least frequent. For instance, the following simple text

> Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do

has the following letter frequencies:

```
'n': 10 'i': 9 't': 9 'e': 8 'o': 7 'g': 6
'a': 5 's': 4 'r': 4 'h': 4 'b': 3 'd': 3
'v': 2 'y': 2 'f': 2 'l': 1 'c': 1 'w': 1 'k': 1
```

and its frequency sequence is the string `"niteogasrhbdvyflcwk"`, which excludes punctuation and any other non-letter character. In the frequency sequence, letters with the same frequency are ordered by their first occurrence in the input text: e.g. 'l' comes before 'c' because, in the word "Alice", the first occurrence of 'l' precedes the first occurrence of 'c'. In addition, the input text is converted to lowercase before counting the frequencies.

Write an algorithm in Python – `def sequence(s)` – which takes as input a string `s` representing a text, and returns another string representing the frequency sequence of that input string.


### Solution
```python
from collections import deque

# Test case for the function
def test_sequence(s, expected):
    result = sequence(s)
    if result == expected:
        return True
    else:
        return False


# Code of the function
def sequence(s):
    text = s.lower()

    # Count the occurrences of each letter, skipping any non-letter character
    count = {}
    for c in text:
        if c.isalpha():
            if c not in count:
                count[c] = 0
            count[c] += 1

    result = list()
    # Consume the counts from the highest to the lowest
    sorted_values = deque(sorted(count.values()))
    while len(sorted_values) > 0 and len(count) > 0:
        cur_count = sorted_values.pop()
        # Scanning the text keeps tied letters in first-occurrence order
        for c in text:
            char_count = count.get(c)
            if char_count is not None and char_count == cur_count:
                result.append(c)
                del count[c]

    return "".join(result)


# Tests
print(test_sequence("Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do", "niteogasrhbdvyflcwk"))
```
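
As a point of comparison, the same frequency sequence can be sketched with `collections.Counter` and a stable sort. This is an illustrative alternative rather than part of the exercise solution: the name `sequence_sketch` is hypothetical, and the sketch assumes Python 3.7 or later, where a `Counter` keeps its keys in first-insertion order.

```python
from collections import Counter


def sequence_sketch(s):
    # Hypothetical helper: keep letters only, in lowercase
    letters = [c for c in s.lower() if c.isalpha()]

    # Keys are stored in order of first occurrence in the text
    count = Counter(letters)

    # sorted() is stable, so letters with equal counts keep that order
    return "".join(sorted(count, key=lambda c: -count[c]))


print(sequence_sketch("Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do"))
```

Relying on the stability of `sorted` avoids rescanning the text once per count value, at the cost of depending on dictionary insertion order.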

### Additional material
The runnable [Python file](exercise_34.py) is available online.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
## Development - Advanced, exercise 35

### Text
The **index of coincidence** (IC) measures how likely it is to draw two matching letters when randomly selecting two letters from a given text. The chance of drawing a given letter is the number of times that letter appears divided by the length of the text (excluding spaces and punctuation, and treating all letters as lowercase). The chance of drawing that same letter again (without replacement) is the number of occurrences of that letter minus one, divided by the length of the text minus one. The product of these two values is the chance of drawing that letter twice in a row. Computing this product for each letter that appears in the text and summing the products gives the chance of drawing two of a kind. This probability is then normalized by multiplying it by a coefficient *c* that depends on the language of the text, as shown in the following formula:

<img src="img/ic.png" alt="IC" style="max-height:35px;" />

where *n<sub>a</sub>* is the number of occurrences of the letter *a* in the text, *n<sub>b</sub>* is the number of occurrences of the letter *b* in the text, and so on (considering all letters in the alphabet), and *N* is the total number of letters in the text.

Write an algorithm in Python – `def ic(s, c)` – which takes as input a string `s` representing a text and a number `c` representing the coefficient mentioned in the formula above, and returns a number representing the index of coincidence for the input text.


### Solution
```python
# Test case for the function
def test_ic(s, c, expected):
    result = ic(s, c)
    # For testing it, the result is rounded to two decimal places
    if round(result, 2) == round(expected, 2):
        return True
    else:
        return False


# Code of the function
def ic(s, c):
    result = 0

    en_alphabet = "abcdefghijklmnopqrstuvwxyz"

    # N: total number of letters, ignoring spaces and punctuation
    s_len = 0
    for char in s:
        if char.lower() in en_alphabet:
            s_len += 1

    # With fewer than two letters no pair can be drawn
    if s_len < 2:
        return 0

    for letter in en_alphabet:
        letter_count = 0
        for char in s:
            if char.lower() == letter:
                letter_count += 1
        # Chance of drawing this letter twice in a row, without replacement
        result += (letter_count / s_len) * ((letter_count - 1) / (s_len - 1))

    return c * result



# Tests
print(test_ic("Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do", 26, 1.71))
print(test_ic("This is another text in english", 26, 1.92))
```
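
To make the arithmetic of the formula concrete, here is a minimal sketch that computes the same quantity from a table of letter counts. The name `ic_sketch` is hypothetical, and the choice of returning 0 for texts with fewer than two letters is an assumption, since the exercise text does not cover that case.

```python
from collections import Counter


def ic_sketch(s, c):
    # Keep only the 26 English letters, in lowercase
    letters = [ch for ch in s.lower() if "a" <= ch <= "z"]
    n = len(letters)

    # Assumption: with fewer than two letters no pair can be drawn
    if n < 2:
        return 0.0

    # Sum n_i * (n_i - 1) over the letters that occur in the text
    pairs = sum(k * (k - 1) for k in Counter(letters).values())

    return c * pairs / (n * (n - 1))


# "aab": N = 3, n_a = 2, n_b = 1, so the sum is 2*1 + 1*0 = 2 out of 3*2 = 6
print(ic_sketch("aab", 1))
```

Letters that never occur contribute nothing to the sum, so iterating over the observed counts is equivalent to iterating over the whole alphabet.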

### Additional material
The runnable [Python file](exercise_35.py) is available online.
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
## Development - Advanced, exercise 36

### Text
The **Vigenère cipher** is a method of encrypting alphabetic text where each letter of the input text is replaced by a letter some number of positions down the alphabet, and the number of positions is determined by the corresponding letter of another input text, the key. For example, if the input text is `"another exam"` and the key is `"bucainangolo"`, then:

* the first letter *a* of the input text is shifted by 1 position in the alphabet (because the first letter *B* of the key has index 1 in the English alphabet, counting from 0), yielding *b*;
* the second letter *n* is shifted by 20 (because the second letter *U* of the key means 20) yielding *h*, with wrap-around;
* the third letter *o* is shifted by 2 (*C*) yielding *q*;
* and so on, yielding the message `"bhqtprr klla"` (all spaces are preserved).

Write an algorithm in Python – `def vigenere(text, key)` – which considers only English texts, and takes as input a lowercase string `text` representing the text to encrypt and another lowercase string `key` representing the key for the cipher, where both text and key contain the same number of characters, i.e. `len(text)` is equal to `len(key)`. The algorithm must return the encrypted text according to the rules described above.


### Solution
```python
# Test case for the function
def test_vigenere(text, key, expected):
    result = vigenere(text, key)
    if result == expected:
        return True
    else:
        return False


# Code of the function
def vigenere(text, key):
    result = list()

    a = "abcdefghijklmnopqrstuvwxyz"
    for idx, c in enumerate(text):
        if c in a:
            a_idx = a.index(c)
            # The shift is the alphabet index of the key letter at the same position
            k_idx = a.index(key[idx])
            result.append(a[(a_idx + k_idx) % len(a)])
        else:
            # Non-letters (e.g. spaces) are copied unchanged
            result.append(c)

    return "".join(result)


# Tests
print(test_vigenere("attacking tonight", "oculorhinolaringo", "ovnlqbpvt eoeqtnh"))
print(test_vigenere("another exam", "bucainangolo", "bhqtprr klla"))
```
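
One way to check the shifting rule is to undo it. The sketch below subtracts the key shift instead of adding it; the name `vigenere_decrypt` is hypothetical and not part of the exercise, and it makes the same assumptions as the exercise (lowercase English text, key as long as the text).

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def vigenere_decrypt(ciphertext, key):
    # Hypothetical inverse of the exercise's cipher
    result = []
    for idx, c in enumerate(ciphertext):
        if c in ALPHABET:
            shift = ALPHABET.index(key[idx])
            # Subtracting the shift, modulo 26, wraps around below 'a'
            result.append(ALPHABET[(ALPHABET.index(c) - shift) % len(ALPHABET)])
        else:
            # Non-letters (e.g. spaces) are copied unchanged
            result.append(c)
    return "".join(result)


print(vigenere_decrypt("bhqtprr klla", "bucainangolo"))  # another exam
```

Applying it to the second test message, `"ovnlqbpvt eoeqtnh"` with key `"oculorhinolaringo"`, gives back `"attacking tonight"`.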

### Additional material
The runnable [Python file](exercise_36.py) is available online.
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
## Development - Advanced, exercise 37

### Text
**Trial division** is one of the integer factorisation algorithms. The idea is to test whether an integer *n* greater than 1, provided as input, can be divided by each number in turn from 2 to *n*. For example:

* for the integer *n = 12*, the list of factors dividing it is *2, 2, 3* (i.e. *12 = 2 × 2 × 3*);
* for the integer *n = 11*, the list of factors dividing it is *11* (i.e. *11 = 11*, since 11 is prime and, thus, it can be divided by itself only);
* for the integer *n = 15*, the list of factors dividing it is *3, 5* (i.e. *15 = 3 × 5*).

The algorithm proceeds by dividing the input number by the smallest possible number *f*, initially set to 2. If the division leaves a remainder, it repeats the operation after incrementing *f* by one. If, instead, the division leaves no remainder, *f* is added to the list of factors and *n* is assigned the result of the division before repeating the operation. For instance, considering *n = 18*, the initial *f = 2*, and an initially empty list of factors to return:

1. 18 / 2 = 9 (with no remainder) → list of factors: 2; n = 9
2. 9 / 2 = 4 (with remainder 1) → f = 3
3. 9 / 3 = 3 (with no remainder) → list of factors: 2, 3; n = 3
4. 3 / 3 = 1 (with no remainder) → list of factors: 2, 3, 3; n = 1

The algorithm stops when *f* is greater than *n*, and returns the list of factors.

Write an algorithm in Python – `def trial_div(n)` – which takes as input an integer `n` greater than 1, and returns the list with the factors dividing `n` according to the rules described above.


### Solution
```python
# Test case for the function
def test_trial_div(n, expected):
    result = trial_div(n)
    if result == expected:
        return True
    else:
        return False


# Code of the function
def trial_div(n):
    result = []
    f = 2

    while f <= n:
        if n % f == 0:
            result.append(f)
            # Integer division keeps n an int instead of turning it into a float
            n = n // f
        else:
            f = f + 1

    return result


# Tests
print(test_trial_div(12, [2, 2, 3]))
print(test_trial_div(11, [11]))
print(test_trial_div(15, [3, 5]))
print(test_trial_div(18, [2, 3, 3]))
```
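
The step-by-step run on *n = 18* described in the text can be reproduced with a small tracing variant. This is only a sketch for illustration: the name `trial_div_trace` and the tuple layout `(n, f, quotient, remainder)` are assumptions, not part of the exercise.

```python
def trial_div_trace(n):
    # Hypothetical helper: record every division attempted by trial division
    steps = []
    f = 2
    while f <= n:
        quotient, remainder = divmod(n, f)
        steps.append((n, f, quotient, remainder))
        if remainder == 0:
            # f is a factor: continue with the quotient
            n = quotient
        else:
            # f does not divide n: try the next candidate
            f += 1
    return steps


for n, f, quotient, remainder in trial_div_trace(18):
    print(f"{n} / {f} = {quotient} (remainder {remainder})")
```

The factors are the values of `f` in the steps whose remainder is 0, which for 18 are 2, 3 and 3.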

### Additional material
The runnable [Python file](exercise_37.py) is available online.
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
## Development - Advanced, exercise 38

### Text
**PageRank** is an algorithm used by Google Search to rank web pages in its search engine results. It works on a directed graph where nodes represent web pages and each directed edge is a link connecting a source web page to a target one. Each node of the graph is associated with a PageRank that measures its relative importance within the graph (the greater, the more important).

In its simplified version, it is computed as follows. It takes as input a directed graph where each node has a potential PageRank transfer value, initially set to 1, to share with other nodes. Then, the algorithm transfers the potential value of each node to the targets of its outbound links, dividing the value equally among all outbound links. For instance, suppose that page B has links to pages C and A, page C has a link to page A, and page D has links to all three pages. Thus, page B transfers half of its value (0.5) to page A and the other half (0.5) to page C. Page C transfers all of its value (1) to the only page it links to, A. Since D has three outbound links, it transfers one third of its value, or approximately 0.33, to each of A, B and C. The sum of all the values transferred to a given node is the PageRank of that node – for instance, page A has a PageRank of approximately 1.83.

Write an algorithm in Python – `def simplified_pr(g)` – which takes as input a directed graph created using the networkx library, and returns a dictionary with one key-value pair per node in the graph. In particular, each pair has the name of a node as the key and the PageRank of that node as the value. The `adj` attribute of a graph, used as `g.adj[n]`, gives all the nodes reachable from a node `n` by following its outbound edges. For instance, considering the example shown above stored as a `DiGraph` in the variable `my_g`, the execution of `my_g.adj["D"]` returns a collection containing the nodes A, B and C.


### Solution
```python
from networkx import DiGraph


# Test case for the function
def test_simplified_pr(g, expected):
    result = simplified_pr(g)

    if len(result) == len(expected):
        test_res = True
        for key in result:
            if round(result[key], 2) != round(expected[key], 2):
                test_res = False
        return test_res
    else:
        return False


# Code of the function
def simplified_pr(g):
    result = {}

    for n in g.nodes:
        # Every node appears in the result, even if nothing links to it
        if n not in result:
            result[n] = 0

        adj_n = g.adj[n]

        if len(adj_n):
            # The node's value (1) is split equally among its outbound links
            value = 1 / len(adj_n)

            for a in adj_n:
                if a not in result:
                    result[a] = 0
                result[a] += value

    return result


# Tests
my_g = DiGraph()
my_g.add_edge("B", "C")
my_g.add_edge("B", "A")
my_g.add_edge("C", "A")
my_g.add_edge("D", "A")
my_g.add_edge("D", "B")
my_g.add_edge("D", "C")

res = {
    "A": 1.83,
    "B": 0.33,
    "C": 0.83,
    "D": 0
}

print(test_simplified_pr(my_g, res))
```
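
The transfer rule itself does not depend on networkx. The sketch below shows the same computation on a plain dictionary mapping each page to the list of pages it links to; the name `simplified_pr_dict` and this input format are assumptions made for illustration only.

```python
def simplified_pr_dict(links):
    # Hypothetical variant: `links` maps each page to its outbound link targets
    ranks = {page: 0 for page in links}

    for page, targets in links.items():
        for target in targets:
            # Each page shares its value of 1 equally among its targets
            ranks[target] = ranks.get(target, 0) + 1 / len(targets)

    return ranks


# The four-page example described in the text
links = {"A": [], "B": ["C", "A"], "C": ["A"], "D": ["A", "B", "C"]}

for page, rank in simplified_pr_dict(links).items():
    print(page, round(rank, 2))
```

Page A receives 0.5 from B, 1 from C and about 0.33 from D, which gives the value of approximately 1.83 quoted in the text.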

### Additional material
The runnable [Python file](exercise_38.py) is available online.
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
## Development - Advanced, exercise 39

### Text
The **Sørensen–Dice coefficient** is a statistic used to gauge the similarity of two samples, originally intended for discrete data. Given two sets, A and B, it is defined as twice the number of elements common to both sets divided by the sum of the number of elements in each set, as in the following formula:

<img src="img/sd.png" alt="Sørensen–Dice coefficient" style="max-height:35px;" />

Write an algorithm in Python – `def sd_coeff(s1, s2)` – which takes as input two sets and returns the number defining the Sørensen–Dice coefficient for those sets.


### Solution
```python
# Test case for the function
def test_sd_coeff(s1, s2, expected):
    result = sd_coeff(s1, s2)
    if result is not None and (round(result, 2) == round(expected, 2)):
        return True
    else:
        return False


# Code of the function
def sd_coeff(s1, s2):
    # Count the elements common to both sets
    count = 0
    for i in s1:
        if i in s2:
            count += 1

    # Sum of the sizes of the two sets
    den = len(s1) + len(s2)

    return (2 * count) / den


# Tests
print(test_sd_coeff({1, 2, 3}, {1, 2, 3}, 1.0))
print(test_sd_coeff({1, 2}, {1, 2, 3}, 0.8))
```
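
The same definition can be written with Python's set intersection operator. The sketch below is illustrative: the names `sd_coeff_sketch` and `bigrams` are hypothetical, returning 1.0 for two empty sets is an assumption (the formula is undefined in that case), and comparing words through their sets of two-letter substrings is just one possible use of the coefficient.

```python
def sd_coeff_sketch(s1, s2):
    # Assumption: two empty sets are treated as identical
    if not s1 and not s2:
        return 1.0
    # s1 & s2 is the set of elements common to both sets
    return 2 * len(s1 & s2) / (len(s1) + len(s2))


def bigrams(word):
    # Hypothetical helper: the set of two-letter substrings of a word
    return {word[i:i + 2] for i in range(len(word) - 1)}


# "night" and "nacht" have four bigrams each and share only "ht"
print(sd_coeff_sketch(bigrams("night"), bigrams("nacht")))  # 0.25
```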

### Additional material
The runnable [Python file](exercise_39.py) is available online.
Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
# -*- coding: utf-8 -*-
2+
# Copyright (c) 2022, Silvio Peroni <essepuntato@gmail.com>
3+
#
4+
# Permission to use, copy, modify, and/or distribute this software for any purpose
5+
# with or without fee is hereby granted, provided that the above copyright notice
6+
# and this permission notice appear in all copies.
7+
#
8+
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
9+
# REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
10+
# FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT,
11+
# OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
12+
# DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
13+
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
14+
# SOFTWARE.
15+
16+
from collections import deque
17+
18+
# Test case for the function
19+
def test_sequence(s, expected):
20+
result = sequence(s)
21+
if result == expected:
22+
return True
23+
else:
24+
return False
25+
26+
27+
# Code of the function
28+
def sequence(s):
29+
count = {}
30+
for c in s.lower():
31+
if c not in [".", ",", ";", " ", ":", "'"]:
32+
if c not in count:
33+
count[c] = 0
34+
count[c] += 1
35+
36+
result = list()
37+
sorted_values = deque(sorted(count.values()))
38+
while len(sorted_values) > 0 and len(count) > 0:
39+
cur_count = sorted_values.pop()
40+
for c in s.lower():
41+
char_count = count.get(c)
42+
if char_count is not None and char_count == cur_count:
43+
result.append(c)
44+
del count[c]
45+
46+
return "".join(result)
47+
48+
49+
# Tests
50+
print(test_sequence("Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do", "niteogasrhbdvyflcwk"))
