347. Top K Frequent Elements by tarinaihitori · Pull Request #9 · tarinaihitori/leetcode

tarinaihitori · 2024-10-25T16:07:48Z

https://leetcode.com/problems/top-k-frequent-elements/description/

oda · 2024-10-25T16:14:19Z

+            if num in num_to_count:
+                num_to_count[num] += 1
+            else:
+                num_to_count[num] = 0


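Is this intentional? Shouldn't it be 1?

No, it is not intentional. Unless the else branch stores 1, the counts come out wrong.
The code still worked, though, because the relative frequency order is unchanged.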
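
For reference, a minimal sketch of the corrected counting loop. The function name `count_occurrences` is made up for illustration and is not part of the PR. With the original `= 0`, every count comes out exactly one too small, which is why the ranking still matched.

```python
def count_occurrences(nums):
    """Count how many times each value appears in nums."""
    num_to_count = {}
    for num in nums:
        if num in num_to_count:
            num_to_count[num] += 1
        else:
            # The first sighting is one occurrence, not zero.
            num_to_count[num] = 1
    return num_to_count
```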

oda · 2024-10-25T16:16:03Z

+            elif k_smallest < pivot_index:
+                quickselect(left, pivot_index - 1, k_smallest)
+            else:
+                quickselect(pivot_index + 1, right, k_smallest)


One option is to apply tail-call optimization by hand and turn this into a loop.

I see. I'll try writing that version as well.
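
A rough sketch of what the loop version could look like. The full quickselect from the PR is not shown in this thread, so `partition`, `quickselect`, and `top_k_frequent` below are assumed shapes (a Lomuto partition with the last element as pivot), not the author's code. Each recursive call becomes an update of `left` or `right`.

```python
from collections import Counter


def partition(values, left, right, key):
    """Lomuto partition of values[left..right]; returns the pivot's final index."""
    pivot_key = key(values[right])
    store = left
    for i in range(left, right):
        if key(values[i]) < pivot_key:
            values[store], values[i] = values[i], values[store]
            store += 1
    values[store], values[right] = values[right], values[store]
    return store


def quickselect(values, k_smallest, key):
    """Rearrange values in place so index k_smallest holds its sorted-order element."""
    left, right = 0, len(values) - 1
    while left < right:
        pivot_index = partition(values, left, right, key)
        if k_smallest == pivot_index:
            return
        # Instead of recursing, narrow the window and loop again.
        if k_smallest < pivot_index:
            right = pivot_index - 1
        else:
            left = pivot_index + 1


def top_k_frequent(nums, k):
    """Assumes 1 <= k <= number of distinct values, as the problem guarantees."""
    num_to_count = Counter(nums)
    unique = list(num_to_count)
    quickselect(unique, len(unique) - k, key=num_to_count.__getitem__)
    return unique[len(unique) - k:]
```

The order of the returned values is not specified; only the set of values is meaningful.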

oda · 2024-10-25T16:16:27Z

+
+            if k_smallest == pivot_index:
+                return
+            elif k_smallest < pivot_index:


I would change the elif to an if, since the branch above returns.

oda · 2024-10-25T16:19:44Z

+            result.extend(bucket[freq])
+            if k <= len(result):
+                return result[:k]
+```


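The problem statement guarantees that there are always k distinct values, but please think about what to do when there are not.
The options are roughly: return a special value, raise an Exception, return a shorter result, or stop the program.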
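
As one illustration of the "raise an Exception" option, here is a small sketch. The function name and the choice of `ValueError` are assumptions for the example; returning a shorter list would be an equally valid policy depending on the caller.

```python
def top_k_frequent_or_raise(nums, k):
    """Return the k most frequent values, or raise if fewer than k distinct values exist."""
    num_to_count = {}
    for num in nums:
        num_to_count[num] = num_to_count.get(num, 0) + 1
    if not 0 <= k <= len(num_to_count):
        raise ValueError(
            f"k={k} is out of range for {len(num_to_count)} distinct values"
        )
    # A plain sort keeps the example short; the validation is the point here.
    by_frequency = sorted(num_to_count, key=num_to_count.get, reverse=True)
    return by_frequency[:k]
```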

goto-untrapped · 2024-10-26T01:06:56Z

+```python
+class Solution:
+    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
+        num_to_count = {}


defaultdict looks usable here as well.
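
A minimal sketch of the `defaultdict` variant, with a made-up function name. `defaultdict(int)` supplies 0 for a missing key, so the existence check disappears.

```python
from collections import defaultdict


def count_with_defaultdict(nums):
    """Count occurrences without an explicit membership check."""
    num_to_count = defaultdict(int)  # missing keys start at int() == 0
    for num in nums:
        num_to_count[num] += 1
    return num_to_count
```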

goto-untrapped · 2024-10-26T01:26:14Z

+- pros
+  - Efficient: average time complexity is O(n)
+- cons
+  - Complex to implement


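I also felt it is worth noting that the worst-case complexity can be O(n^2).

You're right, the worst case should be considered too. Thank you.

Beyond that, if you are not already familiar with them, it would be good to look into when the worst case occurs and into techniques that mitigate it, such as pivot selection.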
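
As a starting point: the worst case appears when the pivot keeps landing on the smallest or largest remaining element, for example a last-element pivot on already sorted input. One common mitigation is a random pivot. The helper below is a hypothetical sketch, not code from the PR; it makes the O(n^2) case unlikely but does not rule it out.

```python
import random


def partition_random_pivot(values, left, right):
    """Partition values[left..right] around a randomly chosen pivot.

    Returns the pivot's final index. Elements before it are smaller,
    elements after it are greater than or equal to it.
    """
    pivot_index = random.randint(left, right)  # inclusive on both ends
    # Move the pivot to the end so the usual Lomuto scan applies.
    values[pivot_index], values[right] = values[right], values[pivot_index]
    pivot = values[right]
    store = left
    for i in range(left, right):
        if values[i] < pivot:
            values[store], values[i] = values[i], values[store]
            store += 1
    values[store], values[right] = values[right], values[store]
    return store
```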

goto-untrapped · 2024-10-26T01:43:31Z

+- pros
+  - Efficient when k is small
+- cons
+  - Becomes inefficient when k is large


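A question:
why does a small or large k make this efficient or inefficient?
Looking at the 1st step, everything in num_to_count is pushed onto min_heap, so I thought it was the input nums that would affect efficiency...

True, with the 1st step as written, the heap grows large when there are many unique elements.
But instead of pushing every unique element onto the heap and then popping until its size is k,
writing it as below keeps the heap at size k, so I think it gets more efficient as k gets smaller.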

    for num, freq in num_to_count.items():
        heapq.heappush(min_heap, (freq, num))
        if len(min_heap) > k:
            heapq.heappop(min_heap)

Thank you.
A smaller k does reduce memory usage.
This may just be my personal feeling, but to me this is a matter of "the heap can be written with size k", so it seems to be neither a pro nor a con.
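
To make the size-k discussion concrete, here is a self-contained sketch built around the loop quoted above. The wrapper function name is made up. With m distinct values the heap never holds more than k + 1 entries, so the heap work is O(m log k) rather than O(m log m).

```python
import heapq
from collections import Counter


def top_k_frequent_heap(nums, k):
    """Keep only the k most frequent values in a bounded min-heap."""
    num_to_count = Counter(nums)
    min_heap = []
    for num, freq in num_to_count.items():
        heapq.heappush(min_heap, (freq, num))
        if len(min_heap) > k:
            # Drop the least frequent entry so the heap stays at size k.
            heapq.heappop(min_heap)
    return [num for _, num in min_heap]
```

The standard library's `heapq.nlargest(k, num_to_count, key=num_to_count.get)` selects the same values, although it may break ties differently.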

thonda28 · 2024-10-26T12:31:11Z

+
+        while k < len(min_heap):
+            heapq.heappop(min_heap)
+        return [num for freq, num in min_heap]


freq is unused, so naming it _ removes the unneeded information.

Suggested change

-        return [num for freq, num in min_heap]
+        return [num for _, num in min_heap]

thonda28 · 2024-10-26T13:05:11Z

+        bucket = defaultdict(list)
+        for num, frequency in num_to_count.items():
+            bucket[frequency].append(num)
+
+        sorted_frequencies = sorted(bucket.keys(), reverse=True)


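Bucket sort is an algorithm that distributes elements into "buckets" by category,
sorts within each bucket, and concatenates the buckets at the end.
In this problem, the idea applies by creating one bucket per occurrence frequency.
https://en.wikipedia.org/wiki/Bucket_sort
That wastes a lot of memory, so in the 3rd step I wrote it with a dictionary instead.
https://github.com/kagetora0924/leetcode-grind/pull/11/files#r1658626553

I read the comment above as meaning "the 3rd step implements Bucket Sort using a dictionary", hence this comment.

The implementation in the 3rd step does not appear to be Bucket Sort. My understanding is that Bucket Sort can sort quickly under specific conditions, with the trade-off, as written above, that it needs additional memory. Using a dictionary because the memory would otherwise be wasted is a good choice, I think, but the result performs an ordinary sort and so is not Bucket Sort. (As I understand it, Bucket Sort spends the additional memory precisely so that the sorted() here becomes unnecessary.)

I commented because I thought there might be a misunderstanding about Bucket Sort.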
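
For comparison, a sketch of the array-indexed variant the comment alludes to, where the frequency itself is the bucket index and no `sorted()` call is needed. The function name is hypothetical. The price is a list of `len(nums) + 1` buckets, most of which may stay empty.

```python
from collections import Counter


def top_k_frequent_buckets(nums, k):
    """Bucket values by frequency, then read the buckets from high to low."""
    num_to_count = Counter(nums)
    # A value can occur at most len(nums) times, so that bounds the index.
    buckets = [[] for _ in range(len(nums) + 1)]
    for num, freq in num_to_count.items():
        buckets[freq].append(num)

    result = []
    for freq in range(len(nums), 0, -1):
        result.extend(buckets[freq])
        if len(result) >= k:
            return result[:k]
    # Fewer than k distinct values: return what there is.
    return result
```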

colorbox · 2024-10-26T13:33:48Z

+        count_to_num = []
+        for num, frequency in num_to_count.items():
+            heapq.heappush(count_to_num, (frequency, num))
+        while k < len(count_to_num):


Moving this step into the for loop would keep the heap from growing beyond k.

    for num, frequency in num_to_count.items():
        heapq.heappush(count_to_num, (frequency, num))
        if k < len(count_to_num):
            heapq.heappop(count_to_num)

hroc135 · 2024-11-04T01:37:09Z

+            if bucket[i]:
+                result.extend(bucket[i])
+            if len(result) >= k:
+                return result[:k]


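The problem constraints say "It is guaranteed that the answer is unique.", so there is always a moment when len(result) == k exactly, but I like that writing it this way keeps the code general. I sometimes lean on the LeetCode constraints myself, so I'd like to learn from this.

I believe the order in which `for num, freq in num_to_count.items():` visits entries is effectively arbitrary, so if the result overshoots without ever hitting len(result) == k, the answer ends up arbitrary as well.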

hroc135 · 2024-11-04T01:44:12Z

+        for num in nums:
+            if num not in num_to_count:
+                num_to_count[num] = 0
+            num_to_count[num] += 1


I think these 4 lines can be done with Counter from the collections library.
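
A minimal sketch of that suggestion, with a made-up function name. `Counter(nums)` replaces the counting loop, and `most_common(k)` happens to cover the rest of this problem too.

```python
from collections import Counter


def top_k_frequent_counter(nums, k):
    """Count with Counter and take the k most common values."""
    num_to_count = Counter(nums)  # replaces the manual counting loop
    return [num for num, _ in num_to_count.most_common(k)]
```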

tarinaihitori added 2 commits October 26, 2024 01:01

Up to step 3

1a2652b

Minor fixes

52ab57d
