
Init: KDTree #3

Open
Engelsgeduld wants to merge 2 commits into main from KD-Tree
Conversation

@Engelsgeduld (Owner)

No description provided.

@Engelsgeduld changed the title Kd tree → Init: KDTree on Mar 11, 2025

@maybenotilya left a comment


And where are the notebooks...

self.id = 1
self.size = size
self.heap: list[tuple[float, int, Optional[PointType]]] = [(-np.inf, 1, None)]
heapq.heapify(self.heap)


Why call heapify on a heap that has only one element?
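To illustrate the point (a minimal sketch, not the PR's code): any single-element list already satisfies the heap invariant, so heapify on it does nothing, and heappush keeps the invariant on its own.

```python
import heapq
import math

# A one-element list is already a valid heap; heapify would be a no-op here.
heap = [(-math.inf, 1, None)]

# Pushing through heapq maintains the invariant without ever calling heapify.
heapq.heappush(heap, (0.5, 2, None))
assert heap[0] == (-math.inf, 1, None)  # smallest priority stays at the root
```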


class Heap:
    def __init__(self, size: int):
        self.id = 1


I don't quite understand the purpose of id

class Heap:
    def __init__(self, size: int):
        self.id = 1
        self.size = size


Well, this is really a capacity

raise ValueError("Leaf size must be positive")

valid_points = self._validate_points(points)
self.dim: int = valid_points.shape[1]


For convenience it would probably be better to declare this as None in the constructor



class AbstractScaler(metaclass=ABCMeta):
    def fit(self, data: np._typing.NDArray) -> None:


This should be wrapped in the @abstractmethod decorator
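A minimal sketch of the suggestion: with @abstractmethod, forgetting to override fit becomes an instantiation-time TypeError instead of a silent no-op. The ConcreteScaler subclass below is hypothetical, just to show the pattern.

```python
from abc import ABCMeta, abstractmethod

import numpy as np


class AbstractScaler(metaclass=ABCMeta):
    @abstractmethod
    def fit(self, data: np.ndarray) -> None:
        """Subclasses are now forced to implement fit."""


class ConcreteScaler(AbstractScaler):  # hypothetical subclass for illustration
    def fit(self, data: np.ndarray) -> None:
        self.mean = data.mean(axis=0)


# AbstractScaler() now raises TypeError; ConcreteScaler() instantiates fine.
```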

raise ValueError("Features and targets must be same length")
self.model = KDTree(features, self.leaf_size, self.metric)
self.classifier = dict((tuple(pair[0]), pair[1]) for pair in zip(features.tolist(), targets.tolist()))
self.targets = targets


I don't see why the targets are stored separately when they are already in self.classifier

Comment on lines +30 to +36
if point in self.classifier:
    probability.append(
        (
            np.unique(self.targets),
            (self.classifier[point] == np.unique(self.targets)).astype(int),
        )
    )


I barely understand what is going on here, but it looks incorrect, at the very least because there can be two identical points with different labels
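To make the concern concrete, a small sketch with made-up data: the dict built from zip in the PR silently keeps only the last label seen for a duplicated point.

```python
import numpy as np

# Hypothetical data: the same point appears twice with different labels.
features = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
targets = np.array([0, 1, 1])

# Same construction as in the PR: later entries overwrite earlier ones,
# so the label 0 for the first copy of (0.0, 0.0) is lost.
classifier = dict(
    (tuple(pair[0]), pair[1]) for pair in zip(features.tolist(), targets.tolist())
)
assert classifier[(0.0, 0.0)] == 1  # only the last label survives
```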

        )
    )
else:
    result = self.model.query([point], self.k)


Your model.query accepts many points at once, so why not pass all of them in one call?
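A hedged sketch of the suggested batching, assuming model.query(points, k) accepts a batch and returns one neighbour array per query point, as the PR's per-point calls imply. predict_proba is written as a free function here purely for illustration.

```python
import numpy as np


def predict_proba(self, points: np.ndarray) -> list:
    # One batched call instead of one model.query per point.
    results = self.model.query(points, self.k)
    classes = np.unique(self.targets)
    probability = []
    for neighbors in results:
        # Look up each neighbour's label, then count votes per class.
        labels = np.array([self.classifier[tuple(n.tolist())] for n in neighbors])
        counts = np.array([(labels == c).sum() for c in classes])
        probability.append((classes, counts / len(neighbors)))
    return probability
```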

result = self.model.query([point], self.k)
target_result = np.array([self.classifier[tuple(neighbors.tolist())] for neighbors in result[0]])
counts = np.array([(target_result == val).sum() for val in np.unique(self.targets)])
probability.append((self.targets, counts / len(result[0])))


Storing the targets on every predict seems like a memory overhead; they are only ever used inside the class anyway


Also, why is it np.unique(self.targets) above but plain self.targets here?

def transform(self, data: np._typing.NDArray) -> np._typing.NDArray:
    if self.median is None or self.iqr is None:
        raise ValueError("Scaler unfitted")
    return (data - self.median) / self.iqr


self.iqr can be equal to zero
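One possible guard, sketched here as an assumption about the intended fix rather than the PR's code: a constant feature has IQR equal to zero, so the division would produce inf/nan. A common workaround is to substitute 1 for zero IQRs, so the feature is only centred, not scaled.

```python
import numpy as np

# Column 0 is constant, so its IQR is 0 and a plain division would give nan/inf.
data = np.array([[1.0, 5.0], [1.0, 7.0], [1.0, 9.0]])
q75, q25 = np.percentile(data, [75, 25], axis=0)
iqr = q75 - q25
safe_iqr = np.where(iqr == 0, 1.0, iqr)  # fall back to 1 for constant features
median = np.median(data, axis=0)
scaled = (data - median) / safe_iqr
# The constant column maps to all zeros with no division warnings.
```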


2 participants