update.patch
5041 lines (5039 loc) · 216 KB
Index: 04_Naive Bayesian Model/email Filter/ham/2.txt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 04_Naive Bayesian Model/email Filter/ham/2.txt (date 1551675378000)
+++ 04_Naive Bayesian Model/email Filter/ham/2.txt (date 1551675378000)
@@ -0,0 +1,3 @@
+Yay to you both doing fine!
+
+I'm working on an MBA in Design Strategy at CCA (top art school.) It's a new program focusing on more of a right-brained creative and strategic approach to management. I'm an 1/8 of the way done today!
\ No newline at end of file
Index: 04_Naive Bayesian Model/email Filter/ham/5.txt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 04_Naive Bayesian Model/email Filter/ham/5.txt (date 1551675378000)
+++ 04_Naive Bayesian Model/email Filter/ham/5.txt (date 1551675378000)
@@ -0,0 +1,2 @@
+There was a guy at the gas station who told me that if I knew Mandarin
+and Python I could get a job with the FBI.
\ No newline at end of file
Index: 04_Naive Bayesian Model/email Filter/ham/25.txt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 04_Naive Bayesian Model/email Filter/ham/25.txt (date 1551675378000)
+++ 04_Naive Bayesian Model/email Filter/ham/25.txt (date 1551675378000)
@@ -0,0 +1,2 @@
+That is cold. Is there going to be a retirement party?
+Are the leaves changing color?
\ No newline at end of file
Index: 04_Naive Bayesian Model/email Filter/emailFilter.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 04_Naive Bayesian Model/email Filter/emailFilter.py (date 1565861431946)
+++ 04_Naive Bayesian Model/email Filter/emailFilter.py (date 1565861431946)
@@ -0,0 +1,242 @@
+#!/usr/bin/python3.7
+# -*- coding: utf-8 -*-
+"""
+@Time :2019/8/14 16:43
+
+@Author :Yuki
+
+@FileName :emailFilter.py
+
+@E-mail :fujii20180311@foxmail.com
+"""
+
+import numpy as np
+import random
+import re
+
+"""
+Function: collect the tokenized sample documents into a deduplicated token list, i.e. the vocabulary
+
+Parameters:
+    dataSet - the prepared sample data set
+Returns:
+    vocabSet - the deduplicated token list (the vocabulary)
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-11
+"""
+def createVocabList(dataSet):
+    vocabSet = set([])                                  # start with an empty set (no duplicates)
+    for document in dataSet:
+        vocabSet = vocabSet | set(document)             # union with this document's tokens
+    return list(vocabSet)
+
+"""
+Function: vectorize inputSet against the vocabulary vocabList; each element of the vector is 1 or 0
+
+Parameters:
+    vocabList - the list returned by createVocabList
+    inputSet - a tokenized document
+Returns:
+    returnVec - the document vector (set-of-words model)
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-11
+"""
+def setOfWords2Vec(vocabList, inputSet):
+    returnVec = [0] * len(vocabList)                    # create a vector of zeros
+    for word in inputSet:                               # iterate over the tokens
+        if word in vocabList:                           # if the token is in the vocabulary, set its entry to 1
+            returnVec[vocabList.index(word)] = 1
+        else: print("the word: %s is not in my Vocabulary!" % word)
+    return returnVec                                    # return the document vector
+
+
+"""
+Function: build the bag-of-words model from the vocabulary vocabList
+
+Parameters:
+    vocabList - the list returned by createVocabList
+    inputSet - a tokenized document
+Returns:
+    returnVec - the document vector (bag-of-words model)
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-14
+"""
+def bagOfWords2VecMN(vocabList, inputSet):
+    returnVec = [0]*len(vocabList)                      # create a vector of zeros
+    for word in inputSet:                               # iterate over the tokens
+        if word in vocabList:                           # if the token is in the vocabulary, increment its count
+            returnVec[vocabList.index(word)] += 1
+    return returnVec                                    # return the bag-of-words vector
+
+"""
+Function: Naive Bayes classifier training function
+
+Parameters:
+    trainMatrix - training document matrix, i.e. the returnVec vectors from setOfWords2Vec stacked into a matrix
+    trainCategory - training class-label vector, i.e. the classVec returned by loadDataSet
+Returns:
+    p0Vect - conditional probability array for the non-abusive (class 0) documents
+    p1Vect - conditional probability array for the abusive (class 1) documents
+    pAbusive - probability that a document is abusive
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-12
+"""
+def trainNB0(trainMatrix,trainCategory):
+    numTrainDocs = len(trainMatrix)                     # number of training documents
+    numWords = len(trainMatrix[0])                      # number of tokens per document vector
+    pAbusive = sum(trainCategory)/float(numTrainDocs)   # prior probability of the abusive class
+    p0Num = np.ones(numWords); p1Num = np.ones(numWords)# initialize token counts to 1 (Laplace smoothing)
+    p0Denom = 2.0; p1Denom = 2.0                        # initialize denominators to 2 (Laplace smoothing)
+    for i in range(numTrainDocs):
+        if trainCategory[i] == 1:                       # accumulate data for the abusive class: P(w0|1),P(w1|1),P(w2|1)...
+            p1Num += trainMatrix[i]
+            p1Denom += sum(trainMatrix[i])
+        else:                                           # accumulate data for the non-abusive class: P(w0|0),P(w1|0),P(w2|0)...
+            p0Num += trainMatrix[i]
+            p0Denom += sum(trainMatrix[i])
+    p1Vect = np.log(p1Num/p1Denom)                      # take logs to prevent underflow
+    p0Vect = np.log(p0Num/p0Denom)
+    return p0Vect,p1Vect,pAbusive                       # return both conditional probability arrays and the class prior
+
+"""
+Function: Naive Bayes classification function
+
+Parameters:
+    vec2Classify - the token vector to classify
+    p0Vec - conditional probability array for the non-abusive (class 0) documents
+    p1Vec - conditional probability array for the abusive (class 1) documents
+    pClass1 - probability that a document is abusive
+Returns:
+    0 - non-abusive
+    1 - abusive
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-12
+"""
+def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
+    p1 = sum(vec2Classify * p1Vec) + np.log(pClass1)    # element-wise product; log(A*B) = logA + logB, hence the added log(pClass1)
+    p0 = sum(vec2Classify * p0Vec) + np.log(1.0 - pClass1)
+    if p1 > p0:
+        return 1
+    else:
+        return 0
+
+"""
+Function: Naive Bayes classifier training function (verbatim duplicate of trainNB0 above; this later definition shadows the earlier one)
+
+Parameters:
+    trainMatrix - training document matrix, i.e. the returnVec vectors from setOfWords2Vec stacked into a matrix
+    trainCategory - training class-label vector, i.e. the classVec returned by loadDataSet
+Returns:
+    p0Vect - conditional probability array for the non-abusive (class 0) documents
+    p1Vect - conditional probability array for the abusive (class 1) documents
+    pAbusive - probability that a document is abusive
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-12
+"""
+def trainNB0(trainMatrix,trainCategory):
+    numTrainDocs = len(trainMatrix)                     # number of training documents
+    numWords = len(trainMatrix[0])                      # number of tokens per document vector
+    pAbusive = sum(trainCategory)/float(numTrainDocs)   # prior probability of the abusive class
+    p0Num = np.ones(numWords); p1Num = np.ones(numWords)# initialize token counts to 1 (Laplace smoothing)
+    p0Denom = 2.0; p1Denom = 2.0                        # initialize denominators to 2 (Laplace smoothing)
+    for i in range(numTrainDocs):
+        if trainCategory[i] == 1:                       # accumulate data for the abusive class: P(w0|1),P(w1|1),P(w2|1)...
+            p1Num += trainMatrix[i]
+            p1Denom += sum(trainMatrix[i])
+        else:                                           # accumulate data for the non-abusive class: P(w0|0),P(w1|0),P(w2|0)...
+            p0Num += trainMatrix[i]
+            p0Denom += sum(trainMatrix[i])
+    p1Vect = np.log(p1Num/p1Denom)                      # take logs to prevent underflow
+    p0Vect = np.log(p0Num/p0Denom)
+    return p0Vect,p1Vect,pAbusive                       # return both conditional probability arrays and the class prior
+
+
+"""
+Function: take one large string and parse it into a list of token strings
+
+Parameters:
+    bigString - the raw text of one e-mail
+Returns:
+    a list of lowercase tokens longer than two characters
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-14
+"""
+def textParse(bigString):                               # convert the string into a list of tokens
+    listOfTokens = re.split(r'\W+', bigString)          # split on non-word characters, i.e. anything that is not a letter or digit
+    return [tok.lower() for tok in listOfTokens if len(tok) > 2]    # drop short tokens (e.g. a capital I) and lowercase the rest
+
+"""
+Function: test the Naive Bayes classifier
+
+Parameters:
+    none
+Returns:
+    none
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-14
+"""
+def spamTest():
+    docList = []; classList = []; fullText = []
+    for i in range(1, 26):                              # iterate over the 25 txt files of each class
+        wordList = textParse(open('./spam/%d.txt' % i, 'r').read())     # read each spam e-mail and tokenize it
+        docList.append(wordList)
+        fullText.append(wordList)
+        classList.append(1)                             # label spam: 1 means spam
+        wordList = textParse(open('./ham/%d.txt' % i, 'r').read())      # read each ham e-mail and tokenize it
+        docList.append(wordList)
+        fullText.append(wordList)
+        classList.append(0)                             # label ham: 0 means not spam
+    vocabList = createVocabList(docList)                # build the deduplicated vocabulary
+    trainingSet = list(range(50)); testSet = []         # index lists for the training set and the test set
+    for i in range(10):                                 # from the 50 e-mails, randomly pick 40 for training and 10 for testing
+        randIndex = int(random.uniform(0, len(trainingSet)))            # pick a random index
+        testSet.append(trainingSet[randIndex])          # add it to the test-set indices
+        del(trainingSet[randIndex])                     # and remove it from the training-set indices
+    trainMat = []; trainClasses = []                    # training matrix and training class-label vector
+    for docIndex in trainingSet:                        # iterate over the training set
+        trainMat.append(setOfWords2Vec(vocabList, docList[docIndex]))   # add each set-of-words vector to the training matrix
+        trainClasses.append(classList[docIndex])        # add its class to the label vector
+    p0V, p1V, pSpam = trainNB0(np.array(trainMat), np.array(trainClasses))  # train the Naive Bayes model
+    errorCount = 0                                      # misclassification counter
+    for docIndex in testSet:                            # iterate over the test set
+        wordVector = setOfWords2Vec(vocabList, docList[docIndex])       # set-of-words vector for the test document
+        if classifyNB(np.array(wordVector), p0V, p1V, pSpam) != classList[docIndex]:    # if misclassified
+            errorCount += 1                             # increment the error counter
+            print("misclassified test document:", docList[docIndex])
+    print('error rate: %.2f%%' % (float(errorCount) / len(testSet) * 100))
+
+
+if __name__ == '__main__':
+ spamTest()
\ No newline at end of file
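The training and classification arithmetic added in emailFilter.py above (Laplace smoothing with counts initialized to 1 and denominators to 2, then a comparison in log space) can be checked with a tiny standalone sketch. The toy word-count matrix below is hypothetical, not taken from the patch:

```python
import numpy as np

# Toy word-count matrix: two documents over a three-word vocabulary.
# Document 0 is class 0 (ham), document 1 is class 1 (spam).
train = np.array([[2, 0, 1],
                  [0, 3, 1]])
labels = np.array([0, 1])

# Laplace smoothing, exactly as trainNB0 does: counts start at 1,
# denominators start at 2, and everything moves to log space.
p1_vec = np.log((1.0 + train[labels == 1].sum(axis=0)) /
                (2.0 + train[labels == 1].sum()))
p0_vec = np.log((1.0 + train[labels == 0].sum(axis=0)) /
                (2.0 + train[labels == 0].sum()))
p_spam = labels.mean()

def classify(vec):
    # Sum the log conditional probabilities of the present words and
    # add the log prior: log(P(w|c) * P(c)) = sum log P(w|c) + log P(c).
    p1 = (vec * p1_vec).sum() + np.log(p_spam)
    p0 = (vec * p0_vec).sum() + np.log(1.0 - p_spam)
    return 1 if p1 > p0 else 0
```

With these counts, a document containing only word 0 falls on the ham side and one containing only word 1 on the spam side, mirroring the p1 > p0 comparison in classifyNB.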
Index: 04_Naive Bayesian Model/email Filter/__init__.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 04_Naive Bayesian Model/email Filter/__init__.py (date 1565771263904)
+++ 04_Naive Bayesian Model/email Filter/__init__.py (date 1565771263904)
@@ -0,0 +1,12 @@
+#!/usr/bin/python3.7
+# -*- coding: utf-8 -*-
+"""
+@Time :2019/8/14 16:27
+
+@Author :Yuki
+
+@FileName  :__init__.py
+
+@E-mail :fujii20180311@foxmail.com
+"""
+
Index: 04_Naive Bayesian Model/__init__.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 04_Naive Bayesian Model/__init__.py (date 1565770191269)
+++ 04_Naive Bayesian Model/__init__.py (date 1565770191269)
@@ -0,0 +1,185 @@
+#!/usr/bin/python3.7
+# -*- coding: utf-8 -*-
+"""
+@Time :2019/8/14 9:04
+
+@Author :Yuki
+
+@FileName :__init__.py
+
+@E-mail :fujii20180311@foxmail.com
+"""
+
+import numpy as np
+from functools import reduce
+
+"""
+Function: create the experimental sample data
+
+Parameters:
+    none
+Returns:
+    postingList - the tokenized sample posts
+    classVec - the class-label vector
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-11
+"""
+def loadDataSet():
+    postingList=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],    # tokenized posts
+                ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
+                ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
+                ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
+                ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
+                ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
+    classVec = [0,1,0,1,0,1]                            # class labels: 1 means abusive, 0 means not
+    return postingList,classVec                         # return the tokenized posts and their class labels
+
+"""
+Function: collect the tokenized sample documents into a deduplicated token list, i.e. the vocabulary
+
+Parameters:
+    dataSet - the prepared sample data set
+Returns:
+    vocabSet - the deduplicated token list (the vocabulary)
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-11
+"""
+def createVocabList(dataSet):
+    vocabSet = set([])                                  # start with an empty set (no duplicates)
+    for document in dataSet:
+        vocabSet = vocabSet | set(document)             # union with this document's tokens
+    return list(vocabSet)
+
+"""
+Function: vectorize inputSet against the vocabulary vocabList; each element of the vector is 1 or 0
+
+Parameters:
+    vocabList - the list returned by createVocabList
+    inputSet - a tokenized document
+Returns:
+    returnVec - the document vector (set-of-words model)
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-11
+"""
+def setOfWords2Vec(vocabList, inputSet):
+    returnVec = [0] * len(vocabList)                    # create a vector of zeros
+    for word in inputSet:                               # iterate over the tokens
+        if word in vocabList:                           # if the token is in the vocabulary, set its entry to 1
+            returnVec[vocabList.index(word)] = 1
+        else: print("the word: %s is not in my Vocabulary!" % word)
+    return returnVec                                    # return the document vector
+
+
+"""
+Function: Naive Bayes classifier training function
+
+Parameters:
+    trainMatrix - training document matrix, i.e. the returnVec vectors from setOfWords2Vec stacked into a matrix
+    trainCategory - training class-label vector, i.e. the classVec returned by loadDataSet
+Returns:
+    p0Vect - conditional probability array for the non-abusive (class 0) documents
+    p1Vect - conditional probability array for the abusive (class 1) documents
+    pAbusive - probability that a document is abusive
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-12
+"""
+def trainNB0(trainMatrix,trainCategory):
+    numTrainDocs = len(trainMatrix)                     # number of training documents
+    numWords = len(trainMatrix[0])                      # number of tokens per document vector
+    pAbusive = sum(trainCategory)/float(numTrainDocs)   # prior probability of the abusive class
+    p0Num = np.ones(numWords); p1Num = np.ones(numWords)# initialize token counts to 1 (Laplace smoothing)
+    p0Denom = 2.0; p1Denom = 2.0                        # initialize denominators to 2 (Laplace smoothing)
+    for i in range(numTrainDocs):
+        if trainCategory[i] == 1:                       # accumulate data for the abusive class: P(w0|1),P(w1|1),P(w2|1)...
+            p1Num += trainMatrix[i]
+            p1Denom += sum(trainMatrix[i])
+        else:                                           # accumulate data for the non-abusive class: P(w0|0),P(w1|0),P(w2|0)...
+            p0Num += trainMatrix[i]
+            p0Denom += sum(trainMatrix[i])
+    p1Vect = np.log(p1Num/p1Denom)                      # take logs to prevent underflow
+    p0Vect = np.log(p0Num/p0Denom)
+    return p0Vect,p1Vect,pAbusive                       # return both conditional probability arrays and the class prior
+
+
+"""
+Function: Naive Bayes classification function
+
+Parameters:
+    vec2Classify - the token vector to classify
+    p0Vec - conditional probability array for the non-abusive (class 0) documents
+    p1Vec - conditional probability array for the abusive (class 1) documents
+    pClass1 - probability that a document is abusive
+Returns:
+    0 - non-abusive
+    1 - abusive
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-12
+"""
+def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
+    p1 = sum(vec2Classify * p1Vec) + np.log(pClass1)    # element-wise product; log(A*B) = logA + logB, hence the added log(pClass1)
+    p0 = sum(vec2Classify * p0Vec) + np.log(1.0 - pClass1)
+    #print("{0:%.3f},{1:%.3f}".format(p1, p0))
+    #print(p1)
+
+    if p1 > p0:
+        return 1
+    else:
+        return 0
+
+"""
+Function: test the Naive Bayes classifier
+
+Parameters:
+    none
+Returns:
+    none
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-08-12
+"""
+def testingNB():
+    listOPosts, listClasses = loadDataSet()             # create the experimental samples
+    myVocabList = createVocabList(listOPosts)           # build the vocabulary
+    trainMat=[]
+    for postinDoc in listOPosts:
+        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))     # vectorize the samples
+    p0V,p1V,pAb = trainNB0(np.array(trainMat),np.array(listClasses))    # train the Naive Bayes classifier
+    testEntry = ['love', 'my', 'dalmation']             # test sample 1
+    thisDoc = np.array(setOfWords2Vec(myVocabList, testEntry))      # vectorize the test sample
+    if classifyNB(thisDoc,p0V,p1V,pAb):
+        print(testEntry, 'is abusive')                  # classify and print the result
+    else:
+        print(testEntry, 'is not abusive')              # classify and print the result
+    testEntry = ['stupid', 'garbage']                   # test sample 2
+
+    thisDoc = np.array(setOfWords2Vec(myVocabList, testEntry))      # vectorize the test sample
+    if classifyNB(thisDoc,p0V,p1V,pAb):
+        print(testEntry, 'is abusive')                  # classify and print the result
+    else:
+        print(testEntry, 'is not abusive')              # classify and print the result
+
+if __name__ == '__main__':
+ testingNB()
\ No newline at end of file
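Both Naive Bayes files above repeat the same two-step preprocessing pattern: build a deduplicated vocabulary, then map each document to a 0/1 indicator vector. A minimal standalone sketch of that pattern (hypothetical toy documents, sorted vocabulary for a deterministic order — the patch's createVocabList leaves the order unspecified):

```python
def build_vocab(docs):
    # Union of all tokens across documents, sorted for determinism.
    vocab = set()
    for doc in docs:
        vocab |= set(doc)
    return sorted(vocab)

def to_set_vector(vocab, doc):
    # Set-of-words model: 1 if the token appears at all, else 0.
    return [1 if word in doc else 0 for word in vocab]

docs = [['my', 'dog', 'is', 'cute'], ['stupid', 'dog']]
vocab = build_vocab(docs)
```

These vectors, stacked into a matrix, are exactly the trainMat input that trainNB0 expects.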
Index: .idea/inspectionProfiles/Project_Default.xml
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- .idea/inspectionProfiles/Project_Default.xml (date 1565771114719)
+++ .idea/inspectionProfiles/Project_Default.xml (date 1565771114719)
@@ -0,0 +1,89 @@
+<component name="InspectionProjectProfileManager">
+ <profile version="1.0">
+ <option name="myName" value="Project Default" />
+ <inspection_tool class="CommandLineInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyAbstractClassInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyArgumentListInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyAssignmentToLoopOrWithParameterInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyAsyncCallInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyAttributeOutsideInitInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyBroadExceptionInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyByteLiteralInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyCallByClassInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyCallingNonCallableInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyChainedComparisonsInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyClassHasNoInitInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyComparisonWithNoneInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDataclassInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDecoratorInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDefaultArgumentInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDeprecationInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDictCreationInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDictDuplicateKeysInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDocstringTypesInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyDunderSlotsInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyExceptClausesOrderInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyExceptionInheritInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyFromFutureImportInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyGlobalUndefinedInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyInconsistentIndentationInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyIncorrectDocstringInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyInitNewSignatureInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyInterpreterInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyListCreationInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyMethodFirstArgAssignmentInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyMethodMayBeStaticInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyMethodOverridingInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyMethodParametersInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyMissingConstructorInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyNamedTupleInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyNestedDecoratorsInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyNonAsciiCharInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyNoneFunctionAssignmentInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyOldStyleClassesInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyOverloadsInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyPackageRequirementsInspection" enabled="false" level="WARNING" enabled_by_default="false">
+ <option name="ignoredPackages">
+ <value>
+ <list size="0" />
+ </value>
+ </option>
+ </inspection_tool>
+ <inspection_tool class="PyPep8Inspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyPep8NamingInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyPropertyAccessInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyPropertyDefinitionInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyProtectedMemberInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyProtocolInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyRedeclarationInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyRedundantParenthesesInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyReturnFromInitInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PySetFunctionToLiteralInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyShadowingBuiltinsInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyShadowingNamesInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PySimplifyBooleanCheckInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PySingleQuotedDocstringInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyStatementEffectInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyStringExceptionInspection" enabled="false" level="ERROR" enabled_by_default="false" />
+ <inspection_tool class="PyStringFormatInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyStubPackagesAdvertiser" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyStubPackagesCompatibilityInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PySuperArgumentsInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyTestParametrizedInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyTrailingSemicolonInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyTupleAssignmentBalanceInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyTupleItemAssignmentInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyTypeCheckerInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyTypeHintsInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyUnboundLocalVariableInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyUnnecessaryBackslashInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyUnreachableCodeInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyUnresolvedReferencesInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+ <inspection_tool class="PyUnusedLocalInspection" enabled="false" level="WEAK WARNING" enabled_by_default="false">
+ <option name="ignoreTupleUnpacking" value="true" />
+ <option name="ignoreLambdaParameters" value="true" />
+ <option name="ignoreLoopIterationVariables" value="true" />
+ <option name="ignoreVariablesStartingWithUnderscore" value="true" />
+ </inspection_tool>
+ </profile>
+</component>
\ No newline at end of file
Index: 02_kNN/kNN.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 02_kNN/kNN.py (date 1565688082454)
+++ 02_kNN/kNN.py (date 1565688082454)
@@ -0,0 +1,58 @@
+#!/usr/bin/python3.7
+# -*- coding: utf-8 -*-
+"""
+@Time :2019/7/29 8:52
+
+@Author :Yuki
+
+@FileName :kNN.py
+
+@E-mail :fujii20180311@foxmail.com
+"""
+
+import numpy as np
+import operator
+
+def createDataSet():
+    # four samples with two features each
+    group = np.array([[1, 101], [5, 89], [108, 5], [115, 8]])
+    # labels for the four samples
+    labels = ['romance', 'romance', 'action', 'action']
+    return group, labels
+
+def classify0(inX, dataSet, labels, k):
+    # numpy's shape[0] gives the number of rows in dataSet
+    dataSetSize = dataSet.shape[0]
+    # tile inX once along the columns and dataSetSize times along the rows, then subtract dataSet
+    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
+    # square the element-wise feature differences
+    sqDiffMat = diffMat**2
+    # sum() adds all elements; sum(0) sums columns, sum(1) sums rows
+    sqDistances = sqDiffMat.sum(axis=1)
+    # take the square root to get the Euclidean distances
+    distances = sqDistances**0.5
+    # indices that would sort distances in ascending order
+    sortedDistIndices = distances.argsort()
+    # dictionary counting how often each class appears
+    classCount = {}
+    for i in range(k):
+        # class label of the i-th nearest neighbour
+        voteIlabel = labels[sortedDistIndices[i]]
+        # dict.get(key, default=None) returns the value for key, or the default if absent
+        # count the votes per class
+        classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
+    # in Python 3, items() replaces Python 2's iteritems()
+    # key=operator.itemgetter(1) sorts by the dictionary values
+    # key=operator.itemgetter(0) would sort by the keys
+    # reverse=True sorts in descending order
+    sortedClassCount = sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)
+    print(sortedClassCount)
+    # return the most frequent class, i.e. the predicted class
+    return sortedClassCount[0][0]
+if __name__ == '__main__':
+    # create the data set
+    group, labels = createDataSet()
+    # print the data set
+    print(group.shape)
+    print(labels)
+    print(classify0([10, 90], group, labels, 2))
\ No newline at end of file
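The classify0 routine in kNN.py builds its distance matrix with np.tile; plain NumPy broadcasting computes the same Euclidean distances. A compact sketch of the whole k-nearest-neighbours vote, using English stand-ins for the genre labels and the same four training points:

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, data, labels, k):
    # Euclidean distances via broadcasting; equivalent to the
    # np.tile construction used in classify0 above.
    dists = np.sqrt(((data - np.asarray(in_x)) ** 2).sum(axis=1))
    # Indices of the k nearest neighbours.
    nearest = dists.argsort()[:k]
    # Majority vote among those neighbours' labels.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

group = np.array([[1, 101], [5, 89], [108, 5], [115, 8]])
labels = ['romance', 'romance', 'action', 'action']
```

For the query point [10, 90] with k=2, the two nearest training points are both romance films, matching classify0's result on the same data.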
Index: 02_kNN/__init__.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- 02_kNN/__init__.py (date 1565488580438)
+++ 02_kNN/__init__.py (date 1565488580438)
@@ -0,0 +1,417 @@
+#!/usr/bin/python3.7
+# -*- coding: utf-8 -*-
+"""
+@Time :2019/7/28 19:59
+
+@Author :Yuki
+
+@FileName  :__init__.py
+
+@E-mail :fujii20180311@foxmail.com
+"""
+# -*- coding: UTF-8 -*-
+from matplotlib.font_manager import FontProperties
+import matplotlib.pyplot as plt
+from math import log
+import operator
+
+"""
+Function: compute the empirical (Shannon) entropy of a data set
+
+Parameters:
+    dataSet - the data set
+Returns:
+    shannonEnt - the empirical (Shannon) entropy
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24
+"""
+def calcShannonEnt(dataSet):
+    numEntires = len(dataSet)                       # number of rows in the data set
+    labelCounts = {}                                # dictionary of occurrence counts per label
+    for featVec in dataSet:                         # tally each feature vector
+        currentLabel = featVec[-1]                  # extract the label
+        if currentLabel not in labelCounts.keys():  # if the label is not in the dictionary yet, add it
+            labelCounts[currentLabel] = 0
+        labelCounts[currentLabel] += 1              # count the label
+    shannonEnt = 0.0                                # empirical (Shannon) entropy
+    for key in labelCounts:                         # compute the Shannon entropy
+        prob = float(labelCounts[key]) / numEntires # probability of this label
+        shannonEnt -= prob * log(prob, 2)           # apply the entropy formula
+    return shannonEnt                               # return the entropy
+
+"""
+Function: create the test data set
+
+Parameters:
+    none
+Returns:
+    dataSet - the data set
+    labels - the feature labels
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-20
+"""
+def createDataSet():
+    dataSet = [[0, 0, 0, 0, 'no'],                  # the data set
+            [0, 0, 0, 1, 'no'],
+            [0, 1, 0, 1, 'yes'],
+            [0, 1, 1, 0, 'yes'],
+            [0, 0, 0, 0, 'no'],
+            [1, 0, 0, 0, 'no'],
+            [1, 0, 0, 1, 'no'],
+            [1, 1, 1, 1, 'yes'],
+            [1, 0, 1, 2, 'yes'],
+            [1, 0, 1, 2, 'yes'],
+            [2, 0, 1, 2, 'yes'],
+            [2, 0, 1, 1, 'yes'],
+            [2, 1, 0, 1, 'yes'],
+            [2, 1, 0, 2, 'yes'],
+            [2, 0, 0, 0, 'no']]
+    labels = ['age', 'has job', 'owns house', 'credit rating']      # feature labels
+    return dataSet, labels                          # return the data set and the feature labels
+
+"""
+Function: split the data set on a given feature
+
+Parameters:
+    dataSet - the data set to split
+    axis - the feature to split on
+    value - the feature value to keep
+Returns:
+    retDataSet - the subset of rows where the feature equals value, with that feature removed
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24
+"""
+def splitDataSet(dataSet, axis, value):
+    retDataSet = []                                 # list for the returned subset
+    for featVec in dataSet:                         # iterate over the data set
+        if featVec[axis] == value:
+            reducedFeatVec = featVec[:axis]         # drop the axis feature
+            reducedFeatVec.extend(featVec[axis+1:]) # append the matching row to the result
+            retDataSet.append(reducedFeatVec)
+    return retDataSet                               # return the split data set
+
+"""
+Function description: choose the best feature to split on
+
+Parameters:
+    dataSet - the dataset
+Returns:
+    bestFeature - index of the feature with the largest information gain
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-20
+"""
+def chooseBestFeatureToSplit(dataSet):
+    numFeatures = len(dataSet[0]) - 1                #number of features
+    baseEntropy = calcShannonEnt(dataSet)            #Shannon entropy of the whole dataset
+    bestInfoGain = 0.0                               #best information gain so far
+    bestFeature = -1                                 #index of the best feature
+    for i in range(numFeatures):                     #iterate over all features
+        #collect the i-th feature value of every example in dataSet
+        featList = [example[i] for example in dataSet]
+        uniqueVals = set(featList)                   #a set holds each distinct value once
+        newEntropy = 0.0                             #empirical conditional entropy
+        for value in uniqueVals:                     #compute the information gain
+            subDataSet = splitDataSet(dataSet, i, value)    #subDataSet is the subset for this value
+            prob = len(subDataSet) / float(len(dataSet))    #probability of the subset
+            newEntropy += prob * calcShannonEnt(subDataSet) #accumulate the conditional entropy
+        infoGain = baseEntropy - newEntropy          #information gain
+        # print("information gain of feature %d: %.3f" % (i, infoGain))   #print each feature's information gain
+        if (infoGain > bestInfoGain):                #keep the largest information gain seen so far
+            bestInfoGain = infoGain                  #update the best information gain
+            bestFeature = i                          #remember the index of the best feature
+    return bestFeature                               #return the index of the feature with the largest gain
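Put together, entropy, splitting, and gain comparison pick a column. This compact re-implementation (illustrative names, same ID3 criterion) runs the selection on the loan dataset from createDataSet; on that data the gains are roughly 0.083, 0.324, 0.420, 0.363, so the chosen index is 2:

```python
from collections import Counter
from math import log

def entropy(rows):
    # empirical entropy of the class label (last column)
    n = len(rows)
    return -sum((c / n) * log(c / n, 2)
                for c in Counter(r[-1] for r in rows).values())

def split(rows, axis, value):
    # rows whose feature `axis` equals `value`, with that column removed
    return [r[:axis] + r[axis + 1:] for r in rows if r[axis] == value]

def best_feature(rows):
    base = entropy(rows)
    def info_gain(i):
        cond = sum(len(sub) / len(rows) * entropy(sub)
                   for sub in (split(rows, i, v) for v in {r[i] for r in rows}))
        return base - cond
    return max(range(len(rows[0]) - 1), key=info_gain)

# the loan-application dataset from createDataSet above
loan = [[0, 0, 0, 0, 'no'],  [0, 0, 0, 1, 'no'],  [0, 1, 0, 1, 'yes'],
        [0, 1, 1, 0, 'yes'], [0, 0, 0, 0, 'no'],  [1, 0, 0, 0, 'no'],
        [1, 0, 0, 1, 'no'],  [1, 1, 1, 1, 'yes'], [1, 0, 1, 2, 'yes'],
        [1, 0, 1, 2, 'yes'], [2, 0, 1, 2, 'yes'], [2, 0, 1, 1, 'yes'],
        [2, 1, 0, 1, 'yes'], [2, 1, 0, 2, 'yes'], [2, 0, 0, 0, 'no']]
best = best_feature(loan)   # 2 — "owns a house" has the largest gain
```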
+
+
+"""
+Function description: find the most frequent element (class label) in classList
+
+Parameters:
+    classList - list of class labels
+Returns:
+    sortedClassCount[0][0] - the most frequent element (class label)
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24
+"""
+def majorityCnt(classList):
+    classCount = {}
+    for vote in classList:                           #count the occurrences of each element in classList
+        if vote not in classCount.keys():
+            classCount[vote] = 0
+        classCount[vote] += 1
+    sortedClassCount = sorted(classCount.items(), key = operator.itemgetter(1), reverse = True)  #sort by dict value, descending
+    return sortedClassCount[0][0]                    #return the most frequent element in classList
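The count-then-sort step has a standard-library shortcut; a minimal sketch (hypothetical name) using `collections.Counter.most_common` in place of the manual dict and `operator.itemgetter` sort:

```python
from collections import Counter

def majority_class(class_list):
    # most frequent label; most_common(1) does the count-and-sort in one call
    return Counter(class_list).most_common(1)[0][0]

winner = majority_class(['yes', 'no', 'yes', 'yes', 'no'])   # 'yes'
```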
+
+"""
+Function description: build the decision tree
+
+Parameters:
+    dataSet - the training dataset
+    labels - the feature labels
+    featLabels - stores the chosen best feature labels
+Returns:
+    myTree - the decision tree
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-25
+"""
+def createTree(dataSet, labels, featLabels):
+    classList = [example[-1] for example in dataSet]     #class labels (grant the loan: yes or no)
+    if classList.count(classList[0]) == len(classList):  #stop splitting when all classes are identical
+        return classList[0]
+    if len(dataSet[0]) == 1:                             #all features used up: return the majority class label
+        return majorityCnt(classList)
+    bestFeat = chooseBestFeatureToSplit(dataSet)         #choose the best feature
+    bestFeatLabel = labels[bestFeat]                     #label of the best feature
+    featLabels.append(bestFeatLabel)
+    myTree = {bestFeatLabel:{}}                          #grow the tree from the best feature's label
+    del(labels[bestFeat])                                #remove the feature label that was just used
+    featValues = [example[bestFeat] for example in dataSet]  #all values of the best feature in the training set
+    uniqueVals = set(featValues)                         #remove duplicate values
+    for value in uniqueVals:                             #recurse over the feature values to build the tree
+        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), labels, featLabels)
+    return myTree
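The whole recursion fits in a short standalone sketch (illustrative names; one assumed deviation from the code above is that it copies the label list instead of mutating it with `del`). Run on the loan dataset, it reproduces the nested-dict tree this tutorial series derives: split on 有自己的房子 (owns a house), then on 有工作 (has a job):

```python
from collections import Counter
from math import log

def entropy(rows):
    n = len(rows)
    return -sum((c / n) * log(c / n, 2)
                for c in Counter(r[-1] for r in rows).values())

def split(rows, axis, value):
    return [r[:axis] + r[axis + 1:] for r in rows if r[axis] == value]

def best_feature(rows):
    base = entropy(rows)
    def info_gain(i):
        cond = sum(len(sub) / len(rows) * entropy(sub)
                   for sub in (split(rows, i, v) for v in {r[i] for r in rows}))
        return base - cond
    return max(range(len(rows[0]) - 1), key=info_gain)

def build_tree(rows, labels):
    classes = [r[-1] for r in rows]
    if classes.count(classes[0]) == len(classes):    # pure node: stop
        return classes[0]
    if len(rows[0]) == 1:                            # features exhausted: majority vote
        return Counter(classes).most_common(1)[0][0]
    i = best_feature(rows)
    rest = labels[:i] + labels[i + 1:]               # copy, so labels is never mutated
    return {labels[i]: {v: build_tree(split(rows, i, v), rest)
                        for v in {r[i] for r in rows}}}

loan = [[0, 0, 0, 0, 'no'],  [0, 0, 0, 1, 'no'],  [0, 1, 0, 1, 'yes'],
        [0, 1, 1, 0, 'yes'], [0, 0, 0, 0, 'no'],  [1, 0, 0, 0, 'no'],
        [1, 0, 0, 1, 'no'],  [1, 1, 1, 1, 'yes'], [1, 0, 1, 2, 'yes'],
        [1, 0, 1, 2, 'yes'], [2, 0, 1, 2, 'yes'], [2, 0, 1, 1, 'yes'],
        [2, 1, 0, 1, 'yes'], [2, 1, 0, 2, 'yes'], [2, 0, 0, 0, 'no']]
tree = build_tree(loan, ['年龄', '有工作', '有自己的房子', '信贷情况'])
# {'有自己的房子': {0: {'有工作': {0: 'no', 1: 'yes'}}, 1: 'yes'}}
```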
+
+"""
+Function description: count the leaf nodes of the decision tree
+
+Parameters:
+    myTree - the decision tree
+Returns:
+    numLeafs - the number of leaf nodes
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24
+"""
+def getNumLeafs(myTree):
+    numLeafs = 0                                     #initialize the leaf count
+    firstStr = next(iter(myTree))                    #in Python 3, myTree.keys() returns a dict_keys view, not a list, so myTree.keys()[0] no longer works; use next(iter(myTree)) or list(myTree.keys())[0]
+    secondDict = myTree[firstStr]                    #get the next-level dict
+    for key in secondDict.keys():
+        if type(secondDict[key]).__name__=='dict':   #if the node is not a dict, it is a leaf
+            numLeafs += getNumLeafs(secondDict[key])
+        else: numLeafs +=1
+    return numLeafs
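The leaf count can be checked against the tree built above. A sketch of the same traversal (hypothetical names; `isinstance` replaces the `type(...).__name__` test), run on a demo tree shaped like the loan tree, which has 3 leaves:

```python
def count_leaves(tree):
    # a subtree that is not a dict is a leaf
    if not isinstance(tree, dict):
        return 1
    root = next(iter(tree))                     # the single decision key at this node
    return sum(count_leaves(child) for child in tree[root].values())

demo = {'house': {0: {'job': {0: 'no', 1: 'yes'}}, 1: 'yes'}}
leaves = count_leaves(demo)   # 3
```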
+
+"""
+Function description: get the depth of the decision tree
+
+Parameters:
+    myTree - the decision tree
+Returns:
+    maxDepth - the depth of the decision tree
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24
+"""
+def getTreeDepth(myTree):
+    maxDepth = 0                                     #initialize the tree depth
+    firstStr = next(iter(myTree))                    #in Python 3, myTree.keys() returns a dict_keys view, not a list, so myTree.keys()[0] no longer works; use next(iter(myTree)) or list(myTree.keys())[0]
+    secondDict = myTree[firstStr]                    #get the next-level dict
+    for key in secondDict.keys():
+        if type(secondDict[key]).__name__=='dict':   #if the node is not a dict, it is a leaf
+            thisDepth = 1 + getTreeDepth(secondDict[key])
+        else: thisDepth = 1
+        if thisDepth > maxDepth: maxDepth = thisDepth #update the depth
+    return maxDepth
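Depth follows the same recursion with `max` instead of `sum`; this sketch (hypothetical names) gives leaves a base depth of 0, which matches the counting above, where each decision node contributes one level. The loan tree has two decision levels:

```python
def tree_depth(tree):
    # a bare leaf contributes 0; each decision node adds one level
    if not isinstance(tree, dict):
        return 0
    root = next(iter(tree))
    return 1 + max(tree_depth(child) for child in tree[root].values())

demo = {'house': {0: {'job': {0: 'no', 1: 'yes'}}, 1: 'yes'}}
depth = tree_depth(demo)   # 2
```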
+
+"""
+Function description: draw a node
+
+Parameters:
+    nodeTxt - text of the node
+    centerPt - position of the text
+    parentPt - position the annotation arrow points to
+    nodeType - style of the node
+Returns:
+    None
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24
+"""
+def plotNode(nodeTxt, centerPt, parentPt, nodeType):
+    arrow_args = dict(arrowstyle="<-")               #arrow style
+    font = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=14)  #font that can render the Chinese labels
+    '''
+    Axes.annotate(s, xy, *args, **kwargs)
+    s: the annotation text
+    xy: the point being annotated, a 2-tuple (x, y)
+    xytext: position of the annotation text, also a 2-tuple; defaults to xy
+    xycoords: coordinate system of the annotated point; accepted values include
+        'figure points'   points from the lower-left corner of the figure
+        'figure pixels'   pixels from the lower-left corner of the figure
+        'figure fraction' fraction of the figure, from its lower-left corner
+        'axes points'     points from the lower-left corner of the axes (a figure can hold several axes; one by default)
+        'axes pixels'     pixels from the lower-left corner of the axes
+        'axes fraction'   fraction of the axes, from its lower-left corner
+        'data'            the data coordinate system of the annotated point xy (default)
+        'polar'           polar coordinates instead of the native data coordinates
+    arrowprops: style of the arrow, a dict; if non-empty, an arrow is drawn
+        from the annotation text to the annotated point. If the 'arrowstyle'
+        key is not set, the following keys are allowed:
+        width        width of the arrow in points
+        headwidth    width of the arrow head in points
+        headlength   length of the arrow head in points
+        shrink       fraction of the total length to shrink from both ends
+        ?            any key of matplotlib.patches.FancyArrowPatch
+    bbox: style of the bounding box around the text
+
+    '''
+    createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',  #draw the node
+                            xytext=centerPt, textcoords='axes fraction',
+                            va="center", ha="center", bbox=nodeType, arrowprops=arrow_args, fontproperties=font)
+
+"""
+Function description: annotate the attribute value on a directed edge
+
+Parameters:
+    cntrPt, parentPt - used to compute the annotation position
+    txtString - the annotation text
+Returns:
+    None
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24
+"""
+def plotMidText(cntrPt, parentPt, txtString):
+    xMid = (parentPt[0]-cntrPt[0])/2.0 + cntrPt[0]   #midpoint between the child and parent nodes
+    yMid = (parentPt[1]-cntrPt[1])/2.0 + cntrPt[1]
+    createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)
+
+"""
+Function description: draw the decision tree
+
+Parameters:
+    myTree - the decision tree (a dict)
+    parentPt - position of the parent node
+    nodeTxt - text of the node
+Returns:
+    None
+Author:
+    Jack Cui
+Blog:
+    http://blog.csdn.net/c406495762
+Modify:
+    2017-07-24