                                    LECTURE NOTES
                            732A75 ADVANCED DATA MINING
              TDDD41 DATA MINING - CLUSTERING AND ASSOCIATION ANALYSIS
                                    JOSÉ M. PEÑA
                        IDA, LINKÖPING UNIVERSITY, SWEDEN
                                                  1. Correctness of the Apriori algorithm
   The proof of correctness is not unique. You can find one proof in the article by Agrawal and
Srikant available from the course website. Our own alternative proof can be found below.
   We prove by induction on k that the apriori algorithm is correct. That is, we prove the result for
k = 1 and then for k under the assumption that the algorithm is correct up to k-1. Combining these
two facts, we can conclude that the algorithm is correct for any k. First, recall the apriori algorithm.
        Algorithm: apriori(D, minsup)
        Input: A transactional database D and the minimum support minsup.
        Output: All the large itemsets in D.
    1   L_1 = {large 1-itemsets}
    2   for (k = 2; L_{k-1} ≠ ∅; k++) do
    3       C_k = apriori-gen(L_{k-1})          // Generate candidate large k-itemsets
    4       for all t ∈ D do
    5           for all c ∈ C_k such that c ⊆ t do
    6               c.count++
    7       L_k = {c ∈ C_k | c.count ≥ minsup}
    8   return ⋃_k L_k
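As a sanity check on the pseudocode above, here is a minimal Python sketch. It is our own illustration, not the authors' code: transactions are frozensets, minsup is an absolute count, and apriori_gen_simple is a simplified stand-in for the apriori-gen function recalled later.

```python
from itertools import combinations

def apriori(D, minsup):
    # D: list of transactions (frozensets of items); minsup: absolute count.
    # Line 1: L1 = large 1-itemsets.
    items = {i for t in D for i in t}
    L = [{frozenset([i]) for i in items
          if sum(1 for t in D if i in t) >= minsup}]
    # Line 2: loop while L_{k-1} is non-empty.
    while L[-1]:
        Ck = apriori_gen_simple(L[-1])        # line 3: candidate k-itemsets
        count = {c: 0 for c in Ck}
        for t in D:                           # lines 4-6: count candidates
            for c in Ck:
                if c <= t:                    # c is contained in t
                    count[c] += 1
        # Line 7: keep the candidates with minimum support.
        L.append({c for c in Ck if count[c] >= minsup})
    return set().union(*L)                    # line 8

def apriori_gen_simple(Lprev):
    # Simplified stand-in for apriori-gen: join any two large (k-1)-itemsets
    # whose union has size k, then prune unions with a non-large (k-1)-subset.
    k = len(next(iter(Lprev))) + 1
    return {I | J for I in Lprev for J in Lprev
            if len(I | J) == k
            and all(frozenset(s) in Lprev for s in combinations(I | J, k - 1))}
```

For instance, on D = [{a,b}, {a,b,c}, {a,c}, {b,c}] with minsup = 2, all singletons and pairs are large but {a,b,c} (support 1) is not.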
   Trivial case: The algorithm is correct for k = 1 by line 1.
   Induction hypothesis: Assume that the algorithm is correct up to k - 1. We now prove that
the algorithm is correct for k. It suffices to prove that L_k ⊆ C_k in line 3, because lines 4-7 simply
count the frequency of the candidates and, thus, nothing can go wrong there. Recall the apriori-gen
function.
        Algorithm: apriori-gen(L_{k-1})
        Input: The large (k-1)-itemsets L_{k-1}.
        Output: A superset of L_k.
    1   C_k = ∅                                             // Self-join
    2   for all I, J ∈ L_{k-1} do
    3       if I_1 = J_1, ..., I_{k-2} = J_{k-2} and I_{k-1} < J_{k-1} then
    4           C_k = C_k ∪ {I_1, ..., I_{k-1}, J_{k-1}}
    5   for all c ∈ C_k do                                  // Prune
    6       for all (k-1)-subsets s of c do
    7           if s ∉ L_{k-1} then
    8               C_k = C_k ∖ {c}
    9   return C_k
   Let X = {X_1, ..., X_k} ∈ L_k. Every (k-1)-subset of X is large, because support is
anti-monotone, and thus belongs to L_{k-1} by the induction hypothesis. In particular,
I = X ∖ {X_k} and J = X ∖ {X_{k-1}} belong to L_{k-1} and satisfy the condition in line 3, so
X is added to C_k in line 4. Moreover, X survives the pruning in lines 5-8, since all its
(k-1)-subsets are in L_{k-1}. Hence, L_k ⊆ C_k.
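The self-join and pruning steps of apriori-gen can be sketched in Python as follows. The representation choices (itemsets as sorted tuples) are our own, not from the notes.

```python
from itertools import combinations

def apriori_gen(Lprev):
    # Lprev: set of large (k-1)-itemsets, each a sorted tuple of items.
    Ck = set()
    for I in Lprev:                       # self-join, lines 2-4
        for J in Lprev:
            # I_1 = J_1, ..., I_{k-2} = J_{k-2} and I_{k-1} < J_{k-1}
            if I[:-1] == J[:-1] and I[-1] < J[-1]:
                Ck.add(I + (J[-1],))
    # prune, lines 5-8: drop c if some (k-1)-subset of c is not large
    return {c for c in Ck
            if all(s in Lprev for s in combinations(c, len(c) - 1))}
```

For example, from L_2 = {(a,b), (a,c), (b,c)} the join produces (a,b,c), which survives pruning; if (b,c) were missing from L_2, the candidate would be pruned.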
                        2. Correctness of the rule generation algorithm
   First, recall the rule generation algorithm. For every large itemset l_k output by the apriori
algorithm, it is initially called as genrules(l_k, l_k, minconf).
        Algorithm: genrules(l_k, a_m, minconf)
        Input: A large itemset l_k, one of its subsets a_m, and the minimum confidence minconf.
        Output: The rules with minimum confidence of the form a_{m-1} → l_k ∖ a_{m-1} with a_{m-1} ⊂ a_m.
    1   for all (m-1)-subsets a_{m-1} of a_m do
    2       conf = support(l_k) / support(a_{m-1})
    3       if conf ≥ minconf then
    4           output the rule a_{m-1} → l_k ∖ a_{m-1} with confidence conf and support support(l_k)
    5           if m-1 > 1 then call genrules(l_k, a_{m-1}, minconf)
   We prove by contradiction that the rule generation algorithm is correct. Assume to the contrary
that the algorithm missed a rule. Let a_{m-1} → l_k ∖ a_{m-1} denote one of the missing rules with the
largest antecedent. That we wrongly missed the rule implies that l_k has minimum support
and, thus, it is output by the apriori algorithm, since this is correct, as proven in the previous
section. Then, the rule generation algorithm cannot have missed the rule when m = k, because
m = k only when we called genrules(l_k, l_k, minconf), and then the rule is evaluated and output
in lines 1-5.
   Therefore, we must have missed the rule in one of the subsequent calls to genrules, i.e. when
m < k. Let a_m be any m-subset of l_k such that a_{m-1} ⊂ a_m. Since support(a_m) ≤ support(a_{m-1}),
the confidence of a_m → l_k ∖ a_m is at least that of a_{m-1} → l_k ∖ a_{m-1} and, thus, at least
minconf. Since a_{m-1} → l_k ∖ a_{m-1} is a missing rule with the largest antecedent, the rule
a_m → l_k ∖ a_m was not missed, i.e. it was output in line 4 of some call to genrules. But then
line 5 of that call invoked genrules(l_k, a_m, minconf), in which a_{m-1} → l_k ∖ a_{m-1} was
evaluated and output. This contradicts the assumption that the rule was missed.
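The recursive rule generation can be sketched in Python as follows. This is a minimal illustration with hypothetical helper names: support is a precomputed map from itemset to support count, and out collects the (antecedent, consequent) pairs that genrules outputs.

```python
from itertools import combinations

def genrules(lk, am, minconf, support, out):
    # lk: a large itemset (frozenset); am: the current antecedent pool;
    # support: dict mapping frozensets to support counts; out: output list.
    for s in combinations(sorted(am), len(am) - 1):   # line 1: (m-1)-subsets
        a = frozenset(s)
        conf = support[lk] / support[a]               # line 2
        if conf >= minconf:                           # line 3
            out.append((a, lk - a))                   # line 4: output the rule
            if len(a) > 1:                            # line 5: recurse
                genrules(lk, a, minconf, support, out)
    return out
```

The initial call is genrules(lk, lk, minconf, support, []); note that line 5 recurses only on antecedents whose rule reached minimum confidence, which is exactly what the proof above exploits via the anti-monotonicity of support.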