[llvm tbaa] TBAA analysis error in the case of a forced type conversion

I found a bug while using clang. After tracing it, I am very confused by the implementation of TBAA.

The original C++ code is as follows:

#include <stdio.h>                                                              
#include <stdlib.h>                                                             
#include <iostream>                                                             
#define DATA_NUM 100                                                            
                                                                                
int main () {                                                                   
  uint16_t buf[DATA_NUM] = {0};                                                 
  *((uint32_t*)buf) = 666;                                                      
  for (int i = 0; i < DATA_NUM; i++) {                                          
    printf("%d ", buf[i]);                                                      
  }                                                                             
  printf("\n");                                                                 
  return 0;                                                                     
} 

The expected output of this code should be:

 666 0 0 0 ...... 

This is indeed the output at -O0/-O1.
At -O2/-O3, the output changes to:

 0 0 0 0 ...... 

The store was deleted by the DSE pass! Here is the MemorySSA analysis result:

define dso_local i32 @main() local_unnamed_addr #4 {                            
entry:                                                                          
  %buf = alloca [100 x i16], align 16                                           
  %0 = bitcast [100 x i16]* %buf to i8*                                         
; 1 = MemoryDef(liveOnEntry)                                                    
  call void @llvm.lifetime.start.p0i8(i64 200, i8* nonnull %0) #8               
; 2 = MemoryDef(1)                                                              
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 16 dereferenceable(200) %0, i8 0, i64 200, i1 false)
  %1 = bitcast [100 x i16]* %buf to i32*                                        
; 3 = MemoryDef(2)                                                              
  store i32 666, i32* %1, align 16, !tbaa !3                                    
  br label %for.body                                                            
                                                                                
for.cond.cleanup:                                 ; preds = %for.body           
; 4 = MemoryDef(6)                                                              
  %putchar = tail call i32 @putchar(i32 10)                                     
; 5 = MemoryDef(4)                                                              
  call void @llvm.lifetime.end.p0i8(i64 200, i8* nonnull %0) #8                 
  ret i32 0                                                                     
                                                                                
for.body:                                         ; preds = %entry, %for.body   
; 7 = MemoryPhi({entry,3},{for.body,6})                                         
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]          
  %arrayidx = getelementptr inbounds [100 x i16], [100 x i16]* %buf, i64 0, i64 %indvars.iv
; MemoryUse(2) MayAlias                                                         
  %2 = load i16, i16* %arrayidx, align 2, !tbaa !7                              
  %conv = zext i16 %2 to i32                                                    
; 6 = MemoryDef(7)                                                              
  %call = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %conv)
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1                             
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100                             
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !9  
}  

In MemorySSA, the load is “MemoryUse(2)” rather than “MemoryUse(3)”, which causes the wrong elimination of the store. This is because TBAA says that the load tagged “!tbaa !7” must not alias the store tagged “!tbaa !3”. The program’s TBAA tags are as follows:

!llvm.module.flags = !{!0, !1}                                                  
!llvm.ident = !{!2}                                                             
                                                                                
!0 = !{i32 1, !"wchar_size", i32 4}                                             
!1 = !{i32 7, !"uwtable", i32 1}                                                
!2 = !{!"clang version 14.0.0 (https://github.com/llvm/llvm-project.git cf78715cae7244406334242199d0ff031248543d)"}
!3 = !{!4, !4, i64 0}                                                           
!4 = !{!"int", !5, i64 0}                                                       
!5 = !{!"omnipotent char", !6, i64 0}                                           
!6 = !{!"Simple C++ TBAA"}                                                      
!7 = !{!8, !8, i64 0}                                                           
!8 = !{!"short", !5, i64 0}                                                     
!9 = distinct !{!9, !10}                                                        
!10 = !{!"llvm.loop.mustprogress"}

It is obvious that the accesses tagged “!tbaa !7” and “!tbaa !3” alias. So I started reading the TBAA implementation. This is how I think the key function “matchAccessTags” works:

  1. If tags A and B are the same, they are allowed to overlap.
  2. If either tag has no information, they are allowed to overlap.
  3. Find the common ancestor in the tag tree; if none is found, they are conservatively allowed to overlap.
  4. Check whether one tag is a subobject of the other; if so, check the offsets to decide whether they alias.
  5. Otherwise, the two tags do not alias.
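
To make my reading of these steps concrete, here is a minimal sketch of them applied to the scalar type tree from the metadata above. It is a simplified model under my own assumptions, not LLVM's actual code: `TypeNode`, `rootOf`, `isAncestorOrSelf` and `mayAlias` are hypothetical names, struct-path offsets are ignored, and every access is treated as a scalar at offset 0.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a scalar TBAA type node.
struct TypeNode {
  const char *name;
  const TypeNode *parent;  // nullptr for the root of the tree
};

// The tree from the metadata above: !6 <- !5 <- {!4, !8}.
const TypeNode kRoot{"Simple C++ TBAA", nullptr};
const TypeNode kChar{"omnipotent char", &kRoot};
const TypeNode kInt{"int", &kChar};
const TypeNode kShort{"short", &kChar};

// Walks the parent links up to the root of the tree.
const TypeNode *rootOf(const TypeNode *t) {
  while (t->parent) t = t->parent;
  return t;
}

// True if `anc` is `t` itself or one of its ancestors.
bool isAncestorOrSelf(const TypeNode *anc, const TypeNode *t) {
  for (; t; t = t->parent)
    if (t == anc) return true;
  return false;
}

// Steps 1-5 as I described them, restricted to scalar tags (assumption).
bool mayAlias(const TypeNode *a, const TypeNode *b) {
  if (!a || !b) return true;                // step 2: missing information
  if (a == b) return true;                  // step 1: identical tags
  if (rootOf(a) != rootOf(b)) return true;  // step 3: no common ancestor
  if (isAncestorOrSelf(a, b) || isAncestorOrSelf(b, a))
    return true;                            // step 4: one contains the other
  return false;                             // step 5: no alias
}
```

In this model, `mayAlias(&kInt, &kShort)` returns false because “int” and “short” are siblings under “omnipotent char”, while `mayAlias(&kInt, &kChar)` returns true. That matches the MemorySSA result shown earlier.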

So here is my question: if two tags are deeply nested in structs under the same common type, should TBAA always report no alias between them? Where did I get it wrong?

FYI, gcc-7.5.0 also fails in this case.
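
For comparison, here is a minimal sketch of the same 32-bit write done with memcpy instead of the pointer cast. As far as I understand, a byte-wise copy like this is not given an “int” TBAA tag, so the store should not be removable on those grounds. `store_u32` and `print_buf` are hypothetical helper names that are not in my original program, and which element receives 666 depends on byte order (buf[0] on a little-endian target).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

constexpr int kDataNum = 100;  // same size as DATA_NUM in the original code

// Hypothetical helper: overwrites the first four bytes of a uint16_t buffer
// with a 32-bit value by copying bytes, so no uint16_t element is ever
// accessed through a uint32_t lvalue.
void store_u32(std::uint16_t *buf, std::uint32_t value) {
  static_assert(sizeof(std::uint32_t) == 2 * sizeof(std::uint16_t),
                "expects a 32-bit value to cover exactly two elements");
  std::memcpy(buf, &value, sizeof(value));
}

// Prints the buffer the same way as the loop in the original code.
void print_buf(const std::uint16_t *buf, int n) {
  for (int i = 0; i < n; i++) {
    std::printf("%d ", buf[i]);
  }
  std::printf("\n");
}
```

Usage would be to declare `std::uint16_t buf[kDataNum] = {0};`, call `store_u32(buf, 666);` and then `print_buf(buf, kDataNum);` in place of the cast and loop in main.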