Review Board 1.0.5.1

Massive fullindex speedup

Updated 5 months, 3 weeks ago

David Wolever Reviewers
Basie-Dev
None Basie SVN
Speed up fullindex by committing all the changes at once.

A fullindex now runs in ~1 minute on my laptop.
The testrunner is… Still running.
Review request changed
Updated 6 months, 2 weeks ago (February 20th, 2010, 8:27 a.m.)
  • 
    	  

    The testrunner is… Still running.
Review request changed
Updated 6 months, 2 weeks ago (February 20th, 2010, 8:28 a.m.)
  • Speed up fullindex by committing all the changes at once.

    Speed up fullindex by committing all the changes at once.
    
    A fullindex now runs in ~1 minute on my laptop.
Posted 6 months, 2 weeks ago (February 20th, 2010, 4:39 p.m.)

   

  
/trunk/apps/search/document.py (Diff revision 2)
 
Your comment here is redundant. You should instead use intention-revealing variable names such as: should_commit_writer. Boolean variables named my_* are never a good idea.
  1. What makes it redundant? If I changed the variable name to 'should_commit_writer', I'd need to explain why we should only commit the writer if we instantiate it.
    
    Also, what's bad about my_* variables? Not that I necessarily disagree with you, I've just never thought about it.
/trunk/apps/search/document.py (Diff revision 2)
 
Same as above.
Nit: trailing space.
  1. Fixed in the local version - I'll update the diff after the above is resolved.
Posted 6 months, 2 weeks ago (February 22nd, 2010, 10:15 p.m.)

   

  
/trunk/apps/search/document.py (Diff revision 2)
 
By redundant I mean that the code says what you've said in the comments in fewer words. 

I.e. 
"if my_writer: writer.commit()" == "If we own this 'writer', then commit it" 

You've just rewritten the entire conditional block in a comment :)

The second part "Assume someone else will be responsible for committing" is implied by the code. That part would be more useful if you said: "Class X is responsible for committing the writer".

The problem with your variable name is that it seems to exist solely for the purpose of indicating that this class should commit the writer but yet the name does not indicate the purpose of the variable. This is why you felt the need to add a comment to clarify it. The name should "fully and accurately describe what the variable represents" (McConnell, Code Complete 2).

What does a name like "my_anything" tell us about a boolean? What does it mean for a local boolean variable to be "mine"? Okay, perhaps it tells us that "anything" belongs to this class or scope or something. But why do we need to know that? Because the purpose of the my_anything variable is ambiguous, you'd need a comment to explain its existence. Meaningful variable names replace the need for excessive commenting.

However, this code is simple enough that you could have gotten away with that variable name. But the 2 extra lines of comments, especially the first line are not needed.
Review request changed
Updated 6 months, 2 weeks ago (February 24th, 2010, 2:47 p.m.)
Fixing some tests, updating variable names.
Posted 6 months, 1 week ago (March 2nd, 2010, 11:12 a.m.)
The code looks good. Today I have to update the server's whoosh index on my computer so I will test it and give you the ship it after that.
Ship it!
Posted 6 months, 1 week ago (March 2nd, 2010, 11:38 a.m.)
Flushing...: 1558/5193 [========================                                                          ] ETA:  00:01:23
Flushed run: [(StructFile(None), 149772)]
Flushing...: 3012/5193 [===============================================                                   ] ETA:  00:00:41
Flushed run: [(StructFile(None), 149772), (StructFile(None), 142365)]
Flushing...: 4570/5193 [========================================================================          ] ETA:  00:00:11
Flushed run: [(StructFile(None), 149772), (StructFile(None), 142365), (StructFile(None), 144381)]
MailMessage: 5193/5193 [==================================================================================] Time: 00:01:40
WikiPage: 4/4 [===========================================================================================] Time: 00:00:01
Flushing...ation: 347/354 [=============================================================================  ] ETA:  00:00:01
Flushed run: [(StructFile(None), 149772), (StructFile(None), 142365), (StructFile(None), 144381), (StructFile(None), 135231)]
ChatConversation: 354/354 [===============================================================================] Time: 00:00:56
Ticket: 294/294 [=========================================================================================] Time: 00:00:11
Writing index...
Flushed run: [(StructFile(None), 149772), (StructFile(None), 142365), (StructFile(None), 144381), (StructFile(None), 135231), (StructFile(None), 14340)]

real	4m2.267s
user	1m38.222s
sys	0m4.516s
A \n is necessary here.