How to do Version Control
Written by Flavio Adalberto Kubota

This week I've been studying how to store data in the database, because I am concerned about the trade-off between performance and disk space. I have discussed this with my mentor.
One option is to store the whole content of each version. I checked another piece of software that uses a version control system: all versions are saved in the database, but the data is compressed. I did some tests with the compression functions of PHP's zlib extension. I tested three methods, and all the results were similar: compression ratios of roughly 50% to 60%. If an article is 30 KB and has about 100 revisions, that is 3000 KB of data for a single piece of content without compression. I don't know how frequently an article is modified, but on a large site the database size may grow fast. On the other hand, when all the data is saved, retrieving any version is very fast, although queries on a large database are slower.
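To get a feel for the "store every version compressed" option, the snippet below compresses one full version and reports the ratio. It is a minimal sketch, assuming Python's `zlib` module as a stand-in for PHP's zlib functions (both wrap the same zlib library); the function names are hypothetical. Here "ratio" means compressed size divided by original size, and the repeated sample text compresses far better than a real article, so it will not reproduce the 50% to 60% measured above.

```python
import zlib


def store_version(text: str, level: int = 6) -> bytes:
    """Compress the full text of one version before it is written to the database."""
    return zlib.compress(text.encode("utf-8"), level)


def load_version(blob: bytes) -> str:
    """Decompress a stored version back into text."""
    return zlib.decompress(blob).decode("utf-8")


def compression_ratio(text: str, level: int = 6) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0  # nothing to compress; avoid dividing by zero
    return len(zlib.compress(raw, level)) / len(raw)


if __name__ == "__main__":
    # Hypothetical sample: highly repetitive, so it compresses much better than real prose.
    article = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 500
    blob = store_version(article)
    assert load_version(blob) == article
    print(f"original: {len(article.encode('utf-8'))} bytes, stored: {len(blob)} bytes")
    print(f"ratio: {compression_ratio(article):.1%}")
```

Every version is still a full copy, so retrieval is a single decompression; the cost is that total size grows with every revision.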
So we thought about other alternatives, and we agreed to save only the changes between versions.
I have studied storing only the changes. I implemented an algorithm that applies a diff file, to test its performance. Applying many diffs to large contents may be slow. I will try to improve the algorithm, but I can't improve it asymptotically, since every intermediate diff still has to be applied. I have looked at some standard diff formats, and I think the normal diff format may solve the problem.

Regarding database size, in the worst case storing only the changes takes roughly as much space as storing all the data, but I think the worst case is unlikely in practice. A disadvantage of this method is the processing needed to get a specific version. If we are at revision 100 and want to get revision 1, we have to apply 99 diffs to the content, since each diff is relative to the next version. To improve performance, the full content of the published version and of the latest version is also stored in the database.
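The scheme described above (latest version stored in full, each older revision stored as a diff relative to the next one, in the normal diff format) can be sketched as follows. This is a minimal illustration, not the author's implementation: it uses Python's `difflib.SequenceMatcher` to find the changed blocks, and `make_normal_diff`, `apply_normal_diff` and `ArticleHistory` are hypothetical names. Rebuilding revision `rev` out of `n` applies `n - rev` diffs, which is the linear cost mentioned above.

```python
import difflib
import re
from typing import List

# Normal-format hunk header, e.g. "2c2", "4a5,6", "3,4d2"
HUNK = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def _rng(start: int, end: int) -> str:
    return str(start) if start == end else f"{start},{end}"


def make_normal_diff(old: List[str], new: List[str]) -> List[str]:
    """Return normal-format diff lines that turn `old` into `new`."""
    out = []
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            out.append(f"{i1}a{_rng(j1 + 1, j2)}")
        elif tag == "delete":
            out.append(f"{_rng(i1 + 1, i2)}d{j1}")
        else:  # replace
            out.append(f"{_rng(i1 + 1, i2)}c{_rng(j1 + 1, j2)}")
        out.extend("< " + line for line in old[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in new[j1:j2])
    return out


def apply_normal_diff(old: List[str], diff: List[str]) -> List[str]:
    """Apply a normal-format diff to `old` in one forward pass."""
    result: List[str] = []
    pos = 0  # number of old lines already consumed
    k = 0
    while k < len(diff):
        m = HUNK.match(diff[k])
        if not m:
            raise ValueError(f"bad hunk header: {diff[k]!r}")
        l1, cmd = int(m.group(1)), m.group(3)
        l2 = int(m.group(2) or l1)
        k += 1
        if cmd == "a":
            result.extend(old[pos:l1])  # copy old lines up to and including l1
            pos = l1
        else:
            result.extend(old[pos:l1 - 1])  # copy old lines before the block
            pos = l2  # skip the deleted or changed old lines
        # Skip "< " lines and "---"; keep the "> " lines as new content.
        while k < len(diff) and not HUNK.match(diff[k]):
            if diff[k].startswith("> "):
                result.append(diff[k][2:])
            k += 1
    result.extend(old[pos:])
    return result


class ArticleHistory:
    """Latest version kept in full; older versions kept as reverse diffs."""

    def __init__(self, first: str):
        self.latest = first.splitlines()
        self.reverse_diffs: List[List[str]] = []  # [i] turns rev i+2 into rev i+1

    def commit(self, text: str) -> None:
        new = text.splitlines()
        self.reverse_diffs.append(make_normal_diff(new, self.latest))
        self.latest = new

    def count(self) -> int:
        return len(self.reverse_diffs) + 1

    def get(self, rev: int) -> str:
        """Rebuild revision `rev` (1-based) by walking back from the latest."""
        if not 1 <= rev <= self.count():
            raise IndexError(rev)
        lines = self.latest
        for diff in reversed(self.reverse_diffs[rev - 1:]):
            lines = apply_normal_diff(lines, diff)
        return "\n".join(lines)
```

For example, `make_normal_diff(["a", "b", "c", "d"], ["a", "x", "c", "d", "e"])` yields the hunks `2c2` (line 2 changed) and `4a5` (a line appended after line 4). A real implementation would also keep the published version in full, as described above, so that the most common reads need no diffs at all.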

Another feature I have been thinking about is locking an article and showing "who is editing". These features are only possible with AJAX, and I'm thinking about them now in order to plan the database model. Also with AJAX, I am planning autosave support, which saves periodically but generates just one revision.
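One way the locking, "who is editing" and autosave ideas could fit together is sketched below, kept in memory rather than in database tables. Everything here is an assumption for illustration: the `Article` and `EditLock` classes, the `LOCK_TIMEOUT` value, and the idea that each AJAX autosave also refreshes the lock like a heartbeat. The point it shows is that autosave overwrites a single draft, and only an explicit save turns that draft into one new revision.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

LOCK_TIMEOUT = 60.0  # hypothetical: seconds without an AJAX request before a lock expires


@dataclass
class EditLock:
    user: str
    last_seen: float


@dataclass
class Article:
    revisions: List[str] = field(default_factory=list)  # committed revisions only
    draft: Optional[str] = None  # autosaved text, overwritten on every autosave
    lock: Optional[EditLock] = None

    def _live_lock(self, now: float) -> Optional[EditLock]:
        if self.lock and now - self.lock.last_seen < LOCK_TIMEOUT:
            return self.lock
        return None

    def acquire(self, user: str, now: Optional[float] = None) -> bool:
        """Take or refresh the edit lock; fail if another user holds a live lock."""
        now = time.time() if now is None else now
        live = self._live_lock(now)
        if live and live.user != user:
            return False
        self.lock = EditLock(user, now)
        return True

    def who_is_editing(self, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        live = self._live_lock(now)
        return live.user if live else None

    def autosave(self, user: str, text: str, now: Optional[float] = None) -> None:
        """Periodic AJAX save: overwrite the draft instead of creating a revision."""
        if not self.acquire(user, now):
            raise PermissionError("article is locked by another user")
        self.draft = text

    def save(self, user: str, now: Optional[float] = None) -> None:
        """Explicit save: turn the current draft into exactly one revision, then unlock."""
        if not self.acquire(user, now):
            raise PermissionError("article is locked by another user")
        if self.draft is not None:
            self.revisions.append(self.draft)
            self.draft = None
        self.lock = None
```

In a database, the lock (user, last-seen time) and the draft could each be a single row per article, updated by the AJAX requests, so frequent autosaves never multiply the revision history.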
 

Well, I think planning is the most difficult and the most important part; many things depend on how it was planned.

 


1 Comment

  1. Check this blog post out - it is about the new Barracuda goodies for InnoDB storage by Oracle.

    Basically it has some really impressive (and surprising) effects on LOB storage.
