Skip to content

Easter Progress

Throughout the Easter vac I have been working hard at the project to complete all of the implementation that I had not yet done.

Now I have a program that vaguely works to a poor level of accuracy when it comes to differently encoded video.

Currently it can scan a new video with 4 different parameters – the number of buckets used in the histogram, the number of blocks of histograms that will be compared, the percentage difference that a bucket needs to change by for the block histogram to be considered different and the percentage of the histograms that need to be different for there to be a cut perceived.

This uses either all three RGB colour spaces – or just the Y (which decreases the level of accuracy).

After this, the shot information is put into the database and then when a new shot is scanned in then it is compared to previously scanned video and hence compared to other shots.

The program then spits out all of the shots that it believes are similar enough, and then checks the next one to see if there is a chain of shots that are similar in which case they are put together into a clip.

There are a fair few bugs with this application so far but I am trying to iron them out, as well as gather all of my data to generate some decent output.

Things to do

Over the coming weeks I would like to set up full integration with the database. Both with the testing algorithm and the manual input interface.

Do so analysis of spatial locality with respect to the speedup unsafe block that I have been using.

Generate more test data to fill the database with.

Backup!

So my external HDD died yesterday, which lost me a days worth of work (not too bad since I can remember exactly what I did and how), but I did lose all of the video test data which is backed up – but in London so I will have to make a journey down one night to grab it and then bring it back up to work on it!

Still – could have been a lot worse!

Images of my application so far

More speedups

Currently it is taking about 2 seconds (on average 1998 milliseconds) to complete a comparison of two adjacent frames using their histograms with the getPixel() method.

I am going to rewrite the code to access the bitmap data and more quickly come to this result and will note of any speedups.

Comparison for 1 frame : 00:00:01.9610000 1961 Comparison for 1 frame : 00:00:01.9880000 1988 Comparison for 1 frame : 00:00:01.9790000 1979 Comparison for 1 frame : 00:00:01.9770000 1977 Comparison for 1 frame : 00:00:02.0370000 2037 Comparison for 1 frame : 00:00:01.9990000 1999 Comparison for 1 frame : 00:00:01.9720000 1972 Comparison for 1 frame : 00:00:02.0270000 2027 Comparison for 1 frame : 00:00:01.9720000 1972 Comparison for 1 frame : 00:00:02.0410000 2041 Comparison for 1 frame : 00:00:02.0280000 2028

EDIT

Speedup was increased to just 438 ms which is about a 4.6 times speedup!

Comparison for 1 frame : 00:00:00.4230000 Comparison for 1 frame : 00:00:00.4310000 Comparison for 1 frame : 00:00:00.4420000 Comparison for 1 frame : 00:00:00.4460000 Comparison for 1 frame : 00:00:00.4620000 Comparison for 1 frame : 00:00:00.4580000 Comparison for 1 frame : 00:00:00.4550000 Comparison for 1 frame : 00:00:00.4160000 Comparison for 1 frame : 00:00:00.4320000 Comparison for 1 frame : 00:00:00.4300000 Comparison for 1 frame : 00:00:00.4250000

SIFT – Scale-invariant feature transform

SIFT may be appropriate for my project – or at least parts of it may be transferrable.

http://en.wikipedia.org/wiki/Scale-invariant_feature_transform has a lot of very useful information

This technique is used to compare the same object in many different images, which is perhaps something that I could use, however I am looking for almost perfect video matching and so it may not be that useful as it seems very advanced and I am not sure whether I will have enough time to look at implementing this.

For example, I am not really caring that much about scaling and that much geometric distortion since the main motivation is advertisment detection – which shouldn’t be altered, however should this extend to using copyrighted material then people could simply scale the video to prevent this program from working!

So it might be worth looking into this – or at least considering it.

SIFT uses feature vectors and compares them to vectors in a database looking for the smallest euclidean distance away (taken from the image retrieval course) where I could create a basis which is dependent on features of the images/video – e.g. colour information/edges/length/average pixel colour change value etc…

However then we need to look into how we will compare or index this data as if we are looking over several thousand video clips, it might be worth doing a recursive ordered search – especially as there will be a lot of clips that don’t match! – which brings me onto another point – when scanning a video – we need to break it up into shots – should we add these shots to a database or not? How do we add clips/adverts to the database? Is it all done manually? Right now it is, should it be automatic and have a lot of shots that are irrelevant?!

Lowe used a modification of the k-d tree algorithm called the Best-bin-first search method [7] that can identify the nearest neighbors with high probability using only a limited amount of computation

Lowe used a ratio of distance from the closest neighbor to the distance of the second closest to determine if they were matching – but I think that would be inappropriate here – perhaps?! I will have to do some analysis on this as he found that a ratio of greater than 0.8 discarded 90% of false positives and only lost 5% of true negatives.

So now I have to work out how to classify the bins and what characteristics to use for the bins. I am thinking possibly use the length of the individual shots of a clip and see if they match – provided the shot algorithm is accurate across different video streams, another feature could be histogram and edge data, and perhaps number of pixels differing by more than a threashold after each frame change.

http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=790410

1. Object recognition from local scale-invariant features
Lowe, D.G.;
Computer Vision, 1999.

Speeding up pixel accessing and image processing

Using the standard GetPixel function with C#’s Bitmap structures it takes a long time to process individual images which is extremely annoying when trying to quickly scan just a short 30 second clip with approx 900 frames, and each pixel takes a while to get access to it:

This is due to GetPixel locking, then unlocking the bitmap into system memory every time GetPixel is called. The process of locking and unlocking the bitmap itself is what takes the time.

(http://www.improperargument.com/2009/03/speeding-up-getpixel-with-unsafe-code.html)

GetAllPixelsSafe  : 00:00:00.6670000
GetAllPixelsUnsafe: 00:00:00.1490000
GetAllPixelsSafe  : 00:00:00.6710000
GetAllPixelsUnsafe: 00:00:00.1520000
GetAllPixelsSafe  : 00:00:00.6750000
GetAllPixelsUnsafe: 00:00:00.1550000
GetAllPixelsSafe  : 00:00:00.6620000
GetAllPixelsUnsafe: 00:00:00.1540000

I managed to get an average speed up of

4.386131

just by using the unsafe code

GetAllPixelsSafe  : 00:00:00.6670000

GetAllPixelsUnsafe: 00:00:00.1490000

GetAllPixelsSafe  : 00:00:00.6710000

GetAllPixelsUnsafe: 00:00:00.1520000

GetAllPixelsSafe  : 00:00:00.6750000

GetAllPixelsUnsafe: 00:00:00.1550000

GetAllPixelsSafe  : 00:00:00.6620000

GetAllPixelsUnsafe: 00:00:00.1540000

Reducing the average time to get pixel data for a 720×480 image from 669 milliseconds to just 153 milliseconds (on average from my sample above)

This means re-writing all code that accesses image data! YAY! :)

http://www.bobpowell.net/lockingbits.htm helps too!

I also introduced some setpixel calls which increase the speed from an average of 670ms to 296 which is a speed up of over 2.2 times


SetAllPixelsSafe : 00:00:00.6560000 SetAllPixelsUnsafe: 00:00:00.3000000 SetAllPixelsSafe : 00:00:00.6550000 SetAllPixelsUnsafe: 00:00:00.2940000 SetAllPixelsSafe : 00:00:00.6540000 SetAllPixelsUnsafe: 00:00:00.2930000 SetAllPixelsSafe : 00:00:00.6550000 SetAllPixelsUnsafe: 00:00:00.2920000 SetAllPixelsSafe : 00:00:00.6760000 SetAllPixelsUnsafe: 00:00:00.2940000 SetAllPixelsSafe : 00:00:00.6600000 SetAllPixelsUnsafe: 00:00:00.2940000 SetAllPixelsSafe : 00:00:00.6580000 SetAllPixelsUnsafe: 00:00:00.2990000 SetAllPixelsSafe : 00:00:00.6630000 SetAllPixelsUnsafe: 00:00:00.3020000

Length of an advertisement

ISCI Code – for spot – letters & numbers -
ISCI Code: http://en.wikipedia.org/wiki/ISCI |
ISCI codes are usually printed on the video cassette label of a commercial
An ISCI code is usually a set of 8 digits, with the first four being alphabetic, and
the remaining four being numerical, in the format of XXXX1111.

Adverts are characterised with an ISCI code that could perhaps be loaded into my database used for scanning over new TV footage.

Most sources online seem to believe that TV advertisement is typically between 10 and 60 seconds today. Therefore I am going to take the minimum of 10 second clips to be the minimum match for now. I can always extend this for arbitrary length clips – however it makes more sense to define this rather relaxed condition early on to ensure I get a working project!

Database + Search Space

The database need not hold all of the video data, however perhaps just pointers to it.

I also need to determine how the data is going to be stored and searched across – perhaps using some sort of feature vector of the image/clip (length/rgb colour info/edges info stored in some easily comparable manner for a whole clip).

This means that when a clip is added to the database perhaps there can be a CLIPS table and it will need a Name(String), ID(int), Type(String – advert/movie/tv show), the VideoFileLocation(String) where it is in the storage, the TimeIntoVideo(int) of milliseconds or frame number of the start of the clip, the Length(int) of the number of frames or milliseconds the clip is and some sort of reference to duplicate clips – which could be constructed in a (doubly) linked list format whereby there is a NextDuplicate (and PreviousDuplicate) pointer to the ID of the duplicate clip so that it can easily be found.

There could also be a CHARACTERISTICS table which has such information to search over (e.g. length, histogram information (e.g. percentages of red in bucket 1,2,3,4 – which can easily be compared i.e. value is between a certain range), edge information, audio peaks?) with pointers to the IDs of the clips in the clips table.

Edge Detection Techniques

Potentially I could convert the images to a grayscale and then use a binary bitmap to do the edge detection and use that data to characterise the frame and then see if there is a huge difference between this and the next frame…

Alternatively I could also do shot detection in a sort of binary search which would speed things up as opposed to looking at all frames – but it might miss some shot changes!