Description of the Nullsoft Video (NSV) Format
by Mike Melanson (melanson@pcisys.net)
v1.0: May 19, 2003

// some additions by Vitalijus Slavinskas, 2003.01.20
// 2003.01.23 finally understood toc structure //01.24 maybe not :D

  Copyright (c) 2003 Mike Melanson
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  A copy of the license is included in the section entitled "GNU
  Free Documentation License".


Contents
--------
 * Introduction
 * File Format
 * References
 * Acknowledgements
 * Changelog
 * GNU Free Documentation License


Introduction
------------
Nullsoft, the entity behind the ubiquitous Winamp MP3 / general-purpose
multimedia player, offers a multimedia container format designed with
network streaming in mind. The format is called Nullsoft Video and bears
the file extension '.nsv'.


File Format
-----------
All multi-byte numbers are in little endian format.

An NSV file has the following overall structure:

  'NSVf' optional metadata chunk
  'NSVs' audio/video data chunk
  [NSVs chunk]
    ..
  [NSVs chunk]

NSV files may start with an optional info and index chunk that is marked
by the characters 'NSVf'. The chunk has the following layout:

  bytes 0-3    'NSVf' signature
  bytes 4-7    size of chunk, including signature and size fields
  bytes 8-11   total size of file
  bytes 12-15  playtime in msec of file
  bytes 16-19  length of info strings
  bytes 20-23  number of table entries // number of nsvs*2+1
  bytes 24-27  number of table entries //number of nsvs
    [arbitrary length info string]
      ..
    [arbitrary length info string]
    [data table]
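
As a rough illustration of the layout above, the fixed 28-byte part of the
NSVf chunk can be unpacked as six little-endian 32-bit numbers following the
signature. This is only a sketch: the names NSVfHeader, parse_nsvf_header,
table_count_a and table_count_b are made up for this example and are not
taken from any Nullsoft code.

```python
import struct
from typing import NamedTuple


class NSVfHeader(NamedTuple):
    chunk_size: int     # bytes 4-7, includes signature and size fields
    file_size: int      # bytes 8-11
    playtime_ms: int    # bytes 12-15
    info_len: int       # bytes 16-19, length of info strings
    table_count_a: int  # bytes 20-23, first table entry count
    table_count_b: int  # bytes 24-27, second table entry count


def parse_nsvf_header(data: bytes) -> NSVfHeader:
    """Parse the first 28 bytes of an NSVf chunk."""
    if len(data) < 28 or data[:4] != b"NSVf":
        raise ValueError("not an NSVf chunk")
    # Six unsigned 32-bit little-endian fields start right after the signature.
    return NSVfHeader(*struct.unpack_from("<6I", data, 4))
```

The info strings would then start at offset 28 and run for info_len bytes.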

Following the first 28 bytes of the info block is any number of
arbitrary length info strings. The strings take the format of:

  STRING=`value`

Each value is delimited by a pair of backtick (`) characters, a.k.a. ASCII
0x60. Known strings include TITLE and ASPECT for the NSV file's title
and aspect ratio, respectively. Examples:

  TITLE=`deer video`
  ASPECT=`1.125`

// IMHO only the values are delimited, and only with 0x01 (or however it is written in C);
// tags are separated by a space (0x20)
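
A minimal sketch of pulling the NAME=`value` pairs out of the info string
area, assuming backtick-delimited values as in the examples above. The
function name parse_info_strings is hypothetical, and files that use a
different delimiter (see the note above) would need a different pattern.

```python
import re

# NAME=`value`, where the value may contain anything except a backtick.
INFO_RE = re.compile(r"([A-Za-z0-9_]+)=`([^`]*)`")


def parse_info_strings(raw: bytes) -> dict:
    """Return a dict of info strings found in the raw info area."""
    # latin-1 maps every byte to a character, so decoding never fails.
    text = raw.decode("latin-1")
    return {name: value for name, value in INFO_RE.findall(text)}
```

For example, parse_info_strings(b"TITLE=`deer video` ASPECT=`1.125`") yields
a dict with the keys TITLE and ASPECT.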

Following the metadata strings is a table of incrementing 32-bit
numbers. The number of entries in this table is specified in the NSVf
chunk header, apparently twice. The meaning of these numbers is unclear.

// these are offsets into the data stream; they point to 'NSVs' and may mark keyframes...
// later comes something called 'TOC2'
// and after that
// an array with the frame number for every NSVs
// this is valid for Winamp-generated seek information, but encoding tools write *something* else :)

The meat of an NSV file (encoded audio and video chunks) is stored in a
series of NSVs data chunks. Each NSVs chunk can contain multiple video
and/or audio chunks. An NSVs chunk has the following header:

  bytes 0-3    'NSVs' signature
  bytes 4-7    video codec fourcc
  bytes 8-11   audio codec fourcc
  bytes 12-13  video width, divisible by 16
  bytes 14-15  video height, divisible by 16
  byte 16      framerate
     bit 7     1 = lower 7 bits indicate a standard fractional framerate
               0 = lower 7 bits indicate an absolute framerate
     bits 6-0  framerate
  bytes 17-18  unknown

If a file does not have audio or video, the corresponding codec fourcc
will be 'NONE'. Common video fourccs are 'VP31' and 'VP3 ' which
indicate On2 VP3 video. Common audio fourccs are 'MP3 ' for MPEG layer
III audio and 'PCM ' for raw PCM audio.
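
The 19-byte NSVs header described above could be read along these lines.
This is a sketch only; NSVsHeader and parse_nsvs_header are names invented
for the example, and the two unknown bytes are passed through untouched.

```python
import struct
from typing import NamedTuple, Optional

NSVS_HEADER_SIZE = 19  # bytes 0-18 as listed above


class NSVsHeader(NamedTuple):
    video_fourcc: Optional[str]  # None when the file has no video
    audio_fourcc: Optional[str]  # None when the file has no audio
    width: int
    height: int
    framerate_byte: int          # raw byte 16, not yet decoded
    unknown: bytes               # bytes 17-18


def parse_nsvs_header(data: bytes) -> NSVsHeader:
    """Parse the fixed header at the start of an NSVs chunk."""
    if len(data) < NSVS_HEADER_SIZE or data[:4] != b"NSVs":
        raise ValueError("not an NSVs chunk")
    vfcc, afcc, width, height, fr = struct.unpack_from("<4s4sHHB", data, 4)

    def fourcc(raw: bytes) -> Optional[str]:
        # 'NONE' marks a missing stream.
        return None if raw == b"NONE" else raw.decode("latin-1")

    return NSVsHeader(fourcc(vfcc), fourcc(afcc), width, height, fr,
                      data[17:19])
```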

The MSB of byte 16 appears to indicate that the lower 7 bits represent a
standard fractional framerate. For example, 0x81 equates to 29.97 fps,
0x85 equates to 14.98 fps, while 0x0F simply represents 15 fps.


PASCAL EXAMPLE:		// I didn't know about standard fractional framerates and such, so maybe this makes it clearer

fr  :byte;     // framerate byte (byte 16 of the NSVs header)
dal :longint;  // base frame time in units of 1/90 msec
ft  :longint;  // frame time in msec
           if fr>$7F then         // bit 7 set: fractional framerate
           begin
               case (fr and 3) of
                  0:dal:=3000;   // 30.00 base
                  1:dal:=3003;   // 29.97
                  2:dal:=3600;   // 25.00
                  3:dal:=3753;   // 23.98
               end;
               if fr<$C0 then
                  ft := (dal * ((fr xor $80) shr 2 +1) ) div 90
               else
                  ft := dal div (90 * ((fr xor $C0) shr 2 +1) );
           end
           else
               ft:=1000 div fr;   // absolute framerate; fr must be nonzero
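
The same rule can be written to return frames per second instead of a frame
time. The sketch below is a transcription of the Pascal example above, not
code from Nullsoft; the name nsv_framerate is made up, and it treats bit 7
as the fractional flag as described in the header layout.

```python
# Base frame times in units of 1/90 msec, taken from the Pascal example:
# 30.00, 29.97, 25.00 and 23.98 fps respectively.
_BASE_DAL = (3000, 3003, 3600, 3753)


def nsv_framerate(fr: int) -> float:
    """Decode the NSVs framerate byte into frames per second."""
    if not 0 <= fr <= 0xFF:
        raise ValueError("framerate must fit in one byte")
    if fr & 0x80 == 0:
        # Bit 7 clear: the byte is the framerate itself.
        return float(fr)
    base_fps = 90000.0 / _BASE_DAL[fr & 3]
    d = (fr & 0x7F) >> 2
    if d < 16:
        # fr below 0xC0: the base rate is divided by d + 1.
        return base_fps / (d + 1)
    # fr of 0xC0 or above: the base rate is multiplied instead.
    return base_fps * (d - 15)
```

With this, 0x81 comes out near 29.97 fps, 0x85 near 14.98 fps and 0x0F as
exactly 15 fps, matching the examples in the text.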


After the NSVs header are 5 bytes which provide the following length
information:

  v? vv vv aa aa

The lower nibble of byte 0 is unknown. The upper nibble of byte 0, along
with bytes 1 and 2, comprises the length of the video data in bytes. Since
5 hex digits (20 bits) describe the length, the maximum video chunk size
is 2^20 - 1 bytes, just under 1 megabyte. Bytes 3-4 are the 16-bit length
of the audio chunk. Consider this example:

  80 B7 00 D1 00

The first 3 bytes, 80 B7 00, read as a little endian number, give
0x00B780. Then the number is shifted right by 4 to give a video chunk
length of 0xB78 bytes. The audio chunk length bytes are D1 00, or 0x00D1
in little endian.
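
The arithmetic of this example can be checked with a few lines of code. The
helper name decode_chunk_lengths is an assumption made for this sketch; it
also returns the unknown low nibble so that nothing is thrown away.

```python
def decode_chunk_lengths(b: bytes) -> tuple:
    """Decode the 5 length bytes into (video_len, audio_len, low_nibble)."""
    if len(b) < 5:
        raise ValueError("need 5 length bytes")
    # Bytes 0-2 as a 24-bit little-endian number: 80 B7 00 -> 0x00B780.
    packed = int.from_bytes(b[0:3], "little")
    video_len = packed >> 4       # upper 20 bits
    low_nibble = packed & 0x0F    # lower nibble of byte 0, meaning unknown
    audio_len = int.from_bytes(b[3:5], "little")
    return video_len, audio_len, low_nibble
```

Calling decode_chunk_lengths(bytes.fromhex("80B700D100")) gives a video
length of 0xB78 and an audio length of 0xD1, as in the example above.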

After the first video/audio chunk pair in an NSV file, there will be a
BEEF marker before the next pair. That is, the hex number 0xBEEF encoded
in little endian (EF BE). After the marker is another 5 bytes encoding
the video and audio chunk lengths as described above, followed by
another frame of video and audio data. This BEEF-length-data pattern
continues until the end of the NSVs chunk.
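
A sketch of walking this length-data, BEEF-length-data pattern follows. It
assumes the caller passes the bytes that come right after the 19-byte NSVs
header, and it simply stops at the first position that is not a BEEF marker
or when the data runs out. The name iter_frames is hypothetical.

```python
def iter_frames(payload: bytes):
    """Yield (video, audio) byte pairs from the data after an NSVs header."""
    pos = 0
    first = True
    while True:
        if not first:
            # Every pair after the first is preceded by EF BE.
            if payload[pos:pos + 2] != b"\xEF\xBE":
                break
            pos += 2
        if pos + 5 > len(payload):
            break
        # 20-bit video length (upper bits of bytes 0-2), 16-bit audio length.
        video_len = int.from_bytes(payload[pos:pos + 3], "little") >> 4
        audio_len = int.from_bytes(payload[pos + 3:pos + 5], "little")
        pos += 5
        end = pos + video_len + audio_len
        if end > len(payload):
            break  # truncated frame
        yield payload[pos:pos + video_len], payload[pos + video_len:end]
        pos = end
        first = False
```

A real demuxer would also have to resynchronize on the next 'NSVs'
signature; that is left out here.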

A small note on PCM audio: If the audio data is encoded with fourcc
'PCM ', each audio data chunk will contain the following 4-byte header:

  byte 0      unknown
  byte 1      number of channels
  bytes 2-3   sample rate
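
The 4-byte PCM header could be split off an audio chunk as sketched below.
The name parse_pcm_header is invented for this example, and the sample rate
is assumed to be little endian like every other multi-byte number in the
format.

```python
import struct


def parse_pcm_header(chunk: bytes) -> tuple:
    """Return (unknown, channels, sample_rate, pcm_data) for a 'PCM ' chunk."""
    if len(chunk) < 4:
        raise ValueError("PCM audio chunk shorter than its 4-byte header")
    # byte 0 unknown, byte 1 channels, bytes 2-3 sample rate (little endian)
    unknown, channels, sample_rate = struct.unpack_from("<BBH", chunk, 0)
    return unknown, channels, sample_rate, chunk[4:]
```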

// info about subtitles should be added here
// http://forums.winamp.com/showthread.php?s=a1811303139d1c1301779730986ada93&threadid=158106&highlight=subt


References
----------
NSV website, home to samples and SDK:
http://www.nullsoft.com/nsv/


Acknowledgements
----------------
Thanks to Roberto Togni (rtogni at bresciaonline dot it) and Arpad
"A'rpi" Gereoffy (arpi at mplayerhq dot hu) for further investigation
into the format.


Changelog
---------
v1.0: May 19, 2003
- sorted out NSVs chunk formatting
- document promoted to 1.0 status since enough information has been
uncovered to create functional demuxers

v0.2: March 13, 2003
- licensed under GNU Free Documentation License
- expanded information regarding NSVs data chunks

v0.1: February 11, 2003
- initial release


GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html
