mmap() files in Common Lisp

mmap() is a POSIX function that maps a file (or another memory object) into the address space of a process. Common Lisp provides functions to manipulate files, but lower-level access is sometimes needed. For example, mmap() can be useful if several processes access the same file, or if you need random access to a large file without reading all of it into memory.

Osicat, a lightweight operating system interface for Common Lisp, provides bindings to many POSIX functions, including mmap(). In the following example, we will use Osicat, together with CFFI for pointer access, to read the ID3v2 version of an audio file.

Mapping a file takes three steps: open it with open(), read its size with fstat(), then call mmap(). Once the file has been mapped, we can close the file descriptor: the mapping stays valid without it.

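;; Requires Osicat and CFFI, e.g. (ql:quickload '(:osicat :cffi)).
(defun mmap-file (path)
  "Map the file at PATH read-only. Return the address and size of the mapping."
  (let ((fd (osicat-posix:open path osicat-posix:o-rdonly)))
    (unwind-protect
         (let* ((size (osicat-posix:stat-size (osicat-posix:fstat fd)))
                (addr (osicat-posix:mmap (cffi:null-pointer) size
                                         osicat-posix:prot-read
                                         osicat-posix:map-private
                                         fd 0)))
           (values addr size))
      ;; The mapping remains valid after the descriptor is closed.
      (osicat-posix:close fd))))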

The mmap-file function returns two values: the address of the memory mapping and its size.

Unmapping this chunk of memory is done with munmap():

(defun munmap-file (addr size)
  (osicat-posix:munmap addr size))

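Let’s add a macro that maps a file, runs some code, and unmaps the file even if that code signals an error. The macro keeps private copies of the address and size, so the body is free to modify the ADDR and SIZE variables: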

(defmacro with-mmapped-file ((file addr size) &body body)
  (let ((original-addr (gensym "ADDR-"))
        (original-size (gensym "SIZE-")))
    `(multiple-value-bind (,addr ,size)
         (mmap-file ,file)
       (let ((,original-addr ,addr)
             (,original-size ,size))
         (unwind-protect
              (progn ,@body)
           (munmap-file ,original-addr ,original-size))))))
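Once a file is mapped, it is often convenient to copy the few bytes you need into an ordinary Lisp vector, which remains valid after the mapping is gone. A minimal sketch of such a helper, assuming only that CFFI is loaded; the name mapped-octets is hypothetical and is not part of Osicat or CFFI:

```lisp
;; Assumes CFFI is loaded, e.g. with (ql:quickload :cffi).
;; MAPPED-OCTETS is a hypothetical helper, not part of Osicat or CFFI.
(defun mapped-octets (addr start count)
  "Copy COUNT bytes, starting at byte offset START from ADDR, into a
fresh vector of (UNSIGNED-BYTE 8)."
  (let ((octets (make-array count :element-type '(unsigned-byte 8))))
    (dotimes (i count octets)
      ;; MEM-AREF with :UINT8 reads the byte at ADDR + START + I.
      (setf (aref octets i) (cffi:mem-aref addr :uint8 (+ start i))))))
```

Inside with-mmapped-file, (mapped-octets addr 0 (min size 10)) would return up to the first ten bytes of the file. The caller must keep START + COUNT within SIZE: reading past the end of the mapping may cause a memory fault.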

It’s now easy to read data from the mapping using CFFI functions. An ID3v2 header starts with 3 bytes containing the ASCII characters "ID3", followed by 2 bytes indicating the version used: the major version number, then the revision number:

(defun file-id3-version (path)
  (with-mmapped-file (path addr size)
    (when (< size 5)
      (error "~A doesn't contain an ID3 header." path))
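    ;; Decode as Latin-1: unlike UTF-8 (CFFI's default), any byte sequence
    ;; is valid Latin-1, so a file without a tag reaches the error below.
    (let ((id (cffi:foreign-string-to-lisp addr :count 3 :encoding :latin-1)))
      (unless (string= id "ID3")
        (error "~A doesn't contain an ID3 identifier." path)))
    ;; Skip past "ID3". Moving ADDR is safe: WITH-MMAPPED-FILE unmaps
    ;; the original address, not the current value of ADDR.
    (cffi:incf-pointer addr 3)
    ;; Read the major version and the revision as two separate bytes;
    ;; reading them as one :USHORT would depend on the host byte order.
    (let ((major (cffi:mem-aref addr :uint8 0))
          (revision (cffi:mem-aref addr :uint8 1)))
      (if (/= revision 0)
          :unknown
          (case major
            (2 :id3v2-2.2.0)
            (3 :id3v2-2.3.0)
            (4 :id3v2-2.4.0)
            (t :unknown))))))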
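To try file-id3-version without a real audio file, one option is to write a minimal ID3v2 header by hand: "ID3", the two version bytes, one flags byte and a four-byte tag size, ten bytes in all. A sketch in portable Common Lisp; write-sample-id3-file and the /tmp path below are hypothetical, and the file describes an empty tag rather than playable audio:

```lisp
;; WRITE-SAMPLE-ID3-FILE is a hypothetical test helper using only
;; standard Common Lisp. It writes a bare 10-byte ID3v2 header.
(defun write-sample-id3-file (path &key (major 3) (revision 0))
  "Write a 10-byte ID3v2 header with no frames to PATH and return PATH."
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede
                            :if-does-not-exist :create)
    ;; Identifier: the character codes of #\I, #\D and #\3.
    (write-sequence (map 'vector #'char-code "ID3") out)
    ;; Version: major version number, then revision number.
    (write-byte major out)
    (write-byte revision out)
    ;; One flags byte and four size bytes, all zero (empty tag).
    (write-sequence (make-array 5 :initial-element 0) out))
  path)
```

With the definitions above loaded, on common little-endian hardware (file-id3-version (write-sample-id3-file "/tmp/sample-id3.bin")) should return :ID3V2-2.3.0, and passing :major 9 should give :UNKNOWN.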

Thanks to Peder O. Klingenberg for pointing out safety issues in the with-mmapped-file macro.