Reading fuse.h
Preamble
Include guards
```c
/* lines 9-10 */
#ifndef _FUSE_H_
#define _FUSE_H_
```
This is a bit hard to read because there is no indentation. It turns out
`#ifndef` must always be accompanied by an `#endif`, so in this case if `_FUSE_H_`
has already been defined then the entire header will be skipped. Presumably this
is an idiomatic way to avoid including the same header file multiple times?
Anyway, you can confirm it by finding this line at the end of the file:
```c
/* line 1061 */
#endif /* _FUSE_H_ */
```
According to the SO question [Why are #ifndef and #define used in C++ header
files?](https://stackoverflow.com/a/1653965/4681998), they're called ["include
guards"](https://en.wikipedia.org/wiki/Include_guard). It's basically what we
guessed.
> Once the header is included, it checks if a unique value (in this case
> HEADERFILE\_H) is defined. Then if it's not defined, it defines it and
> continues to the rest of the page. When the code is included again, the
> first ifndef fails, resulting in a blank file. That prevents double
> declaration of any identifiers such as types, enums and static variables.
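To see the guard doing its job without juggling several files, here is a minimal sketch that pastes the same guarded "header" twice into one file, which is roughly what two `#include` lines would do. The names `DEMO_POINT_H_`, `struct demo_point` and `demo_point_sum` are made up for illustration.

```c
/* First "inclusion": DEMO_POINT_H_ is not defined yet, so this is compiled. */
#ifndef DEMO_POINT_H_
#define DEMO_POINT_H_
struct demo_point {
    int x;
    int y;
};
#endif /* DEMO_POINT_H_ */

/* Second "inclusion": the macro now exists, so the preprocessor drops
 * everything up to the #endif and the struct is not defined twice. */
#ifndef DEMO_POINT_H_
#define DEMO_POINT_H_
struct demo_point {
    int x;
    int y;
};
#endif /* DEMO_POINT_H_ */

/* Uses the single definition that survived preprocessing. */
int demo_point_sum(void)
{
    struct demo_point p = { 1, 2 };
    return p.x + p.y;
}
```

Deleting the two `#ifndef`/`#endif` pairs should make the compiler complain about a redefinition of `struct demo_point`.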
Version declaration
Then there is a big comment which is self contained and we can probably ignore:
```c
/* lines 12-24 */
/** @file
*
* This file defines the library interface of FUSE
*
* IMPORTANT: you should define FUSE_USE_VERSION before including this
* header. To use the newest API define it to 26 (recommended for any
* new application), to use the old API define it to 21 (default) 22
* or 25, to use the even older 1.X API define it to 11.
*/
#ifndef FUSE_USE_VERSION
#define FUSE_USE_VERSION 21
#endif
```
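The `#ifndef`/`#define` pair here is the same trick as the include guard, used this time to supply a default value. A small sketch, using the made-up macro `DEMO_USE_VERSION` rather than the real header:

```c
/* The application picks a version *before* the header is seen. */
#define DEMO_USE_VERSION 26

/* What the header does: only fall back to 21 if nothing was chosen. */
#ifndef DEMO_USE_VERSION
#define DEMO_USE_VERSION 21
#endif

/* Returns whichever value won. */
int demo_api_version(void)
{
    return DEMO_USE_VERSION;
}
```

Going by the header comment, the real equivalent would be a `#define FUSE_USE_VERSION 26` placed above `#include <fuse.h>`.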
Header inclusions
Then there are several `#include` lines:
```c
/* lines 26-34 */
#include "fuse_common.h"
#include <fcntl.h>
#include <time.h>
#include <utime.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <sys/uio.h>
```
Local vs System includes
For general information about the include directive in C, see https://en.wikipedia.org/wiki/Include_directive
The only relevant distinction is between the first line, `#include "fuse_common.h"`,
and the rest. The quotes typically tell the preprocessor to look next to the
including file first, while the angle brackets imply that the file in
question is a system header. For example, I could find `<fcntl.h>` after a quick
google. It's an abbreviation of "file control" I guess, and it seems to define
the POSIX interface to the filesystem. For example,
> The \<fcntl.h\> header shall define the following requests and arguments for use by the functions fcntl() and open().
> https://pubs.opengroup.org/onlinepubs/007904875/basedefs/fcntl.h.html
So presumably if you want to use `open()` then you have to include this header. OK.
FUSE-specific information is presumably included in `fuse_common.h`. I had a
quick look at this file, and it included another file `fuse_opt.h`. I think
I'll come back to these if I feel like I need to.
Other local header files on my system
In fact, while I'm at it, here are all the headers I could find on my system:
```
$ tree /nix/store/6ivxlbrn6sxaqwnlihhfidmpl97smpf1-fuse-2.9.9/include/fuse/
/nix/store/6ivxlbrn6sxaqwnlihhfidmpl97smpf1-fuse-2.9.9/include/fuse/
├── cuse_lowlevel.h
├── fuse_common_compat.h
├── fuse_common.h
├── fuse_compat.h
├── fuse.h
├── fuse_lowlevel_compat.h
├── fuse_lowlevel.h
└── fuse_opt.h
```
You can probably find this location on any system by doing `find / -name fuse.h`.
extern "C"
The next bit of code worth explaining is
```c
/* lines 36-38 */
#ifdef __cplusplus
extern "C" {
#endif
```
This is another example where there's obvious nesting going on (I see no
closing `}`). Just guessing for a moment, I guess that C++ is allowed to
compile C code, but that it has to define a constant called `__cplusplus`. Then
it will enclose the whole file in `extern "C"`. We can verify it's enclosing
the whole file by going to the end of the file, and as promised:
```c
/* lines 1057-1059 */
#ifdef __cplusplus
}
#endif
```
This is asked and answered by
[What is the effect of extern "C" in C++?](https://stackoverflow.com/questions/1041866/what-is-the-effect-of-extern-c-in-c)
but we don't need to care about it, because we're writing plain C.
The FUSE API
Now we "begin" - there is a big header comment, which suggests we're about to
learn the API:
```c
/* lines 40-64 */
/* ----------------------------------------------------------- *
* Basic FUSE API *
* ----------------------------------------------------------- */
/** Handle for a FUSE filesystem */
struct fuse;
/** Structure containing a raw command */
struct fuse_cmd;
```
Forward declaration of structs
This was my first WTF. What use is it to refer to a struct but not define any
of its members? This was hard to investigate, because it's not mentioned in
simple introductions to the struct syntax in C. This is *not* the same as an
"empty struct", which seems to be allowed by both C++ and some C extensions.
That would be written `struct fuse {};`, and is documented here:
https://stackoverflow.com/questions/24685399/c-empty-struct-what-does-this-mean-do
Instead, this example is a "structure without contents" and is described here:
https://stackoverflow.com/questions/8633436/how-to-interpret-structure-without-definition
(in fact, from someone investigating the FUSE source).
The general term for giving a partial definition of an identifier with a view
to defining it completely later is "forward declaration", and *that* wiki
article mentions this exact practice. https://en.wikipedia.org/wiki/Forward_declaration#Classes
It sounds like the library designer is trying to hide the definition of the
`fuse` struct from us so that they have the option of changing it later (i.e. a
kind of encapsulation). This suggests that there is an alternative interface to
interacting with `fuse`. This is a good time to mention that later on there are
several function declarations that take or return a pointer to `struct fuse`. For example,
```c
/* lines 665-667 */
struct fuse *fuse_new(struct fuse_chan *ch, struct fuse_args *args,
const struct fuse_operations *op, size_t op_size,
void *user_data);
/* line 669 */
void fuse_destroy(struct fuse *f);
```
which seems to clearly suggest that the sought-after interface is described
later. I can't find a complete definition of `struct fuse` anywhere, which
seems to support that this is an "opaque type" and you're not meant to touch
it. In summary, I'm reading this definition as "There is such a thing as a `fuse`
(filesystem) and a `fuse_cmd`, but these will be explained later".
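Here is a minimal sketch of the opaque type pattern in a single file. Everything in it (`struct demo_counter` and the three functions) is invented for illustration and has nothing to do with how FUSE implements `struct fuse`; the point is only that code above the struct definition can pass pointers around without knowing the members.

```c
#include <stdlib.h>

/* --- what the public header would contain --- */
struct demo_counter;                      /* forward declaration, no members */
struct demo_counter *demo_counter_new(void);
int demo_counter_increment(struct demo_counter *c);
void demo_counter_destroy(struct demo_counter *c);

/* --- what the library's private .c file would contain --- */
struct demo_counter {
    int value;
};

struct demo_counter *demo_counter_new(void)
{
    /* calloc zero-fills, so the counter starts at 0 (NULL on failure) */
    return calloc(1, sizeof(struct demo_counter));
}

int demo_counter_increment(struct demo_counter *c)
{
    c->value += 1;
    return c->value;
}

void demo_counter_destroy(struct demo_counter *c)
{
    free(c);
}
```

A caller that only sees the first half can hold a `struct demo_counter *` and hand it back to the library, but writing `c->value` there would not compile, which is exactly the encapsulation guessed at above.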
typedef examples
```c
/* lines 50-64 */
/** Function to add an entry in a readdir() operation
*
* @param buf the buffer passed to the readdir() operation
* @param name the file name of the directory entry
* @param stat file attributes, can be NULL
* @param off offset of the next entry or zero
* @return 1 if buffer is full, zero otherwise
*/
typedef int (*fuse_fill_dir_t) (void *buf, const char *name,
const struct stat *stbuf, off_t off);
/* Used by deprecated getdir() method */
typedef struct fuse_dirhandle *fuse_dirh_t;
typedef int (*fuse_dirfil_t) (fuse_dirh_t h, const char *name, int type,
ino_t ino);
```
readdir vs getdir
First note that `readdir()` and `getdir()` sound like they do very similar
jobs. I got confused at first because `readdir()` is a syscall, but I couldn't
find anything about `getdir()` anywhere online. Looking back in the file,
`getdir` and `readdir` are both mentioned later inside of `struct
fuse_operations`. I assume the old version didn't have exact parity with Linux
syscalls, and now does? Or maybe the actual Linux syscall changed (this seems
unlikely).
```c
/* lines 107-108 */
/* Deprecated, use readdir() instead */
int (*getdir) (const char *, fuse_dirh_t, fuse_dirfil_t);
/* lines 302-305
* Introduced in version 2.3
*/
int (*readdir) (const char *, void *, fuse_fill_dir_t, off_t,
struct fuse_file_info *);
```
So it seems like we can ignore `fuse_dirhandle` and `fuse_dirfil_t` entirely.
I still want to manually parse them, just in case the syntax is interesting.
Parsing fuse\_fill\_dir\_t
```c
/* lines 58-59 */
typedef int (*fuse_fill_dir_t) (void *buf, const char *name,
const struct stat *stbuf, off_t off);
```
The line says `typedef`, but the comment says "function". How can they be the
same? The baby examples of `typedef` are defining your own type aliases - like
`typedef struct Fraction { int denominator; int numerator; } Fraction;`, which
lets you declare values of type `Fraction`.
Of course, the code is right. The usage of `fuse_fill_dir_t` elsewhere gives a
hint. The named argument `filler` shows that it should be understood like a
callback function in other languages. `_t` suggests that it is a type of
function (rather than a function value itself). But how should we read the `*`
in `typedef int (*fuse_fill_dir_t)`?
```c
/* lines 848-850 */
int fuse_fs_readdir(struct fuse_fs *fs, const char *path, void *buf,
fuse_fill_dir_t filler, off_t off,
struct fuse_file_info *fi);
```
The SO question [Typedef function
pointer?](https://stackoverflow.com/a/4295495/4681998) provides a good answer.
They write (I have corrected the grammar)
> `typedef` is a language construct that associates a name to a type.
> You use it the same way you would use the original type, for instance
>
> typedef int myinteger;
> typedef char *mystring;
> typedef void (*myfunc)();
>
> can be used like so:
>
> myinteger i; // is equivalent to int i;
> mystring s; // is the same as char *s;
> myfunc f; // compiles equally as void (*f)();
Applying this verbatim to our example suggests that we should be able to translate
```
fuse_fill_dir_t my_filler (void *buf, const char *name, const struct stat *stbuf, off_t off) {
/* actual function goes here */
};
/* should translate into */
int (*my_filler) (void *buf, const char *name, const struct stat *stbuf, off_t off) {
/* actual function goes here */
};
```
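which reveals a further confusion - what does it mean to name a function with
an asterisk? I *guess* it's a way to specify a function pointer. And that guess
is correct: another SO question [Why was the C syntax for arrays, pointers, and
functions designed this
way?](https://softwareengineering.stackexchange.com/questions/117024/why-was-the-c-syntax-for-arrays-pointers-and-functions-designed-this-way)
explains the historical reasoning for this notation.
It also means the translation above is not quite right: neither snippet compiles,
because a function pointer type can declare a variable or a parameter (like
`filler` earlier), but it cannot be given a function body. The function itself
is written with an ordinary signature, and its name is then assigned or passed
wherever a `fuse_fill_dir_t` is expected.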
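To make the function pointer reading concrete, here is a minimal sketch of how such a typedef is normally used. `demo_fill_t` is a made-up type with a shorter signature than `fuse_fill_dir_t`, and the directory listing is hard-coded.

```c
#include <stddef.h>

/* Same shape as fuse_fill_dir_t, with a shorter (made-up) signature. */
typedef int (*demo_fill_t)(void *buf, const char *name);

/* An ordinary function whose signature matches demo_fill_t. */
static int count_entry(void *buf, const char *name)
{
    int *count = buf;   /* the caller decides what buf points to */
    (void) name;        /* this callback only counts entries */
    *count += 1;
    return 0;           /* 0 = buffer not full, keep going */
}

/* Takes the callback as a parameter, much as readdir takes a filler. */
int demo_list_dir(void *buf, demo_fill_t filler)
{
    static const char *const names[] = { ".", "..", "hello.txt" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (filler(buf, names[i]) != 0)
            return -1;
    }
    return 0;
}

/* Declares a variable of the pointer type and points it at the function. */
int demo_count_entries(void)
{
    int count = 0;
    demo_fill_t filler = count_entry;

    demo_list_dir(&count, filler);
    return count;
}
```

So the typedef never appears in the function definition itself. It only shows up where a pointer to such a function is stored or passed.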
The other examples
Given the previous observations, these are now quite easy.
```c
/* line 62 */
typedef struct fuse_dirhandle *fuse_dirh_t;
```
This example just defines an alias `fuse_dirh_t` for the type "pointer to
`struct fuse_dirhandle`".
* So if I declare `fuse_dirh_t x` then `*x` is a `struct fuse_dirhandle` (and `&x` is a `struct fuse_dirhandle **`),
* and in `fuse_dirh_t (*y)`, `y` is a pointer to a `fuse_dirh_t`, so `**y` is the `struct fuse_dirhandle`.
```c
/* lines 63-64 */
typedef int (*fuse_dirfil_t) (fuse_dirh_t h, const char *name, int type,
ino_t ino);
```
This is identical to `fuse_fill_dir_t` above, but with a different signature.
Nothing interesting to note.
Filesystem operations
It feels again like we are finally getting into the actual meat of the API. We
begin with a comment,
```c
/* lines 66-87 */
/**
* The file system operations:
*
* Most of these should work very similarly to the well known UNIX
* file system operations. A major exception is that instead of
* returning an error in 'errno', the operation should return the
* negated error value (-errno) directly.
*
* All methods are optional, but some are essential for a useful
* filesystem (e.g. getattr). Open, flush, release, fsync, opendir,
* releasedir, fsyncdir, access, create, ftruncate, fgetattr, lock,
* init and destroy are special purpose methods, without which a full
* featured filesystem can still be implemented.
*
* Almost all operations take a path which can be of any length.
*
* Changed in fuse 2.8.0 (regardless of API version)
* Previously, paths were limited to a length of PATH_MAX.
*
* See http://fuse.sourceforge.net/wiki/ for more information. There
* is also a snapshot of the relevant wiki pages in the doc/ folder.
*/
```
Note the link to [the wiki](http://fuse.sourceforge.net/wiki/).
The Linux FS API
First hurdle: I don't know what the UNIX file system operations are. I've *used*
files, but never from C code, and never in such a way that I need to know what
their interface consisted of. I'm assuming for the time being that we will be
able to pick it up from the remaining documentation.
There is a very good overview at [Filesystem kernel API
](https://unix.stackexchange.com/a/275580/284679) on the UNIX stack exchange,
although it doesn't go into great detail.
I'm surprised that they say `getattr` is essential and `open` is optional. But
I assume they'll explain why as we continue.
Errno
The statement about `errno` refers to the return value of each operation.
Ordinary libc wrappers return `-1` and store the reason in the global `errno`;
a FUSE operation instead returns the error code itself, negated. I guess that
leaves zero and the positive numbers free for successful results (for example,
`read` has to report how many bytes it read). For instance, if something
happens that the FUSE layer wants to indicate as "Could not write to file;
read only filesystem" then you look up what error code Linux uses for that,
and you return `-that`.
I wasn't sure how to look up those error codes, and I found the answer at [What
are the standard error codes in Linux?
](https://unix.stackexchange.com/a/326811/284679): there is literally a program
called `errno`, which is in [moreutils](https://joeyh.name/code/moreutils/)
(which is a really useful package to be familiar with!)
`errno -l | grep "Read-only"` shows that this error is named `EROFS`
and has code 30. So a FUSE implementation of writing to a file that wanted to
refuse writes for this reason should return `-30` (better written as `-EROFS`,
using the constant from `<errno.h>`), which the FUSE driver will hand back to
applications.
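As a sketch of that convention, here is a hypothetical write handler for a filesystem that refuses every write. The function name is made up and the signature is simplified compared with the real `write` operation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical write handler for a filesystem that is always read-only.
 * On success a real handler would return the number of bytes written;
 * on failure it returns the negated errno value. */
int demo_readonly_write(const char *path, const char *data, size_t size)
{
    (void) path;
    (void) data;
    (void) size;
    return -EROFS;
}
```

Using the named constant rather than a literal `-30` keeps the code readable and avoids depending on the number being the same everywhere.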
The pathname limit
I was actually surprised to hear them say pathnames could be any length. I
think filenames are limited to 255 bytes and pathnames are limited to 4096
bytes. Not that it's particularly important, I just was surprised they put in
the effort to allow unbounded strings.
In fact you can check this for yourself using [getconf](https://linux.die.net/man/1/getconf):
```sh
[daniel@nixos:~/blog/danielittlewood-xyz]$ getconf NAME_MAX /dev/sda1
255
[daniel@nixos:~/blog/danielittlewood-xyz]$ getconf PATH_MAX /dev/sda1
4096
```
Presumably it depends on the filesystem.
fuse\_operations
The rest of the section is below. I've removed the comments.
```c
/* lines 88-594 */
struct fuse_operations {
int (*getattr) (const char *, struct stat *);
int (*readlink) (const char *, char *, size_t);
int (*getdir) (const char *, fuse_dirh_t, fuse_dirfil_t);
int (*mknod) (const char *, mode_t, dev_t);
int (*mkdir) (const char *, mode_t);
int (*unlink) (const char *);
int (*rmdir) (const char *);
int (*symlink) (const char *, const char *);
int (*rename) (const char *, const char *);
int (*link) (const char *, const char *);
int (*chmod) (const char *, mode_t);
int (*chown) (const char *, uid_t, gid_t);
int (*truncate) (const char *, off_t);
int (*utime) (const char *, struct utimbuf *);
int (*open) (const char *, struct fuse_file_info *);
int (*read) (const char *, char *, size_t, off_t, struct fuse_file_info *);
int (*write) (const char *, const char *, size_t, off_t, struct fuse_file_info *);
int (*statfs) (const char *, struct statvfs *);
int (*flush) (const char *, struct fuse_file_info *);
int (*release) (const char *, struct fuse_file_info *);
int (*fsync) (const char *, int, struct fuse_file_info *);
int (*setxattr) (const char *, const char *, const char *, size_t, int);
int (*getxattr) (const char *, const char *, char *, size_t);
int (*listxattr) (const char *, char *, size_t);
int (*removexattr) (const char *, const char *);
int (*opendir) (const char *, struct fuse_file_info *);
int (*readdir) (const char *, void *, fuse_fill_dir_t, off_t, struct fuse_file_info *);
int (*releasedir) (const char *, struct fuse_file_info *);
int (*fsyncdir) (const char *, int, struct fuse_file_info *);
void *(*init) (struct fuse_conn_info *conn);
void (*destroy) (void *);
int (*access) (const char *, int);
int (*create) (const char *, mode_t, struct fuse_file_info *);
int (*ftruncate) (const char *, off_t, struct fuse_file_info *);
int (*fgetattr) (const char *, struct stat *, struct fuse_file_info *);
int (*lock) (const char *, struct fuse_file_info *, int cmd, struct flock *);
int (*utimens) (const char *, const struct timespec tv[2]);
int (*bmap) (const char *, size_t blocksize, uint64_t *idx);
unsigned int flag_nullpath_ok:1;
unsigned int flag_nopath:1;
unsigned int flag_utime_omit_ok:1;
unsigned int flag_reserved:29;
int (*ioctl) (const char *, int cmd, void *arg, struct fuse_file_info *, unsigned int flags, void *data);
int (*poll) (const char *, struct fuse_file_info *, struct fuse_pollhandle *ph, unsigned *reventsp);
int (*write_buf) (const char *, struct fuse_bufvec *buf, off_t off, struct fuse_file_info *);
int (*read_buf) (const char *, struct fuse_bufvec **bufp, size_t size, off_t off, struct fuse_file_info *);
int (*flock) (const char *, struct fuse_file_info *, int op);
int (*fallocate) (const char *, int, off_t, off_t, struct fuse_file_info *);
};
```
Good lord! They did say most were optional. And that `getattr` definitely
wasn't. Maybe we should just try understanding `getattr` and going from there.
There are some high-level groupings I can guess at, just by looking:
* `getattr` is probably so essential because it tells you things like file
types, access flags, and so on. So if you imagine the first thing you do with
a directory or file (inspect it prior to reading) then you probably need this.
* **Links** `readlink` will only be needed if you implement symbolic links.
`link` is the syscall for creating a hard link
* **File creation** `mknod` and `unlink` are regular file creation and deletion.
* **Editing files** `open`, `read`, `write` speak for themselves (you must open
a file before reading from or writing to it).
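To get a feel for "all methods are optional", here is a sketch with a cut-down, made-up `struct demo_operations` holding just two of the function pointers. Only `getattr` is filled in. Reporting a missing operation as `-ENOSYS` is my assumption for this sketch, not something the header text above states.

```c
#define _XOPEN_SOURCE 700   /* ask glibc to expose the S_IF* constants */
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Cut-down, made-up stand-in for struct fuse_operations. */
struct demo_operations {
    int (*getattr)(const char *, struct stat *);
    int (*unlink)(const char *);
};

/* Only the root directory exists in this toy filesystem. */
static int demo_getattr(const char *path, struct stat *st)
{
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    memset(st, 0, sizeof *st);
    st->st_mode = S_IFDIR | 0755;
    st->st_nlink = 2;
    return 0;
}

/* Members that are not named in the initializer are set to NULL. */
static const struct demo_operations demo_ops = {
    .getattr = demo_getattr,
};

/* Assumption: a missing operation is reported as -ENOSYS. */
int demo_call_unlink(const char *path)
{
    if (demo_ops.unlink == NULL)
        return -ENOSYS;
    return demo_ops.unlink(path);
}

int demo_call_getattr(const char *path, struct stat *st)
{
    if (demo_ops.getattr == NULL)
        return -ENOSYS;
    return demo_ops.getattr(path, st);
}
```

The designated initializer (`.getattr = ...`) is what makes a struct with dozens of members bearable: you name the handful you implement and the rest default to `NULL`.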
`getattr` and `stat`
Since `getattr` is apparently essential to implement, we should probably take a
look at it in detail. The comment is helpful:
```c
/* lines 89-95 */
/** Get file attributes.
*
* Similar to stat(). The 'st_dev' and 'st_blksize' fields are
* ignored. The 'st_ino' field is ignored except if the 'use_ino'
* mount option is given.
*/
int (*getattr) (const char *, struct stat *);
```
`stat` is a [syscall](SYSCALLDOCS), and like all (most) syscalls, you call it
from C using a wrapper function of the same name. The [docs for `stat()` in libc
](https://man7.org/linux/man-pages/man2/lstat.2.html) explain that it takes two
arguments:
1. `const char *restrict pathname`, a pathname string (pointer to char).
2. `struct stat *restrict statbuf`, a pointer to a "stat structure".
It's worth dwelling for a second on what `const` and `restrict` mean in this
context.
* `const` is a hint to the compiler (and the human reader) that `stat()` is not
going to modify the memory pointed to by `pathname`. That makes sense,
obviously. `statbuf` is not `const` because *the point of calling `stat()` is
to write to that buffer*. [Note that `const` *does not mean* that the
argument won't change](https://publications.gbdirect.co.uk//c_book/chapter8/const_and_volatile.html);
only that the given function promises not to change it.
* `restrict` is a different hint, which says "for as long as this pointer is in
scope, the memory it points to will only be accessed through this pointer (or
pointers derived from it)". Strictly, this only matters if the memory is
modified. It's a performance
enhancement ([here's a simple example](https://en.wikipedia.org/wiki/Restrict#Optimization)).
Like `const`, the promise is local: for a function parameter it covers one
execution of the function body, not the entire life of the memory. It is the
*caller* who has to keep the promise, by not passing overlapping pointers.
Passing a `restrict` pointer on to a function whose parameter is not marked
`restrict` is fine, since that function simply assumes less.
Because the promise is tied to one particular pointer variable rather than to
the memory it points to, `restrict` is written as a qualifier on that pointer's
declaration, and it does not "infect" other code the way I first assumed.
That also explains the wandering asterisk. Qualifiers to the left of the `*`
describe the pointed-to `char`, and qualifiers to the right of it describe the
pointer itself. `restrict` can only qualify a pointer, so `const char
restrict *pathname` is not equivalent to the first one; the compiler rejects it.
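Here is a small sketch of `restrict` on function parameters, adapted from the Wikipedia example linked above, which also shows where the qualifier sits relative to the `*`. The function names are made up.

```c
/* Because all three parameters are restrict-qualified, the compiler may
 * load *val once instead of re-reading it after the write to *a. */
void demo_add_twice(int *restrict a, int *restrict b, const int *restrict val)
{
    *a += *val;
    *b += *val;
}

/* Fine: the three pointers refer to three distinct objects. */
int demo_add_twice_ok(void)
{
    int a = 1, b = 2, val = 3;

    demo_add_twice(&a, &b, &val);
    return a * 10 + b;   /* a == 4 and b == 5, so this is 45 */
}
```

Calling `demo_add_twice(&a, &b, &a)` instead would break the promise, because `*a` is then modified through one pointer and read through another, and the behaviour would be undefined.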
struct stat definition
I assume this is defined in some header file, but I can't find it. Looking for
a simple definition like `struct stat {` or `typedef struct stat` didn't work.
The `stat()` docs refer to [another man page for `struct stat`](https://man7.org/linux/man-pages/man3/stat.3type.html)
which defines the type as follows:
```c
/* I couldn't find this code anywhere on my machine... */
#include <sys/stat.h>
struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* Inode number */
mode_t st_mode; /* File type and mode */
nlink_t st_nlink; /* Number of hard links */
uid_t st_uid; /* User ID of owner */
gid_t st_gid; /* Group ID of owner */
dev_t st_rdev; /* Device ID (if special file) */
off_t st_size; /* Total size, in bytes */
blksize_t st_blksize; /* Block size for filesystem I/O */
blkcnt_t st_blocks; /* Number of 512 B blocks allocated */
/* Since POSIX.1-2008, this structure supports nanosecond
precision for the following timestamp fields.
For the details before POSIX.1-2008, see VERSIONS. */
struct timespec st_atim; /* Time of last access */
struct timespec st_mtim; /* Time of last modification */
struct timespec st_ctim; /* Time of last status change */
#define st_atime st_atim.tv_sec /* Backward compatibility */
#define st_mtime st_mtim.tv_sec
#define st_ctime st_ctim.tv_sec
};
```
* The comment says we can disregard `dev`, `ino` and `blksize`.
Does that mean we should write them with zeroes or leave them uninitialized?
What if we accidentally leave a field that's necessary uninitialized and something wacky happens?
* Surely if `blksize` and `dev` are ignored, then `blkcnt` and `rdev` probably also are?
* It's interesting that almost every element of this struct has itself a custom type
for returning exactly this sort of data. Like `uid_t` for `st_uid`.
Presumably some of them are just aliases for ints. In fact, I bet they're all
aliases for ints, because only the timestamps declare themselves a struct
(that's not a proof, just a guess).
Example usage of the stat syscall
The manual gives an extremely simple C program that exposes most of the
relevant information exposed by `stat()` (actually it uses `lstat()`, which can
examine symlinks, but the difference is not important here). Here it is
completely:
```c
/* stat-example.c */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <time.h>
int
main(int argc, char *argv[])
{
struct stat sb;
if (argc != 2) {
fprintf(stderr, "Usage: %s <pathname>\n", argv[0]);
exit(EXIT_FAILURE);
}
if (lstat(argv[1], &sb) == -1) {
perror("lstat");
exit(EXIT_FAILURE);
}
printf("ID of containing device: [%x,%x]\n",
major(sb.st_dev),
minor(sb.st_dev));
printf("File type: ");
switch (sb.st_mode & S_IFMT) {
case S_IFBLK: printf("block device\n"); break;
case S_IFCHR: printf("character device\n"); break;
case S_IFDIR: printf("directory\n"); break;
case S_IFIFO: printf("FIFO/pipe\n"); break;
case S_IFLNK: printf("symlink\n"); break;
case S_IFREG: printf("regular file\n"); break;
case S_IFSOCK: printf("socket\n"); break;
default: printf("unknown?\n"); break;
}
printf("I-node number: %ju\n", (uintmax_t) sb.st_ino);
printf("Mode: %jo (octal)\n",
(uintmax_t) sb.st_mode);
printf("Link count: %ju\n", (uintmax_t) sb.st_nlink);
printf("Ownership: UID=%ju GID=%ju\n",
(uintmax_t) sb.st_uid, (uintmax_t) sb.st_gid);
printf("Preferred I/O block size: %jd bytes\n",
(intmax_t) sb.st_blksize);
printf("File size: %jd bytes\n",
(intmax_t) sb.st_size);
printf("Blocks allocated: %jd\n",
(intmax_t) sb.st_blocks);
printf("Last status change: %s", ctime(&sb.st_ctime));
printf("Last file access: %s", ctime(&sb.st_atime));
printf("Last file modification: %s", ctime(&sb.st_mtime));
exit(EXIT_SUCCESS);
}
```
Note that substituting `lstat` for `stat` does not change the output when run
on a regular file.
Example output:
```sh
[daniel@nixos:/tmp/tmp.fjBCio02Bq]$ gcc stat-example.c && ./a.out stat-example.c
ID of containing device: [0,1d]
File type: regular file
I-node number: 110466
Mode: 100644 (octal)
Link count: 1
Ownership: UID=1000 GID=100
Preferred I/O block size: 4096 bytes
File size: 1977 bytes
Blocks allocated: 8
Last status change: Sat Feb 3 13:50:46 2024
Last file access: Sat Feb 3 13:50:51 2024
Last file modification: Sat Feb 3 13:50:46 2024
```
Originally I was surprised to see that the declaration of `sb` did not
say how much space it should use. We're reserving a buffer to write
to, after all. But then I remembered that the definition of the struct
explicitly sets out how much space to reserve for each field, so `struct stat sb;`
reserves `sizeof(struct stat)` bytes on the stack.
The expression `sb.st_mode & S_IFMT` indicates that `S_IFMT` is [a bitmask](https://stackoverflow.com/questions/10493411/what-is-bit-masking).
Indeed, printing its value gives 61440, which is `1111000000000000` in binary,
so `sb.st_mode & S_IFMT` will leave only the high 4 bits in `sb.st_mode`
(assuming it is 16 bits long). This suggests that `sb.st_mode` encodes the file
type in the top four of those 16 bits. The full list of special values is below:
```c
printf("%d\n", S_IFMT); /* 61440 = 1111000000000000 */
printf("%d\n", S_IFBLK); /* 24576 = 0110000000000000 */
printf("%d\n", S_IFCHR); /* 8192 = 0010000000000000 */
printf("%d\n", S_IFDIR); /* 16384 = 0100000000000000 */
printf("%d\n", S_IFIFO); /* 4096 = 0001000000000000 */
printf("%d\n", S_IFLNK); /* 40960 = 1010000000000000 */
printf("%d\n", S_IFREG); /* 32768 = 1000000000000000 */
printf("%d\n", S_IFSOCK); /* 49152 = 1100000000000000 */
```
Presumably the lower 12 bits store the usual `rwx` permissions on the file in
question. This implies 3 octal digits (owner, group owner, and world
permissions), which should require 9 bits to store. Perhaps there is more
information stored (like the sticky bit?) or perhaps a couple of bits are just
wasted.
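The masking can be wrapped up in a couple of helper functions. This is only a sketch: the function names are made up, and the letters are the ones `ls -l` shows in its first column.

```c
#define _XOPEN_SOURCE 700   /* ask glibc to expose the S_IF* constants */
#include <sys/stat.h>

/* Returns the letter ls(1) would print for this file type, or '?'. */
char demo_type_char(mode_t mode)
{
    switch (mode & S_IFMT) {    /* keep only the file-type bits */
    case S_IFREG:  return '-';
    case S_IFDIR:  return 'd';
    case S_IFLNK:  return 'l';
    case S_IFCHR:  return 'c';
    case S_IFBLK:  return 'b';
    case S_IFIFO:  return 'p';
    case S_IFSOCK: return 's';
    default:       return '?';
    }
}

/* Returns the low 12 bits: everything that is not the file type. */
unsigned int demo_low_bits(mode_t mode)
{
    return (unsigned int) (mode & 07777);
}
```

For example, `demo_type_char(S_IFDIR | 0755)` gives `'d'`, and `demo_low_bits(S_IFREG | 0644)` gives `0644`.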
In the example, the following line prints the `rwx` permissions:
```
printf("Mode: %jo (octal)\n", (uintmax_t) sb.st_mode);
```
To understand the `%jo` syntax I had to find a copy of [the C99 standard](https://www.iso-9899.info/wiki/The_Standard#C99).
Section 7.19.6 defines how format strings (the argument to `printf`) work in general. In particular,
> `j`: Specifies that a following `d`, `i`, `o`, `u`, `x`, or `X` conversion
> specifier applies to an `intmax_t` or `uintmax_t` argument; or that a
> following `n` conversion specifier applies to a pointer to an `intmax_t`
> argument.
> `o`: The unsigned int argument is converted to unsigned octal...
> The precision specifies the minimum number of digits to appear;
> if the value being converted can be represented in fewer digits, it is expanded
> with leading zeros. The default precision is 1.
So the `%o` conversion specifier in `printf` indicates that the argument should be
printed in octal (base 8). So, for example, `unsigned int x = 100` has bits
`1100100` (leading zeroes omitted), which written as an octal number is
`(001)(100)(100) = 144`. Hence, `printf("%d in octal is %o\n", x, x)` should
print `100 in octal is 144`.
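A quick way to check this kind of claim without eyeballing terminal output is to format into a buffer with `snprintf` and compare strings. A minimal sketch, with made-up helper names:

```c
#include <stdio.h>
#include <string.h>

/* Writes value in octal into out, exactly as printf("%o") would print it. */
int demo_format_octal(unsigned int value, char *out, size_t size)
{
    return snprintf(out, size, "%o", value);
}

/* Returns 1 if value prints as the expected octal text, else 0. */
int demo_octal_is(unsigned int value, const char *expected)
{
    char buf[32];

    demo_format_octal(value, buf, sizeof buf);
    return strcmp(buf, expected) == 0;
}
```

Here `demo_octal_is(100, "144")` returns 1. Note that a C literal with a leading zero is itself octal, so `demo_octal_is(0644, "644")` also returns 1.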
If you instead put `int x = 100` in the previous example, it prints the same
thing. This is because `int 100` and `unsigned int 100` have exactly the same
bits (this is guaranteed by the standard in section 6.3.1.3). If you put a
negative integer (such as `int x = -1`), you instead see `-1 in octal is
37777777777`. In this case `-1` has been converted to the maximum unsigned
integer value, which is in this case 32 bits all set to `1`.
> **6.3.1.3 Signed and unsigned integers**
>
> When a value with integer type is converted to another integer type other
> than _Bool, if the value can be represented by the new type, it is unchanged.
>
> Otherwise, if the new type is unsigned, the value is converted by repeatedly
> adding or subtracting one more than the maximum value that can be represented
> in the new type until the value is in the range of the new type.
>
> (The rules describe arithmetic on the mathematical value, not the value of a
> given type of expression.)
```
-1 = (-1) + (max unsigned int + 1)
= max unsigned int
= (11)(111)(111)(111)(111)(111)(111)(111)(111)(111)(111)
= 37777777777
```
The difference between `%o` and `%jo` is that in the latter case the argument
should be of type `uintmax_t`. This explains the cast `(uintmax_t) sb.st_mode`,
but doesn't explain *why* we do that cast. Why not just leave the value
uncasted, and use `%o`?
The reason is that, as noted earlier, `%o` assumes the value to be printed is
an `unsigned int`. In this case, all we know is that it has 16 significant bits,
and may or may not be the same size as an `unsigned int`. If it isn't,
then the behaviour is explicitly undefined. When I tried it, it worked fine,
but that isn't guaranteed. By casting the value to `uintmax_t`, we know for
sure that the value is unchanged, since:
* `uintmax_t` is by definition the largest unsigned integer type on the system,
* it is guaranteed to be at least 64 bits wide,
* casting an unsigned value to a larger unsigned type just pads it with leading zeroes,
* the `j` length modifier by definition matches arguments of type `uintmax_t`.
Incidentally, Jim Fisher found [a detailed definition of `mode_t`](https://jameshfisher.com/2017/02/24/what-is-mode_t/),
which explains in detail how the lower 12 bits work (although seems to miss the
higher 4 we discussed). The remaining 3 bits we didn't understand (setuid,
setgid and the sticky bit) are another octal digit wrapping three flags:
* 4 indicates the file is [setuid](https://en.wikipedia.org/wiki/Setuid#SUID)
(it may change the effective user ID when executed).
* 2 indicates the file is [setgid](https://en.wikipedia.org/wiki/Setuid#SGID)
(it may change the effective group ID when executed).
* 1 indicates the file has the [sticky bit](https://en.wikipedia.org/wiki/Setuid#Sticky_bit).
On a directory, it means a file inside can only be renamed or deleted by the
owner of *that file*, the owner of the directory, or root, rather than by anyone
with write access to the parent directory, which is the default.
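Putting the pieces together, the low 12 bits split into one octal digit of special flags and three octal digits of permissions. A sketch with made-up helper names:

```c
#define _XOPEN_SOURCE 700   /* ask glibc to expose S_ISUID and friends */
#include <sys/stat.h>

/* The three special flags as one octal digit:
 * 4 = setuid, 2 = setgid, 1 = sticky. */
unsigned int demo_special_digit(mode_t mode)
{
    return (unsigned int) ((mode >> 9) & 07);
}

/* The familiar rwx permissions for owner, group and world. */
unsigned int demo_permission_digits(mode_t mode)
{
    return (unsigned int) (mode & 0777);
}
```

For a setuid executable with mode `S_IFREG | S_ISUID | 0755`, the first helper returns 4 and the second returns `0755`.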
Minimal example: A symlink owned by the current user
* Should stat the link, not the target file.
* Ignores the pathname; considers any input to be a symlink.
* `dev`, `ino` and `blksize` will be left uninitialized (no obvious default).
* `uid` and `gid` should be those of the user that ran the program.
* `nlink` (hard link count) should be 1 (it is unique in the filesystem).
* `size` and `blocks` will be 0, since the file is fake.
* `st_mode` should have the file mode equal to `S_IFLNK`, and
the permission bits equal to `0777` (note: that is a 4-digit octal number).
* All timestamps (access, modification and status change) should be set to "now"
i.e. when the program was run.
The permission bits are actually not chosen by me, [it's in the Linux man-pages](https://man7.org/linux/man-pages/man7/symlink.7.html):
> **Symbolic link ownership, permissions, and timestamps**
>
> The owner and group of an existing symbolic link can be changed using
> [`lchown(2)`](https://man7.org/linux/man-pages/man2/lchown.2.html).
> The ownership of a symbolic link matters when the link is being removed or
> renamed in a directory that has the sticky bit set
> (see [`inode(7)`](https://man7.org/linux/man-pages/man7/inode.7.html)),
> and when the `fs.protected_symlinks` sysctl is set
> (see [`proc(5)`](https://man7.org/linux/man-pages/man5/proc.5.html)).
>
> The last access and last modification timestamps of a symbolic
> link can be changed using
> [`utimensat(2)`](https://man7.org/linux/man-pages/man2/utimensat.2.html) or
> [`lutimes(3)`](https://man7.org/linux/man-pages/man3/lutimes.3.html).
>
> On Linux, the permissions of an ordinary symbolic link are not
> used in any operations; the permissions are always `0777` (read,
> write, and execute for all user categories), and can't be
> changed.
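To make the `st_mode` arithmetic concrete, here is a small sketch that combines the file-type bits with the permission bits and then takes them apart again using the standard `S_IFMT` mask and `S_ISLNK` macro. The helper names (`symlink_mode`, `type_bits`, `permission_bits`) are made up for illustration and are not part of any API.

```c
#define _XOPEN_SOURCE 700   /* exposes S_IFLNK and S_IFMT under strict compiler modes */
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: the mode value our fake symlink should report. */
mode_t symlink_mode(void)
{
    mode_t mode = S_IFLNK | 0777;   /* file type OR'ed with rwxrwxrwx */
    assert(S_ISLNK(mode));          /* the type bits survive the OR */
    return mode;
}

/* Split a mode back into its two halves. */
mode_t type_bits(mode_t mode)       { return mode & S_IFMT; }
mode_t permission_bits(mode_t mode) { return mode & 07777; }
```

On Linux, printing `symlink_mode()` with `%o` gives `120777`: the leading `120` is the symlink type and the trailing `777` is the permissions.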
The user and group ID of the running process are available from the syscalls
[`getuid()`](https://man7.org/linux/man-pages/man2/getuid.2.html) and
[`getgid()`](https://man7.org/linux/man-pages/man2/getgid.2.html), which are
declared in the header file `<unistd.h>`. The manual pages distinguish
between the "real user ID" and the "effective user ID". The distinction is
that if a setuid binary is owned by `root` and executed by `user`, then the
resulting process will have its "real user ID" set to `user` and its "effective
user ID" set to `root`. From the "client" perspective, the effective user ID
is the one that is typically used for access checks. So if we imagine, for
instance, that we were using this fake `stat` function in a FUSE system and
performing a mount, the correct ID to use would be the effective user ID, which
is why the example below calls `geteuid()` and `getegid()` (documented on the
same manual pages). That way we are agnostic about whether the user doing the
mounting ran the process directly or went through a setuid program. I think
that's correct (but I'm not completely sure).
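A quick way to see the two IDs side by side is to print both. This is only a sketch; the function names `describe_user_ids` and `ids_differ` are my own. For an ordinary program the two values are the same, and they only differ when something like a setuid binary is involved.

```c
#define _XOPEN_SOURCE 700   /* for getuid/geteuid under strict compiler modes */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "real=<uid> effective=<uid>" into buf; returns what snprintf returns. */
int describe_user_ids(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    uid_t real = getuid();        /* who started the process */
    uid_t effective = geteuid();  /* whose permissions are used for access checks */
    return snprintf(buf, size, "real=%lu effective=%lu",
                    (unsigned long)real, (unsigned long)effective);
}

/* Non-zero when the two IDs differ, e.g. inside a setuid binary. */
int ids_differ(void)
{
    return getuid() != geteuid();
}
```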
Getting the current time is straightforward. We declare a `struct timespec`
and fill it in with [`clock_gettime()`](https://pubs.opengroup.org/onlinepubs/7908799/xsh/clock_gettime.html).
Both are declared in `<time.h>`, which is already included in our example program.
The full example function is below, and I replaced `lstat` in the earlier
example program with `fake_stat`:
```c
int fake_stat(const char *restrict pathname, struct stat *restrict stat_buf)
{
struct timespec now;
clock_gettime(CLOCK_REALTIME, &now);
stat_buf->st_nlink = 1;
stat_buf->st_uid = geteuid();
stat_buf->st_gid = getegid();
stat_buf->st_size = 0;
stat_buf->st_blocks = 0;
stat_buf->st_mode = S_IFLNK | 00777;
stat_buf->st_atim = now;
stat_buf->st_mtim = now;
stat_buf->st_ctim = now;
return 0;
}
```
which produces:
```sh
[daniel@nixos:/tmp/tmp.fjBCio02Bq]$ gcc stat-example.c && ./a.out stat-example.c
ID of containing device: [400,0]
File type: symlink
I-node number: 8
Mode: 120777 (octal)
Link count: 1
Ownership: UID=1000 GID=100
Preferred I/O block size: 17179869188 bytes
File size: 0 bytes
Blocks allocated: 0
Last status change: Sun Feb 11 16:15:41 2024
Last file access: Sun Feb 11 16:15:41 2024
Last file modification: Sun Feb 11 16:15:41 2024
```
which looks correct.
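One detail in that output: the device, inode number and block size are whatever happened to be in memory, because `fake_stat` never writes them. If stable values are preferred, a variant can zero the structure first. This is a sketch under my own assumptions (the name `fake_stat_zeroed` and the choice to return `-1` if the clock cannot be read are not from the original example).

```c
#define _XOPEN_SOURCE 700   /* for S_IFLNK, st_atim and clock_gettime under strict modes */
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical variant of fake_stat that leaves no field uninitialized. */
int fake_stat_zeroed(const char *pathname, struct stat *stat_buf)
{
    struct timespec now;
    (void)pathname;                          /* still ignored: everything is a symlink */
    assert(stat_buf != NULL);
    if (clock_gettime(CLOCK_REALTIME, &now) != 0) { return -1; }
    memset(stat_buf, 0, sizeof *stat_buf);   /* dev, ino, blksize, size, blocks become 0 */
    stat_buf->st_nlink = 1;
    stat_buf->st_uid = geteuid();
    stat_buf->st_gid = getegid();
    stat_buf->st_mode = S_IFLNK | 0777;
    stat_buf->st_atim = now;
    stat_buf->st_mtim = now;
    stat_buf->st_ctim = now;
    return 0;
}
```

The explicit assignments of `st_size` and `st_blocks` are gone because `memset` already covers them.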
GNU stat
`stat` is part of libc ([manual](https://man7.org/linux/man-pages/man2/lstat.2.html)),
defined in the system header file `<sys/stat.h>`.
Mounting a filesystem
First attempt: Everything is a symlink
To celebrate our trivial example, let's try to mount it into the system.
Earlier investigation suggests that `fuse_new` and `fuse_destroy` are relevant,
but after having a quick browse through the remainder of the header I found
`fuse_main` which looks promising:
```c
/* lines 621-654 */
/**
* Main function of FUSE.
*
* This is for the lazy. This is all that has to be called from the
* main() function.
*
* This function does the following:
* - parses command line options (-d -s and -h)
* - passes relevant mount options to the fuse_mount()
* - installs signal handlers for INT, HUP, TERM and PIPE
* - registers an exit handler to unmount the filesystem on program exit
* - creates a fuse handle
* - registers the operations
* - calls either the single-threaded or the multi-threaded event loop
*
* Note: this is currently implemented as a macro.
*
* @param argc the argument counter passed to the main() function
* @param argv the argument vector passed to the main() function
* @param op the file system operation
* @param user_data user data supplied in the context during the init() method
* @return 0 on success, nonzero on failure
*/
/*
int fuse_main(int argc, char *argv[], const struct fuse_operations *op,
void *user_data);
*/
#define fuse_main(argc, argv, op, user_data) \
fuse_main_real(argc, argv, op, sizeof(*(op)), user_data)
/* ----------------------------------------------------------- *
* More detailed API *
* ----------------------------------------------------------- */
...
```
`fuse_main` needs `argc` and `argv`, which `main` already receives, so we can
pass those through verbatim (no extra header is required for that). It looks
like `op` is a pointer to a `fuse_operations` struct, which itself should
contain pointers to all the relevant filesystem operations. Finally `user_data`:
despite the name, this is not the user who performed the mount (previously we
used `geteuid()` for that). According to the comment it is data "supplied in
the context during the init() method", i.e. an opaque pointer to our own
private data. Hopefully we can ignore that for now.
It seems like the following program should work, as a minimal example...
```c
/* Recommended in the "Version declaration" section. Must precede #include <fuse.h> */
#define FUSE_USE_VERSION 26
#include <fuse.h>
/* standard I/O; argc and argv need no header */
#include <stdio.h>
/* for clock_gettime */
#include <time.h>
/* for geteuid/getegid */
#include <unistd.h>
int fake_stat(const char *restrict pathname, struct stat *restrict stat_buf)
{
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    stat_buf->st_nlink = 1;
    stat_buf->st_uid = geteuid();
    stat_buf->st_gid = getegid();
    stat_buf->st_size = 0;
    stat_buf->st_blocks = 0;
    stat_buf->st_mode = S_IFLNK | 00777;
    stat_buf->st_atim = now;
    stat_buf->st_mtim = now;
    stat_buf->st_ctim = now;
    return 0;
}
int main(int argc, char *argv[]) {
    struct fuse_operations op = {
        .getattr = fake_stat
    };
    /* pass fuse_main's status back to the shell */
    return fuse_main(argc, argv, &op, NULL);
}
```
The only thing unexplained is the syntax for setting the `fuse_operations`
struct, which is a C99 designated initializer: the named member is set and
every other member is zero-initialized. I got that hint from [James Pfeiffer's FUSE
tutorial](https://www.cs.nmsu.edu/~pfeiffer/fuse-tutorial/html/callbacks.html).
I also set `user_data` to `NULL` since I wasn't allowed to leave it out
and I had no idea what to set it to. Obviously then we try to compile it...
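The initializer syntax can be tried out without FUSE at all. In this sketch the struct `demo_operations` and its members are made up to stand in for a callback table like `fuse_operations`; the point is only that members not named in the initializer end up as `NULL`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table of callbacks such as fuse_operations. */
struct demo_operations {
    int (*open_cb)(const char *path);
    int (*close_cb)(const char *path);
    int (*stat_cb)(const char *path);
};

static int demo_stat(const char *path)
{
    assert(path != NULL);
    return 0;
}

/* Only .stat_cb is named; the other members are implicitly set to NULL. */
const struct demo_operations demo_ops = {
    .stat_cb = demo_stat
};
```

A library receiving such a table can check each pointer against `NULL` to tell which operations were actually provided.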
```sh
[daniel@nixos:~/software/diffuse/tmp.fjBCio02Bq]$ gcc test-fs.c
test-fs.c:3:10: fatal error: fuse.h: No such file or directory
3 | #include <fuse.h>
| ^~~~~~~~
compilation terminated.
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ gcc test-fs.c
In file included from /nix/store/wcd4pwkb7nxd6xbwn19rg6sw1id12lgg-fuse-2.9.9/include/fuse/fuse.h:26,
from /nix/store/wcd4pwkb7nxd6xbwn19rg6sw1id12lgg-fuse-2.9.9/include/fuse.h:9,
from test-fs.c:3:
/nix/store/wcd4pwkb7nxd6xbwn19rg6sw1id12lgg-fuse-2.9.9/include/fuse/fuse_common.h:33:2: error: #error Please add -D_FILE_OFFSET_BITS=64 to your compile flags!
33 | #error Please add -D_FILE_OFFSET_BITS=64 to your compile flags!
| ^~~~~
```
You might see either of the above errors. The first is due to my operating system: on
NixOS, you can run multiple versions of the same software simultaneously,
because nothing is installed into the global namespace. A library like FUSE is
installed at an obscure location like
`/nix/store/6ivxlbrn6sxaqwnlihhfidmpl97smpf1-fuse-2.9.9` rather
than the usual location (something like `/usr/lib/fuse`). When you check out a
specific version of the software (called a "profile"), a link is automatically
set up from the usual location to the store, and from there it
works like normal. Something I didn't know previously is that [libraries are
not installed by default](https://nixos.wiki/wiki/FAQ/I_installed_a_library_but_my_compiler_is_not_finding_it._Why%3F),
only applications. The fix is straightforward: I had to set up a development shell:
```nix
# default.nix
with import <nixpkgs> {};
stdenv.mkDerivation {
name = "fuse-devel";
buildInputs = [ pkg-config fuse ];
}
```
Then, when I run `nix-shell`, I see what (I assume) a normal user sees when
they have FUSE installed: They need to put some obscure flag on the command
line in order to compile. This is an instance of a very general problem - that
when you compile a program against a library, the compiler needs to be told all
sorts of information about the library that you (a user of it) don't care about
at all. The tool `pkg-config` was designed to solve exactly this problem: if
the library supports it, you can just run (for example)
```
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ pkg-config fuse --cflags --libs
-D_FILE_OFFSET_BITS=64 -I/nix/store/wcd4pwkb7nxd6xbwn19rg6sw1id12lgg-fuse-2.9.9/include/fuse -L/nix/store/wcd4pwkb7nxd6xbwn19rg6sw1id12lgg-fuse-2.9.9/lib -lfuse -pthread
```
You'll likely see some different flags. But it doesn't matter what they are -
just copy and paste them after `gcc test-fs.c` ([the order
matters!](https://stackoverflow.com/questions/22288456/fuse-functions-not-found-on-compile)
Libraries should come after the files that use them)
and it will compile as desired. Or, better yet, don't copy and use command substitution instead:
```sh
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ gcc test-fs.c `pkg-config fuse --cflags --libs`
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ mkdir -p test && ./a.out test
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ stat test/a
File: test/astat: cannot read symbolic link 'test/a': Function not implemented
Size: 0 Blocks: 0 IO Block: 4096 symbolic link
Device: 0,95 Inode: 2 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ daniel) Gid: ( 100/ users)
Access: 2024-02-11 21:30:46.639525687 +0000
Modify: 2024-02-11 21:30:46.639525687 +0000
Change: 2024-02-11 21:30:46.639525687 +0000
Birth: -
```
It works! Sort of. We have a new directory `test` which allegedly contains some
files, and they're all symbolic links. But the links are sort of broken,
presumably because they aren't pointing anywhere. And we can't meaningfully
navigate the filesystem, because we can't list directories. If you try anything
other than stating a single file you'll get all sorts of stupid errors. We can
close the filesystem and clean up using `fusermount -u`, although since `test`
itself is broken (it doesn't even have an owner) we'll have to use `sudo` to
do that.
```sh
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ stat test/a/b/c
stat: cannot statx 'test/a/b/c': Input/output error
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ stat test/a/../../b
stat: cannot statx 'test/a/../../b': Function not implemented
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ ls test
ls: cannot access 'test': Input/output error
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ fusermount -u test
fusermount: failed to unmount /home/daniel/stow/data/software/diffuse/tmp.fjBCio02Bq/test: Operation not permitted
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ ls -al | grep test
ls: cannot access 'test': Input/output error
d????????? ? ? ? ? ? test
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ sudo fusermount -u test
[sudo] password for daniel:
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ ls test
```
Now is a good time to improve our debugging technique. To avoid using a stale
binary, I recompile the file every time I change anything. I also started
passing some command line arguments to the fuse system to simplify things:
> **FUSE options:**
> * `-d -o debug` enable debug output (implies `-f`)
> * `-f` foreground operation
> * `-s` disable multi-threaded operation
>
> <https://gist.github.com/c4milo/2007941>
```
[nix-shell:~/software/diffuse/tmp.fjBCio02Bq]$ gcc test-fs.c `pkg-config fuse --cflags --libs` && ./a.out -f -s -d test
FUSE library version: 2.9.9
nullpath_ok: 0
nopath: 0
utime_omit_ok: 0
unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.37
flags=0x73fffffb
max_readahead=0x00020000
INIT: 7.19
flags=0x00000011
max_readahead=0x00020000
max_write=0x00020000
max_background=0
congestion_threshold=0
unique: 2, success, outsize: 40
# in another terminal:
[daniel@nixos:~/stow/data/software/diffuse/tmp.fjBCio02Bq]$ stat test
stat: cannot statx 'test': Input/output error
# back in the first terminal:
unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 220481
getattr /
unique: 4, success, outsize: 120
```
Now if you quit with `<C-c>`, the directory will cleanly unmount.
Adding `readlink`
The first and most obvious thing to fix is that links don't point anywhere. To
keep the implementation simple, let's say they all point to some fixed
location, like `/tmp`. The signature can be found in `fuse.h` easily enough:
```c
/* lines 97-105 */
/** Read the target of a symbolic link
*
* The buffer should be filled with a null terminated string. The
* buffer size argument includes the space for the terminating
* null character. If the linkname is too long to fit in the
* buffer, it should be truncated. The return value should be 0
* for success.
*/
int (*readlink) (const char *, char *, size_t);
```
Presumably the first argument is the pathname of the link, and the second
argument is the buffer into which we should write the string `"/tmp"`.
The size is how big the buffer is, so we shouldn't write beyond that
number of bytes. My first thought here was to use something like `memcpy()` or
`strcpy()`, but it turns out there is a variant that takes a length limit:
> **strncpy**
>
> `char * strncpy ( char * destination, const char * source, size_t num );`
>
> Copy characters from string
>
> Copies the first num characters of source to destination. If the end of the
> source C string (which is signaled by a null-character) is found before num
> characters have been copied, destination is padded with zeros until a total
> of num characters have been written to it.
>
> <https://cplusplus.com/reference/cstring/strncpy/>
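One caveat that the quoted description only implies: if the source is at least `num` characters long, `strncpy` stops after `num` characters and writes no terminating null. Since the `readlink` contract asks for a null terminated (possibly truncated) string, the terminator has to be added by hand. A minimal sketch, with a helper name (`copy_truncated`) of my own invention:

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst (capacity `size` bytes), truncating if needed and always
 * leaving dst null terminated. Returns 1 if truncation happened, else 0. */
int copy_truncated(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    strncpy(dst, src, size - 1);   /* may stop before writing a '\0' */
    dst[size - 1] = '\0';          /* so terminate explicitly */
    return strlen(src) >= size;
}
```

With a 3-byte buffer, copying `"/tmp"` leaves `"/t"` in the buffer and reports truncation.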
The updated version is:
```c
/* new header */
/* for strncpy */
#include <string.h>
/* unchanged... */
int fake_stat(const char *restrict pathname, struct stat *restrict stat_buf) { /* body as before */ }
int fake_readlink(const char *restrict pathname, char *dest_buf, size_t length) {
    const char *dest = "/tmp";
    if (length == 0) { return 0; }       /* no room even for the terminator */
    strncpy(dest_buf, dest, length - 1); /* truncates if the target is too long */
    dest_buf[length - 1] = '\0';         /* strncpy does not terminate when it truncates */
    return 0;                            /* fuse.h says: truncate and return 0 */
}
int main(int argc, char *argv[]) {
    struct fuse_operations op = {
        .getattr = fake_stat,
        .readlink = fake_readlink
    };
    return fuse_main(argc, argv, &op, NULL);
}
```
Now if I stat any file immediately under `test`, I get a nice symlink to
`/tmp`. But if I stat anything of the form `test/a/b`, I get "no such file or
directory", and if I stat `test` (the root directory) I get an Input/Output
error and have to restart the filesystem completely.
Second (more realistic) attempt: Derive the file's type from its name
A more realistic filesystem is as follows:
* The file at the root path is a directory.
* Any file whose basename starts with `dir` is a directory.
* Any file whose basename starts with `file` is a regular file.
* Any file whose basename starts with `link` is a symbolic link to the file whose basename has `link` replaced by `file`.
This is still not completely realistic: every directory contains an infinite
number of entries. But it will still partially work - we just won't implement
the method for listing directories, and trying to do so will crash the
filesystem.
Our `stat` function will be very similar to the previous one. In fact, the only
thing that needs to change is that the filetype bits of `st_mode` need to be
computed from the pathname. Here is the entire function:
```c
int fake_stat(const char *restrict pathname, struct stat *restrict stat_buf)
{
struct timespec now;
clock_gettime(CLOCK_REALTIME, &now);
stat_buf->st_nlink = 1;
stat_buf->st_uid = geteuid();
stat_buf->st_gid = getegid();
stat_buf->st_size = 0;
stat_buf->st_blocks = 0;
stat_buf->st_mode = getftype(pathname) | 00777;
stat_buf->st_atim = now;
stat_buf->st_mtim = now;
stat_buf->st_ctim = now;
return 0;
}
```
Obviously the interesting thing is the new function `getftype()`. It needs to do
two jobs. First, identify the basename of our file. Second, match that basename
against the prefixes `dir`, `file` and `link`. It's possible to do the first job
with string manipulation directly, but it will be much simpler to use a library
function. The basename of a file path is basically everything following the
last slash - the "file part". This is defined carefully by POSIX, and there are
functions `dirname()` and `basename()` in the C library (declared in
`<libgen.h>`; they are not part of ISO C itself) which take a
pathname as argument. [The manual](https://man7.org/linux/man-pages/man3/basename.3.html)
explains in detail:
> The functions `dirname()` and `basename()` break a null-terminated
> pathname string into directory and filename components. In the
> usual case, `dirname()` returns the string up to, but not
> including, the final `/`, and `basename()` returns the component
> following the final `/`. Trailing `/` characters are not counted
> as part of the pathname.
>
> If `path` does not contain a slash, `dirname()` returns the string
> `"."` while `basename()` returns a copy of path. If path is the
> string `"/"`, then both `dirname()` and `basename()` return the string
> `"/"`. If `path` is a null pointer or points to an empty string,
> then both `dirname()` and `basename()` return the string `"."`.
>
> Concatenating the string returned by `dirname()`, a `"/"`, and the
> string returned by `basename()` yields a complete pathname.
Interestingly, this last sentence sounds wrong: in the obvious special case of
the root path, which they mention explicitly, concatenating `dirname()`, `/`
and `basename()` would give `///`, which doesn't look like it's a valid pathname.
But it is! On Linux, `/`, `//`, `///`, and so on are all valid pathnames referring to the
root directory (POSIX leaves the meaning of exactly two leading slashes up to
the implementation). They all have basename `/` and dirname `/`. Repeated
slashes work anywhere: for example, `.` and `.///./././././////./` are equivalent pathnames.
Diversion: POSIX basename vs GNU basename
For some reason, there are multiple implementations of these simple sounding
functions. Indeed, [the manual continues](https://man7.org/linux/man-pages/man3/basename.3.html):
> There are two different versions of `basename()` - the POSIX
> version described above, and the GNU version, which one gets
> after
>
```c
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <string.h>
```
>
> The GNU version never modifies its argument, and returns the
> empty string when path has a trailing slash, and in particular
> also when it is `"/"`. **There is no GNU version of `dirname()`**.
>
> With glibc, one gets the POSIX version of `basename()` when
> `<libgen.h>` is included, and the GNU version otherwise.
I actually don't find this explanation super clear. [The documentation for
feature macros](https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html)
is fine and says pretty much what you expect, but in this case it sounds like
to change from the POSIX one to the GNU one you don't *just* `#define
_GNU_SOURCE`, you also have to change from including `<libgen.h>` to including
`<string.h>`.
First of all, the POSIX version. According to the manual, including
`<libgen.h>` selects it whether or not `_GNU_SOURCE` is defined.
```c
#include <libgen.h>
#include <stdio.h>
// For PATH_MAX;
// Not really necessary but feels better than choosing an arbitrary limit
#include <limits.h>
#include <string.h> // for strcpy
int main() {
char buf[PATH_MAX];
char *examples[] = { "/usr/lib", "/usr/", "usr", "/" };
int i;
for(i = 0; i <= 3; i++) {
strcpy(buf, examples[i]);
printf("%s\t%s/%s\n", buf, dirname(buf), basename(buf));
}
return 0;
}
```
Then the results are as expected (the two columns should always print two
equivalent paths, as discussed in the previous section).
```
[daniel@nixos:/tmp/tmp.ugjpoW14w8]$ gcc feature_test.c && ./a.out
/usr /usr/lib
/ //
usr ./usr
/ ///
```
Wait... That's not right! The first column should say `/usr/lib` in the first
row. Of course, what's happening is that all the arguments to `printf` are
evaluated before `printf` itself runs, and since `dirname()` and `basename()`
modify the buffer in place, column 1 ends up showing the modified buffer.
Easy to fix, with some boilerplate:
```c
#include <libgen.h>
#include <stdio.h>
// For PATH_MAX;
// Not really necessary but feels better than choosing an arbitrary limit
#include <limits.h>
#include <string.h> // for strcpy
int main() {
char a[PATH_MAX], b[PATH_MAX], c[PATH_MAX];
char *examples[] = { "/usr/lib", "/usr/", "usr", "/" };
int i;
for(i = 0; i <= 3; i++) {
// Lots of stupid boilerplate. Maybe there's a nicer way?
strcpy(a, examples[i]);
strcpy(b, dirname(a));
strcpy(a, examples[i]);
strcpy(c, basename(a));
strcpy(a, examples[i]);
printf("%s\t\t%s/%s\n", a, b, c);
}
return 0;
}
```
My fancy tabular layout is broken now, but at least the answers are correct:
```
[daniel@nixos:/tmp/tmp.ugjpoW14w8]$ gcc feature_test.c && ./a.out
/usr/lib /usr/lib
/usr/ //usr
usr ./usr
/ ///
```
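As for the "maybe there's a nicer way?" comment in the code: one option is to hide the copying inside a helper, so the caller never sees a modified buffer. This is only a sketch, and `split_path` is a name I made up. It keeps a private scratch copy for each call to the POSIX functions and writes the results into caller-supplied buffers.

```c
#define _XOPEN_SOURCE 700   /* for PATH_MAX under strict compiler modes */
#include <assert.h>
#include <libgen.h>   /* POSIX dirname() and basename() */
#include <limits.h>   /* PATH_MAX */
#include <stdio.h>    /* snprintf */
#include <string.h>   /* strlen, strcpy */

/* Hypothetical helper: fill dir and base (each at least PATH_MAX bytes)
 * from path. Returns 0 on success, -1 if path does not fit in PATH_MAX. */
int split_path(const char *path, char *dir, char *base)
{
    char scratch[PATH_MAX];
    assert(path != NULL && dir != NULL && base != NULL);
    if (strlen(path) >= sizeof scratch) { return -1; }

    /* dirname() may modify its argument, so work on a private copy. */
    strcpy(scratch, path);
    snprintf(dir, PATH_MAX, "%s", dirname(scratch));

    /* Refresh the copy before basename(), which may modify it too. */
    strcpy(scratch, path);
    snprintf(base, PATH_MAX, "%s", basename(scratch));
    return 0;
}
```

The loop body then shrinks to one `split_path(examples[i], b, c)` call followed by the `printf`, with `examples[i]` itself as the first column.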
What about the GNU version? Well this test example should be much simpler to write,
```c
#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>
int main() {
char *examples[] = { "/usr/lib", "/usr/", "usr", "/" };
int i;
for(i = 0; i <= 3; i++) {
printf("%s\t%s\n", examples[i], basename(examples[i]));
}
return 0;
}
```
```
[daniel@nixos:/tmp/tmp.ugjpoW14w8]$ gcc feature_test.c && ./a.out
/usr/lib lib
/usr/
usr usr
/
```
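but why on earth do they implement `basename()` and not `dirname()`? Anyway:
* You don't need `<libgen.h>` in this case.
* Changing `<string.h>` to `<libgen.h>` segfaults, presumably because the POSIX version is allowed to modify its argument and the examples here are string literals.
* Having `<string.h>` and `<libgen.h>` also segfaults, for the same reason: including `<libgen.h>` selects the POSIX version.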
Full example
Since we only need `basename`, the program is a little simpler with the GNU
version, which never modifies its argument. That matters because `pathname` is
declared `const`: the buffer belongs to FUSE, and even if writing to it didn't
segfault, modifying it could give unexpected results.
Full example (including the implementation of `getftype()`) below.
```c
/* Recommended in the "Version declaration" section. Must precede #include <fuse.h> */
#define FUSE_USE_VERSION 26
/* Selects the GNU basename() from <string.h>. Must precede every #include */
#define _GNU_SOURCE
#include <fuse.h>
/* standard I/O; argc and argv need no header */
#include <stdio.h>
/* for clock_gettime */
#include <time.h>
/* for geteuid/getegid */
#include <unistd.h>
/* for ENOENT and other error codes */
#include <errno.h>
/* for basename, strncpy, strrchr and memcpy */
#include <string.h>
#define ISPREFIX(pre, str) (strncmp(pre, str, strlen(pre)) == 0)
/* Returns the file type bits, or a negative errno value (hence int, not unsigned) */
int getftype(const char *restrict pathname) {
    if(strcmp("/", pathname) == 0) { return S_IFDIR; } // special case for root directory
    const char *base = basename(pathname);
    if(ISPREFIX("dir", base)) { return S_IFDIR; }
    if(ISPREFIX("file", base)) { return S_IFREG; }
    if(ISPREFIX("link", base)) { return S_IFLNK; }
    return -ENOENT;
}
int fake_stat(const char *restrict pathname, struct stat *restrict stat_buf)
{
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    stat_buf->st_nlink = 1;
    stat_buf->st_uid = geteuid();
    stat_buf->st_gid = getegid();
    stat_buf->st_size = 0;
    stat_buf->st_blocks = 0;
    int type = getftype(pathname);
    if(type < 0) { return type; }
    else { stat_buf->st_mode = type | 00777; }
    stat_buf->st_atim = now;
    stat_buf->st_mtim = now;
    stat_buf->st_ctim = now;
    return 0;
}
int fake_readlink(const char *restrict pathname, char *dest_buf, size_t length) {
    if(!ISPREFIX("link", basename(pathname))) { return -EINVAL; } /* not a symlink */
    if(length == 0) { return 0; } /* no room even for the terminator */
    /* pathname belongs to FUSE, so edit a copy in dest_buf instead */
    strncpy(dest_buf, pathname, length - 1);
    dest_buf[length - 1] = '\0'; /* strncpy does not terminate when it truncates */
    char *base = strrchr(dest_buf, '/');
    if(base != NULL && ISPREFIX("link", base + 1)) { memcpy(base + 1, "file", 4); }
    return 0;
}
int main(int argc, char *argv[]) {
    struct fuse_operations op = {
        .getattr = fake_stat,
        .readlink = fake_readlink
    };
    return fuse_main(argc, argv, &op, NULL);
}
```
After mounting, this produces the following results. Note that the system correctly
infers that `test/xyz/link-1` doesn't exist (because its parent `test/xyz`
doesn't exist either).
```
$ stat test/dir1 test/dir1/file-abcd test/dir1/link-abcd test/xyz test/xyz/link-1
File: test/dir1
Size: 0 Blocks: 0 IO Block: 4096 directory
Device: 0,132 Inode: 2 Links: 1
Access: (0777/drwxrwxrwx) Uid: ( 1000/ daniel) Gid: ( 100/ users)
Access: 2024-03-02 16:23:24.199837974 +0000
Modify: 2024-03-02 16:23:24.199837974 +0000
Change: 2024-03-02 16:23:24.199837974 +0000
Birth: -
File: test/dir1/file-abcd
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 0,132 Inode: 3 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1000/ daniel) Gid: ( 100/ users)
Access: 2024-03-02 16:23:24.201383499 +0000
Modify: 2024-03-02 16:23:24.201383499 +0000
Change: 2024-03-02 16:23:24.201383499 +0000
Birth: -
File: test/dir1/link-abcd -> /dir1/file-abcd
Size: 0 Blocks: 0 IO Block: 4096 symbolic link
Device: 0,132 Inode: 4 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ daniel) Gid: ( 100/ users)
Access: 2024-03-02 16:23:24.202712664 +0000
Modify: 2024-03-02 16:23:24.202712664 +0000
Change: 2024-03-02 16:23:24.202712664 +0000
Birth: -
stat: cannot statx 'test/xyz': No such file or directory
stat: cannot statx 'test/xyz/link-1': No such file or directory
```
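If depending on `_GNU_SOURCE` feels uncomfortable, the same classification can be done with plain `strrchr()`, which sidesteps the POSIX/GNU `basename()` question entirely. The sketch below is an alternative I am suggesting, not what the program above uses. The name `classify_path` is made up, and like the GNU `basename()` it treats a path with a trailing slash as having an empty last component.

```c
#define _XOPEN_SOURCE 700   /* for S_IFDIR, S_IFREG, S_IFLNK under strict compiler modes */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Returns S_IFDIR, S_IFREG or S_IFLNK according to the naming rules,
 * or -ENOENT when the last path component matches none of them. */
int classify_path(const char *pathname)
{
    assert(pathname != NULL);
    if (strcmp(pathname, "/") == 0) { return S_IFDIR; }   /* root directory */

    /* The last component starts just after the final slash, if there is one. */
    const char *slash = strrchr(pathname, '/');
    const char *name = (slash != NULL) ? slash + 1 : pathname;

    if (strncmp(name, "dir", 3) == 0)  { return S_IFDIR; }
    if (strncmp(name, "file", 4) == 0) { return S_IFREG; }
    if (strncmp(name, "link", 4) == 0) { return S_IFLNK; }
    return -ENOENT;
}
```

Because it only reads `pathname`, it is also safe to call on string literals and on the `const` buffer that FUSE hands to the callbacks.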