C++ Replacement for getopt

Introduction #

These days command-line programs are not as popular as they used to be. However, from time to time, it’s easier to write one than to build a GUI-based program. In the world of C, programmers have used the getopt() function to parse command-line arguments since time immemorial. Technically, getopt() is not a C feature but a POSIX one, which is why it is not even available if you are using Microsoft Visual C++.

The code shown below is a simple C++ replacement for the traditional C getopt() parser with most of the flexibility required by the POSIX standard. The two files required, options.h and options.cpp, are part of the mlib project and can be downloaded from the GitHub repo.

Sample usage #

Let’s say we have to parse a command line like this:

testopt -y --param p1 p2 p3 -o 123 arg1 arg2 arg3

Below is the program that can do this.

#include <mlib/options.h>
#include <cstdlib>
#include <iostream>
#include <string>

using namespace std;
using mlib::OptParser;


int main(int argc, char** argv)
{
  OptParser opt{
    "h|help \t show help message",
    "y| \t boolean flag",
    "n| \t another boolean flag",
    "p+param parameters \t one or more parameters",
    "o:option value \t optional value",
    "*stuff things \t option with zero or more arguments"
  };

This declares the parser object and defines the allowed options. We will soon see the exact syntax for option definitions.

  int nonopt;
  if (opt.parse (argc, argv, &nonopt) != 0)
  {
    cout << "Syntax error. Usage:" <<endl;
    cout << opt.synopsis () << endl << "Where:" << opt.description () << endl;
    exit (1);
  }

The opt.parse() function parses the command line arguments. nonopt is set to the index of the first non-option argument. For our sample command line, nonopt will be 8, the index of arg1.

  string par;
  if (opt.getopt ("param", par))
  {
    cout << "params:" << par << endl;
  }

The opt.getopt() function returns a non-zero value if an option is present on the command line. If the option can have arguments, they are returned as a string (par in our example) separated by a user-defined character (by default, the vertical bar ‘|’).

  if (opt.hasopt ('y'))
    cout << "Yes option set" << endl;
}

The opt.hasopt() function returns true if an option is present on the command line.
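
When the string form of getopt() is used, the arguments arrive joined by the separator character, so they may need to be split again. The helper below is a minimal sketch of that step; split_args is a hypothetical name and is not part of mlib (the vector overloads of getopt() described later avoid the need for it).

```cpp
#include <string>
#include <vector>

// Split a separator-joined argument string (e.g. "p1|p2|p3") into its parts.
// An empty input yields an empty vector.
// Hypothetical helper for illustration; not part of the library.
std::vector<std::string> split_args(const std::string& joined, char sep = '|')
{
  std::vector<std::string> parts;
  if (joined.empty())
    return parts;

  std::string::size_type start = 0;
  while (true)
  {
    const auto pos = joined.find(sep, start);
    if (pos == std::string::npos)
    {
      parts.push_back(joined.substr(start));  // last (or only) piece
      break;
    }
    parts.push_back(joined.substr(start, pos - start));
    start = pos + 1;                          // skip the separator
  }
  return parts;
}
```

Calling split_args(par) on the string from the sample program would give one vector element per argument. A sketch like this cannot tell a separator from a literal ‘|’ inside an argument, which is one reason to pick a different separator or use the vector overloads.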

Command Line Syntax Understood by OptParser #

According to the POSIX standard, a command line has three parts: the command name, options with their arguments, and operands. The standard describes only short options, made of one character preceded by a hyphen (-); however, long options preceded by -- are also common. The arguments that follow the last option and its arguments are called operands. If one of the arguments is --, option processing stops at that point and all remaining arguments are considered operands.

Options can have one or more arguments. If an option has multiple arguments, the arguments end where the next option starts, at the -- argument, or at the end of the command line. Short options can be combined behind a single hyphen, provided they don’t have arguments (except possibly the last one). For instance, instead of writing:

command -a -b -c

one can write

command -abc
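
To make the equivalence concrete, here is a rough sketch of how a cluster of short options could be expanded into individual options. expand_short_options is a hypothetical helper, not an OptParser function, and it assumes that none of the clustered options takes an argument.

```cpp
#include <string>
#include <vector>

// Expand a cluster of short options such as "-abc" into {"-a", "-b", "-c"}.
// Long options ("--name"), the "--" terminator, a lone "-" and plain
// arguments are returned unchanged.
// Assumption: no option in the cluster takes an argument.
std::vector<std::string> expand_short_options(const std::string& token)
{
  const bool is_cluster =
      token.size() > 2 && token[0] == '-' && token[1] != '-';
  if (!is_cluster)
    return {token};

  std::vector<std::string> expanded;
  for (std::string::size_type i = 1; i < token.size(); ++i)
    expanded.push_back(std::string("-") + token[i]);
  return expanded;
}
```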

Options can be repeated. In this case, if the option has arguments, all arguments are accumulated. For instance, the following two lines are equivalent:

command -a arg1 arg2 -b
command -a arg1 -b -a arg2
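
The accumulation rule can be illustrated with a small stand-alone sketch. It is not how OptParser is implemented; accumulate_args and OptionArgs are hypothetical names, and the code makes the simplifying assumption that every token starting with a hyphen is an option and every other token belongs to the most recent option.

```cpp
#include <map>
#include <string>
#include <vector>

using OptionArgs = std::map<std::string, std::vector<std::string>>;

// Group arguments under the option that precedes them.
// Simplifying assumption: a token starting with '-' is an option, any other
// token is an argument of the most recently seen option.
OptionArgs accumulate_args(const std::vector<std::string>& tokens)
{
  OptionArgs result;
  std::string current;
  for (const auto& tok : tokens)
  {
    if (tok.size() > 1 && tok[0] == '-')
    {
      current = tok;
      (void)result[current];          // record options that have no arguments
    }
    else if (!current.empty())
      result[current].push_back(tok); // append to what was collected before
  }
  return result;
}
```

Fed with the tokens of either command line above (without the command name), the function produces the same map: -a holds arg1 and arg2, and -b holds nothing.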

It is customary to describe command line syntax using a synopsis with some type of BNF notation like this:

command -a <arg_a> -b [<arg_b>...] -c|--clong <operand1> <operand2>

where optional arguments are enclosed in square brackets and alternatives are separated by ‘|’. Arguments that can repeat one or more times are followed by ‘...’ (ellipsis).

Defining Options #

Each valid option is described by a descriptor string with the following syntax:

[<short>] <flag> [<long>] [<spaces><parameter>] [\t<description>]

where

  • <short> - a single character that is the short form of the option.
  • <flag> - one character that specifies the number of arguments that can follow the option:
    • ‘|’ - no arguments
    • ‘:’ - one required argument
    • ‘?’ - one optional argument
    • ‘+’ - one or more arguments
    • ‘*’ - zero or more arguments
  • <long> - a string that specifies the long form of the option
  • <parameter> - a string that is used as parameter name in synopsis
  • <description> - a string used for option description

Parameter name and description are separated by a tab (\t) character. Either the long form or the short form of an option can be missing.
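
To show how the pieces of a descriptor fit together, the following sketch splits a descriptor string into its fields. It only illustrates the syntax described above; OptionSpec, trim and parse_descriptor are hypothetical names, and the actual parsing code in options.cpp may well treat edge cases differently.

```cpp
#include <string>

// Fields of one option descriptor (hypothetical struct for illustration).
struct OptionSpec
{
  char short_form = '\0';   // '\0' when the option has no short form
  char flag = '|';          // one of | : ? + *
  std::string long_form;    // empty when the option has no long form
  std::string parameter;    // parameter name shown in the synopsis
  std::string description;  // text after the tab character
};

// Remove leading and trailing spaces and tabs.
static std::string trim(const std::string& s)
{
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string::npos)
    return "";
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Split a descriptor such as "p+param parameters \t one or more parameters".
// Returns false if no flag character is found where one is expected.
bool parse_descriptor(const std::string& descr, OptionSpec& spec)
{
  const auto npos = std::string::npos;
  const std::string flags = "|:?+*";
  spec = OptionSpec{};

  // Everything after the first tab is the description.
  const auto tab = descr.find('\t');
  const std::string head = descr.substr(0, tab);
  if (tab != npos)
    spec.description = trim(descr.substr(tab + 1));

  // A flag in second position means the first character is the short form;
  // otherwise the descriptor must start with the flag itself.
  std::string::size_type pos = 0;
  if (head.size() > 1 && flags.find(head[1]) != npos)
  {
    spec.short_form = head[0];
    spec.flag = head[1];
    pos = 2;
  }
  else if (!head.empty() && flags.find(head[0]) != npos)
  {
    spec.flag = head[0];
    pos = 1;
  }
  else
    return false;

  // The long form runs up to the first space; the parameter name follows.
  const auto space = head.find(' ', pos);
  spec.long_form = head.substr(pos, space == npos ? npos : space - pos);
  if (space != npos)
    spec.parameter = trim(head.substr(space));
  return true;
}
```

For "*stuff things \t option with zero or more arguments" the sketch yields no short form, the flag ‘*’, the long form stuff and the parameter name things.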

OptParser API #

Some of the functions have been mentioned before.

Constructors #

  OptParser ()                                          [1]

Default constructor creates a parser object with an empty list of valid options.

OptParser (std::vector<const char*> &list)              [2]

Initializes parser and sets the list of valid options.

OptParser (const char **list)                           [3]

Initializes parser and sets the list of option descriptors. The argument is a list of C strings terminated with a NULL pointer.

OptParser (std::initializer_list<const char*> list)     [4]

Initializes parser and sets the list of valid options. The initializer list does not need a NULL terminator. See the sample code at the beginning for an example.

Member Functions #

void add_option (const char* descr)                         [1]

Adds a new option descriptor to the list of valid options.

void set_options (std::vector <const char*> &list)          [2]

Sets the list of valid options. Any previous options are removed and the new ones are added.

int parse (int argc, const char* const* argv, int* stop=0)  [3]

Parses command line arguments. stop is a pointer to an integer value that, if not null, receives the index in the argv array of the first non-option argument. If there are no non-option arguments, *stop == argc.

If successful, the function returns 0. Otherwise it returns an error code:

  • 1 = Unknown option
  • 2 = Required argument missing
  • 3 = Invalid multiple options string

If an error occurred, the stop argument, if not null, is the index in the argv array of the argument that triggered the error.

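
For error reporting it can be handy to turn the return value of parse() into a readable message. The function below is only a sketch: the numeric codes are the ones listed above, while parse_error_text and the message wording are assumptions and not part of the library.

```cpp
#include <string>

// Translate the codes returned by parse() into messages.
// The numeric values come from the documented list; the function name and
// the message wording are hypothetical.
std::string parse_error_text(int code)
{
  switch (code)
  {
  case 0:  return "no error";
  case 1:  return "unknown option";
  case 2:  return "required argument missing";
  case 3:  return "invalid multiple options string";
  default: return "unrecognized error code";
  }
}
```

Combined with the index stored through stop, a program could print something like the message followed by argv[stop] to point at the offending argument.
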
int getopt (char option, std::string& optarg, char sep='|') const               [4]
int getopt (const std::string& option, std::string& optarg, char sep='|') const [5]
int getopt (char option, std::vector<std::string>& optarg) const                [6]
int getopt (const std::string& option, std::vector<std::string>& optarg) const  [7]

Returns the arguments of a specific option from the command line. The function returns the number of times the option occurs on the command line. For [4] and [5], the optarg argument receives a string containing all the option’s arguments separated by the sep character. Forms [6] and [7] return the option arguments as a vector of strings. The option can be specified using either the short form (in [4] and [6]) or the long form (in [5] and [7]).

bool  hasopt (const std::string &option) const          [8]
bool  hasopt (char option) const                        [9]

Checks if an option is present on the command line. Option can be specified using either the short form [9] or the long form [8].

bool  next (std::string& opt, std::string& optarg, char sep='|')  [10]
bool  next (std::string& opt, std::vector<std::string>& optarg)   [11]

Returns the next option on the command line. The parser maintains an internal iterator that is initialized to the first available option when command line is parsed. At each call of a next function, the iterator is incremented and the function returns the next option and its arguments. The function returns false if there are no more options.

Form [10] returns the option’s arguments as a single string, separated by the sep character. Form [11] returns the arguments as a vector of strings.

If an option has both a long and a short form, the next() function returns the long form.

const std::string&  appname () const  [12]

Returns the program name. This is the content of argv[0] with directory path and extension removed.
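
The transformation from argv[0] to the program name can be sketched as follows. strip_appname is a hypothetical stand-alone function written for illustration; how the library itself handles corner cases such as names with several dots is not covered here.

```cpp
#include <string>

// Reduce a program path such as "/usr/bin/testopt" or "C:\\bin\\testopt.exe"
// to the bare program name "testopt".
// Hypothetical helper; assumes the extension starts at the last dot.
std::string strip_appname(const std::string& argv0)
{
  // Drop everything up to the last path separator ('/' or '\').
  const auto slash = argv0.find_last_of("/\\");
  std::string name =
      (slash == std::string::npos) ? argv0 : argv0.substr(slash + 1);

  // Drop the extension, but keep names that merely start with a dot.
  const auto dot = name.find_last_of('.');
  if (dot != std::string::npos && dot > 0)
    name.erase(dot);
  return name;
}
```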

const std::string synopsis () const [13]

Generates a nicely formatted syntax string. For the example shown before, the synopsis string is:

appname -h|--help -y -n -p|--param <parameters>... -o|--option <value> --stuff [things ...]

Here appname is the actual name of the executable as returned by the appname() function. Any non-option parameters can be added at the end of the synopsis string.
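
The way each flag character shapes its synopsis entry can be sketched with a short function. The formats for ‘|’, ‘:’, ‘+’ and ‘*’ are copied from the synopsis shown above; the format used for ‘?’ is a guess, and synopsis_entry itself is a hypothetical helper rather than library code.

```cpp
#include <string>

// Build the synopsis text for one option from its descriptor fields.
// short_form is '\0' and long_form is empty when that form is missing.
std::string synopsis_entry(char short_form, const std::string& long_form,
                           char flag, const std::string& parameter)
{
  std::string text;
  if (short_form != '\0')
    text += std::string("-") + short_form;
  if (!long_form.empty())
  {
    if (!text.empty())
      text += "|";                 // separates the short and long forms
    text += "--" + long_form;
  }

  switch (flag)
  {
  case ':': text += " <" + parameter + ">";    break;  // one required argument
  case '+': text += " <" + parameter + ">..."; break;  // one or more arguments
  case '*': text += " [" + parameter + " ...]"; break; // zero or more arguments
  case '?': text += " [<" + parameter + ">]";  break;  // assumed format
  default:  break;                                     // '|' takes no argument
  }
  return text;
}
```

Joining the entries for all options with spaces, after the program name, gives a line of the same shape as the synopsis above.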

const std::string  description (size_t indent_size=2) const  [14]

Generates a nicely formatted description string. For the example shown before, the description string is:

  -h|--help                   show help message
  -y                          boolean flag
  -n                          another boolean flag
  -p|--param <parameters>...  one or more parameters
  -o|--option <value>         optional value
  --stuff [things ...]        option with zero or more arguments

In Case You Are Wondering #

  • How are quoted strings handled? They are not. OptParser relies on the operating system to break arguments on the command line.
  • If arguments are combined, why do you return the number of option occurrences? Because it allows you to do fancy stuff like increasing the level of verbosity if a -v option is specified more than once.
  • Some command line parsers allow you to have the argument value separated by an equal sign (like -o=123). Can yours do that? No. First of all, that syntax is outside the POSIX specification, and it would also open a can of worms about argument quoting.
  • How efficient is OptParser? Well, not particularly efficient. For all its storage needs, OptParser uses strings and vectors of strings. Normally, options parsing is a one-time activity and its impact on the execution time is minimal. Efficiency was not a design goal.
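
As an illustration of the verbosity idea mentioned above, the sketch below counts how often a short option occurs. With OptParser the count is simply the return value of getopt(); count_short_option is a hypothetical stand-alone function that shows the principle without the library, and it assumes that no option argument is glued to a short option.

```cpp
#include <string>
#include <vector>

// Count how many times a short option appears, including inside clusters
// such as "-vvv". Scanning stops at the "--" terminator.
// Assumption: arguments are never attached directly to a short option.
int count_short_option(const std::vector<std::string>& args, char option)
{
  int count = 0;
  for (const auto& arg : args)
  {
    if (arg == "--")
      break;                       // everything after "--" is an operand
    if (arg.size() < 2 || arg[0] != '-' || arg[1] == '-')
      continue;                    // not a short option or cluster
    for (std::string::size_type i = 1; i < arg.size(); ++i)
      if (arg[i] == option)
        ++count;
  }
  return count;
}
```

A program could map the result to a log level, for example treating each additional -v as one more level of detail.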

Alternative Solutions #

There are other packages that provide a similar functionality. In case you want to look at alternatives you can check:

  • Boost Program Options - Like all things Boost, it is very complete and sophisticated and BIG.
  • LLVM CommandLine - Using LLVM only for command line parsing seems like overkill to me. Just like Boost, if you need it for other reasons, it’s certainly worth looking into.
  • Taywee args
  • args-parser
  • tclap - A brief reading of the documentation left me wondering if it isn’t suffering from lack of focus (mission creep). Instead of concentrating only on parsing the command line, it drifts into parsing individual elements of the command line, converting them to integers, floats, etc. It then adds validators, visitors and what not. With almost 40 classes, it seems baroque.
  • Argumentum - I’ve just discovered it (05-Apr-23)

I haven’t tried all of them, but I’d love to hear other opinions.