Extract The Identical Beginning Parts Of Multiple Strings

I have multiple strings (so-called DOIs) like this:

doi1 <- "10.1057/bp.2009.9"
doi2 <- "10.1057/bp.2015.4"
doi3 <- "10.1057/bp.2008.12"

How do I best extract the common beginnings of the strings?

The correct output should be "10.1057/bp.20".

(My first guess was to use identical(), but that function can only compare two whole strings.)

Answer

The Bioconductor package ‘Biobase’ implements this as lcPrefix.
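
A minimal usage sketch, assuming ‘Biobase’ is installed (it is distributed through Bioconductor rather than CRAN). The vector name `dois` and the variable `prefix` are only illustrative; the call is skipped when the package is missing:

```r
# The three DOIs from the question, collected into one character vector.
dois = c("10.1057/bp.2009.9", "10.1057/bp.2015.4", "10.1057/bp.2008.12")

# Biobase can be installed with BiocManager::install("Biobase").
# If it is not available, `prefix` is left as NULL instead of failing.
prefix = if (requireNamespace("Biobase", quietly = TRUE)) {
    Biobase::lcPrefix(dois)
} else {
    NULL
}
print(prefix)
```

With the package available, this should print the prefix the question asks for.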

But implementing this oneself isn’t hard; here’s another quick and dirty version (careful, this was only tested on a handful of cases):

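find_longest_prefix = function (strings) {
    stopifnot(is.character(strings), length(strings) > 0L, ! anyNA(strings))

    # Length of the longest common prefix found so far; stays 0 if the
    # strings share nothing (or if one of them is empty).
    prefix_len = 0L
    # The prefix cannot be longer than the shortest string.
    for (len in seq_len(min(nchar(strings)))) {
        prefixes = substr(strings, 1L, len)
        # Stop at the first length where the prefixes are no longer all equal.
        if (! all(prefixes == prefixes[1L])) break
        prefix_len = len
    }
    substr(strings[1L], 1L, prefix_len)
}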
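
As an alternative to the loop above, here is a vectorized sketch of the same idea: split the strings into single characters, line them up position by position, and cut at the first position where they disagree. The name `common_prefix` is hypothetical, and like the version above this has only been checked on a few inputs:

```r
# Hypothetical helper, not part of the original answer.
common_prefix = function (strings) {
    stopifnot(is.character(strings), length(strings) > 0L, ! anyNA(strings))

    # Only the first `n` characters can possibly be shared by all strings.
    n = min(nchar(strings))
    if (n == 0L) return("")

    # Matrix with one row per character position and one column per string.
    chars = do.call(cbind, strsplit(substr(strings, 1L, n), ""))

    # TRUE for every position where all strings agree with the first one.
    same = apply(chars == chars[, 1L], 1L, all)

    # The prefix ends just before the first disagreement, if there is one.
    mismatch = which(! same)
    len = if (length(mismatch) > 0L) mismatch[1L] - 1L else n
    substr(strings[1L], 1L, len)
}

doi1 <- "10.1057/bp.2009.9"
doi2 <- "10.1057/bp.2015.4"
doi3 <- "10.1057/bp.2008.12"

print(common_prefix(c(doi1, doi2, doi3)))
```

For the three DOIs from the question this prints "10.1057/bp.20". Unlike the loop, it always inspects every position up to the length of the shortest string, so the loop can be faster when the strings diverge early.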
source: stackoverflow.com