R Extension for MediaWiki v0.06

Installation

Prerequisites

I have not tested the extension/plugin on many systems. These are the systems and software versions that we have used or tested:

Version | Operating system | MediaWiki | PHP | MySQL | ImageMagick | R | Contact
0.02 | Suse Linux 9.1 | | | | | 2.0.1 |
0.02 | Suse Linux 10.0 | 1.5.5 | 4.4.0 | 4.1.13 | 6.2.3 | 2.1.1 | sigbert@wiwi.hu-berlin.de
0.03 | Suse Linux 10.0 | 1.6.5 | 4.4.0 | 4.1.13 | 6.2.3 | 2.3.0 | sigbert@wiwi.hu-berlin.de
0.03 | Debian 3.1rev2 | 1.6.8 | 4.4.2-0.dotdeb.2 | 4.0.24_Debian-10sarge2-log | 6.0.6 | 2.1.0 | sigbert@wiwi.hu-berlin.de
0.03 | Ubuntu 5.10 | 1.6.6 | 5.0.5 | 4.1.12 | 6.2.3.4 | 2.1.1 | acrida@arrakis.es
0.03 | Ubuntu 6.06.1 LTS Server | 1.7.1 | 5.1.2 | 5.0.22 | 6.2.4.5 | 2.3.1 | TECason@vcu.edu
0.04 | Fedora Core 4 Server | 1.8.2 | 5.0.4 | 4.1.20 | 6.2.2 | 2.4.0 | sigbert@wiwi.hu-berlin.de
0.04 | Debian 3.1 | 1.5.8 | 4.3.10-18 (apache) | 4.0.24_Debian-10sarge2-log | 6.0.6 | 2.1.0 | baaden@smplinux.de
0.05 | Knoppix-based-4.0.2 | 1.9.3 | 5.2.1 | 5.1.16 | 6.2.0 | 2.4.1 | cchuang@mail.cgu.edu.tw
0.06 | Ubuntu 7.10 - LAMP | | | | | 2.5.1 | mario.dicaro@gmail.com
0.06 | Ubuntu 6.06 LTS (Lamp) | 1.12.0 | 5.1.2 | 5.0.22-Debian_0ubuntu6.06.6-log | 6.2.4 | 2.5.1 | sigbert@wiwi.hu-berlin.de

If you download the extension and use it successfully on other systems or with other software combinations, please let me know so that we can update the list above. Thanks!

After downloading and unpacking

After installing MediaWiki, e.g. under /srv/htdocs/mywiki (Suse Linux) or /var/www/mywiki (Debian), you should have the following directory and file structure in mywiki (replace mywiki with your wiki name):

drwxr-xr-x  16 wwwrun www    4096 Aug  7 13:57 .
drwxr-xr-x  16 root   root   4096 Jul 24 11:41 ..
-rw-r--r--   1 wwwrun www     815 Jul 14 09:10 AdminSettings.php
-rw-r--r--   1 wwwrun www     819 Jun 21 06:44 AdminSettings.sample
-rw-r--r--   1 wwwrun www   17997 May  3 02:31 COPYING
-rw-r--r--   1 wwwrun www     160 May  3 02:31 FAQ
-rw-r--r--   1 wwwrun www  122877 May  3 02:31 HISTORY
-rw-r--r--   1 wwwrun www    3985 May  3 02:31 INSTALL
-rw-r--r--   1 wwwrun www    4666 Jun 30 12:39 LocalSettings.php
-rw-r--r--   1 wwwrun www    3493 May  3 02:31 README
-rw-r--r--   1 wwwrun www    8430 Jun 21 06:44 RELEASE-NOTES
drwxr-xr-x   2 wwwrun www   16384 Aug  2 09:19 Rfiles
-rw-r--r--   1 wwwrun www   11334 May  3 02:31 UPGRADE
drwxr-xr-x   2 wwwrun www    4096 Jun  2 16:00 bin
drwxr-xr-x   2 wwwrun www    4096 Jun  2 16:00 config
-rw-r--r--   1 wwwrun www   17997 Jun 21 06:44 copying
drwxr-xr-x   4 wwwrun www    4096 Jun  2 16:00 docs
drwxr-xr-x   3 wwwrun www    4096 Jun 26 16:23 extensions
-rw-r--r--   1 wwwrun www     160 Jun 21 06:44 faq
-rw-r--r--   1 wwwrun www  122877 Jun 21 06:44 history
drwxr-xr-x  12 wwwrun www    4096 Jun 26 06:32 images
-rw-r--r--   1 wwwrun www    1666 Jun 21 06:44 img_auth.php
drwxr-xr-x   2 wwwrun www    4096 Jun  2 16:00 include
drwxr-xr-x   6 wwwrun www    8192 Jun  2 16:00 includes
-rw-r--r--   1 wwwrun www    3967 Jun 21 06:44 index.php
-rw-r--r--   1 wwwrun www    3985 Jun 21 06:44 install
-rw-r--r--   1 wwwrun www    3731 Jun 21 06:44 install-utils.inc
drwxr-xr-x   2 wwwrun www    8192 Jun  2 16:00 languages
drwxr-xr-x   2 wwwrun www    4096 Jun  2 16:01 locale
drwxr-xr-x   7 wwwrun www    4096 Jun  2 16:01 maintenance
drwxr-xr-x   2 wwwrun www    4096 Jun 23 14:52 math
-rw-r--r--   1 wwwrun www    6347 Jun 21 06:44 profileinfo.php
-rw-r--r--   1 wwwrun www    3493 Jun 21 06:44 readme
-rw-r--r--   1 wwwrun www     566 Jun 21 06:44 redirect.php
-rw-r--r--   1 wwwrun www      91 Jun 21 06:44 redirect.phtml
-rw-r--r--   1 wwwrun www     688 Jun 21 06:44 setup_message.html
drwxr-xr-x   9 wwwrun www    4096 Jun  2 16:01 skins
drwxr-xr-x   2 wwwrun www    4096 Jun  2 16:01 tests
-rw-r--r--   1 wwwrun www    2016 Jun 21 06:44 thumb.php
-rw-r--r--   1 wwwrun www    1589 Jun 21 06:44 trackback.php
-rw-r--r--   1 wwwrun www   11334 Jun 21 06:44 upgrade
-rw-r--r--   1 wwwrun www      88 Jun 21 06:44 wiki.phtml

Note that all files belong to the user and group under which the web server runs (wwwrun www on Suse Linux, www-data www-data on Debian, apache apache on Fedora)!

  • The extension/plugin contains the following files, which have to be installed into the mywiki directory:
./extensions/R/StatWiki.r
./extensions/Rext.php
./extensions/extbase.php
./R.php
  • Add the following line to LocalSettings.php:
require_once('extensions/Rext.php');
  • If you want to allow the upload of data files so that your R programs can read them with readdataSK, you also need to add the following line to LocalSettings.php:
$wgFileExtensions = array( 'png', 'gif', 'jpg', 'jpeg', 'csv');

By default MediaWiki allows uploads of graphics files with the first four extensions. If you also want to upload files with other extensions (e.g. csv), you have to add them to the array $wgFileExtensions.

  • Create a directory Rfiles in mywiki that is writable for the user under which your web server runs (wwwrun www on Suse Linux, www-data www-data on Debian, apache apache on Fedora), e.g. with
mkdir Rfiles
chown -R wwwrun:www Rfiles 
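
The installation steps above can be checked with a small script. The following is only a sketch: check_r_extension is a hypothetical helper name and not part of the extension, and the check of LocalSettings.php is a plain text search, so a commented-out require_once line would also pass.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the extension): report which parts
# of the installation described above are missing from a wiki directory.
check_r_extension() {
    wiki_dir=$1
    status=0
    # The four files the extension consists of.
    for f in extensions/R/StatWiki.r extensions/Rext.php extensions/extbase.php R.php; do
        if [ ! -f "$wiki_dir/$f" ]; then
            echo "missing file: $f"
            status=1
        fi
    done
    # LocalSettings.php has to load the extension (rough text search only).
    if ! grep -q "extensions/Rext.php" "$wiki_dir/LocalSettings.php" 2>/dev/null; then
        echo "LocalSettings.php does not load extensions/Rext.php"
        status=1
    fi
    # Rfiles has to exist and be writable for the user running this check.
    if [ ! -d "$wiki_dir/Rfiles" ] || [ ! -w "$wiki_dir/Rfiles" ]; then
        echo "Rfiles is missing or not writable"
        status=1
    fi
    return $status
}
```

Run it as the web server user, e.g. check_r_extension /srv/htdocs/mywiki, so that the writability test reflects what the web server is allowed to do. The function returns 0 only if nothing is reported.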

Security issues

Level | Description | Environment
0 | no security at all; an R program can access everything that the web server user can access | should only be used in a secure intranet
1 (default) | as in R-PHP, a set of R commands is forbidden | can be used in a WWW environment; may still cause problems
2 | R programs are executed as a specific user (default: rd) | can be used securely in a WWW environment; an administrator can restrict the user (quotas, read/write permissions etc.)

The security levels are nested: everything that does not work in level 1 will not work in level 2 either.

Activating security level 2

  • Create a user that will execute the R programs. By default I use the name rd. The group of this user has to be the same as that of the web server user (Suse Linux: www, Debian: www-data).
  • Go to your MediaWiki directory: cd ....../mywiki
  • Change the permissions of the Rfiles directory and of all files in it in your wiki mywiki with
chmod -R 777 Rfiles
  • Add the following line to the sudoers file (/etc/sudoers on Suse Linux)
wwwrun  mars=(rd) NOPASSWD: ALL 

This line allows the user wwwrun to use sudo to run commands as the user rd on the host mars without giving a password. It may be a good idea not to give the user wwwrun a shell.

  • Edit the file extensions/Rext.php and change the lines
defined('security')  || define('security',  1);
defined('sudouser')  || define('sudouser', 'rd');

to

defined('security')  || define('security',  2);
defined('sudouser')  || define('sudouser', 'rd');
  • If you have not chosen rd as the user name, replace 'rd' with your user name (e.g. 'myusername').
  • Make sure that sudo is installed
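
The edit to extensions/Rext.php can also be scripted. This is a minimal sketch under the assumption that the file contains the define('security', N); line exactly in the form shown above; set_security_level is a hypothetical helper name, not something the extension provides.

```shell
#!/bin/sh
# Hypothetical helper: set the security level in a copy of Rext.php.
# Usage: set_security_level FILE LEVEL   (LEVEL is 0, 1 or 2)
set_security_level() {
    file=$1
    level=$2
    case $level in
        0|1|2) ;;
        *) echo "level must be 0, 1 or 2" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Only touch the line that defines 'security'; replace the number
    # directly in front of ");" and keep everything else unchanged.
    if sed "/define('security',/s/[0-9][0-9]*);/${level});/" "$file" > "$tmp"; then
        # Write back with cat so owner and permissions of FILE are kept.
        cat "$tmp" > "$file"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}
```

For example, set_security_level extensions/Rext.php 2 switches to level 2. Keep a backup of Rext.php before running it, since the file is overwritten in place.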

Final security hints

I made the extension as secure as possible. If you can think of further security measures, please let me know. However, some additional security steps are highly recommended:

  • Run the Mediawiki and the R programs on a server which is not connected to your intranet.
  • Make regular backups of your server.
  • If you change the security level in Rext.php to 0, be aware that every external user can run any R program on your server and has full access to the same files as the web server user. This should only be done in a secure intranet.

Documentation

New tags and attributes

Tag | Attribute | Value(s) | Required (default) | Meaning
<R>...</R> | | | | runs the R program on save and displays its output
 | output | text, html or display | no (text) | which kind of output is used: text = raw output from the program; html = the program generates only HTML output; display = the program generates a PDF graphic
 | style | #CDATA | no | style information for the output, e.g. width and height of an image
 | convert | #CDATA | no (-trim) | options for the convert command that generates JPEG graphics from the PDF graphics
 | alt | #CDATA | no | alternative text
 | onsave | -- | no | usually the extension/plugin checks whether the generated output already exists and, if so, does not run the R program a second time; onsave forces a rerun whenever the page is saved
 | iframe | #CDATA | no | style information for the <iframe...> that holds the result of the R program
 | name | #CDATA | no | name of the <iframe...> element; has to be set if the parameters for the program come from a form
 | workspace | [A-Za-z0-9_] | no | name of the workspace that is loaded before the program runs and saved before the program ends
<Rform>...</Rform> | | | | form to enter parameters and run the program
 | name | #CDATA | no | name of the <iframe...> element that displays the result

Additional R functions and variables

Note that the variables and functions are only available when they are needed, e.g. if output="html" then rpdf is not available.

Variables

Name Meaning
rpdf name of a PDF file which can be used to store a graphic
rhtml name of an HTML file which can be used to store HTML output generated, e.g., by outHTML
rfiles the full path of the Rfiles directory

Functions

Name | Parameter | Meaning
outHTML | | creates an HTML table from a matrix
 | html | filename for writing the HTML output
 | x | matrix for the table entries; the colnames and rownames of the matrix are used as border entries (top and left). The resulting layout has title and colnames in the first row, and rownames and x in the rows below.
 | title | entry in the upper left cell (default: "")
 | style | CSS style information for the table (default: "width:100%;")
 | tstyle | CSS style information for the upper left cell (default: "background-color:#BBBBBB; vertical-align:top; text-align:left;")
 | cstyle | CSS style information for the column headers (default: "background-color:#BBBBBB; vertical-align:top; text-align:right;")
 | rstyle | CSS style information for the row headers (default: "background-color:#BBBBBB; vertical-align:top; text-align:right;")
 | ostyle | CSS style information for the odd rows (default: "background-color:#FFFFFF; vertical-align:top; text-align:left;")
 | estyle | CSS style information for the even rows (default: "background-color:#CCCCCC; vertical-align:top; text-align:left;")
 | ... | further parameters passed on to the formatC routine for formatting
trellisSK | | trellis.device can be used, but the use of file=... is not allowed. Therefore we defined our own function, which takes the filename as its first parameter. Use trellisSK instead of trellis.device.
 | file | file in which the graphic is stored
 | ... | further named parameters passed to trellis.device
readdataSK | | reads uploaded data into R. Internally it uses the functions read.table, read.csv and read.csv2. The filename must be unique, otherwise readdataSK throws an R error.
 | name | name of the file with extension
 | format | one of the following format strings: csv (default) for English Excel files, using read.csv; csv2 for German Excel files, using read.csv2; table for raw ASCII data, using read.table; txt if name itself contains the ASCII data, using read.table in connection with textConnection
 | ... | further named parameters passed to the read.XXX commands

Contents of the files

Here is a short overview of what the files installed in MediaWiki are good for.

R.php

The PHP script is called when you use an interactive example.

extensions/Rext.php

This file does all the work; only for interactive examples is some of the work done in R.php. You may change the following variables:

security
sets the security level (default 1), see the section about Security issues.
r_img
sets the type of the graphics generated (default '.jpg', other possibility is '.png'); for other types see the manual about convert. Do not forget the dot at the beginning!

extensions/extbase.php

Since I use further extensions in my wiki, I put all code that is common to my extensions here.

extensions/R/StatWiki.r

The R program that contains outHTML, trellisSK and readdataSK.

Frequently asked questions

Q: My R programs worked perfectly when my base address was http://www.myaddress.de/mediawiki/xyz. Now I have set up my own domain http://www.myXYZdomain.de, which is my new base address, and the R programs do not work anymore. What can I do?

A: To find the right base URL the R extension checks the current path, e.g. /srv/www/htdocs/mediawiki/xyz. Usually it is assumed that the URL starts after the 'htdocs' name (which gives the relative URL 'mediawiki/xyz'):

File translates to URL
/srv/www/htdocs/mediawiki/xyz/abc.html http://localhost/mediawiki/xyz/abc.html

If you want to change this behaviour, edit the file extbase.php and, in the function extractUrl, adapt the line $exparr = array('statwiki', 'teachwiki', 'xyz'); so that it contains your directory name (here 'xyz'). The URL is then generated relative to /srv/www/htdocs/mediawiki/xyz:

File translates to URL
/srv/www/htdocs/mediawiki/xyz/abc.html http://localhost/abc.html
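
The two mappings above can be illustrated with a short shell function. This is only a sketch of the idea and not the code of extractUrl in extbase.php: path_to_url is a hypothetical name, the base http://localhost is taken from the tables above, and cutting at the last occurrence of the marker directory is an assumption.

```shell
#!/bin/sh
# Hypothetical illustration of the path-to-URL mapping described above.
# Usage: path_to_url FILE MARKER [BASE]
# Everything up to and including the last "/MARKER/" in FILE is dropped
# and the remainder is appended to BASE (default: http://localhost).
path_to_url() {
    file=$1
    marker=$2
    base=${3:-http://localhost}
    case $file in
        */"$marker"/*)
            # Strip the longest prefix ending in /MARKER/.
            rel=${file##*/"$marker"/}
            echo "$base/$rel"
            ;;
        *)
            echo "marker '$marker' not found in $file" >&2
            return 1
            ;;
    esac
}
```

With the marker htdocs, path_to_url /srv/www/htdocs/mediawiki/xyz/abc.html htdocs gives the first mapping; with the marker xyz it gives the second one.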

Q: My R program works if I generate raw output, but does not work with graphics (just empty pictures). What can I do?

A: See the previous answer about the base URL.

Q: I uploaded an Excel CSV file, but it complains "more columns than column names".

A: If you have a German Excel CSV file, use readdataSK("myname.csv", format="csv2").

Q: I try to read in a file, but it complains "invalid 'description' argument".

A: Check whether the file was uploaded under the given name. MediaWiki usually capitalizes the first letter of the file name, and this matters on UNIX machines.
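
To see which name to pass to readdataSK, the capitalization can be reproduced on the command line. This is a rough sketch: wiki_upload_name is a hypothetical helper, it only upper-cases an ASCII first letter, and it does not model any other changes MediaWiki may make to an uploaded file name.

```shell
#!/bin/sh
# Hypothetical helper: upper-case the first letter of a file name,
# as described in the answer above. ASCII letters only.
wiki_upload_name() {
    name=$1
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$name" | cut -c2-)
    printf '%s%s\n' "$first" "$rest"
}
```

For a file uploaded as myname.csv, wiki_upload_name myname.csv prints Myname.csv, which is the name to try in readdataSK.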

Q: If the security level is set to 1 or higher, which R commands cannot be used?

A: I have taken the list of commands from the R-PHP people as a basis.

Command Description
.C The functions '.C' and '.Fortran' can be used to make calls to C and Fortran code.
.Call '.Call' and '.Call.graphics' can be used call compiled code which makes use of internal R objects.
.Call.graphics '.Call' and '.Call.graphics' can be used call compiled code which makes use of internal R objects.
.External '.External' and '.External.graphics' can be used to call compiled code that uses R objects in the same way as internal R functions.
.External.graphics '.External' and '.External.graphics' can be used to call compiled code that uses R objects in the same way as internal R functions.
.Fortran The functions '.C' and '.Fortran' can be used to make calls to C and Fortran code.
.readRDS A simple low level interface for serializing to connections.
.saveRDS A simple low level interface for serializing to connections.
.Script Run a script through its interpreter with given arguments.
.Tcl Low-level Tcl/Tk Interface.
.Tcl.args Low-level Tcl/Tk Interface.
.Tcl.callback Low-level Tcl/Tk Interface.
.Tk.ID Low-level Tcl/Tk Interface.
.Tk.newwin Low-level Tcl/Tk Interface.
.Tk.subwin Low-level Tcl/Tk Interface.
.Tkroot Low-level Tcl/Tk Interface.
.Tkwin Low-level Tcl/Tk Interface.
basename 'basename' removes all of the path up to the last path separator (if any).
browseURL Load a given URL into a WWW browser.
bzfile Function to create, open and close connections.
call Create objects of mode "call"
capture.output Evaluates its arguments with the output being returned as a character string or sent to a file. Related to 'sink' in the same way that 'with' is related to 'attach'.
close 'close' closes and destroys a connection.
close.screen 'close.screen' removes the specified screen definition(s).
closeAllConnection 'closeAllConnections' closes (and destroys) all open user connections, restoring all 'sink' diversions as it does so.
data.entry A spreadsheet-like editor for entering or editing data.
data.restore Reads binary data files or 'data.dump' files that were produced in S version 3.
dataentry A spreadsheet-like editor for entering or editing data.
de A spreadsheet-like editor for entering or editing data.
dev.control 'dev.control' allows the user to control the recording of graphics operations in a device.
dev.copy2eps 'dev.copy2eps' is similar to 'dev.print' but produces an EPSF output file, in portrait orientation ('horizontal = FALSE').
dev.cur This function provides control over multiple graphics devices.
dev.list This function provides control over multiple graphics devices.
dev.next This function provides control over multiple graphics devices.
dev.prev This function provides control over multiple graphics devices.
dev.print This function provides control over multiple graphics devices.
dev.set This function provides control over multiple graphics devices.
dev2bitmap 'dev2bitmap' copies the current graphics device to a file in a graphics format.
dget Writes an ASCII text representation of an R object to a file or connection, or uses one to recreate the object.
dir This function produces a list containing the names of files in the named directory.
dir.create 'dir.create' creates a directory.
dirname 'dirname' returns the part of the 'path' up to (but excluding) the last path separator, or '"."' if there is no path separator.
do.call 'do.call' executes a function call from the name of the function and a list of arguments to be passed to it.
download.file This function can be used to download a file from the Internet.
dput Writes an ASCII text representation of an R object to a file or connection, or uses one to recreate the object.
dump This function takes a vector of names of R objects and produces text representations of the objects on a file or connection.
dyn.load Load or unload shared libraries, and test whether a C function or Fortran subroutine is available.
edit Invoke a text editor on an R object.
edit.data.frame Use data editor on data frame or matrix contents.
emacs Invoke a text editor (emacs) on an R object.
erase.screen 'erase.screen' is used to clear a single screen, which it does by filling with the background colour.
eval Evaluate an R expression in a specified environment.
example Run all the R code from the *Examples* part of R's online help topic 'topic' with two possible exceptions, 'dontrun' and 'dontshow', see Details below.
fifo Function to create, open and close connections.
file Function to create, open and close connections.
file.access Utility function to access information about files on the user's file systems.
file.append 'file.append' attempts to append the files named by its second argument to those named by its first. The R subscript recycling rule is used to align names given in vectors of different lengths.
file.choose Choose a file interactively.
file.copy 'file.copy' works in a similar way to 'file.append' but with the arguments in the natural order for copying.
file.create 'file.create' creates files with the given names if they do not already exist and truncates them if they do.
file.exists 'file.exists' returns a logical vector indicating whether the files named by its argument exist.
file.info Utility function to extract information about files on the user's file systems.
file.path Construct the path to a file from components in a platform-independent way.
file.remove 'file.remove' attempts to remove the files named in its argument.
file.rename 'file.rename' attempts to rename a single file.
file.show Display one or more files.
file.symlink 'file.symlink' makes symbolic links on those Unix-like platforms which support them.
fix 'fix' invokes 'edit' on 'x' and then assigns the new (edited) version of 'x' in the user's workspace.
getConnection 'getConnection' returns a connection object, or 'NULL'.
getwd 'getwd' returns an absolute filename representing the current working directory of the R process; 'setwd(dir)' is used to set the working directory to 'dir'.
graphics.off 'graphics.off()' shuts down all open graphics devices.
gzcon 'gzcon' provides a modified connection that wraps an existing connection, and decompresses reads or compresses writes through that connection.
gzfile 'gzfile' applies to files compressed by 'gzip', and 'bzfile' to those compressed by 'bzip2': such connections can only be binary.
INSTALL Utility for installing add-on packages.
install.packages Download Packages from CRAN.
jpeg A graphics device for JPEG format bitmap files.
library.dynam Load the specified file of compiled code if it has not been loaded already, or unloads it.
list.files This function produces a list containing the names of files in the named directory. 'dir' is an alias.
loadhistory Load the commands history.
locator Reads the position of the graphics cursor when the (first) mouse button is pressed.
lookup.xport Lookup information on a SAS XPORT format library
make.packages.html Functions to re-create the HTML documentation files to reflect all installed packages.
make.socket Create a Socket Connection.
menu 'menu' presents the user with a menu of choices labelled from 1 to the number of choices.
open 'open' opens a connection.
parent.frame Function to Access the Function Call Stack.
path.expand Expand a path name, for example by replacing a leading tilde by the user's home directory (if defined on that platform).
pico Invoke a text editor (pico) on an R object.
pictex This function produces graphics suitable for inclusion in TeX and LaTeX documents.
pipe Function to create, open and close connections.
png A graphics device for PNG format bitmap files.
postscript 'postscript' starts the graphics device driver for producing PostScript graphics.
print.socket Related to 'make.socket'.
prompt Facilitate the constructing of files documenting R objects.
promptData Generates a shell of documentation for a data set.
quartz 'quartz' starts a graphics device driver for the MacOS X System. This can only be done on machines that run MacOS X.
R.home Return the R home directory.
R.version 'R.Version()' provides detailed information about the version of R running.
read.00Index Read 00Index-style Files.
read.dta Read Stata binary files.
read.epiinfo Read Epi Info data files
read.fwf Read a "table" of *f*ixed *w*idth *f*ormatted data into a 'data.frame'.
read.mtp Read a Minitab Portable Worksheet.
read.socket 'read.socket' reads a string from the specified socket.
read.spss Read an SPSS data file.
read.ssd Obtain a data frame from a SAS permanent dataset.
read.xport Read a SAS XPORT format library.
readBin Read binary data from a connection.
readdataSK Read data from files (only banned when security=2).
readline 'readline' reads a line from the terminal.
readLines Read text lines from a connection.
remove.packages Removes installed packages and updates index information as necessary.
Rprof Enable or disable profiling of the execution of R expressions.
save 'save' writes an external representation of R objects to the specified file.
savehistory Save the commands history.
scan Read data into a vector or list from the console or file.
screen 'screen' is used to select which screen to draw in.
seek Functions to re-position connections.
setwd 'setwd(dir)' is used to set the working directory to 'dir'.
showConnection 'showConnections' returns a character matrix of information with a row for each connection, by default only for open non-standard connections.
sink 'sink' diverts R output to a connection.
sink.number 'sink.number()' reports how many diversions are in use.
socketConnection Function to create, open and close connections.
source Read R Code from a File or a Connection.
split.screen 'split.screen' defines a number of regions within the current device which can, to some extent, be treated as separate graphics devices.
stderr 'stdin()', 'stdout()' and 'stderr()' return connection objects.
stdin 'stdin()', 'stdout()' and 'stderr()' return connection objects.
stdout 'stdin()', 'stdout()' and 'stderr()' return connection objects.
sys.call Function to Access the Function Call Stack.
sys.calls Function to Access the Function Call Stack.
sys.frame Function to Access the Function Call Stack.
sys.frames Function to Access the Function Call Stack.
sys.function Function to Access the Function Call Stack.
Sys.getenv 'Sys.getenv' obtains the values of the environment variables named by 'x'.
Sys.getlocale Get details of or set aspects of the locale for the R process.
Sys.info Reports system and user information.
sys.nframe Function to Access the Function Call Stack.
sys.on.exit Function to Access the Function Call Stack.
sys.parent Function to Access the Function Call Stack.
sys.parents Function to Access the Function Call Stack.
Sys.putenv 'putenv' sets environment variables (for other processes called from within R or future calls to 'Sys.getenv' from this R process).
Sys.sleep Suspend execution of R expressions for a given number of seconds.
Sys.source Parse and Evaluate Expressions from a File.
sys.source Parse and Evaluate Expressions from a File.
sys.status Function to Access the Function Call Stack.
Sys.time 'Sys.time' and 'Sys.Date' returns the system's idea of the current date with and without time.
system 'system' invokes the OS command specified by 'command'.
system.file Finds the full file names of files in packages etc.
tempfile 'tempfile' returns a vector of character strings which can be used as names for temporary files.
textConnection Input and output text connections.
tkpager Page file using Tk text widget.
tkStartGUI Tcl/Tk GUI startup.
unlink 'unlink' deletes the file(s) or directories specified by 'x'.
unz 'unz' reads (only) single files within zip files, in binary mode.
update.packages Download Packages from CRAN.
url Function to create, open and close connections.
url.show Display a text URL.
vi Invoke a text editor (vi) on an R object.
write Write Data to a File.
write.dta Write files in Stata binary format.
write.ftable Manipulate Flat Contingency Tables.
write.socket 'write.socket' writes to the specified socket.
write.table 'write.table' prints its required argument 'x' (after converting it to a data frame if it is not one already) to 'file'.
writeBin Write binary data to a connection.
writeLines Write Lines to a Connection.
x11 'X11' starts a graphics device driver for the X Window System (version 11).
X11 'X11' starts a graphics device driver for the X Window System (version 11).
xedit Invoke a text editor on an R object.
xemacs Invoke a text editor on an R object.
xfig 'xfig' starts the graphics device driver for producing XFig (version 3.2) graphics.
zip.file.extract This will extract the file named 'file' from the zip archive, if possible, and write it in a temporary location.
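
Before saving a page, an R program can be checked against this list on the command line. The sketch below is not how the extension itself detects forbidden commands (that is not described here); find_banned is a hypothetical helper, and it assumes you have copied the command names from the table into a text file with one name per line. It also reports names that only occur inside comments or strings.

```shell
#!/bin/sh
# Hypothetical pre-check: print every banned name that is called in an
# R script. Usage: find_banned SCRIPT BANLIST
# BANLIST is assumed to hold one command name per line.
find_banned() {
    script=$1
    banlist=$2
    status=0
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # Escape the dots in names such as file.copy or .C for the regex.
        pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
        # Match the name as a whole identifier followed by "(".
        if grep -Eq "(^|[^A-Za-z0-9._])${pattern}[[:space:]]*\\(" "$script"; then
            echo "$name"
            status=1
        fi
    done < "$banlist"
    return $status
}
```

The function returns 0 if no banned name is found and 1 otherwise, so it can be used as find_banned myprog.R banned.txt || echo "program uses forbidden commands".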