In this post, I want to address the following issue: several data files with a common structure have to be processed by an R function. The function should export files (such as png images, data files or any other file type). I explain how to build filenames such that the function automatically exports files to the same directory as the input file chosen by the user and how to customize the names of the exported files.

I thank Soraya, with whom I looked at this problem during her work placement and who helped me find the answer (especially by pointing out the use of the function file.choose).

Suppose that the following file (it is the famous iris data set):

ex-data.txt

is in a directory named /home/tuxette/data1/ (for instance) and that you want to create a function extractNum that takes no argument, lets the user choose a data set (this one, for instance) and exports two files (Rdata and csv formats) containing only the numerical variables of the original data set. The exported files must be saved in the same directory as the original file (whatever this directory is) and must be named after the original file, by appending the suffixes -num.Rdata and -num.csv (respectively) to its name without the extension.
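If you do not have ex-data.txt at hand, a file of the same shape can be produced from the iris data set that ships with R. This is only a sketch: the directory below is a temporary one standing in for /home/tuxette/data1/, and the layout (a header line followed by space-separated columns) is my assumption about the original file.

```r
# Hypothetical stand-in for /home/tuxette/data1/ (a temporary directory)
data.dir = file.path(tempdir(), "data1")
dir.create(data.dir, showWarnings = FALSE, recursive = TRUE)
example.file = file.path(data.dir, "ex-data.txt")
# Write iris with a header line and no row names, so that
# read.table(example.file, header = TRUE) can read it back
write.table(iris, file = example.file, row.names = FALSE, quote = FALSE)
# Quick check: 150 observations of 5 variables, 4 of them numerical
d = read.table(example.file, header = TRUE)
str(d)
```

Any other directory works as well, since the point of the post is precisely that the output location is derived from the input path.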

The following function lets the user choose a data set (this one or any other):

selectFile = function(){
	file = file.choose()
	file
}

Then, start the function by making the user select the original data set. The function then loads the data set, and gregexpr, substr and paste are used to create the new filenames as described above:

extractNum = function(){
	# Make the user choose a file
	filename = selectFile()
	# Load the file (its first line contains the variable names)
	d = read.table(filename,header=TRUE)
	# Select numerical variables
	# (is.numeric is applied to each column of the data set)
	index.num = sapply(d,is.numeric)
	# Create new data set with only the numerical variables
	new.d = d[,index.num,drop=FALSE]
	# Extract from "filename" the pattern to export the new data set
	# (that is, everything before the final dot)
	pat = gregexpr("[.]",filename)
	# (in our example, pat[[1]] is 28 because the only dot in filename is at position 28)
	pat = substr(filename,1,max(pat[[1]])-1)
	# (in our example, pat is then /home/tuxette/data1/ex-data)

	# Save the data in Rdata and csv formats at /home/tuxette/data1/ex-data-num.Rdata
	# and /home/tuxette/data1/ex-data-num.csv
	save(new.d,file=paste(pat,"-num.Rdata",sep=""))
	write.table(new.d,file=paste(pat,"-num.csv",sep=""),sep=",",row.names=FALSE)
}
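To see what the two assignments to pat do, they can be run by hand on the example path. The snippet below is only an illustration on a character string (no file is read or written), and the intermediate names dots, last.dot, out.rdata, out.csv and any.char are mine.

```r
filename = "/home/tuxette/data1/ex-data.txt"
# Positions of the literal dots in filename
dots = gregexpr("[.]", filename)[[1]]
last.dot = max(dots)
# Everything before the final dot
pat = substr(filename, 1, last.dot - 1)
# Names of the exported files
out.rdata = paste(pat, "-num.Rdata", sep = "")
out.csv = paste(pat, "-num.csv", sep = "")
print(last.dot)
print(out.rdata)
print(out.csv)
# With "." alone as the pattern, every character of filename matches
any.char = gregexpr(".", filename)[[1]]
print(length(any.char) == nchar(filename))
```

One limitation worth keeping in mind: if filename contains no dot at all, gregexpr returns -1 and pat ends up as an empty string, so this approach assumes that the input file has an extension.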

In extractNum, note that the pattern argument of gregexpr is a regular expression, in which “.” alone matches any character; a literal dot therefore has to be written “[.]”. Then just use:

extractNum()

When prompted, select the data set /home/tuxette/data1/ex-data.txt (or type its path) and you should obtain two files with the numerical variables from the iris data set in the directory of ex-data.txt. Does it work?
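To check the result without going through the file chooser, one option is a variant that takes the path as an argument. The sketch below is not the function from this post: extractNumFrom is a hypothetical name, it builds the output names with dirname, basename and tools::file_path_sans_ext instead of gregexpr and substr, and it is run on a copy of iris written to a temporary directory.

```r
# Hypothetical variant of extractNum: the path is an argument
# (defaulting to the file chooser) instead of always being asked for
extractNumFrom = function(filename = file.choose()){
	d = read.table(filename, header = TRUE)
	# Keep the numerical columns only
	new.d = d[, sapply(d, is.numeric), drop = FALSE]
	# Same directory as the input, same name without its extension
	pat = file.path(dirname(filename),
		tools::file_path_sans_ext(basename(filename)))
	out = c(paste(pat, "-num.Rdata", sep = ""), paste(pat, "-num.csv", sep = ""))
	save(new.d, file = out[1])
	write.table(new.d, file = out[2], sep = ",", row.names = FALSE)
	# Return the paths of the exported files, invisibly
	invisible(out)
}

# Try it on a temporary copy of iris
tmp.file = file.path(tempdir(), "ex-data.txt")
write.table(iris, file = tmp.file, row.names = FALSE, quote = FALSE)
out = extractNumFrom(tmp.file)
print(basename(out))
print(file.exists(out))
print(names(read.csv(out[2])))
```

Calling extractNumFrom() with no argument falls back to file.choose(), so the interactive behaviour described above is kept.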