-----
isHidden:       false
menupriority:   1
kind:           article
created_at:           2010-02-18T15:29:14+02:00
title: split a file by keyword
author_name: Yann Esposito
author_uri: yannesposito.com
tags:
    - awk
    - shell
    - script

-----

Strangely enough, I didn't find any built-in tool to split a file by keyword, so I wrote one myself in `awk`. I'm putting it here mostly for my own reference, but it might help someone else too.
The following code splits a file at each line containing the word `UTC`: every such line starts a new output file (`fic.001`, `fic.002`, and so on).

<div><code class="perl">
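#!/usr/bin/awk -f
# The -f flag makes awk read its program from this file.
# Start a new output file (fic.001, fic.002, ...) at every line containing UTC.
BEGIN { i = 0; FIC = "fic.000" }  # lines before the first UTC line go to fic.000
/UTC/ {
    close(FIC)                    # release the previous file before opening the next
    i += 1
    FIC = sprintf("fic.%03d", i)
}
{ print $0 > FIC }                # ">" truncates each file once, on first open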
</code></div>
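The splitting rule can be tried safely with a small shell sketch. It assumes a hypothetical input file, `sample.log`, written on the spot inside a scratch directory created by `mktemp -d`, and passes the awk program inline instead of going through the shebang. It closes each file before opening the next and writes with `>`, so re-running it starts from empty files rather than appending to old ones.

```shell
# Work in a scratch directory so existing fic.* files are not touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: two UTC lines, each followed by some entries.
cat > sample.log <<'EOF'
Mon Dec  7 10:32:30 UTC 2009
first entry
second entry
Mon Dec  7 18:00:00 UTC 2009
third entry
EOF

# Same splitting rule: every line containing UTC starts a new file.
awk '
BEGIN { i = 0; FIC = "fic.000" }
/UTC/ { close(FIC); i += 1; FIC = sprintf("fic.%03d", i) }
{ print $0 > FIC }
' sample.log

# Show which files were produced and how many lines each holds.
wc -l fic.*
```

`fic.001` should hold the first three lines and `fic.002` the last two. To run a saved copy of the program instead, say under the hypothetical name `split.awk`, use `awk -f split.awk yourfile`.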

In my real-world case, I wanted one file per day. Each line containing `UTC` has the following format:

<pre class="twilight">
Mon Dec  7 10:32:30 UTC 2009
</pre>
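By default awk splits each line on runs of blanks, so the double space in `Dec  7` does not create an empty field. As a quick check, this shell sketch numbers the fields awk sees in the sample line above: the weekday, month and day are `$1`, `$2` and `$3`, and the year is `$6`.

```shell
line='Mon Dec  7 10:32:30 UTC 2009'
# Print "$N=value" for every field awk finds with its default field separator.
fields=$(printf '%s\n' "$line" | awk '{ for (f = 1; f <= NF; f++) printf "$%d=%s\n", f, $f }')
printf '%s\n' "$fields"
# prints:
# $1=Mon
# $2=Dec
# $3=7
# $4=10:32:30
# $5=UTC
# $6=2009
```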

I ended up with the following code:

<div><code class="perl">
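#!/usr/bin/awk -f
# Start a new output file whenever the day on a UTC line changes.
BEGIN { i = 0; FIC = "fic.000" }  # lines before the first UTC line go to fic.000
/UTC/ {
    date = $1 $2 $3               # weekday, month and day, e.g. "MonDec7"
    if (date != olddate) {
        olddate = date
        close(FIC)                # release the previous file before opening the next
        i += 1
        FIC = sprintf("fic.%03d", i)
    }
}
{ print $0 > FIC }                # ">" truncates each file once, on first open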
</code></div>
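The per-day version can be checked the same way. This shell sketch again uses a hypothetical `sample.log` in a scratch directory: two timestamps fall on Dec 7 and one on Dec 8, so only two files should appear, because a repeated day does not start a new file.

```shell
# Work in a scratch directory so existing fic.* files are not touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical log: two timestamps on Dec 7, one on Dec 8.
cat > sample.log <<'EOF'
Mon Dec  7 10:32:30 UTC 2009
morning entry
Mon Dec  7 18:00:00 UTC 2009
evening entry
Tue Dec  8 09:15:00 UTC 2009
next-day entry
EOF

# Per-day rule: open a new file only when weekday+month+day changes.
awk '
BEGIN { i = 0; FIC = "fic.000" }
/UTC/ {
    date = $1 $2 $3
    if (date != olddate) { olddate = date; close(FIC); i += 1; FIC = sprintf("fic.%03d", i) }
}
{ print $0 > FIC }
' sample.log

wc -l fic.*
```

`fic.001` should end up with the four Dec 7 lines and `fic.002` with the two Dec 8 lines.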