scratch/output/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/index.html
Yann Esposito (Yogsototh) c53114f72a Merge branch 'master' into next
(Recompiled)

Conflicts:
	output/Scratch/assets/css/main.css
	output/Scratch/en/blog/Yesod-excellent-ideas/index.html
	output/Scratch/en/blog/feed/feed.xml
	output/Scratch/en/blog/index.html
	output/Scratch/en/index.html
	output/Scratch/fr/blog/Yesod-excellent-ideas/index.html
	output/Scratch/fr/blog/feed/feed.xml
	output/Scratch/fr/blog/index.html
	output/Scratch/fr/index.html
	output/Scratch/sitemap.xml
	output/index.html
2011-11-16 13:08:26 +01:00

208 lines
No EOL
9.7 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="keywords" content="awk, shell, script">
<link rel="shortcut icon" type="image/x-icon" href="/Scratch/img/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/Scratch/assets/css/main.css" />
<link rel="stylesheet" type="text/css" href="/Scratch/css/twilight.css" />
<link rel="stylesheet" type="text/css" href="/Scratch/css/idc.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.feedburner.com/yannespositocomen"/>
<link rel="alternate" lang="fr" xml:lang="fr" title="découper un fichier par mots clés" type="text/html" hreflang="fr" href="/Scratch/fr/blog/2010-02-18-split-a-file-by-keyword/" />
<link rel="alternate" lang="en" xml:lang="en" title="split a file by keyword" type="text/html" hreflang="en" href="/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/" />
<script type="text/javascript" src="/Scratch/js/jquery-1.3.1.min.js"></script>
<script type="text/javascript" src="/Scratch/js/jquery.cookie.js"></script>
<script type="text/javascript" src="/Scratch/js/index.js"></script>
<!--[if lt IE 9]>
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
<![endif]-->
<title>split a file by keyword</title>
</head>
<body lang="en" class="article">
<script type="text/javascript">// <![CDATA[
document.write('<div id="blackpage"><img src="/Scratch/img/loading.gif" alt="loading..."/></div>');
// ]]>
</script>
<div id="content">
<div id="choix">
<div class="return"><a href="#entete">&darr; Menu &darr;</a></div>
<div id="choixlang">
<a href="/Scratch/fr/blog/2010-02-18-split-a-file-by-keyword/" onclick="setLanguage('fr')">en Français</a>
</div>
<div class="flush"></div>
</div>
<div id="titre">
<h1>
split a file by keyword
</h1>
</div>
<div class="flush"></div>
<div class="flush"></div>
<div id="afterheader">
<div class="corps">
<p>Strangely enough, I didnt find any built-in tool to split a file by keyword. I made one myself in <code>awk</code>. I put it here mostly for myself. But it could also helps someone else.
The following code split a file for each line containing the word <code>UTC</code>.</p>
<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span>!/usr/bin/env awk</span>
<span class="Entity">BEGIN</span>{i=0;}
<span class="StringRegexp"><span class="StringRegexp">/</span>UTC<span class="StringRegexp">/</span></span> {
i+=1;
FIC=<span class="SupportFunction">sprintf</span>(<span class="String"><span class="String">&quot;</span>fic.%03d<span class="String">&quot;</span></span>,i);
}
{<span class="SupportFunction">print</span> <span class="Variable"><span class="Variable">$</span>0</span>&gt;&gt;FIC}
</pre></div>
<p>In my real world example, I wanted one file per day, each line containing UTC being in the following format:</p>
<pre class="twilight">
Mon Dec 7 10:32:30 UTC 2009
</pre>
<p>I then finished with the following code:</p>
<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span>!/usr/bin/env awk</span>
<span class="Entity">BEGIN</span>{i=0;}
<span class="StringRegexp"><span class="StringRegexp">/</span>UTC<span class="StringRegexp">/</span></span> {
date=<span class="Variable"><span class="Variable">$</span>1</span><span class="Variable"><span class="Variable">$</span>2</span><span class="Variable"><span class="Variable">$</span>3</span>;
<span class="Keyword">if</span> ( date&nbsp;!= olddate ) {
olddate=date;
i+=1;
FIC=<span class="SupportFunction">sprintf</span>(<span class="String"><span class="String">&quot;</span>fic.%03d<span class="String">&quot;</span></span>,i);
}
}
{<span class="SupportFunction">print</span> <span class="Variable"><span class="Variable">$</span>0</span>&gt;&gt;FIC}
</pre></div>
</div>
<div id="choixrss">
<a id="rss" href="http://feeds.feedburner.com/yannespositocomen">
Subscribe
</a>
</div>
<script type="text/javascript">
$(document).ready(function(){
$('#comment').hide();
$('#clickcomment').click(showComments);
});
function showComments() {
$('#comment').show();
$('#clickcomment').fadeOut();
}
document.write('<div id="clickcomment">Comments</div>');
</script>
<div class="flush"></div>
<div class="corps" id="comment">
<h2 class="first">comments</h2>
<noscript>
You must enable javascript to comment.
</noscript>
<script type="text/javascript">
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
var idcomments_post_id = '/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/';
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/';
</script>
<span id="IDCommentsPostTitle" style="display:none"></span>
<script type='text/javascript' src='/Scratch/js/genericCommentWrapperV2.js'></script>
</div>
<div id="entete" class="corps_spaced">
<div id="liens">
<ul><li><a href="/Scratch/en/">Home</a></li>
<li><a href="/Scratch/en/blog/">Blog</a></li>
<li><a href="/Scratch/en/softwares/">Softwares</a></li>
<li><a href="/Scratch/en/about/">About</a></li></ul>
</div>
<div class="flush"></div>
<hr/>
<div id="next_before_articles">
<div id="previous_articles">
previous entries
<div class="previous_article">
<a href="/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/"><span class="nicer">«</span>&nbsp;Pragmatic Regular Expression Exclude (2)</a>
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2010-02-15-All-but-something-regexp/"><span class="nicer">«</span>&nbsp;Pragmatic Regular Expression Exclude</a>
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2010-01-12-antialias-font-in-Firefox-under-Ubuntu/"><span class="nicer">«</span>&nbsp;antialias font in Firefox under Ubuntu</a>
</div>
</div>
<div id="next_articles">
next entries
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/">When regexp is not the best solution&nbsp;<span class="nicer">»</span></a>
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-03-22-Git-Tips/">Git Tips&nbsp;<span class="nicer">»</span></a>
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-03-23-Encapsulate-git/">Encapsulate git&nbsp;<span class="nicer">»</span></a>
</div>
</div>
<div class="flush"></div>
</div>
</div>
<div id="bottom">
<div>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Copyright ©, Yann Esposito</a>
</div>
<div id="lastmod">
Created: 02/18/2010
Modified: 05/09/2010
</div>
<div>
Entirely done with
<a href="http://www.vim.org">Vim</a>
and
<a href="http://nanoc.stoneship.org">nanoc</a>
</div>
<div>
<a href="/Scratch/en/validation/">Validation</a>
<a href="http://validator.w3.org/check?uri=referer"> [xhtml] </a>
.
<a href="http://jigsaw.w3.org/css-validator/check/referer?profile=css3"> [css] </a>
.
<a href="http://validator.w3.org/feed/check.cgi?url=http%3A//yannesposito.com/Scratch/en/blog/feed/feed.xml">[rss]</a>
</div>
</div>
<div class="clear"></div>
</div>
</body>
</html>