<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://diliprajbaral.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://diliprajbaral.com/" rel="alternate" type="text/html" /><updated>2026-01-06T16:21:47+00:00</updated><id>https://diliprajbaral.com/feed.xml</id><title type="html">Dilip Raj Baral</title><subtitle>Software engineering notes, lessons, and opinions.</subtitle><author><name>Dilip Raj Baral</name></author><entry><title type="html">Quick SVN guide for Git users; SVN: The Git Way</title><link href="https://diliprajbaral.com/blog/quick-svn-guide-for-git-users-svn-the-git-way/" rel="alternate" type="text/html" title="Quick SVN guide for Git users; SVN: The Git Way" /><published>2018-01-13T00:00:00+00:00</published><updated>2018-01-13T00:00:00+00:00</updated><id>https://diliprajbaral.com/blog/quick-svn-guide-for-git-users-svn-the-git-way</id><content type="html" xml:base="https://diliprajbaral.com/blog/quick-svn-guide-for-git-users-svn-the-git-way/"><![CDATA[<p>Why would a Git user want to switch to SVN, you ask?</p>

<p>Well, sometimes you just don’t have a choice. Imagine working on a project that has been maintained in SVN for a decade. “But migrating an SVN codebase to Git is no big deal,” you might say. True, but there are things like CI/CD integrations to worry about too. Those aren’t huge hurdles either, but sometimes people take “Don’t fix what ain’t broke” a little too seriously.</p>

<p>Reasons aside, since I already had a solid grasp of version control concepts (distributed version control, for that matter), I didn’t want to go through SVN guides from scratch. While there were plenty of resources on the web about SVN-to-Git migration, I couldn’t find a quick, concise guide to help me start working with an SVN repo right away. If you are like me, you will find this article helpful. The following steps show you how to work with SVN the Git way.</p>

<h2 id="cloning-a-new-repo">Cloning a new repo</h2>
<p>Checking out a repo is similar to cloning in Git.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn checkout &lt;path-to-your-repo-branch&gt; &lt;path-to-checkout&gt;
</code></pre></div></div>

<h4 id="example">Example</h4>
<p>The following checks out your code to your current working directory.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn checkout https://mysvnrepo.com/myrepo/trunk <span class="nb">.</span>
</code></pre></div></div>

<h2 id="creating-a-new-topic-branch">Creating a new topic branch</h2>
<p>In SVN, branches (and tags) are nothing but copies of another branch: conceptually a literal copy of the files (internally, SVN makes cheap copies), unlike in Git, where a branch is just a pointer to a commit. This fact took me a while to digest and get used to.</p>

<p>The following command is the SVN equivalent of <code class="language-plaintext highlighter-rouge">git checkout -b branch</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn copy &lt;path-to-a-branch&gt; &lt;path-for-new-branch&gt; <span class="nt">-m</span> <span class="s2">"Message"</span>
</code></pre></div></div>

<h4 id="example-1">Example</h4>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn copy <span class="nt">--parents</span> https://mysvnrepo.com/myrepo/trunk https://mysvnrepo.com/myrepo/branches/feature-branch
svn switch https://mysvnrepo.com/myrepo/branches/feature-branch
</code></pre></div></div>

<h2 id="working-on-the-repo">Working on the repo</h2>
<h3 id="adding-new-files">Adding new files</h3>
<p>To add new files, you would use:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn add &lt;path-to-file&gt;
</code></pre></div></div>

<p>As for modified files, we don’t need to add them. We can commit straight away.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn commit <span class="nt">-m</span> <span class="s2">"Commit message"</span>
</code></pre></div></div>

<p>To commit only specific files, we need to list files after the commit message.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn commit <span class="nt">-m</span> <span class="s2">"Commit message"</span> &lt;path-to-file-1&gt; &lt;path-to-file-2&gt;
</code></pre></div></div>

<p>If we want to commit a single file, we can do the following too.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn commit &lt;path-to-file&gt; <span class="nt">-m</span> <span class="s2">"Commit message"</span>
</code></pre></div></div>

<h3 id="checking-out-new-changes">Checking out new changes</h3>
<p>The following is the SVN equivalent to <code class="language-plaintext highlighter-rouge">git fetch &amp;&amp; git merge</code> or <code class="language-plaintext highlighter-rouge">git pull</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn update
</code></pre></div></div>

<h3 id="merging-your-feature-branch-to-trunk">Merging your feature branch to trunk</h3>
<p>Merging a branch in SVN is similar to how we do it in Git.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn merge &lt;path-to-branch-to-merge&gt;
</code></pre></div></div>

<h4 id="example-2">Example</h4>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn update
svn switch https://mysvnrepo.com/myrepo/trunk
svn update
svn merge https://mysvnrepo.com/myrepo/branches/feature-branch
svn commit <span class="nt">-m</span> <span class="s2">"Merge feature branch to trunk"</span>
</code></pre></div></div>

<h3 id="deleting-feature-branch-after-merging">Deleting feature branch after merging</h3>
<p>To delete a feature branch (or any branch for that matter), <code class="language-plaintext highlighter-rouge">svn delete</code> is used.</p>

<h4 id="example-3">Example</h4>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svn delete https://mysvnrepo.com/myrepo/branches/feature-branch <span class="nt">-m</span> <span class="s2">"Delete feature branch after merging"</span>
</code></pre></div></div>]]></content><author><name>Dilip Raj Baral</name></author><category term="version-control" /><category term="svn" /><category term="git" /><category term="svn" /><category term="git" /><category term="version-control" /><category term="workflow" /><summary type="html"><![CDATA[This is a quick Subversion (SVN) guide for Git users. It helps you get started with SVN right away, the Git way.]]></summary></entry><entry><title type="html">Topic Modelling using LDA with MALLET</title><link href="https://diliprajbaral.com/blog/topic-modelling-lda-mallet/" rel="alternate" type="text/html" title="Topic Modelling using LDA with MALLET" /><published>2017-06-04T00:00:00+00:00</published><updated>2017-06-04T00:00:00+00:00</updated><id>https://diliprajbaral.com/blog/topic-modelling-lda-mallet</id><content type="html" xml:base="https://diliprajbaral.com/blog/topic-modelling-lda-mallet/"><![CDATA[<p>Machine Learning for Language Toolkit, in short MALLET, is a tool written in Java for applications of machine learning such as natural language processing, document classification, clustering, topic modeling, and information extraction to texts. To learn what MALLET has to offer in detail, <a href="http://mallet.cs.umass.edu/index.php">visit this page</a>.</p>

<p>In this post, we see how we can create topic models from a large collection of unlabeled text documents and use the model to infer topics in new documents.</p>

<p>Topic models use different algorithms to extract <em>topics</em> from a <em>corpus of texts</em>. MALLET uses Gibbs sampling based implementations of Latent Dirichlet Allocation (LDA), Pachinko Allocation and Hierarchical LDA. Check <a href="https://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/">this page</a> if you want to know about topic modeling in detail.</p>

<h2 id="setting-up-mallet">Setting up MALLET</h2>

<p>Go to the MALLET <a href="http://mallet.cs.umass.edu/download.php">download page</a> and download the latest version of MALLET. At the time of writing this post, the latest version is 2.0.8.</p>

<h3 id="installation-on-windows">Installation on Windows</h3>

<p>Ideally, unzip MALLET into your <em>C:</em> drive. Your path to MALLET will then be something similar to <em>C:\mallet-2.0.8</em>. <strong>This directory is referred to as the MALLET directory from here onwards.</strong> Now you will be able to access MALLET from anywhere on the command prompt using <em>C:\mallet-2.0.8\bin\mallet</em>. To avoid typing the full path every time, we can set up an environment variable. To do so, go to <em>Start Menu &gt; Control Panel &gt; System &gt; Advanced System Settings &gt; Environment Variables</em>. Under the <em>User variables</em> section, select <em>PATH</em> and click <em>Edit</em>. Go to the end of the text, type <em>;</em> followed by <em>C:\mallet-2.0.8\bin\</em> and save the change. Now you will be able to access MALLET with just the <em>mallet</em> command. To verify it is working, type the following on the command prompt.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; mallet --help
</code></pre></div></div>

<p>You should see a list of MALLET commands.</p>

<p><em><strong>Note</strong>: Windows uses the backslash (<code class="language-plaintext highlighter-rouge">\</code>) as a directory separator while *nix systems use the forward slash (<code class="language-plaintext highlighter-rouge">/</code>). Examples in this post were run on a *nix system (macOS), so forward slashes are used as directory separators. Remember to change them to backslashes when running the commands on the Windows Command Prompt.</em></p>

<h3 id="installation-on-nix-linux-freebsd-mac-os-x">Installation on *nix (Linux, FreeBSD, Mac OS X)</h3>

<p>Unzip MALLET. Typically, you would unzip to paths like <em>/usr/local/bin</em> or <em>/opt</em>. For this post, I have unzipped to <em>/usr/local/opt/mallet-2.0.8</em>. <strong>This path is referred to as the MALLET directory here onwards.</strong> To avoid typing the full path every time, we can set up a path variable. To do so, open <em>~/.bashrc</em> or <em>~/.bash_profile</em> (for <em>bash</em> shell) depending upon your distribution and add the following line.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export PATH=$PATH:/usr/local/opt/mallet-2.0.8/bin
</code></pre></div></div>

<p>To put the changes into effect, type the following in your shell:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ . ~/[.bashrc | .bash_profile]
</code></pre></div></div>

<p>You can now access MALLET from anywhere. To verify that it works type:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet --help
</code></pre></div></div>

<p>It should list all the MALLET commands.</p>

<h2 id="working-with-mallet">Working with MALLET</h2>

<p>Topic modeling with MALLET is all about three simple steps:</p>

<ol>
  <li>Import data (documents) into MALLET format</li>
  <li>Train your model using the imported data</li>
  <li>Use the trained model to infer the topic composition of a new document</li>
</ol>

<p>In this tutorial, we will use the sample data that comes pre-packaged with MALLET. It is found in the <em>sample-data</em> directory inside the MALLET directory. Before proceeding further, change your current directory to the MALLET directory by typing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd [Your MALLET directory]
</code></pre></div></div>


<h2 id="importing-data">Importing Data</h2>

<p>There are two methods of importing data into MALLET format.</p>

<h3 id="importing-directories">Importing directories</h3>

<p>You would import a directory if the source data consists of many separate files. In this case, each file is considered one instance. The following command imports all files from the directory <em>sample-data/web/en</em> and converts them into a single MALLET file named <em>train.mallet</em> in your current directory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet import-dir \
--input sample-data/web/en/ \
--output train.mallet \
--remove-stopwords TRUE \
--keep-sequence TRUE
</code></pre></div></div>

<p>Here, all options except <em>input</em> and <em>output</em> are optional. You can also pass more than one directory; separate the directory names with spaces.</p>

<p><em>remove-stopwords TRUE</em> removes words such as <em>a</em>, <em>an</em>, <em>the</em>, <em>if</em> and so on. By default, MALLET’s default English dictionary of <a href="https://en.wikipedia.org/wiki/Stop_words">stop words</a> is used. If you wish to supply your own list of stop words, customized for your application, you can do so by passing the file name to the <em>stoplist-file</em> option. The stoplist file contains stop words separated by spaces, tab characters, or line breaks.</p>

<p>The MALLET toolkit requires the <em>keep-sequence</em> option to be set to <em>TRUE</em> for topic modeling.</p>

<p>To see more options, type</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet import-dir --help
</code></pre></div></div>

<p>In this tutorial, we are using this method.</p>

<h3 id="importing-a-file">Importing a file</h3>

<p>You’d use this method if all of your data is in a single file, with one instance per line in the following format:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[instance_name] [label] [text without line breaks]
</code></pre></div></div>

<p><em>instance_name</em> uniquely identifies each instance. For topic modeling, <em>instance_name</em> and <em>label</em> can be the same.</p>
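<p>A small script can assemble such a file from in-memory documents. The following is a minimal sketch: the document list and the <em>train.txt</em> filename are made up for illustration.</p>

```python
# Assemble MALLET's one-instance-per-line format:
#   [instance_name] [label] [text without line breaks]
docs = [
    ("doc1", "news", "Stocks rallied on Friday after the jobs report."),
    ("doc2", "sport", "The home team\nwon the final in extra time."),
]

with open("train.txt", "w", encoding="utf-8") as f:
    for name, label, text in docs:
        flat = " ".join(text.split())   # collapse internal line breaks
        f.write(f"{name} {label} {flat}\n")
```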

<p>You’d type the following command.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet import-file \
--input [file_name] \
--output train.mallet
</code></pre></div></div>

<p>All the options that apply to <em>import-dir</em> also apply to <em>import-file</em>.</p>

<p><em><strong>Note</strong>: If you are importing an extremely large file or collection of files, you might get an ‘Exception in thread “main” java.lang.OutOfMemoryError: Java heap space’ error. This means you have hit MALLET’s memory limit, which is 1 GB by default. To raise the limit, open the file named mallet (or mallet.bat on Windows) in the ‘bin’ directory inside the MALLET directory with a text editor, find the line ‘MEMORY=1g’, and change ‘1g’ to a higher value such as ‘2g’ or ‘4g’, depending on your system’s RAM.</em></p>

<h2 id="training-the-model">Training the model</h2>

<p>After you have imported documents into MALLET format, you need to build a topic model. The following command takes the file <em>train.mallet</em>, which we created in the previous section, creates 5 topics <em>(topics.txt)</em>, and calculates the topic proportions for each instance <em>(topic-composition.txt)</em>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet train-topics \
--input train.mallet \
--inferencer-filename inferencer.mallet \
--num-topics 5 \
--output-topic-keys topics.txt \
--output-doc-topics topic-composition.txt
</code></pre></div></div>

<p>If you open <em>topics.txt</em>, you will see 5 lines. In each line, the first number is the topic number, the second number indicates the <em>weight</em> of that topic, and the words that follow are the most frequently occurring words in that topic.</p>

<p>The <em>topic-composition.txt</em> file lists the composition of each instance or document across the topics listed in <em>topics.txt</em>. In each line, the first value is the instance number, the second value is the instance or document name, and the numbers that follow are the weights of the corresponding topics in <em>topics.txt</em>.</p>
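<p>Given that layout, a line of <em>topics.txt</em> can be split into its parts with a few lines of Python. This is a sketch; the sample line below is invented, not real MALLET output.</p>

```python
def parse_topic_keys_line(line):
    """Split one topics.txt line: topic number, weight, then top words."""
    topic_id, weight, *words = line.split()
    return int(topic_id), float(weight), words

# A line shaped like MALLET's output: number, weight, most frequent words
tid, weight, words = parse_topic_keys_line("0\t0.5\tdata topic model text")
```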

<p>To see more options, type</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet train-topics --help
</code></pre></div></div>

<h3 id="deciding-the-number-of-topics">Deciding the number of topics</h3>

<p>There is no <em>natural</em> number of topics. To find a suitable number, we have to run <em>train-topics</em> with varying numbers of topics and see how the topic composition breaks down. If the majority of the words group into a very small number of topics, we need to increase the number of topics. On the other hand, if related words fall under different topics, the setting is too broad and we need to reduce the number of topics.</p>

<h2 id="inferring-topic-composition-of-new-documents">Inferring topic composition of new documents</h2>

<p>To infer the topic composition of new documents, you first need to import the new documents into MALLET format similar to what we did in the first section.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet [import-dir | import-file] \
--input [directory_name | file_name] \
--output new.mallet \
--remove-stopwords TRUE \
--keep-sequence TRUE \
--use-pipe-from train.mallet
</code></pre></div></div>

<p>Notice the <em>use-pipe-from</em> option. <strong>It is very important that you include this option at this stage.</strong> It ensures that the new data is compatible with the training data, i.e., that both use the same alphabet mappings.</p>

<p>Finally, the following command infers the topic composition of the new documents and stores it in <em>new-topic-composition.txt</em>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet infer-topics \
--input new.mallet \
--inferencer inferencer.mallet \
--output-doc-topics new-topic-composition.txt
</code></pre></div></div>

<p>To see more options, type</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mallet infer-topics --help
</code></pre></div></div>


<p>Please leave your comments or any query you have in the comment section below. I will be happy to help.</p>]]></content><author><name>Dilip Raj Baral</name></author><category term="machine-learning" /><category term="nlp" /><category term="topic-modeling" /><category term="lda" /><category term="mallet" /><category term="topic-modeling" /><category term="nlp" /><category term="machine-learning" /><summary type="html"><![CDATA[Intro to MALLET and a practical guide to building topic models from text.]]></summary></entry><entry><title type="html">Tendencies-based collaborative filtering algorithm</title><link href="https://diliprajbaral.com/blog/tendencies-based-collaborative-filtering-algorithm-recommender-system/" rel="alternate" type="text/html" title="Tendencies-based collaborative filtering algorithm" /><published>2016-12-06T00:00:00+00:00</published><updated>2016-12-06T00:00:00+00:00</updated><id>https://diliprajbaral.com/blog/tendencies-based-collaborative-filtering-algorithm-recommender-system</id><content type="html" xml:base="https://diliprajbaral.com/blog/tendencies-based-collaborative-filtering-algorithm-recommender-system/"><![CDATA[<p>As part of my academic research project titled <em>Impact of Recommender System</em>, I got to study various collaborative filtering algorithms. I was supposed to study, implement, and compare them. Tendencies-based was the best among them in terms of accuracy and computational efficiency. It was proposed by Fidel Cacheda and his team of researchers from University of A Coruna in their paper titled <strong><em>Comparison of Collaborative Filtering Algorithms: Limitations of Current Techniques and Proposals for Scalable, High-Performance Recommender Systems.</em></strong> It was as accurate as other collaborative filtering algorithms like item-based, similarity fusion, and others, if not more accurate than them. It was the most computationally efficient.</p>

<h2 id="algorithm">Algorithm</h2>
<p>The tendencies-based algorithm, instead of looking for relations between users or items, looks at the differences between them.</p>

<p>Often, users with similar opinions rate items in different ways: some users mostly give positive ratings and rate only really bad items negatively, while others usually rate negatively and give positive ratings only to the best items. This algorithm deals with these variations using the concepts of user tendency and item tendency.</p>

<h3 id="notation">Notation</h3>
<p>$r_{ui}$ denotes the rating given by user <em>u</em> to item <em>i</em>. $\hat{r}_{ui}$ denotes the prediction made by the algorithm for the rating of item <em>i</em> by user <em>u</em>. $\mu_u$ denotes user mean rating and $\mu_i$ denotes item mean rating. $I_u$ is the set of items rated by user <em>u</em>, and $U_i$ is the set of users who rated item <em>i</em>.</p>

<h3 id="tendency-calculation">Tendency Calculation</h3>
<p><strong>Tendency of a user ($\tau_u$)</strong> tells whether a user tends to rate items positively. It is defined as the average difference between the user’s ratings and the corresponding item means.</p>

\[\tau_u = \frac{1}{|I_u|} \sum_{i \in I_u} (r_{ui} - \mu_i)\]

<p><strong>Tendency of an item ($\tau_i$)</strong> tells whether users consider it an especially good or especially bad item. It is defined analogously as the average difference between the item’s ratings and the corresponding user means.</p>

\[\tau_i = \frac{1}{|U_i|} \sum_{u \in U_i} (r_{ui} - \mu_u)\]

<h3 id="prediction-calculation">Prediction Calculation</h3>
<p>The algorithm defines four cases based on the signs of the user and item tendencies.</p>

<p>If both the user and the item have a positive tendency:</p>

\[\hat{r}_{ui} = \max(\mu_u + \tau_i, \mu_i + \tau_u)\]

<p>If both the user and the item have a negative tendency:</p>

\[\hat{r}_{ui} = \min(\mu_u + \tau_i, \mu_i + \tau_u)\]

<p>If the user and the item have tendencies of opposite signs, the prediction blends the two estimates:</p>

\[\hat{r}_{ui} = \beta(\mu_u + \tau_i) + (1 - \beta)(\mu_i + \tau_u)\]

<p>Here, $\beta \in [0, 1]$ is a parameter that controls the relative contribution of the user mean and the item mean.</p>

<p>As observed, a simple formula is used in each case, and the calculation is highly efficient: the training time complexity is <strong><em>O(mn)</em></strong>, and a rating can be predicted in <strong><em>O(1)</em></strong> time.</p>
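<p>As a concrete illustration, the cases above can be sketched in Python. This is a minimal sketch under stated assumptions: the NumPy ratings-matrix layout and the $\beta = 0.5$ default are illustrative choices, and the opposite-sign branch blends the two estimates with weight $\beta$, one common formulation.</p>

```python
import numpy as np

def fit_tendencies(R):
    """Compute means and tendencies from a users-x-items rating matrix.

    R uses np.nan for missing ratings.
    """
    user_mean = np.nanmean(R, axis=1)                    # mu_u
    item_mean = np.nanmean(R, axis=0)                    # mu_i
    tau_u = np.nanmean(R - item_mean, axis=1)            # avg of r_ui - mu_i
    tau_i = np.nanmean(R - user_mean[:, None], axis=0)   # avg of r_ui - mu_u
    return user_mean, item_mean, tau_u, tau_i

def predict(u, i, user_mean, item_mean, tau_u, tau_i, beta=0.5):
    """Predict r_ui from the sign-based tendency cases."""
    a = user_mean[u] + tau_i[i]   # user-mean-based estimate
    b = item_mean[i] + tau_u[u]   # item-mean-based estimate
    if tau_u[u] >= 0 and tau_i[i] >= 0:   # both tendencies positive
        return max(a, b)
    if tau_u[u] < 0 and tau_i[i] < 0:     # both tendencies negative
        return min(a, b)
    # opposite signs: blend the two estimates with weight beta
    return beta * a + (1 - beta) * b
```

In practice, a prediction on a 1-to-5 scale would also be clamped to the scale’s bounds.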

<p><a href="https://github.com/rajbdilip/tendencies-based-recommender-system">Implementation code can be downloaded from my GitHub repository.</a></p>]]></content><author><name>Dilip Raj Baral</name></author><category term="machine-learning" /><category term="recommender-systems" /><category term="algorithms" /><category term="collaborative-filtering" /><category term="recommender" /><category term="machine-learning" /><category term="algorithm" /><summary type="html"><![CDATA[Notes on tendencies-based collaborative filtering and why it is efficient and accurate.]]></summary></entry><entry><title type="html">Text search using Stochastic Diffusion Search</title><link href="https://diliprajbaral.com/blog/text-search-using-stochastic-diffusion-search/" rel="alternate" type="text/html" title="Text search using Stochastic Diffusion Search" /><published>2016-11-06T00:00:00+00:00</published><updated>2016-11-06T00:00:00+00:00</updated><id>https://diliprajbaral.com/blog/text-search-using-stochastic-diffusion-search</id><content type="html" xml:base="https://diliprajbaral.com/blog/text-search-using-stochastic-diffusion-search/"><![CDATA[<p>Stochastic Diffusion Search (SDS), a multi-agent population-based global search and optimization algorithm, is a distributed mode of computation utilizing interaction between simple agents. SDS shows off a strong mathematical framework. It is robust, has minimal convergence criteria and linear time complexity.</p>

<p>SDS has been applied to diverse problems such as text search, object recognition, feature tracking, mobile robot self-localization and site selection for wireless networks. As a part of my <em>Optimization Techniques</em> laboratory project, I implemented a text search using SDS.</p>

<h2 id="basic-sds-algorithm">Basic SDS Algorithm</h2>

<p>The SDS algorithm has many variations. The following is the basic version.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. For all agents do
2. INITIALIZE: Agent picks a random hypothesis 
3. TEST: Agent partially evaluates her hypothesis 
- If test criterion = TRUE, agent = Active (satisfied) 
- Else agent = Inactive (dissatisfied) 
4. DIFFUSE
- Inactive agent meets a randomly chosen agent 
- Inactive agent updates/changes hypothesis 
5. REPEAT until Halting criterion.
</code></pre></div></div>

<h2 id="sds-text-search-algorithm">SDS Text Search Algorithm</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INITIALIZATION PHASE
Each agent selects a haystack offset (hypothesis) at random.
WHILE (NOT all agents are active)
TESTING PHASE
Each agent randomly selects an offset less than the length of needle (needle offset) and matches the character in haystack at (haystack offset + needle offset) with the character in needle at needle offset
IF the letters match
Agent is active.
ELSE
Agent is inactive.
DIFFUSION PHASE
Each inactive agent selects another agent at random.
IF the selected agent is active
The inactive agent adopts the hypothesis (index) of that agent.
ELSE
The inactive agent selects a new index (hypothesis) at random.
END WHILE
Each agent's haystack offset is the starting index of needle.
</code></pre></div></div>
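<p>The phases above translate almost directly into Python. This is a minimal sketch, not the linked Java implementation: the agent count, the fixed seed, and the consensus readout at the end are illustrative choices.</p>

```python
import random

def sds_text_search(haystack, needle, n_agents=20, seed=0):
    """Minimal Stochastic Diffusion Search for substring matching."""
    rng = random.Random(seed)
    max_off = len(haystack) - len(needle)
    # INITIALIZATION: each agent picks a random haystack offset (hypothesis)
    hyps = [rng.randint(0, max_off) for _ in range(n_agents)]
    active = [False] * n_agents
    while not all(active):
        # TESTING: each agent checks one randomly chosen needle character
        for a in range(n_agents):
            k = rng.randrange(len(needle))            # needle offset
            active[a] = haystack[hyps[a] + k] == needle[k]
        # DIFFUSION: each inactive agent polls a randomly chosen agent
        for a in range(n_agents):
            if not active[a]:
                other = rng.randrange(n_agents)
                if active[other]:
                    hyps[a] = hyps[other]             # adopt its hypothesis
                else:
                    hyps[a] = rng.randint(0, max_off) # pick a new one
    # agents have converged; report the consensus offset
    return max(set(hyps), key=hyps.count)
```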

<h2 id="implementation">Implementation</h2>

<p>The following links to the Java implementation of the above algorithm on GitHub.</p>

<p><a href="https://github.com/rajbdilip/stochastic-diffusion-search">https://github.com/rajbdilip/stochastic-diffusion-search</a></p>

<h2 id="observation">Observation</h2>

<p>Stochastic Diffusion Search text search is linear and fast. Agents search, communicate with each other, and converge in very few iterations: around 100 iterations with 5 agents on a haystack of length around 500. Small increases in the number of agents tend to decrease the number of iterations required to converge to the solution, but it cannot be ensured that big increases will significantly reduce the number of iterations; the number of iterations can even increase.</p>

<p>However, if the needle is present at more than one offset, it cannot be guaranteed that this algorithm will find all of the occurrences, regardless of the number of agents used. Since SDS is stochastic, a different offset may be returned on each run.</p>

<h2 id="references">References</h2>

<p>al-Rifaie, Mohammad Majid, and John Mark Bishop. “Stochastic diffusion search review.” <em>Paladyn, Journal of Behavioral Robotics</em> 4.3 (2013): 155-173.</p>]]></content><author><name>Dilip Raj Baral</name></author><category term="algorithms" /><category term="search" /><category term="optimization" /><category term="stochastic-diffusion-search" /><category term="search" /><category term="optimization" /><category term="java" /><category term="algorithms" /><summary type="html"><![CDATA[Overview of Stochastic Diffusion Search and a simple algorithm outline for text search.]]></summary></entry><entry><title type="html">Social Media Integration into CiviCRM - CiviSocial</title><link href="https://diliprajbaral.com/blog/social-media-integration-into-civicrm/" rel="alternate" type="text/html" title="Social Media Integration into CiviCRM - CiviSocial" /><published>2016-10-01T00:00:00+00:00</published><updated>2016-10-01T00:00:00+00:00</updated><id>https://diliprajbaral.com/blog/social-media-integration-into-civicrm</id><content type="html" xml:base="https://diliprajbaral.com/blog/social-media-integration-into-civicrm/"><![CDATA[<p>So my proposal for <a href="https://developers.google.com/open-source/gsoc/">Google Summer of Code</a> 2016 was accepted and I was one of the 1206 Google student developers all around the globe. I got a chance to work on <a href="https://en.wikipedia.org/wiki/CiviCRM">CiviCRM</a> alongside its developers from all around the world. CiviCRM is a web-based, open source constituency relationship management specifically designed for the needs of non-profit, non-governmental, and advocacy groups, and serves as an association management system. Volunteers, activists, voters as well as more general sorts of business contacts such as employees, clients, or vendors can be managed using CiviCRM.</p>

<p>My project was titled “<a href="https://summerofcode.withgoogle.com/projects/#5737064465170432">Social Media Integration</a>” and aimed to boost the exposure of CiviCRM as a platform and make it even easier for people to connect. Specifically, I had to develop an extension for CiviCRM that would allow users to more easily fill out forms and sign petitions using social login. It would also allow event registrations in CiviCRM to be reflected as RSVPs for parallel Facebook events. Moreover, it would allow CiviCRM admins to integrate multiple social networks and pull any relevant user activity data.</p>

<p>The coding began on May 22, 2016 and went through August 23, 2016. By the end of the program, most of the project goals were met with a few pending updates. Exact features of the extension can be found <a href="https://github.com/rajbdilip/org.civicrm.civisocial#civisocial---social-media-integration">here</a>. The extension is hosted on <a href="https://github.com/rajbdilip/org.civicrm.civisocial">GitHub</a>. The installation and configuration instructions can be found <a href="https://github.com/rajbdilip/org.civicrm.civisocial/blob/master/README.md">here</a>.</p>

<p>I will further work on the extension to add more features. Any code contributions or feature suggestions are welcome.</p>]]></content><author><name>Dilip Raj Baral</name></author><category term="open-source" /><category term="civicrm" /><category term="project" /><category term="gsoc" /><category term="civicrm" /><category term="social-media" /><category term="integration" /><category term="crm" /><summary type="html"><![CDATA[Summary of my Google Summer of Code project building a CiviCRM social media integration extension.]]></summary></entry><entry><title type="html">Facebook PHP SDK 4.0 - Re-asking declined permissions</title><link href="https://diliprajbaral.com/blog/facebook-php-sdk-4-0-re-asking-declined-permissions/" rel="alternate" type="text/html" title="Facebook PHP SDK 4.0 - Re-asking declined permissions" /><published>2014-07-16T00:00:00+00:00</published><updated>2014-07-16T00:00:00+00:00</updated><id>https://diliprajbaral.com/blog/facebook-php-sdk-4-0-re-asking-declined-permissions</id><content type="html" xml:base="https://diliprajbaral.com/blog/facebook-php-sdk-4-0-re-asking-declined-permissions/"><![CDATA[<p><strong>UPDATE:</strong></p>

<p>The Facebook PHP SDK now provides the <em>getReRequestUrl()</em> method on the <em>FacebookRedirectLoginHelper</em> class to generate a URL that re-requests permissions a user has declined.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public string getReRequestUrl(string $redirectUrl, array $scope = [], string $separator = '&amp;')
</code></pre></div></div>

<p>Read the documentation <a href="https://developers.facebook.com/docs/php/FacebookRedirectLoginHelper/5.0.0#get-re-request-url">here</a>.</p>

<hr />

<p>So I was testing Facebook Login integration on <a href="http://www.treasherlocked.com/">www.treasherlocked.com</a>, a website I have been developing for a while. The web app required permissions like <em>email</em> and <em>user_location</em>, so it was programmed to re-ask for any permissions the user denied in the Login Dialog.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$facebook = new Facebook(APP_ID, APP_SECRET, REDIRECT_URI);
if ( $facebook-&gt;IsAuthenticated() ) {
    // Verify that all of the scopes have been granted
    if ( !$facebook-&gt;verifyScopes( unserialize(SCOPES) ) ) {
        header( "Location: " . $facebook-&gt;getLoginURL( $facebook-&gt;denied_scopes ) );
        exit;
    }
    ...
}
</code></pre></div></div>

<p><em>Note: $facebook is a custom class that I built and not a part of Facebook PHP SDK.</em></p>

<p>But it wasn’t showing the Login Dialog again when permissions were denied. Instead, Facebook kept redirecting to the <em>Redirect URI</em>, creating a redirect loop. I even <a href="http://stackoverflow.com/questions/24716168/facebook-php-sdk-4-0-cannot-re-ask-read-permissions-once-denied">asked on Stack Overflow</a> but got no answer. With the deadline approaching, I couldn’t afford to wait and kept looking for solutions. After hours of googling, I landed on a <a href="https://developers.facebook.com/docs/facebook-login/login-flow-for-web/v2.0#re-asking-declined-permissions">Facebook Login API doc</a> page that addresses exactly this issue: all that needs to be done is append an <em>auth_type=rerequest</em> parameter to the login URL’s query string. This feature had not yet been implemented in Facebook PHP SDK 4.0, though there was a <a href="https://github.com/facebook/facebook-php-sdk-v4/issues/146">proposal on GitHub</a> for it. So I took the liberty of forking the project and making a small change to the <em>getLoginUrl()</em> method’s prototype and definition in the <em>FacebookRedirectLoginHelper</em> class. The <em>getLoginUrl()</em> prototype then looked like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public function getLoginUrl($redirectUrl, $scope = array(), $rerequest = false, $version = null)
</code></pre></div></div>

<p>I sent a pull request to the project repo which was quickly merged. (<a href="https://www.sammyk.me/how-to-contribute-to-an-open-source-project-on-github">Read this article</a> if you want to know how to contribute to an open source project if you aren’t contributing already.)</p>
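<p>Under the hood, re-asking works by adding an <em>auth_type=rerequest</em> parameter to the OAuth dialog’s query string, which is what the SDK change does for you. As a minimal sketch without the SDK (the <em>buildRerequestLoginUrl()</em> helper, the app ID, and the redirect URI below are all hypothetical placeholders):</p>

```php
<?php
// Hypothetical sketch: build a Facebook Login Dialog URL that re-asks
// declined permissions. The auth_type=rerequest parameter is what tells
// Facebook to show the dialog again for previously denied scopes.
function buildRerequestLoginUrl($appId, $redirectUri, array $scopes)
{
    $params = array(
        'client_id'    => $appId,        // placeholder app ID
        'redirect_uri' => $redirectUri,  // placeholder callback URL
        'scope'        => implode(',', $scopes),
        'auth_type'    => 'rerequest',   // re-ask declined permissions
    );
    return 'https://www.facebook.com/dialog/oauth?' . http_build_query($params);
}

$url = buildRerequestLoginUrl('1234567890', 'https://example.com/callback',
    array('email', 'user_location'));
echo $url;
```

<p>In a real app you would not build this URL by hand; the SDK’s <em>getLoginUrl()</em> with the <em>$rerequest</em> flag (or, in later versions, <em>getReRequestUrl()</em>) assembles it for you.</p>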

<p>So, if you need to re-ask declined permissions, all you have to do is pass <em>true</em> as the third argument. The code in the callback script will look something like the following.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;?php
...
$helper = new FacebookRedirectLoginHelper();
if ($permissions_were_declined) {
    header("Location: " . $helper-&gt;getLoginUrl($redirect_uri, $declined_scopes, true));
    exit;
}
...
?&gt;
</code></pre></div></div>]]></content><author><name>Dilip Raj Baral</name></author><category term="web-dev" /><category term="php" /><category term="facebook" /><category term="php" /><category term="facebook" /><category term="sdk" /><category term="permissions" /><category term="oauth" /><summary type="html"><![CDATA[How to re-request declined Facebook Login permissions with the PHP SDK 4.0 helper method.]]></summary></entry></feed>