NYCPHP Meetup

[nycphp-talk] Dynamically Add Links to Text

Petros Ziogas petros.ziogas at gmail.com
Fri Aug 28 15:11:02 EDT 2009


Hi Tedd,
Those were some good ideas about the problems that I thought would arise.

What you are forgetting is that you can't just decide what is more important
and what is not. Logically, it might be the short article (e.g. "America")
that should be linked and not the long one. If there are many articles with
organic (sorry, I can't think of a better word) titles, the process will
produce weird results.

I still think that automatically cross-linking articles that might have
short titles is a bad idea. Imagine IMDb doing that and trying to cross-link
the movies "Z" and "IT" inside its editors' articles and reviews.

I think the best real-world example of this is the ad networks that link
words to advertisements inside website content. They can make a text almost
unreadable when they insert the ad links.

Petros Ziogas
http://www.royalblue.gr


On Fri, Aug 28, 2009 at 6:05 PM, Chuck Reeves <chuck.reeves at gmail.com> wrote:

> Another little add-on to these solutions would be to include some kind of
> counter that prevents the bot from linking more than X times. This way,
> when the article loads, it will not be just one big stream of links.
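A minimal sketch of the counter idea. It assumes titles map to URLs in a plain dict, that each title is linked at most once, and that `max_links` caps the total number of automatic links per article. The function name `link_titles_capped` and the one-link-per-title rule are illustrative assumptions. Matching here is plain and case-insensitive, and it does not guard against nested links.

```python
import re


def link_titles_capped(text, titles, max_links):
    """Link the first occurrence of each title, stopping after max_links links.

    titles: dict mapping an article title to its (hypothetical) URL.
    """
    inserted = 0
    for title, url in titles.items():
        if inserted >= max_links:
            break  # cap reached: leave the rest of the text untouched
        pattern = re.compile(re.escape(title), re.IGNORECASE)
        # count=1 links only the first occurrence of this title
        text, n = pattern.subn(
            lambda m: f'<a href="{url}">{m.group(0)}</a>', text, count=1
        )
        inserted += n
    return text


titles = {"America": "/id3333/", "Europe": "/id4444/", "Asia": "/id5555/"}
print(link_titles_capped("America, Europe and Asia. America again.", titles, 2))
# prints: <a href="/id3333/">America</a>, <a href="/id4444/">Europe</a> and Asia. America again.
```

Linking only the first occurrence of each title also keeps repeated titles from turning every mention into a link.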
>
> Thank You
> Chuck Reeves
> Cell: 631-374-0772
> Email: chuck.reeves at gmail.com
>
>
> On Fri, Aug 28, 2009 at 10:51 AM, tedd <tedd at sperling.com> wrote:
>
>> At 1:00 PM +0300 8/28/09, Petros Ziogas wrote:
>>
>>> I would just like to mention a point of failure in that automated
>>> process. I had to deal with this in a previous project, so it's quite fresh.
>>>
>>> What will happen if:
>>>
>>
>> Problem 1
>>
>>> There are 3 articles. Article A is titled "History of America". Article B
>>> is titled "Glorious History of America". In article C there is this text:
>>> "The book is talking about the glorious history of America". If you run an
>>> automated process and the test for article A comes first, then the text will
>>> be "The book is talking about the glorious <a href="/id1111/">history of
>>> America</a>" and the next test will fail.
>>>
>>> If you run a test for article B first the text will become "The book is
>>> talking about the <a href="/id2222/">glorious history of America</a>". Then
>>> if you test for article A it might end up being "The book is talking about
>>> the <a href="/id2222/">glorious <a href="/id1111/">history of
>>> America</a></a>"
>>>
>>> The possibilities of such procedures practically ruining your content are
>>> endless. If you want to dive into tag nesting and HTML validation, you will
>>> be opening a whole new can of worms.
>>>
>>
>> Problem 2
>>
>>> Also, what will happen if an editor wants to insert this: "I loved the book
>>> <a href="LINKTOAMAZON">George Washington and the Glorious history of
>>> America</a>." and there are articles with titles using "George Washington",
>>> "Glorious history", "History of America", "America"?
>>>
>>> I think you get my point...
>>>
>>
>> Petros:
>>
>> Yes, I see your point and the two problems you raise (good concerns).
>>
>> Problem 1
>>
>> My initial solution would solve the first problem *provided* that the
>> titles were unique and not contained within another title, right? So why not
>> start with the longest title and search/replace downwards?
>>
>> For example, "Glorious History of America" is searched, found, and made a
>> link. Then "History of America" is searched -- however -- the search
>> excludes links! The phrase "History of America" in "Glorious History of
>> America" would never be considered because it's within a link.
>>
>> The process would continue until you run out of titles -- simple, right?
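A minimal sketch of the longest-first search/replace that skips text already inside a link. It makes these assumptions:

- Links are simple, non-nested `<a ...>...</a>` elements.
- No other tag attributes contain a title.
- Titles start and end with word characters, so `\b` word boundaries apply.

The word boundaries are an added assumption, not part of the original proposal. Each generated link gets the class="autotag" marker from this message. The names `autolink` and `ANCHOR_RE` are hypothetical. Because every existing `<a>` element is skipped, organic links are left alone as well.

```python
import re

# Splits HTML into <a> elements and the text between them.
# Assumption: links are simple, non-nested <a ...>...</a> elements.
ANCHOR_RE = re.compile(r"(<a\b[^>]*>.*?</a>)", re.IGNORECASE | re.DOTALL)


def autolink(html, titles):
    """Link article titles, longest first, never touching text already inside a link.

    titles: dict mapping title -> URL (hypothetical ids such as /id1111/).
    """
    for title in sorted(titles, key=len, reverse=True):
        url = titles[title]
        # \b word boundaries are an added assumption to avoid matching inside words
        pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        parts = ANCHOR_RE.split(html)
        # With one capturing group, split() puts the <a> elements at odd indexes,
        # so only the even-indexed (non-link) segments are searched.
        for i in range(0, len(parts), 2):
            parts[i] = pattern.sub(
                lambda m: f'<a class="autotag" href="{url}">{m.group(0)}</a>',
                parts[i],
            )
        html = "".join(parts)
    return html


titles = {"History of America": "/id1111/", "Glorious History of America": "/id2222/"}
print(autolink("The book is talking about the glorious history of America", titles))
# prints: The book is talking about the <a class="autotag" href="/id2222/">glorious history of America</a>
```

The text is re-split after each title, so links created for a longer title are excluded when shorter titles are searched.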
>>
>> Problem 2
>>
>> The second problem can be solved two ways:
>>
>> Way one -- by removing all organic links from the initial search. In other
>> words, when the FULL TEXT search is started, the search is done on articles
>> stripped of all organic links. You can easily add the organic links back in
>> after the search/replace is finished.
>>
>> Please note that when the automated links are added, they also get a unique
>> class attribute, such as class="autotag", which allows them to be easily
>> identified and removed for a rebuild.
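Because every generated link carries class="autotag", a rebuild can first unwrap those links back to plain text and then run the linking pass again. This is a minimal sketch, assuming double-quoted attributes and link text without nested tags. The names `strip_autotags` and `AUTOTAG_RE` are hypothetical. Links an editor marks with class="autotag" are unwrapped too, while organic links without the class are left alone.

```python
import re

# Matches an <a> element whose class attribute contains the word "autotag".
# Assumption: double-quoted attributes and no nested tags inside the link text.
AUTOTAG_RE = re.compile(
    r'<a\b[^>]*\bclass="[^"]*\bautotag\b[^"]*"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def strip_autotags(html):
    """Unwrap automatically generated links, keeping their text, so the article can be relinked."""
    return AUTOTAG_RE.sub(r"\1", html)


html = ('See <a class="autotag" href="/id1111/">History of America</a> and '
        '<a href="LINKTOAMAZON">a book</a>.')
print(strip_autotags(html))
# prints: See History of America and <a href="LINKTOAMAZON">a book</a>.
```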
>>
>> Way two -- you could solve the problem by excluding organic links from the
>> search because they DO NOT have the unique class attribute identifier --
>> thus no real reason to remove them at all for the search/replace routine
>> (i.e., Way 1). I only presented "Way 1" to get you to think in terms of
>> removing the organic links from the problem.
>>
>> Possible problem
>>
>> The only fly in the ointment here would be if an editor wants to manually
>> link an article by trying to mimic the automated process. For example,
>> he/she inserts "<a href="/id1111/">History of America</a>" using the
>> *index* of the article. Everything would still work unless that article is
>> deleted. In such a case, the link would become dead.
>>
>> However, if the editor simply added the class identifier (i.e.,
>> class="autotag") to the link, then the automated process would treat the
>> entry as its own and adjust accordingly.
>>
>> If editors simply followed the rules, which aren't complicated, they could
>> participate in the process as much as they want.
>>
>> The solution presented here doesn't require tag nesting or HTML
>> validation. As such, I don't see any additional problems -- do you?
>>
>> Cheers,
>>
>> tedd
>>
>>
>> --
>> -------
>> http://sperling.com  http://ancientstones.com  http://earthstones.com
>> _______________________________________________
>> New York PHP User Group Community Talk Mailing List
>> http://lists.nyphp.org/mailman/listinfo/talk
>>
>> http://www.nyphp.org/show_participation.php
>>
>