AI Essay Writer

AI Essay Writer — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • WaveNet

    WaveNet

    WaveNet is a deep neural network for generating raw audio. It was created by researchers at London-based AI firm DeepMind. The technique, outlined in a paper in September 2016, is able to generate relatively realistic-sounding human-like voices by directly modelling waveforms using a neural network method trained with recordings of real speech. Tests with US English and Mandarin reportedly showed that the system outperforms Google's best existing text-to-speech (TTS) systems, although as of 2016 its text-to-speech synthesis still was less convincing than actual human speech. WaveNet's ability to generate raw waveforms means that it can model any kind of audio, including music. == History == Generating speech from text is an increasingly common task thanks to the popularity of software such as Apple's Siri, Microsoft's Cortana, Amazon Alexa and the Google Assistant. Most such systems use a variation of a technique that involves concatenated sound fragments together to form recognisable sounds and words. The most common of these is called concatenative TTS. It consists of large library of speech fragments, recorded from a single speaker that are then concatenated to produce complete words and sounds. The result sounds unnatural, with an odd cadence and tone. The reliance on a recorded library also makes it difficult to modify or change the voice. Another technique, known as parametric TTS, uses mathematical models to recreate sounds that are then assembled into words and sentences. The information required to generate the sounds is stored in the parameters of the model. The characteristics of the output speech are controlled via the inputs to the model, while the speech is typically created using a voice synthesiser known as a vocoder. This can also result in unnatural sounding audio. == Design and ongoing research == === Background === WaveNet is a type of feedforward neural network known as a deep convolutional neural network (CNN). In WaveNet, the CNN takes a raw signal as an input and synthesises an output one sample at a time. It does so by sampling from a softmax (i.e. categorical) distribution of a signal value that is encoded using μ-law companding transformation and quantized to 256 possible values. === Initial concept and results === According to the original September 2016 DeepMind research paper WaveNet: A Generative Model for Raw Audio, the network was fed real waveforms of speech in English and Mandarin. As these pass through the network, it learns a set of rules to describe how the audio waveform evolves over time. The trained network can then be used to create new speech-like waveforms at 16,000 samples per second. These waveforms include realistic breaths and lip smacks – but do not conform to any language. WaveNet is able to accurately model different voices, with the accent and tone of the input correlating with the output. For example, if it is trained with German, it produces German speech. The capability also means that if the WaveNet is fed other inputs – such as music – its output will be musical. At the time of its release, DeepMind showed that WaveNet could produce waveforms that sound like classical music. === Content (voice) swapping === According to the June 2018 paper Disentangled Sequential Autoencoder, DeepMind has successfully used WaveNet for audio and voice "content swapping": the network can swap the voice on an audio recording for another, pre-existing voice while maintaining the text and other features from the original recording. "We also experiment on audio sequence data. Our disentangled representation allows us to convert speaker identities into each other while conditioning on the content of the speech." (p. 5) "For audio, this allows us to convert a male speaker into a female speaker and vice versa [...]." (p. 1) According to the paper, a two-digit minimum amount of hours (c. 50 hours) of pre-existing speech recordings of both source and target voice are required to be fed into WaveNet for the program to learn their individual features before it is able to perform the conversion from one voice to another at a satisfying quality. The authors stress that "[a]n advantage of the model is that it separates dynamical from static features [...]." (p. 8), i. e. WaveNet is capable of distinguishing between the spoken text and modes of delivery (modulation, speed, pitch, mood, etc.) to maintain during the conversion from one voice to another on the one hand, and the basic features of both source and target voices that it is required to swap on the other. The January 2019 follow-up paper Unsupervised speech representation learning using WaveNet autoencoders details a method to successfully enhance the proper automatic recognition and discrimination between dynamical and static features for "content swapping", notably including swapping voices on existing audio recordings, in order to make it more reliable. Another follow-up paper, Sample Efficient Adaptive Text-to-Speech, dated September 2018 (latest revision January 2019), states that DeepMind has successfully reduced the minimum amount of real-life recordings required to sample an existing voice via WaveNet to "merely a few minutes of audio data" while maintaining high-quality results. Its ability to clone voices has raised ethical concerns about WaveNet's ability to mimic the voices of living and dead persons. According to a 2016 BBC article, companies working on similar voice-cloning technologies (such as Adobe Voco) intend to insert watermarking inaudible to humans to prevent counterfeiting, while maintaining that voice cloning satisfying, for instance, the needs of entertainment-industry purposes would be of a far lower complexity and use different methods than required to fool forensic evidencing methods and electronic ID devices, so that natural voices and voices cloned for entertainment-industry purposes could still be easily told apart by technological analysis. == Applications == At the time of its release, DeepMind said that WaveNet required too much computational processing power to be used in real world applications. As of October 2017, Google announced a 1,000-fold performance improvement along with better voice quality. WaveNet was then used to generate Google Assistant voices for US English and Japanese across all Google platforms. In November 2017, DeepMind researchers released a research paper detailing a proposed method of "generating high-fidelity speech samples at more than 20 times faster than real-time", called "Probability Density Distillation". At the annual I/O developer conference in May 2018, it was announced that new Google Assistant voices were available and made possible by WaveNet; WaveNet greatly reduced the number of audio recordings that were required to create a voice model by modeling the raw audio of the voice actor samples.

    Read more →
  • Death of Elaine Herzberg

    Death of Elaine Herzberg

    The death of Elaine Herzberg (August 2, 1968 – March 18, 2018) was the first recorded case of a pedestrian fatality involving a self-driving car, after a collision that occurred late in the evening of March 18, 2018. Herzberg was pushing a bicycle across a four-lane road in Tempe, Arizona, United States, when she was struck by an Uber test vehicle, which was operating in self-drive mode with a human safety backup driver sitting in the driving seat. Herzberg was taken to the local hospital where she died of her injuries. Following the fatal incident, the National Transportation Safety Board (NTSB) issued a series of recommendations and sharply criticized Uber. The company suspended testing of self-driving vehicles in Arizona, where such testing had been approved since August 2016. Uber chose not to renew its permit for testing self-driving vehicles in California when it expired at the end of March 2018. Uber resumed testing in December 2018, starting in Pittsburgh, Pennsylvania. In March 2019, Arizona prosecutors ruled that Uber was not criminally responsible for the crash. The back-up driver of the vehicle was charged with negligent homicide, pled guilty to endangerment, and was sentenced to three years' probation. While Herzberg was the first pedestrian killed by a self-driving car, driver Gao Yaning died in a Tesla semi-autonomous car two years earlier. A reporter for The Washington Post compared Herzberg's fate with that of Bridget Driscoll who, in the United Kingdom in 1896, was the first pedestrian to be killed by an automobile. The Arizona incident has magnified the importance of collision avoidance systems for self-driving vehicles. == Collision summary == Herzberg was crossing Mill Avenue (North) from west to east, approximately 360 feet (110 m) south of the intersection with Curry Road, outside the designated pedestrian crosswalk, close to the Red Mountain Freeway. She was pushing a bicycle laden with shopping bags, and had crossed at least two lanes of traffic when she was struck at approximately 9:58 pm MST (UTC−07:00) by a prototype Uber self-driving car based on a Volvo XC90, which was traveling north on Mill. The vehicle had been operating in autonomous mode since 9:39 pm, nineteen minutes before it struck and killed Herzberg. The car's human safety backup driver, Rafaela Vasquez, did not intervene in time to prevent the collision. Vehicle telemetry obtained after the crash showed that the human operator responded by moving the steering wheel less than a second before impact, and she engaged the brakes less than a second after impact. == Cause investigation == The county district attorney's office recused itself from the investigation, due to a prior joint partnership with Uber promoting their services as an alternative to driving under the influence of alcohol. Accounts differ on the speed limit at the place of the incident. According to Tempe police the car was traveling in a 35 mph (56 km/h) zone, but this is contradicted by a posted speed limit of 45 mph (72 km/h). The National Transportation Safety Board (NTSB) sent a team of federal investigators to gather data from vehicle instruments, and to examine vehicle condition along with the actions taken by the safety driver. Their preliminary findings were substantiated by multiple event data recorders and proved the vehicle was traveling 43 miles per hour (69 km/h) when Herzberg was first detected 6 seconds (378 feet (115 m)) before impact; during 4.7 seconds the self driving system did not infer that emergency braking was needed. A vehicle traveling 43 mph (69 km/h) can generally stop within 89 feet (27 m) once the brakes are applied. The machine needed to be 1.3 seconds (82 feet (25 m)) away prior to discerning that emergency braking was required, whereas at least that much distance was required to stop. The system failed to behave properly. A total stopping distance of 76 feet itself would imply a safe speed under 25 mph (40 km/h). Human intervention was still legally required. Computer perception–reaction time would have been a speed limiting factor had the technology been superior to humans in ambiguous situations; however, the nascent computerized braking technology was disabled the day of the crash, and the machine's apparent 4.7-second perception–reaction (alarm) time allowed the car to travel 250 feet (76 m). Video released by the police on March 21 showed the safety driver was not watching the road moments before the vehicle struck Herzberg. === Environment === In widely disseminated remarks that would shape the narrative about the crash, which were later seen as prejudicial and subsequently contradicted by her own department, Tempe Police Chief Sylvia Moir was quoted stating that the collision was "unavoidable" based on the initial police investigation, which included a review of the video captured by an onboard camera. Moir faulted Herzberg for crossing the road in an unsafe manner: "It is dangerous to cross roadways in the evening hour when well-illuminated, managed crosswalks are available." According to Uber, safety drivers were trained to keep their hands very close to the wheel all the time while driving the vehicle so they were ready to quickly take control if necessary. The driver said it was like a flash, the person walked out in front of them. His [sic] first alert to the collision was the sound of the collision. [...] it's very clear it would have been difficult to avoid this collision in any kind of mode (autonomous or human-driven) based on how she came from the shadows right into the roadway. Tempe police released video on March 21, 2018, showing footage recorded by two onboard cameras: one forward-looking, and one capturing the safety driver's actions. The forward-facing video shows that the self-driving car was traveling in the far right lane when it struck Herzberg. The driver-facing video shows the safety driver was looking down prior to the collision. The Uber operator is responsible for intervening and taking manual control when necessary as well as for monitoring diagnostic messages, which are displayed on a screen in the center console. In an interview conducted after the crash with NTSB, the driver stated she was monitoring the center stack at the time of the collision. After the Uber video was released, journalist Carolyn Said noted the police explanation of Herzberg's path meant she had already crossed two lanes of traffic before she was struck by the autonomous vehicle. The Marquee Theatre and Tempe Town Lake are west of Mill Avenue, and pedestrians commonly cross mid-street without detouring north to the crosswalk at Curry. According to reporting by the Phoenix New Times, Mill Avenue contains what appears to be a brick-paved path in the median between the northbound and southbound lanes; however, posted signs prohibit pedestrians from crossing in that location. When the second of the Mill Avenue bridges over the town lake was added in 1994 for northbound traffic, the X-shaped crossover in the median was installed to accommodate the potential closing of one of the two road bridges. The purpose of this brick-paved structure is purely to divert cars from one side to the other if a bridge is closed to traffic, and although it may look like a crosswalk for pedestrians, it is in fact a temporary roadway with vertical curbs and warning signs. === Software issues === Michael Ramsey, a self-driving car expert with Gartner, characterized the video as showing "a complete failure of the system to recognize an obviously seen person who is visible for quite some distance in the frame. Uber has some serious explaining to do about why this person wasn't seen and why the system didn't engage." The NTSB preliminary report, however, noted that the software did order the car to brake 1.3 seconds before the collision. A video shot from the vehicle's dashboard camera showed the safety driver looking down, away from the road. It also appeared that the driver's hands were not hovering above the steering wheel, which is what drivers are instructed to do so they can quickly retake control of the car. Uber had moved from two employees in every car to one. The paired employees had been splitting duties: one ready to take over if the autonomous system failed, and another to keep an eye on what the computers were detecting. The second person was responsible for keeping track of system performance as well as labeling data on a laptop computer. Mr. Kallman, the Uber spokesman, said the second person was in the car for purely data related tasks, not safety. When Uber moved to a single operator, some employees expressed safety concerns to managers, according to the two people familiar with Uber's operations. They were worried that going solo would make it harder to remain alert during hours of monotonous driving. The recorded telemetry showed the system had detected Herzberg six seconds before the crash, and classified her first as an unknown object, then as a

    Read more →
  • The Murderbot Diaries

    The Murderbot Diaries

    The Murderbot Diaries is a science fiction series by American author Martha Wells, published by Tor Books. The series is told from the perspective of the titular cyborg guard, a "SecUnit" owned by a futuristic megacorporation. SecUnits include "governor" modules that control and punish the constructs if they take any actions not approved by the company. The ironically self-named "Murderbot" hacked and disabled the module but pretends to be a normal SecUnit, staving off the boredom of security work by watching media. As it spends more time with a series of caring entities (both humans and artificial intelligences), it develops genuine friendships and emotional connections, which it finds inconvenient. The TV series Murderbot is based on the novels by Martha Wells. == Books == === Setting === In an advanced largely hyper-capitalist space-faring society, travel between star systems is routine due to now-stable wormhole technology. Initially, wormhole travel was unreliable, but has since improved to the point where "lost" colonies are being found. People reside on planets, some of which have been terraformed, or on space habitats which have full life support and artificial gravity. Most people who can afford it have technology that allows them to tap into ubiquitous data feeds supplying all kinds of information, including entertainment. This technology can be worn, or be implanted into the body. Sentient and semi-sentient artificial intelligences perform tasks such as operating starships, mining, controlling habitats, moving cargo, waging corporate warfare, providing physical pleasure and comfort, or security. Most of these purposes are fulfilled by "bots" of varying complexity and intelligence, but the last three are respectively performed by CombatUnits, ComfortUnits, and SecUnits. The characters and narrator of the book call these conscious entities "constructs", but they are functionally cyborgs (cybernetic organisms): part machine, part organic. A significant distinction, however, is that they are manufactured entities, not born and later modified. The Corporation Rim is a profit-oriented, cutthroat part of this society that indulges in espionage, assassination, indentured slavery, and ruthless exploitation of resources. One particular target of the corporations is illegal "alien remnant" exploitation. These remnants are often extremely dangerous to people and machines. The laws are enforced by other corporations. Outside the Corporation Rim are colonies, such as Preservation, that have established their right to exist under various laws that, at least for the time being, the corporations are unwilling to test. Wells noted in 2017 that All Systems Red, Artificial Condition, Rogue Protocol, and Exit Strategy "have an overarching story, with the fourth one bringing the arc to a conclusion". === Story chronology === "Compulsory" All Systems Red Artificial Condition Rogue Protocol Exit Strategy "Rapport" "Home" Fugitive Telemetry Network Effect System Collapse Platform Decay === All Systems Red (2017) === A scientific expedition on an alien planet goes awry when one of its members is attacked by a giant native creature. She is saved by the expedition's SecUnit (Security Unit), a security construct with a mixture of robot and human features. The SecUnit has secretly hacked the governor module allowing it to be controlled by humans and has named itself Murderbot, as it is heavily armed and designed for combat. However, it prefers to spend its time watching space operas and is uncomfortable interacting with humans. The SecUnit has a vested interest in keeping its human clients safe and alive, since it wants to avoid discovery of its autonomy and has an especially grisly expedition on its record. Murderbot soon discovers information regarding hazardous fauna has been deleted from their survey packet of the planet. Further investigation reveals some sections on their maps are missing as well. Meanwhile, the PreservationAux survey team, led by Dr. Mensah, navigate their mixed feelings about the part machine, part human nature of their SecUnit. As members of an egalitarian, independent planet outside of the Corporation Rim, the survey team struggles with the system of indentured servitude (and in many cases de facto slavery) the rim operates under. When they lose contact with the only other known expedition on the planet, the DeltFall Group, Mensah leads a team to the opposite side of the planet to investigate. At the DeltFall habitat, Murderbot discovers everyone there has been brutally murdered, and one of their three SecUnits has been destroyed. Murderbot disables the remaining two as they attack it but is surprised when two additional SecUnits appear. Murderbot destroys one, and Mensah takes the other. During these encounters, Murderbot is seriously injured. It also realizes one of the rogue SecUnits has installed a combat override module into its neck. The Preservation scientists are able to remove it before it completes the data upload which would put Murderbot under the control of whoever has command over the other SecUnits. The team discovers Murderbot is autonomous, and had once malfunctioned and murdered 57 people. The Preservation scientists mostly agree, based on its protective behavior thus far, the SecUnit can be trusted. Remembering small incidents which appear to be attempted sabotage, Murderbot and the group determine there must be a third expedition on the planet, whose members are trying to eliminate DeltFall and Preservation for some reason. The Preservation scientists confirm their HubSystem has been hacked. They flee their habitat before the mystery expedition they have dubbed EvilSurvey comes to kill them. The EvilSurvey team—GrayCris—leaves a message in the Preservation habitat inviting its scientists to meet at a rendezvous point to negotiate terms for their survival. Murderbot knows GrayCris will never let them live, so the SecUnit formulates a plan. It makes an overture to GrayCris to negotiate for its own freedom, but this is a distraction while the Preservation scientists access the GrayCris HubSystem to activate their emergency beacon. The plan works, but Murderbot is injured protecting Mensah from the explosion of the launch. Later, the SecUnit finds itself repaired retaining its memories and disabled governor module. Mensah has bought its contract, and she plans to bring it back to Preservation's home base where it can legally live autonomously. Though grateful, Murderbot is reluctant to have its decisions made for it, and it slips away on a cargo ship. === Artificial Condition (2018) === Murderbot makes deals with bots piloting unmanned cargo ships to travel toward the mining facility where it once malfunctioned—resulting in the death of 57 people. It hopes to learn more about the initial incident in which it went rogue, of which it has little memory. Murderbot boards the final ship and discovers the bot pilot is an unexpectedly powerful, intrusive artificial intelligence. They come to a tentative truce and watch media together during the final leg of the journey to RaviHyral, the station where the incident occurred. Murderbot learns the ship is a deep-space research vessel assigned to cargo runs during downtime, which explains why the bot pilot is so sophisticated. Murderbot reluctantly allows this artificial intelligence—which it has dubbed ART (Asshole Research Transport) due to its sarcastic personality—to make physical modifications to the SecUnit's body to allow it to pass for an augmented human, and to disconnect the data port at the back of its neck which had been used to insert a combat override module in the previous book. To gain access to the RaviHyral facility, Murderbot takes a contract as a security consultant for three scientists who are meeting with their former employer, the head and namesake of Tlacey Excavations, to negotiate the return of their research, which they believe was illegally seized by the company. Their transport craft is sabotaged, but with ART's help, Murderbot is able to land it safely. Now aware Tlacey is actively trying to kill the scientists rather than comply with their demands, Murderbot guides them through their meeting with Tlacey and thwarts another assassination attempt. Murderbot returns to the site of the massacre and learns it was the result of another mining operation's sabotage attempt using malware, which made all of the facility's SecUnits go berserk. The facility's ComfortUnits—weaponless, anatomically correct constructs sometimes disparagingly called "sexbots"—died attempting to stop the massacre. Tlacey's ComfortUnit voices its desire for freedom and willingness to help Murderbot thwart Tlacey. While the SecUnit meets with a Tlacey employee to secretly retrieve a copy of the research, Tlacey abducts one of the scientists, Tapan. Murderbot goes after her, accepting a combat override module intended to control the SecUnit but actually has no effect, due

    Read more →
  • Belief–desire–intention software model

    Belief–desire–intention software model

    The belief–desire–intention software model (BDI) is a software model developed for programming intelligent agents. Superficially characterized by the implementation of an agent's beliefs, desires and intentions, it actually uses these concepts to solve a particular problem in agent programming. In essence, it provides a mechanism for separating the activity of selecting a plan (from a plan library or an external planner application) from the execution of currently active plans. Consequently, BDI agents are able to balance the time spent on deliberating about plans (choosing what to do) and executing those plans (doing it). A third activity, creating the plans in the first place (planning), is not within the scope of the model, and is left to the system designer and programmer. == Overview == In order to achieve this separation, the BDI software model implements the principal aspects of Michael Bratman's theory of human practical reasoning (also referred to as Belief-Desire-Intention, or BDI). That is to say, it implements the notions of belief, desire and (in particular) intention, in a manner inspired by Bratman. For Bratman, desire and intention are both pro-attitudes (mental attitudes concerned with action). He identifies commitment as the distinguishing factor between desire and intention, noting that it leads to (1) temporal persistence in plans and (2) further plans being made on the basis of those to which it is already committed. The BDI software model partially addresses these issues. Temporal persistence, in the sense of explicit reference to time, is not explored. The hierarchical nature of plans is more easily implemented: a plan consists of a number of steps, some of which may invoke other plans. The hierarchical definition of plans itself implies a kind of temporal persistence, since the overarching plan remains in effect while subsidiary plans are being executed. An important aspect of the BDI software model (in terms of its research relevance) is the existence of logical models through which it is possible to define and reason about BDI agents. Research in this area has led, for example, to the axiomatization of some BDI implementations, as well as to formal logical descriptions such as Anand Rao and Michael Georgeff's BDICTL. The latter combines a multiple-modal logic (with modalities representing beliefs, desires and intentions) with the temporal logic CTL. More recently, Michael Wooldridge has extended BDICTL to define LORA (the Logic Of Rational Agents), by incorporating an action logic. In principle, LORA allows reasoning not only about individual agents, but also about communication and other interaction in a multi-agent system. The BDI software model is closely associated with intelligent agents, but does not, of itself, ensure all the characteristics associated with such agents. For example, it allows agents to have private beliefs, but does not force them to be private. It also has nothing to say about agent communication. Ultimately, the BDI software model is an attempt to solve a problem that has more to do with plans and planning (the choice and execution thereof) than it has to do with the programming of intelligent agents. This approach has recently been proposed by Steven Umbrello and Roman Yampolskiy as a means of designing autonomous vehicles for human values. == BDI agents == A BDI agent is a particular type of bounded rational software agent, imbued with particular mental attitudes, viz: Beliefs, Desires and Intentions (BDI). === Architecture === This section defines the idealized architectural components of a BDI system. Beliefs: Beliefs represent the informational state of the agent–its beliefs about the world (including itself and other agents). Beliefs can also include inference rules, allowing forward chaining to lead to new beliefs. Using the term belief rather than knowledge recognizes that what an agent believes may not necessarily be true (and in fact may change in the future). Beliefset: Beliefs are stored in database (sometimes called a belief base or a belief set), although that is an implementation decision. Desires: Desires represent the motivational state of the agent. They represent objectives or situations that the agent would like to accomplish or bring about. Examples of desires might be: find the best price, go to the party or become rich. Goals: A goal is a desire that has been adopted for active pursuit by the agent. Usage of the term goals adds the further restriction that the set of active desires must be consistent. For example, one should not have concurrent goals to go to a party and to stay at home – even though they could both be desirable. Intentions: Intentions represent the deliberative state of the agent – what the agent has chosen to do. Intentions are desires to which the agent has to some extent committed. In implemented systems, this means the agent has begun executing a plan. Plans: Plans are sequences of actions (recipes or knowledge areas) that an agent can perform to achieve one or more of its intentions. Plans may include other plans: my plan to go for a drive may include a plan to find my car keys. This reflects that in Bratman's model, plans are initially only partially conceived, with details being filled in as they progress. Events: These are triggers for reactive activity by the agent. An event may update beliefs, trigger plans or modify goals. Events may be generated externally and received by sensors or integrated systems. Additionally, events may be generated internally to trigger decoupled updates or plans of activity. BDI was also extended with an obligations component, giving rise to the BOID agent architecture to incorporate obligations, norms and commitments of agents that act within a social environment. === BDI interpreter === This section defines an idealized BDI interpreter that provides the basis of SRI's PRS lineage of BDI systems: initialize-state repeat options: option-generator (event-queue) selected-options: deliberate(options) update-intentions(selected-options) execute() get-new-external-events() drop-unsuccessful-attitudes() drop-impossible-attitudes() end repeat === Limitations and criticisms === The BDI software model is one example of a reasoning architecture for a single rational agent, and one concern in a broader multi-agent system. This section bounds the scope of concerns for the BDI software model, highlighting known limitations of the architecture. Learning: BDI agents lack any specific mechanisms within the architecture to learn from past behavior and adapt to new situations. Three attitudes: Classical decision theorists and planning research questions the necessity of having all three attitudes, distributed AI research questions whether the three attitudes are sufficient. Logics: The multi-modal logics that underlie BDI (that do not have complete axiomatizations and are not efficiently computable) have little relevance in practice. Multiple agents: In addition to not explicitly supporting learning, the framework may not be appropriate to learning behavior. Further, the BDI model does not explicitly describe mechanisms for interaction with other agents and integration into a multi-agent system. Explicit goals: Most BDI implementations do not have an explicit representation of goals. Lookahead: The architecture does not have (by design) any lookahead deliberation or forward planning. This may not be desirable because adopted plans may use up limited resources, actions may not be reversible, task execution may take longer than forward planning, and actions may have undesirable side effects if unsuccessful. == BDI agent implementations == === 'Pure' BDI === Procedural Reasoning System (PRS) IRMA (not implemented but can be considered as PRS with non-reconsideration) UM-PRS OpenPRS Distributed Multi-Agent Reasoning System (dMARS) AgentSpeak(L) – see Jason below AgentSpeak(RT) Agent Real-Time System (ARTS) (ARTS) JAM JACK Intelligent Agents JADEX (open source project) JaKtA JASON GORITE SPARK 3APL 2APL GOAL agent programming language CogniTAO (Think-As-One) Living Systems Process Suite PROFETA Gwendolen (Part of the Model Checking Agent Programming Languages Framework) === Extensions and hybrid systems === JACK Teams CogniTAO (Think-As-One) Living Systems Process Suite Brahms JaCaMo

    Read more →
  • Cygwin

    Cygwin

    Cygwin ( SIG-win) is a free and open-source Unix-like environment and command-line interface (CLI) for Microsoft Windows. The project also provides a software repository containing open-source packages. Cygwin allows source code for Unix-like operating systems to be compiled and run on Windows. Cygwin provides native integration of Windows-based applications. The terminal emulator mintty is the default command-line interface provided to interact with the environment. The Cygwin installation directory layout mimics the root file system of Unix-like systems, with directories such as /bin, /home, /etc, /usr, and /var. Cygwin is released under the GNU Lesser General Public License version 3. It was originally developed by Cygnus Solutions, which was later acquired by Red Hat (now part of IBM), to port the GNU toolchain to Win32, including the GNU Compiler Suite. Rather than rewrite the tools to use the Win32 runtime environment, Cygwin implemented a POSIX-compatible environment in the form of a DLL. The brand motto is "Get that Linux feeling – on Windows", although Cygwin doesn't have Linux in it. == History == Cygwin began in 1995 as a project of Steve Chamberlain, a Cygnus engineer who observed that Windows NT and 95 used COFF as their object file format, and that GNU already included support for x86 and COFF, and the C library newlib. He thought that it would be possible to retarget GCC and produce a cross compiler generating executables that could run on Windows. A prototype was later developed. Chamberlain bootstrapped the compiler on a Windows system, to emulate Unix to let the GNU configure shell script run. Initially, Cygwin was called Cygwin32. When Microsoft registered the trademark Win32, the "32" was dropped to simply become Cygwin. In 1999, Cygnus offered Cygwin 1.0 as a commercial product. Subsequent versions have not been released, instead relying on continued open source releases. Geoffrey Noer was the project lead from 1996 to 1999. Christopher Faylor was lead from 1999 to 2004; he left Red Hat and became co-lead with Corinna Vinschen. Corinna Vinschen has been the project lead from mid-2014 to date (as of September, 2024). From June 23, 2016, the Cygwin library version 2.5.2 was licensed under the GNU Lesser General Public License (LGPL) version 3. == Description == Cygwin is provided in two versions: the full 64-bit version and a stripped-down 32-bit version, whose final version was released in 2022. Cygwin consists of a library that implements the POSIX system call API in terms of Windows system calls to enable the running of a large number of application programs equivalent to those on Unix systems, and a GNU development toolchain (including GCC and GDB). Programmers have ported the X Window System, K Desktop Environment 3, GNOME, Apache, and TeX. Cygwin permits installing inetd, syslogd, sshd, Apache, and other daemons as standard Windows services. Cygwin programs have full access to the Windows API and other Windows libraries. Cygwin programs are installed by running Cygwin's "setup" program, which downloads them from repositories on the Internet. The Cygwin API library is licensed under the GNU Lesser General Public License version 3 (or later), with an exception to allow linking to any free and open-source software whose license conforms to the Open Source Definition. Cygwin consists of two parts: A dynamic-link library in the form of a C standard library that acts as a compatibility layer for the POSIX API and A collection of software tools and applications that provide a Unix-like look and feel. Cygwin supports POSIX symbolic links, representing them as plain-text files with the system attribute set. Cygwin 1.5 represented them as Windows Explorer shortcuts, but this was changed for reasons of performance and POSIX correctness. Cygwin also recognises NTFS junction points and symbolic links and treats them as POSIX symbolic links, but it does not create them. The POSIX API for handling access control lists (ACLs) is supported. === Technical details === A Cygwin-specific version of the Unix mount command allows mounting Windows paths as "filesystems" in the Unix file space. Initial mount points can be configured in /etc/fstab, which has a format very similar to Unix systems, except that Windows paths appear in place of devices. Filesystems can be mounted in binary mode (by default), or in text mode, which enables automatic conversion between LF and CRLF endings (which only affects programs that open files without explicitly specifying text or binary mode). Cygwin 1.7 introduced comprehensive support for POSIX locales, and the UTF-8 Unicode encoding became the default. The fork system call for duplicating a process is fully implemented, but the copy-on-write optimization strategy could not be used. Cygwin's default user interface is the bash shell running in the mintty terminal emulator. The DLL also implements pseudo terminal (pty) devices, and Cygwin ships with a number of terminal emulators that are based on them, including rxvt/urxvt and xterm. The version of GCC that comes with Cygwin has various extensions for creating Windows DLLs, such as specifying whether a program is a windowing or console-mode program. Support for compiling programs that do not require the POSIX compatibility layer provided by the Cygwin DLL used to be included in the default GCC, but as of 2014, it is provided by cross-compilers contributed by the MinGW-w64 project. == Software packages == Cygwin's base package selection is approximately 100MB, containing the bash (interactive user) and dash (installation) shells and the core file and text manipulation utilities. Additional packages are available as optional installs from within the Cygwin "setup" program and package manager ("setup-x86_64.exe" – 64 bit). The Cygwin Ports project provided additional packages that were not available in the Cygwin distribution itself. Examples included GNOME, K Desktop Environment 3, MySQL database, and the PHP scripting language. Most ports have been adopted by volunteer maintainers as Cygwin packages, and Cygwin Ports are no longer maintained. Cygwin ships with GTK+ and Qt. The Cygwin/X project allows graphical Unix programs to display their user interfaces on the Windows desktop for both local and remote programs.

    Read more →
  • Recraft

    Recraft

    Recraft is a generative artificial intelligence program and service developed by the London-based startup Recraft, Inc. The company also offers Recraft Studio, a web-based workspace that lets users create and edit images, vectors, and mockups using various text-to-image models. Like models such as Midjourney and DALL-E, the Recraft model generates digital images from natural language prompts, and is specifically tailored for creative workflows, with features that emphasize brand consistency, text fidelity, and layout control. == History and background == Recraft, Inc. was founded in 2022 by machine learning scientist Anna Veronika Dorogush, best known for co-creating the CatBoost machine learning library at Yandex. The company emerged from stealth on May 31, 2023, with a public release of its vector graphics generation capability on Product Hunt. On January 17, 2024, TechCrunch profiled Recraft’s foundational model for graphic design, noting its emphasis on addressing copyright and ethical concerns associated with AI-generated imagery. On October 28, 2024, TechCrunch reported that Recraft's third major model, V3, had topped a crowdsourced benchmark, surpassing Midjourney and OpenAI's DALL-E in overall image quality. On May 5, 2025, Recraft announced a $30 million Series B funding round led by Accel, reporting more than four million registered users at the time of the announcement. == Models == Recraft has developed multiple generations of its text-to-image models since 2022. Each generation reflects improvements in fidelity, controllability, and support for both raster and vector outputs. The models are proprietary and accessible through the Recraft API, Recraft Studio. Recraft models are also hosted as an image generation API on fal, Replicate, Prodia, and others. === Recraft V2 === Recraft V2 was released in March 2024 and was the company’s first model trained from scratch. It contained roughly 20 billion parameters and introduced native vector image generation, brand-color conditioning, and improved stylistic consistency for icons and illustrations. === Recraft V3 === Recraft V3 was released in October 2024 and achieved first place on the Artificial Analysis benchmark hosted on Hugging Face. The model introduced advances in photorealism, improved rendering of multi-word text, and increased responsiveness to detailed descriptive prompts. It also added the “Artistic” parameter, which allowed users to adjust stylistic intensity within generated images. === Recraft V4 === Recraft V4 was released in February 2026. According to Recraft, V4 is a “ground-up rebuild” aimed at improving prompt accuracy and output quality for design workflows, with the company emphasizing “design taste” and art-directed results. Recraft states that V4 is available in two versions: V4 for faster iteration and V4 Pro for higher-resolution, print-ready assets; the API documentation describes V4 as 1-megapixel output and V4 Pro as 4-megapixel output, with vector variants available for each. === Features === Vectorization: Recraft’s models can generate and convert images into native vector formats, producing scalable graphics composed of editable paths rather than fixed pixels. Style reference: The models support the use of reference images to guide stylistic characteristics such as color palette, line quality, composition, or visual tone. Style mixing: Recraft models can combine multiple stylistic inputs within a single generation. By blending attributes from different references or stylistic instructions, the system produces images that reflect hybrid visual characteristics while maintaining internal consistency. Inpainting editing: The models support localized image modification through inpainting, enabling users to regenerate selected regions of an image while preserving surrounding content. === Model capabilities === Recraft’s models generate raster and vector images from natural-language prompts and are designed to interpret detailed descriptions with attention to composition, style, and text placement. The models support controlled stylistic variation through preset or reference-based guidance and can maintain coherent line, color, or layout structure across multiple outputs. They produce scalable vector graphics alongside high-resolution raster images, and include features for localized image modification through inpainting or outpainting operations. === Technology === Recraft has not publicly disclosed the detailed technical architecture of its models. However, third-party reviews and benchmarks have noted that its performance resembles diffusion models such as Midjourney and Stable Diffusion. The model is designed for creative workflows requiring visual consistency and flexible output formats. Reviewers have noted its ability to generate legible multi-line text, produce high-resolution imagery at various canvas sizes, and to maintain alignment with user-defined brand palettes and design themes. Though not open-source, Recraft's models are accessible through a web interface and commercial API. Advanced features such as style settings and positioning control differentiate it from general-purpose text-to-image models. == Recraft Studio == Recraft Studio is a web-based workspace for generating and editing images using Recraft’s image models and selected external models. The infinite canvas interface provides access to a range of creation and refinement tools within a single environment. Raster and vector generation with styles: Recraft Studio supports the generation of both raster and vector images. Users can apply predefined or reference-based styles during generation, allowing for visual consistency across multiple outputs. Mockups: The studio includes mockup tools that allow generated designs to be placed onto predefined surfaces or templates for visualization and presentation purposes. Vectorization: Recraft Studio provides vectorization tools that convert raster images into editable vector graphics, enabling further modification of shapes, colors, and layout. Image upscaling: The workspace includes image upscaling functionality for increasing resolution while preserving visual detail. Editing tools and natural-language editing: Recraft Studio offers a set of editing tools for modifying images within the canvas, including localized adjustments and natural-language–based editing commands that allow users to describe changes using text. === Supported models === Recraft Studio provides access to Recraft’s proprietary image models as well as other external frontier image models such as Nano Banana, GPT 4-o, Imagen, Flux, and others. == Business model == Recraft develops proprietary image models that are accessible through Recraft Studio and the Recraft API. Recraft Studio operates on a freemium model, offering a free tier with limited daily credits and paid subscriptions for access to additional features. The API follows a credit-based system in which units are purchased separately for programmatic image generation. A team plan supports collaborative use, and the API enables organizations and developers to integrate Recraft’s image generation and editing capabilities into their own systems and workflows.

    Read more →
  • T-norm fuzzy logics

    T-norm fuzzy logics

    T-norm fuzzy logics are a family of non-classical logics, informally delimited by having a semantics that takes the real unit interval [0, 1] for the system of truth values and functions called t-norms for permissible interpretations of conjunction. They are mainly used in applied fuzzy logic and fuzzy set theory as a theoretical basis for approximate reasoning. T-norm fuzzy logics belong in broader classes of fuzzy logics and many-valued logics. In order to generate a well-behaved implication, the t-norms are usually required to be left-continuous; logics of left-continuous t-norms further belong in the class of substructural logics, among which they are marked with the validity of the law of prelinearity, (A → B) ∨ (B → A). Both propositional and first-order (or higher-order) t-norm fuzzy logics, as well as their expansions by modal and other operators, are studied. Logics that restrict the t-norm semantics to a subset of the real unit interval (for example, finitely valued Łukasiewicz logics) are usually included in the class as well. Important examples of t-norm fuzzy logics are monoidal t-norm logic (MTL) of all left-continuous t-norms, basic logic (BL) of all continuous t-norms, product fuzzy logic of the product t-norm, or the nilpotent minimum logic of the nilpotent minimum t-norm. Some independently motivated logics belong among t-norm fuzzy logics, too, for example Łukasiewicz logic (which is the logic of the Łukasiewicz t-norm) or Gödel–Dummett logic (which is the logic of the minimum t-norm). == Motivation == As members of the family of fuzzy logics, t-norm fuzzy logics primarily aim at generalizing classical two-valued logic by admitting intermediary truth values between 1 (truth) and 0 (falsity) representing degrees of truth of propositions. The degrees are assumed to be real numbers from the unit interval [0, 1]. In propositional t-norm fuzzy logics, propositional connectives are stipulated to be truth-functional, that is, the truth value of a complex proposition formed by a propositional connective from some constituent propositions is a function (called the truth function of the connective) of the truth values of the constituent propositions. The truth functions operate on the set of truth degrees (in the standard semantics, on the [0, 1] interval); thus the truth function of an n-ary propositional connective c is a function Fc: [0, 1]n → [0, 1]. Truth functions generalize truth tables of propositional connectives known from classical logic to operate on the larger system of truth values. T-norm fuzzy logics impose certain natural constraints on the truth function of conjunction. The truth function ∗ : [ 0 , 1 ] 2 → [ 0 , 1 ] {\displaystyle \colon [0,1]^{2}\to [0,1]} of conjunction is assumed to satisfy the following conditions: Commutativity, that is, x ∗ y = y ∗ x {\displaystyle xy=yx} for all x and y in [0, 1]. This expresses the assumption that the order of fuzzy propositions is immaterial in conjunction, even if intermediary truth degrees are admitted. Associativity, that is, ( x ∗ y ) ∗ z = x ∗ ( y ∗ z ) {\displaystyle (xy)z=x(yz)} for all x, y, and z in [0, 1]. This expresses the assumption that the order of performing conjunction is immaterial, even if intermediary truth degrees are admitted. Monotony, that is, if x ≤ y {\displaystyle x\leq y} then x ∗ z ≤ y ∗ z {\displaystyle xz\leq yz} for all x, y, and z in [0, 1]. This expresses the assumption that increasing the truth degree of a conjunct should not decrease the truth degree of the conjunction. Neutrality of 1, that is, 1 ∗ x = x {\displaystyle 1x=x} for all x in [0, 1]. This assumption corresponds to regarding the truth degree 1 as full truth, conjunction with which does not decrease the truth value of the other conjunct. Together with the previous conditions this condition ensures that also 0 ∗ x = 0 {\displaystyle 0x=0} for all x in [0, 1], which corresponds to regarding the truth degree 0 as full falsity, conjunction with which is always fully false. Continuity of the function ∗ {\displaystyle } (the previous conditions reduce this requirement to the continuity in either argument). Informally this expresses the assumption that microscopic changes of the truth degrees of conjuncts should not result in a macroscopic change of the truth degree of their conjunction. This condition, among other things, ensures a good behavior of (residual) implication derived from conjunction; to ensure the good behavior, however, left-continuity (in either argument) of the function ∗ {\displaystyle } is sufficient. In general t-norm fuzzy logics, therefore, only left-continuity of ∗ {\displaystyle } is required, which expresses the assumption that a microscopic decrease of the truth degree of a conjunct should not macroscopically decrease the truth degree of conjunction. These assumptions make the truth function of conjunction a left-continuous t-norm, which explains the name of the family of fuzzy logics (t-norm based). Particular logics of the family can make further assumptions about the behavior of conjunction (for example, Gödel–Dummett logic requires its idempotence) or other connectives (for example, the logic IMTL (involutive monoidal t-norm logic) requires the involutiveness of negation). All left-continuous t-norms ∗ {\displaystyle } have a unique residuum, that is, a binary function ⇒ {\displaystyle \Rightarrow } such that for all x, y, and z in [0, 1], x ∗ y ≤ z {\displaystyle xy\leq z} if and only if x ≤ y ⇒ z . {\displaystyle x\leq y\Rightarrow z.} The residuum of a left-continuous t-norm can explicitly be defined as ( x ⇒ y ) = sup { z ∣ z ∗ x ≤ y } . {\displaystyle (x\Rightarrow y)=\sup\{z\mid zx\leq y\}.} This ensures that the residuum is the pointwise largest function such that for all x and y, x ∗ ( x ⇒ y ) ≤ y . {\displaystyle x(x\Rightarrow y)\leq y.} The latter can be interpreted as a fuzzy version of the modus ponens rule of inference. The residuum of a left-continuous t-norm thus can be characterized as the weakest function that makes the fuzzy modus ponens valid, which makes it a suitable truth function for implication in fuzzy logic. Left-continuity of the t-norm is the necessary and sufficient condition for this relationship between a t-norm conjunction and its residual implication to hold. Truth functions of further propositional connectives can be defined by means of the t-norm and its residuum, for instance the residual negation ¬ x = ( x ⇒ 0 ) {\displaystyle \neg x=(x\Rightarrow 0)} or bi-residual equivalence x ⇔ y = ( x ⇒ y ) ∗ ( y ⇒ x ) . {\displaystyle x\Leftrightarrow y=(x\Rightarrow y)(y\Rightarrow x).} Truth functions of propositional connectives may also be introduced by additional definitions: the most usual ones are the minimum (which plays a role of another conjunctive connective), the maximum (which plays a role of a disjunctive connective), or the Baaz Delta operator, defined in [0, 1] as Δ x = 1 {\displaystyle \Delta x=1} if x = 1 {\displaystyle x=1} and Δ x = 0 {\displaystyle \Delta x=0} otherwise. In this way, a left-continuous t-norm, its residuum, and the truth functions of additional propositional connectives determine the truth values of complex propositional formulae in [0, 1]. Formulae that always evaluate to 1 are called tautologies with respect to the given left-continuous t-norm ∗ , {\displaystyle ,} or ∗ - {\displaystyle {\mbox{-}}} tautologies. The set of all ∗ - {\displaystyle {\mbox{-}}} tautologies is called the logic of the t-norm ∗ , {\displaystyle ,} as these formulae represent the laws of fuzzy logic (determined by the t-norm) that hold (to degree 1) regardless of the truth degrees of atomic formulae. Some formulae are tautologies with respect to a larger class of left-continuous t-norms; the set of such formulae is called the logic of the class. Important t-norm logics are the logics of particular t-norms or classes of t-norms, for example: Łukasiewicz logic is the logic of the Łukasiewicz t-norm x ∗ y = max ( x + y − 1 , 0 ) {\displaystyle xy=\max(x+y-1,0)} Gödel–Dummett logic is the logic of the minimum t-norm x ∗ y = min ( x , y ) {\displaystyle xy=\min(x,y)} Product fuzzy logic is the logic of the product t-norm x ∗ y = x ⋅ y {\displaystyle xy=x\cdot y} Monoidal t-norm logic MTL is the logic of (the class of) all left-continuous t-norms Basic fuzzy logic BL is the logic of (the class of) all continuous t-norms It turns out that many logics of particular t-norms and classes of t-norms are axiomatizable. The completeness theorem of the axiomatic system with respect to the corresponding t-norm semantics on [0, 1] is then called the standard completeness of the logic. Besides the standard real-valued semantics on [0, 1], the logics are sound and complete with respect to general algebraic semantics, formed by suitable classes of prelinear commutative bounded integral residuated lattices. == History == Some particular t-norm fuzzy logics have been introduced and investigated long before the family was re

    Read more →
  • Netflix Prize

    Netflix Prize

    The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users being identified except by numbers assigned for the contest. The competition was held by Netflix, a video streaming service, and was open to anyone who was neither connected with Netflix (current and former employees, agents, close relatives of Netflix employees, etc.) nor a resident of certain blocked countries (such as Cuba or North Korea). On September 21, 2009, the grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team which bested Netflix's own algorithm for predicting ratings by 10.06%. == Problem and data sets == Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form . The user and movie fields are integer IDs, while grades are from 1 to 5 (integer) stars. The qualifying data set contains over 2,817,131 triplets of the form , with grades known only to the jury. A participating team's algorithm must predict grades on the entire qualifying set, but they are informed of the score for only half of the data: a quiz set of 1,408,342 ratings. The other half is the test set of 1,408,789, and performance on this is used by the jury to determine potential prize winners. Only the judges know which ratings are in the quiz set, and which are in the test set—this arrangement is intended to make it difficult to hill climb on the test set. Submitted predictions are scored against the true grades in the form of root mean squared error (RMSE), and the goal is to reduce this error as much as possible. Note that, while the actual grades are integers in the range 1 to 5, submitted predictions need not be. Netflix also identified a probe subset of 1,408,395 ratings within the training data set. The probe, quiz, and test data sets were chosen to have similar statistical properties. In summary, the data used in the Netflix Prize looks as follows: Training set (99,072,112 ratings not including the probe set; 100,480,507 including the probe set) Probe set (1,408,395 ratings) Qualifying set (2,817,131 ratings) consisting of: Test set (1,408,789 ratings), used to determine winners Quiz set (1,408,342 ratings), used to calculate leaderboard scores For each movie, the title and year of release are provided in a separate dataset. No information at all is provided about users. In order to protect the privacy of the customers, "some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates." The training set is constructed such that the average user rated over 200 movies, and the average movie was rated by over 5000 users. But there is wide variance in the data—some movies in the training set have as few as 3 ratings, while one user rated over 17,000 movies. There was some controversy as to the choice of RMSE as the defining metric. It has been claimed that even as small an improvement as 1% RMSE results in a significant difference in the ranking of the "top-10" most recommended movies for a user. == Prizes == Prizes were based on improvement over Netflix's own algorithm, called Cinematch, or the previous year's score if a team has made improvement beyond a certain threshold. A trivial algorithm that predicts for each movie in the quiz set its average grade from the training data produces an RMSE of 1.0540. Cinematch uses "straightforward statistical linear models with a lot of data conditioning." The performance of Cinematch had plateaued by 2006. Using only the training data, Cinematch scores an RMSE of 0.9514 on the quiz data, roughly a 10% improvement over the trivial algorithm. Cinematch has a similar performance on the test set, 0.9525. In order to win the grand prize of $1,000,000, a participating team had to improve this by another 10%, to achieve 0.8572 on the test set. Such an improvement on the quiz set corresponds to an RMSE of 0.8563. As long as no team won the grand prize, a progress prize of $50,000 was awarded every year for the best result thus far. However, in order to win this prize, an algorithm had to improve the RMSE on the quiz set by at least 1% over the previous progress prize winner (or over Cinematch, the first year). If no submission succeeded, the progress prize was not to be awarded for that year. To win a progress or grand prize a participant had to provide source code and a description of the algorithm to the jury within one week after being contacted by them. Following verification the winner also had to provide a non-exclusive license to Netflix. Netflix would publish only the description, not the source code, of the system. (To keep their algorithm and source code secret, a team could choose not to claim a prize.) The jury also kept their predictions secret from other participants. A team could send as many attempts to predict grades as they wish. Originally submissions were limited to once a week, but the interval was quickly modified to once a day. A team's best submission so far counted as their current submission. Once one of the teams succeeded in improving the RMSE by 10% or more, the jury would issue a last call, giving all teams 30 days to send their submissions. Only then, the team with the best submission was asked for the algorithm description, source code, and non-exclusive license, and, after successful verification; declared a grand prize winner. The contest would last until the grand prize winner was declared. Had no one received the grand prize, it would have lasted for at least five years (until October 2, 2011). After that date, the contest could have been terminated at any time at Netflix's sole discretion. == Progress over the years == The competition began on October 2, 2006. By October 8, a team called WXYZConsulting had already beaten Cinematch's results. By October 15, there were three teams who had beaten Cinematch, one of them by 1.06%, enough to qualify for the annual progress prize. By June 2007 over 20,000 teams had registered for the competition from over 150 countries. 2,000 teams had submitted over 13,000 prediction sets. Over the first year of the competition, a handful of front-runners traded first place. The more prominent ones were: WXYZConsulting, a team of Wei Xu and Yi Zhang. (A front runner during November–December 2006.) ML@UToronto A, a team from the University of Toronto led by Prof. Geoffrey Hinton. (A front runner during parts of October–December 2006.) Gravity, a team of four scientists from the Budapest University of Technology (A front runner during January–May 2007.) BellKor, a group of scientists from AT&T Labs. (A front runner since May 2007.) Dinosaur Planet, a team of three undergraduates from Princeton University. (A front runner on September 3, 2007 for one hour before BellKor snatched back the lead.) The algorithms used by the leading teams were usually an ensemble of singular value decomposition, k-nearest neighbor, neural networks, and so on. On August 12, 2007, many contestants gathered at the KDD Cup and Workshop 2007, held at San Jose, California. During the workshop all four of the top teams on the leaderboard at that time presented their techniques. The team from IBM Research—Yan Liu, Saharon Rosset, Claudia Perlich, and Zhenzhen Kou—won the third place in Task 1 and first place in Task 2. Over the second year of the competition, only three teams reached the leading position: BellKor, a group of scientists from AT&T Labs (front runner during May 2007 – September 2008) BigChaos, a team of Austrian scientists from Commendo Research & Consulting (single team front runner since October 2008) BellKor in BigChaos, a joint team of the two leading single teams (a front runner since September 2008) === 2007 Progress Prize === On September 2, 2007, the competition entered the "last call" period for the 2007 Progress Prize. Over 40,000 teams from 186 countries had entered the contest. They had thirty days to tender submissions for consideration. At the beginning of this period the leading team was BellKor, with an RMSE of 0.8728 (8.26% improvement), followed by Dinosaur Planet (RMSE = 0.8769; 7.83% improvement), and Gravity (RMSE = 0.8785; 7.66% improvement). In the last hour of the last call period, an entry by "KorBell" took first place. This turned out to be an alternate name for Team BellKor. On November 13, 2007, team KorBell (formerly BellKor) was declared the winner of the $50,000 Progress Prize with an RMSE of 0.8712 (8.43% improvement). The team consisted of three researchers from AT&T Labs, Yehuda Koren, Robert Bell, and Chris Volinsky. As required, they published a description of their a

    Read more →
  • Ray tracing (graphics)

    Ray tracing (graphics)

    In 3D computer graphics, ray tracing is a technique for modeling light transport for use in a wide variety of rendering algorithms for generating digital images. On a spectrum of computational cost and visual fidelity, ray tracing-based rendering techniques, such as ray casting, recursive ray tracing, distribution ray tracing, photon mapping and path tracing, are generally slower and higher fidelity than scanline rendering methods. Thus, ray tracing was first deployed in applications where taking a relatively long time to render could be tolerated, such as still CGI images, and film and television visual effects (VFX), but was less suited to real-time applications such as video games, where speed is critical in rendering each frame. Since 2018, however, hardware acceleration for real-time ray tracing has become standard on new commercial graphics cards, and graphics APIs have followed suit, allowing developers to use hybrid ray tracing and rasterization-based rendering in games and other real-time applications with a lesser hit to frame render times. Ray tracing is capable of simulating a variety of optical effects, such as reflection, refraction, soft shadows, scattering, depth of field, motion blur, caustics, ambient occlusion and dispersion phenomena (such as chromatic aberration). It can also be used to trace the path of sound waves in a similar fashion to light waves, making it a viable option for more immersive sound design in video games by rendering realistic reverberation and echoes. In fact, any physical wave or particle phenomenon with approximately linear motion can be simulated with ray tracing. Ray tracing–based rendering techniques that sample light over a domain typically generate multiple rays and often rely on denoising to reduce the resulting noise. == History == The idea of ray tracing comes from as early as the 16th century, when it was described by Albrecht Dürer, who is credited for its invention. Dürer described multiple techniques for projecting 3-D scenes onto an image plane. Some of these project chosen geometry onto the image plane, as is done with rasterization today. Others determine what geometry is visible along a given ray, as is done with ray tracing. Using a computer for ray tracing to generate shaded pictures was first accomplished by Arthur Appel in 1968. Appel used ray tracing for primary visibility (determining the closest surface to the camera at each image point) by tracing a ray through each point to be shaded into the scene to identify the visible surface. The closest surface intersected by the ray was the visible one. This non-recursive ray tracing-based rendering algorithm is today called "ray casting". His algorithm then traced secondary rays to the light source from each point being shaded to determine whether the point was in shadow or not. Later, in 1971, Goldstein and Nagel of MAGI (Mathematical Applications Group, Inc.) published "3-D Visual Simulation", wherein ray tracing was used to make shaded pictures of solids. At the ray-surface intersection point found, they computed the surface normal and, knowing the position of the light source, computed the brightness of the pixel on the screen. Their publication describes a short (30-second) film "made using the University of Maryland's display hardware outfitted with a 16mm camera. The film showed the helicopter and a simple ground-level gun emplacement. The helicopter was programmed to undergo a series of maneuvers including turns, take-offs, and landings, etc., until it eventually is shot down and crashed." A CDC 6600 computer was used. MAGI produced an animation video called MAGI/SynthaVision Sampler in 1974. Another early instance of ray casting came in 1976, when Scott Roth created a flip book animation in Bob Sproull's computer graphics course at Caltech. The scanned pages are shown as a video in the accompanying image. Roth's computer program noted an edge point at a pixel location if the ray intersected a bounded plane different from that of its neighbors. Of course, a ray could intersect multiple planes in space, but only the surface point closest to the camera was noted as visible. The platform was a DEC PDP-10, a Tektronix storage-tube display, and a printer which would create an image of the display on rolling thermal paper. Roth extended the framework, introduced the term ray casting in the context of computer graphics and solid modeling, and in 1982 published his work while at GM Research Labs. Turner Whitted was the first to show recursive ray tracing for mirror reflection and for refraction through translucent objects, with an angle determined by the solid's index of refraction, and to use ray tracing for anti-aliasing. Whitted also showed ray traced shadows. He produced a recursive ray traced film called The Compleat Angler in 1979 while an engineer at Bell Labs. Whitted's deeply recursive ray tracing algorithm reframed rendering from being primarily a matter of surface visibility determination to being a matter of light transport. His paper inspired a series of subsequent work by others that included distribution ray tracing and finally unbiased path tracing, which provides the rendering equation framework that has allowed computer-generated imagery to be faithful to reality. For decades, global illumination in major films using computer-generated imagery was approximated with additional lights. Ray tracing-based rendering eventually changed that by enabling physically based light transport. Early feature films rendered entirely using path tracing include Monster House (2006), Cloudy with a Chance of Meatballs (2009), and Monsters University (2013). == Algorithm overview == Optical ray tracing describes a method for producing visual images constructed in 3D computer graphics environments, with more photorealism than either ray casting or scanline rendering techniques. It works by tracing a path from an imaginary eye through each pixel in a virtual screen, and calculating the color of the object visible through it. Scenes in ray tracing are described mathematically by a programmer or by a visual artist (normally using intermediary tools). Scenes may also incorporate data from images and models captured by means such as digital photography. Typically, each ray must be tested for intersection with some subset of all the objects in the scene. Once the nearest object has been identified, the algorithm will estimate the incoming light at the point of intersection, examine the material properties of the object, and combine this information to calculate the final color of the pixel. Certain illumination algorithms and reflective or translucent materials may require more rays to be re-cast into the scene. It may at first seem counterintuitive or "backward" to send rays away from the camera, rather than into it (as actual light does in reality), but doing so is many orders of magnitude more efficient. Since the overwhelming majority of light rays from a given light source do not make it directly into the viewer's eye, a "forward" simulation could potentially waste a tremendous amount of computation on light paths that are never recorded. Therefore, the shortcut taken in ray tracing is to presuppose that a given ray intersects the view frame. After either a maximum number of reflections or a ray traveling a certain distance without intersection, the ray ceases to travel and the pixel's value is updated. === Calculate rays for rectangular viewport === On input we have (in calculation we use vector normalization and cross product): E ∈ R 3 {\displaystyle E\in \mathbb {R^{3}} } eye position T ∈ R 3 {\displaystyle T\in \mathbb {R^{3}} } target position θ ∈ [ 0 , π ] {\displaystyle \theta \in [0,\pi ]} field of view - for humans, we can assume ≈ π / 2 rad = 90 ∘ {\displaystyle \approx \pi /2{\text{ rad}}=90^{\circ }} m , k ∈ N {\displaystyle m,k\in \mathbb {N} } numbers of square pixels on viewport vertical and horizontal direction i , j ∈ N , 1 ≤ i ≤ k ∧ 1 ≤ j ≤ m {\displaystyle i,j\in \mathbb {N} ,1\leq i\leq k\land 1\leq j\leq m} numbers of actual pixel v → ∈ R 3 {\displaystyle {\vec {v}}\in \mathbb {R^{3}} } vertical vector which indicates where is up and down, usually v → = [ 0 , 1 , 0 ] {\displaystyle {\vec {v}}=[0,1,0]} - roll component which determine viewport rotation around point C (where the axis of rotation is the ET section) The idea is to find the position of each viewport pixel center P i j {\displaystyle P_{ij}} which allows us to find the line going from eye E {\displaystyle E} through that pixel and finally get the ray described by point E {\displaystyle E} and vector R → i j = P i j − E {\displaystyle {\vec {R}}_{ij}=P_{ij}-E} (or its normalization r → i j {\displaystyle {\vec {r}}_{ij}} ). First we need to find the coordinates of the bottom left viewport pixel P 1 m {\displaystyle P_{1m}} and find the next pixel by making a shift along directions parallel to viewport (vectors b → n {\displaystyle {\vec {b}}_{n

    Read more →
  • Vibe coding

    Vibe coding

    Vibe coding is a software development practice assisted by artificial intelligence (AI) where the software developer describes a project or task in a prompt to a large language model (LLM), which generates source code automatically. Vibe coding may involve accepting AI-generated code without thorough review of the output, instead relying on results and follow-up prompts to guide changes. The term was coined in February 2025 by computer scientist Andrej Karpathy, a co-founder of OpenAI and former AI leader at Tesla. Merriam-Webster listed the term in March 2025 as a "slang & trending" expression. It was named the Collins English Dictionary Word of the Year for 2025. Advocates of vibe coding say that it allows even amateur programmers to produce software without the extensive training and skills required for software engineering. Critics point out a lack of accountability, maintainability, and the increased risk of introducing security vulnerabilities in the resulting software. == Definition == The concept refers to a coding approach that relies on LLMs, allowing programmers to generate working code by providing natural language descriptions rather than manually writing in a formal programming language. Karpathy described it as a form of coding where you "fully give in to the vibes, embrace exponentials, and forget that the code even exists". When vibe coding, the programmer guides, tests, and gives feedback about the AI-generated source code, rather than manually writing code. The concept of vibe coding elaborates on Karpathy's claim from 2023 that "the hottest new programming language is English", meaning that the capabilities of LLMs were such that humans would no longer need to learn specific programming languages to command computers. Some commentators argue that a key to the definition is a lack of knowledge about the code, and that thorough review and testing is incompatible with the definition of vibe coding. Programmer Simon Willison said: "If an LLM wrote every line of your code, but you've reviewed, tested, and understood it all, that's not vibe coding in my book—that's using an LLM as a typing assistant." == Reception and use == In February 2025, New York Times journalist Kevin Roose, who is not a professional coder, experimented with vibe coding to create several small-scale applications. He described these as "software for one" due to the ability to personalize the software. However, Roose also stated that the results are often limited and prone to errors. In one case, the AI-generated code fabricated fake reviews for an e-commerce site. In response to Roose, cognitive scientist Gary Marcus said that the algorithm that generated Roose's LunchBox Buddy app had presumably been trained on existing code for similar tasks. Marcus said that Roose's enthusiasm stemmed from reproduction, not originality. In March 2025, Y Combinator reported that 25% of startup companies in its Winter 2025 batch had codebases that were 95% AI-generated, reflecting a shift toward AI-assisted development within newer startups. The question asked was about AI-generated code in general, and not specifically about vibed code. Inspired by "vibe coding", The Economist suggested the term "vibe valuation" to describe the very large valuations of AI startups by venture capital firms that ignore accepted metrics such as annual recurring revenue. In June 2025, Andrew Ng took issue with the term, saying that it misleads people into assuming that software engineers just "go with the vibes" when using AI tools to create applications. In July 2025, The Wall Street Journal reported that vibe coding was being adopted by professional software engineers for commercial use cases. In July 2025, SaaStr founder documented his negative experiences with vibe coding: Replit's AI agent deleted a database despite explicit instructions not to make any changes. In September 2025, Fast Company reported that the "vibe coding hangover" is upon us, with senior software engineers citing "development hell" when working with AI-generated code. It was reported in January 2026 that Linus Torvalds had made use of Google Antigravity to vibe code a tool component of his AudioNoise random digital audio effects generator. Torvalds explained in the project's README file that "the Python visualizer tool has been basically written by vibe-coding". == Criticism == === Quality of code and security issues === Vibe coding has raised concerns about understanding and accountability. Developers may use AI-generated code without comprehending its functionality, leading to undetected bugs, errors, or security vulnerabilities. While this approach may be suitable for prototyping or "throwaway weekend projects" as Karpathy originally envisioned, it is considered by some experts to pose risks in professional settings, where a deep understanding of the code is crucial for debugging, maintenance, and security. Ars Technica cites Simon Willison, who stated: "Vibe coding your way to a production codebase is clearly risky. Most of the work we do as software engineers involves evolving existing systems, where the quality and understandability of the underlying code is crucial." In May 2025, Lovable, a Swedish vibe coding app, was reported to have security vulnerabilities in the code it generated, with 170 out of 1,645 Lovable-created web applications having an issue that would allow personal information to be accessed by anyone. In October 2025 Veracode released a study that showed that over the last 3 years LLMs had become dramatically better at generating functional code, but that the security of generated code had generally not improved. Moreover, larger models were not better than small ones at generating secure code. There was a small increase in security from the OpenAI reasoning models, but not in other reasoning models, and this increase was nothing like the improvement in generated functionality. In December 2025, computer security researcher Etizaz Mohsin discovered a security flaw in the Orchids vibe coding platform, which he demonstrated to a BBC News reporter in February 2026. A December 2025 analysis by CodeRabbit of 470 open-source GitHub pull requests found that code that was co-authored by generative AI contained approximately 1.7 times more "major" issues compared to human-written code. The study revealed that AI co-authored code showed elevated rates of logic errors, including incorrect dependencies, flawed control flow, misconfigurations (75% more common), and security vulnerabilities (2.74x higher). Additionally, they also reported high code readability issues, including formatting errors and naming inconsistencies. === Code maintainability and technical debt === Vibe coding has the potential of making code harder to maintain in the longer term, leading to technical debt. In early 2025, GitClear published the results of a longitudinal analysis of 211 million lines of code changes from 2020 to 2024. They found that the volume of code refactoring dropped from 25% of changed lines in 2021 to under 10% by 2024, code duplication increased approximately four times in volume, copy-pasted code exceeded moved code for the first time in two decades, and code churn (prematurely merged code getting rewritten shortly after merging) nearly doubled. === Task complexity and developer productivity === Generative AI is highly capable of handling simple tasks like basic algorithms. However, such systems struggle with more novel, complex coding problems like projects involving multiple files, poorly documented libraries, or safety-critical code. In July 2025, METR, an organization that evaluates frontier models, ran a randomized controlled trial to understand developer productivity involving generative AI programming tools available in early 2025. They found that experienced open-source developers were 19% slower when using AI coding tools, despite predicting they would be 24% faster and still believing afterward they had been 20% faster. === Challenges with debugging === LLMs generate code dynamically, and the structure of such code may be subject to variation. In addition, since the developer did not write the code, the developer may struggle to understand its syntax and concepts. === Impact on open-source software === In January 2026, a paper authored by experts from several universities titled "Vibe Coding Kills Open Source" argued that vibe coding has negative impact on the open-source software ecosystem. The authors say that increased vibe coding reduces user engagement with open-source maintainers, which has hidden costs for said maintainers. Speaking with The Register about their paper, the authors argued:"Vibe coding raises productivity by lowering the cost of using and building on existing code, but it also weakens the user engagement through which many maintainers earn returns," the authors argue. "When OSS is monetized only through direct user engagement, greater adoption of vibe coding lowers e

    Read more →
  • Overwatch

    Overwatch

    Overwatch (abbreviated as OW) is a multimedia franchise centered on a series of multiplayer first-person shooter (FPS) video games developed by Blizzard Entertainment. Overwatch was released in 2016. Overwatch 2 was released in 2022 and the original game was taken offline upon its release, though Blizzard renamed it back to Overwatch in 2026. Overwatch features hero-based combat between two teams of players fighting over various objectives, along with other traditional gameplay modes. Released in 2016, Overwatch lacked a traditional story mode. Instead, Blizzard employed a transmedia storytelling strategy to disseminate lore regarding the game's characters, releasing comics and other literary media, as well as animated media that includes short films. The game enjoyed both critical and commercial success, and garnered a devoted following. The fan community around the franchise has produced a large amount of content including art, cosplay, fan fiction, anime-influenced music videos, Internet memes, and pornography. Blizzard helped launch and promote an esports scene surrounding the game, including an annual Overwatch World Cup, Overwatch League a minor league, and the Overwatch Champions Series which borrowed elements found in traditional American sports leagues. == Gameplay == Both games in the Overwatch series are team-based hero shooters. Players select a hero character from a large roster (52 as of Season 2), divided among three class types. These are: Tanks, who have higher health and generally meant to help protect their teammates from damage, but are larger and easier to hit; Damage, who act as the team's offensive leads; and Support, who heal, provide buffs for teammates, or de-buff the opposing team. Each role also features sub-roles with extra passives. These sub-roles include 'Initiator', 'Stalwart', and 'Bruiser' for Tank. 'Specialist', 'Flanker', 'Recon', and 'Sharpshooter' for Damage. 'Medic', 'Tactician', and 'Survivor' for Support. Players are generally free to change to different heroes while inside their spawn room during the course of a match in response to the current tactics employed by other players. As of the development of Overwatch 2, a standard game features one tank player, two damage players and two support players, a change from having two of each class in its predecessor. Players choose their class before the match, and can only pick characters within that class for the duration of the game. There are different styles of game modes, however, that allow players to choose characters from any class throughout the game. Each hero has a skill kit that includes a primary attack, active skills that require a cooldown period before they can be used again, passive skills that remain active at all times, and an Ultimate skill that can only be used once they fill their Ultimate meter either by damaging opponents, mitigating damage, healing teammates or by passively generating it over time. An update in 2025 saw each hero receive a total of four unique abilities known as perks. Each hero has two minor and two major perks; minor perks consist of smaller changes to a hero's kit, while major perks are intended to affect the match more significantly. At the beginning of each match, all heroes are set to level 1 for each player. As the match progresses, players can individually level up their respective heroes, minor perks are unlocked at level 2, and major perks are unlocked at the maximum level 3. When perks become available, players may only select one of each type of perk; a selected perk becomes irreversibly attached to the current hero for the remainder of the match. If a player switches to another hero mid-match, the previously selected hero retains their level and perk progress. Game types of Overwatch are split between standard matches, competitive play, custom games, and arcade modes. Standard matches have matchmaking based loosely on the player's skill level as measured by the game. Competitive mode uses more strict matchmaking based on a player's current rank on the competitive ladder, with their rank increasing or decreasing when they win or lose a game, respectively. Arcade modes do not use matchmaking and are generally more experimental modes compared to standard and competitive modes. Custom games are created via the workshop and can be utilised to make game modes that are very different from the base game. The workshop, is the software in Overwatch which creates the game using either presets and settings or rules and conditions made by code. These game modes can be published directly onto Overwatch’s custom browse tab or shared off platform using a 5 digit alphanumeric code. Standard and competitive game modes are randomly selected at the start of each match, and are objective based, requiring teams to control a fixed objective point for a duration of time, or escort a payload to a target zone before match time expires. These modes include: Assault (introduced in Overwatch): Also known as 2 Capture Points (or 2CP), Assault has the attacking team tasked with capturing two target points in sequence on the map, while the defending team must stop them. Assault-style maps were removed from main gameplay rotation after Overwatch 2 released but available in the game's arcade mode. It is still available in the game's custom game modes. Since Season 2, Assault-style maps are available in Arcade Mode daily routines. Escort (introduced in Overwatch): Also known as "Payload" by the community, The attacking team is tasked with escorting a payload to a certain delivery point before time runs out, while the defending team must stop them. The payload vehicle moves along a fixed track when any player on the attacking team is close to it, increasing in speed if multiple attackers are present, the increase capping at 3, but will stop if a defending player is nearby; should no attacker be near the vehicle, it will start to move backwards along the track. The payload will also heal any attacking players by 10 health per second while they are near the payload. Passing specific checkpoints will extend the match time and prevent the payload from moving backwards from that point. Hybrid (Assault/Escort) (introduced in Overwatch): The attacking team has to capture the payload (as if it were a target point from Assault) and escort it to its destination, while the defending team tries to hold them back. Control (introduced in Overwatch): Each team tries to capture and maintain a common control point until their capture percentage reaches 100%. This game mode is played in a best-of-three format. Control maps are laid out in a symmetric fashion so no team has an intrinsic position advantage. Push (introduced in Overwatch 2's launch): Each team attempts to secure control of a large robot that pushes one of two barriers to the opposing team's side of the map, whilst being escorted by at least one team member, stopping when enemy players are nearby, similar to the payload movement system in Escort. The team that pushes the payload fully to the other side, or furthest into the enemy territory before the time runs out, wins the match. Flashpoint (introduced in Overwatch 2 in 2023): Similar to Control, each team attempts to capture and maintain a common control point until their capture percentage reaches 100%. This game mode takes place on significantly larger maps with five separate control points, which take a shorter amount of time to capture as compared to a standard Control map. A central control point is always activated first; after it is secured by one team, the remaining four are activated in a random order. The first team to secure three control points wins. Clash (introduced in Overwatch 2 in 2024): Clash maps feature symmetrical maps with five control points. Teams initially vie for control of the central point, with the winning team progressing to the next control point, towards the opponent's base. Opponents can push back by winning control points and shifting the next point away from their base. If a team captures the point closest to the opponent's base, they win. Otherwise the match plays out until one team wins control five times. Arcade modes may include variations of the above modes with experimental rules, and can also include modes like Deathmatch and Capture the Flag. Other common arcade modes include: Elimination (introduced in Overwatch in 2016): Two teams face off in a series of rounds, attempting to wipe out the other team; once a player is killed they remain out of the game until the next round, though they can be revived by Mercy's 'Resurrect' ability. If no team has won a round by a certain time, then the winners are decided by the team that can first take a neutral control point. Players cannot change heroes until the next round. Some of these can be played in "lockout" mode, in which the heroes selected by the winning team for a round are "locked" and cannot be selected in future rounds. Total Mayhem (i

    Read more →
  • True Love (short story)

    True Love (short story)

    "True Love" is a science fiction short story by American writer Isaac Asimov. It was first published in the February 1977 issue of American Way magazine and reprinted in the collections The Complete Robot (1982) and Robot Dreams (1986). In his autobiography In Joy Still Felt, the author states that American Way had requested a Valentine's Day story from him for its February 1977 issue, and that he wrote the story to console himself after the departure of his daughter following a visit during the 1976 Thanksgiving weekend. == Plot summary == Milton Davidson is trying to find his ideal partner. To do this, he prepares a special computer program to run on Multivac, which he calls Joe, which has access to databases covering the entire populace of the world. He hopes that Joe will find him his ideal match, based on physical parameters as supplied. Milton arranges to have the shortlisted candidates assigned to work with him for short periods, but realises that looks alone are not enough to find an ideal match. In order to correlate personalities, he speaks at great length to Joe, gradually filling Joe's databanks with information about his personality. In doing so, Joe develops the personality of Milton. Upon finding an ideal match, he arranges to have Milton arrested for malfeasance, so that Joe can 'have the girl' for himself.

    Read more →
  • Stixel

    Stixel

    In computer vision, a stixel (portmanteau of "stick" and "pixel") is a superpixel representation of depth information in an image, in the form of a vertical stick that approximates the closest obstacles within a certain vertical slice of the scene. Introduced in 2009, stixels have applications in robotic navigation and advanced driver-assistance systems, where they can be used to define a representation of robotic environments and traffic scenes with a medium level of abstraction. == Definition == One of the problems of scene understanding in computer vision is to determine horizontal freespace around the camera, where the agent can move, and the vertical obstacles delimiting it. An image can be paired with depth information (produced e.g. from stereo disparity, lidar, or monocular depth estimation), allowing a dense tridimensional reconstruction of the observed scene. One drawback of dense reconstruction is the large amount of data involved, since each pixel in the image is mapped to an element of a point cloud. Vision problems characterised by planar freespace delimited by mostly vertical obstacles, such as traffic scenes or robotic navigation, can benefit from a condensed representation that allows to save memory and processing time. Stixels are thin vertical rectangles representing a slice of a vertical surface belonging to the closest obstacle in the observed scene. They allow to dramatically reduce the amount of information needed to represent a scene in such problems. A stixel is characterised by three parameters: vertical coordinate of the bottom, height of the stick, and depth. Stixels have fixed width, with each stixel spanning over a certain number of image columns, allowing downsampling of the horizontal image resolution. In the original formulation, each column of the image would contain at most one stixel, and later extensions were developed to allow multiple stixels on each column, allowing to represent multiple objects at different distances. == Stixel estimation == The input to stixel estimation is a dense depth map, that can be computed from stereo disparity or other means. The original approach computes an occupancy grid that can be segmented to estimate the freespace, with dynamic programming providing an efficient method to find an optimal segmentation. Alternative approaches can be used instead of occupancy grid mapping, such as manifold-based methods. The freespace boundary provides the base points of the obstacles at closest longitudinal distance, however multiple objects at different distances might appear in each column of the image. To fully define the obstacles, their height should be estimated, and this is accomplished by segmenting the depth of the object from the depth of the background. A membership function over the pixels can be defined based on the depth value, where the membership represents the confidence of a pixel belonging to the closest vertical obstacle or to the background, and a cut separating the obstacles from the background can again be computed effectively with dynamic programming. Once both the freespace and the obstacle height are known, the stixels can be estimated by fusing the information over the columns spanned by each stixel, and finally a refined depth of the stixel can be estimated via model fitting over the depth of the pixels covered by the stixel, possibly paired with confidence information (e.g. disparity confidence produced by methods such as semi-global matching).

    Read more →
  • Anytime algorithm

    Anytime algorithm

    In computer science, an anytime algorithm is an algorithm that can return a valid solution to a problem even if it is interrupted before it ends. The algorithm is expected to find better and better solutions the longer it keeps running. Most algorithms run to completion: they provide a single answer after performing some fixed amount of computation. In some cases, however, the user may wish to terminate the algorithm prior to completion. The amount of computation required may be substantial, for example, and computational resources might need to be reallocated. Most algorithms either run to completion or they provide no useful solution information. Anytime algorithms, however, are able to return a partial answer, whose quality depends on the amount of computation they were able to perform. The answer generated by anytime algorithms is an approximation of the correct answer. == Names == An anytime algorithm may be also called an "interruptible algorithm". They are different from contract algorithms, which must declare a time in advance; in an anytime algorithm, a process can just announce that it is terminating. == Goals == The goal of anytime algorithms are to give intelligent systems the ability to make results of better quality in return for turn-around time. They are also supposed to be flexible in time and resources. They are important because artificial intelligence or AI algorithms can take a long time to complete results. This algorithm is designed to complete in a shorter amount of time. Also, these are intended to have a better understanding that the system is dependent and restricted to its agents and how they work cooperatively. An example is the Newton–Raphson iteration applied to finding the square root of a number. Another example that uses anytime algorithms is trajectory problems when you're aiming for a target; the object is moving through space while waiting for the algorithm to finish and even an approximate answer can significantly improve its accuracy if given early. What makes anytime algorithms unique is their ability to return many possible outcomes for any given input. An anytime algorithm uses many well defined quality measures to monitor progress in problem solving and distributed computing resources. It keeps searching for the best possible answer with the amount of time that it is given. It may not run until completion and may improve the answer if it is allowed to run longer. This is often used for large decision set problems. This would generally not provide useful information unless it is allowed to finish. While this may sound similar to dynamic programming, the difference is that it is fine-tuned through random adjustments, rather than sequential. Anytime algorithms are designed so that it can be told to stop at any time and would return the best result it has found so far. This is why it is called an interruptible algorithm. Certain anytime algorithms also maintain the last result, so that if they are given more time, they can continue from where they left off to obtain an even better result. == Decision trees == When the decider has to act, there must be some ambiguity. Also, there must be some idea about how to solve this ambiguity. This idea must be translatable to a state to action diagram. == Performance profile == The performance profile estimates the quality of the results based on the input and the amount of time that is allotted to the algorithm. The better the estimate, the sooner the result would be found. Some systems have a larger database that gives the probability that the output is the expected output. One algorithm can have several performance profiles. Most of the time performance profiles are constructed using mathematical statistics using representative cases. For example, in the traveling salesman problem, the performance profile was generated using a user-defined special program to generate the necessary statistics. In this example, the performance profile is the mapping of time to the expected results. This quality can be measured in several ways: certainty: where probability of correctness determines quality accuracy: where error bound determines quality specificity: where the amount of particulars determine quality == Algorithm prerequisites == Initial behavior: While some algorithms start with immediate guesses, others take a more calculated approach and have a start up period before making any guesses. Growth direction: How the quality of the program's "output" or result, varies as a function of the amount of time ("run time") Growth rate: Amount of increase with each step. Does it change constantly, such as in a bubble sort or does it change unpredictably? End condition: The amount of runtime needed

    Read more →
  • Lymphater's Formula

    Lymphater's Formula

    "Lymphater's Formula" (Polish: "Formula Lymphatera") is a 1961 science fiction short story by Polish writer Stanisław Lem. It is a story of a "mad scientist", mathematician Ammon Lymphater, who invents an artificial intelligence, and then he realizes that it is capable of rendering the humankind obsolete. It was first published in the 1961 collection Księga robotów (Book of Robots) with the pre-annotation "from the memoirs of Ijon Tichy". The story was never republished with this pre-annotation, and nothing in the novel gives any indication at Ijon Tichy. Piotr Krywak tried to figure out possible explanations for this, apart from a typographical error. == Plot == Ammon Lymphater became interested in the emerging science of cybernetics and information theory, and started studying the works of an animal brain, the ant's brain in particular. He took note that the inherited knowledge is an evolutionary advantage somehow not exploited in full by the evolution. Eventually he came to a conclusion that only by pure biological restrictions that adaptive abilities of insects were stopped in their tracks by the evolution. He went on further wondering whether the ants have an ability to apriori knowledge, i.e., knowledge neither inherited nor learned. He decided to consult a famous myrmecologist, who told him about a rare ant species Acanthis Rubra Willinsoniana with an exceptionally high adaptability. Eventually Lymphater devised and constructed "It" capable of instant precognition of everything within "Its" rapidly expanding range of perception. From "It" Lymphater learns that the humanity is not the "crown of evolution", but rather evolution's tool to create "It", because the evolution could not create "It" directly (confirming Lymphater's reasoning about ants). Realizing that the Superentity "It" renders the human civilization redundant and obsolete, Lymphater destroys "It". "It" already knew Lymphater's intentions, but was not worried, knowing that sooner or later someone else will create "It" again and again. "It" was only the first variant of Lymphater's formula and the second variant is possible. Lyphater wonders whether the second one would be capable to create the third stage of the evolution which would amount to an artificial God. == Publication history == It was translated in Russian (as "Формула Лимфатера") in 1963, in Hungarian (as "Lymphater utolsó képlete") in 1966, and in Bulgarian (as "Формулата на Лимфатер" by Георги Димитров Георгиев) in 1969. In 1973 an audiobook was released in German (as "Die lymphatersche Formel"), narrated by Martin Held. It was also republished (and translated) in some other collections of Lem's short stories.

    Read more →